LineFlow: Simple NLP Dataset Handler for PyTorch or Any Framework

Smaller Code, Less Pain. For an NLP task, you might need to tokenize text or build a vocabulary during pre-processing. And you have probably found that pre-processing code gets as messy as your desk. Forgive me if your desk is clean 🙂 I have had that experience too. That’s why I created LineFlow to …

Sensing the Air Quality

A low-cost IoT Air Quality Monitor based on Raspberry Pi 4. Santiago, Chile during a winter environmental emergency. I have the privilege of living in one of the most beautiful countries in the world, but unfortunately, not “all are flowers”. During the winter season, Chile suffers badly from air pollution, mainly due to particulate matter as …

Helping a Reader with Python Web Scraping Refactored.

Bhargava Reddy Morampalli, a microbiologist from India, read my first post on web scraping from my old blog. If you didn’t get a chance to check out that post, you can read it here: Python Web Scraping Refactored. My first article on my old blog was on a web scraping example. Web scraping is one …

Bayesian Basketball : were the Toronto Raptors really the best team during NBA 2019 season ?

Let’s go back in time and see if we can end up with a different winner for the NBA 2019 title. How? By using Bayesian simulations. Credit: NYTimes. [This article was inspired by the work of Baio and Blangiardo (2010), Daniel Weitzenfeld’s great blog article, and Peadar Coyle’s tutorial on Hierarchical models.] Bayesian simulation …

One-tailed or two-tailed test, that is the question

Source: pixabay. Learn the difference between two variants of statistical tests and how to implement them in Python. In data science/econometrics we see statistical tests in many places: correlation analysis, ANOVA, A/B testing, linear regression results, etc. Therefore, for practitioners, it is very important to thoroughly understand their meaning and know why a given …

Boston Job Market for Data Analysts and Scientists : August 2019 Update

Most Hiring Companies, Top Tools & Tech, and More. Introduction. This is an August 2019 update of my original project, in which I explore the job market for data analysts and data scientists in the Greater Boston Area. These visuals were produced only from job listings posted on Indeed with the search term …

Detecting and modeling outliers with PyOD

As the name suggests, outliers are data points that differ significantly from the rest of your observations. In other words, they are far away from the average path of your data. In statistics and machine learning, detecting outliers is a pivotal step, since they might affect the performance of your model. Namely, imagine you want to …
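The idea that outliers sit far from the bulk of the data can be illustrated with a simple z-score check. This is a minimal sketch of a swapped-in, simpler technique, not PyOD's API; the function name `find_outliers`, the sample readings, and the threshold of 2.0 are all assumptions made for illustration.

```python
from statistics import mean, pstdev

def find_outliers(values, threshold=2.0):
    """Return the values whose absolute z-score exceeds the threshold."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical, so nothing can stand out
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical sample: six similar readings and one extreme value
readings = [10, 12, 11, 13, 12, 11, 95]
print(find_outliers(readings))
```

A fixed z-score threshold is only a rough heuristic: a single extreme value inflates the standard deviation itself, which is one reason dedicated libraries offer more robust detectors.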

The Ultimate Guide to using the Python regex module

The first thing we need to learn while using regex is how to create patterns. I will go through some of the most commonly used patterns one by one. As you would think, the simplest pattern is a simple string.
    import re
    pattern = r'times'
    string = "It was the best of times, it was the worst of times."
    print(len(re.findall(pattern, string)))
But …

Anomalies in Global Suicide Data

Mental Health Search Interest on Google Trends. Every Mental Health Awareness Day (October 10), there is a peak in search interest for “mental health” on Google Trends. However, this past October, there was the highest search interest ever seen. Mental health in the United States is growing as a part of the global conversation – …

Best Investment Portfolio Via Monte-Carlo Simulation In Python

The risk-free rate is the rate an investor earns on an investment without taking any risk, for example by buying government treasury bills. There is a tradeoff between risk and return. If an investor chooses an investment riskier than the risk-free option, then he/she is expecting …

Searching for Food Deserts in Los Angeles County

Image source: robrogers.com. For a recent data science project, I collaborated with several other Lambda School students to search for food deserts in L.A. County. A general definition of a food desert is an area that does not have access, within one mile, to a grocery store/market providing fresh, healthy food options, …

Data Science in Production

Source: https://pixabay.com/photos/factory-industry-sugar-3713310/ Building Scalable Model Pipelines with Python. One of my biggest regrets as a data scientist is that I avoided learning Python for too long. I always figured that other languages provided parity in terms of accomplishing data science tasks, but now that I’ve made the leap to Python there is no looking back. …

5 Tips To Create A More Reliable Web Crawler

To boost your web crawler’s efficiency! When I am crawling websites, having my crawler blocked is the most annoying situation. To become really good at web crawling, you should not only be able to write XPath or CSS selectors quickly; how you design your crawlers also matters a …

Simple Linear Regression with Python

In my previous article, I talked about Simple Linear Regression as a statistical model to predict continuous target values. I also showed the optimization strategy the algorithm employs to compute the regression’s coefficients α and β. Here, I’m going to provide a practical explanation of what I’ve been talking about, and I’m going to do …

Recency, Frequency, Monetary Model with Python — and how Sephora uses it to optimize their Google…

Last time, we analyzed our online shopper data set using the cohort analysis method and made some interesting observations about our cohort data set. While cohort analysis shows us customer behavior over time and helps us understand retention rates, we also want to be able to segment our data by customer behavior. Today, we …

Visualizing NYC Bike Data on interactive and animated maps with Folium plugins

This plugin helps us animate a path on a map. In this case, we don’t have the exact path each trip follows, so we will create lines from origin to destination. Before starting to work with our data, let’s take a look at what settings this plugin needs. From the live demo, we can see …

How To Learn Data Science – My path

The following apps are very helpful and I use them: Quora, Medium, Blind, Reddit, LinkedIn, Udemy, Coursera, YouTube, Meetup, DataCamp. 1. Reddit: I have subscribed to the following subreddits, which are very helpful: Dataengineering, Dataisbeautiful, Datasets, Learndatascience, Learnprogramming, Learnpython, Machinelearning, Learnmachinelearning, Python, Rstats, Computervision …

P-Value In Action: Is It Safe to Say That Parallax Correction Really Improve The Accuracy of…

Source: https://planetary.s3.amazonaws.com/assets/images/spacecraft/2014/20140227_nasa_gpm.jpg What really is a p-value? It took me a long time to grasp the concept. From my experience, I believe the best way to understand the p-value is through a real example. That’s why, in this post, I will explain the p-value using a real example that …

Machine learning on categorical variables

How to properly run and evaluate models. Photo by v2osk on Unsplash. At first blush, categorical variables aren’t that different from numerical ones. But once you start digging deeper and implement your machine learning (and preprocessing) ideas in code, you will stop every minute asking questions such as “Do I do feature engineering on both …

AI Powered Search for Extra-terrestrial Intelligence — Analyzing Radio Telescopic Data

AI for Social Good Series, Part 2.1: Understanding Radio-Telescope Signal Data from SETI. In this two-part series of articles, we will look at how artificial intelligence (AI), coupled with the power of open-source tools and frameworks, can be used to solve a very interesting problem in a non-conventional domain: the quest for finding …

Easy Steps To Plot Geographic Data on a Map — Python

Assume that you work at a startup and need to conduct spatial data analysis and prediction on users’ geographical data. Or your company runs a lot of delivery operations and your job is, again, to analyze, visualize and maybe predict the drivers’ or users’ geographical data. So, visualizing your data (predicted values, maybe) on …

Simulating stock prices in Python using Geometric Brownian Motion

1. What GBM does. I use E.ON’s stock prices as an example throughout the article when explaining the related concepts. E.ON is an electric utility company based in Germany and one of the biggest in Europe. I retrieve its stock prices (in euros) from the Xetra exchange through Quandl’s Python package. Here is a …

Using Standard Deviation in Python

The population mean and standard deviation of a dataset can be calculated using the NumPy library in Python. The following code shows the work:
    import numpy as np
    dataset = [13, 22, 26, 38, 36, 42, 49, 50, 77, 81, 98, 110]
    print('Mean:', np.mean(dataset))
    print('Standard Deviation:', np.std(dataset))
Output:
    Mean: 53.5
    Standard Deviation: 29.694275542602483
Two datasets below show the high temperatures (in degrees Fahrenheit) for two cities …
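To make explicit what those NumPy calls compute, here is a minimal standard-library sketch of the population formulas. By default `np.std` divides by N (`ddof=0`), which is the population form used here. The helper names `population_mean` and `population_std` are illustrative and not part of NumPy; the result should agree with the quoted output up to floating-point rounding.

```python
import math

def population_mean(values):
    """Arithmetic mean: sum of the values divided by their count."""
    return sum(values) / len(values)

def population_std(values):
    """Population standard deviation: divide by N, not N - 1."""
    mu = population_mean(values)
    variance = sum((x - mu) ** 2 for x in values) / len(values)
    return math.sqrt(variance)

dataset = [13, 22, 26, 38, 36, 42, 49, 50, 77, 81, 98, 110]
print("Mean:", population_mean(dataset))
print("Standard Deviation:", population_std(dataset))
```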

Feature Engineering and Algorithm Accuracy for the Titanic Dataset

One of the most popular datasets for machine learning is the one on the Titanic accident. Here we play with the features in this dataset, trying to discover how the choice of different features affects the accuracy of some basic ML algorithms. These features correspond to the head of the data the head of the …

How safe are the streets of Santiago?

Let’s answer it with Python and GeoPandas! Costanera Center, Santiago / Benja Gremler. Some time ago I wrote an article explaining how to work with geographic maps in Python the “hard way” (mainly Shapely and Pandas): Mapping Geography Data in Python. Now it is time to do it again, but this time, explaining how …

Data Visualization For Everyone Pt 2

Part 2: Creating & Curating Visualization. Part 1. My background is in the AEC (Architecture, Engineering, & Construction) industry, one which has historically lagged behind almost every other industry in the adoption and application of new and transformative technologies. That challenge still presents itself today, with the modern shift towards data-driven corporate frameworks. …

Python and R for Data Wrangling: Examples for Both, Including Speed-Up Considerations.

Skill-up by becoming a bilingual data scientist. Learn speed-up code tips. Write bilingual notebooks with interoperable Python and R cells. © Artur/AdobeStock. A couple of years back, you would write your data analysis program exclusively in one of these two languages: Python or R. Both languages offer great functionality from data exploration to modeling and …

Frawd detection using Benford’s Law (Python Code)

For this article I chose two datasets that became public after recent elections. The first comes from the 2016 American presidential election and the second from the 2018 Russian presidential election. For my first project I got my data from Impractical Python Project. I took into consideration only the votes for Donald Trump …

Python Strings from scratch !!!

3. Slicing a string. Slicing a string helps to get a set of characters from a string. This is really helpful when we want to access a particular set of characters in a string. Below are some slicing variants that are useful.
    string = "programming"
    string               # 'programming'
Getting one character of the string:
    print(string[0:1])   # p
Getting the first …
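As a quick illustration of the `string[start:stop:step]` form, here is a minimal sketch of a few common slices. These particular variants and variable names are chosen for illustration and may not match the ones the article goes on to list.

```python
string = "programming"

first_char = string[0:1]     # one character: from index 0 up to, not including, 1
first_three = string[:3]     # an omitted start defaults to 0
last_three = string[-3:]     # negative indices count from the end
every_other = string[::2]    # a step of 2 skips every second character
reversed_str = string[::-1]  # a step of -1 walks the string backwards

print(first_char, first_three, last_three, every_other, reversed_str)
```

Slices never raise an error for out-of-range bounds, so `string[:100]` simply returns the whole string.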

To dance or not to dance? — The Machine Learning approach.

I had a question: Can I predict whether or not I can dance to a song based on the song’s attributes? So, I set out to find some answers, but before I share my journey with you, let’s discuss some key concepts that will come up throughout the project. Danceability describes the degree to …

Interactive Choropleth Maps With Plotly

You also need a Mapbox account for this tutorial. Mapbox provides a flexible geodata API. Using the Mapbox API, we can map our individual data to a scalable world map. You can create an account at www.mapbox.com. You need an individual token to use the Mapbox services, which can be found under the account settings: …

Getting Hands-On with Databases

Gaining Experience with One of the Most Sought After Skills for Data Scientists. Data science is a field that has been experiencing explosive growth in the last few years and was deemed the “sexiest job of the 21st century” by Harvard Business Review in 2012. With high salaries and an ever-growing hype around AI, more …

Quantifying Political Momentum with Data

Now that I understood my data a little better, it was time to do some heavy analysis. Two common forms of time series analysis are moving averages and percent change from the previous month. Moving averages are common in financial forecasting because they help reduce volatility in the data by smoothing out the graph. While …
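Both techniques are short enough to sketch in plain Python. The following is a minimal illustration, not the author's code: the function names, the window size of 3, and the `monthly` values are assumptions, and `percent_change` assumes no value in the series is zero.

```python
def moving_average(values, window):
    """Average of each run of `window` consecutive values."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]

def percent_change(values):
    """Percent change of each value relative to the one before it."""
    return [
        100.0 * (cur - prev) / prev
        for prev, cur in zip(values, values[1:])
    ]

# Hypothetical monthly figures
monthly = [100, 110, 99, 120, 130]
print(moving_average(monthly, 3))
print(percent_change(monthly))
```

The moving average returns fewer points than the input because the first full window only exists once `window` values are available; that shortening is the price of the smoothing.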

Analysis of an art survey using Pandas

Pandas is an open-source Python library for data science that allows us to easily work with structured data, such as CSV files, SQL tables, or Excel spreadsheets. In this article, we use Pandas to analyze the results of an art survey carried out by students of statistics at Comenius University in Bratislava. Students …

K-Means Clustering for Unsupervised Machine Learning

The Beginner’s Guide to Unsupervised Learning. Artificial Intelligence (AI) and Machine Learning (ML) have revolutionized every aspect of our life and disrupted how we do business, unlike any other technology in the history of mankind. Such disruption brings many challenges for professionals and businesses. In this article, I will provide an introduction to one …

Visualizing Different NFL Player Styles

CC by 2.0. Different players have different strengths and weaknesses: is there a way to visualize them? Back in the early 2000s, the New York Giants had an exciting running back duo. Tiki Barber (“Lightning”) went for about 1000 yards rushing and 550 yards receiving a year. That’s impressive on its own, but even …

Intro to Reading and Writing Spreadsheets with Python

First we are going to check if Python is installed and install another library that will help us deal with spreadsheets. A library is a collection of code that implements (usually) hard things in a way that is simpler to use. We first need to open up the Terminal, which will let us interact with our …

The Treasures of Python’s built in Libraries

Discover the treasures of Python! Python is a beautiful language. Simple to use yet powerfully expressive. But are you using everything that it has to offer? Every experienced developer knows that knowing the hidden treasures of their programming language of choice helps them get around many common bugs and …

Introduction to Amazon Lambda, Layers and boto3 using Python3

A serverless approach for Data Scientists. Photo by Daniel Eledut on Unsplash. Amazon Lambda is probably the most famous serverless service available today, offering low cost and requiring practically no cloud infrastructure governance. It offers a relatively simple and straightforward platform for implementing functions in different languages like Python, Node.js, Java, C# and many more. …

Experimentation in Data Science

When AB testing doesn’t cut it. Today I am going to talk about experimentation in data science, why it is so important, and some of the different techniques that we might consider using when AB testing is not appropriate. Experiments are designed to identify causal relationships between variables and this is a really important concept …

The Intuition Behind Facial Detection: The Viola-Jones Algorithm

Only recently have our smartphones been able to use a human face as a password to unlock the device. Just like fingerprints, faces are unique with millions of tiny features that differentiate one from the other. It may not always be obvious to us humans, but machines synthesize and evaluate every small piece of data, …

Feature Engineer Optimization in HyperparameterHunter 3.0

Before we jump in with HyperparameterHunter, let’s take a quick look at our data: SKLearn’s Boston Housing Regression Dataset. We’ll be using the “DIS” column as the target, just like SKLearn’s target transformation example. This dataset has a manageable 506 samples, with 13 features excluding the target. 2.1. Baseline. Because the goal of feature engineering …

Reasons Why You Shouldn’t Consider Data Science as an option anymore. Wait, did I say shouldn’t!

“Information is the oil of the 21st century, and analytics is the combustion engine.” The power of big data and data science is radically changing the world. Since we entered the era of big data, data science has become one of the fastest-growing, multi-million dollar industries. Nowadays everything has been inundated with …