How to Build Your First Python Package

Make your awesome code conveniently available to the world, because you’re awesome too. (Original image by Vadim_P from Pixabay.) About two years ago I published my very first data-science-related blog post. It was about Categorical Correlations, and I honestly thought no one would find it useful. It was just experimental, and for myself. 1.7K claps later, … Read more How to Build Your First Python Package

How To Track COVID-19 Cases in the United States (in Python)

Now that our patient data is scraped, cleaned, & available to the front-end, we can tie this whole thing together! So how will we show it to our users? Earlier we cleaned the data for our news articles, but let’s pivot and work on the visualizer for our map data. This way we can show … Read more How To Track COVID-19 Cases in the United States (in Python)

A Complete Beginners Guide to Matrix Multiplication for Data Science with Python Numpy

Now to the fun part: multiplying a 2D matrix by a 2D matrix. There are a few things to keep in mind. Order matters now: AB != BA. Matrices can be multiplied only if the number of columns in the first equals the number of rows in the second. Multiplication is the dot product of rows … Read more A Complete Beginners Guide to Matrix Multiplication for Data Science with Python Numpy
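The rules above can be checked with a short NumPy sketch. The matrices `A` and `B` are hypothetical values chosen only for illustration; they do not come from the article.

```python
import numpy as np

# Hypothetical example matrices, chosen only for illustration.
A = np.array([[1, 2, 3],
              [4, 5, 6]])      # shape (2, 3)
B = np.array([[7, 8],
              [9, 10],
              [11, 12]])       # shape (3, 2)

# A has 3 columns and B has 3 rows, so A @ B is defined.
AB = A @ B                     # shape (2, 2)
BA = B @ A                     # shape (3, 3): order matters

# Each entry of AB is the dot product of a row of A with a column of B.
first_entry = np.dot(A[0, :], B[:, 0])

print(AB)
print(BA.shape)
print(first_entry == AB[0, 0])
```

Here `AB` and `BA` do not even have the same shape, which is the strongest form of AB != BA.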

Feature selection? You are probably doing it wrong

which translates into the following Python code: Finally, we can generate X and y, and then split them into a training and a test set. Now that we have the data, it’s time to pick some predictive models and try them in “SelectFromModel” mode. Let’s take 8 models: There’s a good chance you’ve heard all … Read more Feature selection? You are probably doing it wrong

How Pandemics Impact Financial Markets

Something similar actually happened in the past too. We are talking about the Spanish flu, which started during World War I and affected nearly 56% of the world population. The other two are the Asian flu and the Hong Kong flu, which impacted around half a million. For comparison, I have given the numbers for coronavirus. … Read more How Pandemics Impact Financial Markets

Top 5 Beautiful Soup Functions That Will Make Your Life Easier

The basic process goes something like this: Get the data and then process it any way you want. That is why today I want to show you some of the top functions that Beautiful Soup has to offer. If you are also interested in other libraries like Selenium, here are other examples you should look … Read more Top 5 Beautiful Soup Functions That Will Make Your Life Easier

Foundations for the Statistical Analysis of Climate Change — Probability Distributions

Let’s take a quick look again at our data. The values in the date column correspond with the year and the month, while the Monthly_Anom column values are the monthly temperature anomalies. Read more about why global temperatures are shown as anomalies here. Continuous Distributions Below is the distribution plot for our data. The anomalies … Read more Foundations for the Statistical Analysis of Climate Change — Probability Distributions

How and Why to use f strings in Python3?

Let me explain this with a simple example. Suppose you have some variables, and you want to print them within a statement.
name = 'Andy'
age = 20
print(?)
Output: I am Andy. I am 20 years old
You can do this in various ways: a) Concatenate: A very naive way to do this is to simply use + … Read more How and Why to use f strings in Python3?
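As a minimal sketch of the two approaches named in the excerpt and its title, the snippet below builds the same sentence by concatenation and with an f-string. The variable names `concatenated` and `formatted` are introduced here for illustration.

```python
name = "Andy"
age = 20

# Concatenation: every non-string value has to be converted by hand.
concatenated = "I am " + name + ". I am " + str(age) + " years old"

# f-string: expressions inside {} are evaluated and formatted in place.
formatted = f"I am {name}. I am {age} years old"

print(formatted)
```

Both strings are identical; the f-string version simply avoids the manual `str()` call and the chain of `+` operators.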

Merging Dictionaries in Python 3.9

Python 3.9 introduces a new and clean(!) way to merge dictionaries using the union operator |. Pretty neat.
dnew = d1 | d2  # dnew == {'name': 'Tom', 'age': 20, 'gpa': 4.0, 'is_single': True}
This union operator is actually not new in Python. It can already be used for ‘merging’ two sets. A set is an unordered … Read more Merging Dictionaries in Python 3.9
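A runnable version of the merge might look like the sketch below. The excerpt shows only the merged result, so the contents of `d1` and `d2` are assumptions chosen to reproduce it; `overridden` and `set_union` are extra names added for illustration.

```python
# d1 and d2 are assumed inputs; the excerpt only shows the merged result.
d1 = {"name": "Tom", "age": 20}
d2 = {"gpa": 4.0, "is_single": True}

# Dictionary union (requires Python 3.9 or later); d1 and d2 are unchanged.
dnew = d1 | d2

# When a key appears on both sides, the right-hand operand wins.
overridden = d1 | {"age": 21}

# The same operator has long worked on sets.
set_union = {1, 2} | {2, 3}

print(dnew)
print(overridden)
print(set_union)
```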

Memoization in Python

Memoization is a term introduced by Donald Michie in 1968, which comes from the Latin word memorandum (to be remembered). Memoization is a method used in computer science to speed up calculations by storing (remembering) past calculations. If repeated function calls are made with the same parameters, we can store the previous values instead of … Read more Memoization in Python
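A minimal sketch of the idea, assuming a hand-written `memoize` decorator and a Fibonacci function as the example workload (neither is taken from the article). The standard library's `functools.lru_cache` is shown alongside as the ready-made equivalent.

```python
from functools import lru_cache


def memoize(func):
    """Store results of earlier calls, keyed by the positional arguments."""
    cache = {}

    def wrapper(*args):
        if args not in cache:
            cache[args] = func(*args)   # compute once, remember the value
        return cache[args]

    return wrapper


@memoize
def fib(n):
    # Recursive calls go through the wrapper, so each n is computed once.
    return n if n < 2 else fib(n - 1) + fib(n - 2)


@lru_cache(maxsize=None)
def fib_cached(n):
    # Same function, using the built-in memoizing decorator.
    return n if n < 2 else fib_cached(n - 1) + fib_cached(n - 2)


print(fib(30), fib_cached(30))
```

Note that this simple cache only works when the arguments are hashable, since they are used as dictionary keys.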

How To Model Time Series Data With Linear Regression

We all learnt linear regression in school, and the concept of linear regression seems quite simple. Given a scatter plot of the dependent variable y versus the independent variable x, we can find a line that fits the data well. But wait a moment, how can we measure whether a line fits the data well … Read more How To Model Time Series Data With Linear Regression

Find and plot your optimal path using Plotly and NetworkX in Python

Many libraries can be used to plot a path using Google Maps API, but this leads to reduced flexibility. Also, if you use a set of lines to draw a path then, for lack of a better word, it doesn’t look good. Let me give you an example: (Figure: generated using Plotly.) Also, on many occasions, … Read more Find and plot your optimal path using Plotly and NetworkX in Python

Build A Keyword Extraction API with Spacy, Flask, and FuzzyWuzzy

Often when dealing with long sequences of text you’ll want to break those sequences up and extract individual keywords to perform a search, or query a database. If the input text is natural language you most likely don’t want to query your database with every single word — instead, you probably want to choose a … Read more Build A Keyword Extraction API with Spacy, Flask, and FuzzyWuzzy

Italian covid-19 Analysis with python

(Photo by Gerd Altmann from Pixabay.) This tutorial analyses data about COVID-19 released by the Italian Protezione Civile and builds a predictor for the end of the epidemic. The general concepts behind this predictor are described in the following article: https://medium.com/@angelica.loduca/predicting-the-end-of-the-coronavirus-epidemics-in-italy-8da9811f7740. The code can be downloaded from my GitHub repository: https://github.com/alod83/data-science/tree/master/DataAnalysis/covid-19. The main objective of … Read more Italian covid-19 Analysis with python

Infectious Disease Modelling, Part I: Understanding the models that are used to model Coronavirus

This series is not meant to quickly show you some plots with lots of colorful curves that are supposed to convince you that my model can perfectly predict coronavirus cases to a tee all over the world; rather, I’ll explain all the background necessary for you to understand these models, form your own opinion of … Read more Infectious Disease Modelling, Part I: Understanding the models that are used to model Coronavirus

ImportError: No module named ‘XYZ’

The Inspection: The thing to check is which Python the Jupyter Notebook is using. So type the following commands in the Jupyter notebook to print the module search paths.
import sys
sys.path
Here is what I got:
'/Users/yufeng/anaconda3/envs/py33/lib/python36.zip', '/Users/yufeng/anaconda3/envs/py33/lib/python3.6', '/Users/yufeng/anaconda3/envs/py33/lib/python3.6/lib-dynload', '/Users/yufeng/anaconda3/envs/py33/lib/python3.6/site-packages', '/Users/yufeng/anaconda3/envs/py33/lib/python3.6/site-packages/aeosa', '/Users/yufeng/anaconda3/envs/py33/lib/python3.6/site-packages/IPython/extensions', '/Users/yufeng/.ipython'
However, if I type the same commands in the system’s Python, here is what I got: '/Users/yufeng/anaconda3/lib/python37.zip', '/Users/yufeng/anaconda3/lib/python3.7', … Read more ImportError: No module named ‘XYZ’
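The same inspection can be run as a small script. This is a sketch: `sys.path` is what the excerpt uses, while `sys.executable` is an addition here that prints the interpreter binary itself, which is often the quickest way to see that the notebook and the shell are using different Python installations.

```python
import sys

# Path of the interpreter running this code (notebook kernel or shell).
print(sys.executable)

# Directories searched, in order, when an import statement is resolved.
for entry in sys.path:
    print(entry)
```

Running it once in the notebook and once in the terminal and comparing the two outputs shows whether the package was installed into the environment the notebook actually uses.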

Solving your first linear program in Python

The ‘why’, ‘what’ and ‘how’ of linear programming in Python. Figuring out a cake recipe I do not remember. You might have come across the term ‘linear programming’ at some point in data science or research. I will try to explain what it is and how one can implement a linear program in Python. Why … Read more Solving your first linear program in Python

Perceptron: Explanation, Implementation and a Visual Example

The perceptron is the building block of artificial neural networks; it is a simplified model of the biological neurons in our brain. A perceptron is the simplest neural network, one that is composed of just one neuron. The perceptron algorithm was invented in 1958 by Frank Rosenblatt. Below is an illustration of a biological neuron: … Read more Perceptron: Explanation, Implementation and a Visual Example
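To make the single-neuron idea concrete, here is a minimal sketch of the classic perceptron learning rule in plain Python. It is not the article's implementation: the function names, the learning rate, and the choice of the logical AND function as training data are all assumptions made for illustration.

```python
def predict(weights, bias, inputs):
    """Fire (1) if the weighted sum plus bias is positive, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0


def train(samples, epochs=20, learning_rate=1.0):
    """Perceptron rule: nudge weights by the prediction error on each sample."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


# Hypothetical training data: the logical AND function (linearly separable).
and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train(and_samples)
predictions = [predict(weights, bias, x) for x, _ in and_samples]
print(predictions)
```

A single perceptron can only learn linearly separable functions such as AND; it cannot learn XOR, which is why larger networks stack many such units.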

Spinning up Jupyter Notebooks as ECS Service in AWS With Terraform

The data scientists in our team need to run time-consuming Python scripts very often. Depending on the repetition of the task, we decide whether to Dockerize it and run it on AWS or not. If a script needs to be run multiple times, we put effort into rewriting/restructuring the code and wrap it into … Read more Spinning up Jupyter Notebooks as ECS Service in AWS With Terraform

Array Oriented Programming with Python NumPy

After understanding broadcasting, another important concept is manipulating the shape. Let’s see a few techniques: Reshape. It is common practice to create a NumPy array as 1D and then reshape it to multi-dimensional later, or vice versa, keeping the total number of elements the same. 📌 The reshape operation returns a new array, which is … Read more Array Oriented Programming with Python NumPy
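A short sketch of the reshape pattern described above, using a hypothetical 12-element array. The variable names are chosen here for illustration.

```python
import numpy as np

flat = np.arange(12)            # 1D array with 12 elements

grid = flat.reshape(3, 4)       # 1D -> 2D, still 12 elements
cube = flat.reshape(2, 3, -1)   # -1 lets NumPy infer the last dimension (2)
back = grid.reshape(-1)         # multi-dimensional -> 1D again

print(grid.shape, cube.shape, back.shape)
```

A reshape to a shape whose element count differs from 12, such as `flat.reshape(5, 3)`, raises a `ValueError`, which is the "same total number of elements" rule in practice.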

What is PyTorch?

Think about NumPy, but with strong GPU acceleration. PyTorch is a library for Python programs that facilitates building deep learning projects. We like Python because it is easy to read and understand. PyTorch emphasizes flexibility and allows deep learning models to be expressed in idiomatic Python. In a simple sentence, think about NumPy, but with strong … Read more What is PyTorch?

Visualizing COVID-19 Data Beautifully in Python (in 5 Minutes or Less!!)

Let’s begin by creating our first visualization that will demonstrate the number of total cases over time in various countries: (Figure: Creating our First Visualization. Source: Nik Piepenbreier.) Let’s explore what we did here in a bit more detail: In Section 6, we created a dictionary that contains hex values for different countries. Storing this in … Read more Visualizing COVID-19 Data Beautifully in Python (in 5 Minutes or Less!!)

Beginners Guide to Transition from SAS to Python

SAS is a specialized data analytics programming language that has been around since 1976. That was 15 years before Python first appeared as a general-purpose programming language in 1991 and 32 years before pandas was first released in 2008 and transformed Python into an open-source data analytics powerhouse. While SAS is still … Read more Beginners Guide to Transition from SAS to Python

Difference between type() and isinstance() in Python

First things first. We need to understand subclasses and inheritance. Take the following code as an example: We define class Rectangle(Shape) as a subclass of Shape, and Square(Rectangle) as a subclass of Rectangle. Subclasses inherit methods, properties, and other functionalities of their superclasses. We define the hierarchy in a way such that an object of … Read more Difference between type() and isinstance() in Python
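The code the excerpt refers to is not reproduced on this page, so the following is a minimal sketch of the hierarchy it describes. The empty class bodies and the variable names are assumptions.

```python
class Shape:
    pass


class Rectangle(Shape):       # Rectangle is a subclass of Shape
    pass


class Square(Rectangle):      # Square is a subclass of Rectangle
    pass


square = Square()

# type() reports only the exact class of the object.
exact_match = type(square) is Square
parent_match = type(square) is Rectangle

# isinstance() also accepts any superclass in the inheritance chain.
is_rectangle = isinstance(square, Rectangle)
is_shape = isinstance(square, Shape)

print(exact_match, parent_match, is_rectangle, is_shape)
```

This is why `isinstance()` is usually the better choice when subclasses should be treated like their parents.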

Co-variance: An intuitive explanation!

A comprehensive but simple guide which focuses more on the idea behind the formula than on the math itself. Start building the blocks with expectation, mean and variance to finally understand the larger picture, i.e. the co-variance calculation in all its glory! Introduction: Contrary to popular belief, a formula is much more than just … Read more Co-variance: An intuitive explanation!

Avoid These Rookie Python Mistakes

Though the “NotImplemented” error is likely one of the least common errors on this list, I think it’s important to issue a reminder. Raising NotImplemented in Python will not raise a NotImplementedError, but will instead raise a TypeError. Here is a function I wrote to illustrate this: def implementtest(num): if num == … Read more Avoid These Rookie Python Mistakes
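The author's function is cut off in the excerpt, so the sketch below uses two hypothetical functions, `broken` and `correct`, to show the difference in Python 3. `NotImplemented` is a constant used by comparison and arithmetic methods, not an exception class, which is why raising it fails with a `TypeError`.

```python
def broken():
    # NotImplemented is a special constant, not an exception class.
    raise NotImplemented


def correct():
    # NotImplementedError is the exception meant for unfinished code.
    raise NotImplementedError("this feature is not written yet")


try:
    broken()
except TypeError as exc:
    broken_error = type(exc).__name__

try:
    correct()
except NotImplementedError as exc:
    correct_error = type(exc).__name__

print(broken_error, correct_error)
```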

Five Cool Python Looping Tips

(python logo courtesy of http://python.org) One tool I have found really valuable in my experience is the ability to loop through two arrays at once. This is something noticeably more difficult in other languages, and something I really appreciate the ease of in Python. In order to loop through two arrays at once, we simply … Read more Five Cool Python Looping Tips
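The excerpt is cut off before it names the technique. One common way to loop over two sequences at once is the built-in `zip`, shown here as an assumption about where the text is heading; the lists are hypothetical sample data.

```python
# Hypothetical sample data.
names = ["Ada", "Grace", "Alan"]
scores = [90, 85, 77]

pairs = []
# zip yields one tuple per position, pairing items from both lists.
for name, score in zip(names, scores):
    pairs.append(f"{name}: {score}")

print(pairs)
```

`zip` stops at the end of the shorter input, so lists of unequal length are silently truncated.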

Categorical Encoding Techniques

Categorical data is a common type of non-numerical data that contains label values and not numbers. Some examples include: Colors (Red, Green, Blue); Cities (New York, Austin, Denver); Gender (Male, Female); Place (First, Second, Third). According to Wikipedia, “a categorical variable is a variable that can take on one of a limited, and usually fixed … Read more Categorical Encoding Techniques

Generalized Poisson Regression for Real World Datasets

We will use a set of regression variables from the data set, namely, Day, Day of the Week (derived from Date), Month (derived from Date), High Temp, Low Temp and Precipitation, to ‘explain’ the variance in the observed counts on the Brooklyn Bridge. (Figure: the regression matrix and the vector of observed bicyclist counts.) The training algorithm of … Read more Generalized Poisson Regression for Real World Datasets

Static Typing in Python

Apart from function arguments and return values, you can also annotate variables with a certain data type. You can also annotate variables without initialising them with any values!
major: str = 'Tom'  # type: str, this comment is no longer necessary
i: int
It is better to annotate variables using this built-in syntax instead of comments, as comments are … Read more Static Typing in Python
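The sketch below applies the same annotation syntax inside a class body, so that the recorded annotations can be inspected with `typing.get_type_hints`. The class name `Student` is hypothetical and introduced only for this illustration.

```python
from typing import get_type_hints


class Student:
    major: str = "Tom"   # annotated and initialised
    i: int               # annotated only: no value is assigned


# Both annotations are recorded, with or without a value.
hints = get_type_hints(Student)

# An annotation alone does not create the attribute.
has_major = hasattr(Student, "major")
has_i = hasattr(Student, "i")

print(hints, has_major, has_i)
```

Annotations are not enforced at runtime; a static type checker such as mypy is what reports mismatches.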

How to Create a Drop-Down Menu and a Slide Bar for your Favorite Visualization Tool

With Python Widget, you can Upgrade your Visualization in 3 Lines of Code. Imagine we are analyzing the population trend of countries in the world from 1960 to 2018 to predict the population in the coming years. To visualize the trend of each country, we choose Plotly for its simple and beautiful visualization. Start … Read more How to Create a Drop-Down Menu and a Slide Bar for your Favorite Visualization Tool

COVID-19 Growth Modeling and Forecasting with Prophet

COVID-19 is a hot topic these days. Healthcare workers are the first line of defense. If you are in IT you are part of the fight against the virus. I thought I should do my part and implement a method to forecast coronavirus growth and dates when the number of infections could stabilize. Forecasting new … Read more COVID-19 Growth Modeling and Forecasting with Prophet

5 Visualisations to Level Up Your Data Story

Going beyond histograms and box plots with Plotly. Storytelling is an essential skill for us data scientists. To convey our ideas and be persuasive, we need effective communication. And aesthetic visualisations are a great tool for that. In this post, we’ll cover 5 visualisation techniques beyond the classics that can make your data story more … Read more 5 Visualisations to Level Up Your Data Story

A Complete Walk-Through in Python (1 of 5)

Introduction and Numeric Data Types. (Official logo from Python.org.) Programming is the art of teaching a computer to complete a certain task using instructions which can be understood by the programmer as well as the computer. Python is one of the programming languages with a highly simplified way of writing computer programs. It is … Read more A Complete Walk-Through in Python (1 of 5)

Python: Identifying Twitter Influencers through Network Analysis

Tweepy, iGraph, and some seriously useful user-defined classes. (Image credit: Pixabay.) For this tutorial-styled blog post we’re going to use the Tweepy API to collect tweets from Tesla (official) and Elon Musk, create edges connecting users delineated in the user-mentions attribute (as vertices) using a bit of data cleaning in conjunction with iGraph, and employ Eigenvector … Read more Python: Identifying Twitter Influencers through Network Analysis

An Introduction to Making Scientific Publication Plots with Python

An introduction to how to use Python to plot data for scientific publications. (Photo by Isaac Smith on Unsplash.) I have been using Python to do scientific computations and make all of my plots for several years now. My primary motivations have been that (1) Python is open source, and (2) the amount of hard … Read more An Introduction to Making Scientific Publication Plots with Python

Predicting Weekly Hotel Cancellations with ARIMA

Hotel cancellations can cause issues for many businesses in the industry. Not only is there the lost revenue as a result of the customer cancelling, but this can also cause difficulty in coordinating bookings and adjusting revenue management practices. Data analytics can help to solve this issue in terms of identifying the customers who are … Read more Predicting Weekly Hotel Cancellations with ARIMA

Predicting Apple Stock in times of Coronavirus

Simplicity is key. On February 13th I posted an article about “Outstanding Results Predicting Apple Stock through time using Continual ML”. It was suggested to try it in a “disfavorable” scenario, so I re-ran and improved the exercise, including the last few weeks, to see how it performed during these harsh times. Step 1: Define … Read more Predicting Apple Stock in times of Coronavirus

A Simple Way to Optimize “Something” in Python

Why is optimizing a budget important? It allows us to control ad spending, determine how much to spend and maximize the desired outcome (visits, clicks etc). As an example, given a budget of $10,000 and its constraints, we are able to determine the optimal budget allocation for each marketing channel such that we maximize … Read more A Simple Way to Optimize “Something” in Python

Python SQLite Tutorial — The Ultimate Guide

SQL and Python have quickly become quintessential skills for anyone taking on serious data analysis! This Python SQLite tutorial is the only guide you need to get up and running with SQLite in Python. In this post, we’ll cover: loading the library; creating and connecting to your database; creating database tables; adding data; querying … Read more Python SQLite Tutorial — The Ultimate Guide
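The steps listed in the excerpt can be sketched end to end with the standard-library `sqlite3` module. The table name, column names and sample rows are hypothetical, and an in-memory database is used so the example leaves no file behind.

```python
import sqlite3

# Load the library and connect; ":memory:" creates a temporary database.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Create a table (hypothetical schema).
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")

# Add data using ? placeholders rather than string formatting.
cur.executemany(
    "INSERT INTO customers (name) VALUES (?)",
    [("Ada",), ("Grace",)],
)
conn.commit()

# Query the data back.
rows = cur.execute("SELECT id, name FROM customers ORDER BY id").fetchall()
print(rows)

conn.close()
```

Passing a file path instead of `":memory:"` to `sqlite3.connect` stores the database on disk.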

Creating Adversarial Examples with JAX from the scratch

In this tutorial, we will see how to create Adversarial Examples that fool neural networks using JAX. (Photo by Debbie Molle on Unsplash.) Firstly, let’s see some definitions. What are Adversarial Examples? Simply put, Adversarial Examples are inputs to a neural network that are optimized to fool the algorithm, i.e. to result in misclassification of … Read more Creating Adversarial Examples with JAX from the scratch

Plotly Front to Back: Scatter Charts and Bubble Charts

Now, the basic idea behind the scatter plot is to visualize the relationship between a set of variables. By relationship, we mean the dependence between variables — or movement — what happens to the second variable if the first one moves by some amount — or correlation to say it in the simplest way. Scatter … Read more Plotly Front to Back: Scatter Charts and Bubble Charts

Iterators & Iterables in Python

We can see that the ‘__iter__()’ method is present in the list of methods and attributes for our object. In general, any object with the ‘__iter__()’ method can be looped over. Further, when we use for-loops on iterables, we call the ‘__iter__()’ method. When the ‘__iter__()’ method is called it returns an iterator. Now, … Read more Iterators & Iterables in Python
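What a for-loop does with `__iter__()` can be spelled out by hand. This is a sketch using a hypothetical list `numbers`; the built-ins `iter()` and `next()` call the object's `__iter__()` and the iterator's `__next__()` respectively.

```python
numbers = [1, 2, 3]

# A list is iterable because it defines __iter__().
is_iterable = hasattr(numbers, "__iter__")

# iter() calls numbers.__iter__() and returns an iterator.
iterator = iter(numbers)

# Roughly what a for-loop does: call next() until StopIteration is raised.
collected = []
while True:
    try:
        collected.append(next(iterator))
    except StopIteration:
        break

print(is_iterable, collected)
```

Once exhausted, the iterator stays empty; calling `iter(numbers)` again returns a fresh one.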