Identify your Data’s Distribution

Is your distribution assumption correct? Let’s find out. Photo by Luke Cheeser on Unsplash. Every day we come across a variety of data, such as sensor data, sales data, customer data, and traffic data. Depending on the use case, we process it in various ways and try out several algorithms on it. Have you …

5 AI Pitfalls for Business & How to Avoid Them

If things go wrong with an AI project, the best case is usually wasted investment or a missed opportunity, for example failing to keep up with competitors. In worse cases, AI may damage some aspect of your business. Casualties can include sales growth, customer satisfaction, brand, or operational efficiency. Business members of the AI …

How to use AWS Lambda and CloudWatch for beginners

Let’s build a simple serverless workflow using AWS services! I found a cool website (https://covid19api.com/) where we can easily access COVID19 data using a free API. This gave me the idea to create a simple function that grabs the data using AWS Lambda and saves it to S3. The script will be executed automatically every day using CloudWatch. …

How to Create a Data Science Portfolio — by a Data Scientist

Good day reader, I hope everyone is staying safe and washing their hands. It is during times like these that mental and physical health are extremely important to keep us going. As I mentioned in my previous article, data science does not crash with the economy. The data industry is still in demand, some would even argue it …

Building a Career in Data Science with Emily Robinson

Editor’s note: The Towards Data Science podcast’s “Climbing the Data Science Ladder” series is hosted by Jeremie Harris. Jeremie helps run a data science mentorship startup called SharpestMinds. You can listen to the podcast below: Data science exists at the intersection of a number of genuinely technical topics, from statistics to programming to machine learning. …

Fight COVID-19 with machine learning

9 ways machine learning helps us fight the viral pandemic. Viral pandemics are a serious threat. COVID-19 is not the first, and it won’t be the last. But, like never before, we are collecting and sharing what we learn about the virus. Hundreds of research teams around the world are combining their efforts to collect …

How to ULTRALEARN Data Science

Supercharge your data science learning journey. I just finished Ultralearning by Scott Young, and I thought that the concepts in this book could help many people who are looking to learn data science. Scott used this approach to learn the entire MIT undergrad computer science coursework in a single year (it usually takes four) and …

Italian covid-19 Analysis with python

Photo by Gerd Altmann from Pixabay. This tutorial analyses data about COVID-19 released by the Italian Protezione Civile and builds a predictor for the end of the epidemic. The general concepts behind this predictor are described in the following article: https://medium.com/@angelica.loduca/predicting-the-end-of-the-coronavirus-epidemics-in-italy-8da9811f7740. The code can be downloaded from my GitHub repository: https://github.com/alod83/data-science/tree/master/DataAnalysis/covid-19. The main objective of …

Infectious Disease Modelling, Part I: Understanding the models that are used to model Coronavirus

This series is not meant to quickly show you some plots with lots of colorful curves that are supposed to convince you that my model can predict coronavirus cases to a tee all over the world; rather, I’ll explain all the background necessary for you to understand these models, form your own opinion of …

Tutorial: ggplot2 Heatmaps and Traffic Deaths in Thailand

Photo by Dan Freeman on Unsplash. In this tutorial, we’ll be making a heatmap of the most dangerous countries to drive in, as measured by the number of traffic deaths per 100,000 residents. We’ll use R and ggplot2 to visualize our results. Is It Really That Dangerous To Drive in Thailand? It’s often said …

ImportError: No module named ‘XYZ’

The Inspection. The first thing to check is which Python the Jupyter Notebook is using. Type the following commands in the Jupyter notebook to print the module search paths: import sys; sys.path. Here is what I got: '/Users/yufeng/anaconda3/envs/py33/lib/python36.zip', '/Users/yufeng/anaconda3/envs/py33/lib/python3.6', '/Users/yufeng/anaconda3/envs/py33/lib/python3.6/lib-dynload', '/Users/yufeng/anaconda3/envs/py33/lib/python3.6/site-packages', '/Users/yufeng/anaconda3/envs/py33/lib/python3.6/site-packages/aeosa', '/Users/yufeng/anaconda3/envs/py33/lib/python3.6/site-packages/IPython/extensions', '/Users/yufeng/.ipython'. However, if I type the same commands in the system’s Python, here is what I get: '/Users/yufeng/anaconda3/lib/python37.zip', '/Users/yufeng/anaconda3/lib/python3.7', …
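The inspection described in this excerpt can be sketched as follows. This is a minimal illustration, assuming the same snippet is run once in a notebook cell and once in the system interpreter so that the two outputs can be compared. The helper name `describe_interpreter` is hypothetical, and printing `sys.executable` alongside `sys.path` is an addition that the excerpt does not mention.

```python
import sys


def describe_interpreter():
    """Collect the details that reveal which Python is running (hypothetical helper)."""
    return {
        # Full path of the interpreter binary that is executing this code
        "executable": sys.executable,
        # Major.minor version, e.g. "3.6" or "3.7"
        "version": "{}.{}".format(*sys.version_info[:2]),
        # Directories searched, in order, when a module is imported
        "search_path": list(sys.path),
    }


info = describe_interpreter()
print("Interpreter:", info["executable"])
print("Version:", info["version"])
for entry in info["search_path"]:
    print("  ", entry)
```

If the two environments print different interpreters or different `site-packages` directories, a package installed in one will not be importable from the other, which is one common cause of this ImportError.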

Algorithmic Complexity

Community finding algorithms in real networks. Zachary’s karate club network with 2 communities identified. Due to the size of real networks, it is sometimes infeasible to use brute-force algorithms to define communities. Algorithms used to handle these problems, in the best-case scenario, run in polynomial time. Most of the time, however, it is necessary to …

Reproducible Machine Learning

A step towards making ML research open and accessible. Photo credit: geralt via Pixabay. The NeurIPS (Neural Information Processing Systems) 2019 conference marked the third year of its annual reproducibility challenge and the first time with a reproducibility chair on its program committee. So, what is reproducibility in machine learning? Reproducibility is the ability to …

tSNE Degrades to PCA

At large perplexity. This is the sixteenth article from the column Mathematical Statistics and Machine Learning for Life Sciences, where I try to explain some mysterious analytical techniques used in Bioinformatics and Computational Biology in a simple way. In my previous post, tSNE vs. UMAP: Global Structure, I touched on the limit of large perplexity as …

Where are the most vulnerable people in the UK?

As with the UK and the rest of the world, we have been observing the impacts of COVID-19 as the pandemic continues to unfold, with a continuous rise in cases and deaths. This was one of the headlines in the news following the government’s request for volunteers once the UK lockdown was imposed by Boris …

Considerations on the importance of data and science in data science

Surely, the most frequent type of visual is one comparing how COVID-19 is affecting different regions, either within a country or across different countries (usually using China or Wuhan as a reference; after all, it is where it all started). While this kind of plot could provide answers to questions such as how a specific …

Explainable, data-efficient text classification

Improving ULMFiT with the right kind of attention. In this article you can find: an introduction describing core ideas, history and applications of Transfer Learning; a review of recent developments in natural language processing, including the ULMFiT algorithm (utilizing pre-trained language models based on recurrent neural networks); a novel network architecture, a modification of …

Solving your first linear program in Python

The ‘why’, ‘what’ and ‘how’ of linear programming in Python. Figuring out a cake recipe I do not remember. You might have come across the term ‘linear programming’ at some point in data science or research. I will try to explain what it is and how one can implement a linear program in Python. Why …

Perceptron: Explanation, Implementation and a Visual Example

The perceptron is the building block of artificial neural networks; it is a simplified model of the biological neurons in our brain. A perceptron is the simplest neural network, one that consists of just one neuron. The perceptron algorithm was invented in 1958 by Frank Rosenblatt. Below is an illustration of a biological neuron: …

Spinning up Jupyter Notebooks as ECS Service in AWS With Terraform

The data scientists in our team often need to run time-consuming Python scripts. Depending on how often a task recurs, we decide whether or not to Dockerize it and run it on AWS. If a script needs to be run multiple times, we put effort into rewriting/restructuring the code and wrap it into …

Introduction to Machine Translation

Our dear planet is enriched by more than 7,000 languages, and thanks to technology, we live in a world that is more and more globalized. Translation has become a pillar of communication, allowing people to make all sorts of connections. To really grasp the importance of translation, here is a key figure: in 2015 Google …

Attendance Estimation with Azure ML

TL;DR: This post provides an E2E Jupyter notebook for training crowd counting models with C-3-Framework and the AzureML SDK for Python. As an AI Cloud Developer Advocate, I often find myself talking in front of various audiences. Now that I have some time to work on new demos while working from home, I’ve been thinking of …

Array Oriented Programming with Python NumPy

After understanding broadcasting, another important concept is manipulating the shape. Let’s see a few techniques. Reshape: It is common practice to create a NumPy array as 1D and then reshape it to multiple dimensions later, or vice versa, keeping the total number of elements the same. 📌 The reshape returns a new array, which is …
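The reshape idea in this excerpt can be illustrated with a short sketch. The array contents and the chosen shapes below are arbitrary examples that do not come from the article. The only point shown is the one the excerpt makes: the total number of elements has to stay the same.

```python
import numpy as np

flat = np.arange(12)            # 1D array holding the 12 values 0..11
matrix = flat.reshape(3, 4)     # same 12 elements arranged as 3 rows x 4 columns
cube = flat.reshape(2, 3, 2)    # same 12 elements arranged as a 2 x 3 x 2 block
back = matrix.reshape(-1)       # -1 asks NumPy to infer the length (12 here)

print(matrix.shape, cube.shape, back.shape)

# A shape whose element count differs from 12 is rejected with a ValueError.
try:
    flat.reshape(5, 3)
except ValueError as err:
    print("reshape failed:", err)
```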

Create a Free Linux Virtual Machine on your Computer for Data Science Projects using VirtualBox…

In this day and age, cloud computing power is prevalent and cheap. One doesn’t need to look very hard online to find free or affordable hosting options for app development, databases, or data science projects. Despite this online availability, there are many reasons, like security, expenses and curiosity, to set up custom environments on your own …

Scrape Tabular Data with Python

Scrape the tables. Now we are going to scrape those tables from the page using BeautifulSoup. The standard way of getting all the tables from the page is: page = requests.get(URL); soup = BeautifulSoup(page.content, 'html.parser'); tables = soup.find_all("table"), where requests.get(URL) fetches the content of the page and BeautifulSoup(page.content, 'html.parser') parses it. We …
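To make the "collect every table on the page" step concrete without a network call or third-party packages, here is a sketch that swaps in the standard library's `html.parser.HTMLParser` for BeautifulSoup. It is not the article's method. The class name `TableExtractor`, the helper `extract_tables`, and the sample HTML (including its values) are all invented for illustration, and nested tables are not handled.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect each <table> as a list of rows, each row as a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.tables = []    # one entry per <table> encountered
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def extract_tables(html):
    """Parse an HTML string and return every table found in it."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables


# Invented sample page; a real page would be downloaded first.
SAMPLE_HTML = """
<table>
  <tr><th>Country</th><th>Cases</th></tr>
  <tr><td>Italy</td><td>100</td></tr>
</table>
<table>
  <tr><td>only cell</td></tr>
</table>
"""

tables = extract_tables(SAMPLE_HTML)
print(len(tables))
print(tables[0])
```

With BeautifulSoup installed, the call quoted in the excerpt, `soup.find_all("table")`, plays the role of `extract_tables` here and returns tag objects instead of plain lists.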

5 Papers on CNNs Every Data Scientist Should Read

Advancements in CPU and GPU technology, easier access to large data repositories, and the convolutional neural network have led to great leaps in the field of computer vision. From facial recognition to cancer detection, CNN-based frameworks have the potential to benefit human society in countless ways. In this article, we introduce 5 papers on CNNs …

What is PyTorch?

Think about NumPy, but with strong GPU acceleration. PyTorch is a library for Python programs that facilitates building deep learning projects. We like Python because it is easy to read and understand. PyTorch emphasizes flexibility and allows deep learning models to be expressed in idiomatic Python. In a simple sentence, think about NumPy, but with strong …

Data Science Reading List for April 2020

This month’s Data Science reading list. Note: I am not affiliated with any of the authors in this article. These are simply books I’ve recently enjoyed that I’m excited to share with you. There are no referral links, and not a cent goes into my pocket. Enjoy! Welcome to this week’s reading list! The themes for this …

Visualizing COVID-19 Data Beautifully in Python (in 5 Minutes or Less!!)

Let’s begin by creating our first visualization, which will show the total number of cases over time in various countries: Creating our First Visualization. Source: Nik Piepenbreier. Let’s explore what we did here in a bit more detail: In Section 6, we created a dictionary that contains hex values for different countries. Storing this in …

Beginners Guide to Transition from SAS to Python

SAS is a specialized data analytics programming language that has been around since 1976. That was 15 years before Python first appeared as a general-purpose programming language in 1991 and 32 years before Pandas was first released in 2008 and transformed Python into an open source data analytics powerhouse. While SAS is still …

Tutorial: Plotting in R for Python Refugees

The plotting library we’ll be using is ggplot2. Based on the grammar of graphics, it operates on the idea that every graph can be built from three components: a data set; a coordinate system; and geoms, or visual marks that represent data points. To display values, the variables in the data need to be mapped to …

There won’t be an AI winter this time

Machine learning isn’t a “Skynet or bust” proposition. Source: Pexels. Disclaimer: My observations are influenced by the fact that I work on Cortex, an open source machine learning deployment platform. Every few weeks, a new article predicting an imminent AI winter gets circulated. The arguments generally follow the same lines: The power of deep learning …

Extracting Coefficients of OpenCV Face Detection DNN model

Photo from pixabay.com. The latest OpenCV includes a Deep Neural Network (DNN) module, which comes with a nice pre-trained face detection convolutional neural network (CNN). The new model enhances the face detection performance compared to the traditional models, such as Haar. The framework used to train the new model is Caffe. However, some people have …

How to Teach Your Panda SQL in 10 Minutes

A quick overview of integrating SQL queries into your Pandas DataFrames. Photo by Ilona Froehlich on Unsplash. Managing your data in a comprehensible and organized fashion is an absolutely essential skill for any data scientist or analyst. Pandas DataFrames allow for clean views and operations on very large datasets, while it also is perhaps one …

Can a Monkey Do Just as Well in the Stock Market as a Technical Analyst?

Could you pick it out? It’s difficult! 😅 If you’d like to replicate the above experiment, the code is here. I like to use this example as a visual when introducing the Random Walk Theory because initially it seems absurd, until you realize just how closely random numbers can resemble stock prices. Random numbers …
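For readers who want a feel for how random numbers can produce a price-like series, here is a minimal sketch. It is not the author's experiment or code. The function name `random_walk`, the starting value of 100, the Gaussian step, and the seed are all assumptions chosen for illustration.

```python
import random


def random_walk(n_steps, start=100.0, step_scale=1.0, seed=None):
    """Build a series in which each value is the previous one plus a random step.

    All parameters are illustrative assumptions, not values from the article.
    """
    rng = random.Random(seed)   # private generator so a seed makes runs repeatable
    prices = [start]
    for _ in range(n_steps):
        # Each step is drawn independently of everything that came before
        prices.append(prices[-1] + rng.gauss(0.0, step_scale))
    return prices


series = random_walk(250, seed=42)
print(len(series), series[0])
```

Plotting `series` against its index tends to give a chart with apparent trends and reversals even though every step is independent. Note that this simple additive walk can drift below zero, which a real price cannot.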

Density-based algorithms

A plain understanding of two density-based algorithms: DBSCAN and OPTICS. Image credit: Fabrice Jazbinsek. Clustering methods like partitional methods or hierarchical clustering are not effective at finding clusters of arbitrary shapes. The density-based clustering method is efficient at finding clusters of arbitrary shapes and also handles outliers and noise. Object clustering when using a density-based …

Difference between type() and isinstance() in Python

First things first. We need to understand subclasses and inheritance. Take the following code as an example: We define class Rectangle(Shape) as a subclass of Shape, and Square(Rectangle) as a subclass of Rectangle. Subclasses inherit methods, properties, and other functionalities of their superclasses. We define the hierarchy in a way such that an object of …
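The class hierarchy named in this excerpt can be written out as a small sketch to show how `type()` and `isinstance()` differ. The class bodies are left empty here because the excerpt does not show the article's own definitions, and the variable names are illustrative.

```python
class Shape:
    """Base class of the hierarchy."""


class Rectangle(Shape):
    """A Rectangle is a Shape."""


class Square(Rectangle):
    """A Square is a Rectangle, and therefore also a Shape."""


square = Square()

# type() reports only the exact class of the object
exact_match = type(square) is Square            # True
exact_parent = type(square) is Rectangle        # False

# isinstance() also accepts any superclass in the inheritance chain
is_rectangle = isinstance(square, Rectangle)    # True
is_shape = isinstance(square, Shape)            # True

# Inheritance only runs one way: a plain Rectangle is not a Square
rect_is_square = isinstance(Rectangle(), Square)  # False

print(exact_match, exact_parent, is_rectangle, is_shape, rect_is_square)
```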

Co-variance: An intuitive explanation!

A comprehensive but simple guide that focuses more on the idea behind the formula than on the math itself: start with the building blocks of expectation, mean, and variance to finally understand the larger picture, i.e. the co-variance calculation in all its glory! Introduction. Contrary to popular belief, a formula is much more than just …

5 Points you should know to start using Google Cloud Platform

A comprehensive guide to understanding Google Cloud Platform and starting to use it right away! In recent days I have come across many data analyst job roles requiring GCP experience. Let’s try to understand what the Google Cloud Platform (GCP) is and why it is in high demand by many businesses. Google Image by techrepublic. GCP …

An Intuitive Explanation of Kernels in Support Vector Machine (SVM)

Simple Example: We have a 3-dimensional vector x = (x1, x2, x3). We define the operation f(x) as follows: f(x) = (x1x1, x1x2, x1x3, x2x1, x2x2, x2x3, x3x1, x3x2, x3x3). In other words, it multiplies every pair of entries in x and produces a 9-dimensional vector. Let’s plug in some numbers and make it …
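The mapping f(x) defined in this excerpt can be sketched in a few lines. The vectors x = (1, 2, 3) and y = (4, 5, 6) are chosen here for illustration, and the helper names `feature_map` and `dot` are hypothetical. The final comparison relies on a standard identity that the excerpt itself does not state: for this particular f, the dot product f(x)·f(y) equals (x·y)², so the 9-dimensional vectors never need to be built.

```python
def feature_map(x):
    """Return every pairwise product x_i * x_j, in the order given by f(x)."""
    return [a * b for a in x for b in x]


def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


x = (1, 2, 3)
y = (4, 5, 6)

# Long way: map both vectors into 9 dimensions, then take the dot product
explicit = dot(feature_map(x), feature_map(y))

# Short way: stay in 3 dimensions and square the ordinary dot product
shortcut = dot(x, y) ** 2

print(feature_map(x))        # [1, 2, 3, 2, 4, 6, 3, 6, 9]
print(explicit, shortcut)    # 1024 1024
```

The short way needs 3 multiplications plus one squaring instead of building two 9-element vectors, which is the kind of saving a kernel function is meant to provide.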

Generative vs Discriminative Probabilistic Graphical Models

Generative and discriminative models are widely used machine learning models. For example, Logistic Regression, Support Vector Machine and Conditional Random Fields are popular discriminative models; Naive Bayes, Bayesian Networks and Hidden Markov models are commonly used generative models. Probabilistic graphical models (PGMs) are a rich framework for encoding probability distributions over complex domains like joint …

Monitor Your Dependencies! Stop Being A Blind Data-Scientist.

Dominos as an analogy for dependencies, pixabay.com. Reasons for monitoring your model dependencies. In my previous article, “Monitor! Stop Being A Blind Data Scientist”, I mentioned the many use cases and the monumental importance of monitoring and alerts in our field, specifically from a data-science-researcher point of view. I looked at use cases and reviewed several companies that …