Cross Validation — Why & How

So, you have been working on an imbalanced data set for a few days now, trying out different machine learning models, training them on part of your data set and testing their accuracy, and you are ecstatic to see the score go above 0.95 every time. Do you really think you have achieved 95% accuracy … Read more
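A minimal sketch of why a 0.95 score can mislead on imbalanced data, using only the standard library. The 95/5 class split, the helper names `stratified_folds` and `majority_baseline_accuracy`, and the "always predict the majority class" baseline are illustrative assumptions, not taken from the article.

```python
from collections import defaultdict


def stratified_folds(labels, k):
    """Assign sample indices to k folds while keeping class ratios similar."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        # Deal each class's samples round-robin so every fold gets its share.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


def majority_baseline_accuracy(labels, folds):
    """Per-fold accuracy of a 'model' that always predicts the majority training class."""
    scores = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_labels = [y for i, y in enumerate(labels) if i not in held_out]
        majority = max(set(train_labels), key=train_labels.count)
        correct = sum(1 for i in test_idx if labels[i] == majority)
        scores.append(correct / len(test_idx))
    return scores


# Hypothetical imbalanced data set: 95 negatives and 5 positives.
labels = [0] * 95 + [1] * 5
folds = stratified_folds(labels, k=5)
scores = majority_baseline_accuracy(labels, folds)
print(scores)
```

Under these assumptions every fold holds 19 negatives and 1 positive, so a baseline that never detects a positive still scores 0.95 on each fold. In practice a library routine such as scikit-learn's `StratifiedKFold` would replace the hand-rolled splitter.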

Using TensorFlow Serving gRPC

How to write a gRPC client for the wrapped model. Once you have your TensorFlow- or Keras-based model trained, you need to think about how to use it and deploy it in production. You may want to Dockerize it as a micro-service, implementing a custom gRPC (or REST, or other) interface. Then deploy this to server … Read more

A Dog Detector and Breed Classifier

In a field like physics, things keep getting harder, to the point that it’s very difficult to understand what’s going on at the cutting edge unless it’s in highly simplified terms. In computer science though, and artificial intelligence in particular, knowledge built up slowly over 70+ years by people all over the world is still … Read more

Build a Pipeline for Harvesting Medium Top Author Data

Nuts and Bolts: One key requirement was to make deployment of my Luigi workflow very simple. I wanted to assume only one thing about the deployment environment: that the Docker daemon would be available. With Docker, I wouldn’t need to be concerned with Python version mismatches or other environmental discrepancies. It took me a little while … Read more


Shocking result of Edges-to-Photo image-to-image translation using the Pix2Pix GAN algorithm. This article will explain the fundamental mechanisms of Pix2Pix, a popular paper on image-to-image translation with conditional GANs. Article Outline: I. Introduction II. Dual Objective Function with Adversarial and L1 Loss III. U-Net Generator IV. PatchGAN Discriminator … Read more

Probability — Fundamentals of Machine Learning (Part 1)

The Mathematics of Probability: In the beginning, I suggested that probability theory is a mathematical framework. As with any mathematical framework, there are some vocabulary and important axioms needed to fully leverage the theory as a tool for machine learning. Probability is all about the possibility of various outcomes. The set of all possible outcomes … Read more

Why visual literacy is essential to good data visualization

We know data literacy matters. But visual literacy matters too. Here’s why. Data is all around us, and the way people work has changed because of it. Companies are now investing more in roles like Chief Data Officer, building their data science teams and talking about things like “data literacy” in … Read more

My journey applying AI to horse racing

My journey into machine learning began in the summer of 2016. It all started at a barbecue party at the home of my fiancée’s aunt and uncle in northern Stockholm. I was sitting outside at a garden table together with the older men of her family. These are old and tough Finnish men, her granddad … Read more

A.I.-enhanced molecular discovery and optimization

Awesome! But how do we get there? Researchers at the forefront of their fields have been trying to use the existing tools we have on hand to solve this problem. There is a pattern in the modus operandi of current research, and the same general process applies to any A.I.-based science project. Researchers are carpenters, … Read more

Everything you need to know about Scatter Plots for Data Visualisation

If you’re a Data Scientist there’s no doubt that you’ve worked with scatter plots before. Despite their simplicity, scatter plots are a powerful tool for visualising data. There are a lot of options, flexibility, and representational power that come with the simple change of a few parameters like color, size, shape, and regression plotting. Here you’ll … Read more

Making Music with Machine Learning

Music is not just an art, music is an expression of the human condition. When an artist is making a song you can often hear the emotions, experiences, and energy they have in that moment. Music connects people all over the world and is shared across cultures. So there is no way … Read more

Exploration of the Social News TV: The Communication Behavior of #ajnewsgrid

What is NewsGrid? NewsGrid is a young news program broadcast globally by Al Jazeera since 2016. It is Al Jazeera’s first interactive news hour. The show is produced in three parts: top stories of the day presented by one presenter, stories that create a huge social reaction on Twitter presented by a social media presenter, and the … Read more

Hey, Who Moved the Goalposts?

Part of the “10 reasons why software development projects fail” series. The most successful software development projects have a timeline and a series of milestones to accomplish that project within a set period of time. Those milestones are critical, because they help to divide a large project into a series of much smaller projects, and they … Read more

Binary Tree: The Diameter.

Dynamic programming sequences sub-problems together, having each sub-problem lead to the solution. With dynamic programming we no longer have to visit paths we’ve been down before; instead we can prune the shorter branches and track the diameter at each step. With the dynamic approach the algorithm travels down the tree and counts lengths on the … Read more
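The idea of keeping only the longer branch and tracking the diameter at each step can be sketched as a single post-order traversal. This is an illustrative sketch rather than the article's own code; the `Node` class and the convention of measuring the diameter in edges are assumptions.

```python
class Node:
    """A bare binary tree node (hypothetical helper for this sketch)."""

    def __init__(self, left=None, right=None):
        self.left = left
        self.right = right


def diameter(root):
    """Return the number of edges on the longest path between any two nodes."""
    best = 0

    def height(node):
        # Height counted in nodes: an empty subtree has height 0.
        nonlocal best
        if node is None:
            return 0
        left = height(node.left)
        right = height(node.right)
        # The longest path passing through this node uses both branches.
        best = max(best, left + right)
        # Only the longer branch can extend a path that goes through the parent.
        return 1 + max(left, right)

    height(root)
    return best


# Example tree: the longest path runs between the two deepest leaves
# of the left subtree and never touches the root.
tree = Node(
    left=Node(left=Node(left=Node()), right=Node(right=Node())),
    right=Node(),
)
print(diameter(tree))
```

Each node is visited once, so the traversal runs in time linear in the number of nodes.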

Revisiting Adam Smith’s Invisible Hand in the Data Economy

Fundamental paradigms of the free market should also be scrutinized by data science. An unobservable market force that helps the demand and supply of goods in a free market to reach equilibrium automatically and efficiently is what we call the invisible hand. But I am a data scientist, I don’t deal in unobservable forces, no observations … Read more

Set Theory — Cardinality & Power Sets

With basic notation & operations cleared in articles one & two in this series, we’ve now built a fundamental understanding of Set Theory. This third article further compounds this knowledge by zoning in on the most important property of any given set: the total number of unique elements it contains. Also known as the cardinality, … Read more
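As a small illustration of cardinality and power sets, the sketch below counts the unique elements of a collection and enumerates every subset of a small set. The helper name `power_set` and the example values are assumptions made for illustration.

```python
from itertools import chain, combinations


def power_set(s):
    """Return every subset of s as a list of frozensets."""
    items = list(s)
    all_sizes = (combinations(items, r) for r in range(len(items) + 1))
    return [frozenset(subset) for subset in chain.from_iterable(all_sizes)]


# Cardinality counts unique elements, so duplicates in the input collapse.
cardinality = len(set([1, 1, 2, 3, 3]))

elements = {"a", "b", "c"}
subsets = power_set(elements)

# A set with n elements has 2**n subsets, including the empty set and itself.
print(cardinality, len(subsets), 2 ** len(elements))
```

The subsets are stored as `frozenset` objects because ordinary Python sets are not hashable and so cannot be members of another set.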

Fast, static D3 maps built with Turf.js and the command-line

Combining Mike Bostock’s command-line cartography tutorial with the flexibility of Node.js. Figure: Estimated percent of undocumented residents in U.S. metro areas (source: Pew Research Center). Recently, I needed to build a handful of U.S. state bubble maps to be embedded in a story for San Antonio Express-News. I wanted to use D3 but was concerned about slow asset … Read more

Introduction to Unsupervised Learning

Understand principal component analysis (PCA) and clustering methods. Unsupervised learning is a set of statistical tools for scenarios in which there is only a set of features and no targets. Therefore, we cannot make predictions, since there are no associated responses for each observation. Instead, we are interested in finding … Read more

The Basics of Cryptography

With Applications in R. Have you ever wondered how companies securely store your passwords? Or how your credit card information is kept private when making online purchases? The answer is cryptography. The vast majority of internet sites now use some form of cryptography to ensure the privacy of their users. Even information such as emails … Read more

Assessing NHL award winners using K-means

Data sets The final data-set used is a combination of traditional and advanced player metrics. Traditional statistics concern metrics like goals and assists (total being known as points), plus-minus, penalty minutes and time on ice, whilst advanced player metrics deal more with player behavior and puck possession. Using Python’s beautifulsoup library, I scraped more traditional … Read more

Reinforcement Learning and Deep Reinforcement Learning with Tic Tac Toe

In this article I want to share my project on implementing reinforcement learning and deep reinforcement learning methods on a Tic Tac Toe game. The article contains: 1. Rigorous definition of the game as a Markov decision process. 2. How to implement the reinforcement learning method, called TD(0), to create an agent that plays the … Read more

On Canonical Companies

In Information Technology and tech startups, we talk about “systems of record”. A system of record is the source for particular data that may also exist in other systems. That “system of record” is the ultimate source of truth. It’s the canonical record of data. We often talk about these systems as the place … Read more

H2O for Inexperienced Users

Some background: I am a senior in high school, and in the summer of 2018, I interned at With no ML experience beyond Andrew Ng’s Introduction to Machine Learning course on Coursera and a couple of his deep learning courses, I initially found myself slightly overwhelmed by the variety of new algorithms H2O has to offer … Read more

Software 2.0 — Playing with Neural Networks (Part 1)

In this article we are going to discuss neural networks (from scratch), the innovative concept that has taken the world by storm. I will assume that the reader is already familiar with the following concepts: cost function (MSE and cross entropy), gradient descent, logistic regression, activation function, and binary classification. Particularly, this article will try … Read more

Sentiment Analysis with Word Bags and Word Sequences

For generic text, word bag approaches are very efficient at text classification. For a binary text classification task studied here, LSTM working with word sequences is on par in quality with SVM using tf-idf vectors. But performance is a different matter… The bag-of-words approach to turning documents into numerical vectors ignores the sequence of words … Read more

Data Visualization in Music

Last fall I went to an Edward Tufte lecture where he began with a very effective, very sweet video of music visualized in a sequence: Tufte knows this is a great and charming start to a lecture. He knows it provides a welcome change from the outside world; an elegant fusion of music, color, and … Read more

The Science Behind AlphaStar

How DeepMind Uses Reinforcement Learning to Beat Human Pros in StarCraft II. Long-term strategic planning has long been considered a unique quality of the human mind that would be very difficult to imitate by artificial intelligence (AI) agents. Conceptually, strategic thinking involves evaluating a large number of data points in the present in order to … Read more

Main benefits of using a Chatbot for your business

I’m going to tell you about the future: messenger chatbots. What is a Messenger chatbot, and why is it crucial for your business? Basically, it’s a digital assistant, most of the time based on AI, that follows various commands and carries on what looks like a natural-sounding conversation with your customers. Now you have the opportunity … Read more

Uncertainty estimation for Neural Network — Dropout as Bayesian Approximation

Uncertainty Estimation: One of the key distinctions of the Bayesian approach is that parameters are distributions instead of fixed weights. Error = model uncertainty + model misspecification + inherent noise. The Bayesian neural network decomposes uncertainty into model uncertainty, model misspecification, and inherent noise. MCDropout: One of the keys here in the Bayesian approach is that everything is … Read more

Machine Learning Techniques applied to Stock Price Prediction

Image generated using Neural Style Transfer. Machine learning has many applications, one of which is to forecast time series. One of the most interesting (or perhaps most profitable) time series to predict is, arguably, stock prices. Recently I read a blog post applying machine learning techniques to stock price prediction. You can read it here. … Read more

Sliding Puzzle – Solving Search Problem with Iterative Deepening A*

Now that we are more familiar with the game, let’s solve it! Search Algorithms: Let’s begin our graph traversal journey by visualizing and setting up our problem. “A goal properly set is halfway reached.” (Zig Ziglar) Problem: Given a board state, find a combination of moves that leads to the final state. Graph Representation: Now that we … Read more

What follows AlphaStar for Academic AI Researchers?

DeepMind continues making progress, but the path forward for AI researchers in academia is unclear. Ten years ago I challenged AI researchers across the globe to build a professional-level bot for StarCraft 1. The Brood War API had recently been released, and for the first time academics and professionals could test out AI systems on a highly competitive … Read more

Using AI For Good

How to Help Developing Countries with Artificial Intelligence. CE Kan, Jan 27. Recently, I have come across quite a few articles stating how artificial intelligence may threaten the developing world by eliminating the need for repetitive, labor-intensive manufacturing roles. Automation of factories can potentially lead to higher unemployment rates in poorer nations, thereby disrupting local … Read more

Hierarchical Bayesian Modeling for Ford GoBike Ridership with PyMC3 — Part II

In the first part of this series, we explored the basics of using a Bayesian-based machine learning model framework, PyMC3, to construct a simple Linear Regression model on Ford GoBike data. In this example problem, we aimed to forecast the number of riders that would use the bike share tomorrow … Read more

Handling imbalanced datasets in machine learning

Reworking the problem is better. Up to now the conclusion is pretty disappointing: if the dataset is representative of the true data, if we can’t get any additional feature and if we target a classifier with the best possible accuracy, then a “naive behaviour” (answering always the same class) is not necessarily a problem and should … Read more

Interactive Controls for Jupyter Notebooks

How to use interactive IPython widgets to enhance data exploration and analysis. There are few actions less efficient in data exploration than re-running the same cell over and over again, each time slightly changing the input parameters. Despite knowing this, I still find myself repeatedly executing cells just to make the slightest change, for example, choosing … Read more

Degrees of Freedom and Sudoku

Intuitive explanation of degrees of freedom and how degrees of freedom affect Sudoku. A lot of aspiring Data Scientists take courses on statistics and get befuddled with the concept of Degrees of Freedom. Some memorize it by rote as ‘n-1’. But there is an intuitive reason why it is ‘n-1’. The Intuitive … Read more
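One common way to see the ‘n-1’, sketched here under assumed example numbers: once the mean of n values is fixed, only n-1 of them can be chosen freely, because the last one is forced. The helper name `forced_last_value` is hypothetical.

```python
def forced_last_value(free_values, mean, n):
    """Given n-1 freely chosen values and a fixed mean, return the only possible n-th value."""
    if len(free_values) != n - 1:
        raise ValueError("exactly n-1 values can be chosen freely")
    # The total must equal mean * n, so the remaining value has no freedom left.
    return mean * n - sum(free_values)


# Assumed example: five values whose mean must be 10.
free_values = [8, 12, 9, 11]
last = forced_last_value(free_values, mean=10, n=5)
sample = free_values + [last]

# Deviations from the mean always sum to zero, which is the constraint
# that removes one degree of freedom.
deviations = [x - 10 for x in sample]
print(last, sum(deviations))
```

The same reasoning applies to a Sudoku row: once eight of the nine cells are filled, the ninth is determined.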

10 Tips for Choosing the Optimal Number of Clusters

Matt.0, Jan 27. Clustering is one of the most common unsupervised machine learning problems. Similarity between observations is defined using some inter-observation distance measures or correlation-based distance measures. There are 5 classes of clustering methods: hierarchical clustering, partitioning methods (k-means, PAM, CLARA), density-based clustering, model-based clustering, and fuzzy clustering. My … Read more

A Gentle Introduction to Deep Learning : Part 3

PCA & Linear Algebra (Advanced). “You can’t build a great building on a weak foundation.” This quote truly justifies what I am trying to do here: you cannot learn the true form of machine learning or deep learning until you have knowledge of some of the important mathematical concepts like linear algebra … Read more

Data Augmentation for Natural Language Processing

Lessons learned from a hate speech detection task to improve supervised NLP models. Note: this post is mainly targeted at an audience unfamiliar with Natural Language Processing and will hence cover some basic concepts before moving on to data augmentation. Natural Language Processing (NLP) has become increasingly popular in both academia and … Read more

Learning to Drive Smoothly in Minutes

Learning to Drive in Minutes: The Updated Approach. Although this technique may work in principle, it has some issues that need to be addressed before applying it to a self-driving RC car. First, because the feature extractor (VAE) is trained after each episode, the distribution of features is not stationary. That is to say, the features are … Read more

The New Dawn of AI: Federated Learning

The emerging AI market model is dominated by tech giants such as Google, Amazon and Microsoft, who offer cloud-based AI solutions and APIs. This model offers users little control over the usage of AI products and their own data that is collected from their devices, locations etc. In the long run, such a centralized model … Read more

Analytics Building Blocks: Regression

A modularized notebook to tune and compare 11 regression algorithms with minimal coding in a control panel fashion. This article summarizes and explains key modules of my regression block (one of the simple modularized notebooks I am developing to execute common analysis tasks). The notebook is intended to facilitate quicker experimentation for the users with … Read more