Maximum Likelihood Estimation: How it Works and Implementing in Python

Previously, I wrote an article about estimating distributions using nonparametric estimators, where I discussed the various methods of estimating statistical properties of data generated from an unknown distribution. This article covers a very powerful method of estimating parameters of a probability distribution given the data, called the Maximum Likelihood Estimator. This article is part of … Read more
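
As a minimal sketch of the idea in Python (not the article's own implementation), assume the data come from a normal distribution. For that case the maximum likelihood estimates have a closed form: the sample mean, and a standard deviation computed with a divisor of n rather than n − 1. The loop at the end checks that nudging either estimate lowers the log-likelihood. The simulated data (true μ = 5, σ = 2) are purely illustrative.

```python
import math
import random


def normal_log_likelihood(data, mu, sigma):
    """Log-likelihood of i.i.d. data under Normal(mu, sigma^2)."""
    n = len(data)
    return (-n * math.log(sigma * math.sqrt(2 * math.pi))
            - sum((x - mu) ** 2 for x in data) / (2 * sigma ** 2))


def normal_mle(data):
    """Closed-form maximum likelihood estimates for a normal distribution."""
    n = len(data)
    mu_hat = sum(data) / n
    # The MLE of the variance divides by n, so it is slightly biased downward
    sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in data) / n)
    return mu_hat, sigma_hat


rng = random.Random(42)
data = [rng.gauss(5.0, 2.0) for _ in range(1000)]  # simulated sample, not real data
mu_hat, sigma_hat = normal_mle(data)
best = normal_log_likelihood(data, mu_hat, sigma_hat)

# Moving away from the estimates in any direction should lower the log-likelihood
for d_mu, d_sigma in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]:
    assert normal_log_likelihood(data, mu_hat + d_mu, sigma_hat + d_sigma) < best
print(f"mu_hat={mu_hat:.3f}, sigma_hat={sigma_hat:.3f}")
```

Many distributions have no closed-form MLE. For those, the same log-likelihood function would instead be passed to a numerical optimizer, typically by minimizing its negative.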

A Data Analysis of Riding The Bus

What should I expect before a round of the popular drinking game? College: it’s a time for things like exploring your personality, finding your values, and making lifelong friends. Those are all well and good, but college is also a time for drinking games! There’s plenty of time in the … Read more

Building a molecular charge classifier

The intersection of Chemistry and A.I.: A.I. has seen unprecedented growth in the past couple of years. Although machine learning architectures like Neural Networks (NNs) have been known for a long time thanks to breakthroughs from top researchers like Geoffrey Hinton, only recently have NNs become powerful tools in an A.I. specialist’s toolbox. This is credited mainly … Read more

A gentle journey from linear regression to neural networks

Deep Learning: What are we talking about? A quick search on Google gives us the following definition of “deep learning”: “the ensemble of deep learning methods is a part of a broader family of machine learning methods that aims at modelling data with a high level of abstraction”. Here, we should understand that deep learning consists … Read more

Because it’s Saturday: Go Your Own Way

I was delivering a workshop for AI Live yesterday, so I didn’t get the chance to do my Friday post, but I’m here at SatRDays DC and the playlist on the sound system while we’re waiting for things to start is amazing. Fleetwood Mac’s Go Your Own Way just came on, which reminded me of this … Read more

Data network effects for an artificial intelligence startup

As the artificial intelligence (AI) ecosystem matures, it is becoming increasingly difficult to impress customers, investors, and potential acquirers by just attaching an .ai domain to whatever you are doing. Therefore, the importance of building a business model that is defensible in the long run becomes obvious. In this post, I explore how an AI startup may unlock various … Read more

R some blog 2018-12-08 04:19:00

Motivation: The dplyr functions select and mutate are nowadays commonly used to perform data.frame column operations, frequently combined with magrittr’s forward pipe %>%. While they work well interactively, these methods often require additional checks when used in “serious” code, for example to catch column name clashes. In principle, the container package provides a dict-class … Read more

How To Ask The Right Questions As A Data Scientist

How do you define a problem statement by asking the right questions? Whether you admit it or not, defining a problem statement (or data science problem) is one of the most important steps in the data science pipeline. “A problem well defined is a problem half-solved.” (Charles Kettering) In the following part, we’ll go through the four … Read more

Day 08 – little helper intersect2

We at STATWORX work a lot with R, and we often use the same little helper functions within our projects. These functions ease our daily work life by reducing repetitive code or by creating overviews of our projects. At first, there was no plan to make a package, but soon I realised that it … Read more

Feel discouraged on the sparse data in your hand? Give Factorization Machine a shot (2)

By laying a solid foundation in Matrix Factorization, you will find your exploration of the advanced models derived from it, such as LDA, LSI, PLSA and Tensor Factorization, much smoother. (Figure: The models derived from the concept of Matrix Factorization.) In the last session, we talked about the basic … Read more
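
Factorization Machines generalize this idea. The sketch below is written from the standard second-order FM formulation, not taken from the article. It shows how the pairwise term \(\sum_{i<j}\langle v_i, v_j\rangle x_i x_j\) can be computed in O(kn) time using the identity \(\tfrac{1}{2}\sum_f\big[(\sum_i v_{i,f}x_i)^2 - \sum_i v_{i,f}^2 x_i^2\big]\). When x is one-hot over a user and an item, that term reduces to the dot product of their latent vectors, which is exactly matrix factorization. The weights and the input vector are random or hypothetical placeholders.

```python
import random


def pairwise_naive(x, V):
    # Direct O(k * n^2) sum over feature pairs i < j of <v_i, v_j> * x_i * x_j
    n = len(x)
    return sum(
        sum(vi * vj for vi, vj in zip(V[i], V[j])) * x[i] * x[j]
        for i in range(n) for j in range(i + 1, n)
    )


def pairwise_fast(x, V):
    # O(k * n): 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i v_if^2 x_i^2]
    total = 0.0
    for f in range(len(V[0])):
        s = sum(V[i][f] * x[i] for i in range(len(x)))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(len(x)))
        total += s * s - s_sq
    return 0.5 * total


def fm_predict(x, w0, w, V):
    # Global bias + linear terms + factorized pairwise interactions
    return w0 + sum(wi * xi for wi, xi in zip(w, x)) + pairwise_fast(x, V)


rng = random.Random(0)
n_features, k = 6, 3
V = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_features)]  # latent factors
w = [rng.gauss(0, 0.1) for _ in range(n_features)]
x = [1, 0, 0, 1, 0.5, 0]  # sparse input: e.g. one-hot user, one-hot item, one numeric feature
print(fm_predict(x, 0.0, w, V))
```

Because each feature gets its own latent vector, interactions between features that never co-occur in training can still be estimated. That property is what makes FMs attractive for sparse data.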

Python Virtual Environment

Conda: How to set up virtual environments using conda for the Anaconda Python distribution. A virtual environment is a named, isolated, working copy of Python that maintains its own files, directories, and paths so that you can work with specific versions of libraries or Python itself without affecting other Python projects. Virtual environments … Read more
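
With conda, the usual workflow is `conda create --name myenv python=3.7` (the version is only an example) followed by `conda activate myenv`. To keep the examples here in Python, the sketch below shows the same isolation idea with the standard-library `venv` module instead. Note that `venv` is a different tool from conda, and the environment name `myenv` is just a placeholder.

```python
import os
import sys
import tempfile
import venv


def make_env(env_dir):
    """Create an isolated environment and return its executables directory."""
    # with_pip=False keeps the sketch fast and offline; pass True to bootstrap pip
    venv.create(env_dir, with_pip=False)
    bin_dir = "Scripts" if sys.platform == "win32" else "bin"
    return os.path.join(env_dir, bin_dir)


with tempfile.TemporaryDirectory() as tmp:
    env_dir = os.path.join(tmp, "myenv")
    bin_path = make_env(env_dir)
    # pyvenv.cfg marks the directory as a virtual environment
    cfg_exists = os.path.isfile(os.path.join(env_dir, "pyvenv.cfg"))
    bin_exists = os.path.isdir(bin_path)
print(cfg_exists, bin_exists)
```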

“Increase sample size until statistical significance is reached” is not a valid adaptive trial design; but it’s fixable.

TLDR: Begin with N = 10 and increase by 10 until p < 0.05 or the maximum N is reached. This design has an inflated type-I error rate. A lower p-value threshold is needed to ensure the specified type-I error rate. Both the number of interim analyses and the maximum N affect the type-I error rate. The threshold can be identified using simulation. A recent Facebook … Read more
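
The TLDR's claims can be checked with a small simulation. This sketch assumes a one-sample z-test with known standard deviation 1 under a true null; the original post may use a different test. It looks at n = 10, 20, ..., 100 and records each simulated trial's smallest p-value. The fraction of trials whose smallest p-value falls below 0.05 is the realized type-I error. The 5th percentile of those minima is a per-look threshold that brings the overall error back to 5%.

```python
import math
import random
from itertools import accumulate


def two_sided_p(z):
    # Two-sided z-test p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2))


def min_p_over_looks(rng, start=10, step=10, max_n=100):
    # One trial under the null (true mean 0, known sd 1): smallest p-value
    # seen across the interim looks at n = start, start + step, ..., max_n
    sums = list(accumulate(rng.gauss(0.0, 1.0) for _ in range(max_n)))
    return min(two_sided_p(sums[n - 1] / math.sqrt(n))
               for n in range(start, max_n + 1, step))


def simulate(sims=4000, seed=2018, **design):
    rng = random.Random(seed)
    return sorted(min_p_over_looks(rng, **design) for _ in range(sims))


min_ps = simulate()
# Stopping as soon as any look has p < 0.05 rejects a true null this often
naive_error = sum(p < 0.05 for p in min_ps) / len(min_ps)
# Per-look threshold keeping the overall type-I error at 5%: empirical
# 5th percentile of the per-trial minimum p-value
adjusted = min_ps[int(0.05 * len(min_ps))]
print(f"type-I error at p < 0.05: {naive_error:.3f}; adjusted threshold: {adjusted:.4f}")
```

Changing `step` or `max_n` changes the number of looks. Rerunning the simulation then shows how both the inflation and the adjusted threshold move.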

Shortcoming of Under-sampling Algorithms: CCMUT and E-CCMUT

What, Why, Possible Solution and Ultimate Utility. In one of my previous articles, “Under-sampling: A Performance Booster on Imbalanced Data”, I applied the Cluster Centroid based Majority Under-sampling Technique (CCMUT) to the Adult Census data and showed an improvement in model performance with respect to the state-of-the-art model, “A Statistical Approach to Adult Census Income Level Prediction” [1]. But there are … Read more

“Artist” in Matplotlib — something I wanted to know before spending tremendous hours on googling…

Originally published at and modified a bit to fit Medium’s editing system. It’s true that matplotlib is a fantastic visualization tool in Python. But it’s also true that tweaking details in matplotlib is a real pain. You can easily lose hours figuring out how to change a small part of your plot. Sometimes … Read more

Avoiding Parking Tickets in San Francisco Using Data Analytics

Although still not a perfect predictor, this model was more accurate than the first. The streets identified as best also showed much less variability than those identified as worst. We could also reduce the number of tickets by over 50% by choosing the best population of streets rather than the worst. Interestingly, parking density was … Read more

Comparative study on Classic Machine learning Algorithms

2. Logistic Regression: Just as linear regression is the natural first regression algorithm, logistic regression is the right algorithm to start with among classification algorithms. Even though its name contains ‘Regression’, it is not a regression model but a classification model. It uses a logistic function to model a binary output. The output of logistic regression is a probability (0 ≤ p ≤ 1), … Read more
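
Here is a minimal sketch of the idea in plain Python, assuming a single feature and a hypothetical toy dataset. The logistic (sigmoid) function turns the linear score w·x + b into a probability, and the weights are fitted by batch gradient descent on the mean log-loss. A 0.5 threshold then turns probabilities into class labels.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Batch gradient descent on the mean log-loss for a single feature."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # d(log-loss)/dz for one example
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


xs = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]  # hypothetical feature values
ys = [0, 0, 0, 0, 1, 1, 1, 1]                      # binary labels
w, b = fit_logistic(xs, ys)
probs = [sigmoid(w * x + b) for x in xs]  # each output lies strictly between 0 and 1
labels = [int(p >= 0.5) for p in probs]
print(labels)
```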

How should we define AI?

In our very first section, we’ll become familiar with the concept of AI by looking into its definition and some examples. As you have probably noticed, AI is currently a “hot topic”: media coverage and public discussion about AI is almost impossible to avoid. However, you may also have noticed that AI means different things … Read more

F# Advent Calendar — A Christmas Classifier

The ML.NET Model: The model is defined in Program.fs. The dataLoader specifies the schema of the input data. The dataLoader is then used to load the training and test data views. The dataPipeline specifies the transforms that should be applied to the input TSV. Since this is a … Read more

Gender Diversity in the R and Python Communities

Many (if not most) tech communities have far more representation from men than from women (and even fewer from nonbinary folk). This is a shame, because everybody uses software, and these projects would self-evidently benefit from the talent and expertise from across the entire community. Some projects are doing better than others, though, and data … Read more

I Can Be Your Heroku, Baby

Deploying a Python app on Heroku! Do you like Data Science? <Shakes head up and down> Do you like Data Science DIY deployment? <Shakes head left and right> Me neither. One of the most frustrating parts of early data science learning or personal work is deploying an app through free cloud services. Your code is juuust … Read more

Roadmap for Conquering Computer Vision

It has become quite a tradition to write blog posts with guidelines for acing machine learning, but I have had a hard time finding any such roadmap and to-do list for computer vision. As a vision enthusiast and consultant, I have come across a lot of people asking for a concrete roadmap (in terms of skills, courses … Read more

How to determine the best model?

Machine learning models play a critical role in many aspects of today’s business. The use of a predictive model can improve the business bottom line, and a slightly improved model can result in an increase of millions of dollars. Although you may not know all the popular algorithms (and more powerful algorithms in the future), … Read more

Image Processing Class (EGBE443) #3 — Point Operation

Point operations affect the histogram. Raising the brightness shifts the histogram to the right, and increasing the contrast of the image stretches the histogram. These point operations map each intensity through a mapping function whose constants depend on the image content, such as the highest and lowest intensities. Automatic … Read more
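
Here is a sketch of two such point operations on a grayscale image, assuming 8-bit intensities stored in a flat Python list (a real image would be a 2-D array). Adding a constant shifts the histogram. An automatic contrast adjustment linearly stretches the image's lowest and highest intensities to the full 0 to 255 range. The function names are illustrative, not taken from the course.

```python
def clamp(value, lo=0, hi=255):
    """Keep an intensity inside the valid 8-bit range."""
    return max(lo, min(hi, value))


def adjust_brightness(pixels, delta):
    # Adding a constant to every pixel shifts the whole histogram by delta
    return [clamp(p + delta) for p in pixels]


def auto_contrast(pixels, a_min=0, a_max=255):
    # The mapping's constants come from the image itself: its lowest and highest intensity
    a_low, a_high = min(pixels), max(pixels)
    if a_high == a_low:  # flat image: nothing to stretch
        return list(pixels)
    scale = (a_max - a_min) / (a_high - a_low)
    # Linearly map [a_low, a_high] onto [a_min, a_max], stretching the histogram
    return [round(a_min + (p - a_low) * scale) for p in pixels]


image = [60, 80, 100, 120, 140]  # hypothetical low-contrast pixels
print(adjust_brightness(image, 50))  # [110, 130, 150, 170, 190]
print(auto_contrast(image))          # [0, 64, 128, 191, 255]
```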

Part 2: Gradient descent and backpropagation

Dec 3, 2018 In this article you will learn how a neural network can be trained using backpropagation and stochastic gradient descent. The theory will be described thoroughly, and a detailed example calculation is included in which both weights and biases are updated. This is the second part in a series of articles: I assume … Read more
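
To sketch what such an example calculation involves, here is a hypothetical network with one input, one sigmoid hidden unit and one linear output, trained on a single example with squared-error loss. `backprop` applies the chain rule to get the gradients of all four parameters (two weights and two biases), and `sgd_step` performs one stochastic gradient descent update. The initial parameter values are arbitrary and not taken from the article.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(params, x):
    w1, b1, w2, b2 = params
    a1 = sigmoid(w1 * x + b1)  # hidden activation
    y_hat = w2 * a1 + b2       # linear output unit
    return a1, y_hat


def loss(params, x, y):
    return 0.5 * (forward(params, x)[1] - y) ** 2


def backprop(params, x, y):
    """Gradients of the squared-error loss w.r.t. [w1, b1, w2, b2] via the chain rule."""
    _, _, w2, _ = params
    a1, y_hat = forward(params, x)
    d_out = y_hat - y                      # dL/dy_hat
    d_hidden = d_out * w2 * a1 * (1 - a1)  # dL/dz1, using sigmoid'(z) = a(1 - a)
    return [d_hidden * x, d_hidden, d_out * a1, d_out]


def sgd_step(params, x, y, lr=0.1):
    # Move every weight and bias against its gradient
    return [p - lr * g for p, g in zip(params, backprop(params, x, y))]


params = [0.5, -0.2, 0.8, 0.1]  # arbitrary starting values for w1, b1, w2, b2
x, y = 1.5, 1.0                 # a single hypothetical training example
updated = sgd_step(params, x, y)
print(loss(params, x, y), loss(updated, x, y))
```

A finite-difference check of `backprop` against small perturbations of each parameter is a cheap way to catch chain-rule mistakes before scaling up to real layers.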

Machine Learning Introduction: A Comprehensive Guide

Dec 3, 2018 This is the first of a series of articles in which I will describe machine learning concepts, types, algorithms and Python implementations. The main goals of this series are: creating a comprehensive guide to machine learning theory and intuition; sharing and explaining machine learning projects, developed in Python, to show in a … Read more

The Perceptron Algorithm

In my blog post Neural Nets: From Linear Regression to Deep Nets I talked about how a deep neural net is simply a sequence of simple building blocks of the form: \[\sigma(\underbrace{w^T}_{weights}x + \overbrace{b}^{bias}) = a\] and that a linear regression model is one of the most basic neural networks where the activation function \(\sigma\) … Read more
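
If \(\sigma\) is taken to be a step function, that building block is the classic perceptron. Below is a minimal sketch of the perceptron learning rule, assuming 0/1 labels and the logical AND function as a toy, linearly separable dataset. Each misclassified example nudges the weights and bias by \((y - \hat{y})\,x\) until an epoch passes with no errors.

```python
def predict(w, b, x):
    # Step activation: output 1 when w^T x + b >= 0, else 0
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else 0


def train_perceptron(X, y, lr=1.0, max_epochs=100):
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for xi, target in zip(X, y):
            update = lr * (target - predict(w, b, xi))  # 0 when the prediction is right
            if update != 0:
                w = [wj + update * xj for wj, xj in zip(w, xi)]
                b += update
                mistakes += 1
        if mistakes == 0:  # converged: every example classified correctly
            break
    return w, b


X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 0, 0, 1]  # logical AND
w, b = train_perceptron(X, y)
print(w, b, [predict(w, b, xi) for xi in X])
```

On linearly separable data like AND, the perceptron convergence theorem guarantees the loop stops. On XOR, which is not linearly separable, it would simply run until `max_epochs`.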


Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates

Abstract: This post provides an overview of a phenomenon called “Super Convergence”, where we can train a deep neural network an order of magnitude faster than with conventional training methods. One of the key elements is training the network using the “One-cycle policy” with maximum … Read more
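
Here is a simplified sketch of a one-cycle learning-rate schedule, assuming linear ramps. The rate climbs from `max_lr / div` to `max_lr` over the first half of the cycle and falls back over the second half. It then decays much further during a short final phase. The parameter names and defaults (`div`, `final_div`, `tail_frac`) are illustrative choices, and the momentum cycling that usually accompanies the policy is omitted.

```python
def one_cycle_lr(step, total_steps, max_lr, div=10.0, final_div=100.0, tail_frac=0.1):
    """Simplified one-cycle schedule with linear ramps (illustrative defaults)."""
    base_lr = max_lr / div
    cycle_len = round(total_steps * (1 - tail_frac))  # steps spent in the up/down cycle
    half = cycle_len // 2
    if step < half:  # phase 1: warm up from base_lr to max_lr
        return base_lr + (max_lr - base_lr) * step / half
    if step < cycle_len:  # phase 2: cool down from max_lr back to base_lr
        return max_lr - (max_lr - base_lr) * (step - half) / (cycle_len - half)
    # phase 3: anneal from base_lr down to base_lr / final_div over the remaining steps
    tail_len = total_steps - cycle_len
    frac = (step - cycle_len) / max(1, tail_len - 1)
    return base_lr - (base_lr - base_lr / final_div) * frac


schedule = [one_cycle_lr(s, 1000, max_lr=1.0) for s in range(1000)]
print(schedule[0], max(schedule), schedule[-1])
```

The large learning rates in the middle of the cycle are the part associated with super-convergence. The final annealing phase lets the weights settle.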

The Hidden Dangers in Algorithmic Decision Making

A robot judge in Futurama was all fun and games, until COMPAS was created. The quiet revolution of artificial intelligence looks nothing like the way movies predicted; AI seeps into our lives not by taking them over as sentient robots, but by steadily creeping into areas of decision-making that were previously exclusive to humans. Because it … Read more

Because it’s Friday: If planets were as close as the moon

What would the sky look like if Mars, Jupiter, Saturn, or Neptune were as close to us as the Moon is now? Well, other than the global calamity caused by extreme tides and general astrophysical disruption, it looks quite pretty. If Planets were as close as the Moon — Physics & Astronomy Zone (@ZonePhysics) … Read more

Simulating dinosaur populations, with R

So it turns out that the 1990 Michael Crichton novel Jurassic Park is, indeed, a work of fiction. (Personal note: despite the snark to follow, the book is one of my all-time favorites — I clearly remember devouring it in 24 hours straight while ill in a hostel in France.) If the monsters and melodrama … Read more

Image Processing Class #0.2 — Digital Image

Image File Format: Images are stored in memory as either raster or vector images. A raster image contains pixel values arranged in a regular matrix. Conversely, a vector image represents geometric objects using continuous coordinates. If you scale up a raster image, resolution is lost, but this does not happen with a vector image. (Figure: Raster image and vector image.) Tagged Image File Format … Read more
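
A small sketch of the difference, assuming a binary image stored as nested Python lists. A circle described as a vector shape (centre and radius) can be re-rendered at any resolution. Enlarging its 4×4 raster, by contrast, can only repeat existing pixels, so the 16×16 result still carries only 4×4 worth of detail. The shape and sizes are arbitrary.

```python
def rasterize_circle(cx, cy, r, size):
    # Vector description (centre + radius) sampled at the pixel centres of a size x size grid
    return [[1 if (x + 0.5 - cx) ** 2 + (y + 0.5 - cy) ** 2 <= r * r else 0
             for x in range(size)]
            for y in range(size)]


def upscale_nearest(raster, factor):
    # Enlarging a raster can only repeat existing pixels (nearest-neighbour)
    return [[row[c // factor] for c in range(len(row) * factor)]
            for row in raster
            for _ in range(factor)]


small = rasterize_circle(2, 2, 1.8, 4)   # 4x4 raster of the circle
blocky = upscale_nearest(small, 4)       # 16x16, but still only 4x4 blocks of detail
sharp = rasterize_circle(8, 8, 7.2, 16)  # re-render the vector shape at 16x16
print(blocky[3][3], sharp[3][3])         # the corner block stays empty when upscaled
```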

Full-Fledged Recommender System

Nov 29, 2018 The rapid rise in AI applications, together with decreasing processor and memory costs, has allowed the last decade to show incredible progress in recommender systems. Given their rising importance in the retail industry, they are undoubtedly one of the more popular topics in Artificial Intelligence. However, creating a full-fledged, ready-for-production recommender system can … Read more

We are Collage

Dada, Instagram, and the future of AI. Collage is the language of the moment, but it has been for over 100 years. Let’s walk through where it came from (Dada), what it’s up to now (Instagram), and why it’s integral to the future of AI (Deep Fakes, GANs, and the ingrained copy). Yesterday: Dada. While the technique … Read more

Attention Seq2Seq with PyTorch: learning to invert a sequence

Nov 29, 2018 TL;DR: In this article you’ll learn how to implement sequence-to-sequence models with and without attention on a simple case: inverting a randomly generated sequence. You might already have come across thousands of articles explaining sequence-to-sequence models and attention mechanisms, but few are illustrated with code snippets. Below is a non-exhaustive list of articles … Read more
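
The article's models are written in PyTorch. As a framework-free sketch of two of the ingredients, the code below generates one hypothetical training pair for the inversion task and computes dot-product (Luong-style) attention. The attention function scores each encoder state against the decoder state, normalizes the scores with a softmax, and returns the weighted average as a context vector. The article may use a different scoring function, such as additive attention.

```python
import math
import random


def make_inversion_pair(rng, length=8, vocab_size=10):
    # Toy task: the target is the randomly generated source sequence, reversed
    src = [rng.randrange(vocab_size) for _ in range(length)]
    return src, src[::-1]


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot_product_attention(query, keys, values):
    # Score every encoder state (key) against the current decoder state (query)
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    # Context vector: attention-weighted average of the encoder states (values)
    context = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(len(values[0]))]
    return weights, context


src, tgt = make_inversion_pair(random.Random(0))
encoder_states = [[10.0, 0.0], [0.0, 10.0]]  # hypothetical hidden states
weights, context = dot_product_attention([1.0, 0.0], encoder_states, encoder_states)
print(src, tgt, weights, context)
```

For the inversion task, a well-trained decoder emitting position t would be expected to concentrate its attention weight on the encoder state at the mirrored position.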

Neural Networks II: First Contact

Gentle introduction to Neural Networks. Nov 29, 2018 This series of posts on Neural Networks is part of a collection of notes taken during the Facebook PyTorch Challenge, prior to the Deep Learning Nanodegree Program at Udacity. Contents: Introduction, Forward Pass, Backward Propagation, Learning, Testing, Conclusion. 1. Introduction: In the next illustration, an Artificial Neural Network is … Read more

What better time than now?

How art and wanting to change the world led me to data science. Nov 27, 2018 Data plays a crucial role in understanding the world around us. I’ve been working with data in one way or another since before I could appreciate its value. Now I’m in an immersive data science program. Here’s a little bit … Read more

AzureVM: managing virtual machines in Azure

This is the next article in my series on AzureR, a family of packages for working with Azure in R. I’ll give a short introduction to using AzureVM to manage Azure virtual machines, and in particular Data Science Virtual Machines (DSVMs). Creating a VM: Creating a VM is as simple as using the … Read more

Being a Machine Learning Engineer: 7-months in

What kind of data is there? Is it only numerical? Are there categorical features that could be incorporated into the model? Heads up: categorical features can be considered any type of data that isn’t immediately available in numerical form. In the problem of trying to predict housing prices, you might have the number of bathrooms as … Read more
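
One common way to incorporate such categorical features is one-hot encoding, where each category becomes its own 0/1 column. Below is a minimal sketch in plain Python, with a hypothetical neighbourhood column standing in for a housing dataset. Libraries such as pandas offer the same transformation through `pd.get_dummies`.

```python
def one_hot_encode(values):
    """Map each category to its own 0/1 column (columns in sorted order)."""
    categories = sorted(set(values))
    column = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[column[v]] = 1
        rows.append(row)
    return categories, rows


neighbourhoods = ["north", "south", "north", "east"]  # hypothetical categorical feature
categories, encoded = one_hot_encode(neighbourhoods)
print(categories)  # ['east', 'north', 'south']
print(encoded)     # [[0, 1, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0]]
```

One-hot encoding avoids implying an order between categories. For a linear model with an intercept, one column is often dropped (`drop_first=True` in `pd.get_dummies`) to avoid perfectly collinear features.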

The Power of Data

Reflections on how data (or lack thereof) helps (or fails) policy makers in developing countries. Foreword: When I stood up to speak last Friday at the Steering Committee meeting between the Ministry of Education of Ivory Coast and TRECC (a partnership for transforming education in cocoa-producing regions, led by the Jacobs Foundation), it had … Read more

Neural Networks I: Notation and building blocks

Gentle introduction to Neural Networks. Nov 25, 2018 This series of posts on Neural Networks is part of a collection of notes taken during the Facebook PyTorch Challenge, prior to the Deep Learning Nanodegree Program at Udacity. Contents: Neurons; Connections; Layers: Neurons vs Connections (3.1 Layers of Neurons, 3.2 Layers of Connections); PyTorch Example; 4. Notation ambiguity: Y = … Read more

Exploratory Data Analysis (EDA) techniques for kaggle competition beginners

A hands-on guide for beginners to EDA and data science competitions. Exploratory Data Analysis (EDA) is an approach to analysing data sets to summarize their main characteristics, often with visual methods. The steps involved in EDA are: Data Collection, Data Cleaning, Data Preprocessing and Data Visualisation. Data Collection: Data collection is the process … Read more
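
A first EDA pass usually starts with summary statistics. Here is a minimal sketch using Python's standard `statistics` module, assuming missing values are represented as `None` and at least one value is present; the age column is hypothetical. With pandas, `DataFrame.describe()` together with `isna().sum()` gives a similar overview.

```python
import statistics


def summarize(column):
    """Basic summary of a numeric column; None marks a missing value."""
    present = [v for v in column if v is not None]
    return {
        "count": len(present),
        "missing": len(column) - len(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "stdev": statistics.stdev(present) if len(present) > 1 else 0.0,
        "min": min(present),
        "max": max(present),
    }


ages = [23, None, 31, 27, None, 45]  # hypothetical column with missing values
summary = summarize(ages)
print(summary)
```

Counting missing values up front matters, because the cleaning step (dropping or imputing them) changes every statistic computed afterwards.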

Blockchain can be the new paradigm of the net

The popularization of blockchain will not depend on the users understanding its operation but on the existence of friendly and effective applications that solve real problems. Nov 22, 2018 Historically, each paradigm of the internet has had its killer application: before the web, it was email, with the original web it was Google and with … Read more

Blogging with Hugo and Jupyter

I really love blogging with Hugo+Blogdown, but unfortunately Blogdown is still mostly restricted to R (although Python is now also possible using the reticulate package). Jupyter offers a great literate programming environment for multiple languages and so being able to publish Jupyter notebooks as Hugo blogposts would be a huge plus. I have been looking … Read more

Combating media bias with AWS Comprehend

Nov 19, 2018 In the world of fake news and ideology-driven, subjective media coverage, it is questionable which sources of journalism can be considered “reliable”. It often happens that two different news outlets offer completely different takes on the same story. “Experts” point out different consequences of events … Read more

Becoming An Analytics Manager Isn’t A Promotion.

Nov 18, 2018 It’s A Career Change. Starting out as a data scientist may be the modern version of becoming a rock star, but no one really seems to be talking about what happens a few years further into your career. Analysing big data sets. Building models. Connecting data pipelines. The challenges … Read more

Training your staff in data science? Here’s how to pick the right programming language

Businesses from every sector are investing in data science education programmes. Working at tech education company Decoded, I have found it fascinating to see the immense value data skills can bring to every sector, from banks and retailers to charities and government. When embarking on such an initiative, there are plenty of strategic decisions for … Read more