How to Pace the London Marathon: Fuelled by Data

Chris is a current MSc Computer Science student at the University of Warwick, UK. He is also the co-founder of Sustain Investing. Before that, Chris worked at Citi Ventures and at Citi Markets. It started with an excuse: Hi, I’m Chris. While building Sustain with my co-founders Andre, Nick Foden and Sylwia Zieba, I’ve been studying …

The Blockchain Scalability Problem & the Race for Visa-Like Transaction Speed

Yes, blockchain has a scalability problem. Here’s what it is, and here’s what people are doing to solve it. The battle for a scalable solution is the blockchain’s moon race. Bitcoin processes 4.6 transactions per second. Visa does around 1,700 transactions per second on average (based on a calculation derived from the official claim of …

What are the Skills Needed to Become a Data Scientist in 2019?

It’s hardly a surprise to anyone in the tech and related industries that “data scientist” is the best job to have in the States. After all, this has been what sources like the Harvard Business Review and Glassdoor report for what is now four years in a row. And even if we take the base …

Making Sense of Startup Valuations with Data Science

The following is a condensed and slightly modified version of a Radicle working paper on the startup economy in which we explore post-money valuations by venture capital stage classifications. We find that valuations have interesting distributional properties and then go on to describe a statistical model for estimating an undisclosed valuation with considerable ease. In …

Predicting Premier league standings — putting that math to some use

I am a casual fan when it comes to football, but the idea of building a mathematical model that can be applied to a real-world problem seemed exciting enough to have a try at it. (Let’s kick off then, shall we? ⚽️) Breaking down the problem: The rankings in the league table are primarily determined by …

A Dog Detector and Breed Classifier

In a field like physics, things keep getting harder, to the point that it’s very difficult to understand what’s going on at the cutting edge unless it’s in highly simplified terms. In computer science though, and artificial intelligence in particular, knowledge built up slowly over 70+ years by people all over the world is still …

Build a Pipeline for Harvesting Medium Top Author Data

Nuts and Bolts: One key requirement was to make deployment of my Luigi workflow very simple. I wanted to assume only one thing about the deployment environment: that the Docker daemon would be available. With Docker, I wouldn’t need to be concerned with Python version mismatches or other environmental discrepancies. It took me a little while …

Power BI

Using Power BI and R. Tutorial here: Run R scripts in Power BI Desktop. The only twist that I want to add is an idea on how to enable users without admin access to run R code. This can be achieved by storing a portable R installation on a mountable file storage. R: Download the …

How Does Back-Propagation in Artificial Neural Networks Work?

Our Neural Network: Let’s finally draw a diagram of our long-awaited neural net. It should look something like this: The leftmost layer is the input layer, which takes X0 as the bias term of value 1, and X1 and X2 as input features. The layer in the middle is the first hidden layer, which also takes …

Pix2Pix

Shocking result of Edges-to-Photo Image-to-Image translation using the Pix2Pix GAN Algorithm. This article will explain the fundamental mechanisms of a popular paper on Image-to-Image translation with Conditional GANs, Pix2Pix. Following is a link to the paper. Article Outline: I. Introduction, II. Dual Objective Function with Adversarial and L1 Loss, III. U-Net Generator, IV. PatchGAN Discriminator …

Probability — Fundamentals of Machine Learning (Part 1)

The Mathematics of Probability: In the beginning, I suggested that probability theory is a mathematical framework. As with any mathematical framework, there is some vocabulary and important axioms needed to fully leverage the theory as a tool for machine learning. Probability is all about the possibility of various outcomes. The set of all possible outcomes …

Why visual literacy is essential to good data visualization

We know data literacy matters. But visual literacy matters too. Here’s why. Photo by Markus Spiske on Unsplash. Data is all around us, and the way people work has changed because of it. Companies are now investing more in roles like Chief Data Officer, building their data science teams, and talking about things like “data literacy” in …

Master Python through building real-world applications (Part 7)

Data Collector Web App with PostgreSQL and Flask: Working with databases and queries can be pretty daunting to some, or maybe most, of us. Perhaps I have lost 100 readers by now just because there’s PostgreSQL written in the subtitle. But as you are here, I want you to know that it is an important thing …

A.I enhanced molecular discovery and optimization

Awesome! But how do we get there? Researchers at the forefront of their fields have been trying to use the existing tools we have on hand to solve this problem. There is a pattern in the modus operandi of current research, and the same general process applies to any A.I based science project. Researchers are carpenters, …

Everything you need to know about Scatter Plots for Data Visualisation

If you’re a Data Scientist there’s no doubt that you’ve worked with scatter plots before. Despite their simplicity, scatter plots are a powerful tool for visualising data. There are a lot of options, flexibility, and representational power that come with the simple change of a few parameters like color, size, shape, and regression plotting. Here you’ll …

Making Music with Machine Learning

Image from https://www.maxpixel.net/Circle-Structure-Music-Points-Clef-Pattern-Heart-1790837 Music is not just an art, music is an expression of the human condition. When an artist is making a song you can often hear the emotions, experiences, and energy they have in that moment. Music connects people all over the world and is shared across cultures. So there is no way …

Exploration of the Social News TV: The Communication Behavior of #ajnewsgrid

What is NewsGrid? NewsGrid is a young news program broadcast globally by Al Jazeera since 2016. It is Al Jazeera’s first interactive news hour. The show is produced in three parts: top stories of the day presented by one presenter, stories that create a huge social reaction on Twitter presented by a social media presenter, and the …

Hey, Who Moved the Goalposts?

Part of the “10 reasons why Software Development projects fail” series. The most successful software development projects have a timeline and a series of milestones to accomplish that project within a set period of time. Those milestones are critical, because they help to divide a large project into a series of much smaller projects, and they …

Binary Tree: The Diameter.

Dynamic programming sequences sub-problems together, having each sub-problem lead to the solution. With dynamic programming we no longer have to visit paths we’ve been down before; instead we can prune the shorter branches and track the diameter at each step. With the dynamic approach the algorithm travels down the tree and counts lengths on the …
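
The idea in this excerpt can be sketched in a few lines of Python. This is a minimal illustration, not the article's own code: the `Node` class and the `diameter` function are hypothetical names, and the diameter is assumed here to be counted in edges (some definitions count nodes instead). Each subtree's height is computed once, and the best left-plus-right path seen so far is tracked along the way.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical binary tree node used only for this sketch."""
    value: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def diameter(root: Optional[Node]) -> int:
    """Return the number of edges on the longest path between two nodes."""
    best = 0  # longest path found so far, in edges

    def height(node: Optional[Node]) -> int:
        # Height counted in nodes: an empty subtree has height 0.
        nonlocal best
        if node is None:
            return 0
        left = height(node.left)
        right = height(node.right)
        # The longest path through this node joins its two deepest branches.
        best = max(best, left + right)
        # Only the taller branch can extend a path that goes further up.
        return 1 + max(left, right)

    height(root)
    return best
```

Because every node is visited once and only the taller branch is passed upward, the sketch runs in time linear in the number of nodes.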

Revisiting Adam Smith’s Invisible Hand in the Data Economy

Fundamental paradigms of the free market should also be scrutinized by data science. An unobservable market force that helps the demand and supply of goods in a free market to reach equilibrium automatically and efficiently is what we call the invisible hand. But I am a data scientist, I don’t deal in unobservable forces, no observations …

Set Theory — Cardinality & Power Sets

With basic notation & operations cleared in articles one & two in this series, we’ve now built a fundamental understanding of Set Theory. This third article further compounds this knowledge by zeroing in on the most important property of any given set: the total number of unique elements it contains. Also known as the cardinality, …
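
As a small illustration of the two ideas named in the title, the following Python sketch computes the cardinality of a set and builds its power set. The helper name `power_set` is hypothetical and not taken from the article; it relies only on the standard `itertools` module.

```python
from itertools import chain, combinations


def power_set(items):
    """Return every subset of `items` as a list of frozensets."""
    unique = list(set(items))  # duplicates do not count toward cardinality
    subsets = chain.from_iterable(
        combinations(unique, size) for size in range(len(unique) + 1)
    )
    return [frozenset(subset) for subset in subsets]


letters = {"a", "b", "c", "a"}   # the repeated "a" is stored only once
cardinality = len(letters)       # number of unique elements
subsets = power_set(letters)     # includes the empty set and the set itself
```

For a finite set with n elements the power set has 2^n members, so `len(subsets)` equals `2 ** cardinality` here.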

Fast, static D3 maps built with Turf.js and the command-line

Combining Mike Bostock’s command-line cartography tutorial with the flexibility of Node.js. Estimated percent of undocumented residents in U.S. metro areas. Source: Pew Research Center. Recently, I needed to build a handful of U.S. state bubble maps to be embedded in a story for San Antonio Express-News. I wanted to use D3 but was concerned about slow asset …

Introduction to Unsupervised Learning

Understand principal component analysis (PCA) and clustering methods. Photo by Oscar Keys on Unsplash. Unsupervised learning is a set of statistical tools for scenarios in which there is only a set of features and no targets. Therefore, we cannot make predictions, since there are no associated responses to each observation. Instead, we are interested in finding …

The Basics of Cryptography

With Applications in R. Have you ever wondered how companies securely store your passwords? Or how your credit card information is kept private when making online purchases? The answer is cryptography. The vast majority of internet sites now use some form of cryptography to ensure the privacy of their users. Even information such as emails …

Assessing NHL award winners using K-means

Data sets: The final data set used is a combination of traditional and advanced player metrics. Traditional statistics concern metrics like goals and assists (total being known as points), plus-minus, penalty minutes and time on ice, whilst advanced player metrics deal more with player behavior and puck possession. Using Python’s beautifulsoup library, I scraped more traditional …

Reinforcement Learning and Deep Reinforcement Learning with Tic Tac Toe

In this article I want to share my project on implementing reinforcement learning and deep reinforcement learning methods on a Tic Tac Toe game. The article contains: 1. Rigorous definition of the game as a Markov decision process. 2. How to implement the reinforcement learning method, called TD(0), to create an agent that plays the …

Software 2.0 — Playing with Neural Networks (Part 1)

In this article we are going to discuss neural networks (from scratch), the innovative concept which has taken the world by storm. I will assume that the reader is already familiar with the following concepts: Cost function (MSE and Cross Entropy), Gradient Descent, Logistic regression, Activation Function, Binary Classification. Particularly, this article will try …

Sentiment Analysis with Word Bags and Word Sequences

For generic text, word bag approaches are very efficient at text classification. For a binary text classification task studied here, LSTM working with word sequences is on par in quality with SVM using tf-idf vectors. But performance is a different matter… The bag-of-words approach to turning documents into numerical vectors ignores the sequence of words …

Visualising Machine Learning Datasets with Google’s FACETS.

Data: Although you can work with data provided on the demo page, I shall be working with another set of data. I will be doing EDA with FACETS on the Loan Prediction Dataset. The problem statement is to predict whether an applicant who has been granted a loan by a company will repay it …

The Science Behind AlphaStar

How DeepMind Uses Reinforcement Learning to Beat Human Pros in StarCraft II. Long-term strategic planning has long been considered a unique quality of the human mind that would be very difficult to imitate by artificial intelligence (AI) agents. Conceptually, strategic thinking involves evaluating a large number of data points in the present in order to …

Main benefits of using a Chatbot for your business

I’m going to tell you about the future: messenger chatbots. What is a Messenger chatbot, and why is it crucial for your business? Basically, it’s a digital assistant, most of the time based on AI, that follows various commands in what looks like a natural-sounding conversation with your customers. Now you have the opportunity …

Uncertainty estimation for Neural Network — Dropout as Bayesian Approximation

Uncertainty Estimation: One of the key distinctions of the Bayesian approach is that parameters are distributions instead of fixed weights. Error = Model Uncertainty + Model misspecification + inherent noise. The Bayesian neural network decomposes uncertainty into model uncertainty, model misspecification, and inherent noise. MCDropout: One of the keys here in the Bayesian approach is that everything is …

Machine Learning Techniques applied to Stock Price Prediction

Image generated using Neural Style Transfer. Machine learning has many applications, one of which is to forecast time series. One of the most interesting (or perhaps most profitable) time series to predict is, arguably, stock prices. Recently I read a blog post applying machine learning techniques to stock price prediction. You can read it here. …

Sliding Puzzle – Solving Search Problem with Iterative Deepening A*

Now that we are more familiar with the game, let’s solve it! Search Algorithms: Let’s begin our Graph Traversal journey by visualizing and setting up our problem. “A goal properly set is halfway reached.” (Zig Ziglar) Problem: Given a board state, find a combination of moves that leads to the final state. Graph Representation: Now that we …

What follows AlphaStar for Academic AI Researchers?

DeepMind continues making progress, but the path forward for AI researchers in academia is unclear. Ten years ago I challenged AI researchers across the globe to build a professional-level bot for StarCraft 1. The Brood War API was recently released, and for the first time academics and professionals could test out AI systems on a highly-competitive …

Using AI For Good

How to Help Developing Countries with Artificial Intelligence. CE Kan, Jan 27. Recently, I have come across quite a few articles stating how artificial intelligence may threaten the developing world by eliminating the need for repetitive, labor-intensive manufacturing roles. Automation of factories can potentially lead to higher unemployment rates in poorer nations, thereby disrupting local …

Hierarchical Bayesian Modeling for Ford GoBike Ridership with PyMC3 — Part II

Photo by sabina fratila on Unsplash. In the first part of this series, we explored the basics of using a Bayesian-based machine learning model framework, PyMC3, to construct a simple Linear Regression model on Ford GoBike data. In this example problem, we aimed to forecast the number of riders that would use the bike share tomorrow …

Handling imbalanced datasets in machine learning

Reworking the problem is better: Up to now the conclusion is pretty disappointing: if the dataset is representative of the true data, if we can’t get any additional features, and if we target a classifier with the best possible accuracy, then a “naive behaviour” (always answering the same class) is not necessarily a problem and should …

Interactive Controls for Jupyter Notebooks

How to use interactive IPython widgets to enhance data exploration and analysis. There are few actions less efficient in data exploration than re-running the same cell over and over again, each time slightly changing the input parameters. Despite knowing this, I still find myself repeatedly executing cells just to make the slightest change, for example, choosing …

Understanding Entity Embeddings and Its Application

As of late I’ve been reading a lot on entity embeddings after being tasked to work on a forecasting problem. The task at hand was to predict the salary of a given job title, given the historical job ads data that we have in our data warehouse. Naturally, I just had to seek out how …

Mario vs. Wario — round 2: CNNs in PyTorch and Google Colab

For quite some time I had been meaning to play with Google Colab (yes, free access to GPU…). I think this is a really awesome initiative, which enables people with no GPU on their personal computers to play around with Deep Learning and train models they would not be able to train otherwise. Basically we …