What I Learned from Writing a Data Science Article Every Week for a Year

3. Consistency is the critical factor The 98 articles I published in 2018 totaled 264,894 words. For every word published, there was at least one word that didn’t make it through editing. This works out to about 530,000 words or 1,500 words per day. The only way this was possible while studying and working full-time was to … Read more
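The figures in this excerpt can be checked with a few lines of arithmetic. This is a minimal sketch that assumes exactly one discarded word per published word (the excerpt says "at least") and a 365-day year; the variable names are illustrative only.

```python
# Word count stated in the excerpt.
published_words = 264_894

# Assumption: exactly one discarded word for every published word.
total_words = published_words * 2

# Assumption: writing spread evenly over a 365-day year.
words_per_day = total_words / 365

print(f"total written: {total_words:,}")
print(f"per day: {words_per_day:,.0f}")
```

Under these assumptions the total is just under 530,000 words and the daily figure is a little below the rounded 1,500 quoted above.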

AI Problems are Human Problems

I have no illusions about the nature of wide-scale problem solving throughout the course of history. Rarely are sweeping changes noticed, worked on, and introduced to the populace by genius technocrats. Instead, magical innovations are often the synthesis of seemingly disparate ideas; cultural shifts do not occur due to governmental policy, but rather due to … Read more

Interactive Data Visualization with Python Using Bokeh

Simple and basic walk-through example Recently I came across this library, learned a little about it, tried it, of course, and decided to share my thoughts. From the official website: “Bokeh is an interactive visualization library that targets modern web browsers for presentation. Its goal is to provide elegant, concise construction of versatile graphics, and to … Read more

Reinforcement Learning with Hindsight Experience Replay

Sparse and Binary Rewards Reinforcement learning has gained a lot of popularity in recent years due to some spectacular successes such as defeating the Go world champion and (very recently) winning matches against top professionals in the popular real-time strategy game StarCraft 2. One of the impressive aspects of achievements such as that of AlphaZero (the … Read more

How Big is Big Data?

We have entered the Age of Data for good. Everything we do online and even offline leaves traces in data, from cookies to our social media profiles. So how much data is there really? How much data do we process on a daily basis? Welcome to the Zettabyte Era. IBM Summit supercomputer Data … Read more

Thinking Of Switching Careers To A Developer?

I Have The Answers. But How? I know what you’re wondering: how do I even have the answers? Well, I could say from experience, but as an aspiring data scientist I want to demonstrate how data science can make any decision-making process easier and help ensure you make the correct decision. I’ll be using data from the 2018 … Read more

Unmaking Graphs

This is how things usually go when I first create any graph: Imagine I just got my hands on a juicy new dataset and I’m doing some exploratory data analysis — hunched over the keyboard with a magnifying glass looking for correlations and analyzing clues. I decide to conjure up some graphs to visualize the data because … Read more

The Unsung Heroes of Modern Software Development

Open Source Foundation Leaders I’ll highlight six open source foundations that are key to many important projects. For each foundation I’ll give a brief bio, provide the number of projects being supported as of early 2019, and highlight some well-known projects. Note that these groups fall under various IRS classifications for charitable and trade organizations — not … Read more

Introducing the AI Project Canvas

AI Project Canvas Imagine the following scenario: You have a brilliant idea for a new AI project. To make it happen, you need to convince management to fund your idea. You need to pitch your AI project idea to stakeholders and management. Yuck. This is the first step where the AI Project Canvas comes into play. … Read more

The Grass Really is Greener on the Other Side: Buying Local and its Shortcomings

Evidence-Based Policy is Bigger than You or Your Feelings — Part II Just because your vegetables travel thousands of kilometers to your kitchen table doesn’t mean they can’t be better for the environment than produce from your local farmer’s market. There. I’ve said it. As unpopular opinions go, this one is somewhere between ‘pineapple on pizza’ and ‘healthcare … Read more

ML Algorithms: One SD (σ)

The obvious questions to ask when facing a wide variety of machine learning algorithms are “which algorithm is better for a specific task, and which one should I use?” The answers to these questions vary depending on several factors, including: (1) The size, quality, and nature of data; (2) The available computational time; (3) The urgency of … Read more

How to Pace the London Marathon: Fuelled by Data

Chris is a current MSc Computer Science student at the University of Warwick, UK. He is also the co-founder of Sustain Investing. Before that, Chris worked at Citi Ventures and at Citi Markets. It started with an excuse Hi, I’m Chris. While building Sustain with my cofounders Andre, Nick Foden and Sylwia Zieba I’ve been studying … Read more

The Blockchain Scalability Problem & the Race for Visa-Like Transaction Speed

Yes, blockchain has a scalability problem. Here’s what it is, and here’s what people are doing to solve it. The battle for a scalable solution is the blockchain’s moon race. Bitcoin processes 4.6 transactions per second. Visa does around 1,700 transactions per second on average (based on a calculation derived from the official claim of … Read more

Making Sense of Startup Valuations with Data Science

The following is a condensed and slightly modified version of a Radicle working paper on the startup economy in which we explore post-money valuations by venture capital stage classifications. We find that valuations have interesting distributional properties and then go on to describe a statistical model for estimating an undisclosed valuation with considerable ease. In … Read more

Value Investing with Machine Learning

Your favourite holding period doesn’t have to be forever… The Oracle of Omaha once said: “Price is what you pay, value is what you get.” Warren Buffett But how can you be certain that you are paying a fair price for an investment? How can you make the most of a fair or unfair situation? This … Read more

Introducing Snorkel

How this Tiny Project Solves One of the Major Problems in Real World Machine Learning Solutions Building high quality training datasets is one of the most difficult challenges of machine learning solutions in the real world. Disciplines like deep learning have helped us to build more accurate models but, to do so, they require vastly … Read more

Cross Validation — Why & How

So, you have been working on an imbalanced data set for a few days now, trying out different machine learning models, training them on a part of your data set, and testing their accuracy, and you are ecstatic to see the score going above 0.95 every time. Do you really think you have achieved 95% accuracy … Read more
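One reason to be suspicious of a high accuracy score on an imbalanced data set can be sketched in a few lines. The example below is an illustration under stated assumptions, not the article's own code: the 95/5 class split is hypothetical, and the "model" simply predicts the majority class for every sample.

```python
# Hypothetical imbalanced labels: 95 negatives (0) and 5 positives (1).
labels = [0] * 95 + [1] * 5

# A trivial baseline that always predicts the majority class.
predictions = [0] * len(labels)

# Accuracy: fraction of predictions that match the labels.
correct = sum(p == y for p, y in zip(predictions, labels))
accuracy = correct / len(labels)

# Recall on the minority class: fraction of positives that were found.
true_positives = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
recall = true_positives / sum(labels)

print(f"accuracy = {accuracy:.2f}, minority-class recall = {recall:.2f}")
```

The baseline scores 0.95 accuracy while finding none of the minority-class samples, which is why accuracy alone says little on data like this.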

Using Tensorflow Serving GRPC

How to write a GRPC Client for the wrapped model Once you have your Tensorflow or Keras based model trained, you need to think about how to deploy it in production. You may want to Dockerize it as a micro-service, implementing a custom GRPC (or REST- or not) interface. Then deploy this to a server … Read more

A Dog Detector and Breed Classifier

In a field like physics, things keep getting harder, to the point that it’s very difficult to understand what’s going on at the cutting edge unless it’s in highly simplified terms. In computer science though, and artificial intelligence in particular, knowledge built up slowly over 70+ years by people all over the world is still … Read more

Build a Pipeline for Harvesting Medium Top Author Data

Nuts and Bolts One key requirement was to make deployment of my Luigi workflow very simple. I wanted to assume only one thing about the deployment environment; that the Docker daemon would be available. With Docker, I wouldn’t need to be concerned with Python version mismatches or other environmental discrepancies. It took me a little while … Read more

Power BI

Using Power BI and R Tutorial here: Run R scripts in Power BI Desktop The only twist that I want to add is an idea on how to enable users without admin access to run R code. This can be achieved by storing a portable R installation on a mountable file storage. R Download the … Read more

Pix2Pix

Shocking result of Edges-to-Photo Image-to-Image translation using the Pix2Pix GAN Algorithm This article will explain the fundamental mechanisms of a popular paper on Image-to-Image translation with Conditional GANs, Pix2Pix. Article Outline: I. Introduction II. Dual Objective Function with Adversarial and L1 Loss III. U-Net Generator IV. PatchGAN Discriminator … Read more

Probability — Fundamentals of Machine Learning (Part 1)

The Mathematics of Probability In the beginning, I suggested that probability theory is a mathematical framework. As with any mathematical framework there is some vocabulary and important axioms needed to fully leverage the theory as a tool for machine learning. Probability is all about the possibility of various outcomes. The set of all possible outcomes … Read more

Why visual literacy is essential to good data visualization

We know data literacy matters. But visual literacy matters too. Here’s why. Photo by Markus Spiske on Unsplash Data is all around us, and the way people work has changed because of it. Companies are now investing more in roles like Chief Data Officer, building their data science teams and talking about things like “data literacy” in … Read more

My journey applying AI to horse racing

My journey into machine learning began in the summer of 2016. It all started at a barbecue party at the home of my fiancée’s aunt and uncle in northern Stockholm. I was sitting outside at a garden table together with the older men of her family. These are old and tough Finnish men, her granddad … Read more

A.I enhanced molecular discovery and optimization

Awesome! But how do we get there? Researchers at the forefront of their fields have been trying to use the existing tools we have on hand to solve this problem. There is a pattern in the modus operandi of current research, and the same general process applies to any A.I based science project. Researchers are carpenters, … Read more

Everything you need to know about Scatter Plots for Data Visualisation

If you’re a Data Scientist there’s no doubt that you’ve worked with scatter plots before. Despite their simplicity, scatter plots are a powerful tool for visualising data. There are a lot of options, flexibility, and representational power that come with the simple change of a few parameters like color, size, shape, and regression plotting. Here you’ll … Read more

Making Music with Machine Learning

Image from https://www.maxpixel.net/Circle-Structure-Music-Points-Clef-Pattern-Heart-1790837 Music is not just an art, music is an expression of the human condition. When an artist is making a song you can often hear the emotions, experiences, and energy they have in that moment. Music connects people all over the world and is shared across cultures. So there is no way … Read more

Exploration of the Social News TV: The Communication Behavior of #ajnewsgrid

What is NewsGrid? NewsGrid is a young news program broadcast globally by Al Jazeera since 2016. It is Al Jazeera’s first interactive news hour. The show is produced in three parts: top stories of the day presented by one presenter, stories that create a huge social reaction on Twitter presented by a social media presenter, and the … Read more

Hey, Who Moved the Goalposts?

Part of 10 reasons why Software Development projects fail series The most successful software development projects have a timeline and a series of milestones to accomplish that project within a set period of time. Those milestones are critical, because they help to divide a large project into a series of much smaller projects, and they … Read more

Binary Tree: The Diameter.

Dynamic programming sequences sub-problems together, having each sub-problem lead to the solution. With dynamic programming we no longer have to visit paths we’ve been down before; instead we can prune the shorter branches and track the diameter at each step. With the dynamic approach the algorithm travels down the tree and counts lengths on the … Read more
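The idea of tracking the diameter at each step while keeping only the longer branch can be sketched as follows. This is a common single-pass formulation and may differ from the article's implementation; the `Node` class and `diameter` function are hypothetical names, and the diameter is measured here in edges (an assumption, since some definitions count nodes).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    value: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def diameter(root: Optional[Node]) -> int:
    """Return the number of edges on the longest path between two nodes."""
    best = 0

    def height(node: Optional[Node]) -> int:
        # Height counted in nodes: 0 for an empty subtree.
        nonlocal best
        if node is None:
            return 0
        left = height(node.left)
        right = height(node.right)
        # The longest path through this node joins its two deepest branches.
        best = max(best, left + right)
        # Only the longer branch is passed up; the shorter one is dropped.
        return 1 + max(left, right)

    height(root)
    return best


# Small example tree:      1
#                         / \
#                        2   3
#                       / \
#                      4   5
tree = Node(1, Node(2, Node(4), Node(5)), Node(3))
print(diameter(tree))
```

Each node is visited once, so the running time is linear in the number of nodes. In the example the longest path runs from node 4 (or 5) through 2 and 1 to 3.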

Revisiting Adam Smith’s Invisible Hand in the Data Economy

Fundamental paradigms of the free market should also be scrutinized by data science An unobservable market force that helps the demand and supply of goods in a free market to reach equilibrium automatically and efficiently is what we call the invisible hand. But I am a data scientist; I don’t deal in unobservable forces, no observations … Read more

Set Theory — Cardinality & Power Sets

With basic notation & operations cleared in articles one & two in this series, we’ve now built a fundamental understanding of Set Theory. This third article further compounds this knowledge by zoning in on the most important property of any given set: the total number of unique elements it contains. Also known as the cardinality, … Read more
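Cardinality and power sets are easy to explore with Python's built-in `set` type. The sketch below is an illustration rather than code from the article; `power_set` is a hypothetical helper built on `itertools.combinations`, and it checks the standard result that a finite set with n elements has 2^n subsets.

```python
from itertools import chain, combinations


def power_set(elements):
    """Return every subset of `elements` as a list of frozensets."""
    items = list(elements)
    # Collect the subsets of each size r, from 0 (empty set) up to len(items).
    all_combos = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return [frozenset(combo) for combo in all_combos]


# Duplicates collapse, so cardinality counts unique elements only.
letters = set("abca")
cardinality = len(letters)

subsets = power_set(letters)
print(cardinality, len(subsets))
```

Here `letters` has cardinality 3 because the repeated "a" is counted once, and its power set contains 2 ** 3 = 8 subsets, including the empty set and the set itself.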

Fast, static D3 maps built with Turf.js and the command-line

Combining Mike Bostock’s command-line cartography tutorial with the flexibility of Node.js Estimated percent of undocumented residents in U.S. metro areas. Source: Pew Research Center Recently, I needed to build a handful of U.S. state bubble maps to be embedded in a story for San Antonio Express-News. I wanted to use D3 but was concerned about slow asset … Read more

Introduction to Unsupervised Learning

Understand principal component analysis (PCA) and clustering methods Photo by Oscar Keys on Unsplash Unsupervised learning is a set of statistical tools for scenarios in which there is only a set of features and no targets. Therefore, we cannot make predictions, since there are no associated responses to each observation. Instead, we are interested in finding … Read more

The Basics of Cryptography

With Applications in R Source Have you ever wondered how companies securely store your passwords? Or how your credit card information is kept private when making online purchases? The answer is cryptography. The vast majority of internet sites now use some form of cryptography to ensure the privacy of their users. Even information such as emails … Read more

Assessing NHL award winners using K-means

Data sets The final data set used is a combination of traditional and advanced player metrics. Traditional statistics concern metrics like goals and assists (total being known as points), plus-minus, penalty minutes and time on ice, whilst advanced player metrics deal more with player behavior and puck possession. Using Python’s beautifulsoup library, I scraped more traditional … Read more

Reinforcement Learning and Deep Reinforcement Learning with Tic Tac Toe

In this article I want to share my project on implementing reinforcement learning and deep reinforcement learning methods on a Tic Tac Toe game. The article contains: 1. Rigorous definition of the game as a Markov decision process. 2. How to implement the reinforcement learning method, called TD(0), to create an agent that plays the … Read more

On Canonical Companies

In Information Technology and tech startups, we talk about “systems of record”. A system of record is the source for a particular piece of data that may also exist in other systems. That “system of record” is the ultimate source of truth. It’s the canonical record of data. We often talk about these systems as the place … Read more

H2O for Inexperienced Users

Some background: I am a senior in high school, and in the summer of 2018 I interned at H2O.ai. With no ML experience beyond Andrew Ng’s Introduction to Machine Learning course on Coursera and a couple of his deep learning courses, I initially found myself slightly overwhelmed by the variety of new algorithms H2O has to offer … Read more

Software 2.0 — Playing with Neural Networks (Part 1)

In this article we are going to discuss neural networks (from scratch), the innovative concept that has taken the world by storm. I will assume that the reader is already familiar with the following concepts: Cost function (MSE and Cross Entropy) Gradient Descent Logistic regression Activation Function Binary Classification Particularly, this article will try … Read more

Sentiment Analysis with Word Bags and Word Sequences

For generic text, word bag approaches are very efficient at text classification. For a binary text classification task studied here, LSTM working with word sequences is on par in quality with SVM using tf-idf vectors. But performance is a different matter… The bag-of-words approach to turning documents into numerical vectors ignores the sequence of words … Read more

Data Visualization in Music

Last fall I went to an Edward Tufte lecture where he began with a very effective, very sweet video of music visualized in a sequence: Tufte knows this is a great and charming start to a lecture. He knows it provides a welcome change from the outside world; an elegant fusion of music, color, and … Read more