Soft Skills Will Make or Break You as a Data Scientist

As businesses gather an increasing amount of data on various aspects of their organisation (e.g. internal business operations, customer purchases and behaviour), the demand for data-savvy employees has exploded over the last 5 years. Business leaders have woken up to the fact that data-driven decision-making can lead to better decisions (it is not … Read more

The most important idea in statistics

What comes to mind when you think of the discipline of statistics? Populations, samples, and hypotheses? Or perhaps you took a course that emphasized probabilities, distributions, p-values, and confidence intervals? All of these are pieces of the puzzle, but they’re downstream from the core. The real start of everything — the springboard that launches the whole tangle — is … Read more

Use Pyspark with a Jupyter Notebook in an AWS EMR cluster

Jupyter Notebook is an incredible tool for learning and troubleshooting code. This post shows how to take advantage of this powerful tool as you learn Spark! Spark is helpful if you’re doing anything computationally intensive that can be parallelized. Check out this Quora question for more information. This blog will be about … Read more

Machine Learning — Perfection always starts with mistakes

Mistake Types. The most common types of mistakes in the ML pipeline relate to one or more of the following areas. Data Preparation: Data Cleansing. It’s axiomatic to say that ‘dirty data’ is one of the biggest barriers Data Scientists face, and Data Cleansing is the most time-consuming part of an ML project, taking 60% … Read more

Capsule Networks: The New Deep Learning Network

A guide and introduction to understanding and using Capsule Networks. Convolutional Networks have been hugely successful in the field of deep learning and they are the primary reason why deep learning is so popular right now! They seem to work well enough, but they have drawbacks in their basic architecture, causing them to not work … Read more

Making computers understand the sentiment of tweets

Understanding whether a tweet is meant as positive or negative is something humans rarely have problems with. For computers, however, it is an entirely different story — complicated sentence structure, sarcasm, figurative language etc. make it difficult for computers to judge the meaning and sentiment of a sentence. However, automatically assessing the sentiment of a tweet would … Read more

5 Ways Artificial Intelligence and Chatbots Are Changing Education

Artificial Intelligence (AI) and Chatbots are changing the world in more ways than we can ever imagine. Completing a diversified range of tasks, AI and Chatbots have become a normal element in our everyday life. The technology has played an important role in the development of varied fields including education and online tutoring. Through artificial intelligence, … Read more

Review: G-RMI — Winner in 2016 COCO Detection (Object Detection)

A Guide to Select a Detection Architecture: Faster R-CNN, R-FCN and SSD. This time, G-RMI (Google Research and Machine Intelligence), which won 1st place in the 2016 MS COCO detection challenge, is reviewed. G-RMI is the name of the team that entered the challenge, not the name of a proposed approach. Because they do not have any … Read more

Brexit: The Uncivil War showed us how the EU Referendum was won with Data Science

Ash New, Jan 11. Something that wasn’t discussed as much as it should have been after the release of Channel 4 / HBO’s Brexit: The Uncivil War last week was how the extensive use of data science was the main reason that the Leave campaign won against all the odds. In this article, I discuss how … Read more

Data Cleaning, Detection and Imputation of Missing Values in R Markdown, Part 2

Data cleaning and transforming variables in R using Australian Tennis Open data on the Men’s tour from 2000 to 2018. Photo by The Creative Exchange on Unsplash Today is Day 4 and marks the continuation of my #29dayproject bootcamp of data science and things I have learnt in data science. This is a tutorial guide of … Read more

Entropy: How Decision Trees Make Decisions

The simple logic and math behind a very effective machine learning algorithm. Sam, Jan 10. You’re a Data Scientist in training. You’ve come a long way from writing your first line of Python or R code. You know your way around Scikit-Learn like the back of your hand. You spend more time on Kaggle than … Read more

Trust Region and Proximal Policy Optimization

Photo from DeepMind. Welcome to another journey towards unraveling the secrets behind Reinforcement Learning. This time, we are going to take a step back and return to policy optimization in order to introduce two new methods: trust region policy optimization (TRPO) and proximal policy optimization (PPO). Remember that in policy gradient techniques, we try to optimize a … Read more

A Non-Technical Reading List for Data Science

The main idea of How Not to Be Wrong is similar, as Ellenberg takes us through stories showing both the use and misuse of statistical concepts like linear regression, inference, Bayesian inference, and probability. Applying the laws of probability shows us that playing the lottery is always a losing proposition, except in the rare cases where … Read more

Optimizing Neural Networks — Where to Start?

Setting up the Environment. We’ll use Google Colab for this project, so most of the libraries are already installed. Since we’ll train neural networks, it’s important to use a GPU to speed up training. To enable the GPU, just go to “Runtime” in the dropdown menu and select “Change runtime type”. You can then verify by hovering … Read more

HyperNEAT: Powerful, Indirect Neural Network Evolution

HyperNEAT: Powerful, Indirect Neural Network Evolution Expanding NeuroEvolution Last week, I wrote an article about NEAT (NeuroEvolution of Augmenting Topologies) and we discussed a lot of the cool things that surrounded the algorithm. We also briefly touched upon how this older algorithm might even impact how we approach network building today, alluding to the fact … Read more

Stock Market Prediction by Recurrent Neural Network on LSTM Model

Forecasting stock prices has long been a difficult task for researchers and analysts. In fact, investors are highly interested in the research area of stock price prediction. For a good and successful investment, many investors are keen on knowing the future situation of the stock market. Good and effective prediction … Read more

Cold Start Energy Predictions

About three months ago, I participated in “Power Laws: Cold Start Energy Forecasting”, a competition organized by Schneider Electric on the DrivenData platform. The aim was to predict the electricity consumption of several buildings based on their previous consumption and additional factors such as temperatures, holiday information, etc. An interesting part of the challenge was a … Read more

Managing geographical data: ISO3166, UN/LOCODE and GeoNames

Recently I needed to define a data model for handling geographical data at the international level, i.e., how to properly manage data about postal addresses when your potential address is any place around the globe. Here are the challenges and the outcome of the alternatives I found when dealing with incorporating geographical data into … Read more

Whisked Away by BigQuery ML

Photo credit: Phil Goerdt. Predicting Whiskey Preferences in GCP. “I’m looking for… …anything that is peaty, smoky like a campfire on the beach, medium to high alcohol, dried fruits, citrus, and has a subtle hint of terra firma.” It’s easy for me to walk into my local liquor store and talk tasting notes (and joke) with the whiskey … Read more

Practical NumPy — Understanding Python library through its functions

Before embarking on the journey of data science and machine learning, it is very important to learn a few Python libraries that are ubiquitous in the world of data science, like NumPy, Pandas and Matplotlib. NumPy is one such powerful library for array processing, along with a large collection of high-level mathematical functions to operate … Read more

Logical Positivism and the Scientific Method in Genetic Algorithmics

The genetic algorithm owes its form to biomimicry, not derivation from first principles. So, unlike the workings of conventional optimization algorithms, which are typically apparent from the underlying mathematical derivations, the workings of the genetic algorithm require elucidation. Attempts to explain how genetic algorithms work can … Read more

Using the latest advancements in deep learning to predict stock price movements

2. The Data We need to understand what affects whether GS’s stock price will move up or down. It is what people as a whole think. Hence, we need to incorporate as much information (depicting the stock from different aspects and angles) as possible. (We will use daily data — 1,585 days to train the various algorithms (70% … Read more

Having Fun with TextBlob

A Python library for processing textual data, NLP framework, sentiment analysis. As an NLP library for Python, TextBlob has been around for a while. After hearing many good things about it, such as part-of-speech tagging and sentiment analysis, I decided to give it a try. This is the first time I am using TextBlob … Read more

Learn Enough Docker to be Useful

Part 1: The Conceptual Landscape Containers are hugely helpful for improving security, reproducibility, and scalability in software development and data science. Their rise is one of the most important trends in technology today. Docker is a platform to develop, deploy, and run applications inside containers. Docker is essentially synonymous with containerization. If you’re a current … Read more

Viral Fashion, Networks & Statistical Physics

Simulating why Germans gave up ties with an interacting agent approach. “All models are wrong, but some are useful” (George Box). When German Chancellor Angela Merkel met with Dieter Zetsche, the CEO of German carmaker Daimler and Mercedes-Benz, to lay the foundation stone of a new factory in May 2017, some in the German business world were not … Read more

EdTech & Algorithmic Transparency

The recent news surrounding the investigation of Florida high-school student Kamilah Campbell’s SAT score, which was flagged by The College Board for possible cheating, offers an interesting perspective into issues surrounding algorithmic transparency in the growing EdTech sector. While it’s likely that only a portion of the process used to flag the test was automated, … Read more

Machine Learning Security — A Growing Societal Problem

As more and more systems leverage ML models in their decision-making processes, it will become increasingly important to consider how malicious actors might exploit these models, and how to design defenses against those attacks. The purpose of this post is … Read more

Qrash Course: Deep Q Networks from the Ground Up in 10 Minutes

This article assumes no prior knowledge of Reinforcement Learning, but it does assume some basic understanding of neural networks. Of all the different fields of Machine Learning, the one that fascinates me the most is Reinforcement Learning. For those who are less familiar with it: while Supervised Learning deals with predicting values or classes based … Read more

An Introduction to R: Merging and filtering data, Part 1

Data understanding by filtering and merging the 2019 Australian Tennis Open data for the Men’s tour. Photo by Christopher Burns on Unsplash. You know it’s summer when the Australian Tennis Open visits Melbourne and everyone is excited that Roger and Serena are in town. Problem: I am interested in predicting who might win the 2019 Australian … Read more

Will my Customer Come Back: Playing with CLV

I’ve been tinkering with customer lifetime value modeling the past few days since the Olist dataset in Kaggle went up. In particular, I wanted to explore the tried and tested probabilistic models, BG/NBD and GammaGamma to forecast future purchases and profits. I also wanted to see if the machine learning approach could do well — simply predicting … Read more

Data Science in International Development. Part I: Working with Text

Co-written by Kelsey Barton-Henry. Today, headlines are filled with claims about the power of Artificial Intelligence (AI) to do things only humans could do before: recognizing objects in images, responding to voice queries, or interpreting complex text instances, to mention a few. But how do AI applications work? What are the … Read more

The Golden AI Glacier: Rethinking Roger’s Bell Curve for Healthcare

“One reason why there is so much interest in the diffusion of innovations is because getting a new idea adopted, even when it has obvious advantages, is often very difficult,” said Everett Rogers, ostensibly the pioneer on the topic, in the introduction to the 3rd edition of his seminal work, Diffusion of Innovations, published in 1983 … Read more

Animating the Traveling Salesman Problem

Lessons Learned From Animating Models Animation can be a powerful tool. It is one thing to explain a complex topic in words or even in pictures, but visuals in motion have an amazing quality to bring abstract ideas to life. This can be especially helpful in complex areas of computer science like optimization and machine … Read more

Predicting Russian Trolls Using Reddit Comments

Using Machine Learning to Predict Russian Trolls. Introduction. Russia has long maintained a contentious relationship with countries in the west. Vladimir Putin, the Russian President, has long been known as a Russian nationalist who will do anything to advance the interests of his country (Marten 2018). This has … Read more

ONNX.js: Universal Deep Learning Models in The Browser

An Introduction to The Universal Open Standard Deep Learning Format and Using It In The Browser (ONNX/ONNX.js). Photo by Franck V. on Unsplash. Running deep learning models in the client-side browser is not something new. In early 2018, Google released TensorFlow.js. It is an open-source library that is used to define, train, and run machine learning (ML) … Read more

Soft Actor-Critic Demystified

An intuitive explanation of the theory and a PyTorch implementation guide. Soft Actor-Critic, the new Reinforcement Learning algorithm from the folks at OpenAI and UC Berkeley, has been making a lot of noise recently. The algorithm not only boasts of being more sample-efficient than traditional RL algorithms but also promises to be robust … Read more

A Complete View of Decision Trees and SVM in Machine Learning

Tree-based Methods. Tree-based methods have been favorite techniques in many industries, with proven successful cases for prediction. These methods are considered non-parametric, making no assumptions about the distribution of the data or the structure of the true model. They require less data cleaning and are, to a fair extent, not influenced by outliers and multicollinearity. The … Read more

Deep learning from a programmer’s perspective (aka Differentiable Programming)

Or why neural networks are not-so-neural anymore. The main lesson from 2018: deep learning is “cool”. One of the main reasons is that the basic problems faced by DL are general enough to be of interest to a huge number of disciplines, from computer vision to neural machine translation to voice interfaces. More importantly, DL … Read more

Are Our Thoughts Really Dot Products?

The Mathematician, Philosopher, and Number Religion leader Pythagoras How AI Research Revived Pythagoreanism and Confused Science with Philosophy Recently, I wrote an article about how deep learning might be hitting its limitations and posed the possibility of another AI winter. I closed that article with a question about whether AI’s limitations are defined just as … Read more

How to A/B test without spending a dime

Get statistically significant results without paying for a testing platform. Lisa Xu, Jan 8. A/B testing is an integral part of the product development process and is used by everyone from growth marketers to designers. However, not everyone has a proper A/B testing platform. Maybe you can’t afford a system that can cost up to $100,000/yr or … Read more

Implementing a Profitable Promotional Strategy for Starbucks with Machine Learning (Part 1)

V. Data Preprocessing: Generating Monthly Data. In order to transform the datasets into something useful, we will have to perform a substantial amount of data cleaning and pre-processing. At the end of this section, we will generate a dataset that looks like this: Snapshot of Monthly Data After Data Preprocessing. The primary task will be to identify … Read more

Sound UX: Sound Representation of Machine Learning Estimation on Image and Temperature Data by…

The purpose of this research is to represent a captured image and temperature data through sonification. Sonification is the technique of translating various kinds of data into sound, and it is used for accessibility, media art and interaction design. We propose a system that generates sound from minimum distances between moving objects and path prediction by … Read more

How to give an effective presentation at a Meetup or Conference

Essential skills to help you prepare for your first presentation at a Meetup or Conference. Photo by Teemu Paananen on Unsplash. We learn about the technical side of data science and spend weeks learning to code and exploring linear regression, logistic regression, PCA, clustering, ridge regression, lasso, decision trees and random forests. However, communication skills are also … Read more