Transfer Learning using ELMO Embedding

Last year, the major developments in “Natural Language Processing” were about Transfer Learning. Basically, Transfer Learning is the process of training a model on a large-scale dataset and then using that pre-trained model to learn another target task. Transfer Learning became popular in the field of NLP thanks to the state-of-the-art performance of … Read more

Model-Free Prediction: Reinforcement Learning

Part 4: Model-Free Predictions with Monte-Carlo Learning, Temporal-Difference Learning and TD(λ) Previously, we looked at planning by dynamic programming to solve a known MDP. In this post, we will use model-free prediction to estimate the value function of an unknown MDP, i.e., we will look at policy evaluation of an unknown MDP. This series of … Read more
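The excerpt names Monte-Carlo and Temporal-Difference learning without showing either, so here is a minimal sketch of the two update rules for policy evaluation from sampled episodes. The two-state chain, the reward layout, the step size `alpha`, and the function names are all illustrative assumptions, not taken from the article. TD(λ) is not covered.

```python
def mc_update(values, episode, alpha=0.1, gamma=1.0):
    """Every-visit Monte-Carlo: move V(s) toward the full observed return."""
    g = 0.0
    # Walk the episode backwards so the return accumulates from the end.
    for state, reward in reversed(episode):
        g = reward + gamma * g
        values[state] += alpha * (g - values[state])


def td0_update(values, episode, alpha=0.1, gamma=1.0):
    """TD(0): move V(s) toward reward + gamma * V(next state)."""
    for i, (state, reward) in enumerate(episode):
        # The value after the final step (terminal state) is taken as 0.
        is_last = i + 1 == len(episode)
        next_value = 0.0 if is_last else values[episode[i + 1][0]]
        values[state] += alpha * (reward + gamma * next_value - values[state])


# Hypothetical episode: A -> B -> terminal, with reward 1 on leaving B.
# Each entry is (state, reward received when leaving that state).
episode = [("A", 0.0), ("B", 1.0)]

v_mc = {"A": 0.0, "B": 0.0}
v_td = {"A": 0.0, "B": 0.0}
for _ in range(500):
    mc_update(v_mc, episode)
    td0_update(v_td, episode)
```

Neither function needs the transition probabilities or the reward function of the MDP: both learn from sampled experience alone. Monte-Carlo waits for the complete return, while TD(0) bootstraps from the current estimate of the next state.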

Matplotlib Tutorial: Learn basics of Python’s powerful Plotting library

What is Matplotlib To make statistical inferences, it becomes necessary to visualize your data, and Matplotlib is one such solution for Python users. It is a very powerful plotting library useful for those working with Python and NumPy. The most used module of Matplotlib is Pyplot, which provides an interface like MATLAB but … Read more

Introduction to TWO approaches of Content-based Recommendation System

A complete guide to resolve the confusion Content-based filtering is one of the common methods in building recommendation systems. While I tried to do some research in understanding the detail, it is interesting to see that there are 2 approaches that claim to be “Content-based”. Below I will share my findings and hope it can … Read more

Machine Learning and Particle Motion in Liquids: An Elegant Link

The gradient descent algorithm is one of the most popular optimization techniques in machine learning. It comes in three flavors: batch or “vanilla” gradient descent (GD), stochastic gradient descent (SGD), and mini-batch gradient descent, which differ in the amount of data used to compute the gradient of the loss function at each iteration. The goal … Read more
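The difference between the three flavors comes down to one parameter, as the sketch below tries to show. It is a minimal illustration, assuming a one-parameter least-squares fit on toy data. The function name `fit_slope`, the learning rate, and the data are hypothetical and not from the article.

```python
import random


def fit_slope(xs, ys, batch_size, lr=0.01, epochs=200, seed=0):
    """Fit y ~ w * x by minimizing mean squared error.

    batch_size == len(xs) -> batch ("vanilla") gradient descent
    batch_size == 1       -> stochastic gradient descent
    anything in between   -> mini-batch gradient descent
    """
    rng = random.Random(seed)
    indices = list(range(len(xs)))
    w = 0.0
    for _ in range(epochs):
        rng.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradient of the mean squared error over the current batch only.
            grad = sum(2 * (w * xs[i] - ys[i]) * xs[i] for i in batch) / len(batch)
            w -= lr * grad
    return w


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # toy data generated with a true slope of 2

w_batch = fit_slope(xs, ys, batch_size=len(xs))  # all samples per step
w_sgd = fit_slope(xs, ys, batch_size=1)          # one sample per step
w_mini = fit_slope(xs, ys, batch_size=2)         # two samples per step
```

On this noise-free toy data all three variants recover a slope close to 2. On real data, smaller batches give noisier gradient estimates but cheaper steps.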

Three steps for a successful machine learning project

Less technical considerations to make for all ML projects As people and companies venture into machine learning (ML), it is common for some to expect to dive right into building models and generating useful output. And while some parts of ML feel like this technical wizardry with magical predictions, there are other aspects that are less … Read more

Contextual Embeddings for NLP Sequence Labeling

Text representation (aka text embeddings) is a breakthrough in solving NLP tasks. In the beginning, a single word vector represented a word even though the word can carry different meanings depending on context. For example, “Washington” can be a location, a name, or a state. “University of Washington” Zalando released an amazing NLP library, flair, which makes our lives easier. It already implements … Read more

Espresso Filters: An Analysis

Data Analysis 1. Hole Diameter First, we will look at hole size per filter. Originally, hole size was calculated by determining the area of pixels above threshold for each hole and determining the diameter. However, the results did not show the detail I was looking for because it was based on whole pixels. This is a … Read more

Deep Learning with Satellite Data

“The rockets and the satellites, spaceships that we’re creating now, we’re pollinating the universe.” -Neil Young Overview — Satellite Data — Data Collection — Model — Results Overview While at the University of Sannio in Benevento, Italy this January, my friend Tuomas Oikarinen and I created a (semi-automated) pipeline for downloading publicly available images, and trained a 3-D Convolutional Neural Network on … Read more

Making Programming Easier with Keyboard Macros — Video

A recent video from Linus Tech Tips introduced how one of their editors uses macros for video editing. This got me thinking: can macros be easily created to improve my programming? This video demonstrates how creating code macros can be achieved and how useful it can be: Background Source: Linus Tech Tips — Can your Keyboard do … Read more

Unsupervised Feature Learning

Deep Convolutional Networks on Image tasks take in Image Matrices of the form (height x width x channels) and process them into low-dimensional features through a series of parametric functions. Supervised and Unsupervised Learning tasks both aim to learn a semantically meaningful representation of features from raw data. Training Deep Supervised Learning models requires a … Read more

Predicting Kickstarter Campaign Success with Gradient Boosted Decision Trees: A Machine Learning…

Fitting the models, evaluating performance, choosing a final model, and predicting on a new (totally real) campaign Another common thing in the data science workflow is trying out multiple models. There are ways to minimize the effort in this stage based on what you want to accomplish or what the dataset is/what the problem is (you … Read more

Best practices in Ads Search

Big Data is the process of collecting and analyzing large amounts of information. The complexity and large volume of data that our society currently generates has made it impossible to capture, manage, process or analyse with the technologies we know so far. Big Data embraces five features: volume (manages terabytes or petabytes of information), variety … Read more

Modeling cumulative impact — Part I

Create simple features of cumulative impact, predict sports performance with the fitness-fatigue model “Little by little, a little becomes a lot.” -Tanzanian proverb Welcome to Modeling cumulative impact, a series that views the cumulative impact of athletic training on sports performance through a variety of modeling lenses. The journey starts here in Part I with … Read more

WTF is image classification?

Conquering convolutional neural networks for the curious and confused Photo by Micheile Henderson on Unsplash “One thing that struck me early is that you don’t put into a photograph what’s going to come out. Or, vice versa, what comes out is not what you put in.” ― Diane Arbus A notification pops up on your favorite social … Read more

Review: DCN — Deformable Convolutional Networks, 2nd Runner Up in 2017 COCO Detection (Object…

With Deformable Convolution, Improved Faster R-CNN and R-FCN, Got 2nd Runner Up in COCO Detection & 3rd Runner Up in COCO Segmentation. After reviewing STN, this time, DCN (Deformable Convolutional Networks), by Microsoft Research Asia (MSRA), is reviewed. (a) Conventional Convolution, (b) Deformable Convolution, (c) Special Case of Deformable Convolution with Scaling, (d) Special Case … Read more

Statistics is the Grammar of Data Science — Part 3/5

Moments Moments describe various aspects of the nature and shape of our distribution. #1 — The first moment is the mean of the data, which describes the location of the distribution. #2 — The second moment is the variance, which describes the spread of the distribution. A higher variance means the data are more spread out than a lower one. #3 — The third moment is … Read more
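As a small illustration of the first two moments, the sketch below compares two made-up samples that share the same mean but differ in spread. The sample values and variable names are assumptions chosen for the example. It uses the population variance from Python's standard `statistics` module.

```python
from statistics import fmean, pvariance

# Two hypothetical samples with the same location but different spread.
narrow = [9.0, 10.0, 10.0, 11.0]
wide = [0.0, 5.0, 15.0, 20.0]

# First moment: the mean (location of the distribution).
mean_narrow = fmean(narrow)
mean_wide = fmean(wide)

# Second (central) moment: the variance (spread around the mean).
var_narrow = pvariance(narrow)
var_wide = pvariance(wide)

print(f"narrow: mean={mean_narrow}, variance={var_narrow}")
print(f"wide:   mean={mean_wide}, variance={var_wide}")
```

Both samples are centered on 10, yet the second has a far larger variance, which is exactly what "more spread out" means here.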

Comparing Different Classification Machine Learning Models for an imbalanced dataset

A data set is called imbalanced if it contains many more samples from one class than from the rest of the classes. Data sets are unbalanced when at least one class is represented by only a small number of training examples (called the minority class) while other classes make up the majority. In this scenario, … Read more

ML Algorithms: One SD (σ)- Instance-based Algorithms

An intro to machine learning instance-based algorithms The obvious questions to ask when facing a wide variety of machine learning algorithms are “which algorithm is better for a specific task, and which one should I use?” The answers to these questions vary depending on several factors, including: (1) The size, quality, and nature of data; (2) The … Read more

The Data Driven Partier: Movie Mustache

The concept behind ‘Movie Mustache’ is simple, but revolutionary. Watch a movie with friends but with a mustache or two on the TV — whenever the mustache lines up with a character’s upper lip, everyone drinks. This game was foreign to me until a few weeks ago when I got to experience it watching the Adam Sandler … Read more

Comparing Python Virtual Environment tools

Thanks to Keith Smith, Alexander Mohr, Victor Kirillov and Alain SPAITE for recommending pew, venv and pipenv. I just love the community that we have on Medium. I recently published an article on using Virtual Environments for Python projects. The article was well received and the feedback from readers opened a new view for me. … Read more

Data Science and Agile

Suggested frameworks for effectiveness (Part 2 of 2) This is the second post in a 2-part sharing on Data Science and Agile. In the last post, we discussed the aspects of Agile that work, and don’t work, in the data science process. You can find the previous post here. A quick recap of what works well … Read more

PyViz: Simplifying the Data Visualisation process in Python.

Exploring Data with PyViz In this section, we will see how different libraries are effective in bringing out different insights from data and how using them together can really help to analyse data in a better way. Dataset The dataset being used pertains to the number of cases of measles and pertussis recorded per 100,000 people over time … Read more

4 Machine Learning Techniques with Python

Machine Learning Techniques vs Algorithms While this tutorial is dedicated to Machine Learning techniques with Python, we will move over to algorithms pretty soon. But before we can begin focussing on techniques and algorithms, let’s find out if they’re the same thing. A technique is a way of solving … Read more

Using Image Data to Determine Text Structure

Painting by Patrick Henry Bruce Dotting the i’s and following the lines In my previous article, I discussed how to implement fairly simple image processing techniques in order to detect blobs of text in an image. Realistically, that algorithm did little more than find high contrasting pixel regions in an image. Yet, the simple procedure still laid … Read more

How To Train Your Artificial Intelligence: The Hidden Code

How machine learning and Power Rangers will make us the dragon-riders of the Silicon Age Admittedly, artificial intelligence isn’t quite as cool as dragons. On the flip side, you’re far more likely to have encountered some form of artificial intelligence in your life than you have a dragon — or maybe you have, who knows? I don’t know your … Read more

Doing meaningful work with Machine Learning — Classify Disaster Messages

Build models to help disaster organizations save people’s lives. I’m writing this post at 1am in Bucharest, Romania. Hello there again! Welcome to my fourth piece of content about Machine Learning. I’ve recently done a project that I believe to be socially meaningful. I’ll give a brief overview of what this is all about and I’ll dive … Read more

Live Object Detection

Object Detection As said above, the example notebook can be reused for our new application. This is because the main part of the notebook is importing the needed libraries, downloading the model and specifying useful helper code. The only section we need to modify is the detection section, which comprises the last three cells … Read more

Reinforcement Learning: From Grid World to Self-Driving Cars

0. Agents, Environments, and Rewards Underlying many of the major announcements from researchers in Artificial Intelligence in the last few years is a discipline known as reinforcement learning (RL). Recent breakthroughs are mostly driven by minor twists on classic RL ideas, enabled by the availability of powerful computing hardware and software that leverages said hardware. … Read more

Machine Learning Versus The News

PART TWO: A SOLUTION So, what to do? Mathematically, we may be tempted to think that to know the truth in its unvarnished and untarnished essence, we must read every article that covers the events of the story. Somehow we would then average away all the noise and be left with a well-informed and unbiased view … Read more

Supervised Learning: Basics of Classification and Main Algorithms

Introduction As stated in the first article of this series, Classification is a subcategory of supervised learning where the goal is to predict the categorical class labels (discrete, unordered values, group membership) of new instances based on past observations. There are two main types of classification problems: Binary classification: The typical example is e-mail spam … Read more

What’s your soccer team’s nemesis?

Is Barcelona really Real Madrid’s toughest opponent? Historical data paint an interesting story. Image from unsplash.com Real Madrid vs Barcelona. Manchester United vs Liverpool. Inter vs Milan. Olympique Lyonnais vs Olympique de Marseille. Chelsea vs everybody. European soccer is filled with some amazing rivalries. These rivalries got created and evolved over time for reasons on … Read more

Keras challenges the Avengers

Sentiment Analysis, also called Opinion Mining, is a useful tool within natural language processing that allows us to identify, quantify, and study subjective information. Because quintillions of bytes of data are produced every day, this technique gives us the possibility to extract attributes of this data such as negative or positive … Read more

Creation of Sentence Embeddings Based on Topical Word Representations

An approach towards universal language understanding I have been researching word and sentence embeddings for over a year now and recently also wrote my master’s thesis [1] in this area. The results which I am presenting now were also published here and resulted in cooperation with SAP and the University of Liechtenstein. In the following … Read more

What I Learned from Writing a Data Science Article Every Week for a Year

3. Consistency is the critical factor The 98 articles I published in 2018 totaled 264,894 words. For every word published, there was at least 1 word that didn’t make it through editing. This works out to about 530,000 words or 1,500 words per day. The only way this was possible while studying and working full-time was to … Read more

AI Problems are Human Problems

I have no illusions about the nature of wide-scale problem solving throughout the course of history. Rarely are sweeping changes noticed, worked on, and introduced to the populace by genius technocrats. Instead, magical innovations are often the synthesis of seemingly disparate ideas; cultural shifts do not occur due to governmental policy, but rather due to … Read more

Interactive Data Visualization with Python Using Bokeh

Simple and basic go-through example Recently I came across this library, learned a little about it, tried it, of course, and decided to share my thoughts. From the official website: “Bokeh is an interactive visualization library that targets modern web browsers for presentation. Its goal is to provide elegant, concise construction of versatile graphics, and to … Read more

Reinforcement Learning with Hindsight Experience Replay

Sparse and Binary Rewards Reinforcement learning has gained a lot of popularity in recent years due to some spectacular successes such as defeating the Go world champion and (very recently) winning matches against top professionals in the popular real-time strategy game StarCraft 2. One of the impressive aspects of achievements such as that of AlphaZero (the … Read more

How Big is Big Data?

We have entered the Age of Data for good. Everything we do online and even offline leaves traces in data — from cookies to our social media profiles. So how much data is there really? How much data do we process on a daily basis? Welcome to the Zettabyte Era. IBM Summit supercomputer Data … Read more

Thinking Of Switching Careers To A Developer?

I Have The Answers. But How? I know what you’re wondering: how do I even have the answers? Well, I could say from experience, but as an aspiring data scientist, I want to demonstrate how data science can make any decision-making process easier and help ensure you make the correct decision. I’ll be using data from the 2018 … Read more

Unmaking Graphs

This is how things usually go when I first create any graph: Imagine I just got my hands on a juicy new dataset and I’m doing some exploratory data analysis — hunched over the keyboard with a magnifying glass looking for correlations and analyzing clues. I decide to conjure up some graphs to visualize the data because … Read more

The Unsung Heroes of Modern Software Development

Open Source Foundation Leaders I’ll highlight six open source foundations that are key to many important projects. For each foundation I’ll give a brief bio, provide the number of projects being supported as of early 2019, and highlight some well-known projects. Note that these groups fall under various IRS classifications for charitable and trade organizations — not … Read more

Introducing the AI Project Canvas

AI Project Canvas Imagine the following scenario: You have a brilliant idea for a new AI project. To make it happen, you need to convince management to fund your idea. You need to pitch your AI project idea to stakeholders and management. Yuck. This is the first step where the AI Project Canvas comes into play. … Read more

The Grass Really is Greener on the Other Side: Buying Local and its Shortcomings

Evidence-Based Policy is Bigger than You or Your Feelings — Part II Just because your vegetables travel thousands of kilometers to your kitchen table doesn’t mean they can’t be better for the environment than produce from your local farmer’s market. There. I’ve said it. As unpopular opinions go, this one is somewhere between ‘pineapple on pizza’ and ‘healthcare … Read more

ML Algorithms: One SD (σ)

The obvious questions to ask when facing a wide variety of machine learning algorithms are “which algorithm is better for a specific task, and which one should I use?” The answers to these questions vary depending on several factors, including: (1) The size, quality, and nature of data; (2) The available computational time; (3) The urgency of … Read more