Building and Testing Recommender Systems With Surprise, Step-By-Step

Learn how to build your own recommendation engine with the help of Python and the Surprise library: Collaborative Filtering. Recommender systems are one of the most commonly used and easily understandable applications of data science. Lots of work has been done on this topic, yet the interest and demand in this area remain very high because of … Read more Building and Testing Recommender Systems With Surprise, Step-By-Step

Metropolitan Trends Analysis for Home Improvement Spending in 2015 and Projection for 2017

Analysis in Python Pandas and Graphs in Plotly & Seaborn. The Data Set: from the Joint Center for Housing Studies at Harvard. It is a broader data set, and I used a subset of it in my previous post published in Towards Data Science. Key Features of Interest in This Data Set: 25 US metropolitan areas; Median income: 2015 … Read more Metropolitan Trends Analysis for Home Improvement Spending in 2015 and Projection for 2017

A quick introduction to derivatives for machine learning people

Dec 25, 2018 Introduction If you’re like me you probably have used derivatives for a huge part of your life and learned a few rules on how they work and behave without actually understanding where it all comes from. As kids we learn some of these rules early on like the power rule for example … Read more A quick introduction to derivatives for machine learning people

Mythbusting Fantasy Premier League: Form over fixtures

Using football data and machine learning to test if form is more important than fixtures in predicting clean sheets, goals, and assists Dec 25, 2018 Form vs fixtures discussions on Reddit Over 6 million people compete in the Fantasy Premier League (FPL), trying to assemble the best 11-player squad to score the highest over 38 weeks of … Read more Mythbusting Fantasy Premier League: Form over fixtures

Image Processing Class (EGBE443) #5 — Edge and Contour

Edge Operator: The basic principle of many edge operators comes from the first derivative function. They differ only in the way the components of the filter are combined. Prewitt and Sobel Operation: These methods use linear filters that extend over 3 adjacent lines and columns. For the Prewitt operator, the filter H along x and … Read more Image Processing Class (EGBE443) #5 — Edge and Contour

Q-LocalSearch

A Q-learning based algorithm for feature selection Dec 25, 2018 “This time I won’t make any silly jokes or references”, that’s what SHE said! In today’s article, I’m gonna try to explain to you what I’ve been doing for the last week, and what I talked about in my previous article, and any comment or … Read more Q-LocalSearch

The Hottest Topics In Machine Learning

Introduction The NIPS conference (Neural Information Processing Systems) is one of the most prestigious yearly events in the machine learning community. At each NIPS conference, a large number of research papers are published. Over 50,000 PDF files were automatically downloaded and processed to obtain a dataset on various machine learning techniques. These NIPS papers are … Read more The Hottest Topics In Machine Learning

Introduction to Data Preprocessing in Machine Learning

Handling Categorical Variables: Handling categorical variables is another integral aspect of Machine Learning. Categorical variables are basically the variables that are discrete and not continuous. For example, the color of an item is a discrete variable, whereas its price is a continuous variable. Categorical variables are further divided into 2 types. Ordinal categorical variables: These variables … Read more Introduction to Data Preprocessing in Machine Learning

Web Scraping Apartment Listings in Stockholm

My partner and I have sold our apartment and are in search of a new one, and the majority of people searching for a new apartment manually go through https://www.hemnet.se/. This, to me, seems too tedious and exhausting, so I thought: why not use my Python knowledge and bottomless craving for these types of … Read more Web Scraping Apartment Listings in Stockholm

Use AWS Glue and/or Databricks’ Spark-xml to process XML data

Dataset: http://opensource.adobe.com/Spry/data/donuts.xml Code & Snippets: https://github.com/elifinspace/GlueETL/tree/article-2 0. Upload dataset to S3: Download the file from the given link and go to the S3 service on the AWS console. Create a bucket with the “aws-glue-” prefix (I am leaving the settings at their defaults for now). Click on the bucket name and click on Upload (this is the easiest way to do this; you can also set up … Read more Use AWS Glue and/or Databricks’ Spark-xml to process XML data

I Worked With A Data Scientist, Here’s What I Learned.

Background In late 2017, I started to develop an interest in the Machine Learning field. I talked about my experience when I started my journey. In summary, it has been filled with fun challenges and lots of learning. I am an Android Engineer, and this is my experience working on ML projects with our data scientist. … Read more I Worked With A Data Scientist, Here’s What I Learned.

Predictive and Prescriptive Maintenance for Manufacturing Industry with Machine Learning

Monitoring machine health and optimizing manufacturing yield with machine learning Authors: Partha Deka and Rohit Mittal Today’s trend of Artificial Intelligence (AI) and the increased level of Automation in manufacturing allow firms to flexibly connect assets and improve productivity through data-driven insights that have not been possible before. As more automation is used in manufacturing, … Read more Predictive and Prescriptive Maintenance for Manufacturing Industry with Machine Learning

Using Object Detection for Complex Image Classification Scenarios Part 2:

The Custom Vision Service TL;DR: This series is based on the work detecting complex policies in the following real-life code story. Code for the series can be found here. Part 2: The Custom Vision Service In the last post of the series, we outlined the challenge of a complex image classification task; in this post we … Read more Using Object Detection for Complex Image Classification Scenarios Part 2:

Introducing Wav2latter++

How Facebook Implements Speech Recognition Systems Completely Based on Convolutional Neural Networks Speech recognition systems have been one of the most developed areas in the deep learning ecosystem. The current generation of speech recognition models rely mostly on recurrent neural networks (RNNs) for acoustic and language modeling and on computationally expensive artifacts such as feature extraction pipelines for … Read more Introducing Wav2latter++

The anguish of unintended consequences

We celebrate the promise of artificial intelligence as we suffer the unintended consequences of social media. These states are linked across time. In the formative days, we spoke of social media with the same radiant enthusiasm that now shines on artificial intelligence. With youthful optimism, we just needed to bring everyone together. The focus was overwhelmingly … Read more The anguish of unintended consequences

Simply Explained Logistic Regression with Example in R

I am assuming that the reader is familiar with the linear regression model and its functionality. Here I have tried to explain logistic regression as simply as I could. When I was trying to understand logistic regression myself, I wasn’t getting any comprehensive answers for it, but after doing thorough … Read more Simply Explained Logistic Regression with Example in R

Build Hand Gesture Recognition from Scratch using Neural Network — Machine Learning Easy and Fun

From Self-Captured Images to Learning the Neural Network Model Introduction So now let’s start building the Hand Gesture Recognition Neural Network from the bottom. This algorithm should work with all the different kinds of skin color; just make sure to position your hand in the middle. For this solution I used the GNU … Read more Build Hand Gesture Recognition from Scratch using Neural Network — Machine Learning Easy and Fun

Python Plotting API: Expose your scientific python plots through a flask API

In my daily work as a data scientist, I often have the need to integrate relatively complex plots into back-office applications. These plots are mainly used to illustrate algorithmic decisions and give data intuitions to operational departments. A possible approach here would be to build an API that returns data and let the front-end of … Read more Python Plotting API: Expose your scientific python plots through a flask API

Discretisation Using Decision Trees

1. Introduction Discretisation is the process of transforming continuous variables into discrete variables by creating a set of contiguous intervals that span the range of variable values. 1.1 Discretisation helps handle outliers and highly skewed variables Discretisation helps handle outliers by placing these values into the lower or higher intervals together with the remaining inlier … Read more Discretisation Using Decision Trees

Atari – Solving Games with AI (Part 2: Neuroevolution)

Selection We are selecting the top performers of the population (10%) based on their fitness scores. Only the selected chromosomes will be allowed to procreate and breed a new generation. For each chromosome, we are performing a gameplay and storing a final score which will be used for an evaluation. In order to perform a … Read more Atari – Solving Games with AI (Part 2: Neuroevolution)
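The selection step described in this excerpt (keep the top 10% of the population by fitness score) can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `select_top`, the `fraction` parameter, and the sample chromosomes and scores are hypothetical and are not taken from the article, which scores each chromosome by playing a game.

```python
def select_top(population, fitness_scores, fraction=0.10):
    """Return the best `fraction` of chromosomes, highest fitness first.

    `population` and `fitness_scores` are parallel lists. At least one
    chromosome is always kept so the next generation can be bred.
    """
    n_keep = max(1, round(len(population) * fraction))
    ranked = sorted(
        zip(population, fitness_scores),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return [chromosome for chromosome, _ in ranked[:n_keep]]


# Hypothetical population of 10 chromosomes with made-up final scores.
population = [f"c{i}" for i in range(10)]
scores = [3, 41, 17, 8, 25, 12, 39, 5, 30, 21]

print(select_top(population, scores, fraction=0.2))  # ['c1', 'c6']
```

Only the chromosomes returned by such a function would be allowed to breed the next generation; how they are recombined is outside the scope of this sketch.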

Creating a G-Force Analysis Tool for Flat Ride Animations in Planet Coaster

Dec 23, 2018 The term ‘flat rides’ describes every ride in a theme park that isn’t a roller coaster and includes large attractions like the Ferris wheel and other large, hydraulic monsters that wobble humans around for fun. It was important that we got these rides right when we made Planet Coaster, not just the … Read more Creating a G-Force Analysis Tool for Flat Ride Animations in Planet Coaster

A Starter Pack to Exploratory Data Analysis with Python, pandas, seaborn, and scikit-Learn

2. Categorical Analysis We can start reading the data using pd.read_csv(). By doing a .head() on the data frame, we can take a quick peek at the top 5 rows of our data. For those who are not familiar with pandas or the concept of a data frame, I would highly recommend spending half a day … Read more A Starter Pack to Exploratory Data Analysis with Python, pandas, seaborn, and scikit-Learn

All you need to know about Regularization

Causes of overfitting and how regularization improves it Alice: Hey Bob!!! I have been training my model for 10 hrs, but it is yielding very bad accuracy, although it performs exceptionally well on training data. What’s the issue? Bob: Oh!! It seems your model is overfitting on the training data. Did you use regularization? Alice: What’s … Read more All you need to know about Regularization

Understand Text Summarization and create your own summarizer in python

How text summarization works In general there are two types of summarization, abstractive and extractive summarization. Abstractive Summarization: Abstractive methods select words based on semantic understanding, even words that did not appear in the source documents. They aim at producing important material in a new way. They interpret and examine the text using advanced natural … Read more Understand Text Summarization and create your own summarizer in python

Project Diversita: An Intelligent Edge Computing Device

1. Definition of the Product Original Input from Microsoft In the kick-off meeting on the Microsoft Campus, the project team from the Microsoft side had the following requirements as the input: Consumer-facing: The final product should be for consumers, able to disrupt the camera trap market; Mobility: The final product should be easy to carry; Connectivity: … Read more Project Diversita: An Intelligent Edge Computing Device

Probability Part 2: Conditional Probability

This is the second in a series of blogposts which I am writing about probability. In this post I introduce the fundamental concept of conditional probability, which allows us to include additional information into our probability calculations. The ideas behind conditional probability lead naturally to the most important idea in probability theory, known as Bayes … Read more Probability Part 2: Conditional Probability

Simulating Tennis Matches with Python or Moneyball for Tennis

Control flow: The code above has a section which runs all the code. We consider default values for most of the important parameters such as player name, ps1 and ps2, and bigpoint1 and bigpoint2. I liked to think of ps1 and ps2 as first serve percentage but we can do a lot of interesting feature … Read more Simulating Tennis Matches with Python or Moneyball for Tennis

Word2Vec For Phrases — Learning Embeddings For More Than One Word

Learning Phrases From Unsupervised Text (Collocation Extraction) We can easily create bi-grams with our unsupervised corpus and take it as an input to Word2Vec. For example, the sentence “I walked today to the park” will be converted to “I_walked walked_today today_to to_the the_park” and each bi-gram will be treated as a uni-gram in the Word2Vec … Read more Word2Vec For Phrases — Learning Embeddings For More Than One Word
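The bi-gram conversion in this excerpt can be reproduced with a few lines of plain Python. This is a minimal sketch, assuming whitespace tokenisation; the helper name `to_bigrams` is hypothetical, and the snippet covers only the conversion step, not the Word2Vec training that follows it.

```python
def to_bigrams(sentence: str) -> str:
    """Join each pair of adjacent words with an underscore."""
    words = sentence.split()
    # zip(words, words[1:]) yields (w0, w1), (w1, w2), ...
    return " ".join(f"{first}_{second}" for first, second in zip(words, words[1:]))


print(to_bigrams("I walked today to the park"))
# I_walked walked_today today_to to_the the_park
```

Each underscore-joined token can then be treated as a single "word" by whatever embedding model consumes the output.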

Boston Airbnb Analysis

There is more to Boston than chowdah and Marky Mark: The definitive guide to Airbnb pricing. Heat Map of Boston airbnb properties Sep 16-Sep 17 Since 2009, Airbnb has been letting people into strangers’ homes all over the world. Boston is no exception as thousands of properties are listed for rent on Airbnb. The listings include photographs, … Read more Boston Airbnb Analysis

Machine Learning — Multiclass Classification with Imbalanced Data-set

Challenges in classification and techniques to improve performance source [Unsplash] Classification problems with multiple classes and an imbalanced dataset present a different challenge than a binary classification problem. The skewed distribution makes many conventional machine learning algorithms less effective, especially in predicting minority class examples. In order to do so, let us first understand the problem … Read more Machine Learning — Multiclass Classification with Imbalanced Data-set

Learning to generate videos with uncertain futures

TL;DR: This post provides a high-level overview of the video generation model described in Stochastic Video Generation with a Learned Prior, which is capable of generating video sequences with multiple futures. Video Generation as Self-Supervised Learning task Supervised deep learning models have proven to yield groundbreaking results in the recent past on hard tasks like real-time … Read more Learning to generate videos with uncertain futures

Mapping Physical Activity with R, Selenium and Leaflet

We all know that exercise is one of the most important factors in our mental and physical health. And with the new year fast approaching, an emphatic declaration to Work Out More! is sure to top many resolution lists. But figuring out how to actually accomplish this can be difficult. While January is the most … Read more Mapping Physical Activity with R, Selenium and Leaflet

Lumiere London 2018 (Part 3): Computer Vision

Part 3: Analysing 5,000 Flickr images using computer vision Introduction In this final blog post of the series, I apply computer vision techniques to understand 5,000 Flickr images about the Lumiere London 2018, a huge light festival which took place in London earlier in January this year. During Lumiere London 2018, more than 50 public artworks … Read more Lumiere London 2018 (Part 3): Computer Vision

Explainable AI vs Explaining AI — Part 1

Despite the recent remarkable results of deep learning (DL), there is always a risk that it produces delusional and unrealistic results due to several reasons such as under-fitting, over-fitting or incomplete training data. For example, the famous Move 78 of the professional Go player Lee Sedol, which caused delusional behavior in AlphaGo, adversarial … Read more Explainable AI vs Explaining AI — Part 1

Understanding AI and ML for Mobile app development

Last time I published this blog where I explained one application of AI and ML, ‘Vision’, and also briefly explained using ML Kit in mobile development, which is a cloud platform offered by Google to integrate ML features in Android and iOS apps. This article is a prequel to that one and in this I … Read more Understanding AI and ML for Mobile app development

SD-WAN Link Switch as Reinforcement Learning experiment with Deep Q-Learning

Credit: https://usblogs.pwc.com/emerging-technology/deep-learning-ai/ ‘Deep Q’ or Deep Q-Learning is a well-known algorithm in reinforcement learning which approximates the Q-value of an MDP system with a deep neural network. In this article I have explored the same algorithm in solving the link switch problem in an SD-WAN network, for which I have already developed an AI-gym based on Mininet (see … Read more SD-WAN Link Switch as Reinforcement Learning experiment with Deep Q-Learning

All birds are black

A simple way to think about bias-variance trade-off Photo by Hannes Wolf on Unsplash I’ve come across multiple approaches and philosophies for building models to represent real world relationships. My statistics professor was relentless in emphasising Occam’s Razor and parsimony. Social scientists are obsessed with finding causal relationships in models, often through experiments. Don’t go there, Simba! … Read more All birds are black

Forecasting with Prophet

How to make high quality forecasts The origin of Prophet When we think of forecasting we often think of weather forecasts, but it is also used by many organizations in supply chain management, sales and economics. Forecasts are used to guide policymakers and play an important role in shaping business decisions (e.g. Federal Reserve adjusting interest … Read more Forecasting with Prophet

30 Data Science Punchlines

A holiday reading list condensed into 30 quotes For those who like brainfood on your vacation, here’s a handy index of all my articles from 2018 boiled down to 30 (occasionally cheeky) punchlines to help you avoid/cause awkward silences at family events and holiday parties. Sections: Data Science and Analytics, ML/AI Concepts, How Not To Fail … Read more 30 Data Science Punchlines

Why Machine Learning is the BEST field in the world

A few years ago, when I was a junior software engineer, I worked on a problem with one of our algorithm developers. I thought that I found the breaking point: there was an algorithm that did something wrong. I asked the developer why the algorithm did what it did, and the answer I got was: … Read more Why Machine Learning is the BEST field in the world

The Mathematics Behind Principal Component Analysis

Introduction The central idea of principal component analysis (PCA) is to reduce the dimensionality of a data set consisting of a large number of interrelated variables while retaining as much as possible of the variation present in the data set. This is achieved by transforming to a new set of variables, the principal components (PCs), … Read more The Mathematics Behind Principal Component Analysis

Investigating the variance of the world chess championship final

A statistical analysis on Carlsen’s final match approach 2018 World Chess Championship logo showing 5 overlapping arms above chessboard holding or moving chess pieces. While the FIDE World Chess Championship was going on, I found the following problem: Say one of the players is better than his opponent to the degree that he wins 20 percent of … Read more Investigating the variance of the world chess championship final

Which hypothesis test to perform?

Overview of the various hypothesis tests with an example of a one-sample t-test The objective of statistics is to make inferences about a population based on information contained in a sample. The numerical measures used to characterize populations are called parameters. The population parameters are: μ (mean), M (median), σ (standard deviation), π (proportion). Most … Read more Which hypothesis test to perform?

20 Years of Data, 10 Conclusions

I got my first “real” job in 1999, working for a power company in Copenhagen (shout out to Lars) creating electricity pricing reports in Excel. Since then I’ve worked for small companies, start-ups and large companies across a range of industries. I’ve worked with passionate founders as well as hired guns and I’ve sat at … Read more 20 Years of Data, 10 Conclusions

Myth-busting about Data Science with Simon Greiner

Cesar Viteri on www.unsplash.com In her initial blog, Anita Lakhotia asked the question: What does a Data Scientist do all day? This is the most frequent question she gets asked by non-data-scientists. Thinking about data scientists she has personally come across during meet-ups, hackathons or blogs, it is difficult to give just one answer. Therefore, … Read more Myth-busting about Data Science with Simon Greiner