Atari – Solving Games with AI (Part 2: Neuroevolution)

Selection: We select the top performers of the population (10%) based on their fitness scores. Only the selected chromosomes are allowed to procreate and breed a new generation. For each chromosome, we play a game and store the final score, which is used for evaluation. In order to perform a …
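
As a rough illustration of the selection step this excerpt describes, the sketch below keeps the top 10% of a population ranked by fitness score. The names `select_top_performers`, `population`, and `fitness_scores` are hypothetical, and the scores are made-up placeholders rather than real gameplay results; the article's own implementation may well differ.

```python
import math


def select_top_performers(population, fitness_scores, fraction=0.10):
    """Return the best `fraction` of chromosomes, ranked by fitness (highest first)."""
    if len(population) != len(fitness_scores):
        raise ValueError("population and fitness_scores must have the same length")
    # Keep at least one chromosome so a next generation can always be bred.
    keep = max(1, math.floor(len(population) * fraction))
    ranked = sorted(
        zip(population, fitness_scores),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return [chromosome for chromosome, _ in ranked[:keep]]


# Hypothetical population of 20 chromosomes with placeholder scores.
population = [f"chromosome_{i}" for i in range(20)]
fitness_scores = [i * 10 for i in range(20)]  # chromosome_19 scores highest
parents = select_top_performers(population, fitness_scores)
```

With 20 chromosomes and a 10% cut, only two parents survive, so any breeding step that follows has to produce the rest of the next generation from them.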

Creating a G-Force Analysis Tool for Flat Ride Animations in Planet Coaster

Dec 23, 2018 The term ‘flat rides’ describes every ride in a theme park that isn’t a roller coaster and includes large attractions like the Ferris wheel and other large, hydraulic monsters that wobble humans around for fun. It was important that we got these rides right when we made Planet Coaster, not just the …

A Starter Pack to Exploratory Data Analysis with Python, pandas, seaborn, and scikit-Learn

2. Categorical Analysis: We can start reading the data using pd.read_csv(). By calling .head() on the data frame, we can take a quick peek at the top 5 rows of our data. For those who are not familiar with pandas or the concept of a data frame, I would highly recommend spending half a day …

All you need to know about Regularization

Causes of overfitting and how regularization improves it. Alice: Hey Bob! I have been training my model for 10 hours, but it yields very bad accuracy even though it performs exceptionally well on the training data. What’s the issue? Bob: Oh! It seems your model is overfitting the training data. Did you use regularization? Alice: What’s …

Understand Text Summarization and create your own summarizer in python

How text summarization works: In general there are two types of summarization, abstractive and extractive. Abstractive Summarization: Abstractive methods select words based on semantic understanding, even words that did not appear in the source documents. They aim to present important material in a new way. They interpret and examine the text using advanced natural …

Project Diversita: An Intelligent Edge Computing Device

1. Definition of the Product Original Input from Microsoft In the kick-off meeting on the Microsoft Campus, the project team from the Microsoft side had the following requirements as the input: Consumer-facing: The final product should be for consumers, able to disrupt the camera trap market; Mobility: The final product should be easy to carry; Connectivity: …

Probability Part 2: Conditional Probability

This is the second in a series of blogposts which I am writing about probability. In this post I introduce the fundamental concept of conditional probability, which allows us to include additional information into our probability calculations. The ideas behind conditional probability lead naturally to the most important idea in probability theory, known as Bayes …

Simulating Tennis Matches with Python or Moneyball for Tennis

Control flow: The code above has a section which runs all the code. We consider default values for most of the important parameters such as player name, ps1 and ps2, and bigpoint1 and bigpoint2. I liked to think of ps1 and ps2 as first serve percentage but we can do a lot of interesting feature …

Word2Vec For Phrases — Learning Embeddings For More Than One Word

Learning Phrases From Unsupervised Text (Collocation Extraction): We can easily create bi-grams from our unsupervised corpus and feed them as input to Word2Vec. For example, the sentence “I walked today to the park” will be converted to “I_walked walked_today today_to to_the the_park” and each bi-gram will be treated as a uni-gram in the Word2Vec …
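
The bi-gram conversion in this excerpt can be sketched in a few lines of plain Python. This is only the naive "join every adjacent pair" step shown in the example sentence, not a collocation-extraction method, and the helper name `to_bigram_tokens` is hypothetical.

```python
def to_bigram_tokens(sentence):
    """Join each pair of adjacent words with an underscore."""
    words = sentence.split()
    # zip(words, words[1:]) pairs every word with its right-hand neighbour.
    return [f"{first}_{second}" for first, second in zip(words, words[1:])]


tokens = to_bigram_tokens("I walked today to the park")
print(" ".join(tokens))  # I_walked walked_today today_to to_the the_park
```

Each joined token could then be handed to a word-embedding model as if it were a single word, which is the idea the excerpt points at.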

Boston Airbnb Analysis

There is more to Boston than chowdah and Marky Mark: The definitive guide to Airbnb pricing. Heat Map of Boston Airbnb properties Sep 16-Sep 17 Since 2009, Airbnb has been letting people into strangers’ homes all over the world. Boston is no exception, as thousands of properties are listed for rent on Airbnb. The listings include photographs, …

Machine Learning — Multiclass Classification with Imbalanced Data-set

Challenges in classification and techniques to improve performance source [Unsplash] Classification problems with multiple classes and an imbalanced dataset present a different challenge than a binary classification problem. The skewed distribution makes many conventional machine learning algorithms less effective, especially in predicting minority class examples. In order to do so, let us first understand the problem …

Learning to generate videos with uncertain futures

TL;DR: This post provides a high-level overview of the video generation model described in Stochastic Video Generation with a Learned Prior, which is capable of generating video sequences with multiple futures. Video Generation as Self-Supervised Learning task Supervised deep learning models have proven to yield groundbreaking results in the recent past on hard tasks like real-time …

Mapping Physical Activity with R, Selenium and Leaflet

We all know that exercise is one of the most important factors in our mental and physical health. And with the new year fast approaching, an emphatic declaration to Work Out More! is sure to top many resolution lists. But figuring out how to actually accomplish this can be difficult. While January is the most …

Lumiere London 2018 (Part 3): Computer Vision

Part 3: Analysing 5,000 Flickr images using computer vision. Introduction: In this final blog post of the series, I apply computer vision techniques to understand 5,000 Flickr images of Lumiere London 2018, a huge light festival which took place in London in January this year. During Lumiere London 2018, more than 50 public artworks …

Explainable AI vs Explaining AI — Part 1

Despite the recent remarkable results of deep learning (DL), there is always a risk that it produces delusional and unrealistic results for reasons such as under-fitting, over-fitting or incomplete training data. For example, the famous Move 78 of the professional Go player Lee Sedol, which caused delusional behavior in AlphaGo, adversarial …

Understanding AI and ML for Mobile app development

Last time I published this blog, where I explained one application of AI and ML, ‘Vision’, and also briefly explained how to use ML Kit in mobile development, which is a cloud platform offered by Google to integrate ML features in Android and iOS apps. This article is a prequel to that one, and in it I …

SD-WAN Link Switch as Reinforcement Learning experiment with Deep Q-Learning

Credit: https://usblogs.pwc.com/emerging-technology/deep-learning-ai/ ‘Deep Q’ or Deep Q-Learning is a well-known algorithm in reinforcement learning which approximates the Q-value of an MDP with a deep neural network. In this article I explore the same algorithm for solving the link-switch problem in an SD-WAN network, for which I have already developed an AI gym based on Mininet (see …

All birds are black

A simple way to think about the bias-variance trade-off Photo by Hannes Wolf on Unsplash I’ve come across multiple approaches and philosophies for building models to represent real world relationships. My statistics professor was relentless in emphasising Occam’s Razor and parsimony. Social scientists are obsessed with finding causal relationships in models, often through experiments. Don’t go there, Simba! …

Forecasting with Prophet

How to make high quality forecasts The origin of Prophet When we think of forecasting we often think of weather forecasts, but it is also used by many organizations in supply chain management, sales and economics. Forecasts are used to guide policymakers and play an important role in shaping business decisions (e.g. Federal Reserve adjusting interest …

30 Data Science Punchlines

A holiday reading list condensed into 30 quotes For those who like brainfood on your vacation, here’s a handy index of all my articles from 2018 boiled down to 30 (occasionally cheeky) punchlines to help you avoid/cause awkward silences at family events and holiday parties. Sections: Data Science and Analytics, ML/AI Concepts, How Not To Fail …

Why Machine Learning is the BEST field in the world

A few years ago, when I was a junior software engineer, I worked on a problem with one of our algorithm developers. I thought that I found the breaking point: there was an algorithm that did something wrong. I asked the developer why the algorithm did what it did, and the answer I got was: …

The Mathematics Behind Principal Component Analysis

Introduction The central idea of principal component analysis (PCA) is to reduce the dimensionality of a data set consisting of a large number of interrelated variables while retaining as much as possible of the variation present in the data set. This is achieved by transforming to a new set of variables, the principal components (PCs), …

Investigating the variance of the world chess championship final

A statistical analysis on Carlsen’s final match approach While the FIDE World Chess Championship was going on, I found the following problem: Say one of the players is better than his opponent to the degree that he wins 20 percent of …

Which hypothesis test to perform?

Overview of the various hypothesis tests with an example of a one-sample t-test. The objective of statistics is to make inferences about a population based on information contained in a sample. The numerical measures used to characterize populations are called parameters. The population parameters are: μ (mean), M (median), σ (standard deviation), π (proportion). Most …

20 Years of Data, 10 Conclusions

I got my first “real” job in 1999, working for a power company in Copenhagen (shout out to Lars) creating electricity pricing reports in Excel. Since then I’ve worked for small companies, start-ups and large companies across a range of industries. I’ve worked with passionate founders as well as hired guns and I’ve sat at …

Myth-busting about Data Science with Simon Greiner

Cesar Viteri on www.unsplash.com In her initial blog, Anita Lakhotia asked the question: What does a Data Scientist do all day? This is the most frequent question she gets asked by non-data-scientists. Thinking about data scientists she has personally come across during meet-ups, hackathons or blogs, it is difficult to give just one answer. Therefore, …

You can’t just Google everything

And other things I wish I knew before I started my latest Data Science project Photo by Chris Ried on Unsplash I got started on a new data science project a few weeks ago and expectedly, it’s been h̶o̶r̶r̶i̶f̶i̶c̶ absolutely enlightening. I’ve made a few tentative explorations into data science and machine learning projects before. But I …

Multi-Class Classification in Text using R

This blog is a continuation of my NLP blog series. In the previous blogs, I discussed data pre-processing steps in R and recognizing emotions present in TED talks. In this blog, I am going to predict the ratings of the TED talks given by viewers. This would require a multi-class classification and quite a bit …

AI for Business

This is going to be a short, to-the-point article. First I’ll talk about the problems with AI right now, then the problems with understanding and applying AI in business scenarios, and finally a first glance at the solutions I’m thinking about. Part I. The AI world I hate this image 🙂 Since the beginning of humanity as a society, …

Shrinkage Estimators: Shrinking statistical estimates

In this article series on how to optimize portfolios, we have looked at the existence of market invariants and at estimating the distribution of returns using nonparametric and maximum likelihood methods. Now we discuss a method of estimating the probability distribution using shrinkage estimators. For those interested in optimizing portfolios, look at OptimalPortfolio. I must agree, the name shrinkage …

Implementation of Uni-Variate Linear Regression in Python using Gradient Descent Optimization from…

Learn, Code and Tune…. Regression is an example of Continuous Classification of Data or data-points in feature-space. Francis Galton invented the usage of Regression Line in 1886 [1]. As the name suggests, “Linear”, this means that the hypothesis regarding the Machine Learning Algorithm is linear in nature or simply a linear equation. Yeah!! it’s a …

Review: DeepMask (Instance Segmentation)

1. Model Architecture Model Architecture (Top), Positive Samples (Green, Left Bottom), Negative Samples (Red, Right Bottom) Left Bottom: Positive Samples A label yk=1 is given for k-th positive sample. To be a positive sample, two criteria need to be satisfied: The patch contains an object roughly centered in the input patch. The object is fully contained in …

A different kind of (deep) learning: part 2

Self Supervised learning: generative approaches Intro In the previous post, we’ve discussed some self supervised learning articles, along with some attempts to strive towards the “holy grail”: exploiting the almost unlimited number of un-annotated images available wherever to generalize for other tasks. And hopefully, get closer to the currently unmet benchmark of ImageNet pre-training. Surprisingly, …

Logistic Regression For Facial Recognition

Facial recognition algorithms have always fascinated me, and wanting to flex my newfound logistic regression skills on some data, I created a model based on a dataset I found called “Skin Segmentation.” As noted in its description, the data in Skin Segmentation were collected “by randomly sampling B,G,R values from face images of various age …

Planet Beehive

Exploring our Activities Now that we have created our Scoring Measure, we can start exploring the data. Let’s start by comparing global regions’ activity rating with the average number of reviews per activity: Fig. 8 Scatter plot generated with Seaborn in Python Now this is getting interesting! Some very particular facts here that might just go …

Machine Learning and Music Classification: A Content-Based Filtering Approach

Using the Librosa Python Library, KNN, and Random Forest to Classify Music In my previous blog post, Introduction to Music Recommendation and Machine Learning, I discussed the two methods for music recommender systems, Content-Based Filtering and Collaborative Filtering. The collaborative filtering approach involved recommending music based on user listening history, while the content-based approach used an …

Get Smarter with Data Science — Tackling Real Enterprise Challenges

Introduction The ‘Data Science Strategic Guide — Get Smarter with Data Science’ is envisioned as a series of articles, which serve to be more of a strategic guide depicting essential challenges, pitfalls and principles to keep in mind when implementing and executing data science projects in the real-world. We also focus on how you can get maximum …

Measuring pedestrian accessibility

Walkable neighborhoods are great for health, happiness and economic growth. Cities around the world that want to draw a talented young workforce are increasingly focused on creating a good pedestrian experience. How could we measure and map walkability using data science tools? This blog suggests an approach drawing on Pandana, an excellent Python library developed by …

Classifying Skin Lesions with Convolutional Neural Networks

Imagine this. You wake up and find a frightening mark on your skin, so you go to the doctor’s office to get it checked. They say it’s fine, so you go home and don’t worry about it for a couple of months, but then you have a throbbing pain from that spot; it looks ugly and …

Building sentence embeddings via quick thoughts

Introduction to Quick Thoughts: In the previous story, I shared skip-thoughts for computing sentence embeddings. Today, we have another unsupervised learning approach for computing sentence embeddings: Quick Thoughts. Logeswaran et al. (2018) introduced the quick-thoughts approach to retrieve sentence embeddings for downstream applications. After reading this article, you will understand: Quick-Thoughts Design Evaluation Experiments Reference …

Quality inspection in manufacturing using deep learning based computer vision

Improving yield by removing bad quality material with image recognition Author: Partha Deka and Rohit Mittal Automation in Industrial manufacturing: Today’s increased level of automation in manufacturing also demands automation of material quality inspection with little human intervention. The trend is to reach human level accuracy or more in quality inspection with automation. To stay …

Do you need a graduate degree for data science?

Maybe so. Maybe not. I’ll level with you: I’m a PhD dropout. I’ve gotten a lot of mileage out of that title, by the way: it hints that I’ve done a lot of grad school, but still maintains the aura of badassery that only the word “dropout” can provide. In some ways, it’s the ultimate humble …