Are you Bilingual? Be Fluent in R and Python!

If you ask me whether to invest your time in R or Python, I will advise you to be fluent in both. I cannot tell you which language (Mandarin, English, Hindustani, Spanish, Arabic, Malay, Russian, Greek, or Hindi) is superior. Each language has its long history of development and merits, just like R and …

Prototyping an anomaly detection system for videos, step by step using LSTM convolutional…

If we want to treat the problem as a binary classification problem, we need labeled data and in this case, collecting labeled data is hard because of the following reasons: Abnormal events are challenging to obtain due to their rarity. There is a massive variety of abnormal events, and manually detecting and labeling such events …

The top 20 CO₂ polluters, visualized

“The Guardian” had the data and I had a free afternoon. Photo: polluting factory, Patrick Hendry, Unsplash. “The Guardian” has recently published a list of just 20 companies that are responsible for a third of all global energy-related CO₂ emissions since 1965, a year when political and industry leaders acknowledged that burning fossil …

The AI Box Experiment

What a Simple Experiment Can Teach Us About Superintelligence. Imagine it’s 2040. After years of research and dedicated programming, you believe you have created the world’s first Artificial General Intelligence (AGI): an Artificial Intelligence (AI) that’s roughly as intelligent as humans are across all their intellectual domains. A superintelligence will find a way to get …

Selenium Tutorial: Scraping Glassdoor.com in 10 Minutes

I had to scrape jobs data from Glassdoor.com for a project. Let me tell you how I did it… What is Scraping? It’s a method for collecting information from web pages. Why Scraping? Other than the fact that it is fun, Glassdoor’s library provides a limited number of data points. It doesn’t allow you to …

On the Automation of Time Series Forecasting Models: Technical and Organizational Considerations.

This post is an elaboration on a reply that I originally posted to a question on Cross Validated (Stack Overflow’s sister site for statistics and data science related topics). The original question was: I would like to build an algorithm that would be able to analyze any time series and “automatically” choose the best traditional/statistical forecasting method …

Generating MRI Images of Brain Tumors with GANs

The need for more data within the field of artificial intelligence is significant, especially in medical imaging. In order to produce ways in which we can speed up the process of diagnosing certain disorders through AI, we need many data sets with accurate imaging first, so that they can be fed into neural networks accordingly. …

Doing and reporting your first mediation analysis in R

How to provide support for the mediation with statistical procedures. We will provide statistical support for the mediation with the help of the mediation analysis in 4 simple steps. First, we will test the total effect. Here we are checking whether any change in sepal length impacts the DV at all. More on this later. …

Live Prediction of Traffic Accident Risks Using Machine Learning and Google Maps

Here, I describe the creation and deployment of an interactive traffic accident predictor using scikit-learn, Google Maps API, Dark Sky API, Flask and PythonAnywhere. Traffic accidents are extremely common. If you live in a sprawling metropolis like I do, chances are that you’ve heard about, witnessed, or even been involved in one. Because of their frequency, …

Understanding Fixup initialization

How to train residual networks without normalization layers. Why should we even care about initialization? Proper initialization of weight matrices is extremely important. According to Jeremy Howard, people for decades could not train neural networks because of improper initialization. In order to see it, we can reproduce one of the experiments from Jeremy’s lectures. Let’s …

Guided Grad-CAM is Broken! Sanity Checks for Saliency Maps

Certain techniques for understanding what a CNN is looking at don’t work. They have no connection to the model’s weights or to the training data, and may be merely acting as edge detectors. In this post we will discuss the NeurIPS 2018 paper, “Sanity Checks for Saliency Maps”, which demonstrates that several popular saliency map …

Predicting Taxi fares in NYC using Google Cloud AI Platform(Billion + rows) Part 1

Taxis | Photo by Francesco Ungaro on pexels.com. This project aims to create a machine learning model to estimate taxi fares in New York City using a dataset of taxi rides hosted in BigQuery. There are more than a billion rows with a size of 130 GB. You can find it here. …

Explain your machine learning with feature importance

Let’s imagine that you’ve landed a consulting gig with a bank that has asked you to identify those who have a high likelihood of default on the next month’s bill. Armed with the machine learning techniques that you’ve learnt and practiced, let’s say you proceed to analyze the data set given by your client and …

Stocks and Bonds are Now Both Right — No Recession in Sight

Expect Stocks to Make New Highs as the Economy Firms Up. Recession fears are everywhere. The US-China trade war, Middle East violence, impeachment in DC, protests in Hong Kong, and Brexit have all weighed on the economy. These fears are likely overblown. Consumer economic data remains robust and the Fed has cut rates twice as …

Benchmarking simple models with feature extraction against modern black-box methods

Comparison of Normalized Scores. The results over the different datasets and algorithms have to be normalized in order to make them comparable to each other. For this purpose we process the data with the following three steps: 1. Remove unstable algorithms: Algorithms which did not converge and yielded results way below dummy performance were excluded. …

Modeling News Coverage with Python. Part 3: Newspaper Coverage and Google Search Trends

Fitting models of Google Search to Search Trends and News Articles. This post integrates data from a limited sample of newspaper coverage with Google Search trends to model interactions between the two. In these examples, the preliminary analysis finds news coverage useful for forecasting search trends but small and mixed results in the other direction. …

How to achieve adoption of Intelligent Automation with maximum ROI and speed?

Source: https://www.burwood.com/blog-archive/automation-vs-orchestration-whats-the-difference Automation will displace 200,000 workers from their jobs in the banking industry in the next ten years. Almost half of the wage-paying jobs all around the world could theoretically be at risk of automation using technologies already at hand. In the next 12 years, 1 out of 3 American workers are at risk …

Models as Serverless Functions

Source: Wikimedia. Chapter 3 of “Data Science in Production”. I recently published Chapter 3 of my book-in-progress on Leanpub. The goal with this chapter is to empower data scientists to leverage managed services to deploy models to production and own more of DevOps. Serverless …

Word Clouds Are Lame

Exploring the limitations of the word cloud as a data visualization. Author: Shelby Temple; Made with Tableau. Word clouds have recently become a staple of data visualization. They are especially popular when analyzing text. According to Google Trends, it seems that the rise in popularity started around 2009 with search term interest currently just under …

Five Tips for Contributing to Open Source Software

A data scientist’s perspective. Photo by Yancy Min on Unsplash. Contributing to Open-Source Software (OSS) can be a rewarding endeavor, especially for new data scientists. It helps improve skills, provides invaluable experience when collaborating on projects, and gives you a chance to showcase your code. However, many data scientists do not consider themselves to be …

A Data Visualization Adventure

From raw data to the #1 spot on DataIsBeautiful. Is data visualization art or science? Does the clarity from bar charts and line graphs always trump data viz that is unusual and/or beautiful? These are some polarizing questions in the data viz community. Some of you just screamed out loud “Science! Clarity!” While others would …

Build Your First Computer Vision Project — Dog Breed Classification

Get started building your first computer vision project in less than 30 minutes. Photo by Joe Caione on Unsplash. For us humans, it is pretty easy to tell one dog breed from another. That is, if you are talking about 10–20 popular dog breeds. When we are talking about more than 100 kinds of dogs, …

Data Science with SQL in Python

Python Application in SQL. Ever hear about the database programming language SQL (often pronounced “sequel”)? How can we use Python code to harness the power of SQL databases and retrieve, manipulate, and delete the information stored in them? In this article, I plan on giving a thorough beginner’s tutorial on SQL …
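
As a rough illustration of the retrieve, manipulate, and delete operations mentioned above, here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. The table name `people`, its columns, and the helper `demo_crud` are hypothetical, chosen only for this example; the article itself may use a different database or library.

```python
import sqlite3


def demo_crud():
    """Insert, update, delete, and read rows in a throwaway in-memory database."""
    conn = sqlite3.connect(":memory:")  # nothing is written to disk
    try:
        cur = conn.cursor()
        # Hypothetical table used only for this illustration.
        cur.execute(
            "CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)"
        )
        # Parameterized queries keep values separate from the SQL text.
        cur.executemany(
            "INSERT INTO people (name, age) VALUES (?, ?)",
            [("Ada", 36), ("Grace", 45), ("Linus", 28)],
        )
        # Manipulate: update one row.
        cur.execute("UPDATE people SET age = age + 1 WHERE name = ?", ("Ada",))
        # Delete: remove one row.
        cur.execute("DELETE FROM people WHERE name = ?", ("Linus",))
        conn.commit()
        # Retrieve: read back what is left, in insertion order.
        cur.execute("SELECT name, age FROM people ORDER BY id")
        return cur.fetchall()
    finally:
        conn.close()


rows = demo_crud()
print(rows)
```

The same pattern (connect, execute with placeholders, commit, fetch) should carry over to other database drivers that follow Python's DB-API, though connection details will differ.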

Malware Classification using Machine Learning

Takeaway from implementing the Microsoft Malware Classification Challenge (BIG). Image Source: Kaggle. If you love to explore large and challenging data sets, then you should probably give Microsoft Malware Classification a try. Before diving deep into the problem, let’s list a few points on what you can expect to learn from this: How to …

[NLP] SpaCy Classifier with pre-train token2vec VS. One without pre-train

Photo by Nuno Silva on Unsplash. We see quite satisfactory results from the classifier without a pre-trained language model. However, let’s experiment and see how much it will further improve when we apply one. First, we need to run spaCy pretrain on the documents and save the token2vec model. But before we start pre-training, we need to …

First step towards Data Science: Journey to the Home for Data Science

Getting started with Kaggle competitions. Kaggle is an Airbnb for data scientists: this is where they spend their nights and weekends. It’s a crowd-sourced platform to attract, nurture, train and challenge data scientists from all around the world to solve data science, machine learning and …

How to write Web apps using simple Python for Data Scientists?

At the start, we said that each time we change any widget, the whole app runs from start to end. This is not feasible when we create apps that will serve deep learning models or complicated machine learning models. Streamlit covers us in this aspect by introducing caching. 1. Caching: In our simple app, we …

Data Management Strategy: Part 2

Data Quality & Architecture. This is Part 2 of the series of articles related to carrying out and implementing a successful Data Management Strategy within an aspiring Digital Organization. You can find the introduction to this series here. In this article we will focus on the following topics: Data Quality, Data Architecture, Data Integration. These …

My Learning Plan for Getting Into Data Science from Scratch

I started when I was in college and still continue up to this day! My decision to get into data science started way back when I was still in college in early 2015. I actually didn’t plan to become a data scientist originally, but a quant: someone who is essentially a financial analyst that …

“This is CS50”: A Pleasant Way to Kick Off Your Data Science Education

What is CS50? It is the introductory course on computer science taught at Harvard University by Professor David J. Malan. It is the largest class at Harvard with 800 students, 102 staff, and a professional production team. It offers both an on-campus and an online course. I’ve taken the online one, but it’s already THE …

Locate V-beat in Electrocardiogram (ECG)

Using machine learning and image processing. A health technology company gave me a challenge: Given a collection of ECG strip images, find the location of V-beat in each image. The ECG plot records a V-beat during a premature ventricular contraction in the heartbeat. This article explains what I did to train a machine learning model to …

Discovering football anthems through data analysis

In one of my prior blogs around the FIFA World Cup, I wrote about how music and football are inseparable. Music is something that is part of football’s culture and vice versa. Music unites the armies behind clubs all over the world and enhances the atmosphere prior to (and during) games. Everyone probably has his …

Speech Recognition Analysis

Build a speech recognition model using Keras. From Siri to smart home devices, speech recognition is widely used in our lives. This speech recognition project uses the Kaggle speech recognition challenge dataset to create a Keras model on top of TensorFlow and make predictions on the voice files. https://cosmosmagazine.com/technology/hey-siri-how-does-voice-recognition-work The link of the Kaggle speech …

Twitter — Or where my bot talks to your bot

Photo by Safar Safarov on Unsplash. My activities on Twitter were mind-numbingly repetitive. From what Kirk was doing, it also didn’t exactly seem like he was reading everything that he was posting about. And whenever something is done over and over again, it’s typically a prime candidate for automation. I found tweepy, a Python library …

A few points on the state of software engineering

In this short essay, I present a brief discussion on the challenges of contemporary software engineering practice, identifying some of the potential causes of its current state. In subsequent essays, I will present and discuss strategies mainstream paradigms employ to minimize the complexity of the software. Observations laid here are mostly drawn from my own …

7 things to quickly improve your Data Analysis in Python

The ‘Magics’ of IPython are basically a series of enhancements that IPython has layered on top of the standard Python syntax. Magic commands come in two flavors: line magics, which are denoted by a single % prefix and operate on a single line of input, and cell magics, which are denoted by a double %% prefix …

An Easier Way to Encode Categorical Features

Photo by Ash Edmonds on Unsplash. Using the Python category encoder library to handle high cardinality variables in machine learning. I have recently been working on a machine learning project which had several categorical features. Many of these features were high cardinality, or in other words, had a high number of unique values. The simplest …

Getting rich quick with machine learning and stock market predictions

If a human investor can be successful, why can’t a machine? Algorithmic trading has revolutionised the stock market and its surrounding industry. Over 70% of all trades happening in the US right now are being handled by bots[1]. Gone are the days of the packed stock exchange with suited people waving sheets of paper shouting …

Data Preprocessing — Art or Science

Missing Value Imputation. Data Preprocessing (Part 1). Most people say that the heart of any analytics model is model building, but I would rather say it’s data preprocessing, not the model building. One can only build the model once one has processed or clean data. All the available raw data is incomplete …

Netflix: Quantity, Quality, and the paradox of choice

I have been rather disappointed with Netflix lately. After they delisted a few of my favourite shows (especially Doctor Who), and I struggled to find anything new and good to watch, I started wondering if the Netflix Overlords had made a strategic decision to offer cheaper, lower quality, but more variety of content to hopefully …