Detecting stock market crashes with topological data analysis

Evidently this simple method is rather noisy. With many points labelled as a crash, following this advice will result in over-panicking and selling your assets too soon. Let’s see if TDA can help us reduce the noise in the signal and obtain a more robust detector! The mathematics underlying TDA is deep and won’t be …

How to Build Slim Docker Images Fast

Do you remember those days, when you wrote awesome software but you couldn’t install it on someone else’s machine or it crashed there? Though this is never a nice experience, we could always say Nowadays, that’s not an excuse any more due to containerization. Very briefly, with containerization, you pack your application and all necessary …

Markov Chain Analysis and Simulation using Python

Simulating from a Markov Chain. One can simulate from a Markov chain by noting that the collection of moves from any given state (the corresponding row in the probability matrix) forms a multinomial distribution. One can thus simulate from a Markov chain by simulating from a multinomial distribution. One way to simulate from a multinomial …
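As a rough illustration of the idea above, here is a minimal sketch using only the standard library. The three states and the transition matrix `P` are made up for the example, and `simulate_chain` is a hypothetical helper, not code from the original post. Each step draws the next state from the row of the current state, which is a single-trial multinomial (categorical) draw.

```python
import random

# Hypothetical 3-state chain; each row holds the move probabilities out of one state.
STATES = ["sunny", "cloudy", "rainy"]
P = [
    [0.7, 0.2, 0.1],
    [0.3, 0.4, 0.3],
    [0.2, 0.3, 0.5],
]


def simulate_chain(P, start, n_steps, rng):
    """Return the list of visited state indices, beginning with `start`."""
    path = [start]
    for _ in range(n_steps):
        row = P[path[-1]]  # probabilities of moving out of the current state
        # One categorical (single-trial multinomial) draw from that row.
        next_state = rng.choices(range(len(row)), weights=row, k=1)[0]
        path.append(next_state)
    return path


rng = random.Random(0)  # seeded so repeated runs give the same path
path = simulate_chain(P, start=0, n_steps=10, rng=rng)
print([STATES[i] for i in path])
```

Passing the random generator in explicitly keeps the simulation reproducible, which helps when comparing simulated frequencies against the chain's theoretical behaviour.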

Google Colab: Jupyter Lab on steroids (perfect for Deep Learning)

I’m working on a Deep Learning project that I need to deliver in a couple of weeks. Unfortunately, my poor MacBook is having a hard time processing all my project data and the complex models I’m generating on Jupyter Lab, and that is delaying my whole project. Google Colaboratory, better known as Google Colab, …

Are you ready to lead a data science project?

“Looking at a data science project from the perspective of any software development project.” What problem are you compelled to solve using data science? The power in data and the mechanisms to harness this power are now available to us. Identifying the right problem or use case is the first step. There …

Machine Learning — Don’t Just Rely on Your University

To illustrate the difference between formal learning and online courses, let’s look at a few common machine learning concepts and see how they are explained in my university versus how they are approached by Kirill’s course. The following list is a snapshot of what I find challenging to understand while in university but completely debunked …

You Need to Move from Cloud Computing to Edge Computing Now!

This type of infrastructure may be suitable for applications where users can afford to wait for 2 or 3 seconds to get a response. However, this is unsuitable for applications that need to respond faster, and especially those which are looking for real-time action such as in a self-driving car. But even in a more …

Why Your Organization Should Become More Data-Driven

The advantages of understanding and applying the value of data. (Image source: Unsplash.) Machine learning is now useful to organizations of any type and any size, for processes ranging from the routine to the revolutionary. Remaining competitive in this market requires a working knowledge of artificial intelligence and machine learning skills. This piece explains the …

Productionalize Your Machine Learning Model Using Flask And Google App Engine

This small tutorial will help you understand how a trained machine learning model is used in production. Nowadays you can find lots of tutorials, MOOCs and videos for learning Data Science and Machine Learning. But none of them explain what happens to your machine learning model after you train and optimize one at your local …

How Wearables & Biosensors Can Help Researchers Fight Crohn’s Disease Using Machine Learning

(Photo by Luke Chesser on Unsplash.) My boyfriend found out he had Crohn’s disease when he was 20 years old. Just like many of us, he managed the pain until he could not ignore it anymore and avoided going to see a doctor for several weeks. Before meeting him, I had never heard of Crohn’s …

Understanding the Normal Distribution (with Python)

(For the full code, please check out my GitHub here) First, let’s get our imports out of the way:
    import numpy as np
    from scipy.stats import norm
    import matplotlib.pyplot as plt
    import seaborn as sns
Now let’s generate some data. We will assume that the true mean height of a person is 5 feet 6 inches and the …

Deep Learning Algorithms and Brain-Computer Interfaces

As part of a research team, I wanted to explain how deep learning (DL) has lifted the performance of brain-computer interface (BCI) systems significantly in recent years. For those not familiar with brain-computer interfaces, a BCI is a system that translates activity patterns of the human brain into messages or commands to communicate with other devices. …

How Insurance Companies Have Successfully Adopted AI Data and Machine Learning

Is your insurance company doing the best they can for you with the technology available? (Image source: Unsplash.com.) The take-up of A.I. has become a key feature in driving business changes across the insurance journey lifecycle. Early adopters use it to obtain better lead scoring, higher conversion rates, more effective cross-sell and upsell, increased retention, and …

Deep learning has a new friend — Tabular datasets

Going beyond deep learning just for image datasets. When you think of deep learning, tabular datasets do not come first to mind. What we think of first is image datasets. This is because deep learning and image datasets have a special relationship. Deep learning became popular mainly due to its capability to do wonders on images. …

Web Scraping of 10 Online Shops in 30 Minutes with Python and Scrapy

(Photo by Nguyen Bui on Unsplash.) Scrapy Spiders are classes which define how a certain site (or a group of sites) will be scraped, including how to perform the crawl (i.e. follow links) and how to extract structured data from their pages (i.e. scraping items). In other words, Spiders are the place where you define …

Lenovo T495 thoughts

Mac lovers beware: https://www.lenovo.com/us/en/laptops/thinkpad/thinkpad-t-series/T495s/p/22TP2TT495S Irrespective of your particular proclivity to Python, R, Java or, I guess, even JavaScript, you will need a mobile workstation for today’s 24x7 corporate cultures. Real computing is now in the Cloud and you only need a browser or SSH terminal to play with that. Or so the Cloud war stories …

Jupyter notebooks tips and tricks

There are many great extensions in jupyter_contrib_nbextensions. You should be using Jupyter Lab instead, though. First you need to install jupyter_contrib_nbextensions and then you can install various useful extensions.
    pip install jupyter_contrib_nbextensions
    jupyter contrib nbextension install --user
These are the ones I love: code_prettify, backed by autopep8, is great for reformatting code in notebook code cells …

Time Series Analysis From Scratch in Python: Part 2

Resampling, in a nutshell, is a way of aggregating the data. Here I’m using ‘M’ as a resampling rule, which stands for month, and I’m using mean as an aggregation function. That will do the following: 1) Fetch all prices for a given month. 2) Calculate the mean by dividing the sum of all prices …
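To make the aggregation concrete, here is a small standard-library sketch of what a monthly mean resample does under the hood. The prices are invented for illustration and `monthly_mean` is a hypothetical helper, not the pandas call used in the post; it only mirrors the two steps described above (group by month, then average).

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily prices (illustrative values, not real market data).
prices = [
    (date(2020, 1, 2), 10.0),
    (date(2020, 1, 3), 12.0),
    (date(2020, 2, 3), 20.0),
    (date(2020, 2, 4), 22.0),
    (date(2020, 2, 5), 24.0),
]


def monthly_mean(observations):
    """Group (date, price) pairs by calendar month and average each group."""
    buckets = defaultdict(list)
    for day, price in observations:
        # Step 1: fetch all prices that fall in the same (year, month).
        buckets[(day.year, day.month)].append(price)
    # Step 2: mean = sum of the month's prices divided by their count.
    return {month: sum(vals) / len(vals) for month, vals in sorted(buckets.items())}


result = monthly_mean(prices)
print(result)
```

With these sample values, January averages to 11.0 and February to 22.0, so five daily rows collapse into two monthly rows.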

5 Ideas for Data Science projects you can try now

Start building your GitHub portfolio of open-source projects now. I’m a huge fan of building your own portfolio of open-source GitHub projects as a way to grow as a Data Scientist. Here’s a list of projects you can try for yourself if you’re into practical data science. Those will definitely get you hired! Build a …

Why AI Should Be Handled With Caution

How AI Could Make or Break the Global Economy. (Image source: Pixabay.) Artificial intelligence (AI) is everywhere, generating excitement about how it could transform our lives in multiple ways. Yet the technology is very likely to be disruptive. Businesses and policymakers must try to capture the full value of what AI has to offer while …

Data Science vs. Artificial Intelligence vs. Machine Learning vs. Deep Learning

Artificial intelligence, or AI for short, has been around since the mid 1950s. It’s not necessarily new. But it became super popular recently because of the advancements in processing capabilities. Back in the 1900s, there just wasn’t the necessary computing power to realise AI. Today, we have some of the fastest computers the world has …

Using RandomForest to Predict Medical Appointment No-shows

(Photo by Adhy Savala on Unsplash.) No-shows, or patients who miss their scheduled appointments, are common and costly to healthcare institutions. A US study found that up to 30% of patients miss their appointments, and $150 billion is lost every year because of them. Identifying potential no-shows can help healthcare institutions pursue targeted interventions (e.g. …

Optimizing Blackjack Strategy through Monte Carlo Methods

Fundamentals of Reinforcement Learning. Reinforcement Learning has taken the AI world by storm. From AlphaGo to AlphaStar, increasing numbers of traditionally human-dominated activities have now been conquered by AI agents powered by reinforcement learning. Briefly, these achievements rely on the optimization of an agent’s actions within an environment to achieve maximal reward. Over the past …

An Introduction to Binomials & Inference

Inference is about drawing conclusions about a greater population via some sample of observed data. For example, you have some sample of the country’s opinion on the president and you’d like to make some conclusions about the population at large. Obviously you won’t be asking every single citizen; rather, you will make an inference about …
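A quick sketch of the polling example above: given a sample, estimate the population proportion and attach a rough 95% interval. The sample numbers are made up, `proportion_ci` is a hypothetical helper, and the normal-approximation (Wald) interval is an assumption chosen for brevity; the article itself may use a different method.

```python
import math


def proportion_ci(successes, n, z=1.96):
    """Point estimate and normal-approximation interval for a proportion.

    z=1.96 corresponds to roughly 95% coverage under the normal approximation.
    """
    p_hat = successes / n
    std_err = math.sqrt(p_hat * (1 - p_hat) / n)  # binomial standard error
    return p_hat, p_hat - z * std_err, p_hat + z * std_err


# Hypothetical poll: 540 of 1000 sampled citizens approve.
p_hat, low, high = proportion_ci(540, 1000)
print(f"estimate={p_hat:.3f}, interval=({low:.3f}, {high:.3f})")
```

The approximation is usually reasonable for large samples with proportions away from 0 and 1; for small samples other intervals tend to behave better.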

Deep Learning for Detecting Pneumonia from X-ray Images

An end-to-end pipeline for pneumonia detection from X-ray images. The risk of pneumonia is immense for many, especially in developing nations where billions face energy poverty and rely on polluting forms of energy. The WHO estimates that over 4 million premature deaths occur annually from household air pollution-related diseases including pneumonia. Over 150 …

Exploring Textual Data using LDA

Make sense of unstructured text data by applying machine learning principles. I recently completed my first machine learning project at work and decided to apply the methods used in that project to a project of my own. The project I completed at work revolved around automatically classifying textual data using Latent Dirichlet Allocation (LDA). LDA …

A Journey into BigQuery Fuzzy Matching — 4 of [1, ∞) — A Tribute to FuzzyWuzzy

Okay, I know in the last article I said we would use this one to start going into adding address elements and match groups (and I promise they’re still on their way), but I wanted to take a slight detour and add another set of functions before we get there. Let’s revisit the Levenshtein Distance …

Introduction to Statistical Methods in AI

Statistical Learning is a set of tools for understanding data. These tools broadly come under two classes: supervised learning & unsupervised learning. Generally, supervised learning refers to predicting or estimating an output based on one or more inputs. Unsupervised learning, on the other hand, provides a relationship or finds a pattern within the given data …

An Introduction to Enterprise Artificial Intelligence

The Importance of Data Management for A.I. and Machine Learning Implementation in the Business World. (Image source: Unsplash.com.) As business moves into a new decade, the 2020s, enterprises continue to look for the leg up that will push them above the competition. For years now, artificial intelligence has been one of the critical technologies that can …

Scraping Web Articles Using NewsAPI in Python

This post is meant to provide a gentle introduction to the scraping and use of the popular news web scraping tool NewsAPI. When scraping relevant news articles, there are a variety of options to choose from. Bing News Search, Bloomberg, and New York Times all have very useful API programs. However, NewsAPI is the jack-of …

Normality testing: The graphical way

The entire code is provided in a gist below. There is a certain set of assumptions that apply when working with regression problems. Take, for instance, linear regression, where we have the following assumptions: 1) We have a linear relationship between the independent variable and the target variable. 2) Our data is homoscedastic. 3) Residuals have …

Learn NLP the practical way

Getting quickly from none to done. Writing is very different today compared with fifteen years ago. To be published used to mean in print, which constrained space but was less hasty. There were more gatekeepers, and way less content competing for readers’ attention. It only took a few years, but technology completely overhauled the economics …

How to create a machine learning dataset from scratch?

My grandmother’s cookbook meets machine learning, part I. (Figure 1: My grandmother’s old German cookbook, “Praktisches Kochbuch” by Henriette Davidis.) My grandmother was an outstanding cook. So when I recently came across her old cookbook I tried to read through some of the recipes, hoping I could recreate some of the dishes I …

Predictive spatial modeling of loan repayment

Over 350 million people in Africa lack access to electricity. Fenix International is addressing this by building affordable lease-to-own solar home systems. And they provide upfront, flexible financing. The basic ReadyPay Solar Power System produces electricity for lighting and phone charging and can be upgraded to power radios, televisions, and cookstoves. Customers pay an initial …

Importance of Innovation Indicators utilizing Machine Learning

[1] R. Carnegie and Business Council of Australia, Managing the innovating enterprise: Australian companies competing with the world’s best. Business Library, 1993, p. 427, ISBN: 1863501517. [Online]. Available: https://catalogue.nla.gov.au/Record/1573090. [2] H. Tohidi and M. M. Jabbari, “The important of Innovation and its Crucial Role in Growth, Survival and Success of Organizations”, Procedia Technology, vol. …

Using Spark to Predict Churn

Distributions of Numerical Features. Now we have 16 features in total (excluding the userId and label (churn) columns).
root
 |-- userId: string (nullable = true)
 |-- label: integer (nullable = true)
 |-- gender: string (nullable = true)
 |-- last_level: string (nullable = true)
 |-- n_session: long (nullable = false)
 |-- reg_days: integer (nullable = true)
 |-- n_about_per_hour: double (nullable = true)
 |-- n_error_per_hour: double …

Data Science jobs

(Image: https://en.wikipedia.org/wiki/Big_data#/media/File:Big_Data.png) Running an analytics job on social data, streaming in real time, globally, is probably a petabyte (PB) discussion. An end-user workstation is part of the technology that takes part in the job. Weak off-the-shelf computers. Workstations, either laptops or desktops, are developed and marketed based on a defined set of use cases. …

Complete Data Science Project Template with Mlflow for Non-Dummies.

Best practices for everyone working either locally or in the cloud, from start-up ninja to big enterprise teams. Data science has come a long way as a field and business function alike. There are now cross-functional teams working on algorithms all the way to full-stack data science products. (Image: Kenneth Jensen [CC BY-SA 3.0].) With the …

Product Category Prediction with Time-series Using ARIMA

On to the application. First, we must determine whether the data is stationary, which can be done with an Augmented Dickey-Fuller test. In the test, we want to see if we can reject the null hypothesis (𝐻₀), which states that the time series is not stationary. Second, we can perform a mathematical transformation using the rolling mean …

An Exploration of Human Genetic Variants In The ClinVar Database

“What lies at the heart of every living thing is not a fire, not warm breath, not a ‘spark of life’. It is information, words, instructions. If you want to understand life, don’t think about vibrant, throbbing gels and oozes, think about information technology.” (Richard Dawkins) As Dawkins described in his ever-vocal and oft-mocking exclamation, …

Why AutoML is An Essential New Tool For Data Scientists

Machine learning (ML) is the current paradigm for modeling statistical phenomena by harnessing algorithms that exploit computer intelligence. It is commonplace to build ML models that predict housing prices, aggregate users by their potential marketing interests, and use image recognition techniques to identify brain tumors. However, up until now these models have required scrupulous …

Entropy — The Pillar of both Thermodynamics and Information Theory

Entropy is a vague yet powerful term that forms the backbone of many key ideas in Thermodynamics and Information Theory. It was first identified by physical scientists in the 19th century and acted as a guiding principle for many of the Industrial Revolution’s revolutionary technologies. However, the term also helped spark the Information Age when …

Knowledge Distillation — A technique developed for compacting and accelerating Neural Nets

Intuition. The red plot is of the large model (teacher), the blue plot is of the student network trained without distillation, and the purple plot is of the student network trained using distillation. Our main objective is to train a model that could generate good results on the test dataset. To do this, we train a …

Streamlit 101: An in-depth introduction

Deep dive into Streamlit with Airbnb NYC data. Streamlit is an awesome new tool that allows engineers to quickly build highly interactive web applications around their data, machine learning models, and pretty much anything. The best thing about Streamlit is it doesn’t require any knowledge of web development. If you know Python, you’re good to …

Illustrated: Self-Attention

0. What is self-attention? If you’re wondering whether self-attention is similar to attention, then the answer is yes! They fundamentally share the same concept and many common mathematical operations. A self-attention module takes in n inputs, and returns n outputs. What happens in this module? In layman’s terms, the self-attention mechanism allows the inputs to …
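To see the "n inputs in, n outputs out" behaviour in code, here is a minimal pure-Python sketch of scaled dot-product self-attention. It is a simplification under stated assumptions: the queries, keys and values are the inputs themselves (no learned weight matrices, which a real module would have), the three input vectors are arbitrary, and `self_attention` is a hypothetical helper name.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(inputs):
    """Scaled dot-product self-attention with identity projections.

    Assumption: query = key = value = input; real modules learn separate
    projection matrices for each of the three roles.
    """
    d = len(inputs[0])
    outputs = []
    for q in inputs:
        # Score this input against every input (itself included).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in inputs]
        weights = softmax(scores)  # attention weights, summing to 1
        # Output = weighted sum of all value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, inputs)) for j in range(d)]
        outputs.append(out)
    return outputs


inputs = [[1.0, 0.0, 1.0, 0.0], [0.0, 2.0, 0.0, 2.0], [1.0, 1.0, 1.0, 1.0]]
outputs = self_attention(inputs)
for row in outputs:
    print([round(x, 3) for x in row])
```

Each output is a weighted blend of all the inputs, which is how every position gets to "look at" the others.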