Flask: An Easy Access Door to API development

Photo by Chris Ried on Unsplash. The world has gone through a huge transition: from separating pieces of code into functions in procedural languages to the development of libraries; from RPC calls to Web Service specifications in Service-Oriented Architecture (SOA), such as SOAP and REST. This has paved the way to Web APIs and microservices, …

Deep Learning Approach for Separating Fast and Slow Components

Some Background. (A slide deck for this work can be found at https://speakerdeck.com/jchin/decomposing-dynamics-from-different-time-scale-for-time-lapse-image-sequences-with-a-deep-cnn) I left my job as a Scientific Fellow at PacBio after a 9-year venture helping to make single-molecule sequencing useful for the scientific community (see my story about the first couple of years at PacBio there). Most of my technical/scientific work had something to …

Combining supervised learning and unsupervised learning to improve word vectors

To achieve state-of-the-art results in NLP tasks, researchers try many ways to let machines understand language and solve downstream tasks such as textual entailment and semantic classification. OpenAI released a new model named Generative Pre-Training (GPT). After reading this article, you will understand: Finetuned Transformer LM Design, Architecture, Experiments, Implementation, Take Away. Finetuned Transformer …

How to deploy your website to a custom domain

This blog documents the steps needed to deploy a website written in Python with the Flask framework to a custom domain using Heroku and NameCheap. Flask is a micro-framework that allows us to use Python in the back-end to interact with our front-end code in HTML/CSS or JavaScript to build websites. People also use other …

How to do Deep Learning on Graphs with Graph Convolutional Networks

Part 2: Semi-Supervised Learning with Spectral Graph Convolutions. Machine learning on graphs is a difficult task due to the highly complex, but also informative graph structure. This post is the second in a series on how to do deep learning on graphs with Graph Convolutional Networks (GCNs), a powerful type of neural network designed to …

Machine Learning Project: Predicting Boston House Prices With Regression

Introduction. In this project, we will develop and evaluate the performance and the predictive power of a model trained and tested on data collected from houses in Boston’s suburbs. Once we obtain a good fit, we will use this model to predict the monetary value of a house in that location. A …

Lessons Learned from Kaggle’s Airbus Challenge.

The challenge banner. Over the last three months, I have participated in the Airbus Ship Detection Kaggle challenge. As evident from the title, it is a computer vision detection (segmentation, to be more precise) competition proposed by Airbus (its satellite data division) that consists of detecting ships in satellite images. Before I started this challenge, …

I wrote a program that speaks like the collective hive-mind of The Straits Times Forum

Results. I very diligently studied thousands of the Straits Times Forum Letters and was able to create a second-order Markov chain capturing the “style” of the forum letters. I then generated my own articles using the above-mentioned second-order Markov chain; you can play with it here: Straits Times Forum Letter Generator. Here are some of my …
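For readers unfamiliar with the technique named in this teaser, here is a minimal sketch of a second-order Markov chain text generator. It is not the author's program: the function names `build_chain` and `generate`, the whitespace tokenisation, and the tiny corpus in the usage note are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def build_chain(text):
    """Map each pair of consecutive words to the words observed right after it."""
    words = text.split()  # assumption: plain whitespace tokenisation
    chain = defaultdict(list)
    for first, second, third in zip(words, words[1:], words[2:]):
        chain[(first, second)].append(third)
    return chain


def generate(chain, length=20, seed=None):
    """Walk the chain from a random starting pair, emitting at most `length` words."""
    rng = random.Random(seed)
    state = rng.choice(sorted(chain))  # requires a corpus of at least three words
    output = list(state)
    while len(output) < length:
        followers = chain.get(state)
        if not followers:  # dead end: this pair only occurs at the end of the corpus
            break
        next_word = rng.choice(followers)
        output.append(next_word)
        state = (state[1], next_word)  # slide the two-word window forward
    return " ".join(output)
```

Calling `generate(build_chain(corpus), length=50)` on a large body of letters would produce text in which every word is conditioned on the two words before it, which is what "second-order" means here. Repeated followers are kept in the list on purpose, so more frequent continuations are sampled more often.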

Statistics is the Grammar of Data Science — Part 1

Data Types. We cannot go more basic than this: Data is split into three categories, based on which a Data Scientist chooses how to further analyse and process it: #1. Numerical data represents some quantifiable information that is measurable and is further divided into two subcategories: Discrete data, which is integer-based (e.g. number of …

A Common Data Science Mistake: Prediction/Recommendation by Manipulating Model Inputs

“We trained a machine learning model with high performance. However, it did not work and was not useful in practice.” I have heard this sentence several times, and each time I was eager to find out the reason. There could be different reasons that a model failed to work in practice. As these issues are …

Welcome to the Forest. London Borough of Culture 2019 Twitter Analysis

Welcome to the Forest. We’ve got fun and games! Last weekend, from Friday 11th January to Sunday 13th January 2019, Waltham Forest, a Borough of London, threw a huge three-day event to celebrate being chosen as the first ever Mayor’s London Borough of Culture. The event was called Welcome to the Forest and was described as …

AI or marketing hype? (My first lunch and learn at work)

I’m the only data scientist at my company. It allows me to have a huge amount of breadth in my work, which is great, but it leaves me few people to really nerd out with. I mean the type of nerding out that’s specific to data science; there’s definitely a lot of nerding out that …

Roadmap for multi-class sentiment analysis with deep learning

A practical guide to creating incrementally better models. Sentiment analysis quickly gets difficult as we increase the number of classes. For this blog, we’ll have a look at what difficulties you might face and how to get around them when you try to solve such a problem. Instead of prioritizing theoretical rigor, I’ll focus on …

Ridesharing my way — Uber

USA. Uber only provides you with the trip start and end coordinates. I calculated the haversine distance between the coordinates. This provided me with a lower-bound estimate for the ride distance. Haversine distance is essentially the analogue of Euclidean distance on a sphere (the great-circle distance). It takes into consideration the latitude and longitude to calculate the straight line …
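The haversine calculation mentioned in this teaser can be sketched as follows. This is an illustration rather than the author's code: the function name `haversine_km` and the mean Earth radius of 6371 km are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    delta_phi = phi2 - phi1
    delta_lambda = math.radians(lon2 - lon1)
    # Haversine formula: a is the squared half-chord length between the points.
    a = (math.sin(delta_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(delta_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
```

Because a car has to follow roads, the value returned for a trip's start and end coordinates can only underestimate the distance actually driven, which is why it serves as a lower bound.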

Rat City: Visualizing New York City’s Rat Problem

Is Your Neighborhood a Rat Hotspot too? Check out the interactive rat sighting map here: https://nbviewer.jupyter.org/github/lksfr/rats_nyc/blob/master/rats_for_nbviewer_only.ipynb Introduction. If you have ever spent a significant amount of time in New York City, you have very likely come across rats. Regardless of whether you are waiting for the subway or strolling through Washington Square Park, your chances of running …

Simply deep learning: an effortless introduction

Conquer artificial neural network basics in less than 15 minutes. This article is part of the Intro to Deep Learning: Neural Networks for Novices, Newbies, and Neophytes Series. Photo by ibjennyjenny on Pixabay. What is an artificial neural network, how does it work, and what does it have to do with deep learning? Let’s start with a …

Startup Funding, Investments, and Acquisitions

Exploratory Data Analysis (EDA). Funding. I am just going to jump straight in and figure out whether we can answer our first question. Well, we can break it down a bit since there are a number of parts to this question. Let’s first look at the average amount funded, total funding and the number of …

Gentle Introduction of XGBoost Library

If things don’t go your way in predictive modeling, use XGBoost. The XGBoost algorithm has become the ultimate weapon of many data scientists. It’s a highly sophisticated algorithm, powerful enough to deal with all sorts of irregularities of data. In this article, you will discover XGBoost and get a gentle introduction to what it is, where …

From FaceApp to Deepfakes

Thoughts on appropriation and AI. Considering my background in both photography and Gender Studies, perhaps it’s no surprise that I became interested in the works of people like Yasumasa Morimura and Cindy Sherman. Both artists used self-portraiture to explore the performance of identity, often referencing other media. Sherman became known for her series Untitled Film Stills, …

Prediction task with Multivariate TimeSeries and VAR model.

Time Series data can be confusing, but very interesting to explore. The reason this sort of data grabbed my attention is that it can be found in almost every business (sales, deliveries, weather conditions etc.). For instance: how to explore weather effects on NYC using Google BigQuery (link). The main steps in the task: Problem …

Computer Designed Humans — The AI Revolution in the Test Tube

Forget self-driving cars and voice-controlled speakers: the most dramatic effects of artificial intelligence will be seen in a very different area in the coming years. These days there are always reports from the world of science whose cross connections and consequences are not immediately obvious. A current example can be found in the latest edition …

Pricing diamonds using scatterplots and predictive models

My last post railed against the bad visualizations that people often use to plot quantitative data by groups, and pitted pie charts, bar charts and dot plots against each other for two visualization tasks. Dot plots came out on top. I argued that this is because humans are good at the cognitive task of comparing …

Implementing a Corporate AI Strategy

There is a cost to moving too slowly, almost as much as moving too fast. In the wake of this generation’s digital transformation, machine learning and the greater promise of artificial intelligence create wonder in people’s minds and effervescence within organizations. And the attraction to the field is justified: troves of process improvements are announced every day, …

A Crash course on proving the Halting Problem

Explained in an informally rigorous way. A plan for Charles Babbage’s Analytical Engine circa 1840, which would have been a Turing-complete mechanical computer had it ever been built. CC BY 4.0. Suppose Jeff Bezos announced over Twitter: “I will offer $1 Billion to the person who can write a program that can test any and all …

The easy way to use Maxmind GeoIP with Redshift

Photo by Westley Ferguson on Unsplash. It always starts with an innocent observation. “We get a lot of traffic from Boston,” your boss remarks. You naturally throw out a guess or two and discuss why that might be. Until your boss drops the bomb: “Can you dig into that?” Darn it. You walked right into that …

What is data?

Musings on information, memory, analytics, and distributions. Everything our senses perceive is data, though its storage in our cranial wet stuff leaves something to be desired. Writing it down is a bit more reliable, especially when we write it down on a computer. When those notes are well-organized, we call them data… though I’ve seen …

Autoencoders for the compression of stock market data

A Pythonic exploration of diverse neural-network autoencoders to reduce the dimensionality of Bitcoin price time series. Stock market data space is high-dimensional and, as such, algorithms that try to exploit potential patterns or structure in the price formation can suffer from the so-called “curse of dimensionality”. In this short article, we will explore the potential …

Predicting Breast Cancer with Decision Trees

How to implement decision trees with bagging, boosting and random forest to predict breast cancer from routine blood tests. Photo by Hello I’m Nik on Unsplash. In a previous post, I introduced the theory of decision trees and how their performance can be improved using bagging, boosting or random forests. Now, we implement these techniques to predict …

Recommender Systems and Hyper-parameter tuning

Photo by rawpixel on Unsplash. The (often) forgotten child of Machine Learning. Everyone with an internet connection has been subjected to a recommender system (RS). Almost all media services have a particular section where the system recommends things to you, whether a movie on Netflix, a product to buy on Amazon, a playlist …

Data Science and the Paradox of Predictions

Paradox by Nick Youngson. How the act of knowing changes what we know. Many data science projects are a hunt for knowledge. As history has taught us through the years, the mere act of knowing can change what it is we believe to know. Professor Harari explores this topic in Homo Deus with the skill we’ve become …

On the role of technology in Regulatory Modernization

The challenge with regulations. Regulations are instruments of legislative power and have the force of law. They carry out the intent of corresponding Acts which set out requirements that businesses must adhere to. Regulations are necessary to protect the health, safety and security of individual consumers and the environment as well as to support commerce …

Improve your workflow by managing your machine learning experiments using Sacred

Model tuning is my least favorite task as a Data Scientist. I hate it. I think it’s because managing the experiments is something that always gets very messy. While searching for tools to help me with that, I saw a lot of people mentioning Sacred, so I decided to give it a try. In this …

Deep Q-Network Implementation with SONY’s NNabla

WHAT IS NNABLA? SONY released Neural Network Libraries, “NNabla” for short. NNabla is device-ready and trains quickly on GPUs thanks to efficient memory management. The most interesting feature is that NNabla allows both define-by-run and define-and-run by default. For example, define-and-run style code looks like the following. # build static graph like tensorflow x = nn.Variable((2, …

Implementing a ResNet model from scratch.

When implementing the ResNet architecture in a deep learning project I was working on, it was a huge leap from the basic, simple convolutional neural networks I was used to. One prominent feature of ResNet is that it utilizes a micro-architecture within its larger macroarchitecture: residual blocks! I decided to look into the model myself …

Transformer-XL Explained: Combining Transformers and RNNs into a State-of-the-art Language Model

Summary of “Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context”. Language modeling has become an important NLP technique thanks to the ability to apply it to various NLP tasks, such as machine translation and topic classification. Today, there are two leading architectures for language modeling: Recurrent Neural Networks (RNNs) and Transformers. While the former handles the …

Machine learning with the “diabetes” data set in R

Decision tree. Following the same logic as used for choosing logistic regression over linear regression, we’ll be building a classification tree rather than a regression tree. Decision trees construct “nodes” at which data is separated, eventually terminating in “leaves” which give you the model’s assigned class. Here again, I implemented 100 folds of training-test splits, …

Solving Travelling Salesperson Problems with Python

How to use randomized optimization algorithms to solve travelling salesperson problems with Python’s mlrose package. mlrose provides functionality for implementing some of the most popular randomization and search algorithms, and applying them to a range of different optimization problem domains. In this tutorial, we will discuss what is meant by the travelling salesperson problem and step …
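To give a flavour of randomized optimization on a travelling salesperson problem, here is a minimal standard-library sketch of random-restart hill climbing with segment reversals (2-opt moves). It deliberately does not use mlrose and should not be read as that package's API; the function names, the restart count and the square of four cities in the usage note are hypothetical.

```python
import math
import random


def tour_length(tour, coords):
    """Total length of the closed tour that visits coords in the given order."""
    return sum(
        math.dist(coords[tour[i]], coords[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


def hill_climb_tsp(coords, restarts=20, seed=0):
    """Random-restart hill climbing; returns (best_tour, best_length)."""
    rng = random.Random(seed)
    n = len(coords)
    best_tour, best_len = None, float("inf")
    for _ in range(restarts):
        tour = list(range(n))
        rng.shuffle(tour)  # each restart begins from a random tour
        current = tour_length(tour, coords)
        improved = True
        while improved:  # stop when no reversal shortens the tour
            improved = False
            for i in range(n - 1):
                for j in range(i + 1, n):
                    # 2-opt move: reverse the segment between positions i and j.
                    candidate = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
                    candidate_len = tour_length(candidate, coords)
                    if candidate_len < current - 1e-12:
                        tour, current = candidate, candidate_len
                        improved = True
        if current < best_len:
            best_tour, best_len = tour, current
    return best_tour, best_len
```

For example, `hill_climb_tsp([(0, 0), (1, 1), (1, 0), (0, 1)])` searches for the shortest closed route through the four corners of a unit square. Hill climbing only guarantees a local optimum, which is why the sketch restarts from several random tours and keeps the best result.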

How I built a Cannabis Recommendation app using Topic Models and Latent Dirichlet Allocation (LDA)

How I built www.rightstrain.co, a cannabis recommendation tool used by online dispensaries. Background: On October 17th, 2018, cannabis became legal in Canada. As an entrepreneur, I’m always reading about the latest tech startups, following how markets are developing and sniffing out emerging opportunities. As a data scientist, I’m always looking for data driven solutions …

A journey into Convolutional Neural Network visualization

Francesco Saverio Zuppichini. There is one famous urban legend about computer vision. Around the 80s, the US military wanted to use neural networks to automatically detect camouflaged enemy tanks. They took a number of pictures of trees without tanks and then pictures of the same trees with tanks behind them. The results were impressive. So …

The 4 Types of Johnny Depp Movies

Process of Clustering Movies. After gathering filmography data from IMDb, box office revenue data from Box Office Mojo, and critic rating data from Metacritic, it was time to prepare an algorithm that could adequately sort movies into different groups. To effectively separate movies into distinct groups, I implemented a clustering algorithm known as a Gaussian …

Automating project management with deep learning

Introduction. In the data-driven future of project management, project managers will be augmented by artificial intelligence that can highlight project risks, determine the optimal allocation of resources and automate project management tasks. For example, many organisations require project managers to provide regular project status updates as part of the delivery assurance process. These updates typically …

Game Theory for Data Scientists

The Theoretical Foundations of Multi-Agent AI Systems. Games are playing a key role in the evolution of artificial intelligence (AI). For starters, game environments are becoming a popular training mechanism in areas such as reinforcement learning or imitation learning. In theory, any multi-agent AI system can be subjected to gamified interactions between its participants. The branch of …