How to deploy your website to a custom domain

This blog documents the steps needed to deploy a website written in Python with the Flask framework to a custom domain using Heroku and NameCheap. Flask is a micro-framework that allows us to use Python in the back-end to interact with our front-end code in HTML/CSS or JavaScript to build websites. People also use other … Read more How to deploy your website to a custom domain

How to do Deep Learning on Graphs with Graph Convolutional Networks

Part 2: Semi-Supervised Learning with Spectral Graph Convolutions Machine learning on graphs is a difficult task due to the highly complex, but also informative graph structure. This post is the second in a series on how to do deep learning on graphs with Graph Convolutional Networks (GCNs), a powerful type of neural network designed to … Read more How to do Deep Learning on Graphs with Graph Convolutional Networks

Machine Learning Project: Predicting Boston House Prices With Regression

Introduction In this project, we will develop and evaluate the performance and the predictive power of a model trained and tested on data collected from houses in Boston’s suburbs. Once we obtain a good fit, we will use this model to predict the monetary value of a house in that location. A … Read more Machine Learning Project: Predicting Boston House Prices With Regression

My presentations on ‘Elements of Neural Networks & Deep Learning’ -Parts 6,7,8

This is the final set of presentations in my series ‘Elements of Neural Networks and Deep Learning’. This set follows the earlier 2 sets of presentations, namely (1) My presentations on ‘Elements of Neural Networks & Deep Learning’ -Part1,2,3 and (2) My presentations on ‘Elements of Neural Networks & Deep Learning’ -Parts 4,5. In this final set of … Read more My presentations on ‘Elements of Neural Networks & Deep Learning’ -Parts 6,7,8

Lessons Learned from Kaggle’s Airbus Challenge.

Over the last three months, I have participated in the Airbus Ship Detection Kaggle challenge. As evident from the title, it is a detection computer vision (segmentation to be more precise) competition proposed by Airbus (its satellite data division) that consists of detecting ships in satellite images. Before I start this challenge, … Read more Lessons Learned from Kaggle’s Airbus Challenge.

I wrote a program that speaks like the collective hive-mind of The Straits Times Forum

Results I very diligently studied thousands of the Straits Times Forum Letters and was able to create a second-order Markov chain capturing the “style” of the forum letters. I then generated my own articles using the above-mentioned second-order Markov chain — you can play with it here: Straits Times Forum Letter Generator. Here are some of my … Read more I wrote a program that speaks like the collective hive-mind of The Straits Times Forum
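As a rough illustration of the idea in the teaser above, here is a minimal sketch of a second-order Markov chain text generator in Python. It is not the author's code: the function names `build_chain` and `generate`, the whitespace tokenization, and the stopping rule are all assumptions made for this example. In a second-order chain, the next word is drawn based on the previous two words.

```python
import random
from collections import defaultdict


def build_chain(text):
    """Map each pair of consecutive words to the words seen right after it."""
    words = text.split()  # assumption: simple whitespace tokenization
    chain = defaultdict(list)
    for first, second, follower in zip(words, words[1:], words[2:]):
        chain[(first, second)].append(follower)
    return chain


def generate(chain, seed, length=20, rng=None):
    """Extend the two-word seed until `length` words or a dead end is reached."""
    rng = rng or random.Random()
    out = list(seed)
    while len(out) < length:
        followers = chain.get((out[-2], out[-1]))
        if not followers:
            break  # this word pair never appeared in the training text
        # Repeated followers stay in the list, so frequent ones are drawn more often.
        out.append(rng.choice(followers))
    return " ".join(out)
```

Calling `build_chain` on a corpus of letters and then `generate` with any two-word seed that occurs in the corpus produces text that mimics local word patterns. A larger corpus gives more varied output.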

Statistics is the Grammar of Data Science — Part 1

Data Types We cannot go more basic than this: Data is split into three categories, based on which a Data Scientist chooses how to further analyse and process it: #1. Numerical data represents some quantifiable information that is measurable and is further divided into two subcategories: Discrete data, which is integer based (e.g. number of … Read more Statistics is the Grammar of Data Science — Part 1

A Common Data Science Mistake: Prediction/Recommendation by Manipulating Model Inputs

“We trained a machine learning model with high performance. However, it did not work and was not useful in practice.” I have heard this sentence several times, and each time I was eager to find out the reason. There could be different reasons that a model failed to work in practice. As these issues are … Read more A Common Data Science Mistake: Prediction/Recommendation by Manipulating Model Inputs

Welcome to the Forest. London Borough of Culture 2019 Twitter Analysis

Welcome to the Forest. We’ve got fun and games! Last weekend, from Friday 11th January to Sunday 13th January 2019, Waltham Forest, a Borough of London, threw a huge three-day event to celebrate being chosen as the first ever Mayor’s London Borough of Culture. The event was called Welcome to the Forest and was described as … Read more Welcome to the Forest. London Borough of Culture 2019 Twitter Analysis

A Newbie’s Guide to Making A Pull Request (for an R package)

I had the wonderful opportunity to participate in the {tidyverse} Developer Day the day after rstudio::conf 2019 officially wrapped up. One of the objectives of the event was to encourage open-source contributor newbies (like me) to gain some experience, namely through submitting pull requests to address issues with {tidyverse} packages. Having only ever worked with my own packages/repos before, I found this was … Read more A Newbie’s Guide to Making A Pull Request (for an R package)

GeoPAT2: Entropy calculations for local landscapes

GeoPAT 2 is open-source software written in C and dedicated to pattern-based spatial and temporal analysis. Four main types of analysis available in GeoPAT 2 are (i) search, (ii) change detection, (iii) segmentation, and (iv) clustering. However, additional applications are also possible, including extracting information about spatial patterns. Global landscape diversity (based on Shannon entropy of … Read more GeoPAT2: Entropy calculations for local landscapes

AI or marketing hype? (My first lunch and learn at work)

I’m the only data scientist at my company. It allows me to have a huge amount of breadth in my work, which is great, but it leaves me few people to really nerd out with. I mean the type of nerding out that’s specific to data science- there’s definitely a lot of nerding out that … Read more AI or marketing hype? (My first lunch and learn at work)

Roadmap for multi-class sentiment analysis with deep learning

A practical guide to create incrementally better models Sentiment analysis quickly gets difficult as we increase the number of classes. For this blog, we’ll have a look at what difficulties you might face and how to get around them when you try to solve such a problem. Instead of prioritizing theoretical rigor, I’ll focus on … Read more Roadmap for multi-class sentiment analysis with deep learning

Ridesharing my way — Uber

USA Uber only provides you with the trip start and end coordinates. I calculated the haversine distance between the coordinates. This provided me with a lower bound estimate for the ride distance. Haversine distance is essentially the counterpart of Euclidean distance on a sphere: the shortest distance between two points along the surface. It takes into consideration the latitude and longitude to calculate the straight line … Read more Ridesharing my way — Uber
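The haversine calculation mentioned in the teaser above can be sketched in a few lines of Python. This is an illustrative version, not the author's code: the function name `haversine_km` and the mean Earth radius of 6371 km are assumptions, and coordinates are taken to be in decimal degrees.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius; the true radius varies slightly


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    delta_phi = phi2 - phi1
    delta_lambda = math.radians(lon2 - lon1)
    a = (math.sin(delta_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(delta_lambda / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot near antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))
```

Because roads rarely follow the great-circle path, the result is best read as a lower bound on the distance actually driven, as the excerpt notes.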

Rat City: Visualizing New York City’s Rat Problem

Is Your Neighborhood a Rat Hotspot too? Check out the interactive rat sighting map here: https://nbviewer.jupyter.org/github/lksfr/rats_nyc/blob/master/rats_for_nbviewer_only.ipynb Introduction If you have ever spent a significant amount of time in New York City, you have very likely come across rats. Regardless of whether you are waiting for the subway or strolling through Washington Square Park, your chances of running … Read more Rat City: Visualizing New York City’s Rat Problem

Simply deep learning: an effortless introduction

Conquer artificial neural network basics in less than 15 minutes This article is part of the Intro to Deep Learning: Neural Networks for Novices, Newbies, and Neophytes Series. Photo by ibjennyjenny on Pixabay What is an artificial neural network, how does it work, and what does it have to do with deep learning? Let’s start with a … Read more Simply deep learning: an effortless introduction

Startup Funding, Investments, and Acquisitions

Exploratory Data Analysis (EDA) Funding I am just going to jump straight in and figure out whether we can answer our first question. Well, we can break it down a bit since there are a number of parts to this question. Let’s first look at the average amount funded, total funding and the number of … Read more Startup Funding, Investments, and Acquisitions

Gentle Introduction of XGBoost Library

If things don’t go your way in predictive modeling, use XGBoost. The XGBoost algorithm has become the ultimate weapon of many data scientists. It’s a highly sophisticated algorithm, powerful enough to deal with all sorts of irregularities of data. In this article, you will discover XGBoost and get a gentle introduction to what it is, where … Read more Gentle Introduction of XGBoost Library

From FaceApp to Deepfakes

Thoughts on appropriation and AI Considering my background in both photography and Gender Studies, perhaps it’s no surprise that I became interested in the works of people like Yasumasa Morimura and Cindy Sherman. Both artists used self-portraiture to explore the performance of identity, often referencing other media. Sherman became known for her series Untitled Film Stills, … Read more From FaceApp to Deepfakes

Prediction task with Multivariate TimeSeries and VAR model.

Time Series data can be confusing, but very interesting to explore. The reason this sort of data grabbed my attention is that it can be found in almost every business (sales, deliveries, weather conditions etc.). For instance: how to explore weather effects on NYC using Google BigQuery. The main steps in the task: Problem … Read more Prediction task with Multivariate TimeSeries and VAR model.

Computer Designed Humans — The AI Revolution in the Test Tube

Forget self-driving cars and voice-controlled speakers: the most dramatic effects of artificial intelligence will be seen in a very different area in the coming years. These days there are always reports from the world of science whose cross connections and consequences are not immediately obvious. A current example can be found in the latest edition … Read more Computer Designed Humans — The AI Revolution in the Test Tube

Pricing diamonds using scatterplots and predictive models

My last post railed against the bad visualizations that people often use to plot quantitative data by groups, and pitted pie charts, bar charts and dot plots against each other for two visualization tasks. Dot plots came out on top. I argued that this is because humans are good at the cognitive task of comparing … Read more Pricing diamonds using scatterplots and predictive models

Implementing a Corporate AI Strategy

There is a cost to moving too slowly, almost as much as moving too fast. In the wake of this generation’s digital transformation, machine learning and the greater promise of artificial intelligence create wonder in people’s minds and effervescence within organizations. And the attraction to the field is justified: troves of process improvements are announced every day, … Read more Implementing a Corporate AI Strategy

Create R Markdown reports and presentations even better with these 3 practical tips

Including R Markdown in the workflow for presenting and publishing analyses that use code in R or other languages is a great way to make presentations, dashboards or reports good looking, reproducible and version controllable. In this post, we will look at three simple ways to improve that workflow even further with methods that are … Read more Create R Markdown reports and presentations even better with these 3 practical tips

simmer 4.2.1

The 4.2.1 release of simmer, the Discrete-Event Simulator for R, is on CRAN with quite interesting new features and fixes. As discussed in the mailing list, there is a way to handle the specific case in which an arrival is rejected because a queue is full: library(simmer) reject <- trajectory() %>% log_(“kicked off…”) patient <- … Read more simmer 4.2.1

A Crash course on proving the Halting Problem

Explained in an informally rigorous way A plan for Charles Babbage’s Analytical Engine circa 1840, which would have been a Turing complete mechanical computer had it ever been built. CC BY 4.0 Suppose Jeff Bezos announced over twitter: “I will offer $1 Billion to the person who can write a program that can test any and all … Read more A Crash course on proving the Halting Problem

The easy way to use Maxmind GeoIP with Redshift

Photo by Westley Ferguson on Unsplash It always starts with an innocent observation. “We get a lot of traffic from Boston,” your boss remarks. You naturally throw out a guess or two and discuss why that might be. Until your boss drops the bomb — “Can you dig into that?” Darn it. You walked right into that … Read more The easy way to use Maxmind GeoIP with Redshift

Extracting colours from your images with Image Quantization

magick really does the “Magic!” I have been playing around a bit with the “magick” package, and I think I am now hooked… Although I haven’t been able to understand everything written in the vignette just yet. One function I got really excited about is image_quantize. This function will reduce the number of unique colours used in the … Read more Extracting colours from your images with Image Quantization

What is data?

Musings on information, memory, analytics, and distributions Everything our senses perceive is data, though its storage in our cranial wet stuff leaves something to be desired. Writing it down is a bit more reliable, especially when we write it down on a computer. When those notes are well-organized, we call them data… though I’ve seen … Read more What is data?

Autoencoders for the compression of stock market data

A Pythonic exploration of diverse neural-network autoencoders to reduce the dimensionality of Bitcoin price time series Stock market data space is highly dimensional and, as such, algorithms that try to exploit potential patterns or structure in the price formation can suffer from the so-called “curse of dimensionality”. In this short article, we will explore the potential … Read more Autoencoders for the compression of stock market data

Predicting Breast Cancer with Decision Trees

How to implement decision trees with bagging, boosting and random forest to predict breast cancer from routine blood tests Photo by Hello I’m Nik on Unsplash In a previous post, I introduced the theory of decision trees and how their performance can be improved using bagging, boosting or random forests. Now, we implement these techniques to predict … Read more Predicting Breast Cancer with Decision Trees

Recommender Systems and Hyper-parameter tuning

Photo by rawpixel on Unsplash The (often) forgotten child of Machine Learning Everyone with an internet connection has been subjected to a recommender system (RS). Almost all media services have a particular section where the system recommends things to you, be it a movie on Netflix, a product to buy on Amazon, a playlist … Read more Recommender Systems and Hyper-parameter tuning

Data Science and the Paradox of Predictions

Paradox by Nick Youngson How the act of knowing changes what we know. Many data science projects are a hunt for knowledge. As history has taught us through the years, the mere act of knowing can change what we believe we know. Professor Harari explores this topic in Homo Deus with the skill we’ve become … Read more Data Science and the Paradox of Predictions

On the role of technology in Regulatory Modernization

The challenge with regulations Regulations are instruments of legislative power and have the force of law. They carry out the intent of corresponding Acts which set out requirements that businesses must adhere to. Regulations are necessary to protect the health, safety and security of individual consumers and the environment as well as to support commerce … Read more On the role of technology in Regulatory Modernization

Window Aggregate operator in batch mode in SQL Server 2019

So this came as a surprise when I was calculating simple statistics on my dataset, in particular min, max and median. The first two are trivial. The last one was the one that caught my attention. While looking for the fastest way to calculate the median (statistic: median) for a given dataset, I stumbled upon an interesting … Read more Window Aggregate operator in batch mode in SQL Server 2019

Improve your workflow by managing your machine learning experiments using Sacred

Model tuning is my least favorite task as a Data Scientist. I hate it. I think it’s because managing the experiments is something that always gets very messy. While searching for tools to help me with that I saw a lot of people mentioning Sacred, so I decided to give it a try. In this … Read more Improve your workflow by managing your machine learning experiments using Sacred

Rcrastinate is moving.

Hi all, this is just an announcement. I am moving Rcrastinate to a blogdown-based solution and am therefore leaving blogger.com. If you’re interested in the new setup and how you could do the same yourself, please check out the all shiny and new Rcrastinate over at http://rcrastinate.rbind.io/ In my first post over there, I am … Read more Rcrastinate is moving.

Factor Analysis in R with Psych Package: Measuring Consumer Involvement

The post Factor Analysis in R with Psych Package: Measuring Consumer Involvement appeared first on The Lucid Manager. The first step for anyone who wants to promote or sell something is to understand the psychology of potential customers. Getting into the minds of consumers is often problematic because measuring psychological traits is a complex task. … Read more Factor Analysis in R with Psych Package: Measuring Consumer Involvement

Are you parallelizing your raster operations? You should!

If you plan to do anything with the raster package you should definitely consider parallelizing all your processes, especially if you are working with very large image files. I couldn’t find any blog post describing how to parallelize with the raster package (it is well documented in the package documentation, though). So here are my notes. Load … Read more Are you parallelizing your raster operations? You should!

Deep Q-Network Implementation with SONY’s NNabla

WHAT IS NNABLA? SONY released Neural Network Libraries, “NNabla” for short. NNabla is device-ready and trains fast on GPUs thanks to efficient memory management. The most interesting feature is that NNabla allows both define-by-run and define-and-run by default. For example, the define-and-run style code looks like the code below. # build static graph like tensorflow x = nn.Variable((2, … Read more Deep Q-Network Implementation with SONY’s NNabla