Part 1: A neural network from scratch — Foundation

Nov 27, 2018 In this series of articles I will explain the inner workings of a neural network. I will lay the foundation for the theory behind it as well as show how a competent neural network can be written in a few easy-to-understand lines of Java code. This is the first part …

Map the solar system to a place near you –A NatGeo’s MARS inspired Shiny web app

Nov 27, 2018 I recently leveled up to fatherhood. That’s why I am currently on 5 months of parental leave (thanks to the awesome team @store2be for going along with this!). Every morning at around 5am, I leave the bedroom with my son for the kitchen so his mom can have two real hours of …

Being a Machine Learning Engineer: 7-months in

What kind of data is there? Is it only numerical? Are there categorical features which could be incorporated into the model? Heads up, categorical features can be considered any type of data which isn’t immediately available in numerical form. In the problem of trying to predict housing prices, you might have number of bathrooms as …

The Power of Data

Reflections on how data (or lack thereof) helps (or fails) policy makers in developing countries. Foreword: When I stood up to speak last Friday at the Steering Committee meeting between the Ministry of Education of Ivory Coast and TRECC (a partnership for transforming education in cocoa-producing regions, led by the Jacobs Foundation), it had …

Neural Networks I: Notation and building blocks

Gentle introduction to Neural Networks Nov 25, 2018 This series of posts on Neural Networks is part of the collection of notes taken during the Facebook PyTorch Challenge, prior to the Deep Learning Nanodegree Program at Udacity. Contents: 1. Neurons 2. Connections 3. Layers: Neurons vs Connections 3.1. Layers of Neurons 3.2. Layers of Connections: PyTorch Example 4. Notation ambiguity: Y = …

Exploratory Data Analysis (EDA) techniques for kaggle competition beginners

A hands-on guide for beginners on EDA and Data Science competitions. Exploratory Data Analysis (EDA) is an approach to analysing data sets to summarize their main characteristics, often with visual methods. The steps involved in EDA are: Data Collection, Data Cleaning, Data Preprocessing, and Data Visualisation. Data Collection: Data collection is the process …

Blockchain can be the new paradigm of the net

The popularization of blockchain will not depend on the users understanding its operation but on the existence of friendly and effective applications that solve real problems. Nov 22, 2018 Historically, each paradigm of the internet has had its killer application: before the web, it was email, with the original web it was Google and with …

Blogging with Hugo and Jupyter

I really love blogging with Hugo+Blogdown, but unfortunately Blogdown is still mostly restricted to R (although Python is now also possible using the reticulate package). Jupyter offers a great literate programming environment for multiple languages and so being able to publish Jupyter notebooks as Hugo blogposts would be a huge plus. I have been looking …

Combating media bias with AWS Comprehend

Nov 19, 2018 Photo by Randy Colas on Unsplash In the world of fake news and ideology-driven subjective media coverage, it is questionable which sources of journalism can be considered “reliable”. It happens many times that two different news outlets share two completely different takes on the same story. “Experts” point out different consequences of events …

Becoming An Analytics Manager Isn’t A Promotion.

(Photo by rawpixel on Unsplash) Nov 18, 2018 It’s A Career Change. Starting out as a data scientist may be the modern version of becoming a rock star, but no one really seems to be talking about what happens a few years further into your career. Analysing big data sets. Building models. Connecting data pipelines. The challenges …

Training your staff in data science? Here’s how to pick the right programming language

Businesses from every sector are investing in data science education programmes. Working at tech education company Decoded, I have found it fascinating to see the immense value data skills can bring to every sector, from banks and retailers to charities and government. When embarking on such an initiative, there are plenty of strategic decisions for …

Kaggle: TGS Salt Identification Challenge

Nov 13, 2018 A few weeks ago we finished the TGS Salt Identification Challenge on Kaggle, a popular platform for data science competitions. The task was to accurately identify whether a subsurface target is salt or not on seismic images. Our team: Insaf Ashrapov, Mikhail Karchevskiy, Leonid Kozinkin. We finished 28th (top 1%) and would …

DOGNET: can an AI model fool a human?

The experiment was simple: could a machine learning (ML) model produce Golden Retriever images that people would mistake for being real? The reason for choosing dogs… was because dogs are awesome! In our current climate, we often hear the term ‘fake news’, and with ML models becoming more advanced, their ability to create non-human content …

Telling Apart AI and Humans: #2 Photo VS GAN generated image

If you missed the 1st installment of this series, Humans vs Robots is here. Prompted by advances in Generative Adversarial Networks (GAN), a year ago I tweeted a thread about telling apart pictures taken with a camera from generated pictures. Here is the updated version of that thread. A few of my tips are still …

Installing Hadoop 3.1.0 multi-node cluster on Ubuntu 16.04 Step by Step

Image Source: www.mapr.com/products/apache-hadoop/ There are many links on the web about installing Hadoop 3. Many of them do not work well or need improvements. This article draws on the official documentation and other articles, in addition to many answers from Stackoverflow.com. Note: All prerequisites must be applied on the name node and data nodes. First, …

PyTorch 101 for Dummies like Me

Nov 5, 2018 What is PyTorch? It’s a Python-based package that serves as a replacement for NumPy arrays and provides a flexible deep learning development platform. As for why I prefer PyTorch over TensorFlow, see Fast AI’s blog post on the reasons for switching to PyTorch. Or simply put, …

The Austrian Quant: My Machine Learning Trading Algorithm Outperformed the SP500 For 10 Years

Austrian Quant The Austrian Quant is named after the Austrian School of Economics which serves as the inspiration for how I structured the portfolio. I designed a trading strategy composed of 3 different investment funds to gain a better understanding of investments, machine learning and programming and how they all combine together in the world …

Getting started – Azure SQL Server Managed Instance

There are a lot of options for data scientists to store data in the Azure cloud. In this blog post I will cover the pros and cons of Azure SQL Server Managed Instance and will provide a few tips so you can hit the ground running if you decide to take it for a test …

Industrial strength Natural Language Processing

I have spent much of my career as a graduate student researcher, and now as a Data Scientist in the industry. One thing I have come to realize is that a vast majority of solutions proposed both in academic research papers and in the workplace are just not meant to ship: they just don’t scale! And …

Building a Sentiment Detection Bot with Google Cloud, a Chat Client, and Ruby.

Introduction: In this series, I’ll explain how to create a chat bot that is capable of detecting sentiment, analyzing images, and finally having the basis of an evolving personality. This is part 1 of that series. The Pieces: Ruby, Sinatra, Google Cloud APIs, Line (a chat client). Since I live in Japan: I’ll be using …

Debugging a Machine Learning model written in TensorFlow and Keras

Things that could go wrong, and how to diagnose if they did. Oct 24, 2018 In this article, you get to look over my shoulder as I go about debugging a TensorFlow model. I did a lot of dumb things, so please don’t judge. Cheat sheet. The numbers refer to sections in this article (https://bit.ly/2PXpzRh) 1 …

Introduction to Linear Regression in Python

Basic concepts and mathematics: There are two kinds of variables in a linear regression model. The input or predictor variable is the variable (or variables) that helps predict the value of the output variable. It is commonly referred to as X. The output variable is the variable that we want to predict. It is commonly referred to …
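To make the roles of the two variables concrete, here is a minimal sketch of fitting a straight line by ordinary least squares with a single predictor. The function name fit_line and the toy data are assumptions made for illustration and are not taken from the linked post, which may use a library instead.

```python
def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares (one predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical toy data: X is the predictor, Y is the output we want to predict.
X = [1.0, 2.0, 3.0, 4.0]
Y = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(X, Y)
print(slope, intercept)
```

With more than one predictor the same idea is usually written in matrix form, which this sketch does not cover.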

A line-by-line layman’s guide to Linear Regression using TensorFlow

Computing the Graph: With generate_dataset() and linear_regression(), we are now ready to run the program and begin finding our optimal gradient W and bias b! [line 2] x_batch, y_batch = generate_dataset() [line 3] x, y, y_pred, loss = linear_regression() In this run() function, we start off by calling generate_dataset() and linear_regression() to get x_batch, y_batch, x, y, y_pred …

Perplexity Intuition (and Derivation)

The perplexity of a discrete probability distribution is defined as \(2^{H(p)} = 2^{-\sum_x p(x) \log_2 p(x)}\) (from https://en.wikipedia.org/wiki/Perplexity), where H(p) is the entropy of the distribution p(x) and x is a random variable over all possible events. In the previous post, we derived H(p) from scratch and intuitively showed why entropy is the average number of bits that we need to …
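As a rough illustration of the definition above, the following sketch computes perplexity as 2 raised to the entropy in bits. The function names and the fair-die example are assumptions chosen for illustration and do not come from the linked post.

```python
import math


def entropy_bits(probs):
    """Shannon entropy H(p) in bits; zero-probability events contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def perplexity(probs):
    """Perplexity of a discrete distribution: 2 ** H(p)."""
    return 2 ** entropy_bits(probs)


# Hypothetical example: a fair six-sided die.
fair_die = [1 / 6] * 6
print(perplexity(fair_die))
```

For a uniform distribution over k outcomes the perplexity is k (up to floating-point rounding), which is why it is often read as an effective number of equally likely choices.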

Telling Apart AI and Humans: #1 Humans VS Androids

ALife 2018 conference, © Lana Sinapayen Prompted by a video where people thought a human was actually a hyper-realistic robot, I decided to write about how to spot humanoid robots. Here are a few tips! After spending so much time with Alter the android and various hyper-realistic robots, I know a thing or two about …

Neural Nets: From Linear Regression to Deep Nets

Neural networks, especially deep neural networks, have received a lot of attention over the last couple of years. They perform remarkably well on image and speech recognition and form the backbone of the technology used for self-driving cars. What many people find hard to believe is that the mathematics of neural networks has been around …

SQL Server

Columnstore: A columnstore index can provide a very high level of data compression, typically by 10 times, to significantly reduce your data warehouse storage cost. For analytics, a columnstore index offers an order of magnitude better performance than a B-tree index. Columnstore indexes are the preferred data storage format for data warehousing and analytics workloads. …

The future of data visualization

Tools to shape the future: According to many product announcements from Google, Apple and BMW, more and more data will be overlaid on our physical environments through augmented reality or projection. That means not only will data be visualized more, but the visual reality around us will be turned into data. Data visualization of a new AR …

Waiting for Weekends: Some Insights on How to Select the Best Wine

There is a huge selection of wines on the market, and for a wine lover it is always a quest to select the best one. Wines from the US, France, Spain, Germany and many other wine-producing countries, in numerous varieties, are easily available in any liquor store. Prices can also vary drastically. From my experience, …

The Best Public Datasets for Machine Learning

First, a couple of pointers to keep in mind when searching for datasets. According to Carnegie Mellon University: 1. A high-quality dataset should not be messy, because you do not want to spend a lot of time cleaning data. 2. A high-quality dataset should not have too many rows or columns, so it is easy …

The intuition behind Shannon’s Entropy

Now, back to our formula 3.49, the definition of Entropy for a probability distribution (from The Deep Learning Book): \(H(x) = \mathbb{E}_{x \sim P}[I(x)] = -\mathbb{E}_{x \sim P}[\log P(x)]\). I(x) is the information content of X. I(x) itself is a random variable. In our example, it ranges over the possible outcomes of the War. Thus, H(x) is the expected value of the information over every possible outcome. Using the definition of expected …
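The idea that entropy is the expected value of the information content can be sketched directly in code. This is a minimal illustration that uses natural logarithms (so the result is in nats); the function names and the two-outcome distribution are hypothetical and are not taken from the linked post.

```python
import math


def information_content(p):
    """Self-information I(x) = -log P(x) of an outcome with probability p, in nats."""
    return -math.log(p)


def entropy(dist):
    """Entropy as the expectation of I(x): sum over outcomes of P(x) * I(x)."""
    return sum(p * information_content(p) for p in dist.values() if p > 0)


# Hypothetical two-outcome distribution, used only for illustration.
outcomes = {"win": 0.5, "lose": 0.5}
print(entropy(outcomes))
```

Switching math.log to math.log2 would give the same quantity measured in bits instead of nats.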

Forecasting Exchange Rates Using ARIMA In Python

Sep 29, 2018 Nearly all sectors use time series data to forecast future time points. Forecasting the future can assist analysts and management in making better calculated decisions to maximise returns and minimise risks. I will be demonstrating how we can forecast exchange rates in this article. If you are new to finance and want to …

‘I want to learn Artificial Intelligence and Machine Learning. Where can I start?’

I bought a plane ticket to the US with no return flight. I’d been studying for a year and I figured it was about time I started putting my skills into practice. My plan was to rock up to the US and get hired. Then Ashlee messaged me on LinkedIn, “Hey I’ve seen your posts …

Object Detection using Deep Learning Approaches: An End to End Theoretical Perspective

Fast RCNN: So the next idea from the same authors: Why not create a convolutional map of the input image and then just select the regions from that convolutional map? Do we really need to run so many convnets? What we can do is run just a single convnet and then apply region proposal crops on the …

Image Processing Class (EGBE443) #2 -Histogram

Computing the histogram: In this section, the histogram is calculated with Python code (Python 3.6). For Python 3.6, there are many common modules used in image processing, such as Pillow, NumPy, OpenCV, etc., but in this program the Pillow and NumPy modules were used. To import the image from your computer, …

Get system metrics for 5 min with Docker, Telegraf, Influxdb and Grafana

Hi all, here is a very quick guide on how to configure system monitoring for one or more servers using a modern stack of technologies like Grafana, Docker and Telegraf with Influxdb. The main goal of this article is to show how to start getting system metrics from your servers quickly and easily, without spending …

How to get fbprophet working on AWS Lambda

Solving package size issues of fbprophet serverless deployment Adi Goldstein / Unsplash I assume you’re reading this post because you’re looking for ways to use the awesome fbprophet (Facebook open source forecasting) library on AWS Lambda and you’re already familiar with the various issues around getting it done. I will be using a Python 3.6 …

Machine Learning – Particle Swarm Optimization (PSO) and Twitter

We all live in a world where analyzing a massive set of unstructured data is becoming a business need. And the time we spend on the internet is basically the time we spend on social media. Even our daily life is affected by the people around us. And we tend to change our opinions …

Multi-Layer perceptron using Tensorflow

Sep 11, 2018 In this blog, we are going to build a neural network (multilayer perceptron) using TensorFlow and successfully train it to recognize digits in images. TensorFlow is a very popular deep learning framework, and this notebook will guide you through building a neural network with this library. If you want to understand …

Diving into K-Means…

Sep 9, 2018 We completed our first basic supervised learning model, i.e. the Linear Regression model, in the last post here. In this post we get started with the most basic unsupervised learning algorithm: K-means Clustering. Let’s get started without further ado! Background: K-means clustering, as the name itself suggests, is a clustering algorithm, …

3 approaches for backtesting historical data

Reading and processing data for statistical and quantitative analysis in trading Sep 8, 2018 Anyone interested in the statistical analysis of financial markets needs to process historical data. Historical data is needed in order to backtest or train: quantitative trading, statistical trading, and price action replay/walkthrough. Each need comes from different goals. 3 examples on …

Microsoft Big Data Overview

https://academy.microsoft.com/en-us/professional-program/tracks/big-data/ Block 1 – Data Fundamentals Learn data science basics. Explore topics like data queries, data analysis, data visualization and how statistics informs data science practices. Please choose from Course 2a or Course 2b to complete the unit. Course 1: Microsoft Professional Program: Introduction to Big Data Course 2a: Analyzing and Visualizing Data with Power …

Box Cox Transformation

When we do time series analysis, we are usually interested either in uncovering causal relationships (Does \(X_t\) influence \(Y_{t+1}\)?) or in getting the most accurate forecast possible. Especially in the second case it can be beneficial to transform our historical data to make it easier to extract a signal. A very common transformation is to …
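For readers unfamiliar with the transformation named in the title, here is a minimal sketch of the standard one-parameter Box-Cox transform and its inverse in plain Python. The function names, the sample series and the choice of lambda are assumptions made for illustration; the linked post may well use a statistics library and estimate lambda from the data instead.

```python
import math


def box_cox(values, lam):
    """One-parameter Box-Cox transform; defined only for strictly positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        # The lambda = 0 case is defined as the natural logarithm.
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]


def inverse_box_cox(values, lam):
    """Map transformed values (for example, forecasts) back to the original scale."""
    if lam == 0:
        return [math.exp(v) for v in values]
    return [(lam * v + 1) ** (1 / lam) for v in values]


# Hypothetical series and lambda, chosen only for illustration.
series = [1.0, 2.0, 4.0, 8.0]
transformed = box_cox(series, 0.5)
restored = inverse_box_cox(transformed, 0.5)
print(transformed, restored)
```

The inverse matters in forecasting because a model fitted on transformed data produces forecasts on the transformed scale.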

Why feature weights in a machine learning model are meaningless

Don’t make decisions based on the weights of an ML model Aug 31, 2018 As I see our customers fall in love with BigQuery ML, an old problem rears its head: I find that they cannot resist the temptation to assign meaning to feature weights. “The largest weight in my model to predict customer lifetime value,” …

Doing XGBoost hyper-parameter tuning the smart way — Part 1 of 2

Aug 29, 2018 Picture taken from Pixabay In this post and the next, we will look at one of the trickiest and most critical problems in Machine Learning (ML): Hyper-parameter tuning. After reviewing what hyper-parameters, or hyper-params for short, are and how they differ from plain vanilla learnable parameters, we introduce three general purpose discrete optimization …