Full-Fledged Recommender System

Nov 29, 2018 The rapid rise in AI applications and falling processor and memory costs have driven remarkable progress in recommender systems over the last decade. Given their rising importance in the retail industry, they are undoubtedly one of the more popular topics in Artificial Intelligence. However, creating a full-fledged, ready-for-production recommender system can … Read more Full-Fledged Recommender System

We are Collage

Dada, Instagram, and the future of AI Collage is the language of the moment, but has been for over 100 years. Let’s walk through where it came from (Dada), what it’s up to now (Instagram), and why it’s integral to the future of AI (Deep Fakes, GANs, and the ingrained copy). Yesterday: Dada While the technique … Read more We are Collage

Attention Seq2Seq with PyTorch: learning to invert a sequence

Nov 29, 2018 TL;DR: In this article you’ll learn how to implement sequence-to-sequence models with and without attention on a simple case: inverting a randomly generated sequence. You might already have come across thousands of articles explaining sequence-to-sequence models and attention mechanisms, but few are illustrated with code snippets. Below is a non-exhaustive list of articles … Read more Attention Seq2Seq with PyTorch: learning to invert a sequence
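To make the toy task concrete, here is a minimal sketch of how training pairs for "inverting a randomly generated sequence" might be produced. The helper name `make_pair`, the vocabulary size, and the sequence length are hypothetical choices for illustration; the article's own data pipeline is not shown in this excerpt.

```python
import random

def make_pair(length, vocab_size, rng):
    """Return (source, target) where target is the source sequence reversed."""
    # Tokens are plain integer ids in [0, vocab_size); a real model would
    # usually reserve extra ids for start/end/padding symbols.
    source = [rng.randrange(vocab_size) for _ in range(length)]
    target = source[::-1]
    return source, target

rng = random.Random(0)  # fixed seed so the example is reproducible
src, tgt = make_pair(length=8, vocab_size=10, rng=rng)
print("source:", src)
print("target:", tgt)
```

A sequence-to-sequence model would then be trained to map each `source` to its `target`.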

Neural Networks II: First Contact

Gentle introduction to Neural Networks Nov 29, 2018 This series of posts on Neural Networks is part of a collection of notes taken during the Facebook PyTorch Challenge, prior to the Deep Learning Nanodegree Program at Udacity. Contents: Introduction, Forward Pass, Backward Propagation, Learning, Testing, Conclusion. 1. Introduction In the next illustration, an Artificial Neural Network is … Read more Neural Networks II: First Contact

Part 1: A neural network from scratch — Foundation

Nov 27, 2018 In this series of articles I will explain the inner workings of a neural network. I will lay the foundation for the theory behind it and show how a competent neural network can be written in a few easy-to-understand lines of Java code. This is the first part … Read more Part 1: A neural network from scratch — Foundation

Map the solar system to a place near you –A NatGeo’s MARS inspired Shiny web app

Nov 27, 2018 I recently leveled up to fatherhood. That’s why I am currently on 5 months of parental leave (thanks to the awesome team @store2be for going along with this!). Every morning at around 5am, I leave the bedroom with my son for the kitchen so his mom can have two real hours of … Read more Map the solar system to a place near you –A NatGeo’s MARS inspired Shiny web app

Being a Machine Learning Engineer: 7-months in

What kind of data is there? Is it only numerical? Are there categorical features which could be incorporated into the model? Heads up, categorical features can be considered any type of data which isn’t immediately available in numerical form. In the problem of trying to predict housing prices, you might have number of bathrooms as … Read more Being a Machine Learning Engineer: 7-months in

The Power of Data

Reflections on how data (or lack thereof) helps (or fails) policy makers in developing countries Foreword When I stood up to speak last Friday at the Steering Committee meeting between the Ministry of Education of Ivory Coast and TRECC — a partnership for transforming Education in cocoa producing regions, led by the Jacobs Foundation –, it had … Read more The Power of Data

Neural Networks I: Notation and building blocks

Gentle introduction to Neural Networks Nov 25, 2018 This series of posts on Neural Networks is part of a collection of notes taken during the Facebook PyTorch Challenge, prior to the Deep Learning Nanodegree Program at Udacity. Contents Neurons Connections Layers - Neurons vs Connections 3.1 Layers of Neurons 3.2. Layers of Connections - PyTorch Example 4. Notation ambiguity: Y = … Read more Neural Networks I: Notation and building blocks

Exploratory Data Analysis (EDA) techniques for kaggle competition beginners

A hands-on guide for beginners on EDA and Data Science competitions Exploratory Data Analysis (EDA) is an approach to analysing data sets to summarize their main characteristics, often with visual methods. The steps involved in EDA are: Data Collection, Data Cleaning, Data Preprocessing, Data Visualisation. Data Collection Data collection is the process … Read more Exploratory Data Analysis (EDA) techniques for kaggle competition beginners

Blockchain can be the new paradigm of the net

The popularization of blockchain will not depend on the users understanding its operation but on the existence of friendly and effective applications that solve real problems. Nov 22, 2018 Historically, each paradigm of the internet has had its killer application: before the web, it was email, with the original web it was Google and with … Read more Blockchain can be the new paradigm of the net

Blogging with Hugo and Jupyter

I really love blogging with Hugo+Blogdown, but unfortunately Blogdown is still mostly restricted to R (although Python is now also possible using the reticulate package). Jupyter offers a great literate programming environment for multiple languages and so being able to publish Jupyter notebooks as Hugo blogposts would be a huge plus. I have been looking … Read more Blogging with Hugo and Jupyter

Combating media bias with AWS Comprehend

Nov 19, 2018 Photo by Randy Colas on Unsplash In the world of fake news and ideology-driven subjective media coverage, it is questionable which sources of journalism can be considered “reliable”. It happens many times that two different news outlets share two completely different takes on the same story. “Experts” point out different consequences of events … Read more Combating media bias with AWS Comprehend

Becoming An Analytics Manager Isn’t A Promotion.

(Photo by rawpixel on Unsplash) Nov 18, 2018 It’s A Career Change. Starting out as a data scientist may be the modern version of becoming a rock star but no-one really seems to be talking about what happens a few years further into your career. Analysing big data sets. Building models. Connecting data pipelines. The challenges … Read more Becoming An Analytics Manager Isn’t A Promotion.

Training your staff in data science? Here’s how to pick the right programming language

Businesses from every sector are investing in data science education programmes. Working at tech education company Decoded, I have found it fascinating to see the immense value data skills can bring to every sector, from banks and retailers to charities and government. When embarking on such an initiative, there are plenty of strategic decisions for … Read more Training your staff in data science? Here’s how to pick the right programming language

Kaggle: TGS Salt Identification Challenge

Nov 13, 2018 A few weeks ago, the TGS Salt Identification Challenge finished on Kaggle, a popular platform for data science competitions. The task was to accurately identify whether a subsurface target is salt or not on seismic images. Our team: Insaf Ashrapov, Mikhail Karchevskiy, Leonid Kozinkin. We finished 28th (top 1%) and would … Read more Kaggle: TGS Salt Identification Challenge

DOGNET: can an AI model fool a human?

The experiment was simple: could a machine learning (ML) model produce Golden Retriever images that people would mistake for being real? The reason for choosing dogs… was because dogs are awesome! In our current climate, we often hear the term ‘fake news’, and with ML models becoming more advanced, their ability to create non-human content … Read more DOGNET: can an AI model fool a human?

Telling Apart AI and Humans: #2 Photo VS GAN generated image

If you missed the 1st installment of this series, Humans vs Robots is here. Prompted by advances in Generative Adversarial Networks (GAN), a year ago I tweeted a thread about telling apart pictures taken with a camera from generated pictures. Here is the updated version of that thread. A few of my tips are still … Read more Telling Apart AI and Humans: #2 Photo VS GAN generated image

Installing Hadoop 3.1.0 multi-node cluster on Ubuntu 16.04 Step by Step

Image Source: www.mapr.com/products/apache-hadoop/ There are many guides on the web about installing Hadoop 3. Many of them do not work well or need improvement. This article draws on the official documentation and other articles, in addition to many answers from Stackoverflow.com. Note: All prerequisites must be applied on the name node and data nodes. First, … Read more Installing Hadoop 3.1.0 multi-node cluster on Ubuntu 16.04 Step by Step

PyTorch 101 for Dummies like Me

Nov 5, 2018 What is PyTorch? It’s a Python-based package that serves as a replacement for NumPy arrays and provides a flexible deep learning development platform. As for why I prefer PyTorch over TensorFlow, see this Fast AI blog post on the reasons to switch to PyTorch. Or simply put, … Read more PyTorch 101 for Dummies like Me

The Austrian Quant: My Machine Learning Trading Algorithm Outperformed the SP500 For 10 Years

Austrian Quant The Austrian Quant is named after the Austrian School of Economics which serves as the inspiration for how I structured the portfolio. I designed a trading strategy composed of 3 different investment funds to gain a better understanding of investments, machine learning and programming and how they all combine together in the world … Read more The Austrian Quant: My Machine Learning Trading Algorithm Outperformed the SP500 For 10 Years

Getting started – Azure SQL Server Managed Instance

There are a lot of options for data scientists to store data in the Azure cloud. In this blog post I will cover the pros and cons of Azure SQL Server Managed Instance and will provide a few tips so you can hit the ground running if you decide to take it for a test … Read more Getting started – Azure SQL Server Managed Instance

Industrial strength Natural Language Processing

I have spent much of my career as a graduate student researcher, and now as a Data Scientist in the industry. One thing I have come to realize is that a vast majority of solutions proposed both in academic research papers and in the workplace are just not meant to ship — they just don’t scale! And … Read more Industrial strength Natural Language Processing

Building a Sentiment Detection Bot with Google Cloud, a Chat Client, and Ruby.

Introduction In this series, I’ll explain how to create a chat bot that is capable of detecting sentiment, analyzing images, and finally having the basis of an evolving personality. This is part 1 of that series. The Pieces: Ruby, Sinatra, Google Cloud APIs, Line (a chat client). Since I live in Japan: I’ll be using … Read more Building a Sentiment Detection Bot with Google Cloud, a Chat Client, and Ruby.

Debugging a Machine Learning model written in TensorFlow and Keras

Things that could go wrong, and how to diagnose if they did. Oct 24, 2018 In this article, you get to look over my shoulder as I go about debugging a TensorFlow model. I did a lot of dumb things, so please don’t judge. Cheat sheet. The numbers refer to sections in this article (https://bit.ly/2PXpzRh) 1 … Read more Debugging a Machine Learning model written in TensorFlow and Keras

Introduction to Linear Regression in Python

Basic concepts and mathematics There are two kinds of variables in a linear regression model: The input or predictor variable is the variable (or variables) that helps predict the value of the output variable. It is commonly referred to as X. The output variable is the variable that we want to predict. It is commonly referred to … Read more Introduction to Linear Regression in Python
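As a minimal sketch of the two kinds of variables described above, the snippet below fits a straight line to one predictor X and one output y by ordinary least squares. The function name `fit_line` and the sample data are hypothetical, chosen only for illustration; the article's own implementation is not shown in this excerpt.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for a single predictor: returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    # slope = sum of co-deviations of X and y / sum of squared deviations of X
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

X = [1.0, 2.0, 3.0, 4.0]   # input / predictor variable
y = [3.0, 5.0, 7.0, 9.0]   # output variable (here exactly y = 2*X + 1)
w, b = fit_line(X, y)
print(w, b)
```

With more than one predictor, the same idea is usually expressed with matrices or delegated to a library.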

A line-by-line layman’s guide to Linear Regression using TensorFlow

Computing the Graph With generate_dataset() and linear_regression(), we are now ready to run the program and begin finding our optimal gradient W and bias b! [line 2, 3] x_batch, y_batch = generate_dataset(); x, y, y_pred, loss = linear_regression() In this run() function, we start off by calling generate_dataset() and linear_regression() to get x_batch, y_batch, x, y, y_pred … Read more A line-by-line layman’s guide to Linear Regression using TensorFlow

Perplexity Intuition (and Derivation)

The perplexity of a discrete probability distribution p is defined as $2^{H(p)} = 2^{-\sum_x p(x) \log_2 p(x)}$ (from https://en.wikipedia.org/wiki/Perplexity), where H(p) is the entropy of the distribution p(x) and x is a random variable over all possible events. In the previous post, we derived H(p) from scratch and intuitively showed why entropy is the average number of bits that we need to … Read more Perplexity Intuition (and Derivation)
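A small numerical check can make the definition concrete. The sketch below assumes the standard base-2 form, in which perplexity is 2 raised to the entropy measured in bits; the function names `entropy_bits` and `perplexity` are hypothetical.

```python
import math

def entropy_bits(p):
    """Shannon entropy in bits of a discrete distribution given as a list of probabilities."""
    # Zero-probability outcomes contribute nothing (the limit of p*log p as p -> 0 is 0).
    return -sum(px * math.log2(px) for px in p if px > 0)

def perplexity(p):
    """Perplexity = 2 ** H(p), with H(p) in bits."""
    return 2 ** entropy_bits(p)

uniform4 = [0.25, 0.25, 0.25, 0.25]
skewed = [0.7, 0.1, 0.1, 0.1]
print(perplexity(uniform4))  # a uniform distribution over k outcomes has perplexity k
print(perplexity(skewed))    # a more predictable distribution has lower perplexity
```

Read this way, perplexity behaves like an effective number of equally likely outcomes.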

Telling Apart AI and Humans: #1 Humans VS Androids

ALife 2018 conference, © Lana Sinapayen Prompted by a video where people thought a human was actually a hyper-realistic robot, I decided to write about how to spot humanoid robots. Here are a few tips! After spending so much time with Alter the android and various hyper-realistic robots, I know a thing or two about … Read more Telling Apart AI and Humans: #1 Humans VS Androids

Neural Nets: From Linear Regression to Deep Nets

Neural networks, especially deep neural networks, have received a lot of attention over the last couple of years. They perform remarkably well on image and speech recognition and form the backbone of the technology used for self-driving cars. What many people find hard to believe is that the mathematics of neural networks have been around … Read more Neural Nets: From Linear Regression to Deep Nets

SQL Server

Columnstore A columnstore index can provide a very high level of data compression, typically by 10 times, to significantly reduce your data warehouse storage cost. For analytics, a columnstore index offers an order of magnitude better performance than a btree index. Columnstore indexes are the preferred data storage format for data warehousing and analytics workloads. … Read more SQL Server

The future of data visualization

Tools to shape the future According to many product announcements from Google, Apple and BMW, more and more data will be overlaid on our physical environments through augmented reality or projection. That means not only will data be visualized more, but the visual reality around us will be turned into data. Data visualization of a new AR … Read more The future of data visualization

Waiting for Weekends: Some Insights on How to Select the Best Wine

There is a huge selection of wines on the market, and for a wine lover it is always a quest to select the best wine. Wines from the US, France, Spain, Germany and many other wine countries, in numerous varieties, are easily available in any liquor store. Price can also vary drastically. From my experience, … Read more Waiting for Weekends: Some Insights on How to Select the Best Wine

The Best Public Datasets for Machine Learning

First, a couple of pointers to keep in mind when searching for datasets. According to Carnegie Mellon University: 1.- A high-quality dataset should not be messy, because you do not want to spend a lot of time cleaning data. 2.- A high-quality dataset should not have too many rows or columns, so it is easy … Read more The Best Public Datasets for Machine Learning

The intuition behind Shannon’s Entropy

Now, back to our formula 3.49: The definition of Entropy for a probability distribution (from The Deep Learning Book) I(x) is the information content of X. I(x) itself is a random variable. In our example, the possible outcomes of the War. Thus, H(x) is the expected value of every possible information. Using the definition of expected … Read more The intuition behind Shannon’s Entropy
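Written out, and assuming the usual definition of information content $I(x) = -\log P(x)$, the statement that entropy is the expected information can be sketched for a discrete variable as:

```latex
H(\mathrm{x})
  = \mathbb{E}_{\mathrm{x} \sim P}\left[ I(x) \right]
  = -\mathbb{E}_{\mathrm{x} \sim P}\left[ \log P(x) \right]
  = -\sum_{x} P(x) \log P(x)
```

For example, a fair coin has $P(x) = 1/2$ for both outcomes, so with a base-2 logarithm $H = -2 \cdot \tfrac{1}{2} \log_2 \tfrac{1}{2} = 1$ bit.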

Forecasting Exchange Rates Using ARIMA In Python

Sep 29, 2018 Nearly all sectors use time series data to forecast future time points. Forecasting the future can assist analysts and management in making better calculated decisions to maximise returns and minimise risks. I will be demonstrating how we can forecast exchange rates in this article. If you are new to finance and want to … Read more Forecasting Exchange Rates Using ARIMA In Python

‘I want to learn Artificial Intelligence and Machine Learning. Where can I start?’

I bought a plane ticket to the US with no return flight. I’d been studying for a year and I figured it was about time I started putting my skills into practice. My plan was to rock up to the US and get hired. Then Ashlee messaged me on LinkedIn, “Hey I’ve seen your posts … Read more ‘I want to learn Artificial Intelligence and Machine Learning. Where can I start?’

Object Detection using Deep Learning Approaches: An End to End Theoretical Perspective

Fast RCNN So the next idea from the same authors: Why not create a convolution map of the input image and then just select the regions from that convolutional map? Do we really need to run so many convnets? What we can do is run just a single convnet and then apply region proposal crops on the … Read more Object Detection using Deep Learning Approaches: An End to End Theoretical Perspective

Image Processing Class (EGBE443) #2 -Histogram

Computing the histogram In this section, the histogram is calculated with Python code (Python 3.6). For Python 3.6, there are many common modules used in image processing, such as Pillow, NumPy, OpenCV, etc., but in this program the Pillow and NumPy modules were used. To import the image from your computer, … Read more Image Processing Class (EGBE443) #2 -Histogram
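The core of a histogram computation can be sketched without any image library. The snippet below is a simplified stand-in, not the article's Pillow and NumPy code: it assumes the image has already been loaded and flattened into a list of 8-bit grayscale values, and the name `gray_histogram` is hypothetical.

```python
def gray_histogram(pixels, levels=256):
    """Count how many pixels take each intensity value in [0, levels)."""
    hist = [0] * levels
    for value in pixels:
        hist[value] += 1   # one bin per intensity level
    return hist

# A tiny made-up "image" flattened to a list of intensities.
pixels = [0, 0, 255, 128, 128, 128]
hist = gray_histogram(pixels)
print(hist[0], hist[128], hist[255])
```

The bin counts always sum to the number of pixels, which is a handy sanity check on real images.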

Get system metrics for 5 min with Docker, Telegraf, Influxdb and Grafana

Hi all, here is a very quick guide on how to configure system monitoring for one or more servers using a modern stack of technologies like Grafana, Docker and Telegraf with Influxdb. The main goal of this article is to show how to start getting system metrics from your servers quickly and easily, without spending … Read more Get system metrics for 5 min with Docker, Telegraf, Influxdb and Grafana

How to get fbprophet working on AWS Lambda

Solving package size issues of fbprophet serverless deployment Adi Goldstein / Unsplash I assume you’re reading this post because you’re looking for ways to use the awesome fbprophet (Facebook open source forecasting) library on AWS Lambda and you’re already familiar with the various issues around getting it done. I will be using a python 3.6 … Read more How to get fbprophet working on AWS Lambda

Machine Learning – Particle Swarm Optimization (PSO) and Twitter

We all live in a world where analyzing massive sets of unstructured data is becoming a business need. And the time we spend on the internet is basically the time we spend on social media. Even our daily life is affected by the people around us. And we tend to change our opinions … Read more Machine Learning – Particle Swarm Optimization (PSO) and Twitter

Multi-Layer perceptron using Tensorflow

Sep 11, 2018 In this blog, we are going to build a neural network (multilayer perceptron) using TensorFlow and successfully train it to recognize digits in images. TensorFlow is a very popular deep learning framework released by Google, and this notebook will guide you through building a neural network with this library. If you want to understand … Read more Multi-Layer perceptron using Tensorflow

Diving into K-Means…

Sep 9, 2018 We completed our first basic supervised learning model, i.e. the Linear Regression model, in the last post here. In this post we get started with the most basic unsupervised learning algorithm: K-means Clustering. Let’s get started without further ado! Background: K-means clustering, as the name itself suggests, is a clustering algorithm, … Read more Diving into K-Means…

3 approaches for backtesting historical data

Reading and processing data for statistical and quantitative analysis in trading Sep 8, 2018 Anyone interested in the statistical analysis of financial markets has the need to process historical data. Historical data is needed in order to backtest or train: Quantitative trading. Statistical trading. Price action replay/walkthrough. Each need comes from different goals. 3 examples on … Read more 3 approaches for backtesting historical data