R community update: announcing useR Delhi December meetup and CFP

Time really does fly. It’s been 5 months since the Delhi NCR useR group came into being and held its first meetup. It was a successful event which included sessions featuring an R-core member and a veteran data scientist. More importantly, the 50+ community members who’d turned up took part in stimulating discussions and got to …

Avoiding Parking Tickets in San Francisco Using Data Analytics

Although still not a perfect predictor, this model was more accurate than the first. The streets identified as best also showed much less variability than those identified as worst. We could also reduce the number of tickets by over 50% if we chose the best population compared to the worst. Interestingly, parking density was …

Comparative study on Classic Machine learning Algorithms

2. Logistic Regression: Just like linear regression, logistic regression is the right algorithm to start with for classification. Even though the name contains ‘Regression’, it is not a regression model but a classification model. It uses a logistic function to model a binary output. The output of logistic regression is a probability (0≤x≤1), …
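As a rough illustration of the excerpt above, the sketch below shows how a logistic function squashes a linear score into a probability between 0 and 1, which is then thresholded into a binary class. The helper names, the 0.5 threshold, and the weights are assumptions made up for this example; they are not taken from the original post, and a real model would learn its parameters from data.

```python
import math


def sigmoid(z):
    """Logistic function: maps a real-valued score to a probability in [0, 1]."""
    # Two branches so that math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, features):
    """Probability of the positive class for one example (hypothetical helper)."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)


def predict_label(weights, bias, features, threshold=0.5):
    """Binary decision obtained by thresholding the probability."""
    return 1 if predict_proba(weights, bias, features) >= threshold else 0


# Made-up parameters, for illustration only.
weights, bias = [2.0, -1.0], 0.0
print(predict_proba(weights, bias, [1.0, 2.0]))  # score z = 0, so prints 0.5
print(predict_label(weights, bias, [3.0, 1.0]))  # score z = 5, so prints 1
```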

F# Advent Calendar — A Christmas Classifier

The ML.NET Model: The model is defined in Program.fs. The dataLoader specifies the schema of the input data. Input Data Schema The dataLoader is then used to load the training and test data views. Load Training and Test Data The dataPipeline specifies the transforms that should be applied to the input tsv. Since this is a …

Gender Diversity in the R and Python Communities

Many (if not most) tech communities have far more representation from men than from women (and even fewer from nonbinary folk). This is a shame, because everybody uses software, and these projects would self-evidently benefit from the talent and expertise from across the entire community. Some projects are doing better than others, though, and data …

How to determine the best model?

Machine learning models play a critical role in many aspects of today’s business. The use of a predictive model can improve the business bottom line, and a slightly improved model can result in an increase of millions of dollars. Although you may not know all the popular algorithms (and more powerful algorithms in the future), …

Image Processing Class (EGBE443) #3 — Point Operation

Implementing a point operation affects the histogram. Raising the brightness shifts the histogram to the right, and increasing the contrast of the image expands the histogram. These point operations map intensities through a mapping function that contains constants derived from the image content, such as the highest and lowest intensity. Automatic …
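To make the effect on the histogram concrete, here is a minimal sketch of three point operations on a flat list of 8-bit intensities. The function names are hypothetical and do not come from the class notes. The min-max stretch is just one example of a mapping whose constants come from the image content (its lowest and highest intensity).

```python
def clamp(value, low=0, high=255):
    """Keep an intensity inside the valid 8-bit range."""
    return max(low, min(high, value))


def adjust_brightness(pixels, offset):
    """Add a constant to every pixel: the histogram shifts right for offset > 0."""
    return [clamp(p + offset) for p in pixels]


def adjust_contrast(pixels, factor):
    """Multiply every pixel by a constant: the histogram widens for factor > 1."""
    return [clamp(int(round(p * factor))) for p in pixels]


def stretch_to_full_range(pixels, out_min=0, out_max=255):
    """Min-max stretch: the mapping constants are the image's own min and max."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:  # flat image, nothing to stretch
        return list(pixels)
    return [out_min + (p - lo) * (out_max - out_min) // (hi - lo) for p in pixels]


pixels = [50, 100, 150]                 # a tiny made-up "image"
print(adjust_brightness(pixels, 40))    # [90, 140, 190]: same spread, shifted right
print(adjust_contrast(pixels, 1.5))     # [75, 150, 225]: spread grows from 100 to 150
print(stretch_to_full_range(pixels))    # [0, 127, 255]: spread fills the whole range
```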

Part 2: Gradient descent and backpropagation

Dec 3, 2018 In this article you will learn how a neural network can be trained by using backpropagation and stochastic gradient descent. The theories will be described thoroughly and a detailed example calculation is included where both weights and biases are updated. This is the second part in a series of articles: I assume …

Machine Learning Introduction: A Comprehensive Guide

Dec 3, 2018 This is the first of a series of articles in which I will describe machine learning concepts, types, algorithms and Python implementations. The main goals of this series are: Creating a comprehensive guide towards machine learning theory and intuition. Sharing and explaining machine learning projects, developed in Python, to show in a …

The Perceptron Algorithm

In my blog post Neural Nets: From Linear Regression to Deep Nets I talked about how a deep neural net is simply a sequence of simple building blocks of the form: \[\sigma(\underbrace{w^T}_{weights}x + \overbrace{b}^{bias}) = a\] and that a linear regression model is one of the most basic neural networks where the activation function \(\sigma\) …
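The building block in the formula above, a weighted sum plus a bias passed through an activation function, can be sketched in a few lines. This is only an illustration under stated assumptions: the function names and numbers are invented here, and the step activation is the one conventionally associated with the classic perceptron, not something confirmed by the truncated excerpt.

```python
def neuron(weights, x, bias, activation):
    """Compute a = activation(w^T x + b) for a single unit."""
    z = sum(w_i * x_i for w_i, x_i in zip(weights, x)) + bias
    return activation(z)


def identity(z):
    """Pass the weighted sum through unchanged (a purely linear unit)."""
    return z


def step(z):
    """Heaviside step, the activation usually used with the classic perceptron."""
    return 1 if z >= 0 else 0


# Made-up weights, bias and input, for illustration only.
w, b = [0.5, -1.0], 0.25
x = [2.0, 1.0]

# z = 0.5 * 2.0 + (-1.0) * 1.0 + 0.25 = 0.25
print(neuron(w, x, b, identity))  # 0.25
print(neuron(w, x, b, step))      # 1
```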

Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates

Abstract: This post provides an overview of a phenomenon called “Super Convergence” where we can train a deep neural network an order of magnitude faster than with conventional training methods. One of the key elements is training the network using “One-cycle policy” with maximum …

The Hidden Dangers in Algorithmic Decision Making

A robot judge in Futurama was all fun and games, until COMPAS was created. The quiet revolution of artificial intelligence looks nothing like the way movies predicted; AI seeps into our lives not by overtaking our lives as sentient robots, but instead, steadily creeping into areas of decision-making that were previously exclusive to humans. Because it …

Because it’s Friday: If planets were as close as the moon

What would the sky look like if Mars, Jupiter, Saturn, or Neptune were as close to us as the Moon is now? Well, other than the global calamity caused by extreme tides and general astrophysical disruption, it looks quite pretty. If Planets were as close as the Moon pic.twitter.com/bBwIPtRQ1J — Physics & Astronomy Zone (@ZonePhysics) …

Image Processing Class #0.2 — Digital Image

Image File Format: Any image is stored in memory; a raster image contains pixel values arranged in a regular matrix. Conversely, a vector image represents geometric objects using continuous coordinates. If you scale up a raster image, its resolution is lost, but this does not happen with a vector image. Raster image and Vector image Tagged Image File Format …

Full-Fledged Recommender System

Nov 29, 2018 The rapid rise in AI applications and decreasing processor and memory costs have allowed the last decade to show incredible progress with Recommender Systems. Given their rising importance in the retail industry, they are undoubtedly one of the more popular topics in Artificial Intelligence. https://thedatascientist.com/wp-content/uploads/2018/05/recommender_systems.png However, creating a full-fledged, ready-for-production, recommender system can …

We are Collage

Dada, Instagram, and the future of AI Collage is the language of the moment, but has been for over 100 years. Let’s walk through where it came from (Dada), what it’s up to now (Instagram), and why it’s integral to the future of AI (Deep Fakes, GANs, and the ingrained copy). Yesterday: Dada While the technique …

Attention Seq2Seq with PyTorch: learning to invert a sequence

Nov 29, 2018 TL;DR: In this article you’ll learn how to implement sequence-to-sequence models with and without attention on a simple case: inverting a randomly generated sequence. You might already have come across thousands of articles explaining sequence-to-sequence models and attention mechanisms, but few are illustrated with code snippets. Below is a non-exhaustive list of articles …

Neural Networks II: First Contact

Gentle introduction on Neural Networks Nov 29, 2018 This series of posts on Neural Networks is part of the collection of notes taken during the Facebook PyTorch Challenge, prior to the Deep Learning Nanodegree Program at Udacity. Contents: Introduction, Forward Pass, Backward Propagation, Learning, Testing, Conclusion. 1. Introduction: In the next illustration, an Artificial Neural Network is …

Part 1: A neural network from scratch — Foundation

Nov 27, 2018 In this series of articles I will explain the inner workings of a neural network. I will lay the foundation for the theory behind it as well as show how a competent neural network can be written in a few easy-to-understand lines of Java code. This is the first part …

Map the solar system to a place near you –A NatGeo’s MARS inspired Shiny web app

Nov 27, 2018 I recently leveled up to fatherhood. That’s why I am currently on 5 months of parental leave (thanks to the awesome team @store2be for going along with this!). Every morning at around 5am, I leave the bedroom with my son for the kitchen so his mom can have two real hours of …

Being a Machine Learning Engineer: 7-months in

What kind of data is there? Is it only numerical? Are there categorical features which could be incorporated into the model? Heads up, categorical features can be considered any type of data which isn’t immediately available in numerical form. In the problem of trying to predict housing prices, you might have number of bathrooms as …

The Power of Data

Reflections on how data (or lack thereof) helps (or fails) policy makers in developing countries Foreword When I stood up to speak last Friday at the Steering Committee meeting between the Ministry of Education of Ivory Coast and TRECC — a partnership for transforming Education in cocoa producing regions, led by the Jacobs Foundation –, it had …

Neural Networks I: Notation and building blocks

Gentle introduction on Neural Networks Nov 25, 2018 This series of posts on Neural Networks is part of the collection of notes taken during the Facebook PyTorch Challenge, prior to the Deep Learning Nanodegree Program at Udacity. Contents: 1. Neurons 2. Connections 3. Layers — Neurons vs Connections 3.1 Layers of Neurons 3.2 Layers of Connections — PyTorch Example 4. Notation ambiguity: Y = …

Exploratory Data Analysis (EDA) techniques for kaggle competition beginners

A hands-on guide for beginners on EDA and Data Science competitions. Exploratory Data Analysis (EDA) is an approach to analysing data sets to summarize their main characteristics, often with visual methods. The different steps involved in EDA are: Data Collection, Data Cleaning, Data Preprocessing, Data Visualisation. Data Collection: Data collection is the process …

Blockchain can be the new paradigm of the net

The popularization of blockchain will not depend on the users understanding its operation but on the existence of friendly and effective applications that solve real problems. Nov 22, 2018 Historically, each paradigm of the internet has had its killer application: before the web, it was email, with the original web it was Google and with …

Blogging with Hugo and Jupyter

I really love blogging with Hugo+Blogdown, but unfortunately Blogdown is still mostly restricted to R (although Python is now also possible using the reticulate package). Jupyter offers a great literate programming environment for multiple languages and so being able to publish Jupyter notebooks as Hugo blogposts would be a huge plus. I have been looking …

Combating media bias with AWS Comprehend

Nov 19, 2018 Photo by Randy Colas on Unsplash In the world of fake news and ideology-driven subjective media coverage, it is questionable which sources of journalism can be considered “reliable”. It happens many times that two different news outlets share two completely different takes on the same story. “Experts” point out different consequences of events …

Becoming An Analytics Manager Isn’t A Promotion.

(Photo by rawpixel on Unsplash) Nov 18, 2018 It’s A Career Change. Starting out as a data scientist may be the modern version of becoming a rock star but no-one really seems to be talking about what happens a few years further into your career. Analysing big data sets. Building models. Connecting data pipelines. The challenges …

Training your staff in data science? Here’s how to pick the right programming language

Businesses from every sector are investing in data science education programmes. Working at tech education company Decoded, I have found it fascinating to see the immense value data skills can bring to every sector — from banks and retailers, to charities and government. When embarking on such an initiative, there are plenty of strategic decisions for …

Kaggle: TGS Salt Identification Challenge

Nov 13, 2018 A few weeks ago, the TGS Salt Identification Challenge finished on Kaggle, a popular platform for data science competitions. The task was to accurately identify whether a subsurface target is salt or not on seismic images. Our team: Insaf Ashrapov, Mikhail Karchevskiy, Leonid Kozinkin. We finished 28th (top 1%) and would …

DOGNET: can an AI model fool a human?

The experiment was simple: could a machine learning (ML) model produce Golden Retriever images that people would mistake for being real? The reason for choosing dogs… was because dogs are awesome! In our current climate, we often hear the term ‘fake news’, and with ML models becoming more advanced, their ability to create non-human content …

Telling Apart AI and Humans: #2 Photo VS GAN generated image

If you missed the 1st installment of this series, Humans vs Robots is here. Prompted by advances in Generative Adversarial Networks (GAN), a year ago I tweeted a thread about telling apart pictures taken with a camera from generated pictures. Here is the updated version of that thread. A few of my tips are still …

Installing Hadoop 3.1.0 multi-node cluster on Ubuntu 16.04 Step by Step

Image Source: www.mapr.com/products/apache-hadoop/ There are many links on the web about installing Hadoop 3. Many of them are not working well or need improvements. This article is taken from the official documentation and other articles, in addition to many answers from Stackoverflow.com. Note: All prerequisites must be applied on name node and data nodes. First, …

PyTorch 101 for Dummies like Me

Nov 5, 2018 What is PyTorch? It’s a Python-based package that serves as a replacement for NumPy arrays and provides a flexible library for a Deep Learning Development Platform. As for why I prefer PyTorch over TensorFlow, see this Fast AI blog post on the reasons to switch to PyTorch. Or simply put, …

The Austrian Quant: My Machine Learning Trading Algorithm Outperformed the SP500 For 10 Years

Austrian Quant The Austrian Quant is named after the Austrian School of Economics which serves as the inspiration for how I structured the portfolio. I designed a trading strategy composed of 3 different investment funds to gain a better understanding of investments, machine learning and programming and how they all combine together in the world …

Getting started – Azure SQL Server Managed Instance

There are a lot of options for data scientists to store data in the Azure cloud. In this blog post I will cover the pros and cons of Azure SQL Server Managed Instance and will provide a few tips so you can hit the ground running if you decide to take it for a test …

Industrial strength Natural Language Processing

I have spent much of my career as a graduate student researcher, and now as a Data Scientist in the industry. One thing I have come to realize is that a vast majority of solutions proposed both in academic research papers and in the workplace are just not meant to ship — they just don’t scale! And …

Building a Sentiment Detection Bot with Google Cloud, a Chat Client, and Ruby.

Introduction: In this series, I’ll explain how to create a chat bot that is capable of detecting sentiment, analyzing images, and finally having the basis of an evolving personality. This is part 1 of that series. The Pieces: Ruby, Sinatra, Google Cloud APIs, Line (a chat client). Since I live in Japan: I’ll be using …

Debugging a Machine Learning model written in TensorFlow and Keras

Things that could go wrong, and how to diagnose if they did. Oct 24, 2018 In this article, you get to look over my shoulder as I go about debugging a TensorFlow model. I did a lot of dumb things, so please don’t judge. Cheat sheet. The numbers refer to sections in this article (https://bit.ly/2PXpzRh) 1 …

Introduction to Linear Regression in Python

Basic concepts and mathematics There are two kinds of variables in a linear regression model: The input or predictor variable is the variable(s) that help predict the value of the output variable. It is commonly referred to as X. The output variable is the variable that we want to predict. It is commonly referred to …
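As a small worked example of the predictor/output relationship described above, the sketch below fits a straight line to a handful of points with the ordinary least-squares formulas for a single predictor. The function name and the data are invented for illustration and are not taken from the original tutorial, which may use a different library or approach.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of X, and how X and Y vary together.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data lying exactly on y = 2x + 1.
X = [1.0, 2.0, 3.0, 4.0]   # input / predictor variable
Y = [3.0, 5.0, 7.0, 9.0]   # output variable we want to predict

slope, intercept = fit_line(X, Y)
print(slope, intercept)            # 2.0 1.0
print(slope * 5.0 + intercept)     # prediction for x = 5: 11.0
```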

A line-by-line layman’s guide to Linear Regression using TensorFlow

Computing the Graph: With generate_dataset() and linear_regression(), we are now ready to run the program and begin finding our optimal gradient W and bias b! [line 2, 3]
x_batch, y_batch = generate_dataset()
x, y, y_pred, loss = linear_regression()
In this run() function, we start off by calling generate_dataset() and linear_regression() to get x_batch, y_batch, x, y, y_pred …