## Feel discouraged on the sparse data in your hand? Give Factorization Machine a shot (2)

Laying a solid foundation in Matrix Factorization will make your exploration of the advanced models derived from it, such as LDA, LSI, PLSA, and Tensor Factorization, much smoother. The models derived from the concept of Matrix Factorization. In the last session, we talked about the basic … Read more Feel discouraged on the sparse data in your hand? Give Factorization Machine a shot (2)

## Python Virtual Environment

Conda: How to set up virtual environments using conda for the Anaconda Python distribution. A virtual environment is a named, isolated, working copy of Python that maintains its own files, directories, and paths so that you can work with specific versions of libraries or Python itself without affecting other Python projects. Virtual environments … Read more Python Virtual Environment

## “Increase sample size until statistical significance is reached” is not a valid adaptive trial design; but it’s fixable.

TLDR: Begin with N of 10, increase by 10 until p < 0.05 or max N reached. This design has inflated type-I error. Lower p-value threshold needed to ensure specified type-I error rate. The number of interim analyses and max N affect the type-I error rate. Threshold can be identified using simulation. A recent Facebook … Read more “Increase sample size until statistical significance is reached” is not a valid adaptive trial design; but it’s fixable.
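The inflation described in the TLDR can be checked with a small simulation. The sketch below is illustrative only and is not the original post's code: it assumes a one-sample, two-sided z-test on standard-normal data with known variance, and the function names (`two_sided_p`, `simulated_type1_error`) and default parameters are hypothetical.

```python
import math
import random


def two_sided_p(sample):
    """Two-sided p-value for H0: mean == 0, assuming known unit variance."""
    n = len(sample)
    z = (sum(sample) / n) * math.sqrt(n)
    return math.erfc(abs(z) / math.sqrt(2))


def simulated_type1_error(threshold=0.05, start_n=10, step=10, max_n=100,
                          n_sims=2000, seed=0):
    """Estimate how often the 'add 10 until significant' design rejects a true null."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(n_sims):
        # The null hypothesis is true: every observation has mean 0.
        data = [rng.gauss(0.0, 1.0) for _ in range(max_n)]
        # Interim analysis at N = 10, 20, ..., max_n; stop at the first "significant" look.
        for n in range(start_n, max_n + 1, step):
            if two_sided_p(data[:n]) < threshold:
                false_positives += 1
                break
    return false_positives / n_sims


if __name__ == "__main__":
    print("threshold 0.05:", simulated_type1_error(0.05))
    print("threshold 0.01:", simulated_type1_error(0.01))
```

Running the function over a grid of thresholds is one way to search for the lower threshold that brings the overall type-I error back to the intended level for a given number of interim analyses and max N.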

## Shortcoming of Under-sampling Algorithms: CCMUT and E-CCMUT

What, Why, Possible Solution and Ultimate Utility In one of my previous articles, “Under-sampling: A Performance Booster on Imbalanced Data”, I applied the Cluster Centroid based Majority Under-sampling Technique (CCMUT) to the Adult Census Data and showed an improvement in model performance over the state-of-the-art model, “A Statistical Approach to Adult Census Income Level Prediction”[1]. But there are … Read more Shortcoming of Under-sampling Algorithms: CCMUT and E-CCMUT

## “Artist” in Matplotlib — something I wanted to know before spending tremendous hours on googling…

Originally published at dev.to and modified a bit to fit Medium’s editing system. It’s true that matplotlib is a fantastic visualization tool in Python. But it’s also true that tweaking details in matplotlib is a real pain. You may easily lose hours finding out how to change a small part of your plot. Sometimes … Read more “Artist” in Matplotlib — something I wanted to know before spending tremendous hours on googling…

## Avoiding Parking Tickets in San Francisco Using Data Analytics

Although still not a perfect predictor, this model was more accurate than the first. The streets identified as best also showed much less variability than those identified as worst. We could also reduce the number of tickets by over 50% if we chose the best population compared to the worst. Interestingly, parking density was … Read more Avoiding Parking Tickets in San Francisco Using Data Analytics

## Why Kaggle will NOT make you a great data-scientist

Want to be an Eagle or Kaggle data scientist ? There is no doubt that Kaggle is a great place to learn data science. There are many data scientists who invest a lot of time in Kaggle. That is fantastic. But you should not rely only on Kaggle to learn data science skills. And here are … Read more Why Kaggle will NOT make you a great data-scientist

## Comparative study on Classic Machine learning Algorithms

2. Logistic Regression Just like linear regression, logistic regression is the right algorithm to start with when learning classification algorithms. Even though the word ‘Regression’ appears in its name, it is not a regression model but a classification model. It uses a logistic function to model a binary output. The output of the logistic regression will be a probability (0≤x≤1), … Read more Comparative study on Classic Machine learning Algorithms
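To make the excerpt above concrete, here is a minimal sketch of how a logistic function turns a linear score into a probability and then into a class label. The weights, the bias, and the 0.5 decision threshold are illustrative assumptions, not values from the article.

```python
import math


def logistic(z):
    """Numerically stable logistic (sigmoid) function; output lies in [0, 1]."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, features):
    """Probability of the positive class for one example."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return logistic(z)


def predict_label(weights, bias, features, threshold=0.5):
    """Turn the probability into a binary class label (threshold is an assumption)."""
    return 1 if predict_proba(weights, bias, features) >= threshold else 0


if __name__ == "__main__":
    w, b = [1.5, -2.0], 0.25  # hypothetical, already-fitted parameters
    print(predict_proba(w, b, [2.0, 0.5]), predict_label(w, b, [2.0, 0.5]))
```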

## How should we define AI?

In our very first section, we’ll become familiar with the concept of AI by looking into its definition and some examples. As you have probably noticed, AI is currently a “hot topic”: media coverage and public discussion about AI is almost impossible to avoid. However, you may also have noticed that AI means different things … Read more How should we define AI?

## F# Advent Calendar — A Christmas Classifier

The ML.NET Model The model is defined in Program.fs The dataLoader specifies the schema of the input data. Input Data Schema The dataLoader is then used to load the training and test data views. Load Training and Test Data The dataPipeline specifies the transforms that should be applied to the input tsv. Since this is a … Read more F# Advent Calendar — A Christmas Classifier

## Gender Diversity in the R and Python Communities

Many (if not most) tech communities have far more representation from men than from women (and even fewer from nonbinary folk). This is a shame, because everybody uses software, and these projects would self-evidently benefit from the talent and expertise from across the entire community. Some projects are doing better than others, though, and data … Read more Gender Diversity in the R and Python Communities

## I Can Be Your Heroku, Baby

Deploying a Python app in Heroku! Do you like Data Science? <Shakes head up and down> Do you like Data Science DIY deployment? <Shakes head left and right> Me neither. One of the most frustrating parts of early data science learning or personal work is deploying an app through free cloud applications. Your code is juuust … Read more I Can Be Your Heroku, Baby

## Roadmap for Conquering Computer Vision

It has become quite a tradition to write blogs on giving guidelines to ace Machine learning. I have had a hard time finding any such roadmap and to-do list for computer vision. As a vision enthusiast and consultant, I have found a lot of people asking about a concrete roadmap (in terms of skills, courses … Read more Roadmap for Conquering Computer Vision

## How to determine the best model?

Machine learning models play a critical role in many aspects of today’s business. The use of a predictive model can improve the business bottom line, and a slightly improved model can result in an increase of millions of dollars. Although you may not know all the popular algorithms (and more powerful algorithms in the future), … Read more How to determine the best model?

## Image Processing Class (EGBE443) #3 — Point Operation

Applying a point operation affects the histogram. Raising the brightness shifts the histogram to the right, and increasing the contrast of the image expands the histogram. These point operations map intensities through a mapping function whose constants come from the image content, such as the highest and lowest intensity. Automatic … Read more Image Processing Class (EGBE443) #3 — Point Operation
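The two point operations mentioned above can be sketched as follows. This is an illustrative example rather than the course's own code: it assumes an 8-bit grayscale image stored as a flat list of intensities, and the function names are hypothetical.

```python
def clamp(value, low=0, high=255):
    """Keep an intensity inside the valid 8-bit range."""
    return max(low, min(high, value))


def raise_brightness(pixels, offset):
    """Add a constant to every pixel, which shifts the histogram to the right."""
    return [clamp(p + offset) for p in pixels]


def stretch_contrast(pixels, out_low=0, out_high=255):
    """Linearly map [lowest, highest] intensity of the image onto [out_low, out_high]."""
    lowest, highest = min(pixels), max(pixels)
    if highest == lowest:
        return list(pixels)  # flat image: nothing to stretch
    span = highest - lowest
    return [clamp(round(out_low + (p - lowest) * (out_high - out_low) / span))
            for p in pixels]


if __name__ == "__main__":
    image = [50, 100, 200]
    print(raise_brightness(image, 20))
    print(stretch_contrast(image))
```

Here the constants of the contrast mapping (the lowest and highest intensity) are read from the image itself, so the histogram expands to fill the full output range.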

## Part 2: Gradient descent and backpropagation

Dec 3, 2018 In this article you will learn how a neural network can be trained by using backpropagation and stochastic gradient descent. The theories will be described thoroughly and a detailed example calculation is included where both weights and biases are updated. This is the second part in a series of articles: I assume … Read more Part 2: Gradient descent and backpropagation

## Machine Learning Introduction: A Comprehensive Guide

Dec 3, 2018 This is the first of a series of articles in which I will describe machine learning concepts, types, algorithms and Python implementations. The main goals of this series are: Creating a comprehensive guide to machine learning theory and intuition. Sharing and explaining machine learning projects, developed in Python, to show in a … Read more Machine Learning Introduction: A Comprehensive Guide

## The Perceptron Algorithm

In my blog post Neural Nets: From Linear Regression to Deep Nets I talked about how a deep neural net is simply a sequence of simple building blocks of the form: $\sigma(\underbrace{w^T}_{\text{weights}}x + \overbrace{b}^{\text{bias}}) = a$ and that a linear regression model is one of the most basic neural networks where the activation function $\sigma$ … Read more The Perceptron Algorithm
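The building block in the formula above can be sketched in a few lines. The example is illustrative: the helper names are hypothetical, and pairing linear regression with an identity activation and the perceptron with a step activation is the standard textbook reading, stated here as an assumption because the excerpt is cut off before it says so.

```python
def neuron(weights, bias, inputs, activation):
    """Compute a = sigma(w^T x + b) for a single building block."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


def identity(z):
    """Activation that leaves z unchanged (assumed here for linear regression)."""
    return z


def step(z):
    """Threshold activation used by the classic perceptron."""
    return 1.0 if z >= 0 else 0.0


if __name__ == "__main__":
    w, b, x = [2.0, -1.0], 0.5, [1.0, 3.0]  # hypothetical values
    print(neuron(w, b, x, identity))  # raw linear output
    print(neuron(w, b, x, step))      # perceptron-style class decision
```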

## Abstract:

Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates Abstract: This post provides an overview of a phenomenon called “Super Convergence”, where we can train a deep neural network an order of magnitude faster than with conventional training methods. One of the key elements is training the network using the “One-cycle policy” with maximum … Read more Abstract:

## The Hidden Dangers in Algorithmic Decision Making

A robot judge in Futurama was all fun and games, until COMPAS was created. The quiet revolution of artificial intelligence looks nothing like the way movies predicted; AI seeps into our lives not by overtaking our lives as sentient robots, but instead, steadily creeping into areas of decision-making that were previously exclusive to humans. Because it … Read more The Hidden Dangers in Algorithmic Decision Making

## Because it’s Friday: If planets were as close as the moon

What would the sky look like if Mars, Jupiter, Saturn, or Neptune were as close to us as the Moon is now? Well, other than the global calamity caused by extreme tides and general astrophysical disruption, it looks quite pretty. If Planets were as close as the Moon pic.twitter.com/bBwIPtRQ1J — Physics & Astronomy Zone (@ZonePhysics) … Read more Because it’s Friday: If planets were as close as the moon

## Simulating dinosaur populations, with R

So it turns out that the 1990 Michael Crichton novel Jurassic Park is, indeed, a work of fiction. (Personal note: despite the snark to follow, the book is one of my all-time favorites — I clearly remember devouring it in 24 hours straight while ill in a hostel in France.) If the monsters and melodrama … Read more Simulating dinosaur populations, with R

## Image Processing Class #0.2 — Digital Image

Image File Format. When images are stored in memory, a raster image contains pixel values arranged in a regular matrix. Conversely, a vector image represents geometric objects using continuous coordinates. If you scale up a raster image, its resolution will be lost, but this does not happen with a vector image. Raster image and Vector image. Tagged Image File Format … Read more Image Processing Class #0.2 — Digital Image

## Full-Fledged Recommender System

Nov 29, 2018 The rapid rise in AI applications and decreasing processor and memory costs have allowed the last decade to show incredible progress with Recommender Systems. Given their rising importance in the retail industry, they are undoubtedly one of the more popular topics in Artificial Intelligence. https://thedatascientist.com/wp-content/uploads/2018/05/recommender_systems.png However, creating a full-fledged, ready-for-production, recommender system can … Read more Full-Fledged Recommender System

## We are Collage

Dada, Instagram, and the future of AI Collage is the language of the moment, but has been for over 100 years. Let’s walk through where it came from (Dada), what it’s up to now (Instagram), and why it’s integral to the future of AI (Deep Fakes, GANs, and the ingrained copy). Yesterday: Dada While the technique … Read more We are Collage

## Attention Seq2Seq with PyTorch: learning to invert a sequence

Nov 29, 2018 TL;DR: In this article you’ll learn how to implement sequence-to-sequence models with and without attention on a simple case: inverting a randomly generated sequence. You might already have come across thousands of articles explaining sequence-to-sequence models and attention mechanisms, but few are illustrated with code snippets. Below is a non-exhaustive list of articles … Read more Attention Seq2Seq with PyTorch: learning to invert a sequence

## Neural Networks II: First Contact

Gentle introduction to Neural Networks Nov 29, 2018 This series of posts on Neural Networks is part of a collection of notes taken during the Facebook PyTorch Challenge, prior to the Deep Learning Nanodegree Program at Udacity. Contents: Introduction, Forward Pass, Backward Propagation, Learning, Testing, Conclusion. 1. Introduction In the next illustration, an Artificial Neural Network is … Read more Neural Networks II: First Contact

## Part 1: A neural network from scratch — Foundation

Nov 27, 2018 In this series of articles I will explain the inner workings of a neural network. I will lay the foundation for the theory behind it as well as show how a competent neural network can be written in a few easy-to-understand lines of Java code. This is the first part … Read more Part 1: A neural network from scratch — Foundation

## What better time than now?

How art and wanting to change the world led me to data science Nov 27, 2018 Data plays a crucial role in understanding the world around us. I’ve been working with data in one way or another since before I could appreciate its value. Now I’m in an immersive data science program. Here’s a little bit … Read more What better time than now?

## Map the solar system to a place near you –A NatGeo’s MARS inspired Shiny web app

Nov 27, 2018 I recently leveled up to fatherhood. That’s why I am currently on 5 months of parental leave (thanks to the awesome team @store2be for going along with this!). Every morning at around 5am, I leave the bedroom with my son for the kitchen so his mom can have two real hours of … Read more Map the solar system to a place near you –A NatGeo’s MARS inspired Shiny web app

## AzureVM: managing virtual machines in Azure

This is the next article in my series on AzureR, a family of packages for working with Azure in R. I’ll give a short introduction on how to use AzureVM to manage Azure virtual machines, and in particular Data Science Virtual Machines (DSVMs). Creating a VM Creating a VM is as simple as using the … Read more AzureVM: managing virtual machines in Azure

## Being a Machine Learning Engineer: 7-months in

What kind of data is there? Is it only numerical? Are there categorical features which could be incorporated into the model? Heads up, categorical features can be considered any type of data which isn’t immediately available in numerical form. In the problem of trying to predict housing prices, you might have number of bathrooms as … Read more Being a Machine Learning Engineer: 7-months in

## The Power of Data

Reflections on how data (or lack thereof) helps (or fails) policy makers in developing countries Foreword When I stood up to speak last Friday at the Steering Committee meeting between the Ministry of Education of Ivory Coast and TRECC — a partnership for transforming Education in cocoa producing regions, led by the Jacobs Foundation –, it had … Read more The Power of Data

## Neural Networks I: Notation and building blocks

Gentle introduction to Neural Networks Nov 25, 2018 This series of posts on Neural Networks is part of a collection of notes taken during the Facebook PyTorch Challenge, prior to the Deep Learning Nanodegree Program at Udacity. Contents Neurons Connections Layers — Neurons vs Connections 3.1 Layers of Neurons 3.2. Layers of Connections — PyTorch Example 4. Notation ambiguity: Y = … Read more Neural Networks I: Notation and building blocks

## Exploratory Data Analysis (EDA) techniques for kaggle competition beginners

A hands-on guide for beginners on EDA and Data Science competitions Exploratory Data Analysis (EDA) is an approach to analysing data sets to summarize their main characteristics, often with visual methods. The following are the different steps involved in EDA: Data Collection, Data Cleaning, Data Preprocessing, Data Visualisation. Data Collection Data collection is the process … Read more Exploratory Data Analysis (EDA) techniques for kaggle competition beginners

## Blockchain can be the new paradigm of the net

The popularization of blockchain will not depend on the users understanding its operation but on the existence of friendly and effective applications that solve real problems. Nov 22, 2018 Historically, each paradigm of the internet has had its killer application: before the web, it was email, with the original web it was Google and with … Read more Blockchain can be the new paradigm of the net

## Blogging with Hugo and Jupyter

I really love blogging with Hugo+Blogdown, but unfortunately Blogdown is still mostly restricted to R (although Python is now also possible using the reticulate package). Jupyter offers a great literate programming environment for multiple languages and so being able to publish Jupyter notebooks as Hugo blogposts would be a huge plus. I have been looking … Read more Blogging with Hugo and Jupyter

## Combating media bias with AWS Comprehend

Nov 19, 2018 Photo by Randy Colas on Unsplash In the world of fake news and ideology-driven subjective media coverage, it is questionable which sources of journalism can be considered “reliable”. It happens many times that two different news outlets share two completely different takes on the same story. “Experts” point out different consequences of events … Read more Combating media bias with AWS Comprehend

## Becoming An Analytics Manager Isn’t A Promotion.

(Photo by rawpixel on Unsplash) Nov 18, 2018 It’s A Career Change. Starting out as a data scientist may be the modern version of becoming a rock star but no-one really seems to be talking about what happens a few years further into your career. Analysing big data sets. Building models. Connecting data pipelines. The challenges … Read more Becoming An Analytics Manager Isn’t A Promotion.

## Training your staff in data science? Here’s how to pick the right programming language

Businesses from every sector are investing in data science education programmes. Working at tech education company Decoded, I have found it fascinating to see the immense value data skills can bring to every sector — from banks and retailers, to charities and government. When embarking on such an initiative, there are plenty of strategic decisions for … Read more Training your staff in data science? Here’s how to pick the right programming language

## Kaggle: TGS Salt Identification Challenge

Nov 13, 2018 A few weeks ago, the TGS Salt Identification Challenge finished on Kaggle, a popular platform for data science competitions. The task was to accurately identify whether a subsurface target is salt or not on seismic images. Our team: Insaf Ashrapov, Mikhail Karchevskiy, Leonid Kozinkin. We finished 28th (top 1%) and would … Read more Kaggle: TGS Salt Identification Challenge

## DOGNET: can an AI model fool a human?

The experiment was simple: could a machine learning (ML) model produce Golden Retriever images that people would mistake for being real? The reason for choosing dogs… was because dogs are awesome! In our current climate, we often hear the term ‘fake news’, and with ML models becoming more advanced, their ability to create non-human content … Read more DOGNET: can an AI model fool a human?

## Telling Apart AI and Humans: #2 Photo VS GAN generated image

If you missed the 1st installment of this series, Humans vs Robots is here. Prompted by advances in Generative Adversarial Networks (GAN), a year ago I tweeted a thread about telling apart pictures taken with a camera from generated pictures. Here is the updated version of that thread. A few of my tips are still … Read more Telling Apart AI and Humans: #2 Photo VS GAN generated image

## Data Apocalypse!

The future of data storage What is Data? How is it stored, processed, transferred? What is the cloud? Will we eventually run out of space?! These are the questions that populated my fatigued mind as I tried to relax after a long day at the Flatiron School. [Disclaimer: an immersive program will do that to you]. As … Read more Data Apocalypse!

## Quantum advantage

Quantum computing is becoming visible in the tech world. There are over a dozen hardware companies, each trying to build their own quantum computer, from small startups like Xanadu through medium-sized ones like D-Wave or Rigetti to large enterprises like Google, Microsoft or IBM. On top of that there are a couple of dozen software … Read more Quantum advantage