Kuzushiji-MNIST – Japanese Literature Alternative Dataset for Deep Learning Tasks

Plus our VGG-ResNet ensemble model with state-of-the-art results. MNIST, a dataset with 70,000 labeled images of handwritten digits, has been one of the most popular datasets for image processing and classification for over twenty years. Despite its popularity, contemporary deep learning algorithms handle it easily, often surpassing 99.5% accuracy. A new paper …

Learning Data Science on Generic Datasets is Useless

Alright, it’s most definitely not useless. But it is far more useless than it needs to be. This article will outline some of the potential roles that data plays in learning data science, with an argument against using generic (and, for that matter, static) datasets. All too often we see machine learning topics taught on …

Closing the Sale: Predicting Home Prices via Linear Regression

Imports, Data Cleansing, and EDA: Cleaning and EDA are important for this challenge, as this data set contains many ordinal / categorical features that may be important in categorization and will need to be converted to numerical values. As a baseline, I imported the following libraries to clean, explore and model the training data. One of …

Nobody UNDERSTANDS Me … But Soon, Artificial Intelligence Just Might

Our faces and voices can be analyzed for emotion. As I mentioned, biomimicry, or imitating natural design in the things we create, is critical in recreating this human tendency in AI. Our end goal is artificial empathy, which (for now, at least) describes a machine’s ability to recognize and respond to human emotion. Going in line …

Pandas for data.table Users

R and Python are both great languages for data analysis. While they are remarkably similar in some aspects, they are drastically different in others. In this post, I will focus on the similarities and differences between Pandas and data.table, two of the most prominent data manipulation packages in Python/R. There is already an excellent post …

An introduction to high-dimensional hyper-parameter tuning

If you have ever struggled with tuning Machine Learning (ML) models, you are reading the right piece. Hyper-parameter tuning refers to the problem of finding an optimal set of parameter values for a learning algorithm. Usually, the process of choosing these values is a time-consuming task. Even for simple algorithms like Linear Regression, finding the best …

Preprocessing with sklearn: a complete and comprehensive guide

For aspiring data scientists, it can sometimes be difficult to find their way through the forest of preprocessing techniques. Sklearn's preprocessing library forms a solid foundation to guide you through this important task in the data science pipeline. Although Sklearn has pretty solid documentation, it often lacks a streamlined, intuitive connection between different concepts. …

IcoOmen: Using Machine Learning to Predict ICO Prices

Methodology: Choose inputs and outputs. Collect and aggregate the data. Prepare the data. Explore and attempt to understand the data. Choose a Machine Learning Model. Measure the performance of the Model. Save the Model. Use the Model to make predictions. 1. Choosing Inputs and Outputs. Inputs: Choosing the right inputs and outputs (in the case of …

How to Predict Severe Traffic Jams with Python and Recurrent Neural Networks?

An Application of a Sequence Model to Mine Waze Open Data of Traffic Incidents, using Python and Keras. In this tutorial, I will show you how to use an RNN deep learning model to find patterns in the Waze Traffic Open Data of Incidents Report, and predict whether severe traffic jams will happen shortly. Interventions can be taken out …

How to get the most out of Towards Data Science?

Our Readers’ Guide. We have received feedback that some of you find it difficult to efficiently navigate our Medium publication. So we have put together a few bullet points that will hopefully aid your experience on our blog. Subscribe to our publication to receive our Monthly Edition and Weekly Selection directly in your mailbox. Follow us …

Vaex: Out of Core Dataframes for Python and Fast Visualization

So… no pandas?? There are some issues with pandas that the original author Wes McKinney outlines in his insightful blogpost: “Apache Arrow and the ‘10 Things I Hate About pandas’”. Many of these issues will be tackled in the next version of pandas (pandas2?), building on top of Apache Arrow and other libraries. Vaex starts …

Music Genre Classification with Python

Objective: Companies nowadays use music classification either to make recommendations to their customers (such as Spotify, Soundcloud) or simply as a product (for example Shazam). Determining music genres is the first step in that direction. Machine Learning techniques have proved to be quite successful in extracting trends and patterns from the large …

Named Entity Recognition (NER), Meeting Industry’s Requirement by Applying state-of-the-art Deep…

We are going to have a quick look at the architecture of four different state-of-the-art approaches by referring to the actual research papers, and then we will move on to implement the one with the highest accuracy. 1. Bidirectional LSTM-CRF: more details and implementation in Keras, from the paper (Bidirectional LSTM-CRF Models for Sequence Tagging). 2. Bidirectional LSTM-CNNs: …

The Importance of Being Recurrent for Modeling Hierarchical Structure

RNNs have inherent performance limitations. For a while, it seemed that RNNs were taking the Natural Language Processing (NLP) world by storm (from about 2014–17). However, we’ve recently started realizing the limitations of RNNs, primarily that they are “inefficient and not scalable”. While there is great promise in overcoming these limitations by using more specialized …

Text Summarization on the Books of Harry Potter

Hermione interrupted them. “Aren’t you two ever going to read Hogwarts, A History?” How many times throughout the Harry Potter series does Hermione bug Harry and Ron to read the enormous tome Hogwarts, A History? Hint: it’s a lot. How many nights do the three of them spend in the library, reading through every book …

Supervised Machine Learning: Classification

Machine Learning is the science (and art) of programming computers so they can learn from data. “[Machine Learning is the] field of study that gives computers the ability to learn without being explicitly programmed.” (Arthur Samuel, 1959) A better definition: A computer program is said to learn from experience E with respect to some task …

7 Tips to Getting a Data Science Job Faster

Data science is a booming field, but with great publicity comes great difficulty. Breaking into the data science field gets twice as hard each year. The growth of training programs like graduate degrees/certificates and bootcamps far exceeds the growth of new entry-level positions. Prior to 2015, it was a cake-walk to get multiple interviews. Now …

First Mile

The Electric Pulse. Thomas Parker Electric Car (1895) | Fisker Karma (2012). Credit for inventing the first electric vehicle is also debated, because many scientists and tinkerers were working with various forms of electric sources (batteries and electric motors) around the same time. However, there is a prominent name in electric …

The Kernel Trick

We have seen how higher dimensional transformations can allow us to separate data in order to make classification predictions. It seems that in order to train a support vector classifier and optimize our objective function, we would have to perform operations with the higher dimensional vectors in the transformed feature space. In real …
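The point the excerpt above is building toward can be illustrated with a small sketch. It is not taken from the post itself: the degree-2 polynomial kernel, the helper names `phi`, `dot` and `poly_kernel`, and the two sample points are all assumptions chosen for illustration. It shows that an inner product in the transformed feature space can be computed directly from the original low-dimensional vectors.

```python
import math

def phi(x):
    # Explicit degree-2 feature map for a 2-D point (illustrative helper,
    # not from the original post).
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def dot(u, v):
    # Plain inner product of two equal-length vectors.
    return sum(a * b for a, b in zip(u, v))

def poly_kernel(x, z):
    # Homogeneous polynomial kernel of degree 2: the same inner product,
    # computed without ever leaving the original 2-D space.
    return dot(x, z) ** 2

x, z = (1.0, 2.0), (3.0, 4.0)   # made-up sample points
explicit = dot(phi(x), phi(z))  # works in the 3-D transformed space
implicit = poly_kernel(x, z)    # never builds phi(x) or phi(z)
print(math.isclose(explicit, implicit))
```

Both routes give the same value up to floating-point rounding, which is why a classifier that only needs inner products can skip the explicit transformation.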

Amazon Customer Analysis

User review networks for customer segmentation. Over the past decade or two, Americans have continued to prefer payment methods that are traceable, providing retailers and vendors with a rich source of data on their customers. This data is used by data scientists to help businesses make more informed decisions with respect to inventory, marketing, and …

A Guide for Building Convolutional Neural Networks

Computer Vision is at the forefront of advancements in Artificial Intelligence (AI). It’s moving fast, with new research coming out each and every day, allowing us to do truly amazing things that we couldn’t do before with computers and AI. Convolutional Neural Networks (CNNs) are the driving force behind every advancement in Computer Vision research …

The invisible workers of the AI era

50 ways to label data. There are different ways to get your data labeled. Some firms label their data themselves, although this can be costly, as hiring people simply for these tasks costs firms both money and flexibility. Other companies even find ways to get people to label their data for free. Ever wonder why Google’s reCAPTCHA …

AI and Machine Learning: Moving from Training to Education

The debate over whether AI will ever achieve capabilities on par with or beyond human intelligence is ongoing. It has certainly intensified with the recent advancements in AI, Machine Learning (ML), and Deep Learning (DL), with some believing that the current technologies are already capable of paving the way for Artificial General Intelligence (AGI). You …

Towards Ethical Machine Learning

https://initiatives.provost.uci.edu/event/philosophy-machine-learning-knowledge-causality/ I quit my job to enter an intensive data science bootcamp. I understand the value behind the vast amount of data available that enables us to create predictive machine learning algorithms. In addition to recognizing its value on a professional level, I benefit from these technologies as a consumer. Whenever I find myself in …

How to give money to the R project

by Mark Niemann-Ross, an author, educator, and writer who teaches about R and Raspberry Pi at LinkedIn Learning. I spend a LOT of time at r-project.org, in particular the sections for documentation and CRAN. But I hadn’t spent much time in the other areas: R Project, R Foundation, and links. When I recently wandered into the foundation area, …

Parsing XML, Named Entity Recognition in One-Shot

Photo credit: Lynda.com. Conditional Random Fields, Sequence Prediction, Sequence Labelling. Parsing XML is a process that is designed to read XML and create a way for programs to use XML. An XML parser is the piece of software that reads XML files and makes the information from those files available to applications. While reading an …

An introduction to web scraping with Python

Introduction: As a data scientist, I often find myself looking for external data sources that could be relevant for my machine learning projects. The problem is that it is uncommon to find open source data sets that perfectly correspond to what you are looking for, or free APIs that give you access to data. In …

Top Examples of Why Data Science is Not Just .fit().predict()

In this post, I’m going to review some of the top concepts I learned that turned me from a technical data scientist into a good data scientist. Two months ago, I finished my second year as a data scientist at YellowRoad, so I decided to do a retrospective analysis on my projects, what did I …

Pew Study Answers on Artificial Intelligence and the Future of Humans

The AI future is uncertain, but generally, I think it will improve life. I was one of the 900+ futurists interviewed for The Pew Research study released yesterday, “Artificial Intelligence and the Future of Humans.” Conducted with Elon University, the study revolved around AI and the 50th anniversary of the Internet. The report asked three questions …

Classification (Part 2) — Linear Discriminant Analysis

An explanation of Bayes’ theorem and linear discriminant analysis. Photo by Jerry Kiesewetter on Unsplash. Overview: Previously, logistic regression was introduced for classification. Unfortunately, like any model, it has some flaws: when classes are well separated, parameter estimates from logistic regression tend to be unstable; when the data set is small, logistic regression is also unstable …

AWS Architecture For Your Machine Learning Solutions

The Undertaking: Recently, I was involved in developing a machine learning solution for one of the largest North American steel manufacturers. The company wanted to leverage the power of ML to get insights on customer segmentation, order prediction and product-volume recommendations. This article revolves around why and how we leveraged AWS for deploying our deliverables …

How to tune a BigQuery ML classification model to achieve a desired precision or recall

Select the probability threshold based on the ROC curve. BigQuery provides an incredibly convenient way to train machine learning models on large, structured datasets. In an earlier article, I showed you how to train a classification model to predict flight delays. Here’s the SQL query that will predict whether a flight is going to be late …
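As a rough, library-free illustration of the threshold selection mentioned in the excerpt above, the sketch below scans candidate thresholds and keeps the lowest one that reaches a target precision. It does not reproduce the post's BigQuery SQL: the function names, the target of 0.75, and the scores and labels are all hypothetical.

```python
def precision_recall_at(scores, labels, threshold):
    # Predict positive whenever the model's probability reaches the threshold.
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p and y)
    fp = sum(1 for p, y in zip(predicted, labels) if p and not y)
    fn = sum(1 for p, y in zip(predicted, labels) if not p and y)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def lowest_threshold_for_precision(scores, labels, target):
    # Recall can only fall as the threshold rises, so the lowest threshold
    # that meets the precision target also keeps recall as high as possible.
    for t in sorted(set(scores)):
        precision, _ = precision_recall_at(scores, labels, t)
        if precision >= target:
            return t
    return None  # no threshold reaches the target

# Made-up predicted probabilities and true labels, for illustration only.
scores = [0.1, 0.4, 0.35, 0.8, 0.65, 0.9]
labels = [False, False, True, True, False, True]

threshold = lowest_threshold_for_precision(scores, labels, target=0.75)
print(threshold, precision_recall_at(scores, labels, threshold))
```

Precision is not guaranteed to rise steadily with the threshold, which is why the sketch checks every candidate rather than stopping at the first improvement.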

How to deploy a predictive service to Kubernetes with R and the AzureContainers package

It’s easy to create a function in R, but what if you want to call that function from a different application, with the scale to support a large number of simultaneous requests? This article shows how you can deploy an R fitted model as a Plumber web service in Kubernetes, using Azure Container Registry (ACR) and …

Implementing Defensive Design in AI Deployments

A series of insights and battle scars from the world of medical device design. With the upcoming launch of one of our AI products, there is one question that clients keep asking. The same question also shows up once in a while in our consulting engagements, to a lesser degree, but still demands an …

Object detection and tracking in PyTorch

Detecting multiple objects in images and tracking them in videos. In my previous story, I went over how to train an image classifier in PyTorch, with your own images, and then use it for image recognition. Now I’ll show you how to use a pre-trained classifier to detect multiple objects in an image, and later track …

10 Lessons Learned From Participating in Google AI Challenge

Key Points of My Work. Disclaimers: I will present only a portion of the code I wrote for this competition; my teammates are absolutely not responsible for my awful and buggy code. A portion of this code is inspired by great Kagglers sharing their insights and code in Kaggle kernels and forums. I hope I did …

AI: the silver bullet to stop Technical Debt from sucking you dry

You’ve heard a lot about student debt, but what about technical debt? It’s Friday evening in the Bahamas. You’re relaxing under a striped red umbrella with a succulent glass of wine and your favorite book; it’s a great read and you love the way the ocean breeze moves the pages like leaves on a tree. As …

Pitching Artificial Intelligence to Business People

From silver bullet syndrome to silver linings. In this article I plan to share with you our recent experience pitching AI to business folk, and what lessons we learned along the way. As a small firm of AI experts, we follow an awareness marketing approach. Rather than relying solely on one marketing channel, we attend conferences …

A Thought on Using Machine Learning Models

During my training classes, during or after the discussion of common machine learning models, I usually bring up one topic: the use of insights from these models, or the implementation of the model in a business or organizational process. For instance, we can get the most accurate model, one that is very good at ‘predicting’ which …

Improving Patient Flows With Data Science And Analytics

Reducing Costs By Improving Processes. Our team was recently asked how data analytics and data science can be used to relieve bottlenecks and improve patient flows in hospitals. Healthcare providers and hospitals can have very complex patient flows. Many steps can intertwine, resources have to shift in between tasks all the time, and severity of patients …

How a High School Junior Made a Self-Driving Car

Questions related to this repository from a project I created almost three years ago are among the most numerous questions I receive. The repository itself is really nothing too special, just an implementation of an Nvidia paper that was released about a year prior. A graduate student later managed to implement my code in an …

Simpson’s Paradox and Interpreting Data

The challenge of finding the right view through data. Edward Hugh Simpson, a statistician and former cryptanalyst at Bletchley Park, described the statistical phenomenon that bears his name in a technical paper in 1951. Simpson’s paradox highlights one of my favourite things about data: the need for good intuition regarding the real world and how most …

Word Representation in Natural Language Processing Part II

In the previous part (Part I) of the word representation series, I talked about fixed word representations that make no assumption about semantics (meaning) and similarity of words. In this part, I will describe a family of distributed word representations. The main idea is to represent words as feature vectors. Each entry in the vector stands …

AlphaZero implementation and tutorial

A walk-through of implementing AlphaZero using custom TensorFlow operations and a custom Python C module. I describe here my implementation of the AlphaZero algorithm, available on GitHub, written in Python with custom TensorFlow GPU operations and a few accessory functions in C for the tree search. The AlphaZero algorithm has gone through three main iterations, first …

TensorFlow Filesystem — Access Tensors Differently

TensorFlow is great. Really, I mean it. The problem is it’s great up to a point. Sometimes you want to do very simple things, but TensorFlow is giving you a hard time. The motivation I had behind writing TFFS (TensorFlow File System) can be shared by anyone who has used TensorFlow, including you. All I …

To all Data Scientists — The one Graph Algorithm you need to know

Dec 8, 2018. Photo by Alina Grubnyak on Unsplash. Graphs provide us with a very useful data structure. They can help us to find structure within our data. With the advent of machine learning and big data, we need to get as much information as possible about our data. Learning a little bit of graph theory …

Beating the Fantasy Premier League game with Python and Data Science

Our Moneyball approach to the EPL Fantasy League. My friend and I have been playing the Official Fantasy English Premier League game for many years, and despite our firm belief that we know everything about English soccer, we tend to get “unlucky” year after year and somehow never seem to pick the winning team. So, we …