How To Create Simple Keyword-based Movie Recommender Models From Scratch

Have you ever tried to use a movie recommender? In theory, it can help you figure out what to watch next instead of browsing through Netflix for a few hours, but the results tend to be hit-or-miss. This is a problem that most people can relate to, so I decided to …

The basics of Deep Neural Networks

With the rise of libraries such as TensorFlow 2.0 and fastai, implementing deep learning has become accessible to many more people, and it helps to understand the fundamentals behind deep neural networks. Hopefully this article will be of help to people on the path of learning about deep neural networks. Back when I …

GANs vs. Autoencoders: Comparison of Deep Generative Models

DC-GAN on Anime Dataset: The first thing we need to do is create an anime directory and download the data. This can be done either from the link above or directly from Amazon Web Services (if this way of accessing the data is still available).
# Create anime directory and download from AWS
import zipfile
!mkdir anime-faces && …

Can you accurately predict MLB games based on home and away records?

Major League Baseball Case Study: Since each game can reference the overall record plus the team’s home or away record, you might expect the accuracy of the predictions to improve as the season goes on. Using the final winning percentages of each team, we can see …
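One simple way to turn home and away records into a pick is to compare the home team’s winning percentage at home with the visitor’s winning percentage on the road. The sketch below assumes exactly that rule; it is not necessarily the article’s method. win_pct and predict_home_win are hypothetical names, and both the 0.5 default for a team with no games yet and giving ties to the home team are my own assumptions.

```python
def win_pct(wins, losses):
    """Winning percentage; assume 0.5 before a team has played any games."""
    games = wins + losses
    return wins / games if games else 0.5


def predict_home_win(home_team_home_record, away_team_road_record):
    """Return True (home win) if the home team's home win% is at least the
    visitor's road win%. Ties going to the home team is an assumption."""
    home = win_pct(*home_team_home_record)
    away = win_pct(*away_team_road_record)
    return home >= away


# Records are (wins, losses) tuples.
print(predict_home_win((30, 20), (22, 28)))  # True: .600 at home vs .440 on the road
```

Feeding it the records as they stood on each game date, rather than the final records, is what would let you check whether accuracy really improves as the season goes on.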

How to Generate “Summaries” for Review Text without Neural Networks

Xiao Ma, May 12. As a data scientist working in the consumer goods industry at Clorox, one of my “daily challenges” is to quickly and concisely extract useful information from a large collection of user reviews about a specific product. On its face, the task requires some kind of summarization of a large collection of …

Using Ant Colony and Genetic Evolution to Optimize Ride-Sharing Trip Duration

Urban transportation is going through a rapid and significant evolution. Since the birth of the Internet and smartphones, we have become increasingly connected and are able to plan and optimize our daily commute. Along with that, large amounts of data are gathered and used to improve the efficiency of existing transportation systems. Real-time ride-sharing companies …

Building Gmail style smart compose with a char ngram language model

“OpenAI built a language model so good, it’s considered too dangerous to release” (TechCrunch). Learn how to build a simple yet powerful language model and use it for text generation. If you are a Gmail user, by now you have probably experienced the smart compose feature (maybe even …
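To make the idea of a character n-gram language model concrete, here is a toy sketch of my own (not the article’s code). It counts which character follows each (n-1)-character context, then greedily appends the most frequent next character, a crude version of smart compose. train_char_ngram and complete are hypothetical names, and "~" is assumed never to appear in the training text because it is used as start-of-text padding.

```python
from collections import Counter, defaultdict


def train_char_ngram(text, n=3):
    """Count which character follows each (n-1)-character context."""
    model = defaultdict(Counter)
    padded = "~" * (n - 1) + text  # "~" marks the start of the text
    for i in range(len(padded) - n + 1):
        context = padded[i:i + n - 1]
        next_char = padded[i + n - 1]
        model[context][next_char] += 1
    return model


def complete(model, prefix, n=3, max_chars=20):
    """Greedily append the most frequent next character until the context is unseen."""
    out = prefix
    for _ in range(max_chars):
        context = ("~" * (n - 1) + out)[-(n - 1):]
        if context not in model:  # membership test does not create a defaultdict entry
            break
        out += model[context].most_common(1)[0][0]
    return out


model = train_char_ngram("hello world", n=3)
print(complete(model, "he"))  # hello world
```

Greedy decoding always picks the single most likely character; sampling from the counts instead would give more varied completions.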

Big Data: Managing The Flow Of Data With Apache NiFi And Apache Kafka

In the Hadoop ecosystem, Apache NiFi is commonly used for the ingestion phase. Apache NiFi offers a scalable way of managing the flow of data between systems. When you’re trying to get information from point A to B, numerous issues can occur. For instance, networks can fail, software crashes, people make mistakes, the data can …

A primer on *args, **kwargs, decorators for Data Scientists

What are **kwargs? In simple terms, you can use **kwargs to pass an arbitrary number of keyword arguments to your function and access them as a dictionary. A simple example: let’s say you want to create a print function that takes a name and an age as input and prints them.
def myprint(name, age):
    print(f'{name} is {age} years …
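As a minimal sketch (not the article’s own code), the name/age printer can be generalized with **kwargs so it accepts any keyword arguments, and *args does the same for extra positional arguments. myprint_kwargs and count_args are hypothetical names chosen for illustration.

```python
def myprint_kwargs(**kwargs):
    """Print each keyword argument as 'key is value'; kwargs is a plain dict."""
    lines = [f"{key} is {value}" for key, value in kwargs.items()]
    for line in lines:
        print(line)
    return lines


def count_args(*args, **kwargs):
    """*args collects extra positional arguments into a tuple, **kwargs into a dict."""
    return len(args), len(kwargs)


myprint_kwargs(name="Ann", age=30)  # prints "name is Ann" then "age is 30"
print(count_args(1, 2, 3, x=4))     # (3, 1)
```

Since Python 3.6, the order of **kwargs follows the order in which the keywords were passed, so the printed lines come out in call order.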

Interesting Properties and Use Cases of the Covariance Matrix

Essential information about the covariance matrix for data scientists. The covariance matrix has many interesting properties, and it can be found in mixture models, component analysis, Kalman filters, and more. Developing an intuition for how the covariance matrix operates is useful in understanding its practical implications. This article will focus on a few important properties, …
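A small pure-Python sketch (my own illustration, with hypothetical data) of three basic properties: the covariance matrix is symmetric, its diagonal holds the variances, and it is positive semi-definite, because vᵀCv is the variance of the projection v·x and so can never be negative. covariance_matrix and quadratic_form are hypothetical helper names.

```python
def covariance_matrix(rows):
    """Sample covariance matrix (divides by n - 1) of a list of observation rows."""
    n = len(rows)
    d = len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    return [
        [
            sum((row[i] - means[i]) * (row[j] - means[j]) for row in rows) / (n - 1)
            for j in range(d)
        ]
        for i in range(d)
    ]


def quadratic_form(matrix, v):
    """Compute v^T C v; for a covariance matrix this equals the variance of v . x."""
    size = len(v)
    return sum(v[i] * matrix[i][j] * v[j] for i in range(size) for j in range(size))


data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # hypothetical, second column = 2 * first
C = covariance_matrix(data)
print(C)                                # [[1.0, 2.0], [2.0, 4.0]]
print(quadratic_form(C, (2.0, -1.0)))   # 0.0, since 2*x1 - x2 is constant here
```

The zero in the last line shows the "semi" in semi-definite: perfectly correlated columns make the matrix singular.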

How to explain the components of machine learning projects to anyone who’s ever cooked

As a machine learning team lead in the pharmaceutical industry, I often find myself educating non-technical audiences on how machine learning projects work. This analogy to cooking has really resonated with people and helped them understand the role and importance of subject matter expertise, quality data, data engineering and why putting a successful proof-of-concept to …

Discovering the essential tools for Named Entities Recognition

“The letter is E …. Start!…” — One of my brothers said. We began to crazily write down words that start with E in each category. Everyone wanted to win as many points as possible in that afternoon game. “Stop!!!!!” — My sister announced suddenly — “I’m already done with all the categories”. We stared at each other with disbelief. “Ok. Let’s …

Hypothesis Testing European Soccer Data Using Python

Home Field Advantage, Ideal Formations, and Inter-League Attributes Explored in Python, by Connor Anderson, Kevin Velasco, and Alex Shropshire. Does the traditional belief in the existence of an underlying advantage for European soccer teams playing at home have statistical significance? What formation has a better overall rate of victory, the 4–4–2 or the …

Master Data Management: an Essential Part of Data Strategy

First of all, what is Master Data Management (MDM)? Master data refers to the critical data that are essential to an enterprise’s business and often used in multiple disciplines and departments. MDM is the establishment and maintenance of an enterprise level data service that provides accurate, consistent and complete master data across the enterprise and …

Get any US music chart listing from history in your R console

Learn about R’s scraping capabilities and write a simple function to grab a US music chart from any date in the past. We are lucky enough to live in an age where we can get pretty much any factoid we want. If we want to find out the Top Billboard 200 albums from 1980, we just …

How to Successfully Install Anaconda on a Mac (and Actually Get it to Work)

You know you need it. As you’re getting started in data science, machine learning, or artificial intelligence, you’re quickly going to realize that you need to be able to use Anaconda. You might want to use Jupyter Notebooks, Spyder, or another awesome program, but one way or another, you’re going to need this thing to …

Don’t Convince Your Boss To Use Machine Learning Until You Have Done This…

“We have to do AI stuff!” “How can we implement AI in our company to bring more profits?” “What machine learning models can we use to solve this problem?” Okay… Maybe you’ve heard one of these statements or questions from upper management (aka your boss). Maybe you’ve faced these questions on a …

Review: QSA+QNT — Neural Network with Incremental Quantization (Biomedical Image Segmentation)

Incremental Quantization on the Neural Network, Acting as a Regularization Term to Reduce Overfitting. [A photo I took at the seminar talk by the author, Dr. Yiyu Shi] In this story, a paper called “Quantization of Fully Convolutional Networks for Accurate Biomedical Image Segmentation”, by the University of Notre Dame and Huazhong University of Science and Technology, …

Implementing a Data Warehouse with Django

In this article, we will cover how to leverage Django and Django REST framework to implement a data warehouse. We will focus in particular on data sources that come from external APIs, but the same principle could be applied to any other type of data source, such as flat files or direct ODBC connections. One of the main …

Building a Multi-label Text Classifier using BERT and TensorFlow

In a multi-label classification problem, the training set is composed of instances that can each be assigned multiple categories, represented as a set of target labels, and the task is to predict the label set of the test data. For example, a text might be about any of religion, politics, finance or education at the same time …
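The difference from ordinary single-label classification shows up in the targets and in the output layer. The sketch below encodes a label set as a multi-hot vector and decodes predictions with an independent sigmoid per label plus a threshold, which is a common setup for multi-label heads; it is an assumption here, not necessarily the article’s exact BERT head. The label list comes from the example above, while the logits, the 0.5 threshold, and the names multi_hot and predict_labels are hypothetical.

```python
import math

LABELS = ["religion", "politics", "finance", "education"]


def multi_hot(label_set, labels=LABELS):
    """Encode a set of labels as a 0/1 target vector, one slot per category."""
    return [1 if label in label_set else 0 for label in labels]


def predict_labels(logits, labels=LABELS, threshold=0.5):
    """Apply an independent sigmoid to each logit and keep every label above the
    threshold, so zero, one, or several labels can be predicted at once."""
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    return {label for label, p in zip(labels, probs) if p >= threshold}


print(multi_hot({"politics", "finance"}))              # [0, 1, 1, 0]
print(sorted(predict_labels([-2.0, 1.5, 0.3, -0.1])))  # ['finance', 'politics']
```

A softmax would force the label probabilities to compete and sum to one, which is why per-label sigmoids are the usual choice when a text can belong to several categories.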

Hypothesis Testing European Soccer Data: Home Field Advantage, Ideal Formations, and Inter-League…

Using Python, Historical Match Data, & EA Sports’ FIFA Team Ratings, by Connor Anderson, Kevin Velasco, and Alex Shropshire. Does the traditional belief in the existence of an underlying advantage for European soccer teams playing at home have statistical significance? What formation has a better overall rate of victory, the 4–4–2 or the …

Fun with analyzing @BillGates tweets Twitter API’s-Step by Step analysis

This is the second post in the web scraping and APIs series. The first post is here; please check it out. In this post, we will see how to extract Twitter data using the Twitter API, do some basic visualization using word clouds and pie charts, and then run sentiment analysis using TextBlob and VADER. …

Learn AI Collaboratively by Saving the Planet

Aren’t you tired of competing with everyone to solve a problem? Don’t get me wrong, competition is important and drives innovation, but if you want to learn, maybe it’s not the best way to do it. That’s why when the people at omdena.com contacted us at Ciencia y Datos to work with them in a …

Predicting campaign outcome on Kickstarter with classification algorithms

The data for this analysis came from the Web Robots site, which compiles Kickstarter data on an ongoing basis. I examined the period from April 2018 through March 2019, a full year of successful and failed campaigns. [An example of a wildly successful Kickstarter campaign] The type of information used was the data that would …

What is Cognitive Computing? How are Enterprises benefitting from Cognitive Technology?

AI has truly been a far-flung goal ever since the conception of computing, and every day we seem to be getting closer and closer to that goal with new cognitive computing models. Coming from the amalgamation of cognitive science and based on the basic premise of simulating the human thought process, the concept, as well …

Ethical Storyboarding for Machine Learning

[A comic-style storyboard with scenes of technical ML content and human-machine interactions] Picturing the systems we build within the systems we live. Machine learning is gradient-descending its way into more and more places, and with its arrival comes both increasing demand for skilled ML practitioners and increasingly disruptive challenges to the basic assumptions we hold about …

It’s 2019 — Make Your Data Visualizations Interactive with Plotly

Find the path to make awesome figures quickly with Express and Cufflinks. If you’re still using Matplotlib to make data visualizations in Python, it’s time to check out the view from an interactive visualization library. Plotly allows you to make beautiful, interactive, exportable figures in just a few lines of code. Plotly Express examples from …

Supercharged Excel for startup analytics with PowerBI

How to use Excel as a Data Analyst and not as a Data Monkey. Excel seems to be the most hated tool I have ever encountered. That’s a shame, because if you look past this bad reputation, it’s one of the best tools you can have on your belt for analytics. In my experience, Excel is great …

Happiness & GDP per capita in Africa

1. Import Libraries: First off, let’s import the required libraries: pandas for data structuring, Matplotlib and Seaborn for graph plotting and statistics, and GeoPandas for geographical map plotting. 2. Import the Data: Let’s import the data and clean it to remove any unwanted variables and organize the ones we want. , where life ladder is …

Breaking Into Data Science in 2019

I remember thinking about breaking into data science as if it were yesterday. I had just started my semester abroad in Shanghai and attended several talks and guest lectures about data science and machine learning. However, I had never coded before (except for some basic SQL) and did not really know where to start. …

Querying the Premier League using Python and SQL Combined

From Excel to MySQL via Python, and then back to Excel. It may sometimes be useful to convert an Excel sheet into a MySQL database. This conversion enables querying operations to be carried out with straightforward SQL queries. Using Python, we can simply convert a limited Excel spreadsheet into a MySQL database. To demonstrate how this …

Pre-training BERT from scratch with cloud TPU

Step 1: setting up the training environment. First and foremost, we get the packages required to train the model. The Jupyter environment allows executing bash commands directly from the notebook by prefixing them with an exclamation mark ‘!’, like this:
!pip install sentencepiece
!git clone https://github.com/google-research/bert
I will be exploiting this approach to make use of several other bash commands …

How are the predicted food trends of 2019 holding up so far in the US?

Towards the end of every year, industry experts, local businesses, journalists, basically everybody will try to predict which foods will be popular in the following year. A variety of things are predicted from the next big beverage, to the new ‘kale’, and even to the new hot restaurant trends. There are extensive lists like this …

Metrics for Imbalanced Classification

The notion of metrics in data science is extremely important. If you don’t know how to evaluate current results properly, you cannot improve them either. A wrong understanding of metrics also leads to a wrong estimate of the model’s capacity and misleading insight into the state of the problem. The current story will …
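The classic illustration of why metrics matter for imbalanced data: a model that always predicts the majority class can score high accuracy while never finding a single positive. The sketch below computes accuracy, precision, recall and F1 from scratch on hypothetical data (95 negatives, 5 positives). binary_metrics is a hypothetical name, and returning 0.0 when a denominator is zero is a convention I chose, not a universal rule.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for the positive (minority) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    # Convention (an assumption): report 0.0 when a denominator would be zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical data: 95 negatives, 5 positives, and a model that always predicts 0.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100
print(binary_metrics(y_true, y_pred))
# {'accuracy': 0.95, 'precision': 0.0, 'recall': 0.0, 'f1': 0.0}
```

Accuracy says 95%, while recall and F1 reveal that the model is useless for the class you actually care about.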

The Remarkable world of Recommender Systems

Recommender Systems: Recommendation engines try to recommend a product or service to people. In a way, recommenders try to narrow down people’s choices by presenting them with suggestions that they are most likely to buy or use. Recommendation systems are almost everywhere, from Amazon to Netflix and from Facebook to LinkedIn. In fact, a …

TCAV: Interpretability Beyond Feature Attribution

TCAV essentially learns ‘concepts’ from examples. For instance, TCAV needs a couple of examples of ‘female’ and of something ‘not female’ to learn a “gender” concept. The goal of TCAV is to determine how important a concept (e.g., gender, race) was for a prediction in a trained model, …even if the concept was not part …

Truly Understanding the Kernel Trick

Here, we learn the fundamentals behind the Kernel Trick. How does it work? How does the Kernel Trick compute the dot product (or similarity) in an infinite-dimensional space without an increase in computation? What is a Kernel Trick? In spite of its profound impact on the Machine Learning world, little can be found that explains the fundamentals behind the Kernel …
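The core identity behind the trick can be checked with a small finite-dimensional case: for 2-D inputs, the polynomial kernel k(x, y) = (x · y)² equals the ordinary dot product of the explicit degree-2 feature maps φ(x) = (x₁², √2·x₁x₂, x₂²), yet the kernel never builds φ. This is my own illustrative example; the infinite-dimensional case mentioned above applies to kernels such as the RBF kernel, where φ cannot be written out at all. phi, dot and poly_kernel are hypothetical helper names.

```python
import math


def phi(x):
    """Explicit degree-2 feature map for a 2-D input (3 features)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def poly_kernel(x, y):
    """(x . y)^2, computed in the original 2-D space without mapping to phi."""
    return dot(x, y) ** 2


x, y = (1.0, 2.0), (3.0, 4.0)
print(poly_kernel(x, y))                                     # 121.0
print(math.isclose(dot(phi(x), phi(y)), poly_kernel(x, y)))  # True
```

The kernel costs one 2-D dot product and a square, while the explicit route has to build and multiply the larger feature vectors; the gap grows quickly with the degree and the input dimension.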

Do not rush to code. 4 principles for AI projects in enterprise.

Think together before acting alone: an ants’ principle. No, AI doesn’t understand by itself. No, data science is not automatic. An agile method doesn’t mean chaos. In a word: no, it’s not magic. What needs to be done before rushing to code? Here I share with you 4 principles I learned from my professional and …

ML Models — Prototype to Production

So you have a model, now what? Through the powers of machine learning and the promise of deep learning, today’s conferences, thought leaders and experts in ML and AI have been painting a vision of businesses powered by data. However, despite the groundbreaking research and the constant flood of new papers in the fields of …

AI & Ethics: Are We Making It More Difficult On Ourselves?

Not too long ago we discussed the AI Apocalypse as it pertained to the Facebook #TenYearChallenge. Is Facebook evil? Are we evil for helping usher in our own demise? As we put it: not quite. However, AI & ethics seem inexorably linked and for good reason. This is part of an ongoing series on the …

Robot Thinking Will Power New Frontiers in Deep Learning AI

Deep learning has advanced to the point where we’re seeing computers do things that would have been considered science fiction just a few years ago. Areas such as language translation, image captioning, picture generation, and facial recognition display major advances on a regular basis. But certain artificial intelligence problems don’t …

Rapid Computer Vision Prototyping with Azure

This post has been co-authored by Alex Akulov and Ryan Peyman from Omnia AI, Deloitte Canada’s AI practice. [Object-detecting trucks using a computer vision model on Azure Custom Vision] Imagine building and deploying a state-of-the-art object detection model quickly and without writing a single line of code. Computer Vision tools and software have come a long …

Comprehensive Introduction to Turing Learning and GANs: Part 2

Building an Image GAN: As we have already discussed several times, training a GAN can be frustrating and time-intensive. We will walk through a clean, minimal example in Keras. The results are only at the proof-of-concept level and are meant to aid understanding. In the code example, if you don’t tune parameters carefully, you won’t surpass this level (see …

The ultimate guide to Google Sheets as a reliable data source

Keep calm and use lots of data validation. I occasionally need to grant a non-technical colleague the ability to input information into our data warehouse on an ad-hoc basis. For example, our customer service team at Milk Bar maintains a list of special wedding cake orders in Google Sheets that we need to collect data …

Outlier Detection and Treatment: A Beginner’s Guide

One of the most important steps in data pre-processing is outlier detection and treatment. Machine learning algorithms are very sensitive to the range and distribution of data points. Outliers can mislead the training process, resulting in longer training times and less accurate models. Outliers are defined as samples that are significantly different from the …
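One common detection rule, sketched here as an illustration rather than the article’s specific method, is Tukey’s fences: flag any value below Q1 − 1.5·IQR or above Q3 + 1.5·IQR, where IQR = Q3 − Q1. The sketch uses the standard library’s statistics.quantiles (Python 3.8+) with its default quartile method. find_outliers_iqr and the readings are hypothetical, and k = 1.5 is only the conventional default.

```python
import statistics


def find_outliers_iqr(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences), in input order."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # three quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


readings = [10, 12, 11, 13, 12, 11, 100]  # hypothetical sensor readings
print(find_outliers_iqr(readings))        # [100]
```

Detection is only half the job; once flagged, a value can be dropped, capped at the fence, or kept and handled with a more robust model, depending on whether it is an error or a genuine extreme.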

In the future, you may be fired by an algorithm

Algorithms determine the people we meet on Tinder, recognize your face to open a keyless door, or fire you when your productivity drops. Machines are used to make decisions about health, employment, education, vital financial matters and criminal sentencing. Algorithms are used to decide who gets a job interview, who gets …