Simple House Price Predictor using ML through TensorFlow in Python

The profession of realty is moving into the 21st century, and as you can imagine, home listings are flooding the internet. If you have ever looked at buying a home, renting an apartment, or just wanted to see what the most expensive home in town is (we have all been there), then chances are you … Read more

Introduction to Interactive Time Series Visualizations with Plotly in Python

Introduction to Plotly Plotly is a company that makes visualization tools including a Python API library. (Plotly also makes Dash, a framework for building interactive web-based applications with Python code). For this article, we’ll stick to working with the plotly Python library in a Jupyter Notebook and touching up images in the online plotly editor. When … Read more

Introduction to Web Scraping with BeautifulSoup

Find specific elements in the page The created BeautifulSoup object can now be used to find elements in the HTML. When we inspected the website we saw that every list item in the content section has a class that starts with tocsection- and we can use BeautifulSoup’s find_all method to find all list items with that … Read more
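As a rough sketch of the lookup described in this excerpt, the snippet below uses find_all with a regular expression on the class attribute. The HTML string is a made-up stand-in for the page in the article, and the variable names are only illustrative.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical HTML standing in for the page's content section.
html = """
<ul>
  <li class="toclevel-1 tocsection-1"><a href="#History">History</a></li>
  <li class="toclevel-1 tocsection-2"><a href="#Usage">Usage</a></li>
  <li class="other-item"><a href="#Misc">Misc</a></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# Match every <li> that has a class starting with "tocsection-".
items = soup.find_all("li", class_=re.compile(r"^tocsection-"))

# Collect the visible text of each matching list item.
titles = [li.get_text(strip=True) for li in items]
print(titles)
```

Passing a compiled pattern to class_ tests each class on the element separately, so an item with several classes still matches as long as one of them starts with tocsection-.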

The complete guide for topics extraction with LDA (Latent Dirichlet Allocation) in Python

A recurring subject in NLP is to understand a large corpus of texts through topic extraction. Whether you analyze users’ online reviews, product descriptions, or text entered in search bars, understanding key topics will always come in handy. Popular picture explaining LDA Before going into the LDA method, let me remind you that not reinventing the … Read more

Unpacking (**PCA)

Alright, better to implement PCA to get the image. Let’s start by making a 5 × 10 matrix and stepping through the process. Matrix X: import numpy as np; X = np.random.rand(5, 10). The columns are variables (characteristics) and the rows are samples (say, ‘cat’ or ‘dog’). What we want to do with this matrix is to get eigenvalues … Read more
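The excerpt stops before the eigenvalue step, so here is a minimal sketch of how the process could continue, assuming the usual covariance-based PCA. The fixed seed, the choice of k = 2 components, and all names other than X are assumptions made for illustration, not the article's own code.

```python
import numpy as np

np.random.seed(0)            # fixed seed (assumption) so the run is repeatable
X = np.random.rand(5, 10)    # 5 samples (rows), 10 variables (columns)

# Center each variable so the covariance is taken around the column means.
X_centered = X - X.mean(axis=0)

# 10 x 10 covariance matrix of the variables.
cov = np.cov(X_centered, rowvar=False)

# eigh is suited to symmetric matrices; it returns eigenvalues in ascending order.
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# Reorder so the largest-variance directions come first.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Project the samples onto the first k principal components.
k = 2
projected = X_centered @ eigenvectors[:, :k]
print(projected.shape)
```

With only 5 samples, at most 4 eigenvalues are meaningfully non-zero, which is one reason to keep k small here.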

Preprocessing with sklearn: a complete and comprehensive guide

For aspiring data scientists it can sometimes be difficult to find their way through the forest of preprocessing techniques. Sklearn’s preprocessing library forms a solid foundation to guide you through this important task in the data science pipeline. Although Sklearn has pretty solid documentation, it often lacks a streamlined, intuitive link between different concepts. … Read more

How to Predict Severe Traffic Jams with Python and Recurrent Neural Networks?

An Application of a Sequence Model to Mine Waze Open Data of Traffic Incidents, using Python and Keras. In this tutorial, I will show you how to use an RNN deep learning model to find patterns in Waze Traffic Open Data incident reports and predict whether severe traffic jams will happen shortly. Interventions can be taken out … Read more

Vaex: Out of Core Dataframes for Python and Fast Visualization

So… no pandas ?? There are some issues with pandas that the original author Wes McKinney outlines in his insightful blogpost: “Apache Arrow and the ‘10 Things I Hate About pandas’”. Many of these issues will be tackled in the next version of pandas (pandas2?), building on top of Apache Arrow and other libraries. Vaex starts … Read more

Music Genre Classification with Python

Objective Companies nowadays use music classification, either to be able to place recommendations to their customers (such as Spotify, Soundcloud) or simply as a product (for example Shazam). Determining music genres is the first step in that direction. Machine Learning techniques have proved to be quite successful in extracting trends and patterns from the large … Read more

Text Summarization on the Books of Harry Potter

Hermione interrupted them. “Aren’t you two ever going to read Hogwarts, A History?” How many times throughout the Harry Potter series does Hermione bug Harry and Ron to read the enormous tome Hogwarts, A History? Hint: it’s a lot. How many nights do the three of them spend in the library, reading through every book … Read more

Parsing XML, Named Entity Recognition in One-Shot

Photo credit: Lynda.com Conditional Random Fields, Sequence Prediction, Sequence Labelling Parsing XML is a process that is designed to read XML and create a way for programs to use XML. An XML parser is the piece of software that reads XML files and makes the information from those files available to applications. While reading an … Read more

An introduction to web scraping with Python

Introduction As a data scientist, I often find myself looking for external data sources that could be relevant for my machine learning projects. The problem is that it is uncommon to find open source data sets that perfectly correspond to what you are looking for, or free APIs that give you access to data. In … Read more

Object detection and tracking in PyTorch

Detecting multiple objects in images and tracking them in videos In my previous story, I went over how to train an image classifier in PyTorch, with your own images, and then use it for image recognition. Now I’ll show you how to use a pre-trained classifier to detect multiple objects in an image, and later track … Read more

Word Representation in Natural Language Processing Part II

In the previous part (Part I) of the word representation series, I talked about fixed word representations that make no assumption about semantics (meaning) and similarity of words. In this part, I will describe a family of distributed word representations. The main idea is to represent words as feature vectors. Each entry in vector stands … Read more

TensorFlow Filesystem — Access Tensors Differently

Tensorflow is great. Really, I mean it. The problem is it’s great up to a point. Sometimes you want to do very simple things, but tensorflow is giving you a hard time. The motivation I had behind writing TFFS (TensorFlow File System) can be shared by anyone who has used tensorflow, including you. All I … Read more

Beating the Fantasy Premier League game with Python and Data Science

Our Moneyball approach to the EPL Fantasy League My friend and I have been playing the Official Fantasy English Premier League game for many years, and despite our firm belief that we know everything about English soccer, we tend to get “unlucky” year after year and somehow never seem to pick the winning team. So, we … Read more

A short guide to using Docker for your data science environment

WHY One of the most time-consuming parts of starting your work on a new system, starting a new job, or just plain sharing your work is the variation of tools available (or lack thereof) due to differences in hardware, software, security policies and whatnot. Containerization has risen up in recent years as a ready to use … Read more

I Can Be Your Heroku, Baby

Deploying a Python app in Heroku! Do you like Data Science? <Shakes head up and down> Do you like Data Science DIY deployment? <Shakes head left and right> Me neither. One of the most frustrating parts of early data science learning or personal work is deploying an app through free cloud applications. Your code is juuust … Read more

Exploratory Data Analysis (EDA) techniques for kaggle competition beginners

A hands on guide for beginners on EDA and Data Science competitions Exploratory Data Analysis (EDA) is an approach to analysing data sets to summarize their main characteristics, often with visual methods. Following are the different steps involved in EDA : Data Collection Data Cleaning Data Preprocessing Data Visualisation Data Collection Data collection is the process … Read more

PyTorch 101 for Dummies like Me

Nov 5, 2018 What is PyTorch? It’s a Python-based package that serves as a replacement for NumPy arrays and provides a flexible library for deep learning development. Why I prefer PyTorch over TensorFlow can be learned from Fast AI’s blog post on the reasons to switch to PyTorch. Or simply put, … Read more

Introduction to Linear Regression in Python

Basic concepts and mathematics There are two kinds of variables in a linear regression model: The input or predictor variable is the variable(s) that help predict the value of the output variable. It is commonly referred to as X. The output variable is the variable that we want to predict. It is commonly referred to … Read more

The intuition behind Shannon’s Entropy

Now, back to our formula 3.49: The definition of Entropy for a probability distribution (from The Deep Learning Book) I(x) is the information content of X. I(x) itself is a random variable. In our example, the possible outcomes of the War. Thus, H(x) is the expected value of every possible information. Using the definition of expected … Read more
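To make the "expected value of the information content" idea concrete, here is a small sketch. It assumes base-2 logarithms (entropy in bits), whereas the book's formula may use natural logarithms, and the function names are illustrative.

```python
import math


def information_content(p):
    """Information content I(x) of an outcome with probability p, in bits (base 2 is an assumption)."""
    return -math.log2(p)


def entropy(probabilities):
    """Entropy H as the expected value of I(x) over a discrete distribution."""
    # Outcomes with zero probability contribute nothing to the expectation.
    return sum(p * information_content(p) for p in probabilities if p > 0)


fair_coin = entropy([0.5, 0.5])
four_equal_outcomes = entropy([0.25, 0.25, 0.25, 0.25])
certain_outcome = entropy([1.0])
print(fair_coin, four_equal_outcomes, certain_outcome)
```

A certain outcome carries no information, while spreading probability evenly over more outcomes raises the entropy.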

Forecasting Exchange Rates Using ARIMA In Python

Sep 29, 2018 Nearly all sectors use time series data to forecast future time points. Forecasting the future can help analysts and management make better-calculated decisions to maximise returns and minimise risks. I will be demonstrating how we can forecast exchange rates in this article. If you are new to finance and want to … Read more

Doing XGBoost hyper-parameter tuning the smart way — Part 1 of 2

Aug 29, 2018 Picture taken from Pixabay In this post and the next, we will look at one of the trickiest and most critical problems in Machine Learning (ML): Hyper-parameter tuning. After reviewing what hyper-parameters, or hyper-params for short, are and how they differ from plain vanilla learnable parameters, we introduce three general purpose discrete optimization … Read more

Automatic Image Quality Assessment in Python

Aug 28, 2018 Image quality is a notion that highly depends on observers. Generally, it is linked to the conditions in which it is viewed; therefore, it is a highly subjective topic. Image quality assessment aims to quantitatively represent the human perception of quality. These metrics are commonly used to analyze the performance of algorithms in … Read more

Google’s AutoML Killer: Auto-Keras Opensource Automated ML

Auto-Keras is an open source software library for automated machine learning (AutoML). It is developed by DATA Lab at Texas A&M University and community contributors. The ultimate goal of AutoML is to provide easily accessible deep learning tools to domain experts with limited data science or machine learning background. Auto-Keras provides functions to automatically search … Read more

Multiplicative RNN-LSTM for Sequence-based Recommenders

Recommender Systems support the decision making processes of customers with personalized suggestions. They are widely used and influence the daily life of almost everyone in different domains like e-commerce, social media, or entertainment. Quite often the dimension of time plays a dominant role in the generation of a relevant recommendation. Which user interaction occurred just before … Read more

A Guide to Restricted Boltzmann Machines Using Pytorch

A Boltzmann machine defines a probability distribution over binary-valued patterns. What makes Boltzmann machine models different from other deep learning models is that they’re undirected and don’t have an output layer. The other key difference is that all the hidden and visible nodes are all connected with each other. Due to this interconnection, Boltzmann machines … Read more

PySpark ML and XGBoost full integration tested on the Kaggle Titanic dataset

Jul 8, 2018 In this tutorial we will discuss integrating PySpark and XGBoost using a standard machine learning pipeline. We will use data from Titanic: Machine Learning from Disaster, one of the many Kaggle competitions. Before getting started, please note that you should be familiar with Apache Spark, XGBoost, and Python. The … Read more

R vs Python: Image Classification with Keras

Many data professionals are strict on the language to be used for ANN models limiting their dev. environment exclusively to Python. I decided to test performance of Python vs. R in terms of time required to train a convolutional neural network based model for image recognition. As the starting point, I took the blog post … Read more

Automatic GPUs

A reproducible R / Python approach to getting up and running quickly on GCloud with GPUs in Tensorflow “A high view of a sea of clouds covering a mountain valley in the Dolomites” by paul morris on Unsplash Backstory After completing Google’s excellent Data Engineering Certified Specialization on Coursera recently (*which I highly recommend), I … Read more

Python WebServer With Flask and Raspberry Pi

Let’s create a simple WebServer to control things in your home. There are a lot of ways to do that. For example, in my tutorial IoT — Controlling a Raspberry Pi Robot Over Internet With HTML and Shell Scripts Only, we explored how to control a robot over the local network using the LIGHTTPD WebServer. For … Read more

AWS EC2 for Beginners

Discover why you should use Amazon Web Services Elastic Compute Cloud (EC2) and how you can set up a basic data science environment on a Windows Virtual Machine (Windows Server). There are times when one is limited by the capabilities of a desktop or laptop. Suppose a data scientist has a large dataset that they … Read more

Visualising high-dimensional datasets using PCA and t-SNE in Python

Oct 29, 2016 Update: April 29, 2019. Updated some of the code to not use ggplot but instead use seaborn and matplotlib. I also added an example for a 3d-plot. I also changed the syntax to work with Python3. The first step around any data related challenge is to start by exploring the data itself. … Read more