Master Python through building real-world applications (Part 4)

Build and Deploy a Website using Flask and Heroku App
Every once in a while, a new programming language comes along, and with it a great community to support it. Python has been around for a while now, so it is safe for me to say that Python is not a language, it is a religion. …

Monty Hall Problem using Python

Understanding mathematical proofs with the help of programming
We have all heard the probability brain teaser about the three-door game show. The contestant guesses which door hides the prize, the show host reveals one of the other doors that doesn't have the prize, and the contestant gets an opportunity to switch doors. It is …
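The game described above can be checked empirically with a short simulation. This is only a minimal sketch, not the article's own code: the function names `play_round` and `win_rate`, the number of rounds, and the fixed seed are assumptions made for illustration.

```python
import random


def play_round(switch, rng):
    """Play one round; return True if the contestant ends up with the prize."""
    doors = [0, 1, 2]
    prize = rng.choice(doors)
    first_pick = rng.choice(doors)
    # The host opens a door that is neither the contestant's pick nor the prize.
    opened = rng.choice([d for d in doors if d != first_pick and d != prize])
    if switch:
        # Switch to the only door that is still closed and was not picked.
        final_pick = next(d for d in doors if d != first_pick and d != opened)
    else:
        final_pick = first_pick
    return final_pick == prize


def win_rate(switch, rounds=100_000, seed=0):
    """Estimate the probability of winning for a fixed strategy."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatable runs
    wins = sum(play_round(switch, rng) for _ in range(rounds))
    return wins / rounds


if __name__ == "__main__":
    print("stay:  ", win_rate(switch=False))
    print("switch:", win_rate(switch=True))
```

With enough rounds, the estimate for switching should settle near 2/3 and the estimate for staying near 1/3, which matches the standard analysis of the puzzle.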

Extract features of Music

Different types of audio features and how to extract them.
Dec 30, 2018
MFCC feature extraction
Feature extraction is a very important part of analyzing data and finding relations between different things. Models cannot understand raw audio data directly, so feature extraction is used to convert it into a format they can understand. It …

2. Machine Learning 101 – Problem solving workflow

How do you go from raw data to a fully working machine learning solution?
Dec 30, 2018
If you are a software engineer, I’m sure at some point you wanted to do ‘some machine learning’, crack the secrets of the Universe and find the ultimate answer to life, the Universe and everything. However, machine learning …

Total Least Squares in comparison with OLS and ODR

Total least squares (aka TLS) is a regression analysis method that minimizes the sum of squared errors between the response variable (the observation) and the estimated variable (often called the fitted value). The most popular and standard method for the same purpose is ordinary least squares (aka OLS), and TLS is one of other …
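To make the contrast concrete, here is a minimal sketch for the special case of fitting a straight line to 2-D points. It is not taken from the article: the function names are hypothetical, and the TLS slope uses the standard closed form for orthogonal (perpendicular-distance) line fitting, which assumes the covariance term `sxy` is nonzero. OLS minimizes vertical distances only, while this form of TLS minimizes perpendicular distances to the line.

```python
import math


def _centered_sums(xs, ys):
    """Return the means and the centered sums of squares and cross products."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return mx, my, sxx, syy, sxy


def ols_line(xs, ys):
    """Ordinary least squares: minimize squared vertical distances."""
    mx, my, sxx, _, sxy = _centered_sums(xs, ys)
    slope = sxy / sxx
    return slope, my - slope * mx


def tls_line(xs, ys):
    """Total least squares for a line: minimize squared perpendicular distances.

    Assumes sxy != 0 (the closed form below divides by it).
    """
    mx, my, sxx, syy, sxy = _centered_sums(xs, ys)
    slope = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    return slope, my - slope * mx


if __name__ == "__main__":
    xs = [0, 1, 2, 3]
    ys = [0, 2, 1, 3]  # made-up noisy data for illustration
    print("OLS (slope, intercept):", ols_line(xs, ys))
    print("TLS (slope, intercept):", tls_line(xs, ys))
```

On perfectly collinear data both fits coincide; on noisy data the TLS slope is at least as steep in magnitude as the OLS slope, because OLS attributes all of the error to the response variable.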

Spotify ReWrapped

Spotify surprises us every December with their cool end-of-the-year specials. Nevertheless, this year some of the reports smelled fishy. This humble Medium account decided to investigate.
Photoshop skills: level 9000
Every year I expect Spotify’s summary. 2016 came with the usual top 5 of songs, artists and genres. The number of minutes spent listening to music, …

Web Scraping using Selenium and BeautifulSoup

Scrape data using Selenium
Selenium can simulate a browser, so we can make it wait until the page has finished loading before we get the data. First we will import the libraries needed for scraping and processing the web data. We will also define the URL of the website we want to scrape the …

How fuzzy matching improve your NLP model

One of the challenges when dealing with NLP tasks is fuzzy text matching and alignment. You can still build your NLP model if you skip this text processing step, but the trade-off is that you may not achieve good results. Some may argue that preprocessing is not necessary when using deep learning. From my experience, …

Let’s Find Donors For Charity With Machine Learning Models

An application of Supervised Learning Algorithms
Somewhere in the Philippines
Welcome to my second Medium post about Data Science. I will write here about a project I’ve done using Machine Learning algorithms. I will explain what I did without relying heavily on technical language, but I will show snippets of my code. Code matters 🙂 The …

Andrew Ng’s Machine Learning Course in Python (Neural Networks)

In assignment 4, we worked towards implementing a neural network from scratch. We start off by computing the cost function and the gradient of theta.

    import numpy as np

    def sigmoidGradient(z):
        """computes the gradient of the sigmoid function"""
        sigmoid = 1 / (1 + np.exp(-z))
        return sigmoid * (1 - sigmoid)

    def nnCostFunction(nn_params, input_layer_size, hidden_layer_size, num_labels, X, y, Lambda):
        """nn_params contains the parameters unrolled into a vector
        compute the cost and gradient of …
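As a sanity check on the sigmoid gradient above, the analytic form s(z)(1 - s(z)) can be compared against a finite-difference estimate. The sketch below is an illustration only, not part of the assignment code: it uses plain `math` instead of NumPy so it runs without dependencies, and the helper `numerical_gradient` and its step size `eps` are assumptions.

```python
import math


def sigmoid(z):
    """Logistic sigmoid of a scalar z."""
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_gradient(z):
    """Analytic derivative of the sigmoid: s(z) * (1 - s(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)


def numerical_gradient(f, z, eps=1e-6):
    """Central finite-difference estimate of f'(z) (hypothetical helper)."""
    return (f(z + eps) - f(z - eps)) / (2 * eps)


if __name__ == "__main__":
    for z in (-2.0, 0.0, 1.5):
        print(z, sigmoid_gradient(z), numerical_gradient(sigmoid, z))
```

The two columns should agree to several decimal places; the same comparison is commonly used to check a full backpropagation implementation before training.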

How I got Matplotlib to plot Apple Color Emojis

And why the library currently cannot
Are you a data scientist who is interested in analyzing and visualizing text messages or any other conversational medium that may include emojis? Your plotting options may be limited. This post investigates why the popular Python library Matplotlib cannot plot emojis from the Apple Color Emoji font, and how …

Are new movies longer than they were 10, 20, 50 year ago?

Crunching data from IMDb.com
If you like to watch movies, and I mean a lot of movies, there is a chance you have noticed that movies are getting longer and longer nowadays. When was the last time you went to the cinema and watched a blockbuster that was shorter than 120 minutes? More and more movies (thank …

Building and Testing Recommender Systems With Surprise, Step-By-Step

Learn how to build your own recommendation engine with the help of Python and Surprise Library, Collaborative Filtering
Recommender systems are one of the most commonly used and easily understandable applications of data science. Lots of work has been done on this topic, and the interest and demand in this area remain very high because of …

Web Scraping Apartment Listings in Stockholm

My partner and I have sold our apartment and are in search of a new one. The majority of people searching for a new apartment manually go through https://www.hemnet.se/. This, to me, seems too tedious and exhausting, so I thought: why not use my Python knowledge and bottomless craving for these types of …

I Worked With A Data Scientist, Here’s What I Learned.

Background
In late 2017, I started to develop an interest in the Machine Learning field. I talked about my experience when I started my journey. In summary, it has been filled with fun challenges and lots of learning. I am an Android Engineer, and this is my experience working on ML projects with our data scientist. …

Python Plotting API: Expose your scientific python plots through a flask API

In my daily work as a data scientist, I often have the need to integrate relatively complex plots into back-office applications. These plots are mainly used to illustrate algorithmic decisions and give data intuitions to operational departments. A possible approach here would be to build an API that returns data and let the front-end of …

A Starter Pack to Exploratory Data Analysis with Python, pandas, seaborn, and scikit-Learn

2. Categorical Analysis
We can start by reading the data using pd.read_csv(). By calling .head() on the data frame, we can take a quick peek at the top 5 rows of our data. For those who are not familiar with pandas or the concept of a data frame, I would highly recommend spending half a day …

Understand Text Summarization and create your own summarizer in python

How text summarization works
In general, there are two types of summarization: abstractive and extractive. Abstractive Summarization: Abstractive methods select words based on semantic understanding, even words that did not appear in the source documents. They aim at producing important material in a new way. They interpret and examine the text using advanced natural …

Simulating Tennis Matches with Python or Moneyball for Tennis

Control flow: The code above has a section which runs all the code. We consider default values for most of the important parameters, such as player name, ps1 and ps2, and bigpoint1 and bigpoint2. I like to think of ps1 and ps2 as first-serve percentages, but we can do a lot of interesting feature …

Word2Vec For Phrases — Learning Embeddings For More Than One Word

Learning Phrases From Unsupervised Text (Collocation Extraction)
We can easily create bi-grams from our unsupervised corpus and use them as input to Word2Vec. For example, the sentence “I walked today to the park” will be converted to “I_walked walked_today today_to to_the the_park” and each bi-gram will be treated as a uni-gram in the Word2Vec …
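The bi-gram conversion in the example above can be sketched in a few lines. This is an illustrative helper, not the article's code: the name `to_bigrams` and the whitespace tokenization are assumptions.

```python
def to_bigrams(sentence):
    """Join each pair of adjacent tokens with an underscore."""
    tokens = sentence.split()  # naive whitespace tokenization (an assumption)
    return " ".join(f"{a}_{b}" for a, b in zip(tokens, tokens[1:]))


if __name__ == "__main__":
    print(to_bigrams("I walked today to the park"))
    # I_walked walked_today today_to to_the the_park
```

Each underscore-joined pair can then be passed to Word2Vec as if it were a single token.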

Distributed TensorFlow using Horovod

Reduce training time for deep neural networks by using many GPUs
Marenostrum Supercomputer, Barcelona Supercomputing Center https://bsc.es
(This post will be used in my master's course SA-MIRI at UPC Barcelona Tech with the support of Barcelona Supercomputing Center)
“Methods that scale with computation are the future of Artificial Intelligence” (Rich Sutton, father of reinforcement learning, video 4:49)
In …

Develop a NLP Model in Python & Deploy It with Flask, Step by Step

Flask API, Document Classification, Spam Filter
So far, we have developed many machine learning models, generated numeric predictions on the testing data, and tested the results. And we did everything offline. In reality, generating predictions is only part of a machine learning project, although it is the most important part in my opinion. Considering a system …

Simple House Price Predictor using ML through TensorFlow in Python

The profession of realty is moving into the 21st century, and as you can imagine home listings are flooding the internet. If you have ever looked at buying a home, renting an apartment, or just wanted to see what the most expensive home in town is (we have all been there), then chances are you …

Introduction to Interactive Time Series Visualizations with Plotly in Python

Introduction to Plotly
Plotly is a company that makes visualization tools including a Python API library. (Plotly also makes Dash, a framework for building interactive web-based applications with Python code.) For this article, we’ll stick to working with the plotly Python library in a Jupyter Notebook and touching up images in the online plotly editor. When …

Introduction to Web Scraping with BeautifulSoup

Find specific elements in the page
The created BeautifulSoup object can now be used to find elements in the HTML. When we inspected the website we saw that every list item in the content section has a class that starts with tocsection- and we can use BeautifulSoup’s find_all method to find all list items with that …

The complete guide for topics extraction with LDA (Latent Dirichlet Allocation) in Python

A recurring subject in NLP is understanding a large corpus of text through topic extraction. Whether you analyze users’ online reviews, product descriptions, or text entered in search bars, understanding key topics will always come in handy.
Popular picture explaining LDA
Before going into the LDA method, let me remind you that not reinventing the …

Preprocessing with sklearn: a complete and comprehensive guide

For aspiring data scientists, it can sometimes be difficult to find their way through the forest of preprocessing techniques. Sklearn's preprocessing library forms a solid foundation to guide you through this important task in the data science pipeline. Although Sklearn has pretty solid documentation, it often lacks a streamlined flow and intuition connecting different concepts. …

How to Predict Severe Traffic Jams with Python and Recurrent Neural Networks?

An Application of a Sequence Model to Mine Waze Open Data of Traffic Incidents, using Python and Keras.
In this tutorial, I will show you how to use an RNN deep learning model to find patterns in the Waze Traffic Open Data of Incidents Report, and predict if severe traffic jams will happen shortly. Interventions can be taken out …

Vaex: Out of Core Dataframes for Python and Fast Visualization

So… no pandas ??
There are some issues with pandas that the original author Wes McKinney outlines in his insightful blog post “Apache Arrow and the ‘10 Things I Hate About pandas’”. Many of these issues will be tackled in the next version of pandas (pandas2?), building on top of Apache Arrow and other libraries. Vaex starts …

Music Genre Classification with Python

Objective
Companies nowadays use music classification either to make recommendations to their customers (such as Spotify, Soundcloud) or simply as a product (for example Shazam). Determining music genres is the first step in that direction. Machine Learning techniques have proved to be quite successful in extracting trends and patterns from the large …

Text Summarization on the Books of Harry Potter

Hermione interrupted them. “Aren’t you two ever going to read Hogwarts, A History?”
How many times throughout the Harry Potter series does Hermione bug Harry and Ron to read the enormous tome Hogwarts, A History? Hint: it’s a lot. How many nights do the three of them spend in the library, reading through every book …

Parsing XML, Named Entity Recognition in One-Shot

Photo credit: Lynda.com
Conditional Random Fields, Sequence Prediction, Sequence Labelling
Parsing XML is a process that is designed to read XML and create a way for programs to use XML. An XML parser is the piece of software that reads XML files and makes the information from those files available to applications. While reading an …

An introduction to web scraping with Python

Introduction
As a data scientist, I often find myself looking for external data sources that could be relevant for my machine learning projects. The problem is that it is uncommon to find open source data sets that perfectly correspond to what you are looking for, or free APIs that give you access to data. In …

Object detection and tracking in PyTorch

Detecting multiple objects in images and tracking them in videos
In my previous story, I went over how to train an image classifier in PyTorch, with your own images, and then use it for image recognition. Now I’ll show you how to use a pre-trained classifier to detect multiple objects in an image, and later track …

Word Representation in Natural Language Processing Part II

In the previous part (Part I) of the word representation series, I talked about fixed word representations that make no assumption about semantics (meaning) and similarity of words. In this part, I will describe a family of distributed word representations. The main idea is to represent words as feature vectors. Each entry in the vector stands …

TensorFlow Filesystem — Access Tensors Differently

TensorFlow is great. Really, I mean it. The problem is it’s great up to a point. Sometimes you want to do very simple things, but TensorFlow is giving you a hard time. The motivation I had behind writing TFFS (TensorFlow File System) can be shared by anyone who has used TensorFlow, including you. All I …

Beating the Fantasy Premier League game with Python and Data Science

Our Moneyball approach to the EPL Fantasy League
My friend and I have been playing the Official Fantasy English Premier League game for many years, and despite our firm belief that we know everything about English soccer, we tend to get “unlucky” year after year and somehow never seem to pick the winning team. So, we …

A short guide to using Docker for your data science environment

WHY
One of the most time-consuming parts of starting work on a new system, starting a new job, or just plain sharing your work is the variation in available tools (or lack thereof) due to differences in hardware, software, security policies and whatnot. Containerization has risen in recent years as a ready to use …

Exploratory Data Analysis (EDA) techniques for kaggle competition beginners

A hands-on guide for beginners on EDA and Data Science competitions
Exploratory Data Analysis (EDA) is an approach to analysing data sets to summarize their main characteristics, often with visual methods. The steps involved in EDA are: Data Collection, Data Cleaning, Data Preprocessing, and Data Visualisation.
Data Collection
Data collection is the process …

PyTorch 101 for Dummies like Me

Nov 5, 2018
What is PyTorch?
It’s a Python-based package that serves as a replacement for NumPy arrays and provides a flexible library as a Deep Learning Development Platform. As for why I prefer PyTorch over TensorFlow, see the fast.ai blog post on the reasons to switch to PyTorch. Or simply put, …

Introduction to Linear Regression in Python

Basic concepts and mathematics
There are two kinds of variables in a linear regression model: The input or predictor variable is the variable (or variables) that helps predict the value of the output variable. It is commonly referred to as X. The output variable is the variable that we want to predict. It is commonly referred to …

The intuition behind Shannon’s Entropy

Now, back to our formula 3.49:
The definition of Entropy for a probability distribution (from The Deep Learning Book)
I(x) is the information content of X. I(x) itself is a random variable. In our example, the possible outcomes of the War. Thus, H(x) is the expected value of every possible information. Using the definition of expected …
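The idea of entropy as the expected value of the information content can be sketched numerically. The snippet below is an illustration under stated assumptions: it takes the information content to be I(x) = -log P(x), as in the standard definition of Shannon entropy, uses base-2 logarithms (bits), and the example distributions are made up rather than taken from the article.

```python
import math


def information_content(p, base=2):
    """Self-information -log(p) of an outcome with probability p."""
    return -math.log(p, base)


def entropy(probs, base=2):
    """Expected information content over a discrete distribution.

    Outcomes with zero probability contribute nothing and are skipped.
    """
    return sum(p * information_content(p, base) for p in probs if p > 0)


if __name__ == "__main__":
    print(entropy([0.5, 0.5]))   # fair coin
    print(entropy([0.9, 0.1]))   # biased coin, lower uncertainty
    print(entropy([0.25] * 4))   # four equally likely outcomes
```

A fair coin gives one bit of entropy, and any bias lowers it, which matches the intuition that entropy measures average surprise.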

Forecasting Exchange Rates Using ARIMA In Python

Sep 29, 2018
Nearly all sectors use time series data to forecast future time points. Forecasting the future can assist analysts and management in making better calculated decisions to maximise returns and minimise risks. I will be demonstrating how we can forecast exchange rates in this article. If you are new to finance and want to …

Doing XGBoost hyper-parameter tuning the smart way — Part 1 of 2

Aug 29, 2018
Picture taken from Pixabay
In this post and the next, we will look at one of the trickiest and most critical problems in Machine Learning (ML): Hyper-parameter tuning. After reviewing what hyper-parameters, or hyper-params for short, are and how they differ from plain vanilla learnable parameters, we introduce three general purpose discrete optimization …

Automatic Image Quality Assessment in Python

Aug 28, 2018
Image quality is a notion that highly depends on observers. Generally, it is linked to the conditions in which it is viewed; therefore, it is a highly subjective topic. Image quality assessment aims to quantitatively represent the human perception of quality. These metrics are commonly used to analyze the performance of algorithms in …

Google’s AutoML Killer: Auto-Keras Opensource Automated ML

Auto-Keras is an open source software library for automated machine learning (AutoML). It is developed by DATA Lab at Texas A&M University and community contributors. The ultimate goal of AutoML is to provide easily accessible deep learning tools to domain experts with limited data science or machine learning background. Auto-Keras provides functions to automatically search …