Deep Learning with Magnetic Resonance and Computed Tomography Images

Getting started with applying deep learning to magnetic resonance (MR) or computed tomography (CT) images is not straightforward; finding appropriate data sets, preprocessing the data, and creating the data loader structures necessary to do the work is a pain to figure out. In this post I hope to alleviate some of that pain for newcomers. … Read more

Survival Analysis: Intuition & Implementation in Python

Table of Contents: Introduction, Definitions, Mathematical Intuition, Kaplan-Meier Estimate, Cox Proportional Hazard Model, End Note, Additional Resources. Introduction: Survival analysis is a set of statistical tools that addresses questions such as ‘how long will it be before a particular event occurs?’; in other words, it can also be called ‘time to event’ analysis. This … Read more

Optimize Data Science Models with Feature Engineering: Cluster Analysis, Metrics Development, and…

Cluster Analysis, Metrics Development, and PCA with Baby Names Data While baby name articles are mandatory reading for soon-to-be parents, the U.S. Social Security Administration’s (SSA’s) Baby Names data set should be required reading for budding data scientists. The data set can be sliced and diced in many different ways, including language and time based … Read more

How BERT leverage attention mechanism and transformer to learn word contextual relations

After ELMo (Embeddings from Language Models) and OpenAI GPT (Generative Pre-trained Transformer), a new state-of-the-art NLP paper was released by Google. They call this approach BERT (Bidirectional Encoder Representations from Transformers). Both OpenAI GPT and BERT use the transformer architecture to learn text representations. One of the differences is that BERT uses bidirectional … Read more

Regularization in Gradient Point of View [ Manual Back Propagation in Tensorflow ]

Results Train/Test Accuracy When we view the accuracy plots for both training and testing data, we can observe that: the highest training accuracy is achieved when adding sqrt(θ²)/θ; the highest testing accuracy is achieved when adding -tanh(θ); the lowest performance is achieved when adding θ. When we add the term θ, the derivative value for each weight just becomes one, and … Read more

Research Oriented Code in AI/ML projects

Photo by Vitaly Taranov on Unsplash Conceptualization of research-oriented code for bridging the gap between data science and engineering Recently, web application engineers have more opportunities to work with data scientists or researchers than before. At the same time, they are often faced with the research-oriented code written by data scientists or researchers. The code tends … Read more

Distributed Data Pre-processing using Dask, Amazon ECS and Python (Part 2)

Source: Using Dask for EDA and Hyperparameters Optimization (HPO) In Part 1 of this series, I explained how to build a serverless cluster of Dask scheduler and workers on AWS Fargate. Scaling the number of workers up and down is quite simple. You can achieve that by running the below AWS CLI commands: bash~# … Read more

Predicting Invasive Ductal Carcinoma using Convolutional Neural Network (CNN) in Keras

Tackling data imbalance by random undersampling

    y_train.count(1)  # counting the number of 1s
    y_train.count(0)  # counting the number of 0s

Counting the number of 1’s and 0’s in the array Y, we find that there are 44478 images of class 0 and 15522 images of class 1. This problem is known as data imbalance and can cause our … Read more
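
The random undersampling idea named in the excerpt above can be sketched in a few lines. This is a minimal illustration, not the article's own code: the helper name `random_undersample` and the toy data are assumptions, and labels are kept in a plain Python list to match the `y_train.count(...)` calls in the excerpt.

```python
import random

def random_undersample(X, y, seed=0):
    """Randomly drop samples so every class keeps as many items as the rarest class.

    Hypothetical helper for illustration; X and y are parallel Python lists.
    """
    rng = random.Random(seed)
    # Group sample indices by class label.
    idx_by_class = {}
    for i, label in enumerate(y):
        idx_by_class.setdefault(label, []).append(i)
    # Size of the smallest class decides how many samples each class keeps.
    n_min = min(len(idx) for idx in idx_by_class.values())
    keep = []
    for idx in idx_by_class.values():
        keep.extend(rng.sample(idx, n_min))
    rng.shuffle(keep)  # avoid leaving the classes in contiguous blocks
    return [X[i] for i in keep], [y[i] for i in keep]

# Toy stand-in for the image data (made up): 8 samples of class 0, 3 of class 1.
X_toy = list(range(11))
y_toy = [0] * 8 + [1] * 3
X_bal, y_bal = random_undersample(X_toy, y_toy)
print(y_bal.count(0), y_bal.count(1))  # 3 3
```

Undersampling discards majority-class data, so it trades some information for balanced classes; whether that trade is acceptable depends on how much data remains.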

Ridge Regression for Better Usage

The goal of this post is to help you use ridge regression better than by just calling what libraries provide. So, what is ridge regression? The simplest way to answer the question is “a variation of linear regression”. The worst way is to start with the following mathematical equations, which not many can understand at first glance. Bad … Read more

Custom TensorFlow Loss Functions for Advanced Machine Learning

And a few-shot transfer learning example. In this article, we’ll look at: the use of custom loss functions in advanced ML applications; defining a custom loss function and integrating it into a basic TensorFlow neural net model; a brief example of knowledge distillation learning using a Gaussian Process reference applied to a few-shot learning problem; links to my … Read more

Building a Perfect Mushroom Classifier

Introduction to classification using logistic regression, linear discriminant analysis and quadratic discriminant analysis with Python Photo by Florian van Duyn on Unsplash On a pizza or in a risotto, mushrooms simply taste great! But with over 10,000 species of mushrooms in North America alone, how can we tell which are edible? This is the objective of … Read more

Optimizing Jupyter Notebooks — A Comprehensive Guide

While we all know that premature micro optimizations are the root of all evil, thanks to Donald Knuth’s paper “Structured Programming With Go To Statements” [1], eventually at some point in your data exploration process you grasp for more than just the current “working” solution. The heuristic approach we usually follow considers: Make it work. … Read more

10 Data Science Tools I Explored in 2018

Source: geralt (pixabay) New Languages, Libraries, and Services Dec 31, 2018 In 2018, I invested a good amount of time in learning and writing about data science methods and technologies. In the first half of 2018, I wrote a blog series on data science for startups, which I turned into a book. In the second half, … Read more

Numpy Guide for People In a Hurry

Dec 31, 2018 Photo by Chris Ried on Unsplash The NumPy library is an important Python library for Data Scientists and it is one that you should be familiar with. Numpy arrays are like Python lists, but much better! It’s much easier manipulating a Numpy array than manipulating a Python list. You can use one Numpy … Read more

Supercharging word vectors

A simple technique to boost fastText and other word vectors in your NLP projects Over the last few years, word vectors have been transformative in their ability to create semantic linkages between words. It is now the norm for these to be fed into deep learning models for tasks such as classification or sentiment analysis. Despite … Read more

Monty Hall Problem using Python

Understanding mathematical proofs with the help of programming We have all heard the probability brain teaser about the three-door game show. Each contestant guesses what’s behind the door; the show host reveals one of the three doors that doesn’t have the prize and gives the contestant an opportunity to switch doors. It is … Read more
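
The game described in the excerpt above is easy to simulate. The following is a minimal sketch under stated assumptions, not the article's own code: the function names `play` and `win_rate`, the trial count, and the seed are all illustrative choices.

```python
import random

def play(switch, rng):
    """Play one round; return True if the contestant ends up with the prize."""
    doors = [0, 1, 2]
    prize = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the contestant's pick nor the prize.
    opened = rng.choice([d for d in doors if d != pick and d != prize])
    if switch:
        # Move to the only door that is neither picked nor opened.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == prize

def win_rate(switch, trials=20000, seed=0):
    """Estimate the win probability for a fixed strategy by simulation."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials

stay_rate = win_rate(switch=False)
switch_rate = win_rate(switch=True)
print(f"stay: {stay_rate:.3f}, switch: {switch_rate:.3f}")
```

With enough trials the estimates should settle near 1/3 for staying and 2/3 for switching, which is the classic result.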

Extract features of Music

Different types of audio features and how to extract them. Dec 30, 2018 MFCC feature extraction Extraction of features is a very important part of analyzing and finding relations between different things. Raw audio data cannot be understood by models directly; feature extraction is used to convert it into an understandable format. It … Read more

Total Least Squares in comparison with OLS and ODR

Total least squares (aka TLS) is one of the regression analysis methods that minimize the sum of squared errors between a response variable (or an observation) and an estimated variable (often called a fitted value). The most popular and standard method for this purpose is ordinary least squares (aka OLS), and TLS is one of other … Read more

Grid Search for model tuning

A model hyperparameter is a characteristic of a model that is external to the model and whose value cannot be estimated from data. The value of the hyperparameter has to be set before the learning process begins. For example, c in Support Vector Machines, k in k-Nearest Neighbors, the number of hidden layers in Neural … Read more
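
To make the idea of trying hyperparameter values before training concrete, here is a minimal hand-rolled grid search over k for a tiny k-nearest-neighbors classifier. It is a sketch, not the article's code: the data are made up, the helpers `knn_predict` and `accuracy` are hypothetical, and a real project would more likely use a library implementation.

```python
from itertools import product

def knn_predict(train, query, k):
    """Majority vote among the k training points nearest to `query` (1-D data)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - query))[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)

def accuracy(train, valid, k):
    """Fraction of validation points classified correctly for a given k."""
    hits = sum(knn_predict(train, x, k) == label for x, label in valid)
    return hits / len(valid)

# Toy 1-D data (made up): class 0 at low values, class 1 at high values,
# plus one class-1 point at 3.0 that sits next to the class-0 cluster.
train = [(0.0, 0), (1.0, 0), (2.0, 0), (3.0, 1), (8.0, 1), (9.0, 1), (10.0, 1)]
valid = [(0.5, 0), (2.6, 0), (7.5, 1), (9.5, 1)]

# The grid maps each hyperparameter name to its candidate values.
grid = {"k": [1, 3, 5]}
results = {}
for (k,) in product(*grid.values()):
    results[k] = accuracy(train, valid, k)

# Pick the value with the best validation accuracy (first one wins on ties).
best_k = max(results, key=results.get)
print(results, "best k:", best_k)
```

Using `product` over the grid values keeps the loop unchanged if more hyperparameters are added to `grid`; only the tuple unpacking would need to grow.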

Spotify ReWrapped

Spotify surprises us every December with their cool end-of-the-year specials. Nevertheless, this year some of the reports smelled fishy. This humble Medium account decided to investigate. Photoshop skills: level 9000 Every year I expect Spotify’s summary. 2016 came with the usual top 5 of songs, artists and genres. The amount of minutes spent listening to music, … Read more

How fuzzy matching improve your NLP model

One of the challenges when dealing with NLP tasks is fuzzy text matching. You can still build your NLP model while skipping this text processing step, but the trade-off is that you may not achieve good results. Some may argue that preprocessing is not necessary when using deep learning. From my experience, … Read more

Let’s Find Donors For Charity With Machine Learning Models

An application of Supervised Learning Algorithms Somewhere in the Philippines Welcome to my second Medium post about Data Science. I will write here about a project I’ve done using Machine Learning algorithms. I will explain what I did without relying heavily on technical language, but I will show snippets of my code. Code matters 🙂 The … Read more

Andrew Ng’s Machine Learning Course in Python (Neural Networks)

In assignment 4, we worked towards implementing a neural network from scratch. We start off by computing the cost function and gradient of theta.

    def sigmoidGradient(z):
        """computes the gradient of the sigmoid function"""
        sigmoid = 1/(1 + np.exp(-z))
        return sigmoid * (1 - sigmoid)

    def nnCostFunction(nn_params, input_layer_size, hidden_layer_size, num_labels, X, y, Lambda):
        """nn_params contains the parameters unrolled into a vector
        compute the cost and gradient of … Read more

How I got Matplotlib to plot Apple Color Emojis

And why the library currently cannot Are you a data scientist who is interested in analyzing and visualizing text messages or any other conversational medium that may include emojis? Your plotting options may be limited. This post investigates why the popular Python library Matplotlib cannot plot emojis from the Apple Color Emoji font, and how … Read more

Building and Testing Recommender Systems With Surprise, Step-By-Step

Learn how to build your own recommendation engine with the help of Python and Surprise Library, Collaborative Filtering Recommender systems are one of the most commonly used and easily understandable applications of data science. Lots of work has been done on this topic, yet the interest and demand in this area remain very high because of … Read more

Web Scraping Apartment Listings in Stockholm

My partner and I have sold our apartment and are in search of a new one, and since the majority of the people searching for a new apartment manually go through … This, to me, seems too tedious and exhausting, so I thought — why not use my Python knowledge and bottomless craving for these types of … Read more

Python Plotting API: Expose your scientific python plots through a flask API

In my daily work as a data scientist, I often have the need to integrate relatively complex plots into back-office applications. These plots are mainly used to illustrate algorithmic decisions and give data intuitions to operational departments. A possible approach here would be to build an API that returns data and let the front-end of … Read more

A Starter Pack to Exploratory Data Analysis with Python, pandas, seaborn, and scikit-Learn

2. Categorical Analysis We can start reading the data using pd.read_csv() . By doing a .head() on the data frame, we could have a quick peek at the top 5 rows of our data. For those who are not familiar with pandas or the concept of a data frame, I would highly recommend spending half a day … Read more

Understand Text Summarization and create your own summarizer in python

How text summarization works In general there are two types of summarization: abstractive and extractive. Abstractive Summarization: Abstractive methods select words based on semantic understanding, even if those words did not appear in the source documents. They aim to present the important material in a new way. They interpret and examine the text using advanced natural … Read more

Word2Vec For Phrases — Learning Embeddings For More Than One Word

Learning Phrases From Unsupervised Text (Collocation Extraction) We can easily create bi-grams with our unsupervised corpus and take it as an input to Word2Vec. For example, the sentence “I walked today to the park” will be converted to “I_walked walked_today today_to to_the the_park” and each bi-gram will be treated as a uni-gram in the Word2Vec … Read more
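
The bi-gram conversion in the excerpt above can be reproduced with a short helper. This is a minimal sketch: the function name `to_bigram_tokens` is an assumption, and whitespace splitting stands in for whatever tokenizer the article actually uses.

```python
def to_bigram_tokens(sentence):
    """Join each pair of adjacent words with an underscore."""
    words = sentence.split()
    # zip(words, words[1:]) pairs every word with its right-hand neighbor.
    return [f"{a}_{b}" for a, b in zip(words, words[1:])]

tokens = to_bigram_tokens("I walked today to the park")
print(" ".join(tokens))
# I_walked walked_today today_to to_the the_park
```

Each joined token can then be fed to a word-embedding model as if it were a single word.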

Progress Bars in Python

Just like a watched pot never boils, a watched for loop never ends. When dealing with large datasets, even the simplest operations can take hours. Progress bars can help make data processing jobs less of a headache because: You get a reliable estimate of how long it will take. You can see immediately if it’s … Read more
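
As a rough illustration of how a progress bar produces a time estimate, here is a minimal standard-library sketch. It is not the article's code and not any particular library's API: the names `format_bar` and `with_progress` are hypothetical, and the estimate simply assumes every remaining item takes the average time seen so far.

```python
import sys
import time

def format_bar(done, total, elapsed, width=20):
    """Return a one-line text bar with a naive remaining-time estimate."""
    frac = done / total
    filled = int(width * frac)
    # Average time per finished item, multiplied by the items still to go.
    remaining = elapsed / done * (total - done) if done else float("inf")
    return f"[{'#' * filled}{'.' * (width - filled)}] {done}/{total} ETA {remaining:.1f}s"

def with_progress(items, out=sys.stderr):
    """Yield items unchanged while redrawing a progress bar on `out`."""
    items = list(items)  # need the total length up front
    start = time.monotonic()
    for i, item in enumerate(items, 1):
        yield item
        # "\r" returns to the start of the line so the bar overwrites itself.
        out.write("\r" + format_bar(i, len(items), time.monotonic() - start))
        out.flush()
    out.write("\n")

squares = [x * x for x in with_progress(range(5))]
```

Wrapping the iterable in a generator keeps the loop body untouched, which is the same design most progress-bar libraries follow.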

Distributed TensorFlow using Horovod

Reduce training time for deep neural networks by using many GPUs Marenostrum Supercomputer — Barcelona Supercomputing Center (This post will be used in my master course SA-MIRI at UPC Barcelona Tech with the support of Barcelona Supercomputing Center) “Methods that scale with computation are the future of Artificial Intelligence” — Rich Sutton, father of reinforcement learning (video 4:49) In … Read more

Develop a NLP Model in Python & Deploy It with Flask, Step by Step

Flask API, Document Classification, Spam Filter So far, we have developed many machine learning models, generated numeric predictions on the testing data, and tested the results. And we did everything offline. In reality, generating predictions is only part of a machine learning project, although it is the most important part in my opinion. Considering a system … Read more

Introduction to Interactive Time Series Visualizations with Plotly in Python

Introduction to Plotly Plotly is a company that makes visualization tools including a Python API library. (Plotly also makes Dash, a framework for building interactive web-based applications with Python code). For this article, we’ll stick to working with the plotly Python library in a Jupyter Notebook and touching up images in the online plotly editor. When … Read more

The complete guide for topics extraction with LDA (Latent Dirichlet Allocation) in Python

A recurring subject in NLP is to understand a large corpus of texts through topic extraction. Whether you analyze users’ online reviews, product descriptions, or text entered in search bars, understanding key topics will always come in handy. Popular picture explaining LDA Before going into the LDA method, let me remind you that not reinventing the … Read more

Unpacking (**PCA)

Alright, it is better to implement PCA to get the image. Let’s start by making a 5×10 matrix and walking through the steps of the process. Matrix X

    import numpy as np
    X = np.random.rand(5, 10)

The columns are variables (characteristics) and the rows are samples (say, ‘cat’ or ‘dog’). What we want to do with this matrix is to get eigenvalues … Read more

Preprocessing with sklearn: a complete and comprehensive guide

For aspiring data scientists it might sometimes be difficult to find their way through the forest of preprocessing techniques. Sklearn’s preprocessing library forms a solid foundation to guide you through this important task in the data science pipeline. Although Sklearn has pretty solid documentation, it often lacks a clear thread and intuition connecting different concepts. … Read more

How to Predict Severe Traffic Jams with Python and Recurrent Neural Networks?

An Application of Sequence Model to Mine Waze Open Data of Traffic Incidents, using Python and Keras. In this tutorial, I will show you how to use an RNN deep learning model to find patterns in Waze’s open data of traffic incident reports, and predict if severe traffic jams will happen shortly. Interventions can be taken out … Read more

Vaex: Out of Core Dataframes for Python and Fast Visualization

So… no pandas ?? There are some issues with pandas that the original author Wes McKinney outlines in his insightful blog post “Apache Arrow and the ‘10 Things I Hate About pandas’”. Many of these issues will be tackled in the next version of pandas (pandas2?), building on top of Apache Arrow and other libraries. Vaex starts … Read more

Music Genre Classification with Python

Objective Companies nowadays use music classification, either to be able to place recommendations to their customers (such as Spotify, Soundcloud) or simply as a product (for example Shazam). Determining music genres is the first step in that direction. Machine Learning techniques have proved to be quite successful in extracting trends and patterns from the large … Read more