Evaluating A Real-Life Recommender System, Error-Based and Ranking-Based

A recommender system aims to find and suggest items of likely interest based on the users’ preferences. Recommender systems are one of the most valuable applications of machine learning today. Amazon attributes 35% of its revenue to its recommender system. Evaluation is an integral part of researching and developing any recommender system. Depends on your … Read more

Master Python through building real-world applications (Part 6)

Scraping data from FIFA.com using BeautifulSoup. Most people think data science is about cool machine learning algorithms and self-driving cars. Let me tell you something: it’s not. Almost 80% of the time you are searching for and cleaning the data, and, if successful, you spend the remaining 20% on the cool stuff you see upfront. “Find data and play … Read more

Setting Up AWS EC2 Instance for Beginners

Access Your EC2 Instance via SSH. EC2 instance, check; .pem key, check. Before proceeding, you need to locate your public DNS, highlighted in green. Click on your newly created instance and a description box should appear like the one below. You use the ssh (secure shell) command to access your instance. Open a Terminal window and … Read more

Web Development of NLP Model in Python & Deployed in Flask

Source: Google. Introduction on NLP spam architecture. Consider a system that uses machine learning to detect spam SMS text messages. Our ML system’s workflow is like this: Train offline -> Make model available as a service -> Predict online. A classifier is trained offline with spam and non-spam messages. The trained model is deployed as a … Read more

TimeSeries Data Munging — Lagging Variables that are Distributed Across Multiple Groups

2. Lag one variable across multiple groups — using “unstack” method This method is slightly more involved because there are several groups, but manageable because only one variable needs to be lagged. Overall, we should be aware that we want to index the data first, then unstack to separate the groups before applying the lag function. Failure … Read more
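The index-then-unstack-then-lag order described above can be sketched roughly as follows. This is a minimal illustration with pandas, not the post's own code: the column names `date`, `group`, `value` and `value_lag1` and the toy numbers are assumptions made up for the example.

```python
import pandas as pd

# Hypothetical long-format data: one row per (date, group) pair.
df = pd.DataFrame({
    "date": pd.to_datetime(["2019-01-01", "2019-01-02", "2019-01-03"] * 2),
    "group": ["A"] * 3 + ["B"] * 3,
    "value": [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
})

# 1. Index by date and group.
# 2. Unstack so that each group becomes its own column.
wide = df.set_index(["date", "group"])["value"].unstack("group")

# 3. Shift every column down one row (a one-period lag per group),
#    then stack back into long format.
lagged = wide.shift(1).stack().rename("value_lag1")

# 4. Join the lagged values back onto the original rows.
#    The first date of each group has no previous value, so it ends up as NaN.
result = df.join(lagged, on=["date", "group"])
print(result)
```

Because each group sits in its own column after unstacking, the shift never carries a value from one group into another, which is the risk when lagging a long-format column directly.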

Getting Started with Randomized Optimization in Python

How to use randomized optimization algorithms to solve simple optimization problems with Python’s mlrose package mlrose provides functionality for implementing some of the most popular randomization and search algorithms, and applying them to a range of different optimization problem domains. In this tutorial, we will discuss what is meant by an optimization problem and step through … Read more

Sentiment Classification with Natural Language Processing on LSTM

Latent Semantic Analysis (LSA) is a theory and method for extracting and representing the contextual-usage meaning of words by statistical computations applied to a large corpus of text. LSA is an information retrieval technique which analyzes and identifies patterns in an unstructured collection of text and the relationships between them. LSA itself is … Read more

How to Perform Lasso and Ridge Regression in Python

A quick tutorial on how to use lasso and ridge regression to improve your linear model. Photo by Zhen Hu on Unsplash Previously, I introduced the theory underlying lasso and ridge regression. We now know that they are alternate fitting methods that can greatly improve the performance of a linear model. In this quick tutorial, we revisit … Read more

Stock Market Prediction by Recurrent Neural Network on LSTM Model

The art of forecasting stock prices has been a difficult task for many researchers and analysts. In fact, investors are highly interested in the research area of stock price prediction. For a good and successful investment, many investors are keen on knowing the future situation of the stock market. Good and effective prediction … Read more

Practical NumPy — Understanding Python library through its functions

Before embarking on the journey of data science and machine learning, it is very important to learn a few Python libraries which are ubiquitous in the world of data science, like NumPy, Pandas and Matplotlib. NumPy is one such powerful library for array processing, along with a large collection of high-level mathematical functions to operate … Read more

Having Fun with TextBlob

A Python library for processing textual data, NLP framework, sentiment analysis. As an NLP library for Python, TextBlob has been around for a while. After hearing many good things about it, such as part-of-speech tagging and sentiment analysis, I decided to give it a try; this is the first time I am using TextBlob … Read more

Deconstructing BERT, Part 2: Visualizing the Inner Workings of Attention

A new visualization tool shows how BERT forms its distinctive attention patterns. In Part 1 of this series, I described 6 key patterns that appear in BERT’s self-attention layers. For example, one pattern focuses nearly all of the attention on the next word in the sequence; another focuses on the previous word (see illustration below). … Read more

PyTorch Autograd : Understanding the heart of PyTorch’s magic

Source: http://bumpybrains.com/comics.php?comic=34 Let’s just agree, we are all bad at calculus when it comes to large neural networks. It is impractical to calculate gradients of such large composite functions by explicitly solving mathematical equations especially because these curves exist in a large number of dimensions and are impossible to fathom. To deal with hyper-planes in … Read more

From raw images to real-time predictions with Deep Learning

In my opinion, one of the most exciting fields in Artificial Intelligence is computer vision. I find it very interesting how we can now automatically extract knowledge from complex raw data structures such as images. The goal of this article is to explore a complete example of a computer vision application: building a face expression … Read more

Deep Learning with Magnetic Resonance and Computed Tomography Images

Getting started with applying deep learning to magnetic resonance (MR) or computed tomography (CT) images is not straightforward; finding appropriate data sets, preprocessing the data, and creating the data loader structures necessary to do the work is a pain to figure out. In this post I hope to alleviate some of that pain for newcomers. … Read more

Survival Analysis: Intuition & Implementation in Python

Table of Contents: Introduction; Definitions; Mathematical Intuition; Kaplan-Meier Estimate; Cox Proportional Hazard Model; End Note; Additional Resources. Introduction: Survival Analysis is a set of statistical tools that addresses questions such as ‘how long would it be before a particular event occurs’; in other words, we can also call it a ‘time to event’ analysis. This … Read more

Optimize Data Science Models with Feature Engineering: Cluster Analysis, Metrics Development, and…

Cluster Analysis, Metrics Development, and PCA with Baby Names Data. While baby name articles are mandatory reading for soon-to-be parents, the U.S. Social Security Administration’s (SSA’s) Baby Names data set should be required for budding data scientists. The data set can be sliced and diced in many different ways, including language and time based … Read more

How BERT leverage attention mechanism and transformer to learn word contextual relations

After ELMo (Embeddings from Language Model) and OpenAI GPT (Generative Pre-trained Transformer), a new state-of-the-art NLP paper was released by Google. They call this approach BERT (Bidirectional Encoder Representations from Transformers). Both OpenAI GPT and BERT use the transformer architecture to learn text representations. One of the differences is that BERT uses bidirectional … Read more

Regularization in Gradient Point of View [ Manual Back Propagation in Tensorflow ]

Results: Train/Test Accuracy. When we view the accuracy plots for both training and testing data, we can observe that: the highest training accuracy is achieved when adding sqrt(θ²)/θ; the highest testing accuracy is achieved when adding -tanh(θ); the lowest performance is achieved when adding θ. When we add the term θ, the derivative value for each weight just becomes one, and … Read more

Research Oriented Code in AI/ML projects

Photo by Vitaly Taranov on Unsplash Conceptualization of research-oriented code for bridging the gap between data science and engineering Recently, web application engineers have more opportunities to work with data scientists or researchers than before. At the same time, they are often faced with the research-oriented code written by data scientists or researchers. The code tends … Read more

Distributed Data Pre-processing using Dask, Amazon ECS and Python (Part 2)

Source: pixabay.com Using Dask for EDA and Hyperparameters Optimization (HPO) In Part 1 of this series, I explained how to build a serverless cluster of Dask scheduler and workers on AWS Fargate. Scaling the number of workers up and down is quite simple. You can achieve that by running the below AWS CLI commands: bash~# … Read more

Predicting Invasive Ductal Carcinoma using Convolutional Neural Network (CNN) in Keras

Tackling data imbalance by random undersampling
y_train.count(1)  # counting the number of 1
y_train.count(0)  # counting the number of 0
Counting the number of 1’s and 0’s in the array Y, we find that there are 44478 images of class 0 and 15522 images of class 1. This problem is known as data imbalance and can cause our … Read more
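Random undersampling, as named in the heading above, could look something like the sketch below. It is not the post's code: the helper `undersample` and the toy lists `X_toy` and `y_toy` are hypothetical stand-ins for the image arrays, and only the standard library is used.

```python
import random

def undersample(X, y, seed=0):
    """Randomly drop samples until both classes (0 and 1) have the same size."""
    rng = random.Random(seed)
    idx0 = [i for i, label in enumerate(y) if label == 0]
    idx1 = [i for i, label in enumerate(y) if label == 1]
    n = min(len(idx0), len(idx1))  # size of the minority class
    # Keep n randomly chosen samples from each class, then shuffle the order.
    keep = rng.sample(idx0, n) + rng.sample(idx1, n)
    rng.shuffle(keep)
    return [X[i] for i in keep], [y[i] for i in keep]

# Toy stand-in for the image data: 8 samples of class 0, 3 of class 1.
X_toy = list(range(11))
y_toy = [0] * 8 + [1] * 3
X_bal, y_bal = undersample(X_toy, y_toy)
print(y_bal.count(0), y_bal.count(1))
```

The trade-off is that undersampling throws away majority-class data; with counts like 44478 versus 15522, roughly two thirds of the class 0 images would be discarded.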

Ridge Regression for Better Usage

The goal of this post is to help you use ridge regression better than by just calling what libraries provide. So, what is ridge regression? The simplest way to answer the question is “a variation of linear regression”. The worst way is to start with the following mathematical equations, which not many can understand at first glance. Bad … Read more

Custom TensorFlow Loss Functions for Advanced Machine Learning

And few-shot transfer learning example. In this article, we’ll look at: the use of custom loss functions in advanced ML applications; defining a custom loss function and integrating it into a basic TensorFlow neural net model; a brief example of knowledge distillation learning using a Gaussian Process reference applied to a few-shot learning problem; links to my … Read more

Building a Perfect Mushroom Classifier

Introduction to classification using logistic regression, linear discriminant analysis and quadratic discriminant analysis with Python. Photo by Florian van Duyn on Unsplash. On a pizza or in a risotto, mushrooms simply taste great! But with over 10 000 species of mushrooms in North America alone, how can we tell which are edible? This is the objective of … Read more

Optimizing Jupyter Notebooks — A Comprehensive Guide

While we all know that premature micro optimizations are the root of all evil, thanks to Donald Knuth’s paper “Structured Programming With Go To Statements” [1], eventually at some point in your data exploration process you grasp for more than just the current “working” solution. The heuristic approach we usually follow considers: Make it work. … Read more

10 Data Science Tools I Explored in 2018

Source: geralt (pixabay) New Languages, Libraries, and Services Dec 31, 2018 In 2018, I invested a good amount of time in learning and writing about data science methods and technologies. In the first half of 2018, I wrote a blog series on data science for startups, which I turned into a book. In the second half, … Read more

Numpy Guide for People In a Hurry

Dec 31, 2018 Photo by Chris Ried on Unsplash The NumPy library is an important Python library for Data Scientists and it is one that you should be familiar with. Numpy arrays are like Python lists, but much better! It’s much easier manipulating a Numpy array than manipulating a Python list. You can use one Numpy … Read more

Supercharging word vectors

A simple technique to boost fastText and other word vectors in your NLP projects Over the last few years, word vectors have been transformative in their ability to create semantic linkages between words. It is now the norm for these to be fed into deep learning models for tasks such as classification or sentiment analysis. Despite … Read more

Monty Hall Problem using Python

Understanding mathematical proofs with the help of programming. We have all heard the probability brain teaser about the three-door game show. Each contestant guesses what’s behind the door, the show host reveals one of the three doors that didn’t have the prize and gives the contestant an opportunity to switch doors. It is … Read more
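The game described above is easy to simulate. The following is a minimal sketch using only the standard library; the function names `play` and `win_rate` and the trial count are choices made for this example, not taken from the post.

```python
import random

def play(switch, rng):
    """Play one round; return True if the contestant ends up with the prize."""
    doors = [0, 1, 2]
    prize = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the contestant's pick nor the prize.
    opened = rng.choice([d for d in doors if d != pick and d != prize])
    if switch:
        # Switch to the one remaining closed door.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == prize

def win_rate(switch, trials=100_000, seed=0):
    """Estimate the probability of winning over many simulated rounds."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials

print("stay:  ", win_rate(switch=False))
print("switch:", win_rate(switch=True))
```

With enough trials the estimates should settle near 1/3 for staying and 2/3 for switching, which is the result the mathematical argument gives.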

Extract features of Music

Different types of audio features and how to extract them. Dec 30, 2018. MFCC feature extraction. Extraction of features is a very important part of analyzing and finding relations between different things. Raw audio data cannot be understood by models directly; feature extraction is used to convert it into an understandable format. It … Read more

Total Least Squares in comparison with OLS and ODR

Total least squares (TLS) is a regression method that minimizes the sum of squared errors between the response variable (an observation) and the estimated variable (often called a fitted value). The most popular and standard method for the same purpose is ordinary least squares (OLS), and TLS is one of other … Read more

Grid Search for model tuning

A model hyperparameter is a characteristic of a model that is external to the model and whose value cannot be estimated from data. The value of the hyperparameter has to be set before the learning process begins. For example, c in Support Vector Machines, k in k-Nearest Neighbors, the number of hidden layers in Neural … Read more
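Since hyperparameters have to be fixed before training, a grid search simply tries every combination from a predefined grid and keeps the best-scoring one. The sketch below shows the idea in plain Python under loud assumptions: `grid_search` is a hypothetical helper, and `toy_score` is a made-up function standing in for "train a model with these hyperparameters and score it on held-out data".

```python
from itertools import product

def grid_search(param_grid, score_fn):
    """Evaluate score_fn on every combination in param_grid; return the best."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical validation score that peaks at C=1.0, gamma=0.1.
# In practice this would fit a model and evaluate it on validation data.
def toy_score(C, gamma):
    return -((C - 1.0) ** 2) - ((gamma - 0.1) ** 2)

grid = {"C": [0.1, 1.0, 10.0], "gamma": [0.01, 0.1, 1.0]}
best_params, best_score = grid_search(grid, toy_score)
print(best_params)
```

The cost grows multiplicatively with every hyperparameter added to the grid (here 3 × 3 = 9 evaluations), which is why grids are usually kept coarse.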

Spotify ReWrapped

Spotify surprises us every December with their cool end-of-the-year specials. Nevertheless, this year some of the reports smelled fishy. This humble Medium account decided to investigate. Photoshop skills: level 9000. Every year I look forward to Spotify’s summary. 2016 came with the usual top 5 songs, artists and genres. The number of minutes spent listening to music, … Read more

How fuzzy matching improve your NLP model

One of the challenges when dealing with NLP tasks is fuzzy text matching and alignment. You can still build your NLP model while skipping this text processing step, but the trade-off is that you may not achieve good results. Some may argue that preprocessing is not necessary when using deep learning. From my experience, … Read more

Let’s Find Donors For Charity With Machine Learning Models

An application of Supervised Learning Algorithms. Somewhere in the Philippines. Welcome to my second Medium post about Data Science. I will write here about a project I’ve done using Machine Learning algorithms. I will explain what I did without relying heavily on technical language, but I will show snippets of my code. Code matters 🙂 The … Read more

Andrew Ng’s Machine Learning Course in Python (Neural Networks)

In assignment 4, we worked towards implementing a neural network from scratch. We start off by computing the cost function and gradient of theta.
def sigmoidGradient(z):
    """computes the gradient of the sigmoid function"""
    sigmoid = 1/(1 + np.exp(-z))
    return sigmoid * (1 - sigmoid)
def nnCostFunction(nn_params, input_layer_size, hidden_layer_size, num_labels, X, y, Lambda):
    """nn_params contains the parameters unrolled into a vector
    compute the cost and gradient of … Read more
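A quick way to gain confidence in a sigmoid gradient like the one above is to compare it against a numerical derivative. The sketch below assumes NumPy is available; the snake_case names `sigmoid` and `sigmoid_gradient` and the test points are choices made for this example rather than the post's code.

```python
import numpy as np

def sigmoid(z):
    """Logistic function, applied element-wise."""
    return 1 / (1 + np.exp(-z))

def sigmoid_gradient(z):
    """Analytic derivative of the sigmoid: s(z) * (1 - s(z))."""
    s = sigmoid(z)
    return s * (1 - s)

# Central finite difference as an independent check of the analytic gradient.
z = np.array([-2.0, 0.0, 3.0])
eps = 1e-5
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)

print(sigmoid_gradient(z))
print(numeric)
```

The two printed arrays should agree to many decimal places; the gradient is largest at z = 0, where it equals 0.25, and shrinks toward zero as |z| grows.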

How I got Matplotlib to plot Apple Color Emojis

And why the library currently cannot Are you a data scientist who is interested in analyzing and visualizing text messages or any other conversational medium that may include emojis? Your plotting options may be limited. This post investigates why the popular Python library Matplotlib cannot plot emojis from the Apple Color Emoji font, and how … Read more

Building and Testing Recommender Systems With Surprise, Step-By-Step

Learn how to build your own recommendation engine with the help of Python and the Surprise library: Collaborative Filtering. Recommender systems are one of the most commonly used and easily understandable applications of data science. Lots of work has been done on this topic, and the interest and demand in this area remain very high because of … Read more