Application Of Deep Learning In Identifying Road Cracks

Recently I had a chance to work with a really cool road crack detection dataset as part of my research. A company (let’s call it the Ministry of Road Cracks and Other Important Stuff (MRCOIS for short) 😑) was seeking an autonomous system to localize road cracks and classify them according to 3 crack severity …

Permutation Feature Importance (PFI) of GRNN

In the post https://statcompute.wordpress.com/2019/10/13/assess-variable-importance-in-grnn, it was shown how to assess the variable importance of a GRNN by the decrease in GoF statistics, e.g. AUC, after averaging or dropping the variable of interest. The permutation feature importance evaluates the variable importance in a similar manner by permuting values of the variable, which attempts to break the …
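The permutation idea described above can be sketched in a few lines. This is a minimal illustration only, not the post's implementation: it uses Python with NumPy rather than R, R-squared rather than AUC as the goodness-of-fit statistic, and a hypothetical stand-in model (`toy_predict`) in place of a GRNN. All function and variable names are assumptions made for the example.

```python
import numpy as np

def permutation_importance(predict, X, y, metric, n_repeats=10, seed=0):
    """Mean drop in `metric` when each column of X is shuffled in turn."""
    rng = np.random.default_rng(seed)
    baseline = metric(y, predict(X))
    drops = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffling one column breaks its link to the response
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            drops[j] += baseline - metric(y, predict(X_perm))
    return drops / n_repeats

def r_squared(y_true, y_pred):
    """Coefficient of determination, used here as the GoF statistic."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Toy data (assumption): the response depends on column 0 only
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 3))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=500)

def toy_predict(X):
    """Hypothetical stand-in for a fitted model; not a GRNN."""
    return 3.0 * X[:, 0]

importance = permutation_importance(toy_predict, X, y, r_squared)
print(importance)
```

A variable the model relies on shows a large drop in the statistic once permuted, while a variable the model ignores shows a drop of zero.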

How Computers Think

It’s the final frontier in artificial intelligence. It’s the star of countless films and novels, both the greatest villain and hero of modern-day fantasies. I’m talking about true “intelligent” machines, sometimes called “hard” AI, or “general intelligence”. That is, an artificial intelligence that is intelligent “like us”, that is “conscious” or “self-aware”. This is …

Dueling Deep Q Networks

Dueling Network Architectures for Deep Reinforcement Learning
Let’s go over some important definitions before going through the Dueling DQN paper. Most of these should be familiar. Given the agent’s policy π, the action value and state value are defined as, respectively: The above Q function can also be written as: The Advantage is a quantity …
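For reference, the definitions mentioned above usually take the following form. This is a sketch in the notation of the Dueling DQN paper, where \(R_t\) is the discounted return and \(\gamma\) the discount factor; the post's own notation may differ.

```latex
\begin{align}
Q^{\pi}(s, a) &= \mathbb{E}\left[ R_t \mid s_t = s,\ a_t = a,\ \pi \right] \\
V^{\pi}(s) &= \mathbb{E}_{a \sim \pi(s)}\left[ Q^{\pi}(s, a) \right] \\
Q^{\pi}(s, a) &= \mathbb{E}_{s'}\left[ r + \gamma\, \mathbb{E}_{a' \sim \pi(s')}\left[ Q^{\pi}(s', a') \right] \mid s, a, \pi \right] \\
A^{\pi}(s, a) &= Q^{\pi}(s, a) - V^{\pi}(s)
\end{align}
```

The first two lines are the action value and state value, the third is the recursive form of the Q function, and the last is the advantage. It follows directly from these definitions that \(\mathbb{E}_{a \sim \pi(s)}\left[ A^{\pi}(s, a) \right] = 0\).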

Machine learning: introduction, monumental failure, and hope

Photo by Ahmed Hasan on Unsplash
Wikipedia tells us that machine learning is “a field of computer science that gives computers the ability to learn without being explicitly programmed.” It goes on to say, “machine learning explores the study and construction of algorithms that can learn from and make predictions on data — such algorithms …

Data Science’s Most Misunderstood Hero

Why treating analytics like a second-class citizen will hurt you
This article is an extended 2-in-1 remix of my HBR article and TDS article about analysts. Be careful which skills you put on a pedestal, since the effects of unwise choices can be devastating. In addition to mismanaged teams and unnecessary hires, you’ll see the …

Partial Dependence Plot (PDP) of GRNN

The function grnn.margin() (https://github.com/statcompute/yager/blob/master/code/grnn.margin.R) was my first attempt to explore the relationship between each predictor and the response in a General Regression Neural Network, which is usually considered a black-box model. The idea is described below:
- First trained a GRNN with the original training dataset
- Created an artificial dataset from the training data by keeping …

Classifying Rare Events Using Five Machine Learning Techniques

Machine Learning is the crown of Data Science; Supervised learning is the crown jewel of Machine Learning. Supervised learning is the machine learning task or process of producing a function that predicts output variables. It has been adopted widely in the industry. For example, banks apply supervised models to detect credit card fraud. Quantitative traders …

Difficulties with self-learning and 3 systems to solve them

After 2 years of being a self-learner who taught himself everything he wanted and another year of being lost, unmotivated and kind of depressed, I have come to realize that teaching yourself anything can be a difficult, draining and quite lonely process. But when you care about what you are learning, you change it …

Exploratory Data Analysis in R for beginners (Part 2)

A more advanced method of doing EDA with ‘ggplot2’ and ‘tidyverse’
In my previous article, ‘Exploratory Data Analysis in R for beginners (Part 1)’, I introduced a basic step-by-step approach from data importing to cleaning and visualization. Here is a quick summary of Part 1: Import data appropriately with fileEncoding and na.strings arguments. …

Machine Learning- Predicting House prices with Regression

Running algorithms to get the most accurate results
This article is the last of my series on the Housing dataset. For the uninitiated, I had already covered EDA and Feature Engineering in the previous two articles. Summarising the work so far: we covered the awfully mundane work of data munging in EDA and meticulous re-engineering of …

Deploying your first Deep Learning Model: MNIST in production environment

At the very beginning of the code, we import all the required libraries and modules.
Imports
All the imports are self-explanatory, and I have also commented the important sections, so consider looking at them.
    from django.shortcuts import render
    from scipy.misc.pilutil import imread, imresize
    import numpy as np
    import re
    import sys
    ## Appending MNIST model path
    import os
    sys.path.append(os.path.abspath("./model"))
    ## custom utils file …

Step-by-Step R-CNN Implementation From Scratch In Python

Classification and object detection are the main parts of computer vision. Classification finds what is in an image, while object detection and localisation find where that object is in the image. Detection is a more complex problem to solve as we need to find the coordinates of the object in an image. To …

Soccer Player Detection in Overhead Images using RetinaNet

Computer Vision for Sports
Straight-to-the-point guide to building a computer vision model to detect players and ball in overhead camera images. We will be using the RetinaNet model as described in the Focal Loss for Dense Object Detection paper by Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, Piotr Dollár. We will also use Fizyr’s …

Attention for time series classification and forecasting

Harnessing the most recent advances in NLP for time series forecasting and classification
Transformers (specifically self-attention) have powered significant recent progress in NLP. They have enabled models like BERT, GPT-2, and XLNet to form powerful language models that can be used to generate text, translate text, answer questions, classify documents, summarize text, and much more. …

Python eval() built-in-function

Let us understand the eval() built-in function in Python. This is a short article about the eval function in Python, in which I explain the eval function, its syntax, and a few questions that are often asked in interviews, so that you clearly understand it and can answer those questions with ease. Let’s get started: …
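As a quick illustration of what eval() does, here is a minimal sketch. The expression string and variable names are made up for the example. eval() takes an expression as a string, plus optional globals and locals dictionaries, and returns the value of that expression.

```python
# Evaluate an arithmetic expression supplied as a string
total = eval("2 + 3")
print(total)

# Names used in the expression can be supplied through the locals dictionary.
# Passing an empty __builtins__ limits what the expression can reach, but it
# is not a real sandbox: eval() should not be used on untrusted input.
expression = "x * 2 + 1"  # hypothetical expression for this example
result = eval(expression, {"__builtins__": {}}, {"x": 20})
print(result)
```

For parsing plain literals such as numbers, lists or dictionaries from text, `ast.literal_eval` from the standard library is generally the safer choice.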

Forward and Backpropagation in GRUs — Derived | Deep Learning

An explanation of Gated Recurrent Units (GRUs) with the math behind how the loss backpropagates through time.
Source: xkcd
The GRU network, or Gated Recurrent Unit, was initially proposed by Cho et al. (2014) and is a very interesting type of recurrent neural network. It improved on the simple RNN …

Artificial General Intelligence in plain English

The holy grail of AI, the pinnacle of human achievement, the ultimate weapon, our future master, and the end of the world as we know it. Is that plain enough? In the words of Douglas Adams, Don’t Panic. Siri, Cortana, Alexa and Google Assistant are not plotting to take over the human race. Let’s be …

Deep Prognosis: Predicting Mortality in the ICU

Data Preprocessing
First, patients under 16 years old and those who stayed in the ICU less than 7 hours are removed from the dataset. Different cleaning steps were needed for each type of dataset used: diagnoses, treatments, past medical history, periodic vital signs, aperiodic vitals, and lab results. Each was loaded and pre-processed independently according …

Group Sparse Regularization for Deep Neural Networks

How to automatically prune nodes in neural networks? Paper summary of Group Sparse Regularization for Deep Neural Networks
With the advancement in hardware technologies and thereby reduced costs, training large neural networks has become a lot easier. Due to this, it has become common to throw large networks at even simple tasks. Having too …

How to build your home infrastructure for data collection and visualization and be the real owner…

First, you’ll need a Raspberry Pi (or any similar clone) with platypush. I assume that you’ve already got platypush installed and configured. If not, please head to my previous article on how to get started with platypush. You’ll also need a relational database installed on your device. The example in this article will rely on PostgreSQL, …

Calculating Customer Lifetime Values using a Shifted-Beta-Geometric model

Customer lifetime value (CLV) is a critical metric, often considered a company’s north star metric and all-encompassing KPI, used to inform:
- marketing campaigns: how much should we spend to acquire customers?
- customer segmentation: who are our most valuable customers and what are their demographic and behavioral traits?
- overall health of the …

Web Scraping Board Game Descriptions with Python

I’ll say it again, I love collecting data! Although reading through HTML to find the tags you want can be a little tedious at times, scraping data from web pages provides a treasure trove of information. In a previous article, I discussed some of the techniques I use to manually collect web data. This article …

Clustering & Machine Learning Combination in Sales Prediction

Prediction is an essential component of successful inventory management in retail. Identifying and understanding past trends in sales transactions can be used for defining a better marketing strategy. Here, I will show how to combine forecasts using the concepts of Clustering. Clusters of items are identified based on the similarity in their sales forecasts and …

Decoding the output of a hybrid recommendation system

Making sense of recommendations provided by a collaborative-filter-based recommendation system in a medical context
With the rise of emerging analytical techniques, an effective recommendation system has become a game changer in the world of data science. Companies like Netflix and Amazon have surpassed their competitors by building some reliable recommendation systems. But these black …

How confident are you? Assessing the uncertainty in forecasting

Introduction
Some people think that the main idea of forecasting is in predicting the future as accurately as possible. I have bad news for them. The main idea of forecasting is in decreasing the uncertainty. Think about it: any event that we want to predict has some systematic components \(\mu_t\), which could potentially be captured …

Visualizing the Emotional Arcs of Movie Scripts Using Rule-Based Sentiment Analysis

Almost 72 years ago, acclaimed American writer Kurt Vonnegut came up with a novel method for graphing the plot lines of stories as part of his master’s thesis in anthropology. Although his work was ultimately rejected by the University of Chicago “because it was so simple and looked like too much fun,” according to Vonnegut, …

AWS CloudHSM is now available in the AWS South America (Sao Paulo) Region

CloudHSM provides fully managed hardware security module (HSM) instances in the AWS Cloud. With CloudHSM, you can manage and use your own encryption keys using FIPS 140-2 Level 3 validated HSMs. Applications can be built using industry-standard APIs, such as PKCS#11, Java Cryptography Extensions (JCE) and Windows Cryptography API: Next Generation (CNG). With …

New geospatial data comes to BigQuery public datasets with CARTO collaboration

At Google Cloud, we host many public datasets, including weather, traffic, housing and other data, in BigQuery, our enterprise data warehousing platform. You can use this public data to experiment with data analytics and join it with your own data to find insights. We’re pleased to announce a new collaboration with CARTO to bring valuable …

Predict soccer matches with 50% accuracy

The bars show the actual number of goals in the 2018/19 season. The lines show our Poisson-calculated values, which seem to fit the real distribution pretty well.
Predict a match of the Premier League
Now we have learned the basic concept of the Poisson distribution. But how can we predict the number of goals …
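One common way to turn Poisson goal rates into a match prediction is sketched below. It assumes that home and away goals are independent Poisson variables, and the two expected-goal values are hypothetical numbers chosen for illustration, not figures from the article.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k goals when the expected number is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def match_outcome_probs(lam_home, lam_away, max_goals=10):
    """Home win, draw and away win probabilities.

    Assumes independent Poisson goal counts and ignores scorelines
    with more than max_goals goals for either side.
    """
    home = draw = away = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, lam_home) * poisson_pmf(a, lam_away)
            if h > a:
                home += p
            elif h == a:
                draw += p
            else:
                away += p
    return home, draw, away

# Hypothetical expected goals for the home and away side
probs = match_outcome_probs(1.6, 1.1)
print(probs)
```

Each scoreline gets the product of the two Poisson probabilities, and the outcome probabilities are the sums over the scorelines that produce a home win, a draw or an away win.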

Plantsnap and Imagga Use Machine Learning to Put a Botanist in Your Pocket

A classifier trained on 320,000 classes and 90 million images yields surprisingly accurate results
Credit: Gado Images
AI and Machine Learning are usually all about computers and cutting-edge tech. Self-driving cars! Smart speakers! Robots stealing your job! But what if there was an AI-driven system that took you out of the world of tech and …

Silicon Valley’s Brain-Meddling: A New Frontier For Tech Gadgetry

Introducing his students to the study of the human brain, Jeff Lichtman, a Harvard Professor of Molecular and Cellular Biology, once asked: “If understanding everything you need to know about the brain was a mile, how far have we walked?” He received answers like ‘three-quarters of a mile’, ‘half a mile’, and ‘a quarter of …

Estimating Uncertainty in Machine Learning Models — Part 3

Check out part 1 and part 2 of this series.
Author: Dhruv Nair, data scientist, Comet.ml
In the last part of our series on uncertainty estimation, we addressed the limitations of approaches like bootstrapping for large models, and demonstrated how we might estimate uncertainty in the predictions of a neural network using MC Dropout. …

The Basics: KNN for classification and regression

Data Science from the ground up
Building an intuition for how KNN models work
Data science or applied statistics courses typically start with linear models, but in its way, K-nearest neighbors is probably the simplest widely used model conceptually. KNN models are really just technical implementations of a common intuition, that things that share similar …
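The intuition above can be sketched in a few lines of NumPy. This is a minimal illustration under simple assumptions (Euclidean distance, unweighted neighbors); the function name and toy data are made up for the example.

```python
from collections import Counter

import numpy as np

def knn_predict(X_train, y_train, x, k=3, task="classification"):
    """Predict for a single point x from its k nearest training points."""
    # Euclidean distance from x to every training point
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]
    if task == "regression":
        # Regression: average the neighbors' target values
        return float(np.mean(y_train[nearest]))
    # Classification: majority vote among the neighbors' labels
    return Counter(y_train[nearest].tolist()).most_common(1)[0][0]

# Toy data (assumption): two well-separated groups of points
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                    [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
labels = np.array(["a", "a", "a", "b", "b", "b"])
values = np.array([1.0, 1.2, 0.8, 10.0, 10.5, 9.5])

label = knn_predict(X_train, labels, np.array([0.1, 0.1]))
value = knn_predict(X_train, values, np.array([5.0, 5.0]), task="regression")
print(label, value)
```

The same neighbor lookup serves both tasks. Only the final step differs: a vote for classification, an average for regression.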

Install a Kafka Cluster on Ubuntu in AWS

Build Real-Time Data Capability Through a Kafka Message Backbone in AWS
Kafka is used by tens of thousands of organizations, including over a third of the Fortune 500 companies. These companies include the top ten travel companies, 7 of the top ten banks, 8 of the top ten insurance companies, 9 of the top …

Avoid These Deadly Modeling Mistakes that May Cost You a Career

I love seeing data scientists using advanced packages, creating dazzling exhibits and experimenting with different algorithms. Data scientists can keep the computer burning for a whole day. A cool T-shirt, a cup of coffee and a laptop: that’s all he or she needs. As crazy as their titles sound, some novice data scientists …

How Python Helped Select My New Home — Part 2

Before we begin, you will have to understand a little bit about the basics of how computers perceive images. This will be helpful for you to understand my approach.
Image as an array. Photo by Stanford.
Unlike humans, computers see images as numbers in a 3-dimensional array, i.e. width x height x channels. Each number …
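To make the array view concrete, here is a minimal sketch using NumPy with a tiny made-up image. Note that NumPy-based image libraries usually store the axes in the order height (rows), width (columns), channels, so the shape below is written that way.

```python
import numpy as np

# A hypothetical RGB image, 4 pixels high and 6 pixels wide, all black
image = np.zeros((4, 6, 3), dtype=np.uint8)

# Set the top-left pixel to pure red: (R, G, B) = (255, 0, 0)
image[0, 0] = [255, 0, 0]

height, width, channels = image.shape
red_channel = image[:, :, 0]  # 2-D array holding only the red intensities
print(height, width, channels)
```

Each entry is an intensity between 0 and 255 for one color channel of one pixel.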

Docker + Jupyter for Machine Learning in 1 Minute

Many data science applications require a model training/development environment that is isolated from your host environment. A lightweight solution is to integrate Jupyter with Docker. The best practice for setting up such a container is to use a Dockerfile; I have written one following best practices that sets things up in less than 1 minute. I hope …

Kaplan Meier Mistakes

Kaplan Meier analysis is often misused when analyzing survival data. This article highlights common Kaplan Meier errors and provides suggestions to improve prognostic factor research.
Photo Credit: Louis Reed
A patient’s prognosis refers to the risk of future medical outcomes such as surgery complications, tumor recurrence, or death. Prognostic factor research is the study of …

How air pollution is measured in Ulaanbaatar

4 different sources, 2 different methods
Air pollution is a big issue during the winter months in Ulaanbaatar, Mongolia. The effects of air pollution are quite severe, with those living in the most polluted areas suffering from lowered lung function, increased rates of respiratory infection, and shortened lifespans. In an attempt to solve this issue, …