Predict malignancy in breast cancer tumors with your own neural network and the Wisconsin Dataset

In the final part of this series, we predict malignancy in breast cancer tumors using the network we coded from scratch. In part 1 of this series, we understood in depth the architecture of our neural network. In part 2, we built it using Python. We also understood in depth back-propagation and the gradient descent optimization … Read more

Quantum rush — New light on pushing AI & Machine learning boundaries

Quantum computers are a successor to the classical digital computer. They add an extra dimension to existing computers, which fall short at solving optimization problems and doing things in parallel. Quantum computers work on the principles of quantum mechanics. Quantum mechanics is the body of scientific laws that describe the motion and interaction of photons, electrons, … Read more

Towards Fast Neural Style Transfer

The seminal paper of Neural Style Transfer presented by Gatys et al. [1] demonstrates a remarkable characteristic of Deep Convolutional Neural Networks. The sequential representations learned from layers of parametric convolutions can be separated into ‘content’ and ‘style’. The fundamental idea behind Style Transfer is that pre-trained DCNNs on tasks such as ImageNet classification can … Read more

Embeddings-free Deep Learning NLP model

Word embeddings (e.g. word2vec, GloVe) were introduced several years ago and have changed NLP tasks fundamentally. With embeddings, we do not need one-hot encoding, which causes very high-dimensional features in most NLP tasks. We can use 300 dimensions to represent over 1 million words. Different kinds of embeddings such as character embedding, sentence embeddings … Read more

How to make GDPR and ONA work together?

Searching online for ONA (Organizational Network Analysis) gets you various definitions, including those with curly math symbols and graph theory. However, in a nutshell, ONA is about who communicates with whom in an organization. Although ONA is regarded as one of the latest buzzwords, it can be traced back at least to the 80’s. In … Read more

How to “farm” Kaggle in the right way

This article offers advice and approaches for using Kaggle effectively as a competition platform to improve Data Science skills with maximum efficiency and profitability. Farm (farming): a gaming tactic where a player performs repetitive actions to gain experience, points or some form of in-game currency. Description: These methods helped me with getting … Read more

Where should you live in San Francisco?

Moving to San Francisco would be awesome except for the cost of rent. I use Zillow data in this project to find cheap/high-value rental opportunities in San Francisco. The following analysis was scripted in R, and analysis is neighborhood-based. Credits to Ken Steif and Keith Hassel for their great tutorial on exploring home prices in … Read more

Cultural overfitting and underfitting. Or why the “Netflix Culture” won’t work in your company.

A couple of weeks back, I gave a talk in the SFELC (San Francisco Engineering Leadership Conference). As I was preparing the slides for the talk I reflected on something pretty interesting: I have been managing technical teams for over 25 years. I have also been giving public talks for about the same time. However, … Read more

Holy Grail for Bias-Variance tradeoff, Overfitting & Underfitting

There are two more important terms related to bias and variance that we must understand now: Overfitting and Underfitting. I am again going to use a real-life analogy here. I have referred to the blog of Machine learning@Berkeley for this example. There is a very delicate balancing act when machine learning algorithms try to … Read more

An Introduction on Time Series Forecasting with Simple Neural Networks & LSTM

How to develop Artificial Neural Networks and LSTM recurrent neural networks for time series prediction in Python with the Keras deep learning library. The purpose of this article is to explain Artificial Neural Network (ANN) and Long Short-Term Memory Recurrent Neural Network (LSTM RNN) and enable you to use them in real life and build the … Read more

Why isn’t out-of-time validation more ubiquitous?

Train, validate and test partitions for out-of-time performance take planning and thought. The purpose of supervised machine learning is to classify unlabeled data. We want algorithms to tell us whether a borrower will default, a customer will make a purchase, or an image contains a cat, a dog, a malignant tumor or a benign polyp. The algorithms “learn” how to … Read more

XLM — Enhancing BERT for Cross-lingual Language Model

Cross-lingual Language Model Pretraining. Attention models, and BERT in particular, have achieved promising results in Natural Language Processing, in both classification and translation tasks. A new paper by Facebook AI, named XLM, presents an improved version of BERT to achieve state-of-the-art results in both types of tasks. XLM uses a known pre-processing technique (BPE) and … Read more

Explaining Feature Importance by example of a Random Forest

In many (business) cases it is important to have not only an accurate but also an interpretable model. Oftentimes, apart from wanting to know what our model’s house price prediction is, we also wonder why it is this high/low and which features are most important in determining the forecast. Another example might … Read more

Train neural networks using AMD GPUs and Keras

Train a neural network with Keras. In the last section of this tutorial, we will train a simple neural network on the MNIST dataset. We will first build a fully connected neural network. Fully connected neural network: Let’s create a new notebook by selecting Python3 from the upper-right menu in the Jupyter root directory. The upper-right menu in … Read more

Tikhonov regularization. An example other than L2.

Example: Exploiting the underlying structure of spectrometer data. Consider a machine learning model that uses spectrometer data as its input features. [Figure: Example of spectrometer readings of skin] A scientist starts the process of collecting data, and after a while she has, say, 10 datapoints with spectrometer readings from 400 to 700 nm with a spacing of … Read more

A Comprehensive Introduction to Different Types of Convolutions in Deep Learning

Towards intuitive understanding of convolutions through visualizations. If you’ve heard of different kinds of convolutions in Deep Learning (e.g. 2D / 3D / 1×1 / Transposed / Dilated (Atrous) / Spatially Separable / Depthwise Separable / Flattened / Grouped / Shuffled Grouped Convolution) and got confused about what they actually mean, this article is written for … Read more

Predicting Friendship

Who’s Most Likely Your Next Friend? Introduction: Friendship is extremely important. Having friends as a child can increase your chances of happiness as an adult (Holder and Coleman, 2009). At the same time, we aren’t taught a lot about friendship. Friendships form due to circumstance and chance and develop intuitively. But do they really? … Read more

Inference using EM algorithm

Learn in-depth about the magic of the EM algorithm and start training your own graphical models. [Figure: What happens when your inference algorithm is not good?] Introduction: The goal of this post is to explain a powerful algorithm in statistical analysis: the Expectation-Maximization (EM) algorithm. It is powerful in the sense that it has the … Read more

Test Time Augmentation (TTA) and how to perform it with Keras

What is Test Time Augmentation? Similar to what Data Augmentation does to the training set, the purpose of Test Time Augmentation is to perform random modifications to the test images. Thus, instead of showing the regular, “clean” images only once to the trained model, we will show it the augmented images several times. We … Read more
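The idea in the teaser above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the article's Keras code: `predict_with_tta`, `toy_predict`, `identity` and `horizontal_flip` are hypothetical names, the "model" is a stand-in function, images are nested lists, and the augmentations are deterministic (rather than random) so the result is reproducible. The per-class scores are averaged over the augmented copies, which is one common way to combine them.

```python
def predict_with_tta(predict, image, augmentations):
    """Average a model's class scores over several augmented copies of one image."""
    if not augmentations:
        raise ValueError("need at least one augmentation")
    # Show the model every augmented version of the same test image.
    all_scores = [predict(augment(image)) for augment in augmentations]
    n_classes = len(all_scores[0])
    # Combine the predictions by averaging each class score.
    return [
        sum(scores[c] for scores in all_scores) / len(all_scores)
        for c in range(n_classes)
    ]


def identity(image):
    """Return the clean, unmodified image."""
    return image


def horizontal_flip(image):
    """Mirror every row of a 2D image stored as a list of lists."""
    return [row[::-1] for row in image]


def toy_predict(image):
    """Hypothetical stand-in model: two scores based on the left column's mean."""
    left = sum(row[0] for row in image) / len(image)
    return [left, 1.0 - left]


image = [[0.0, 1.0], [0.0, 1.0]]
averaged = predict_with_tta(toy_predict, image, [identity, horizontal_flip])
```

With a real Keras model, `predict` would wrap the model's prediction call and the augmentations would typically be random transforms of the kind used during training.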

Deceiving Your Mind With Your Eyes

A few months ago, I was asked to assist in keeping an eye on a specific group of people on Twitter because there was a perception they might present a danger to others. I have a trigger point for people who do bad things so I figured I would do what I do best and … Read more

6 steps to understanding a heap with Python

Today I will explain the heap, which is one of the basic data structures. Famous search algorithms such as Dijkstra’s algorithm and A* use the heap. A* can appear in the Hidden Markov Model (HMM), which is often applied to time-series pattern recognition. Please note that this post isn’t about search algorithms. I’ll explain … Read more

Hands-On Big Data Streaming, Apache Spark at scale

Introduction: In a world where data is being generated at an extremely fast rate, analyzing the data correctly and providing useful and meaningful results at the right time can provide helpful solutions for many domains dealing with data products. This can be applied from Health Care and Finance to Media, Retail, Travel Services … Read more

The magic behind the perceptron network

Preface note: This story is part of a series I am creating about neural networks. This chapter is dedicated to a very simple type of neural network whose creation is attributed to Frank Rosenblatt after his research during the 50s and 60s. I am going to talk about the perceptron. The perceptron neuron: The perceptron … Read more

Pump up the Volumes: Data in Docker

Spices: Pushing the food metaphor running through these articles to the breaking point, let’s compare data in Docker to spices. Just as there are many spices in the world, there are many ways to save data with Docker. Quick FYI: this guide is current for Docker Engine Version 18.09.1 and API version 1.39. Data in … Read more

Predicting bankruptcy

Vikram Devatha & Devashish Dhiman. The economic meltdown of 2008 initiated a conversation about market sustainability and the tools that can be used to predict it. The need for better predictive models became apparent, in order to avoid such devastating events in the future. Bankruptcy of companies and enterprises affects the financial market at multiple … Read more

Training Regression Models

Training a Regression Model: deciding on a loss function as an evaluation metric for regression models. Assume that we have a training data set of employees’ happiness index and productivity, and plotting them yields the following graph. The graph shows that the underlying pattern of the training data is a linear relationship between the two variables. Hence, training … Read more

Creating a Smart Scale with Tensorflow

Xiaowen, Feb 10. Looking to get your feet wet with machine learning and build something useful at the same time? Here’s a fun weekend project to use TensorFlow to automatically read your weight from pictures of your scale and chart it over time. You’ll learn the basics of the TensorFlow Object Detection API and be … Read more

Modern Parallel and Distributed Python: A Quick Tutorial on Ray

Starting Ray: The ray.init() command starts all of the relevant Ray processes. On a cluster, this is the only line that needs to change (we need to pass in the cluster address). These processes include the following: A number of worker processes for executing Python functions in parallel (roughly one worker per CPU core). A … Read more

Linear Regression

Likelihood Function: To apply maximum likelihood, we first need to derive the likelihood function. First, let’s rewrite our model from above as a single conditional distribution given x. Given x, y is drawn from a Gaussian centered on our line. This is equivalent to pushing our x through the equation of the line and then adding … Read more
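As a hedged illustration of the statement that, given x, y is drawn from a Gaussian centered on the line, the conditional distribution and the resulting likelihood can be written as below. The symbols are assumptions, since the excerpt does not show the article's notation: $w$ and $b$ stand for the slope and intercept of the line, $\sigma^2$ for the noise variance, and $(x_i, y_i)$, $i = 1, \dots, n$, for independent observations.

```latex
% Conditional distribution of y given x: a Gaussian centered on the line w x + b
p(y \mid x; w, b, \sigma^2)
  = \mathcal{N}\!\left(y \mid w x + b,\; \sigma^2\right)
  = \frac{1}{\sqrt{2\pi\sigma^2}}
    \exp\!\left(-\frac{(y - w x - b)^2}{2\sigma^2}\right)

% Likelihood of n independent observations
L(w, b, \sigma^2) = \prod_{i=1}^{n} p(y_i \mid x_i; w, b, \sigma^2)

% Log-likelihood: maximizing it over w and b minimizes the sum of squared residuals
\log L(w, b, \sigma^2)
  = -\frac{n}{2}\log\!\left(2\pi\sigma^2\right)
    - \frac{1}{2\sigma^2}\sum_{i=1}^{n} (y_i - w x_i - b)^2
```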

Finding Familiar Faces with a Tensorflow Object Detector, Pytorch Feature Extractor, and Spotify’s…

In a few different posts I have put together facial recognition pipelines, or something similar, using object detectors and something like a Siamese network for feature extraction to find similar images, but I haven’t really dug into how to use those feature vectors in a more practical, larger-scale way. What I mean here is … Read more

Transform Grayscale Images to RGB Using Python’s Matplotlib

Learn about image data structures while adding two dimensions for computer vision & deep learning pipelines. [Figure: R, G, & B; Arabic numeral ‘3’] Data pre-processing is critical for computer vision applications, and properly converting grayscale images to the RGB format expected by current deep learning frameworks is an essential technique. What does that mean? Understanding Color Image … Read more

Introduction to Augmented Random Search.

A way to make MuJoCo learning fast and fun. This article is based on a paper published in March 2018 by Horia Mania, Aurelia Guy, and Benjamin Recht from the University of California, Berkeley. The authors assert that they have built an algorithm that is at least 15 times more efficient … Read more

Just how good is Novak Djokovic?

A Data Analyst-Tennis Player breakdown of the events that transpired at the Australian Open Finals 2019. This year’s Australian Open final was a rematch of the 2012 final between Novak Djokovic and Rafael Nadal. The match from 2012 had all the ingredients to be considered one of the greatest matches … Read more

Data is not the new oil

About the reality of working with data. If you work in data science or a related field, you have probably heard this quote before: “Data is the new oil.” The quote goes back to 2006 and is credited to mathematician Clive Humby, but has recently picked up more steam after the Economist published a 2017 report … Read more

Make your own Smart Home Security Camera

Approach: In layman’s terms, the PIR sensor will detect movement and the camera will capture frames, on which the Raspberry Pi will perform facial recognition and provide the final output. In more detail, I have connected the PIR sensor to the Raspberry Pi, and the webcam is facing the front door of my apartment. I am not using AWS or any … Read more

A Gentle Introduction to Graph Neural Network (Basics, DeepWalk, and GraphSage)

Recently, Graph Neural Networks (GNNs) have gained increasing popularity in various domains, including social networks, knowledge graphs, recommender systems, and even life science. The power of GNNs in modeling the dependencies between nodes in a graph enables breakthroughs in research areas related to graph analysis. This article aims to introduce the basics of … Read more

Statistics is the Grammar of Data Science — Part 4/5

Covariance: Covariance is a measure of association between two (or more) random variables. As the name ‘co + variance’ implies, it is like the variance, but applied to a comparison of two variables: in place of the sum of squares, we have a sum of cross-products. While Variance tells us how a single variable varies from the … Read more
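The "sum of cross-products" description can be made concrete with a small sketch in plain Python. The function names `sample_covariance` and `sample_variance` are illustrative, and dividing by n - 1 (the sample estimator) is an assumption, since the excerpt does not say which normalization the article uses.

```python
def sample_covariance(xs, ys):
    """Sample covariance: sum of cross-products of deviations, divided by n - 1."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Cross-products: each x deviation multiplied by the matching y deviation.
    cross_products = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cross_products / (n - 1)


def sample_variance(xs):
    """Variance is the covariance of a variable with itself (a sum of squares)."""
    return sample_covariance(xs, xs)


heights = [1.0, 2.0, 3.0, 4.0]
weights = [2.0, 4.0, 6.0, 8.0]
cov_hw = sample_covariance(heights, weights)
var_h = sample_variance(heights)
```

Calling `sample_covariance` with the same list twice recovers the variance, which mirrors the "like the variance, but applied to two variables" reading.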

2 Questions for a Junior Data Scientist

Hiring a data scientist is, in my opinion, generally a difficult process. There are a lot of people coming from widely different backgrounds, levels of academic degrees and experience. The profile requirements for a “data scientist” differ a lot from company to company. In addition to these, we see more and more … Read more

Brief on Recommender Systems

Different types of recommendation methods used in industry. Nowadays, people buy products online more than from stores. Previously, people bought products based on reviews from relatives or friends, but now that the options have increased and we can buy anything digitally, we need to assure people that the product is … Read more

Presidential Charisma: Who Should You Vote for?

Be it a presidential candidate or TV show host, people with charisma get their message across, communicate better with others and gain their trust. But for a president, does charisma reveal anything about how good a president they’ll likely be? When you decide to vote for a candidate, do you vote for the … Read more

Reflections on the State of AI: 2018

Today, with hundreds of companies deeply engaged in the AI space, and even more working to figure out their strategy as related to the field, it might be a bit hard to pinpoint specific players that are best positioned to lead the way in the future. Still, if we look at any of the many … Read more

The Other Type of Machine Learning

A brief introduction to Reinforcement Learning. This is the first in a series of articles on reinforcement learning and OpenAI Gym. Introduction: Suppose you’re playing a video game. You enter a room with two doors. Behind Door 1 are 100 gold coins, followed by a passageway. Behind Door 2 is 1 gold coin, followed by … Read more

The power of Brain-Computer interface: use your brain to play your video game

Here we show one application of machine learning and signal processing in Neuroscience: translating thoughts into actions with our Game-Based Brain-Computer interface (BCI). Video-Game based BCI. The gaming part increases user engagement and makes it easier to acquire the new skill of controlling the BCI device. BCI enables direct control of brain activity over external … Read more

What A.I. Isn’t

A lazy leap into the future. It isn’t intuitive, creative, inspired, generalized, or conscious. Will it ever be like us? Will it ever think like us? As I study data science I learn a little more about artificial intelligence each day. I practice wielding the tools in my machine learning toolbox, and I read … Read more

PySpark in Google Colab

Creating a simple linear regression model with PySpark in Colab. With broadening sources of data, the topic of Big Data has received an increasing amount of attention in the past few years. Besides dealing with gigantic data of all kinds and shapes, the target turnaround time of the … Read more