My first contribution to Data Science - A Convolutional Neural Network that recognizes images of…

Epoch 1/25
1000/1000 [==============================] - 913s 913ms/step - loss: 0.3476 - acc: 0.8502 - val_loss: 2.2280 - val_acc: 0.5000
Epoch 2/25
1000/1000 [==============================] - 907s 907ms/step - loss: 0.1354 - acc: 0.9564 - val_loss: 0.5738 - val_acc: 0.8629
Epoch 3/25
1000/1000 [==============================] - 904s 904ms/step - loss: 0.0675 - acc: 0.9825 - val_loss: 0.6880 - val_acc: 0.8710
Epoch 4/25
1000/1000 [==============================] - …

Avoiding the “Automatic Hand-off” Syndrome in Data Science Products

Simple evolution guidelines for data science products Evolving data science products for new teams can be a daunting task. There are conflicting requirements embedded in the nature of data science products. First constraint: product teams want to move proofs of concept to market as fast as possible. Second constraint: data science (DS) teams need a …

Bringing big data to the science of community: Minecraft Edition

Looking at today’s Internet, it is easy to wonder: whatever happened to the dream that it would be good for democracy? Well, looking past the scandals of big social media and scary plays of autocracy’s hackers, I think there’s still room for hope. The web remains full of small experiments in self-governance. It’s still …

How to Get Started as a Developer in AI

Dream about a job connected to AI? This guide is your must-read. Artificial Intelligence. Well, it looks like this cutting-edge technology is now the most popular and at the same time the most decisive one for humanity. We are ceaselessly amazed at the AI capabilities and the effective way they can be used in almost …

Working as a Data Scientist in Blockchain Startup

How did I get started working with data Even before I started working at ICORating, in my previous job as a blockchain developer, I had to make a service to build graphs based on data in the blockchain and on data from exchanges. It all sounds pretty simple, but the problem starts exactly at …

Overall Equipment Effectiveness and Topic Modeling

Suppose we are given a 6-day operational log file from a particular piece of equipment. The equipment is used for 10 hours a day. Our objective is to find out whether there are days with issues. The only problem is that the log status is written in a foreign language (let’s say it happens that the …

How to efficiently propagate activations in a massive neural network

An event-driven approach In traditional neural networks using the sigmoid activation function, all neurons are more or less activated. There is no clear case of an inactive neuron here. That can be problematic if you want to compute extremely large networks, because in each round you’d have to update all the neurons. Intuitively, it would …

Life of a model after deployment

In order to monitor a feature, we need to compute a single metric that will compare the training and inference distributions. There are many different similarity metrics to choose from but few that can be applied to both categorical and numerical variables. Computing similarity metrics The Wasserstein distance [1] is a similarity measure that can …
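As a rough illustration of the single-metric idea in this teaser, here is a minimal sketch of the empirical 1-D Wasserstein distance between a training sample and an inference sample. It covers numerical features only; the function name `wasserstein_1d` and the sample values are assumptions made for illustration and do not come from the article.

```python
from bisect import bisect_right


def wasserstein_1d(train, inference):
    """Empirical 1-Wasserstein distance between two 1-D numeric samples.

    Computed as the area between the two empirical CDFs, so the
    samples may have different sizes.
    """
    a, b = sorted(train), sorted(inference)
    if not a or not b:
        raise ValueError("both samples must be non-empty")

    # The empirical CDFs only change at observed values, so it is
    # enough to integrate |F_a - F_b| piecewise between those points.
    points = sorted(set(a) | set(b))
    distance = 0.0
    for left, right in zip(points, points[1:]):
        cdf_a = bisect_right(a, left) / len(a)
        cdf_b = bisect_right(b, left) / len(b)
        distance += abs(cdf_a - cdf_b) * (right - left)
    return distance


# Hypothetical feature values: the inference sample is shifted by 5.
training_values = [0.0, 1.0, 3.0]
inference_values = [5.0, 6.0, 8.0]
drift = wasserstein_1d(training_values, inference_values)
```

A value of zero means the two samples have the same empirical distribution; larger values indicate that the feature has drifted further from what the model saw in training.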

How do I get Someone to Work on my Research?

You don’t. Photo by Jamie Street Collaborations are at the heart of what we do as researchers, and success in one’s field often has as much to do with forging effective collaborations as with coming up with innovative ideas. Yet collaborations are such fragile things, it’s sometimes a wonder that they can be sustained at all in …

Approaches to sentimental analysis on a small imbalanced dataset without Deep Learning

Let’s make logreg great again! Nowadays there are a lot of pre-trained nets for NLP which are SOTA and beat all benchmarks: BERT, XLNet, RoBERTa, ERNIE… They are successfully applied to various datasets even when there is little data available. At the end of July (23.07.2019–28.07.2019) there was a small online hackathon on Analytics Vidhya …

One Class Learning in Manufacturing: Autoencoder and Golden Units Baselining

Recently I’ve been working with manufacturing customers (both OEM and CM) who want to jump on the bandwagon of machine learning. One common use case is to better detect products (or Device Under Test/DUT) that are defective in their production line. Using machine learning’s terminology, this falls under the problem of binary classification as a …

K-Means Clustering for Unsupervised Machine Learning

The Beginner’s Guide to Unsupervised Learning Artificial Intelligence (AI) and Machine Learning (ML) have revolutionized every aspect of our life and disrupted how we do business, unlike any other technology in the history of mankind. Such disruption brings many challenges for professionals and businesses. In this article, I will provide an introduction to one …

How to create data-driven presentations with jupyter notebooks, reveal.js,

… in which I discuss a workflow where you can start writing your contents in a Jupyter notebook, create a reveal.js slide deck, and host it on GitHub for presentations. This is for a very simple presentation that you can fully control yourself. A first simple slide deck Part I: Basic slide deck Part II: Basic …

Stochastic Processes Analysis

An introduction to Stochastic processes and how they are applied every day in Data Science and Machine Learning. “The only simple truth is that there is nothing simple in this complex universe. Everything relates. Everything connects” — Johnny Rich, The Human Script One of the main applications of Machine Learning is modelling stochastic processes. Some …

Limericking part 3: text summarization

The Limericking Project Identifying key summary sentences from text Welcome to part 3 in my ongoing Limericking series, where I explore the great potential of Natural Language Processing to parse news text and write poetry. As I explained in part 1 of the series, I am an avid fan of the Twitter account Limericking which …

Visualizing Different NFL Player Styles

CC by 2.0 Different players have different strengths and weaknesses — is there a way to visualize them? Back in the early 2000s, the New York Giants had an exciting running back duo. Tiki Barber (“Lightning”) went for about 1000 yards rushing and 550 yards receiving a year. That’s impressive on its own, but even …

Natural Language Processing and Sports Subreddits

One of my main interests during my modeling process was to determine which words carried greater weight when predicting the subreddit of origin, rather than simply counting certain predictive words. So while I did utilize a Count Vectorization model, I found that the weighted vectorization produced with Scikit-Learn’s TF-IDF functionality, TfidfVectorizer, …
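To make the difference between raw counts and weighted terms concrete, here is a minimal pure-Python sketch. It uses the plain textbook weighting tf × log(N / df), which is an assumption made for illustration: Scikit-Learn’s TfidfVectorizer applies a smoothed idf and normalizes each row by default, so its numbers will differ. The two example posts and the function names are hypothetical.

```python
import math
from collections import Counter


def count_vectors(docs):
    """Raw term counts per document (whitespace tokenization, lowercased)."""
    return [Counter(doc.lower().split()) for doc in docs]


def tfidf_vectors(docs):
    """Textbook TF-IDF: term count times log(N / document frequency)."""
    counts = count_vectors(docs)
    n_docs = len(docs)
    # Iterating a Counter yields each distinct term once per document,
    # so this tallies in how many documents each term appears.
    doc_freq = Counter(term for doc_counts in counts for term in doc_counts)
    return [
        {term: tf * math.log(n_docs / doc_freq[term])
         for term, tf in doc_counts.items()}
        for doc_counts in counts
    ]


# Hypothetical posts from two sports subreddits.
posts = ["the giants won the game", "the lakers won tonight"]
counts = count_vectors(posts)
weights = tfidf_vectors(posts)
```

In this toy corpus "the" has the highest raw count in the first post, yet its weight is zero because it appears in every post, while "giants" keeps a positive weight because it appears in only one.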

RAPIDS cuGraph — The vision and journey to version 1.0 and beyond

The vision of RAPIDS cuGraph is to make graph analysis ubiquitous to the point that users just think in terms of analysis and not technologies or frameworks. This is a goal that many of us on the cuGraph team have been working on for almost twenty years. Many of the early attempts focused on solving …

Intro to Reading and Writing Spreadsheets with Python

First we are going to check if Python is installed and install another library that will help us deal with spreadsheets. A library is a collection of code that has implemented (usually) hard things to do in a simpler way. We need to first open up the Terminal which will let us interact with our …

Understanding backpropagation algorithm

Learn the nuts and bolts of a neural network’s most important ingredient This article is inspired by my frustration over the inability to find a simple and concise explanation of backpropagation which includes the necessary mathematics and covers the essentials. So I decided to write it here. Enjoy! The backpropagation algorithm is probably the most fundamental …

What are Progressive Neural Networks?

“TELL ME AND I FORGET. TEACH ME AND I REMEMBER. INVOLVE ME AND I LEARN.” –BENJAMIN FRANKLIN Life is a journey through learning experiences. We are continuously learning new tasks and acquiring new knowledge, and we have a magical and poorly understood ability to leverage previous experiences to optimize how we build new knowledge. …

The Treasures of Python’s built in Libraries

Discover the treasures of Python! Python is a beautiful language. Simple to use yet powerfully expressive. But are you using everything that it has to offer? Every experienced developer knows that knowing the hidden treasures of their programming language of choice helps them get around many common bugs and …

Understanding Multiple Regression

The fundamental basis behind this commonly used algorithm Linear regression, while a useful tool, has significant limits. As its name implies, it can’t easily match any data set that is non-linear. It can only be used to make predictions that fit within the range of the training data set. And, most importantly for this article, …

4 Product-Driven Steps to an AI Roadmap

How to teach products to make decisions What’s so transformative about AI, anyway? Artificial Intelligence (AI) is regularly breaking new ground, from DeepMind’s AlphaGo Zero teaching itself to play Go and beating human champions to text-generating algorithms so powerful that their creators at OpenAI decided not to release them publicly for fear of malicious use. …

How to manage impostor syndrome in data science

What if they find out you’re clueless? Impostor syndrome is the elephant in the data science lab. Everyone has it, no one thinks other people have it, and no one talks about it. I’m amazed that more people don’t discuss it openly. I work at a data science mentorship startup where I probably spend 20% …

Hybrid Intelligence

Machines as Creative Partners https://www.nytimes.com/2016/05/07/arts/design/harold-cohen-a-pioneer-of-computer-generated-art-dies-at-87.html Humans are natural makers; we enjoy the freedom of making things. However, in the context of automation, machines challenge humans’ role in fabrication. Automation wastes humans’ unique skills and disconnects people from real-world materials. To address this issue, researchers proposed the hybrid workflow, which starts from studying …

How to Meaningfully Play With Data

[image of a ballpit with some slides and other playground-like things; bright colors and whimsy] Ballpits are probably a public health crisis, but I think they get across the magical feeling of “playtime” quite well. Let’s talk about play! I’ve held some interesting jobs since graduating from college. While working as a cognitive science …

Variable selection using LASSO

This is a Lasso; it is used to pick and capture animals. As a non-native English speaker, my first exposure to this word was in supervised learning. In this LASSO data science tutorial, we discuss the strengths of Lasso logistic regression by stepping through how to apply this useful statistical method to classification problems …

Freeing the data scientist mind from the curse of vectoRization

Julia to the rescue! Photo by Debby Hudson on Unsplash Nowadays, most data scientists use either Python or R as their main programming language. That was also my case until I met Julia earlier this year. Julia promises performance comparable to statically typed compiled languages (like C) while keeping the rapid development features of interpreted …

The King of Serving: Tennis Web Scraping with Selenium

The Power of Selenium, Excel and Pandas Josh Calabrese via unsplash Serving in Tennis often determines the outcome of the match. A break of serve or a hold of serve at a key moment in a set can ultimately decide whether silverware beckons. Through the combination of measured web scraping with appropriately placed explicit waits, …

Introduction to Amazon Lambda, Layers and boto3 using Python3

A serverless approach for Data Scientists Photo by Daniel Eledut on Unsplash Amazon Lambda is probably the most famous serverless service available today, offering low cost and requiring practically no cloud infrastructure governance. It offers a relatively simple and straightforward platform for implementing functions in different languages like Python, Node.js, Java, C# and many more. …

Visual Product Search for Smart Retail Checkout

Doing cool things with data! Introduction Artificial Intelligence is transforming several sectors of the economy such as automotive, marketing and healthcare. Retail could be next. The essential retail experience of shopping in store has remained unchanged for decades. AI could radically transform this experience by making it cost-effective to deliver a completely personalized, immersive and …

Believe Me When I Say To You, I Hope You Love a Trade War Too!

How can I save my White House joy, from 2020’s deadly toy? A trade war is good politics. I wrote in January, May, and June about a global battle being waged between Freedom and Autocracy. The President, despite being attacked for his own supposed autocratic instincts, has successfully framed himself at the center of this …

How a Computerized Chess Opponent “Thinks” — The Minimax Algorithm

In 1997, a computer named “Deep Blue” defeated reigning world chess champion Garry Kasparov — a defining moment in the history of AI theory. But the great minds behind the chess computer problem had started publishing on the subject nearly 6 decades earlier. Known as the father of modern computer science, Alan Turing is credited …

6 amateur mistakes I’ve made working with train-test splits

In the last weeks we went together on a journey about Recommendation Systems. We saw a gentle introduction to the topic and also an introduction to the most important similarity measures around it (remember that the whole repository about recommendation systems and other projects is always available on my GitHub profile). And yes, I know, …

5 Resources Every Data Scientist (and Programmer) Should Use

Photo by Bernd Klutsch on Unsplash One of the first resources I like to look through as I’m learning a new library, a new function within a library, a new programming language, etc. is the documentation for that specific library / function / language. In many cases, the documentation has been meticulously curated to provide …

Why we all have to start being nicer to each other online.

Artificial Intelligence Turn your online presence from a negative to a positive environment to protect us from AI and ourselves. Online presence photo from Pexels Artificial Intelligence has been generating a lot of press in recent times, and rightly so. Data is now the most valuable resource in the world and Artificial Intelligence (AI) can utilise …

Experimentation in Data Science

When AB testing doesn’t cut it Today I am going to talk about experimentation in data science, why it is so important and some of the different techniques that we might consider using when AB testing is not appropriate. Experiments are designed to identify causal relationships between variables and this is a really important concept …

Online Machine Learning with Tensorflow.js

An end to end guide on how to create, train and test a Machine Learning model in your browser using Tensorflow.js. Thanks to recent advancements in Artificial Intelligence, it is now becoming relatively easy to build and train Machine Learning models. However, these models can only benefit society if we share them and make them ready …

5 Bad Habits of Absolutely Ineffective Programmers.

Dick Brandon hit the nail on the head when he observed: “Documentation is like sex; when it’s good, it’s very, very good, and when it’s bad, it’s better than nothing.” Documentation is the castor oil of programming. Managers think it is good for programmers and programmers love to hate it! But that said, great developers, …

Feature selection using Python for classification problem

Including more features makes a model more complex, and the model may overfit the data. Some features are just noise and can damage the model. By removing those unimportant features, the model may generalize better. The Sklearn website lists different feature selection methods. This article is mainly based on the …