30 Helpful Python Snippets That You Can Learn in 30 Seconds or Less

Python is one of the most popular languages, used in data science and machine learning, web development, scripting, automation, and more. Part of the reason for this popularity is its simplicity and how easy it is to learn. If you are reading this, then it is highly likely …

Convolutional Neural Network for Breast Cancer Classification

Deep Learning for solving the most commonly diagnosed cancer in women. Breast cancer is the second most common cancer in women and men worldwide. In 2012, it represented about 12 percent of all new cancer cases and 25 percent of all cancers in women. Breast cancer starts when cells …

Modelling Efficient Military Deployments with Machine Learning — K-Means Clustering in R

Efficiently Deploying Naval Resources. Armed forces in Latin America & the Caribbean are faced with the challenge of having to operate with a multi-dimensional mandate. In times of heightened civil unrest they are required to undertake peace-keeping operations, gang warfare driven by the …

Identifying The Most Dog-Friendly Neighborhoods in Manhattan, New York City

Whether you are a dog owner or a business in the dog care industry, identifying locations that are friendly to dogs in the city is always an important mission. Among various metropolises across the world, New York City (“NYC”) is perceived as one of the …

Prior over functions: Gaussian process

In this post we discuss how Gaussian processes work. Gaussian processes fall under kernel methods and are model-free. They are especially useful in low-data regimes for “learning” complex functions. We shall review a very practical real-world application (not related to deep learning or neural networks). The discussion follows from the talks …

Building simple data pipelines in Azure using Cosmos DB, Databricks and Blob Storage

Thanks to tools like Azure Databricks, we can build simple data pipelines in the cloud and use Spark to get some comprehensive insights into our data with relative ease. Combining this with the Apache Spark connector for Cosmos DB, we can leverage the power of Azure Cosmos DB to gain and store some incredible insights …

Exploring Best-Selling Books of the Last Five Years

Every week, the New York Times (NYT) publishes a list of the top ten highest-selling novels in each of the fiction and non-fiction categories. As an avid reader, I wanted to determine whether these lists might offer suggestions for future reads (i.e. whether they might cater more strongly to a particular sub-genre) and to finally …

Clustering metrics better than the elbow-method

We show which metric to use for visualizing and determining the optimal number of clusters, one that works much better than the usual practice, the elbow method. Clustering is an important part of the machine learning pipeline for business or scientific enterprises utilizing data science. As the name suggests, it helps to identify congregations of closely related (by …

What Distinguishes Us from AI?

Breakthroughs in artificial intelligence complicate the belief that our intellect, even our creativity, will remain unrivaled. The concept of human distinctiveness is central to our self-understanding; however, time and again, what was believed to be our distinguishing feature proved no more than a mirage. For millennia, humans have asserted themselves as superior to other living forms. From …

Advanced Methods for Automatic Image Quality Assessment

Blind Image Quality Assessment Based on High Order Statistics Aggregation (HOSA). The HOSA methodology is a hybrid algorithm that takes advantage of an unsupervised learning stage that detects similar patches in a set of distorted images. This step is called codebook construction. Then, a second step uses a training dataset to find the similarities between …

TensorFlow.JS — Using JavaScript Web Worker to Run ML Predict Function

This post is about machine learning on the client side. I will explain how to run an ML model in a JavaScript Web Worker. The model was trained in TensorFlow/Keras (using Python) to detect sentiment for a hotel review. I’m a JavaScript developer and I feel great when the machine learning model runs on the client side (in the browser). I will …

The Intuition for the Poisson Distribution Formula

Re-interpreting the Binomial distribution in the time domain. As you progress through your apple testing career, you deduce another fact. On average, you seem to be running into λ bad apples during your hour-long apple testing sessions. Since these λ bad apples are assumed to be uniformly distributed across the 60 minutes …
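
The re-interpretation hinted at above can be checked numerically. The sketch below is only an illustration under stated assumptions: the rate `lam = 3.0`, the count `k = 2`, and the helper names `binomial_pmf` and `poisson_pmf` are hypothetical and not taken from the article. It splits the hour into `n` ever finer time slots, each with probability `lam / n` of containing a bad apple, and compares the Binomial probability with the Poisson formula.

```python
import math

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """Probability of exactly k events when the average count is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

lam = 3.0  # assumed average number of bad apples per hour (illustrative)
k = 2      # number of bad apples we ask about (illustrative)

# Slice the hour into minutes, seconds, then sixtieths of a second.
for n in (60, 3600, 216000):
    print(f"n={n:>6}: binomial = {binomial_pmf(k, n, lam / n):.6f}")

print(f"Poisson limit:  {poisson_pmf(k, lam):.6f}")
```

As the slots get finer, the Binomial values should approach the Poisson value, which is the intuition the formula encodes.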

Futurama: Bender’s NLP

The Data and Exploratory Data Analysis. I scraped the scripts from the website TheInfosphere.org using BeautifulSoup, where fans of the show have organized transcripts from the first six seasons of the show. In total, there were about 21,000 lines of script. The three main characters shared the bulk of the lines, at around 3500–4000 apiece, …

Attention Networks

Types of Attention Models: Global and Local attention (local-m, local-p); Hard and Soft attention; Self-attention. Global Attention Model: This is the same attention model as discussed above. Input from every source state (encoder) and the decoder states prior to the current state is taken into account to compute the output. Below is the diagram for the global attention …

MDLI Report: The Israeli Machine Learning review 2019

A full report on the Israeli Machine Learning landscape, including average salaries, demographics, most-used libraries, and more. This year, just like the last, we circulated a comprehensive survey among Machine & Deep Learning Israel community members. The survey, aimed at data scientists and adjacent roles, gauges professionals’ work conditions, commonly faced challenges, popular developer …

Bar Chart Race in Python with Matplotlib

In fewer than roughly 50 lines of code. Bar chart races have been around for a while. This year, they took social media by storm. It began with Matt Navarra’s tweet, which was viewed 10 million times. Then, John Burn-Murdoch created a reproducible notebook …

Machine Learning Marketing — 10 Applications for Growing Your Business

The relationship between machine learning and marketing has been flourishing in recent years, giving birth to a new set of strategies and tools that optimize the process. The modern marketer has no choice but to jump on the bandwagon in order to stay competitive and maintain a desirable …

How important is DATA for your business?

The human body has five sensory organs, each of which transmits and receives information from every interaction, every second. Today, scientists can determine how much information a human brain receives, and guess what: humans receive 10 million bits of information in one second. It is similar for a computer when it downloads a document from the web over …

Python List Comprehension in 3 Minutes and 3 Reasons why you should use it

Let’s create our own animal park to learn how to use list comprehension. List comprehension is a powerful method to create new lists from existing lists. If you are starting out with Python, list comprehension might look complicated, but you should get familiar with it as fast as you can. You can select specific elements from a …
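
As a minimal sketch of the idea, assuming a hypothetical list of animals (the names below are illustrative and not taken from the article), a list comprehension can both transform and filter an existing list in a single expression:

```python
# Hypothetical animal park; the names are illustrative only.
animals = ["lion", "zebra", "dolphin", "monkey", "alligator"]

# Build a new list from an existing one: upper-case every name.
shouting = [name.upper() for name in animals]

# Select specific elements: keep only names longer than five characters.
long_names = [name for name in animals if len(name) > 5]

print(shouting)
print(long_names)
```

The same results could be produced with a `for` loop and `append`, but the comprehension keeps the source list, the transformation, and the filter condition on one line.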

Python Data Science & Analytics / Consulting Project Overview

Python is the most dynamic language for data science today, spanning everything from backend development to in-depth machine learning and statistical analysis. It is intuitive, flexible, and, perhaps most importantly, widely supported from an open-source perspective. It is fairly easy to learn, and incredibly powerful. It is rivaled perhaps only by R in terms of analytics and …

Basic data structures of xarray

Okay, let’s see some code!
    # customary imports
    import numpy as np
    import pandas as pd
    import xarray as xr
First, we’ll create some toy temperature data to play with. We generated an array of random temperature values, along with arrays for the coordinates latitude and longitude (2 dimensions). First, let’s see how we can represent this data …

Visualizing Climate Change

Plotly and Dash for creating interactive visual dashboards. A couple of months ago, I wrote about dealing with climate change denial as something of an applied data science problem: collecting data, visualizing trends and performing statistical tests to illustrate the extent to which the effects of climate change were …

Bootstrapping for Inferential Statistics

In the real world, we don’t really know our true population. It could be the entire population of the planet, or the past, present, and future transactions of a company. We just don’t know the real value of the parameter, so we rely on sampling distributions to infer something about the parameter for these large …
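
To make the idea concrete, here is a minimal bootstrap sketch using only the standard library. The sample values, the helper names `bootstrap_means` and `percentile_interval`, and the choice of 2,000 resamples are all assumptions made for illustration, not details from the article. It resamples the observed data with replacement to approximate the sampling distribution of the mean, then reads off a percentile interval.

```python
import random
import statistics

def bootstrap_means(sample, n_resamples=2000, seed=0):
    """Means of resamples drawn with replacement from the observed sample."""
    rng = random.Random(seed)
    n = len(sample)
    return [statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_resamples)]

def percentile_interval(values, confidence=0.95):
    """Simple percentile interval taken from the sorted bootstrap statistics."""
    ordered = sorted(values)
    alpha = (1 - confidence) / 2
    lower = ordered[int(alpha * len(ordered))]
    upper = ordered[int((1 - alpha) * len(ordered)) - 1]
    return lower, upper

# Hypothetical observations (illustrative only).
sample = [12.1, 9.8, 11.4, 10.3, 13.7, 8.9, 10.9, 12.5, 11.0, 9.5]

means = bootstrap_means(sample)
low, high = percentile_interval(means)
print(f"sample mean = {statistics.fmean(sample):.2f}")
print(f"approximate 95% interval for the mean: ({low:.2f}, {high:.2f})")
```

The percentile interval is only one of several ways to build a bootstrap interval; it is used here because it is the simplest to read.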

Delivering Business Impact with Analytics, Quickly

The data set deals with agency performance for a set of property and casualty insurance agencies. The data contains, among other things, a list of agencies over the period 2005–2015, their premiums by product, and losses incurred by product. To have a business impact, it’s important to understand the context of the data and the …

CelebA Attribute Prediction and Clustering with Keras

The model we are going to build is heavily based on the MobileNetV2 architecture; it is basically the same model but without the top classification layers (MobileNetV2 is built to output 1000 class probabilities). First, the model takes one image (3 channels, 224×224 in size) at a time as input, and outputs a vector of probabilities …

Can Monte Carlo Simulations Dispel the ‘Difficult Third Album’?

Monte Carlo simulations can serve as a non-parametric statistical test that tells us whether two samples come from the same population (within a given degree of certainty, typically 95%). In other words, they can tell us if the difference between the means of two samples is statistically significant. In this sense, they have a similar function …
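
One common way to run such a simulation is a permutation test, sketched below. This is an illustration under stated assumptions, not necessarily the procedure the article uses: the album scores are made up, and the function name `permutation_p_value` and the 5,000 simulations are hypothetical choices. The idea is to shuffle the pooled data many times and count how often a random split produces a difference in means at least as large as the observed one.

```python
import random
import statistics

def permutation_p_value(a, b, n_simulations=5000, seed=0):
    """Share of random relabelings whose mean difference is at least the observed one."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(a) - statistics.fmean(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_simulations):
        rng.shuffle(pooled)  # randomly reassign values to the two groups
        group_a = pooled[: len(a)]
        group_b = pooled[len(a):]
        diff = abs(statistics.fmean(group_a) - statistics.fmean(group_b))
        if diff >= observed:
            extreme += 1
    return extreme / n_simulations

# Hypothetical review scores (illustrative only).
first_albums = [8.1, 7.9, 8.4, 7.6, 8.8, 8.0, 7.7, 8.3]
third_albums = [6.2, 6.9, 5.8, 6.5, 7.0, 6.1, 6.6, 6.4]

p_value = permutation_p_value(first_albums, third_albums)
print(f"estimated p-value: {p_value:.4f}")
print("significant at the 95% level" if p_value < 0.05 else "not significant")
```

A small estimated p-value suggests the two samples are unlikely to come from the same population; the 0.05 threshold corresponds to the 95% level mentioned above.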

How to Create an Interactive Geographic Map Using Python and Bokeh

Interactive Data Visualization with Choropleth Maps. If you are looking for a powerful way to visualize geographic data then you should learn to use interactive Choropleth maps. A Choropleth map represents statistical data through various shading patterns or symbols on predetermined geographic areas such as countries, states or counties. Static Choropleth maps are useful for …

3 Questions to Ask for Jr. Data Scientist Initial Interviews

1. A.I. vs. Machine Learning. Question: “Imagine you need a solution that will navigate you from where you are to the airport. Which method would you use?” More information: Make sure the applicant doesn’t perceive this as a self-driving car problem or something complex. This question is not about …

What is the best method for Automatic Text Classification?

I decided to perform this comparison “experiment” using a corpus of texts in Italian; in my professional activity I deal almost exclusively with texts in Italian, and, considering that it is often easier to access Natural Language Processing (NLP) tools for English, it is important for me to make this comparison on an Italian corpus, …

Drawing Architecture: Building Deep Convolutional GANs in PyTorch

What is a GAN: Generative adversarial networks are two neural networks competing against each other to create an output that closely resembles the input. These two networks, the generator and the discriminator, are playing adversarial roles. The generator network creates a new image from random noise based on the input image. The random …

Avoiding the vanishing gradients problem using gradient noise addition

Neural networks are computational models used to approximate a function that models the relationship between the dataset features x and labels y, i.e. f(x) ≈ y. A neural net achieves this by learning the best parameters θ such that the difference between the prediction f(x; θ) and the label y is minimal. They typically learn …

The 5 Most Important Logs An Application Should Write

A breakdown of the types of data used to drive intelligent businesses. The usefulness of logs is often underestimated. Most businesses rely on logs solely for the purpose of troubleshooting operational and availability problems. What many people fail to realize is that proactive logging also enables improved business decisions. Business intelligence is directly fueled by …

The best tool for better Recommendation Systems

In a previous article introducing Recommendation Systems, we saw that the tool has evolved enormously in the last year, emerging as a way to keep a website’s or application’s audience engaged and using its services. Usually, Recommendation Systems use our previous activity to make specific recommendations for us (this is known as Content-based Filtering). Now, …

Merging with AI: How to Make a Brain-Computer Interface to Communicate with Google using Keras and…

Once you have enough data, it’s time to prepare it for use in machine learning. Combine and preprocess your data as appropriate so that it has the following format: [Example data table]. Words are indices from 1 to NumIntervals, which is the sum of SessionDuration/2 over the total number of sessions. Terms correspond to the …

Gartner 2019 Hype Cycle for Emerging Technologies. What’s in it for AI leaders?

https://www.gartner.com/smarterwithgartner/5-trends-appear-on-the-gartner-hype-cycle-for-emerging-technologies-2019/ Gartner’s 2019 Hype Cycle for Emerging Technologies is out, so it is a good moment to take a deep look at the report and reflect on our AI strategy as a company. You can find a brief summary of the complete report here. First of all, and before going into details about the content …

Turbocharging SVD with JAX

In the previous post, I wrote about the fundamentals of two commonly used dimensionality reduction approaches, singular value decomposition (SVD) and principal component analysis (PCA). I also explained their relationships using numpy. To quickly recap, the singular values (Σ) of a 0-centered matrix X (n samples × m features) equal the square root of its …

Getting Started with Natural Language Processing: US Airline Sentiment Analysis

Outline: Introduction to NLP; Dataset Exploration; NLP Processing; Training; Hyperparameter Optimization; Resources for Future Learning. Natural Language Processing (NLP) is a subfield of machine learning concerned with processing and analyzing natural language data, usually in the form of text or audio. Some common challenges within NLP include speech recognition, text generation, and sentiment analysis, while some …

Fine-grained Sentiment Analysis in Python (Part 2)

To understand the predictions, the file explainer.py is run for each of the six trained classifiers; this outputs an HTML file with visual content that helps us interpret the models’ feature understanding. From the EDA done in the previous post, we know that classes 1 and 3 are the minority classes in the SST-5 …

Artificial Neural Networks for Total Beginners

Easy and Clear Explanation of Neural Nets (with Pictures!). Machine Learning drives much of the technology we interact with nowadays, with applications in everything from search results on Google to ETA prediction on the road to tumor diagnosis. But despite its clear importance to our everyday life, most of us are left wondering how this …

Fine-grained Sentiment Analysis in Python (Part 1)

In this post, we’ll evaluate and compare the results of several text classification methods on the 5-class Stanford Sentiment Treebank (SST-5) dataset. “Learning to choose is hard. Learning to choose well is harder. And learning to choose well in a world of unlimited possibilities is harder still, perhaps too hard.” (Barry Schwartz) When starting …