Photo by Jantine Doornbos on Unsplash. Python is one of the most popular languages, used by many people in data science and machine learning, web development, scripting, automation, etc. Part of the reason for this popularity is its simplicity and how easy it is to learn. If you are reading this, then it is highly likely … Read more 30 Helpful Python Snippets That You Can Learn in 30 Seconds or Less
Deep Learning for solving the most commonly diagnosed cancer in women Photo by Tamara Bellis on Unsplash Breast cancer is the second most common cancer in women and men worldwide. In 2012, it represented about 12 percent of all new cancer cases and 25 percent of all cancers in women. Breast cancer starts when cells … Read more Convolutional Neural Network for Breast Cancer Classification
Efficiently Deploying Naval Resources (If using a smart-phone this article is best viewed in landscape mode) Armed forces in Latin America & the Caribbean are faced with the challenge of having to operate with a multi-dimensional mandate. In times of heightened civil unrest they are required to undertake peace-keeping operations, gang warfare driven by the … Read more Modelling Efficient Military Deployments with Machine Learning — K-Means Clustering in R
An illustration of dogs in Manhattan by Julia Rothman. Introduction: Whether you are a dog owner or a business in the dog care industry, identifying locations that are friendly to dogs in the city is always an important mission. Among the various metropolises across the world, New York City (“NYC”) is perceived as one of the … Read more Identifying The Most Dog-Friendly Neighborhoods in Manhattan, New York City
In this post we discuss how Gaussian processes work. Gaussian processes fall under kernel methods and are model-free. They are especially useful in low-data regimes for “learning” complex functions. We shall review a very practical real-world application (not related to deep learning or neural networks). The discussion follows from the talks … Read more Prior over functions: Gaussian process
Thanks to tools like Azure Databricks, we can build simple data pipelines in the cloud and use Spark to get some comprehensive insights into our data with relative ease. Combining this with the Apache Spark connector for Cosmos DB, we can leverage the power of Azure Cosmos DB to gain and store some incredible insights … Read more Building simple data pipelines in Azure using Cosmos DB, Databricks and Blob Storage
Inspiration: The guy who hit my car and got away with it! Backstory: After a memorable evening with friends, as we were about to leave for home, something made the evening even more memorable: a huge dent in my car’s front bumper. It seemed to have been hit by another vehicle, but who … Read more AI-based Indian license plate detector.
Every week, the New York Times (NYT) publishes a list of the top ten highest-selling novels in each of the fiction and non-fiction categories. As an avid reader, I wanted to determine whether these lists might offer suggestions for future reads (i.e. whether they might cater more strongly to a particular sub-genre) and to finally … Read more Exploring Best-Selling Books of the Last Five Years
Let’s begin the journey of simulating Blackjack by coding it all out in Python. We won’t be going over every single line of code for the sake of brevity but the Github for the code will be available at the end of this article. It’s very likely that something like this has been done before, … Read more Beating the Dealer with Simple Statistics
How modifying three social systems can lead to easier, greener living In discussing global warming, everyone talks about systemic change; few propose ways to actually do it. Here I present the results of some research I’ve engaged in for the last few months, which are initial steps into the kind of societal systems re-thinking we … Read more On Eliminating Rush Hours
We show which metric to use for visualizing and determining an optimal number of clusters, one that works much better than the usual practice, the elbow method. Clustering is an important part of the machine learning pipeline for business or scientific enterprises utilizing data science. As the name suggests, it helps to identify congregations of closely related (by … Read more Clustering metrics better than the elbow-method
Breakthroughs in artificial intelligence complicate the belief that our intellect—even our creativity—will remain unrivaled The concept of human distinctiveness is central to our self-understanding; however, time and again, what was believed to be our distinguishing feature proved no more than a mirage. For millennia, humans have asserted themselves as superior to other living forms. From … Read more What Distinguishes Us from AI?
Blind Image Quality Assessment Based on High Order Statistics Aggregation (HOSA) The HOSA methodology is a hybrid algorithm that takes advantage of an unsupervised learning stage that detects similar patches in a set of distorted images. This step is called codebook construction. Then, a second step uses a training dataset to find the similarities between … Read more Advanced Methods for Automatic Image Quality Assessment
Re-interpreting the Binomial distribution in the time domain As you progress through your apple testing career, you deduce another fact. On average, you seem to be running into λ number of bad apples during your hour long apple testing sessions. Since these λ bad apples are assumed to be uniformly distributed across the 60 minutes … Read more The Intuition for the Poisson Distribution Formula
A field guide to the expanding data science universe The data universe is expanding rapidly — it’s time we started recognizing just how big this field is and that working in one part of it doesn’t automatically require us to be experts of all of it. Instead of expecting data people to be able to … Read more Which flavor of data professional are you?
The Data and Exploratory Data Analysis I scraped the scripts from the website TheInfosphere.org using BeautifulSoup, where fans of the show have organized transcripts from the first six seasons of the show. In total, there were about 21,000 lines of script. The three main characters shared the bulk of the lines, at around 3500–4000 apiece, … Read more Futarama: Bender’s NLP
Types of Attention Models: Global and Local attention (local-m, local-p); Hard and Soft Attention; Self-attention. Global Attention Model: This is the same attention model as discussed above. Input from every source state (encoder) and the decoder states prior to the current state is taken into account to compute the output. Below is the diagram for the global attention … Read more Attention Networks
A full report on the Israeli Machine Learning landscape including average Salaries, Demographics, Most used libraries and more. This year, just like the last, we circulated a comprehensive survey among Machine & Deep Learning Israel community members. The survey, aimed at data scientists and adjacent roles, gauges professionals’ work conditions, commonly faced challenges, popular developer … Read more MDLI Report: The Israeli Machine Learning review 2019
Lots of traders use Bollinger Bands. I love Bollinger Bands as well. They bring statistics into the trading world. But how accurate are they? Is it correct to use standard deviation on a time series? Let’s find out together! There are no leading indicators. As traders, most of us use OHLC … Read more Bollinger Bands Statistics in Trading
In roughly 50 lines of code or fewer. Bar chart race animation showing the 10 biggest cities in the world. Bar chart races have been around for a while. This year, they took social media by storm. It began with Matt Navarra’s tweet, which was viewed 10 million times. Then, John Burn-Murdoch created a reproducible notebook … Read more Bar Chart Race in Python with Matplotlib
Photo by Franki Chamaki on Unsplash The relationship between machine learning and marketing has been flourishing in the past years, giving birth to a new set of strategies and tools that optimize the process. The modern marketer has no choice but to jump on the bandwagon in order to stay competitive and maintain a desirable … Read more Machine Learning Marketing — 10 Applications for Growing Your Business
The human body has five sensory organs, and each one transmits and receives information from every interaction, every second. Today, scientists can determine how much information a human brain receives, and guess what! Humans receive 10 million bits of information in one second. This is similar to a computer when it downloads a document from the web over … Read more How important is DATA for your business?
Let’s create our own animal park to learn how to use list comprehension. List comprehension is a powerful method to create new lists from existing lists. If you are just starting out with Python, list comprehension might look complicated, but you should get familiar with it as fast as you can. You can select specific elements from a … Read more Python List Comprehension in 3 Minutes and 3 Reasons why you should use it
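As a minimal sketch of the idea in the list comprehension teaser above, the following builds new lists from an existing one. The `animals` list and its contents are hypothetical, invented here for illustration; they are not taken from the article.

```python
# Hypothetical animal park; the names are illustrative only.
animals = ["elephant", "lion", "zebra", "eagle", "emu"]

# Transform every element: upper-case each name.
shouted = [name.upper() for name in animals]

# Select specific elements: keep only names that start with "e".
starts_with_e = [name for name in animals if name.startswith("e")]

# Transform and select at once: lengths of names longer than four letters.
long_name_lengths = [len(name) for name in animals if len(name) > 4]

print(shouted)
print(starts_with_e)
print(long_name_lengths)
```

Each comprehension reads left to right as "what to produce, for each item, under which condition", which is usually shorter than the equivalent `for` loop with `append`.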
Calculating Entropy and Information gain by hand. This post is the second in the “Decision tree” series; the first post in this series develops an intuition about decision trees and gives you an idea of where to draw a decision boundary. In this post, we’ll see how a decision tree does it. 🙊 Spoiler: It … Read more Decision tree: Part 2
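For readers who want to check a by-hand entropy calculation, here is a small sketch using the standard definitions: Shannon entropy in bits, and information gain as the parent's entropy minus the size-weighted entropy of the children. The function names and the toy labels are assumptions made for illustration, not code from the article.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child splits."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted

# Toy labels (hypothetical): a perfectly balanced parent node.
parent = ["yes", "yes", "no", "no"]

# A split that separates the classes perfectly recovers all of the entropy.
perfect_split = [["yes", "yes"], ["no", "no"]]

# A split that leaves both children as mixed as the parent gains nothing.
useless_split = [["yes", "no"], ["yes", "no"]]

print(entropy(parent))
print(information_gain(parent, perfect_split))
print(information_gain(parent, useless_split))
```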
Python is the most dynamic language for data science today, from backend development to in-depth machine learning and statistical analysis. It is intuitive, flexible, and, perhaps most importantly, widely supported from an open-source perspective. It is fairly easy to learn, and incredibly powerful. It is rivaled perhaps only by R in terms of analytics and … Read more Python Data Science & Analytics / Consulting Project Overview
Okay, let’s see some code!
# customary imports
import numpy as np
import pandas as pd
import xarray as xr
First, we’ll create some toy temperature data to play with: We generated an array of random temperature values, along with arrays for the coordinates latitude and longitude (2 dimensions). First, let’s see how we can represent this data … Read more Basic data structures of xarray
Plotly and Dash for creating interactive visual dashboards Photo by Frederik Schönfeldt on Unsplash A couple of months ago, I wrote about dealing with climate change denial as something of an applied data science problem: collecting data, visualizing trends and performing statistical tests to illustrate the extent to which the effects of climate change were … Read more Visualizing Climate Change
After finding the final set of 194 tweets, I developed a straightforward custom trading sentiment strategy to backtest on the S&P 500. Before diving into the details of the trading logic, it’s important to cover some key details and assumptions I made prior to developing the trading strategy: The SPDR S&P 500 ETF (Ticker: SPY) … Read more Trump, Tweets, and Trade
In the real world, we don’t really know our true population. It could be the entire population of the planet, or the past, present and future transactions of a company. We just don’t know the real value of the parameter, so we rely on sampling distributions to infer something about the parameter for these large … Read more Bootstrapping for Inferential Statistics
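To make the bootstrapping idea concrete, here is a minimal sketch of a percentile bootstrap confidence interval for a mean, using only the standard library. The function name `bootstrap_mean_ci`, its parameters, and the sample values are assumptions for illustration; the article may use a different procedure.

```python
import random
import statistics

def bootstrap_mean_ci(sample, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `sample`."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    n = len(sample)
    means = []
    for _ in range(n_resamples):
        # Resample with replacement, same size as the original sample.
        resample = [rng.choice(sample) for _ in range(n)]
        means.append(statistics.fmean(resample))
    means.sort()
    # Take the alpha/2 and 1 - alpha/2 percentiles of the resampled means.
    lower_index = int(n_resamples * alpha / 2)
    upper_index = n_resamples - lower_index - 1
    return means[lower_index], means[upper_index]

# Hypothetical observations standing in for a sample from an unknown population.
sample = [12.1, 9.8, 11.4, 10.3, 13.0, 9.5, 10.9, 11.7, 10.1, 12.4]
low, high = bootstrap_mean_ci(sample)
print(f"sample mean = {statistics.fmean(sample):.2f}")
print(f"95% bootstrap CI for the mean: ({low:.2f}, {high:.2f})")
```

The resampled means approximate the sampling distribution of the mean, which is what lets us say something about the unknown parameter from a single sample.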
The data set deals with agency performance for a set of property and casualty insurance agencies. The data contains, among other things, a list of agencies over the periods 2005–2015, their premiums by product, and losses incurred by product. To have a business impact, it’s important to understand the context of the data and the … Read more Delivering Business Impact with Analytics, Quickly
Root Mean Square Error (RMSE) is a standard way to measure the error of a model in predicting quantitative data. Formally it is defined as follows: Let’s try to explore why this measure of error makes sense from a mathematical perspective. Ignoring the division by n under the square root, the first thing we can … Read more What does RMSE really mean?
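The formula itself did not survive in the RMSE teaser above. The standard definition is RMSE = sqrt((1/n) · Σ (ŷᵢ − yᵢ)²), where ŷᵢ is the prediction and yᵢ the observed value; the symbol names here are assumptions and may differ from the article's notation. A minimal sketch of that definition:

```python
import math

def rmse(predictions, targets):
    """Root mean square error between two equal-length sequences."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        raise ValueError("at least one observation is required")
    squared_errors = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    # Mean of the squared errors (the division by n), then the square root.
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical values: the errors are -1, 0, 2, 1, so the mean squared error is 6 / 4 = 1.5.
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 8.0]
print(rmse(y_pred, y_true))  # square root of 1.5
```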
The model we are going to build is heavily based on the MobileNetV2 architecture; it is basically the same model but without the top classification layers (MobileNetV2 is built to output 1000 class probabilities). First, the model takes one image (3 channels, 224×224 in size) at a time as input, and outputs a vector of probabilities … Read more CelebA Attribute Prediction and Clustering with Keras
Monte Carlo simulations are a non-parametric statistical test, which can tell us whether two samples come from the same population (within a given degree of certainty —typically 95%). In other words, they can tell us if the difference between the means of two samples is statistically significant. In this sense, they have a similar function … Read more Can Monte Carlo Simulations Dispel the ‘Difficult Third Album’?
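The Monte Carlo teaser above does not show the article's procedure. One common way to test whether the difference between two sample means is significant is a Monte Carlo permutation test, sketched here as an assumption rather than as the author's method. The function name and the sample values are hypothetical.

```python
import random
import statistics

def permutation_test(sample_a, sample_b, n_iterations=2000, seed=0):
    """Estimate a two-sided p-value for the difference in means by random relabelling."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    observed = abs(statistics.fmean(sample_a) - statistics.fmean(sample_b))
    pooled = list(sample_a) + list(sample_b)
    n_a = len(sample_a)
    at_least_as_extreme = 0
    for _ in range(n_iterations):
        # Under the null hypothesis the group labels are interchangeable,
        # so shuffle the pooled data and split it into two new groups.
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (at_least_as_extreme + 1) / (n_iterations + 1)

# Hypothetical scores for two clearly separated groups.
group_a = [1, 2, 3, 4, 5, 6]
group_b = [11, 12, 13, 14, 15, 16]
p_value = permutation_test(group_a, group_b)
print(f"p = {p_value:.4f}")
```

A p-value below 0.05 corresponds to the 95% certainty threshold mentioned in the teaser.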
A data set is a collection of data. This is a crucial part to get right since it serves as the foundation which your model will be built upon. If it’s not in the data, a model will not learn what it doesn’t know. To train a machine learning model, a high-quality data set is … Read more X-Ray Image Classification: The Easy Way
Interactive Data Visualization with Choropleth Maps If you are looking for a powerful way to visualize geographic data then you should learn to use interactive Choropleth maps. A Choropleth map represents statistical data through various shading patterns or symbols on predetermined geographic areas such as countries, states or counties. Static Choropleth maps are useful for … Read more How to Create an Interactive Geographic Map Using Python and Bokeh
1. A.I. vs. Machine Learning Question: “Imagine you need a solution that will navigate you from where you are to the airport. Which method would you use?” Photo by Franck V. on Unsplash More information: Make sure the applicant doesn’t perceive this as a self-driving car problem or something complex. This question is not about … Read more 3 Questions to Ask for Jr. Data Scientist Initial Interviews
I decided to perform this comparison “experiment” using a corpus of texts in Italian; in my professional activity I deal almost exclusively with texts in Italian, and, considering that it is often easier to access Natural Language Processing (NLP) tools for English, it is important for me to make this comparison on an Italian corpus, … Read more What is the best method for Automatic Text Classification?
What is a GAN: Generative adversarial networks are two neural networks competing against each other to create an output that closely resembles the input. These two networks, the generator and the discriminator, play adversarial roles. The generator network creates a new image from random noise based on the input image. The random … Read more Drawing Architecure: Building Deep Convolutional GAN’s In Pytorch
Neural networks are computational models used to approximate a function that models the relationship between the dataset features x and labels y, i.e. f(x) ≈ y. A neural net achieves this by learning the best parameters θ such that the difference between the prediction f(x; θ) and the label y is minimal. They typically learn … Read more Avoiding the vanishing gradients problem using gradient noise addition
A breakdown of the types of data used to drive intelligent businesses The usefulness of logs is often underestimated. Most businesses rely on logs solely for the purpose of troubleshooting operational and availability problems. What many people fail to realize is that proactive logging also enables improved business decisions. Business intelligence is directly fueled by … Read more The 5 Most Important Logs An Application Should Write
In a previous article introducing Recommendation Systems, we saw that the tool has evolved enormously in the last year, emerging as a tool for keeping the audience of a website or application engaged and using its services. Usually, Recommendation Systems use our previous activity to make specific recommendations for us (this is known as Content-based Filtering). Now, … Read more The best tool for better Recommendations Systems
Once you have enough data, it’s time to prepare it for use in machine learning. Combine and preprocess your data as appropriate so that it has the following format: Example Data Table Words are indices from 1 to NumIntervals, which is the sum of SessionDuration/2 over the total number of sessions Terms correspond to the … Read more Merging with AI: How to Make a Brain-Computer Interface to Communicate with Google using Keras and…
https://www.gartner.com/smarterwithgartner/5-trends-appear-on-the-gartner-hype-cycle-for-emerging-technologies-2019/ Gartner’s 2019 Hype Cycle for Emerging Technologies is out, so it is a good moment to take a deep look at the report and reflect on our AI strategy as a company. You can find a brief summary of the complete report here. First of all, and before going into details about the content … Read more Gartner 2019 Hype Cycle for Emerging Technologies. What’s in it for AI leaders?
When you start to dive into the data world, you will see that there are a lot of approaches you can go for and a lot of tools you can use. It may make you feel a little overwhelmed at first. In this post, I will try to help you understand how to pick the … Read more Building a data pipeline from scratch on AWS
In the previous post, I wrote about the fundamentals of two commonly used dimensionality reduction approaches, singular value decomposition (SVD) and principal component analysis (PCA). I also explained their relationships using numpy. To quickly recap, the singular values (Σ) of a 0-centered matrix X (n samples × m features) equal the square root of its … Read more Turbocharging SVD with JAX
Introduction to NLP; Dataset Exploration; NLP Processing; Training; Hyperparameter Optimization; Resources for Future Learning. Natural Language Processing (NLP) is a subfield of machine learning concerned with processing and analyzing natural language data, usually in the form of text or audio. Some common challenges within NLP include speech recognition, text generation, and sentiment analysis, while some … Read more Getting Started with Natural Language Processing: US Airline Sentiment Analysis
To understand the predictions, the file explainer.py is run for each of the six trained classifiers — this outputs an HTML file with visual content that helps us interpret the models’ feature understanding. From the EDA done in the previous post, we know that classes 1 and 3 are the minority classes in the SST-5 … Read more Fine-grained Sentiment Analysis in Python (Part 2)
Easy and Clear Explanation of Neural Nets (with Pictures!) Machine Learning drives much of the technology we interact with nowadays, with applications in everything from search results on Google to ETA prediction on the road to tumor diagnosis. But despite its clear importance to our every-day life, most of us are left wondering how this … Read more Artificial Neural Networks for Total Beginners
In this post, we’ll evaluate and compare the results of several text classification approaches for the 5-class Stanford Sentiment Treebank (SST-5) dataset. “Learning to choose is hard. Learning to choose well is harder. And learning to choose well in a world of unlimited possibilities is harder still, perhaps too hard.” — Barry Schwartz When starting … Read more Fine-grained Sentiment Analysis in Python (Part 1)