What is a Deepfake and Why Should You Care?

Deepfakes are produced by a generative adversarial network (GAN), a form of unsupervised machine learning. After receiving initial data, the computer teaches itself using two components, a Generator and a Discriminator. The Generator creates the initial fake image, audio, or video and sends it to the Discriminator for examination. If the Discriminator determines the image is fake, …

Automate Data Cleaning with Unsupervised Learning

Cleaning text for your NLP projects has never been so fun and easy! I like working with textual data. As in Computer Vision, NLP nowadays has a lot of readily accessible resources and open-source projects that we can download or consume directly. Some of them are really cool and permit us to speed …

Instacart Market Basket Analysis Part 1: Which Grocery Items Are Popular?

Exploratory Data Analysis of Instacart orders via Saturn Cloud. As a Data Scientist, a big part of my responsibility is sharing my results with the business stakeholders. As a Data Journalist, a big part of my role is sharing code chunks with my readers. Now the reports that I share need to be rendered properly …

Hierarchical Neural Architecture Search

Many researchers and developers are interested in what Neural Architecture Search can offer their Deep Learning models, but are deterred by monstrous computational costs. Many techniques have been developed to promote more efficient search, notably Differentiable Architecture Search, parameter sharing, predictive termination, and hierarchical representations of architectures. This article will explain the idea of hierarchical …

What to expect from a causal inference business project: an executive’s guide III

Part III: Where does causal inference stand in the current AI, Big Data, Data Science, Statistics, and Machine Learning scene? This is the third part of the post “What to expect from a causal inference business project: an executive’s guide”. You will find the second one here. Most of these words have fuzzy meanings, at least …

What to expect from a causal inference business project: an executive’s guide II

Part II: The key project points you need to know. This is the second part of the post “What to expect from a causal inference business project: an executive’s guide”. You will find the third part here. Causal inference models how variables affect each other. Based on this information, it uses some calculation tools …

What to expect from a causal inference business project: an executive’s guide I

Part I: When do you need causal inference? This is the fifth post in a series about causal inference and data science. The previous one was “Solving Simpson’s Paradox”. You will find the second part of this post here. Causal inference is a new language for modeling causality that helps us better understand causes and impacts …

Introduction to Web Scraping with Selenium And Python

Practical tutorial on how to get started with Selenium. Web scraping is a fast, affordable and reliable way to get data when you need it. What is even better, the data is usually up-to-date. Now, bear in mind that when scraping a website, you might be violating its usage policy and can get kicked out …

Temporal-Difference learning

Reinforcement Learning using Temporal Difference Learning. In this article I will cover Temporal-Difference Learning methods. The Temporal-Difference (TD) method is a blend of the Monte Carlo (MC) method and the Dynamic Programming (DP) method. Below are key characteristics of the Monte Carlo (MC) method: there is no model (the agent does not know the MDP state transitions); the agent learns from sampled experience …
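To make the "blend of MC and DP" idea in the excerpt above more concrete, here is a minimal sketch of a tabular TD(0) value update. It learns from sampled experience, as Monte Carlo does, but bootstraps from the current estimate of the next state, as Dynamic Programming does. The five-state random walk, the step size `alpha`, and the discount `gamma` are illustrative assumptions and are not taken from the article.

```python
import random


def td0_random_walk(episodes=1000, alpha=0.1, gamma=1.0, seed=0):
    """Estimate state values for a 5-state random walk with tabular TD(0).

    Hypothetical environment: states 1..5 are non-terminal, 0 and 6 are
    terminal, and only stepping into state 6 yields a reward of 1.
    """
    rng = random.Random(seed)
    n_states = 5
    values = [0.0] * (n_states + 2)  # terminal values stay at 0
    for _ in range(episodes):
        state = 3  # every episode starts in the middle
        while state not in (0, n_states + 1):
            next_state = state + rng.choice((-1, 1))  # sampled experience, no model
            reward = 1.0 if next_state == n_states + 1 else 0.0
            # TD(0): move V(s) toward the bootstrapped target r + gamma * V(s')
            target = reward + gamma * values[next_state]
            values[state] += alpha * (target - values[state])
            state = next_state
    return values[1:-1]  # estimates for the non-terminal states


estimates = td0_random_walk()
print([round(v, 2) for v in estimates])
```

With a constant step size the estimates keep fluctuating, but states closer to the rewarding end should come out with higher values than states closer to the other end.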

Monte Carlo Learning

Reinforcement Learning using the Monte Carlo Method. In this article I will cover the Monte Carlo method of reinforcement learning. I briefly covered the Dynamic Programming (Value Iteration and Policy Iteration) method in an earlier article. In Dynamic Programming we need a model (the agent knows the MDP transitions and rewards) and the agent does planning (once the model is available, the agent …

How to master Python’s main data analysis library in 20 Minutes

Image by xresch from Pixabay. Now that we are comfortable with filtering and sorting the data front to back and vice versa, let’s move to some more advanced analytical functionalities. Standard Functions: Like the read functions, there are also a lot of analytical functions implemented in Pandas. I will highlight and explain the ones I …

Mastering the art of web scraping with Selenium and Python [Part 2/2]

Selenium is a powerful tool for advanced interactions with websites: login, clicks… Let’s use it for web scraping. Alright, let’s do something ‘simple’ here: collect all the artists available on Spotify. That’s a robot scrolling through Spotify’s catalog of artists. ⚠️ Obviously, I need to put a disclaimer here ⚠️ Don’t use this method to resell data …

12 Best AI & ML Based App Ideas that’ll Make Money in 2020

According to recent research by PwC, 72% of business leaders said they believe AI is going to be fundamental in the future, and they termed it a “business advantage”. No doubt, AI is one of the most crucial future technologies, and businesses small and large are adopting it rapidly. 12 Best …

Classifying pregnancy test results

My first attempt at Lesson 2 of “Practical Deep Learning for Coders” by fast.ai. I’m a math adjunct and aspiring data scientist working through the “Practical Deep Learning for Coders” course by fast.ai (you can read about my experience with Lesson 1 here), and for Lesson 2, we’re to gather a set of images from …

A guide for selecting an appropriate metric for your A/B test

And avoiding the common mistakes that derail most test efforts. This article is the 3rd one in my series of articles about A/B Testing. In the first article, I presented the intuition behind A/B testing and the importance of establishing the magnitude of the effect you hope to observe and corresponding sample size. In the …

5 Best Practices for AI- and Data-Driven Call Centers

Call centers have been revolutionized in the past decade. While some static call scripts and one-size-fits-all strategies still remain, technology has drastically changed the way call centers are capable of functioning. Today, call centers have the unique ability to leverage all available data to drive each customer interaction. These data sources include which digital marketing …

4 Ways Automation Is Altering Data Science

Automation has uprooted countless things, and it’ll do more of the same in the future. Data science is one of the things automation changed. Whether you’re already working in the field or identify as an aspiring data scientist, knowing about the changes relevant to automation and data science mentioned here will help you prepare for …

Understanding Data Bias

Types and sources of data bias. The huge success of machine learning (ML) applications in the past decade (in image recognition, recommendation systems, e-commerce and online advertising) has inspired its adoption in domains such as social justice, employment screening, and smart interactive interfaces such as Siri, Alexa, and the like. Along with …

Machine Learning Powered Content Moderation: Computer Vision Applications at Expedia

How to build a highly customized AI framework for content moderation, using the state of the art in deep learning. Authors: Shervin Minaee, Harsh Pathak, Thomas Crook. On an online travel agency website, images of a property (e.g. hotel, resort, apartment, vacation rental) are invaluable references for travel shoppers considering which property they want to book. Expedia Group™️ receives …

Streamline Model Tuning on Bankruptcy Predictions

Hi everyone, today’s topic will be about streamlining your machine learning models with sklearn, xgboost, and the h2o package. In particular, we will examine predicting bankruptcies of Polish companies using their financial statements. In my early days of machine learning modeling, I always wondered if there was an easier way to tune models. From that …

An Introduction to Autonomous Vehicles

A general understanding of what self-driving cars are really about. Lyft’s self-driving car [Source] Every year, there are around 1.25 million deaths caused by road accidents. That’s equivalent to 3,287 deaths on a daily basis! As a teenager just learning how to drive, this is a scary fact that lingers at the back of my …

Math for Data Science: Collaborative Filtering on Utility Matrices

Left: cosine similarity of U1 to all other users; Right: weighted average of ratings for I3. So our predicted rating for U1 and I3 is 4.34! Also note, different similarity metrics would give slightly different results. Item-item collaborative filtering is pretty much the same as user-user, but instead of computing the similarity between users, similarity …
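The two steps named in the excerpt above (cosine similarity between users, then a similarity-weighted average of ratings) can be sketched roughly as follows. The ratings dictionary is hypothetical, so it will not reproduce the article's 4.34, and computing the cosine only over items both users have rated is one assumed convention among several.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity restricted to the items both users have rated."""
    shared = [item for item in a if item in b]
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)


def predict_rating(ratings, user, item):
    """User-user prediction: similarity-weighted average of others' ratings."""
    weighted_sum = 0.0
    weight_total = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue  # only users who actually rated the item contribute
        sim = cosine_similarity(ratings[user], other_ratings)
        weighted_sum += sim * other_ratings[item]
        weight_total += abs(sim)
    return weighted_sum / weight_total if weight_total else None


# Hypothetical utility matrix: user -> {item: rating}; U1 has not rated I3.
ratings = {
    "U1": {"I1": 4, "I2": 5},
    "U2": {"I1": 4, "I2": 5, "I3": 5},
    "U3": {"I1": 1, "I2": 2, "I3": 2},
}

prediction = predict_rating(ratings, "U1", "I3")
print(round(prediction, 2))
```

Swapping `cosine_similarity` for another metric (for example a mean-centered one) would change the weights and therefore the prediction, which is the point the excerpt makes about different similarity metrics.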

The One with all the FRIENDS Analysis

Providing an alternative look at the most looked-at show. The crew (source). FRIENDS is one of my favourite shows (probably the favourite) and I’m sure I’m not alone in having rewatched the entire series more than once. I’ve always wondered if there was anything left to know about this oh-so familiar group. After seeing …

Overloading Operators in Python

…and a bit on overloading methods (but I’ll try not to overload you) Most of us learning to program in Python run into concepts behind operator overloading relatively early during the course of our learning path. But, like most aspects of Python (and other languages; and, for that matter, pretty much anything), learning about overloaded …

How Data Analytics is Helping Small Businesses Re-Imagine Growth Opportunities

Check out how Business Intelligence (BI) and data analytics remove uncertainty in business and provide insights that help in decision making and forecasting. Business Intelligence and data analytics are an integral part of any successful business venture. Business analytics has its dedicated market in the industry and is often a sought-after method to skip the …

Take your Python Skills to the Next Level With Fluent Python

Photo by Bonnie Kittle on Unsplash. The intermediate programmer’s ticket to advanced Python. You’ve been programming in Python for a while, and although you know your way around dicts, lists, tuples, sets, functions, and classes, you have a feeling your Python knowledge is not where it should be. You have heard about “pythonic” code and …

Python Vs R: What’s Best for Machine Learning

Are you thinking of building a machine learning project but stuck choosing the right programming language for it? Well, then this article is going to help you clear up your doubts about the characteristics of Python and R. Let’s get started with the basics. R and Python both share similar features and are …

Confusion Matrix and Class Statistics

Co-author: Maarit Widmann. With this post I am going back to the classics. Here I explain the confusion matrix and some accuracy measures associated with it. A classification model assigns data to two or more classes. Sometimes, detecting one or the other class is equally important and bears no …
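For readers who want a quick feel for the topic of the excerpt above, here is a small sketch that counts the four cells of a binary confusion matrix and derives a few common class statistics from them. The spam/ham labels are made-up data, the function names are hypothetical, and the sketch assumes every denominator is non-zero.

```python
from collections import Counter


def confusion_counts(actual, predicted, positive):
    """Count true/false positives and negatives for one positive class."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["tp" if a == positive else "fp"] += 1
        else:
            counts["fn" if a == positive else "tn"] += 1
    return counts


def class_statistics(c):
    """Derive common measures; assumes each denominator is non-zero."""
    total = c["tp"] + c["fp"] + c["fn"] + c["tn"]
    return {
        "accuracy": (c["tp"] + c["tn"]) / total,
        "sensitivity": c["tp"] / (c["tp"] + c["fn"]),  # also called recall
        "specificity": c["tn"] / (c["tn"] + c["fp"]),
        "precision": c["tp"] / (c["tp"] + c["fp"]),
    }


# Made-up labels for illustration only.
actual = ["spam", "spam", "ham", "ham", "ham", "spam"]
predicted = ["spam", "ham", "ham", "spam", "ham", "spam"]

counts = confusion_counts(actual, predicted, positive="spam")
stats = class_statistics(counts)
print(dict(counts))
print(stats)
```

Choosing which class counts as "positive" changes sensitivity, specificity, and precision, while accuracy stays the same.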

Neural Network for Satellite Data Classification Using Tensorflow in Python

A step-by-step guide for Landsat 5 multispectral data classification. Deep Learning has taken over the majority of fields in solving complex problems, and the geospatial field is no exception. Since the title of the article interests you, I hope that you are familiar with satellite datasets; for now, Landsat 5 TM. Little knowledge of …

A.I. For Filmmaking

Recognising Cinematic Shot Types with a ResNet-50. Originally published at https://rsomani95.github.io. Visit the link for a better-formatted, interactive version of the post with many more images. GitHub: https://github.com/rsomani95/shot-type-classifier Contents: What is Visual Language, and Why Does it Matter?; Neural Networks 101 (read if you don’t know what neural networks are); The Dataset: Data Sources …

Ace Deep Learning in a Service-Based Organization

Most service-based companies suffer from a phobia of the word “Product”. So service-based companies develop POCs instead. Let’s first understand what is what: Product is an article or substance that is manufactured or refined for sale. Proof of Concept (POC) is a miniature representation of the end-product, with a few working features, that aims of …

Survival Modeling — Accelerated Failure Time — Xgboost

Survival analysis is a “censored regression” where the goal is to learn a time-to-event function. This is similar to the common regression analysis where data points are uncensored. Time-to-event modeling is critical for understanding user and company behaviors, including but not limited to credit, cancer, and attrition risks. The Cox Proportional Hazards model is a semi-parametric model where we model the hazard ratio using …

Using the Pandas “Resample” Function

The next best thing to changing the past: aggregating it. A technical introduction to the pandas resample function. This article is an introductory dive into the technical aspects of the pandas resample function for datetime manipulation. I hope it serves as a readable source of documentation for those less inclined to dig through the …

Text Analytics: Wine Classification and Recommendation by Tasting Notes

Introduction: Wine selection is complicated and personal. The one you love doesn’t necessarily depend on the rating or price. It depends on your personal taste. As a wine lover, I am always overwhelmed by the vast number of options at wine stores or restaurants. What if you could have a virtual personal sommelier who …

Sentiment Analysis for Hotel Reviews

Whether you like it or not, guest reviews are becoming a prominent factor affecting people’s bookings/purchases. Think about your past experience. When you were looking for a place to stay for a vacation on Expedia/Booking/TripAdvisor, what did you do? I am willing to bet you’d be scrolling down the screen to check on the reviews …

The thin line between data science and data engineering

Editor’s note: This is the fourth episode of the Towards Data Science podcast “Climbing the Data Science Ladder” series, hosted by Jeremie Harris, Edouard Harris and Russell Pollari. Together, they run a data science mentorship startup called SharpestMinds. You can listen to the podcast below: If you’ve been following developments in data science over the …

5 Steps to Amazing Visualizations with Matplotlib

Matplotlib sucks. By default. But you can tweak the hell out of it. We’ve all been there. Matplotlib is imported, your dataset is prepared, and you are ready to make some astonishing visualization. Pretty soon, the harsh reality of potato-looking default Matplotlib charts hits you in the face. Damn. A couple of weeks ago I’ve …

Long-Run Relationships between FANG Stocks

Impulse Response Functions generated by me using the statsmodels library in Python. Interpreting VECMs: In a previous article, I created a model to predict the closing share prices of FANG stocks using a vector error correction model (VECM), which models cointegrated time series. I touched on some topics I wanted to expand on, namely interpreting …

Why do we use word embeddings in NLP?

Natural language processing (NLP) is a sub-field of machine learning (ML) that deals with natural language, often in the form of text, which is itself composed of smaller units like words and characters. Dealing with text data is problematic, since our computers, scripts and machine learning models can’t read and understand text in any human …

Why Identity Management is a Prerequisite for Enterprise AI-ML on the Cloud

Security concerns have stalled enterprise AI/ML on the cloud. Identity-based security addresses the concerns. Here is what you need to know. Why has enterprise AI/ML on the cloud stalled? Mostly, because of valid security concerns. But the march for AI/Machine Learning on the cloud is inevitable. And AI/ML’s voracious appetite for data, much of …

Productionizing NLP Models

After we finished the project, we had many common utilities that can be used in any project. Innersourcing: Numbers are in % of total project time. This can vary for projects. Innersourcing allows an ecosystem of contributors to develop and use reusable components for everyone. We observed that good software engineering takes way …

What Data Science can tell us about mainstream music…

A k-means clustering of top 100 artists on Spotify. The sound of music has evolved with society over the years, thus the innovation in sound is really a reflection of our cultural and technological progression. Music that is relevant to the mainstream population now would’ve been impossible to create a couple of years ago because …

Automate Hyperparameter Tuning for your models

Because your time is more important than the machine. When we create our machine learning models, a common task that falls on us is how to tune them. People end up taking different manual approaches. Some of them work, and some don’t, and a lot of time is spent in anticipation and running the code …

Tutorial on Variational Graph Auto-Encoders

Graphs are applicable to many real-world datasets such as social networks, citation networks, chemical graphs, etc. The growing interest in graph-structured data increases the amount of research on graph neural networks. Variational autoencoders (VAEs) embodied the success of variational Bayesian methods in deep learning and have inspired a wide range of ongoing research. Variational graph …

A newbie’s guide to build your own deep learning box

Photo by Sai Kiran Anagani on Unsplash. There are many online articles (such as https://blog.slavv.com/the-1700-great-deep-learning-box-assembly-setup-and-benchmarks-148c5ebe6415 and https://medium.com/adventures-in-high-density/the-3700-deep-learning-box-purchase-assembly-and-setup-458900680eb0) guiding users on how to build their own deep learning box. They show which computer parts you should purchase, how to assemble them, and the remaining setup. But for a newbie like me, it’s still a lot of trouble to …