Optimizing Feature generation

Feature generation is the process of creating new features from one or multiple existing features, potentially for use in statistical analysis. This process makes new information accessible during model construction and therefore, hopefully, results in a more accurate model. In this article I describe how to use a feature interaction detection algorithm based on …
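As a rough illustration of creating new features from existing ones, the sketch below adds a pairwise product ("interaction") feature for every pair of chosen columns. It is not the interaction-detection algorithm the article describes; the helper name add_pairwise_products and the list-of-dicts row format are assumptions made for the example.

```python
from itertools import combinations


def add_pairwise_products(rows, columns):
    """Return copies of `rows` with a new feature "a*b" for every pair of columns.

    rows: list of dicts mapping feature name -> numeric value (hypothetical format).
    """
    result = []
    for row in rows:
        new_row = dict(row)  # keep the original features untouched
        for a, b in combinations(columns, 2):
            new_row[f"{a}*{b}"] = row[a] * row[b]
        result.append(new_row)
    return result


rows = [{"x1": 2.0, "x2": 3.0, "x3": 0.5}]
print(add_pairwise_products(rows, ["x1", "x2", "x3"]))
```

With k input columns this adds k(k-1)/2 features, which is why a detection step that keeps only the useful interactions becomes worthwhile as k grows.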

Decision Tree from Scratch in Python

Decision trees are among the most powerful Machine Learning tools available today and are used in a wide variety of real-world applications, from ad click prediction at Facebook¹ to ranking Airbnb experiences. Yet they are intuitive, easy to interpret, and easy to implement. In this article we’ll train our own decision tree classifier …
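To make "train our own decision tree classifier" concrete, here is a minimal pure-Python sketch of a CART-style classifier that chooses splits by weighted Gini impurity with an exhaustive threshold search. It is not the article's code; build_tree, best_split, predict, and the nested-dict tree layout are hypothetical names chosen for the example.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(X, y):
    """Return (impurity, feature_index, threshold) minimising weighted Gini, or None."""
    best = None
    n = len(y)
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            left = [y[i] for i in range(n) if X[i][j] <= threshold]
            right = [y[i] for i in range(n) if X[i][j] > threshold]
            if not left or not right:
                continue  # a useful split must send samples both ways
            impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or impurity < best[0]:
                best = (impurity, j, threshold)
    return best


def build_tree(X, y, depth=0, max_depth=3):
    """Grow the tree recursively; each leaf stores the majority class."""
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or gini(y) == 0.0:
        return majority
    split = best_split(X, y)
    if split is None:
        return majority
    _, j, threshold = split
    left_idx = [i for i in range(len(y)) if X[i][j] <= threshold]
    right_idx = [i for i in range(len(y)) if X[i][j] > threshold]
    return {
        "feature": j,
        "threshold": threshold,
        "left": build_tree([X[i] for i in left_idx], [y[i] for i in left_idx], depth + 1, max_depth),
        "right": build_tree([X[i] for i in right_idx], [y[i] for i in right_idx], depth + 1, max_depth),
    }


def predict(tree, x):
    """Walk from the root to a leaf, following the learned thresholds."""
    while isinstance(tree, dict):
        tree = tree["left"] if x[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree


X = [[2.0, 1.0], [3.0, 1.5], [6.0, 4.0], [7.0, 5.0]]
y = [0, 0, 1, 1]
tree = build_tree(X, y)
print(predict(tree, [2.5, 1.2]), predict(tree, [6.5, 4.5]))
```

The exhaustive search rescans the data for every candidate threshold, which is fine for a toy example; practical implementations sort each feature once and update class counts incrementally.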

Discerning Odors Using Machine Learning

Hey Google, what does this smell like? Deep learning has made many advances in sight: computer vision is used to identify objects, detect cancer in cells, and power self-driving cars. It has also made many advances in sound: live captioning, AI-generated music, and offline speech recognition are some examples. It is because of these …

The Most Important Supreme Court Decision For Data Science and Machine Learning

“Google Books ruled legal in massive win for fair use (updated),” Ars Technica, November 14, 2013. “Google Wins: Court Issues a Ringing Endorsement of Google Books,” Publishers Weekly, November 14, 2013. “Google book-scanning project legal, says U.S. appeals court,” Reuters, October 16, 2015. “We trust that the Supreme Court will see fit to correct the …

Amazon DocumentDB (with MongoDB compatibility) is now available in the Europe (Paris) region

Amazon DocumentDB is a fast, scalable, highly available, and fully managed document database service that supports MongoDB workloads. You can use Amazon DocumentDB in the following AWS regions: US East (N. Virginia, Ohio), US West (Oregon), Europe (Paris, Ireland, Frankfurt, London), and Asia Pacific (Mumbai, Singapore, Sydney, Tokyo, Seoul). For more information on AWS …

Getting Your Hands Dirty with TensorFlow 2.0 and Keras API

TensorFlow 2.0 comes with Keras packaged inside, so there is no need to import Keras as a separate module (although you can if you need to). The TensorFlow 2.0 API is simplified and improved. This is good news for us Machine Learning developers. This is how you import Keras now, from TensorFlow: from tensorflow import …

Imagineering & Resurrections

Generative design technologies can be used to emulate and reconfigure things that already exist. There has already been much discussion of the impact of “deepfakes”, an application of deep learning that creates fake photos, videos, and writing based on their real counterparts. But there’s been less discussion of how entire works might be pulled …

How GCP helps you take command of your threat detection (Cloud Developer Advocate, Product Manager)

Why do we keep talking about security all the time? Why hasn’t anyone just gone and fixed it? You’ve probably heard these questions, whether from your leadership, a board member, or just friends. Then you labor to explain why security in the cloud is so complex and challenging, the constant arms race, and …

The Perfect Formula for FinTech Products: CX = ML + UX

The new challenge for mature FinTechs is to offer less with more, and I have the formula to solve that. When the FinTech industry was just coming of age, most companies launched with one simple offering: a mobile wallet, a debit card, or a spare-change investing app. Those products were a hit with users who were …

Keep Parquet and ORC from the data graveyard with new BigQuery features (Product Manager, Google BigQuery)

“At Pandora, we have petabytes of data spread across multiple Google Cloud storage services; accordingly, we expect BigQuery’s federated query capability to be a useful tool for integrating our diverse data assets into a unified analytics ecosystem,” says Greg Kurzhals, product manager at Pandora. “The support for Parquet and other external data source formats will …

DataOps and data science at enterprise scale

Editor’s note: This is the 11th episode of the Towards Data Science podcast’s “Climbing the Data Science Ladder” series, hosted by Jeremie Harris, Edouard Harris and Russell Pollari. Together, they run a data science mentorship startup called SharpestMinds. One thing that you might not realize if you haven’t …

Automating bits and pieces of your daily life

As an avid techie and problem solver, I am always looking out for opportunities to apply what I’ve learnt. Other than during my internships, I haven’t really put my school fees to good use. That changed one fateful day, when a notification showed up on my phone: My mum’s daily routine of collating meal …

Intuition behind model fitting: Overfitting v/s Underfitting

Before we dive into the idea behind overfitting and underfitting, let’s try to understand how a Machine Learning model works behind the scenes. What do you think happens when you give data to a machine learning model? Imagine a black room filled with white dots. Since a room implies 3 dimensions (height, width and …
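One way to see both failure modes numerically, using k-nearest-neighbour regression rather than any model from the article: with k=1 the model memorises the training points (zero training error, a classic overfit), while with k equal to the training-set size it predicts the global mean everywhere (underfitting). The data is synthetic (y = 2x plus noise) and the helper names are hypothetical.

```python
import random

random.seed(0)  # reproducible synthetic data


def make_data(n):
    """Hypothetical data set: y = 2x plus Gaussian noise, x uniform on [0, 10]."""
    data = []
    for _ in range(n):
        x = random.uniform(0, 10)
        data.append((x, 2 * x + random.gauss(0, 1)))
    return data


def knn_predict(train, x, k):
    """1-D k-nearest-neighbour regression: average the targets of the k closest points."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(target for _, target in nearest) / k


def mse(train, data, k):
    """Mean squared error of k-NN predictions (fitted on `train`) over `data`."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in data) / len(data)


train, test = make_data(40), make_data(40)
for k in (1, 5, len(train)):
    print(f"k={k:2d}  train MSE={mse(train, train, k):6.2f}  test MSE={mse(train, test, k):6.2f}")
```

The gap between training and test error for k=1, and the uniformly high error for k=40, are the two signatures to look for; an intermediate k typically sits between them.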

Business intelligence applied to a user engagement problem

One key area where companies often turn to data to find out where their problems lie and how they can be solved is consumer retention, or user engagement. So, let’s imagine that we are analysts working for a technology company whose most important KPI revolves around how …

EPL Fantasy GW10 Recap and GW11 Algorithm Picks

Our Moneyball approach to the Fantasy EPL (team_id: 2057677). If this is the first time you’ve landed on one of my Fantasy EPL blogs, you might want to check out Part 1, Part 2, Part 3, Part 5, and Part 9 first to get familiar with our overall approach and the improvements we’ve made over time. My partner in crime …

Tensorflow 2.0 Data Transformation for Text Classification

A complete end-to-end process for classifying text. In this article, we will use TensorFlow 2.0 and Python to create an end-to-end process for classifying movie reviews. Most TensorFlow tutorials focus on how to design and train a model using a preprocessed dataset. Typically, preprocessing the data is the most time-consuming part of an AI project. …
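Since preprocessing is the time-consuming part, here is a minimal, library-free sketch of the usual transformation from raw reviews to fixed-length integer sequences: build a frequency-ordered vocabulary, map words to ids, then truncate or pad. The reserved ids (0 for padding, 1 for unknown words) and the helper names are assumptions, not the article's pipeline. In TensorFlow 2.0 you would more likely use Keras utilities such as tf.keras.preprocessing.sequence.pad_sequences, which pads at the front by default rather than at the end as this sketch does.

```python
from collections import Counter


def build_vocab(texts, max_words=None):
    """Map each word to an integer id by frequency; 0 = padding, 1 = out-of-vocabulary."""
    counts = Counter(word for text in texts for word in text.lower().split())
    vocab = {"<pad>": 0, "<oov>": 1}
    for word, _ in counts.most_common(max_words):
        vocab[word] = len(vocab)
    return vocab


def encode(text, vocab):
    """Turn a review into a list of word ids, using the OOV id for unseen words."""
    return [vocab.get(word, vocab["<oov>"]) for word in text.lower().split()]


def pad(sequence, length):
    """Truncate or right-pad with zeros to a fixed length."""
    return sequence[:length] + [0] * max(0, length - len(sequence))


reviews = ["A great movie", "a terrible movie", "great acting"]
vocab = build_vocab(reviews)
batch = [pad(encode(r, vocab), 4) for r in reviews + ["an unseen review"]]
print(batch)
```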

Using Python and Selenium to Automate Filling Forms and Mouse Clicks

For this example, we are going to log in through Instagram’s web app. I actually use Python and Selenium in my daily workflow. At the company I work for, we have our own web application that sends reports online. With each report, we have an account. Since we have customers onboarding each day, we also …

Offensive Programming in action (part III)

[This article was first published on NEONIRA, and kindly contributed to R-bloggers]. This is the third post on offensive programming, dedicated to using offensive programming …

Come on, Lint a little: cleaning up your code with Linters

You know what they say: “It’s what’s on the inside that counts.” While this may be true, you still have to find a way to express what’s inside so that people can understand it. You might have the most functional piece of code you’ve ever written. It might be elegant, thorough, and foolproof. But it …

It’s Been a Bad Year for Smartphone Facial Recognition Technologies

It was recently announced that the Google Pixel 4 would replace its fingerprint technology with facial recognition technology to increase security. This was one of the most exciting selling points for the model, but it didn’t take long to discover the loopholes. It was quickly discovered that …

AWS App Mesh is now available in Europe (Paris) Region

AWS App Mesh is a service mesh that provides application-level networking to make it easy for your services to communicate with each other across multiple types of compute infrastructure. App Mesh standardizes how your services communicate, giving you end-to-end visibility and ensuring high availability for your applications.

Scraping Hansard with Python and BeautifulSoup

Packages required. These are the packages I used:
    import csv
    from bs4 import BeautifulSoup
    import pandas as pd
    import requests
csv allows you to manipulate and create CSV files. BeautifulSoup is the HTML-parsing library that does the web scraping. Pandas will be used to create a dataframe to put our results into a table. Requests is used to send HTTP requests; to …

Calculating String Similarity in Python

As before, let’s start with some basic definitions: Cosine similarity is a measure of similarity between two non-zero vectors of an inner product space that measures the cosine of the angle between them.[2] I know, it’s not the cleanest of definitions, but I find it good enough. It requires some math knowledge, …
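Applied to strings, the definition becomes cos(θ) = A·B / (‖A‖ ‖B‖), where A and B are vectors describing each string. A minimal sketch, assuming simple bag-of-words count vectors built by lower-casing and splitting on whitespace (the article may vectorise differently):

```python
import math
from collections import Counter


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between the bag-of-words count vectors of a and b."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    if norm == 0:
        # The definition only covers non-zero vectors, i.e. non-empty strings.
        raise ValueError("cosine similarity is undefined for empty strings")
    return dot / norm


print(cosine_similarity("the cat sat", "the cat ran"))
```

Two strings sharing two of three words score 2/3, identical word multisets score 1, and strings with no words in common score 0. Word order is ignored entirely, which is the main limitation of this representation.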

Introduction to Sequence Modeling Problems

In sequence learning problems, we know that the true output at time step t depends on all the inputs the model has seen up to time step t. Since we don’t know the true relationship, we need to come up with an approximation such that the function depends on all the previous …
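A standard way to build such an approximation is a recurrent network, where a hidden state is updated as s_t = tanh(U·x_t + W·s_{t-1} + b) and the output is y_t = V·s_t. Because s_t is fed forward, the output at step t depends on every earlier input. A minimal scalar sketch; the weight values are arbitrary illustrative numbers, not trained parameters, and the function name is hypothetical.

```python
import math


def rnn_outputs(inputs, U=0.5, W=0.8, V=1.0, b=0.0):
    """Scalar recurrent network: s_t = tanh(U*x_t + W*s_{t-1} + b), y_t = V*s_t."""
    state, outputs = 0.0, []
    for x in inputs:
        state = math.tanh(U * x + W * state + b)  # new state mixes current input and past state
        outputs.append(V * state)
    return outputs


# Only the first input differs, yet the output at the last step changes.
out_a = rnn_outputs([1.0, 0.0, 0.0])
out_b = rnn_outputs([-1.0, 0.0, 0.0])
print(out_a[-1], out_b[-1])
```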

TensorFlow Enterprise makes accessing data on Google Cloud faster and easier (Developer Programs Engineer, Google Cloud AI)

Data is at the heart of all AI initiatives. Put simply, you need to collect and store a lot of it to train a deep learning model, and with the advancement and increased availability of accelerators such as GPUs and Cloud TPUs, the speed of getting the data from its storage location to the training …

Business Strategy For Data Scientists

But most of us will hopefully work for growing, high-potential companies (with less stress about being laid off). Thus, let’s take a look at the two primary deciders of whether our firm will achieve liftoff and ultimately become profitable and successful: customer lifetime value and customer acquisition cost. …
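The excerpt names the two quantities without defining them. A minimal sketch using one common simplification, which is not necessarily the article's: lifetime value as monthly gross profit per customer divided by the monthly churn rate (so 1/churn is the expected lifetime in months), and acquisition cost as acquisition spend divided by new customers. All figures are made up.

```python
def customer_lifetime_value(revenue_per_month, gross_margin, monthly_churn):
    """Simplified LTV: monthly gross profit per customer times expected lifetime (1 / churn)."""
    if not 0 < monthly_churn <= 1:
        raise ValueError("monthly_churn must be in (0, 1]")
    return revenue_per_month * gross_margin / monthly_churn


def customer_acquisition_cost(acquisition_spend, new_customers):
    """CAC: sales and marketing spend divided by the customers it brought in."""
    return acquisition_spend / new_customers


ltv = customer_lifetime_value(revenue_per_month=50.0, gross_margin=0.6, monthly_churn=0.05)
cac = customer_acquisition_cost(acquisition_spend=100_000.0, new_customers=500)
print(f"LTV={ltv:.0f}  CAC={cac:.0f}  LTV/CAC={ltv / cac:.1f}")
```

With these numbers each customer returns three times what it cost to acquire; a ratio below 1 would mean the firm loses money on every customer it signs up.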

A brief primer on Variational Inference

[This article was first published on Fabian Dablander, and kindly contributed to R-bloggers]. Bayesian inference using Markov chain Monte Carlo methods can be notoriously slow. …

Intro to Support Vector Machines with a Trading Example

We’ll begin by gathering our data. We’ll use a time period going back about five years, from October 28, 2014 to October 28, 2019. The stocks we will get data for are the components of the Dow Jones Industrial Average. Yahoo Finance used to be really easy to get data from, but most packages no …

AWS Athena helps to find the worst place to park your car in Portland.

After visiting Portland, OR last weekend, I decided to explore some publicly available datasets about the city. In this post, we are going to calculate the number of incidents related to vehicles (theft from or theft of a vehicle) and the number of parking spots in each Portland neighborhood using Athena geo queries. After that, …

How vital are powerful graphics for Data-Science?

The saving grace of the GPU comes in the form of packages designed with CUDA in mind. Most statistical machine-learning operations involve moving high volumes of data in the form of numerical matrices and computing values in real time. Luckily, this is precisely what a graphics card is designed to do. Although there are shortcomings …

Colaboratory + Drive + Github -> the workflow made simpler

This post is a continuation of our earlier attempt to make the best of both worlds, namely Google Colab and GitHub. In short, we tried to map the usage of these tools onto a typical data science workflow. Although we got it to work, the process had its drawbacks: it relied on relative imports, …

You Shouldn’t Call Yourself a Data Scientist if You Don’t Know This

Today I want to break down the central limit theorem and how it relates to so much of the work a data scientist performs. First things first: a core tool for any data scientist is a very simple chart type called a histogram. While you’ve surely seen many a histogram, we often …
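The theorem is easy to check by simulation: draw many samples from a skewed population, take each sample's mean, and histogram the means. A minimal sketch (Python 3.8+ for statistics.fmean), assuming an exponential population with mean 1 and standard deviation 1; the theorem predicts the means are roughly normal with mean 1 and standard deviation 1/√n.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is reproducible


def sample_mean(n):
    """Mean of n draws from a skewed population: exponential with mean 1 and sd 1."""
    return statistics.fmean(random.expovariate(1.0) for _ in range(n))


n = 50
means = [sample_mean(n) for _ in range(5_000)]

# CLT prediction: mean of the means close to 1, spread close to 1/sqrt(n).
print(f"mean of means: {statistics.fmean(means):.3f} (expected 1.000)")
print(f"sd of means:   {statistics.stdev(means):.3f} (expected {1 / n ** 0.5:.3f})")

# Crude text histogram: the skewed population yields a roughly bell-shaped pile of means.
lo, hi = min(means), max(means)
bins = [0] * 12
for m in means:
    bins[min(int((m - lo) / (hi - lo) * 12), 11)] += 1
for i, count in enumerate(bins):
    print(f"{lo + i * (hi - lo) / 12:5.2f} {'#' * (count // 25)}")
```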

The Mysterious Case of the Ghost Interaction

This spooky post was written in collaboration with Yoav Kessler (@yoav_kessler) and Naama Yavor (@namivor). Experimental psychology is moving away from repeated-measures ANOVAs and towards linear mixed models (LMMs). LMMs have many advantages over rmANOVA, including (but not limited to): analysis of single-trial data (as opposed to aggregated means per condition); specifying more than one …

81st TokyoR Meetup Roundup: A Special Session in {Shiny}!

[This article was first published on R by R(yo), and kindly contributed to R-bloggers]. As another sweltering summer ends, it’s time for another TokyoR Meetup! With global warming in …

Location, Location, Location! in data science

Location, Location, Location! You have heard this many times. It is a common mantra in real estate. Does it apply in data science as well? How do we embrace the location component in data science? Is it only another column in your dataset? Or perhaps spatial is special. Location data (big data) is ubiquitous as …

Using Python To Create a Slack Bot

At the startup where I work, we needed to automate messages in order to get notified of certain events and triggers. For example, the company I work with deals with connections to certain stores. If that connection is broken, Python will read that information from our database. We can now send …
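The excerpt does not show which Slack API the article uses. A minimal standard-library sketch, assuming a Slack incoming webhook (a per-channel URL that accepts an HTTP POST with a JSON body containing a "text" field); the webhook URL and the store message are placeholders, and build_slack_request and notify are hypothetical helper names.

```python
import json
import urllib.request


def build_slack_request(webhook_url, text):
    """Build (but do not send) a POST request for a Slack incoming webhook."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def notify(webhook_url, text, timeout=10):
    """Send the message and return the HTTP status code."""
    with urllib.request.urlopen(build_slack_request(webhook_url, text), timeout=timeout) as response:
        return response.status


# Placeholder URL; a real one is generated when you enable incoming webhooks for a channel.
req = build_slack_request("https://hooks.slack.com/services/T000/B000/XXXX", "Store connection lost: store_id=42")
print(req.get_method(), req.full_url)
```

Separating request construction from sending keeps the message format testable without network access; in production, notify would be called whenever the database check finds a broken store connection.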

Natural Language Understanding — Core Component of Conversational Agent

We are living in an era where messaging apps handle all sorts of our daily activities; in fact, these apps have already overtaken social networks, as indicated in the BI Intelligence Report. In addition, usage of messaging platforms is expected to grow significantly in the …

Power of XGBoost & LSTM in Forecasting Natural Gas Price

A forecasting cost model is a prerequisite for the development and validation of new optimization methods and control tools. Here, I will show a simple yet powerful approach to forecasting using machine learning algorithms. Psychics and fortune tellers have used Tarot cards for hundreds of years, and Trusted Tarot will give us an accurate reading that’s …