How Sainsbury’s is generating new insights into how the world eats

Managing Director, UKI, Google Cloud

Retail will forever be an industry that must constantly reinvent itself in response to, and anticipation of, ever-changing consumer demands. Digital transformation is fueling these changes and we’ve previously spoken about how businesses including Ulta Beauty and Kohl’s are taking advantage of Google Cloud to put data at the center of what they do and … Read more How Sainsbury’s is generating new insights into how the world eats

Data collection might not be as easy as it might seem

In-depth exploration of data collection processes. Some of my most popular repositories on GitHub have been about data collection, either through web scraping or using an Application Programming Interface (API). My approach had always been to find a resource from which I could get the data and then directly start fetching it. After collecting the … Read more Data collection might not be as easy as it might seem

What can Machine Learning Tell Us About America’s Gun Laws?

In-Depth Analysis A 25-year analysis reveals surprising insights. In the United States, it seems we never have to go more than a few weeks without hearing about another mass shooting. With each new incident comes renewed calls to strengthen gun control laws, expand federal background checks, and get rid of assault rifles. Though the opposing … Read more What can Machine Learning Tell Us About America’s Gun Laws?

Why is Machine Learning Deployment Hard?

After several AI projects, I realized that deploying Machine Learning (ML) models at scale is one of the most important challenges for companies willing to create value through AI, and as models get more complex it’s only getting harder. Based on my experience as a consultant, only a very small percentage of ML projects make … Read more Why is Machine Learning Deployment Hard?

Pre-defined sparsity for reducing complexity in neural networks

Neural networks are quite the rage nowadays. They make deep learning possible, which powers smart systems such as speech recognition and self-driving cars. These cool end results don’t really reflect the gory complexity of most modern neural networks, which have many millions of parameters needing to be trained to make the system smart. Training costs … Read more Pre-defined sparsity for reducing complexity in neural networks

A Brief Primer on Optimization to Unlock the Universe

What is shared between artificial intelligence, machine learning and operations research? Optimization, optimization, optimization… Have you ever wondered what is behind all of these crazy deep learning algorithms and machine learning papers? At the core, it is all about optimization, i.e. fitting parameters to minimize or maximize a certain objective function. In this article, I’ll … Read more A Brief Primer on Optimization to Unlock the Universe
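
To make the excerpt's definition of optimization concrete, here is a minimal sketch of fitting a single parameter by gradient descent. The objective f(x) = (x - 3)^2, the learning rate, and the function name `gradient_descent` are illustrative assumptions, not taken from the article.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Minimize a one-dimensional objective given its gradient function."""
    x = x0
    for _ in range(steps):
        # Move against the gradient to decrease the objective.
        x -= learning_rate * grad(x)
    return x


# Hypothetical objective f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
# Its minimum sits at x = 3.
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(x_min)
```

The same loop is the core of most model-fitting routines; real libraries differ mainly in how they compute the gradient and choose the step size.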

Multi-lingual Chatbot Using Rasa and Custom Tokenizer

By default, the Rasa framework provides us with four built-in tokenizers: Whitespace Tokenizer, Jieba Tokenizer (Chinese), Mitie Tokenizer, and Spacy Tokenizer. Built-in Tokenizer: If you are testing it on the Chinese language, you can simply change the tokenizer name in the config.yml file to the following and you are good to go.
language: zh
pipeline:
- name: "JiebaTokenizer"
- name: "RegexFeaturizer"
- … Read more Multi-lingual Chatbot Using Rasa and Custom Tokenizer

How to run your Jupyter Notebook on the cloud in 5 easy steps

The Wreckers already said it: Why do they make it hard to love you? Why can’t they even start to try? ’Cause now I feel the bridge is burnin’ And all the smoke is in my eyes And yes, that bridge is your personal computer and machine learning; it’s really hard to love indeed when … Read more How to run your Jupyter Notebook on the cloud in 5 easy steps

Goodbye, Disqus! Hello, Utterances!

Removing Disqus from my blogdown blog had been on my mind for a while, ever since I saw Bob Rudis’ tweet enjoining Noam Ross to not use it for his brand-new website. The same Twitter thread introduced me to Utterances, a “lightweight comments widget built on GitHub issues”, which I have at last installed to my blog in lieu of Disqus. How … Read more Goodbye, Disqus! Hello, Utterances!

Building Data Science Infrastructure at an Enterprise Level with RStudio and ProCogia

[This article was first published on RStudio Blog, and kindly contributed to R-bloggers]. We’re hosting a free, half-day event with one of our Full Service … Read more Building Data Science Infrastructure at an Enterprise Level with RStudio and ProCogia

Part I: Operationalizing R models with Dash Enterprise and Microsoft Azure

[This article was first published on R – Modern Data, and kindly contributed to R-bloggers]. While R offers excellent support for machine learning, the process … Read more Part I: Operationalizing R models with Dash Enterprise and Microsoft Azure

New vtreat Documentation (Starting with Multinomial Classification)

[This article was first published on R – Win-Vector Blog, and kindly contributed to R-bloggers]. Nina Zumel finished some great new documentation showing how to … Read more New vtreat Documentation (Starting with Multinomial Classification)

ODSC West 2019 Talks and Workshops to Expand and Apply R Skills (20% discount)

[This article was first published on R-posts.com, and kindly contributed to R-bloggers]. Go HERE to learn more about the ODSC West 2019 conference with a … Read more ODSC West 2019 Talks and Workshops to Expand and Apply R Skills (20% discount)

From Facebook to startups: data science is becoming an engineering problem

Editor’s note: This is the seventh episode of the Towards Data Science podcast’s “Climbing the Data Science Ladder” series, hosted by Jeremie Harris, Edouard Harris and Russell Pollari. Together, they run a data science mentorship startup called SharpestMinds. You can listen to the podcast below: If you’ve followed our podcast, you’ll know that a clear … Read more From Facebook to startups: data science is becoming an engineering problem

Laravel and Vue.js: Why Is This Couple Getting Popular?

You may be wondering how VueJS and Laravel could have anything to do with each other. VueJS is a Javascript framework and Laravel is a PHP framework, and there can’t possibly be any way that they could serve any purpose to each other, or could they? And remember, you can always hire a developer if … Read more Laravel and Vue.js: Why Is This Couple Getting Popular?

One survey, 100 decks and 1,000 slides in 10 minutes

Automating survey data analysis with open source software – Part II The entire research team is assembled in a meeting room. It’s 10pm. The coffee machine is gurgling, announcing that a fresh pot has been brewed. There are still at least four hours of work left. You know this because you’ve been here before; as … Read more One survey, 100 decks and 1,000 slides in 10 minutes

Five levels of analytical automation

I have been thinking more about how programming that requires minimal human input is a virtue in computer science, and hence machine learning, circles. Although there’s no doubt that is one of the central goals of programming a computer in general, I’m not convinced this extends to data analysis, which needs some thought, contextual knowledge … Read more Five levels of analytical automation

Super Solutions for Shiny Architecture 2/5: Javascript Is Your Friend

TL;DR: Three methods for using JavaScript code in Shiny applications to build faster apps, avoid unnecessary re-rendering, and add components beyond Shiny’s limits. Part 2 of a five-part series on super solutions for Shiny architecture. Why Javascript + Shiny? Many Shiny creators have a data science background, not a programming background, and are … Read more Super Solutions for Shiny Architecture 2/5: Javascript Is Your Friend

Notes from a panel II: Value of successful BBSRC grants

[This article was first published on Rstats – quantixed, and kindly contributed to R-bloggers]. This post follows on from the last post on BBSRC Responsive … Read more Notes from a panel II: Value of successful BBSRC grants

Learning Data Science: The Supermarket knows you are pregnant before your Dad does

A few months ago I posted about market basket analysis (see Customers who bought…); in this post we will see another form of it, done with Logistic Regression, so read on… A big supermarket chain wanted to target (wink, wink) certain customer groups better. In this special case we are talking about pregnant women. The … Read more Learning Data Science: The Supermarket knows you are pregnant before your Dad does

Fast adaptive spectral clustering in R (brain cancer RNA-seq)

Spectral clustering refers to a family of algorithms that cluster eigenvectors derived from the matrix that represents the input data’s graph. An important step in this method is applying a kernel function to the input data to generate an N×N similarity matrix or graph (where N is our number of input observations). … Read more Fast adaptive spectral clustering in R (brain cancer RNA-seq)

The Carbon Footprint of AI Research

What is the cost of making good machine learning models, doing AI research? Let’s do some rough back-of-the-envelope calculations. Let’s face it, research in AI is based on computing power. Most of the improvements in recent years reused very old ideas from mathematics and machine learning with more computing power. The amount … Read more The Carbon Footprint of AI Research

Conversational Sentiment Analysis

Methods for Determining Sentiment Towards Named Entities. I recently built a movie recommender that takes as input a user-written passage about liked and/or disliked movies. At the outset of the project I figured that determining which movies users liked and disliked would be simple. After all, using text to determine whether someone likes or … Read more Conversational Sentiment Analysis

Web Scraping and Visualizing Chess Data

I’ve been learning about web scraping and data visualization mainly through articles published on this site, and while I’ve found dozens of articles that give a quick intro to web scraping, very few go beyond that. I hope to provide suggestions on how to find data within a site’s html source code, how to scrape … Read more Web Scraping and Visualizing Chess Data

Foundations of AI

In their 1995 classic Artificial Intelligence: A Modern Approach, Berkeley’s Stuart J. Russell and Google’s Peter Norvig broke AI into five distinct research areas originating from the Total Turing test: Machine Learning, Expert Systems, Computer Vision, Natural Language Processing, and Robotics. Though the lines between these five disciplines have started to blur as we’ve … Read more Foundations of AI

New package: GetQuandlData

Example 01 – Inflation in the US. Let’s download and plot information about inflation in the US:
library(GetQuandlData)
library(tidyverse)
my_id <- c('Inflation USA' = 'RATEINF/INFLATION_USA')
my_api <- readLines('~/Dropbox/.quandl_api.txt') # you need your own API (get it at https://www.quandl.com/sign-up-modal?defaultModal=showSignUp)
first_date <- '2000-01-01'
last_date <- Sys.Date()
df <- get_Quandl_series(id_in = my_id, api_key = my_api, first_date = first_date, …
Read more New package: GetQuandlData

A R(API)D assessment of travel carbon emissions around the world

As a climate-conscious consumer, I’ve often wondered about the environmental impact of my routine travel from the East Coast to Salt Lake City and back. After seeing one of many recent headlines highlighting air travel’s surprisingly high rates of carbon emissions, I wondered: would taking a train, bus, or driving a car be better for … Read more A R(API)D assessment of travel carbon emissions around the world

You Say You Want a (Data) Revolution

Lessons learned making data a first-class citizen in enterprise Photo by Paul Skorupskas on Unsplash When subscription-based business models started to be all the rage, companies began to realize that being customer-centric is critical for survival — and success. At Gainsight, which helped create and champion the customer success category and where I was the … Read more You Say You Want a (Data) Revolution

Calibration Techniques of Machine Learning Models

CALIBRATION is a post-processing technique to improve the error distribution of a predictive model. The evaluation of machine learning (ML) models is a crucial step before deployment. It is essential to assess how well a model will behave for every single case. In many real applications, along with the mean error of the model, it is also … Read more Calibration Techniques of Machine Learning Models

Algorithmic Beauty: An Introduction to Cellular Automata

An overview of simple algorithms that generate complex, life-like results. The famous Rule 30, capable of generating pseudo-random numbers from simple, deterministic rules. Rule 30 was discovered by Stephen Wolfram in 1983. Cellular Automata (CA) are simultaneously one of the simplest and most fascinating ideas I’ve ever encountered. In this post I’ll go over some famous … Read more Algorithmic Beauty: An Introduction to Cellular Automata
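
As a small illustration of the Rule 30 automaton mentioned in the excerpt above, here is a minimal sketch in Python. The function names, the wrap-around edges, and the grid size are assumptions made for this example and do not come from the article.

```python
def rule30_step(cells):
    """Apply one Rule 30 update to a list of 0/1 cells (edges wrap around)."""
    n = len(cells)
    # Rule 30 in Boolean form: new cell = left XOR (center OR right).
    return [
        cells[(i - 1) % n] ^ (cells[i] | cells[(i + 1) % n])
        for i in range(n)
    ]


def run_rule30(width=31, steps=15):
    """Start from a single live cell in the middle and collect every row."""
    row = [0] * width
    row[width // 2] = 1
    rows = [row]
    for _ in range(steps):
        row = rule30_step(row)
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in run_rule30():
        print("".join("#" if cell else "." for cell in row))
```

Running the script prints the familiar triangular pattern, one row per time step.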

AWS Direct Connect Support for AWS Transit Gateway is now Available in Asia Pacific (Sydney) Region

AWS Direct Connect gateway allows you to access any AWS Region (except China) using your AWS Direct Connect connections. You can associate up to three Transit Gateways from any AWS Region with each Direct Connect gateway. AWS Direct Connect is introducing a new type of virtual interface called the transit virtual interface to support connectivity … Read more AWS Direct Connect Support for AWS Transit Gateway is now Available in Asia Pacific (Sydney) Region

How to Start a Data Science Project That Will Help You Stand Out

A practical way of looking for impactful projects to break into the field Someday in the future, I see myself digging into messy data to find answers and discover important strategic insights for a company. For many aspiring Data Analysts like me, the road to that destination is under construction. Knowing that you are competing … Read more How to Start a Data Science Project That Will Help You Stand Out

Give some semantic love to your keyword search!

Image source: https://ebiquity.umbc.edu At first, search engines (Google, Bing, Yahoo, etc.) were lexical: the search engine looked for literal matches of the query words, without an understanding of the query’s meaning and only returning links that contained the exact query. But, with the advent of machine learning and new techniques in the field of Natural … Read more Give some semantic love to your keyword search!

How to Learn Data Science for Free

Technical skills. The first part of the curriculum will focus on technical skills. I recommend learning these first so that you can take a practical-first approach rather than, say, learning the mathematical theory first. Python is by far the most widely used programming language for data science. In the Kaggle Machine Learning and … Read more How to Learn Data Science for Free

Semantic segmentation : visualization of learning progress by TensorBoard

Building and training neural networks is not a straightforward process unless you play with the MNIST dataset, a kind of “Hello world” application in the deep learning world. It is very easy to make a mistake and spend days wondering why the network does not perform as you expected. Normally, deep learning libraries have some … Read more Semantic segmentation : visualization of learning progress by TensorBoard

AI is “Smarter” than you, you’re “Better” than it, but you’re maybe not “Smart” enough to know it.

Photo by Rock’n Roll Monkey on Unsplash Provocative Controversy is the best. It’s this neat thing we do internally by feeling some logical truth which translates into some emotional sentiment and we call it “things” and make T-shirts out of it. Brains are funny little squishy meatballs, capable of imagining things bigger than our universe … Read more AI is “Smarter” than you, you’re “Better” than it, but you’re maybe not “Smart” enough to know it.

3 Essential Python Skills for Data Scientists

Lambda functions are just so powerful. Yeah, you won’t use them when you have to clean multiple columns the same way — but that’s not something that happened to me very often — more often than not, each attribute will require its own logic behind cleaning. Lambda functions allow you to create ‘anonymous’ functions. This … Read more 3 Essential Python Skills for Data Scientists
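
To illustrate the point about each attribute needing its own cleaning logic, here is a minimal sketch that keeps one lambda per column. The sample rows, column names, and the `cleaners` mapping are hypothetical and are not taken from the article.

```python
# Hypothetical records standing in for rows of a messy dataset.
rows = [
    {"name": "  alice ", "price": "$1,200"},
    {"name": "BOB", "price": "$850"},
]

# One anonymous function per attribute, each with its own cleaning logic.
cleaners = {
    "name": lambda s: s.strip().title(),
    "price": lambda s: float(s.replace("$", "").replace(",", "")),
}

# Apply the matching cleaner to every value in every row.
cleaned = [
    {column: cleaners[column](value) for column, value in row.items()}
    for row in rows
]
print(cleaned)
```

Keeping the lambdas in a dictionary keyed by column name makes it easy to see, and change, the rule applied to each attribute.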

Working with VSCode and Jupyter Notebook Style

If you are getting started with machine learning algorithms, you will come across Jupyter Notebook. To maximize efficiency you can integrate its concept with VS Code. As this requires some understanding of how to set up a Python environment, this article shall provide an introduction. There are a few reasons why it makes sense to develop … Read more Working with VSCode and Jupyter Notebook Style

Real-time Mobile Video Object Detection using Tensorflow

Full-stack Data Science. A step-by-step guide to adding object detection to your next mobile app. Photo by GeoHey. With the increasing interest in computer vision use cases like self-driving cars, face recognition, intelligent transportation systems, etc., people are looking to build custom machine learning models to detect and identify specific objects. However, building a … Read more Real-time Mobile Video Object Detection using Tensorflow

Detect and respond to high-risk threats in your logs with Google Cloud

Product Manager; Product Marketing Manager

Editor’s Note: This is the fourth blog and video in our six-part series on how to use Cloud Security Command Center. There are links to the three previous blogs and videos at the end of this post. Data breaches aren’t only getting more frequent, they’re getting more expensive. With regulatory and compliance fines, and business resources … Read more Detect and respond to high-risk threats in your logs with Google Cloud

Multi-Label Image Classification with Neural Network | Keras

The only challenge in multi-label classification is data imbalance. And we cannot simply use sampling techniques as we can in multi-class classification. Data imbalance is a well-known problem in Machine Learning, where some classes in the dataset are more frequent than others and the neural net just learns to predict the frequent classes. For … Read more Multi-Label Image Classification with Neural Network | Keras

Why Data Scientists & Researchers Need To Understand Product Management

A field of Anemones, Ektar 100, Ori Cohen. Reasons for learning product management as a data science researcher. As researchers, our primary job is to understand data. Understanding data can mean a plethora of different things, such as data analysis, feature engineering, algorithm development, model explainability or interpretability, result & error-analysis, etc. Working closely with … Read more Why Data Scientists & Researchers Need To Understand Product Management

Making a Game for Kids to Learn English and Have Fun with Python

There are a few techniques to learn, and then you can create your own game. Game Initialization
import pygame
# Game Init
pygame.init()
win = pygame.display.set_mode((640, 480))
pygame.display.set_caption("KidsWord presented by cyda")
run = True
while run:
    pygame.time.delay(100)
    for event in pygame.event.get():
        if event.type == pygame.QUIT:
            run = False
    pygame.display.update()
pygame.quit()
To start the game, we need a game window. There are two things to set. Window Size … Read more Making a Game for Kids to Learn English and Have Fun with Python