How A.I Will Enhance Content Marketing in the Future

Advances in A.I software and how they could improve the content marketing industry. Artificial intelligence is starting to shape critical industries across the world on a considerable scale; this is evident in the marketing, gaming, healthcare, tech, and finance industries. The profits gained from utilizing A.I technology can be monumental; this can … Read more

Thompson Sampling For Multi-Armed Bandit Problems (Part 1)

Using Bayesian Updating For Online Decision Making “Multi-armed bandit” is perhaps the coolest term in data science, excluding financial applications to “naked European call options” or “short iron butterflies”. They are also among the most commonly encountered practical applications. The term has a helpful motivating story: a “one armed bandit” refers to a slot machine — pull the … Read more
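The Bayesian updating idea in this excerpt can be sketched for the simplest case, a slot machine with several arms that each pay out 0 or 1. This is a minimal sketch, not the post's own code: it assumes Bernoulli rewards and uniform Beta(1, 1) priors, and the function name `thompson_sampling` and the payout rates are hypothetical.

```python
import random


def thompson_sampling(true_rates, n_rounds, seed=0):
    """Bernoulli bandit with Beta(1, 1) priors (true_rates are made-up payout rates)."""
    rng = random.Random(seed)
    n_arms = len(true_rates)
    wins = [0] * n_arms    # observed successes per arm
    losses = [0] * n_arms  # observed failures per arm
    for _ in range(n_rounds):
        # Draw one plausible payout rate per arm from its Beta posterior.
        samples = [rng.betavariate(wins[a] + 1, losses[a] + 1) for a in range(n_arms)]
        # Pull the arm whose sampled rate is highest.
        arm = max(range(n_arms), key=lambda a: samples[a])
        # Observe a 0/1 reward and update that arm's counts.
        if rng.random() < true_rates[arm]:
            wins[arm] += 1
        else:
            losses[arm] += 1
    return wins, losses


wins, losses = thompson_sampling([0.2, 0.5, 0.8], n_rounds=2000)
pulls = [w + l for w, l in zip(wins, losses)]
print(pulls)
```

Because each arm is chosen in proportion to how plausible it is that the arm is the best one, pulls tend to concentrate on the highest-paying arm as evidence accumulates.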

A Perceptron of the Artist as a Young Man

In which a neural network and I enjoy a book together. “Only five people in the world have read and understood Ulysses. I am not one of them.” My high school English teacher said this to my class many years ago. He was good at one-liners. (Another was, “You only have to explain 17% of a … Read more

Behind the Models: Beta, Dirichlet, and GEM Distributions

Building Blocks For Non-Parametric Bayesian Models In a future post I want to cover non-parametric Bayesian models — these models are infinite-dimensional and allow for expansive online learning. But first I want to cover some of the building blocks: Beta, Dirichlet, and GEM distributions. These distributions have several helpful properties that provide for a wide variety of machine … Read more

Getting an environment’s name in R: the envnames package

Looking for an object in nested environments The following picture shows an environment space that highlights the connections between package and system environments (child -> parent relationships) and in particular the use of user-defined environments (outer_env and nested_env), which are part of the global environment and may be regarded as nested environments (within the global … Read more

5 Questions to Ask Before Building a Readmissions Model

1. What Intervention? Before you jump to exact details, think about the big picture for a moment. Brainstorm how you’re going to use these predictions. Does your organization have interventions in place for patients that are deemed “high-risk” for readmissions? Will the patients be assigned a dedicated nurse while they are in the hospital? Will … Read more

Making a Command Line HTML Rendering Script for “The Art of the Command Line” (in R)

The Feedly category I have set up for git-stalking has indicated a fairly massive interest in Joshua Levy’s The Art of the Command Line. What is “The Art of the Command Line”? To quote the author(s): Fluency on the command line is a skill often neglected or considered arcane, but it improves your flexibility and productivity … Read more

Full EARL London 2019 agenda available

Once again, we are delighted to announce a stellar line up of speakers for this year’s EARL Conference; from Retail and Insurance to Media, Manufacturing and Pharmaceutical, the range of industries now using R stats in their workflow continues to grow. If you are interested to hear why companies such as BBC News, BMW Group, Arla … Read more

Interested in AI Policy? Start writing

Photo by Glenn Carstens-Peters on Unsplash Recently, OpenAI’s Amanda Askell, Miles Brundage, and Jack Clark joined Rob Wiblin on the 80,000 Hours podcast to discuss a wide range of topics related to AI philosophy, policy, and publication norms. During the conversation, they also discussed where to start if you’re trying to understand AI and AI … Read more

What is Wavelet and How We Use It for Data Science

Hello, this is my second post on the signal processing topic. For now, I’m interested in learning more about signal processing to understand a certain paper. And to be honest, for me this wavelet thing is harder to understand than the Fourier Transform. Once I felt I understood this topic reasonably well, I realized something. … Read more

Norms, Penalties, and Multitask learning

Introduction A regularizer is commonly used in machine learning to constrain a model’s capacity to certain bounds either based on a statistical norm or on prior hypotheses. This adds preference for one solution over another in the model’s hypothesis space, or the set of functions that the learning algorithm is allowed to select as being … Read more

RODBC helper function

The number of times I have to connect to SQL and I forget part of the RODBC command to connect to an internal data table. As part of a project I am working on I have been connecting to lots of different sources and became tired of typing lots of lines and repeating the same … Read more

Analyzing Anime data in R

If you are a fan of Anime then you are going to love this analysis I did in R. This data comes from the MyAnimeList website and was sourced as part of the Tidy Tuesday initiative by the R for Data Science community. You can download a tidy version of this data from here. They … Read more

What is a Data Engineer?

Now this isn’t an article about the battle of Data Engineers vs Data Scientists, there’s no beef here. Instead this article comes off the back of the sea of articles I’ve seen recently talking about this exact point: that 80% of a Data Scientist’s work is data preparation and cleansing. So I’m going to talk … Read more

How I Found My First Job in Data Analytics

Tips, tricks, mindset and more! Author’s Note: This post was originally posted on the 2nd of July 2018 and has been reposted here after I shut down the domain. For as long as I can remember, I have always been anxious about whether I would be able to find a job. While many people might not … Read more

My RStudio Configuration

Whenever I need to install RStudio on a new machine, I have to think a bit about the configuration options I’ve tweaked. Invariably, I miss a checkbox that leaves me with slightly different RStudio behavior on each system. This post includes screenshots of my RStudio configuration and custom keyboard shortcuts for RStudio 1.3, MacOS, so … Read more

Role of Machine Learning in redefining Retail Banking

The banking industry is going through a transformational journey with the comprehensive usage of Advanced Analytics algorithms in the day-to-day business of core banking. Customer acquisition through various channels, existing customer engagement, predicting defaulters on credit card or loan applications, etc. are a few of the areas where analytics is doing a tremendous job. I will … Read more

Why you should Double-DIP for Natural Image Decomposition

“Double-DIP”: Unsupervised Image Decomposition via Coupled Deep-Image-Priors The key aspect of Double-DIP is inherent in the fact that the distribution of small patches within each decomposed layer is “simpler” (more uniform) than in the original mixed image. Let’s simplify it with an example. Observe the illustrative example in Figure 3a. Two different textures, X … Read more

K-Means Clustering with scikit-learn

Fundamentals of K-Means Clustering As we will see, the k-means algorithm is extremely easy to implement and is also computationally very efficient compared to other clustering algorithms, which might explain its popularity. The k-means algorithm belongs to the category of prototype-based clustering. Prototype-based clustering means that each cluster is represented by a prototype, which can … Read more
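The prototype idea described in this excerpt can be sketched in a few lines. This is a minimal illustration in plain Python of the classic assign-then-update loop (Lloyd's algorithm), not the scikit-learn API the post's title refers to; the function name `kmeans`, the toy points, and the starting centroids are all hypothetical.

```python
def kmeans(points, centroids, n_iters=10):
    """Lloyd's algorithm on 2-D points; `centroids` are the initial prototypes."""
    for _ in range(n_iters):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: (p[0] - centroids[i][0]) ** 2 + (p[1] - centroids[i][1]) ** 2,
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c)) if c else old
            for c, old in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical toy data: two well-separated groups of three points.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids)
```

Each returned centroid is the prototype of its cluster: the mean of the points assigned to it.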

What is machine learning and deep learning?

A series on the Fundamentals of Machine Learning and Deep Learning The best introduction ever that you can get about machine learning and deep learning. (extracted from here) Throughout this series, links will be provided where you can find more information about the subjects covered. Feel free to explore during or after the reading. I was searching … Read more

How to start a new package with testing in R

# Navigate where you want your folder to be located
setwd("C:/Users/chief/Documents/Github")
# Assumes usethis is installed
usethis::create_package("foo")
# Say yes or no to next (annoying) popup window, it doesn't matter.
# Add a test environment
setwd("foo")
usethis::use_testthat()
# Add first test function to at least get something in that folder.
# Go to foo\tests\testthat
# and add this file.
context("foo")
library(foo)
test_that("I'm testing something", {
  # do something … Read more

An Introduction to Virtual Adversarial Training

Virtual Adversarial Training is an effective regularization technique which has given good results in supervised learning, semi-supervised learning, and unsupervised clustering. This is a re-post of the original post: Get the source code used in this post from here Virtual adversarial training has been used for: Improving supervised learning performance Semi-supervised learning Deep unsupervised … Read more

More Bayes and multiple comparisons

In my last post I had a little fun comparing perspectives among Bayesian, frequentist and programmer methodologies. I took a nice post from Anindya Mozumdar from the R Bloggers feed and investigated the world’s fastest man. I’ve found that in writing these posts two things always happen. I learn a lot, and I have follow-on questions or thoughts. This time is no exception, … Read more

Making a DotA2 Bot Using ML

The bot roster Problem In December of 2018, the creators of AI Sports gave a presentation and introduced the DotA2 AI Competition to the school. DotA (Defense of the Ancients) is a game played by two teams, each consisting of five players who can choose from over one hundred different heroes. The goal of the game … Read more

78th #TokyoR Meetup Roundup!

With the arrival of summer, another TokyoR User Meetup! On May 25th, useRs from all over Tokyo (and some even from further afield – including Kan Nishida of Exploratory, all the way from California!) flocked to Jimbocho, Tokyo for another jam-packed session of R hosted by Mitsui Sumitomo Insurance Group. Like my previous round up posts (for TokyoR #76 and TokyoR #77) I … Read more

A Beginner’s Guide to Word Embedding with Gensim Word2Vec Model

1. Introduction of Word2vec Word2vec is one of the most popular techniques to learn word embeddings using a two-layer neural network. Its input is a text corpus and its output is a set of vectors. Word embedding via word2vec can make natural language computer-readable, then further implementation of mathematical operations on words can be used to … Read more

How to Become a Data Scientist

This question and its variations are the most searched topics on Google. As a practicing data science professional, and manager to boot, dozens of people ask me this question every week. This post is my honest and detailed answer. Step 1 – Coding & ML skills You need to master programming in either R or Python. … Read more

Creating Azure Logic Apps from R using httr

Logic Apps is a serverless framework in Azure quite similar to IFTTT (if this, then that) and Zapier that allows you to connect different services and create workflows. You can define different types of triggers based on: time and events (e.g. http requests, messages received, …) to start workflows. Logic Apps can be created using a … Read more

Reinventing Personalization For Customer Experience

Why? What? How? Atif M. May 30 “Remember that a person’s name is, to that person, the sweetest and most important sound in any language.” — Dale Carnegie, How to Win Friends and Influence People When it comes to building good relationships with customers, learning their names is an essential step for businesses at any level. Consumers expect … Read more

How to use ggplot2 in Python

Introduction Thanks to its strict implementation of the grammar of graphics, ggplot2 provides an extremely intuitive and consistent way of plotting your data. Not only does ggplot2’s approach to plotting ensure that each plot comprises certain basic elements but it also simplifies the readability of your code to a great extent. However, if you are … Read more

Introduction to Latent Matrix Factorization Recommender Systems

Latent Factors are “Hidden Factors” unseen in the data set. Let’s use their power. Latent Matrix Factorization is an incredibly powerful method to use when creating a Recommender System. Ever since Latent Matrix Factorization was shown to outperform other recommendation methods in the Netflix Recommendation contest, it’s been a cornerstone in building … Read more

How to Teach Code

Part 2 — Lecturing teaches nothing, Make the complex simple Common Mistakes A teacher will commonly like to show students everything they need to ever know about a concept so they can kick off and be a pro. This could be an hour lecture. Computer scientists (after learning C) can handle that. Code newbies can’t. After the first … Read more

Which 2020 Candidate is the Best at Twitter?

A Data Analysis of the 2020 Democratic Candidate Twitter Accounts Photo by George Pagan III on Unsplash The contest for the 2020 Democratic party nomination will be fought in many arenas. Before the first debates in a month, before the campaign rallies in key states, and even before prime time TV interviews, the fight for the nomination … Read more

An Easy Introduction to SQL for Data Scientists

SQL (Structured Query Language) is a standardised programming language designed for data storage and management. It allows one to create, parse, and manipulate data quickly and easily. With the AI-hype of recent years, technology companies serving all kinds of industries have been forced to become more data driven. When a company that serves thousands of … Read more

Databricks: How to Save Files in CSV on Your Local Computer

3. Download the CSV file on your local computer In order to download the CSV file located in DBFS FileStore on your local computer, you will have to change the highlighted URL to the following:–63d7293d-3b02–43ff-b461-edd732f9e06e-4704-c000.csv?o=3847738880082577 As you noticed, the CSV path in bold (df/Sample.csv/part-00000-tid-8365188928461432060–63d7293d-3b02–43ff-b461-edd732f9e06e-4704-c000.csv) is from step 2. The number (3847738880082577) is from the original … Read more

RoboSomm Chapter 3: Wine Embeddings and a Wine Recommender

One of the cornerstones of previous chapters of the RoboSomm series has been to extract descriptors from professional wine reviews, and to convert these into quantitative features. In this article, we will explore a way of extracting features from wine reviews that combines the best of the existing RoboSomm series and academic literature on this … Read more

Quick and easy t-SNE analysis in R

t-SNE is a useful dimensionality reduction method that allows you to visualise data embedded in a lower number of dimensions, e.g. 2, in order to see patterns and trends in the data. It can deal with more complex patterns of Gaussian clusters in multidimensional space compared to PCA. Although it is not suited to finding outliers … Read more

xaibot – conversations with predictive models!

If you could talk to a predictive machine learning model, what would you ask for? Try! Michał Kuźba is developing a mind-blowing project – xai chat-bot. A dialog-based system that helps to explore and understand predictive models through natural language conversations (type, speak or phone the model). For example, imagine that you have a … Read more

Hypothesis testing visualized

Literally seeing how stat tests work In this article, we’ll get an intuitive, visual feel for hypothesis testing. While there are many articles online that explain it in words, there aren’t nearly enough that rely primarily on visuals, which is surprising since the subject lends itself quite well to exposition through pictures and movies. But before … Read more

How to Automate Hyperparameter Optimization

In the machine learning and deep learning paradigm, model “parameters” and “hyperparameters” are two frequently used terms where “parameters” define configuration variables that are internal to the model and whose values can be estimated from the training data and “hyperparameters” define configuration variables that are external to the model and whose values cannot be estimated … Read more
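The distinction drawn in this excerpt can be made concrete with a small sketch. It assumes a one-variable ridge regression through the origin: the slope `w` is a parameter estimated from the training data, while the penalty strength `lam` is a hyperparameter chosen from outside, here by a plain grid search on a held-out set. Grid search is just one simple strategy and is not necessarily the method the post goes on to use; the data, the grid, and the function names are hypothetical.

```python
def fit_slope(xs, ys, lam):
    """Parameter: the slope w minimizing sum((y - w*x)^2) + lam * w^2 (closed form)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def validation_error(w, xs, ys):
    """Mean squared error of the fitted slope on held-out data."""
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical toy data, roughly y = 2x with some noise in the training set.
train_x, train_y = [1.0, 2.0, 3.0], [2.5, 3.5, 6.5]
valid_x, valid_y = [4.0, 5.0], [8.0, 10.0]

# Hyperparameter: lam cannot be estimated from the training fit itself,
# so each candidate is scored on the validation set.
grid = [0.0, 0.1, 1.0, 10.0]
best_lam = min(
    grid,
    key=lambda lam: validation_error(fit_slope(train_x, train_y, lam), valid_x, valid_y),
)
best_w = fit_slope(train_x, train_y, best_lam)
print(best_lam, best_w)
```

Automating hyperparameter optimization amounts to replacing the hand-written grid with a smarter way of proposing the next value of `lam` to try.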

Cognitive capitalism chapter reworked

The Cognitive capitalism chapter of my evidence-based software engineering book took longer than expected to polish; in fact it got reworked, rather than polished (which still needs to happen, and there might be more text moving from other chapters). Changing the chapter title, from Economics to Cognitive capitalism, helped clarify lots of decisions about the … Read more

What I Learned from (Two-time) Kaggle Grandmaster Abhishek Thakur

Drawing insights from Abhishek Thakur’s NLP kernel Photo by Georgie Cobbs on Unsplash Quick Bio Before his many data scientist stints in companies scattered throughout Germany, Abhishek Thakur earned his bachelor’s in electrical engineering at NIT Surat and his master’s in computer science at the University of Bonn. Currently, he holds the title of Chief Data Scientist … Read more

Buyers beware, Fake product reviews are plaguing the internet.

Spotting Fake Product Reviews using Machine Learning Opinion spamming is an aggravating situation. For instance, CBS News reports that 52% of product reviews posted in are “inauthentic or unreliable”, while at least 30% of reviews posted at Amazon are fake. The problem of identifying opinion spamming remains an open topic, despite the fact … Read more


I had the pleasure to present at the following events and conferences: Upcoming: useR 2019 – Toulouse: ‘Serverless Computing in R’ PyDays Vienna 2019: ‘Hydrogen & Pweave – A better Jupyter Notebook?’ Vienna Applied AI Meetup by AI Austria Meetup ‘Serverless computing: AWS Lambda with R and Docker as a Service’ Vienna-R Meetup ‘Serverless computing … Read more

April 2019: “Top 40” New CRAN Packages

One hundred eighty-seven new packages made it to CRAN in April. Here are my picks for the “Top 40”, organized into ten categories: Biotechnology, Data, Econometrics, Machine Learning, Medicine, Science, Statistics, Time Series, Utilities, and Visualization. Biotechnology genpwr v1.00: Provides functions for power and sample size calculations for genetic association studies allowing for mis-specification of … Read more