Job @ Oxford

Boby Mihaylova has two exciting posts available at the Health Economics Research Centre at the University of Oxford. In particular, she is looking for two R-minded researchers/analysts to develop work on disease modelling/cost-effectiveness using large individual-patient databases. In fact, I think it’s really good that they are explicitly including knowledge of R as part of … Read more

Categories R Tags ExcerptFavorite

How to save (and load) datasets in R: An overview

What I will show you: In this post, I want to show you a few ways to save your datasets in R. Maybe this seems like a dumb question to you. But after giving quite a few R courses mainly – but not only – for R beginners, I came to acknowledge that … Read more

Categories R Tags ExcerptFavorite

When and when not to A/B test

Experiment: Split vs. Bandit. Setup: We are going to test two different methods (the chi-squared split test – “Split” hereafter – and the Thompson beta bandit – “Bandit”) with the objective of maximizing the cumulative successes (e.g. ad clicks) by sequentially (over several periods, e.g. days) choosing trials (e.g. presenting ads to users) from multiple options … Read more
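
As a rough illustration of the “Bandit” side of this comparison, here is a minimal Python sketch of Thompson sampling with Beta posteriors over several options. The success rates, the uniform Beta(1, 1) prior, the trial count, and the function name `thompson_beta_bandit` are illustrative assumptions, not taken from the post.

```python
import random


def thompson_beta_bandit(true_rates, n_trials, seed=0):
    """Thompson sampling on Bernoulli options with Beta(1, 1) priors.

    true_rates is a hypothetical list of success probabilities (e.g. ad
    click rates); in a real experiment these are unknown.
    """
    rng = random.Random(seed)
    successes = [0] * len(true_rates)
    failures = [0] * len(true_rates)
    for _ in range(n_trials):
        # Draw one plausible success rate per option from its Beta posterior.
        samples = [
            rng.betavariate(s + 1, f + 1)
            for s, f in zip(successes, failures)
        ]
        # Play the option whose sampled rate is highest.
        arm = samples.index(max(samples))
        # Observe a simulated success or failure and update the counts.
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures


successes, failures = thompson_beta_bandit([0.04, 0.05, 0.10], n_trials=5000)
pulls = [s + f for s, f in zip(successes, failures)]
print("trials per option:", pulls)
print("total successes:", sum(successes))
```

Because each option is played in proportion to how often its sampled rate comes out on top, trials tend to shift toward the better options as evidence accumulates, whereas a split test keeps the allocation fixed until the test concludes.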

Viewing text through the eyes of a machine

We have been able to lift the lid on Convolutional Neural Networks (CNN) in computer vision tasks for a number of years now. This has brought with it significant improvements to the field through: increased robustness of models; visibility of, and reduction of, model bias; and a better understanding of how adversarial images can alter … Read more

Blame the game, not the player

How Game Theory helps us understand why sometimes everyone loses. A tennis player decides to dope before his next game in order to gain an edge over his opponent. A retailer decides to keep their store open on Sundays because their competitor started doing so, too. Coca-Cola drops its beverage prices in order to gain an … Read more

Creating data frame using structure() function in R

The structure() function is a simple yet powerful function that describes a given object with given attributes. It is part of the base R library, so there is no need to load any additional package. And since the function was part of the S language, it has been in the base library since early versions, making it … Read more

Categories R Tags ExcerptFavorite

SatRday comes to the Baltic Sea for the first time @Gdańsk

There are probably thousands of things to do in the middle of May at the Baltic seaside, but in 2019 the place to be was Gdańsk. Despite the dreadful weather, Gdańsk offered the hottest R event in Europe at the time: the very first edition of SatRday Gdańsk. Appsilon had the pleasure to not … Read more

Categories R Tags ExcerptFavorite

Autonomous Agents And Multi-Agent Systems 101: Agents And Deception

This article provides a brief introduction to the area of autonomous agents and multi-agent systems. Furthermore, a perspective on the deception mechanisms used by agents is presented. Humans use deception mechanisms to gain an advantage over other humans 😶. Some of the most typical mechanisms are (1), not sharing their beliefs … Read more

Linear Regression with Healthcare Data for Beginners in R

In this post I will show how to build a linear regression model. As an example, I will evaluate the association between vitamin D and calcium in the blood. Because the variable of interest (i.e., calcium levels) is continuous, linear regression is the analysis to use. I will … Read more

Categories R Tags ExcerptFavorite

Get Started With TensorFlow 2.0 and Linear Regression

TensorFlow 2.0 has been a major breakthrough in the TensorFlow family. It’s completely new and refurbished and also less creepy! We’ll create a simple Linear Regression model in TensorFlow 2.0 to explore some new changes. So, open up your code editors and let’s get started! Also, open up this notebook for an interactive learning experience. … Read more

An introduction to Convolutional Neural Networks

Describing what Convolutional Neural Networks are, how they function, how they can be used and why they are so powerful. A convolutional neural network (CNN) is a neural network that has one or more convolutional layers and is used mainly for image processing, classification, segmentation and also for other autocorrelated data. A convolution is essentially … Read more

Real Time Anomaly Detection with AWS

As a result, our pipeline is like below. You might disconnect Lambda functions, or SNS, and replace with another service you want. This approach offers flexibility while it keeps self-management and durability thanks to AWS tools. Data Analytics App Creating a Data Analytics app in Kinesis is fairly easy: We select the app engine, either … Read more

Reinforcement Learning is full of Manipulative Consultants

When there are variance differences in environments used to train reinforcement learning algorithms, weird things happen. Value estimation networks prefer low-variance areas regardless of the rewards, which makes them “Manipulative Consultants”. Q-learning algorithms get stuck in the “Boring Areas Trap” and can’t get out due to the low variance. Reward noising can help but … Read more

10 Jobs for R users from around the world (2019-05-27)

To post your R job on the next post: just visit this link and post a new R job to the R community. You can post a job for free (and there are also “featured job” options available for extra exposure). Current R jobs: Full-Time Customer Success Representative, RStudio – Posted by beckybajan, Seattle, Washington, United States, 6 May 2019. Full-Time Data Scientist @ … Read more

Categories R Tags ExcerptFavorite

emayili: Sending Email from R

At Exegetic we do a lot of automated reporting with R. Being able to easily and reliably send emails is a high priority. There is already a selection of packages for sending email from R: We’ve had the most experience with the first two, both of which are really solid packages. However, {gmailr} uses the … Read more

Categories R Tags ExcerptFavorite

startup – run R startup files once per hour, day, week, …

New release: startup 0.12.0 is now on CRAN. This version introduces support for processing some of the R startup files with a certain frequency, e.g. once per day, once per week, or once per month. See below for two examples. startup::startup() is cross platform. The startup package makes it easy to split up a long, … Read more

Categories R Tags ExcerptFavorite

nanotime 0.2.4

Another minor maintenance release of the nanotime package for working with nanosecond timestamps arrived on CRAN yesterday. nanotime uses the RcppCCTZ package for (efficient) high(er) resolution time parsing and formatting up to nanosecond resolution, and the bit64 package for the actual integer64 arithmetic. Initially implemented using the S3 system, it now uses a more rigorous … Read more

Categories R Tags ExcerptFavorite

The ‘see’ package: beautiful figures for easystats

The see package: We have recently decided to collaborate around the new easystats project, a set of packages designed to make your life easier. This project encompasses several packages, devoted for instance to model access or Bayesian analysis, indices of model performance or visualisation. Without further ado, please let us introduce the latest addition to … Read more

Categories R Tags ExcerptFavorite

Demystifying Regular Expressions in R

Introduction: In this post, we will learn about using regular expressions in R. While it is aimed at absolute beginners, we hope experienced users will find it useful as well. The post is broadly divided into 3 sections. In the first section, we will introduce the pattern matching functions such as grep, grepl etc. in base R as we … Read more

Categories R Tags ExcerptFavorite

A Deep Dive Into Imbalanced Data: Over-Sampling

Learn how to use imbalanced-learn to improve your performance. When implementing classification algorithms, the structure of your data is of great significance. Specifically, the balance between the number of observations for each potential output heavily influences your prediction’s performance (I intentionally avoided using the word “accuracy” for reasons I will later elaborate on in … Read more

In 12 minutes: Stocks Analysis with Pandas and Scikit-Learn

Analyse, visualize and predict stock prices quickly with Python. Predicting Stocks with Data Analysis: One day, a friend of mine told me that the key to financial freedom is investing in stocks. While this is largely true during a market boom, trading stocks part time still remains an attractive option today. Given the easy access … Read more

Google Coral USB Accelerator Introduction

Live Classification/Object Detection and External Camera Support Coral also provides us with a live image classification script called which uses the PiCamera library to get images from a webcam which will then be displayed with their respective label. The only problem with this script is that it can only be used with a PiCamera. In … Read more

Gale–Shapley algorithm simply explained

From this article, you will learn about the stable pairing, or stable marriage, problem. You will learn how to solve that problem using Game Theory and the Gale-Shapley algorithm in particular. We will use Python to create our own solution using the theorem from the original paper from 1962. What is a stable marriage or pairing problem? In … Read more
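
For readers who want a feel for the algorithm before following the link, here is a minimal Python sketch of the proposer-side deferred acceptance procedure. It is not the linked article's code. The function name `gale_shapley`, the dictionary-based input format, and the example names are assumptions made for illustration, and the sketch assumes equally sized groups with complete preference lists.

```python
def gale_shapley(proposer_prefs, receiver_prefs):
    """Return a stable matching as a dict mapping proposer -> receiver.

    Both arguments map each person to a list of all members of the other
    group, ordered from most to least preferred (an assumed input format).
    """
    # rank[r][p] is the position of proposer p in receiver r's list
    # (a lower number means r likes p more).
    rank = {
        r: {p: i for i, p in enumerate(prefs)}
        for r, prefs in receiver_prefs.items()
    }
    next_choice = {p: 0 for p in proposer_prefs}  # next receiver to try
    engaged_to = {}  # receiver -> proposer currently held
    free = list(proposer_prefs)

    while free:
        p = free.pop()
        r = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        current = engaged_to.get(r)
        if current is None:
            # r holds no proposal yet, so r tentatively accepts p.
            engaged_to[r] = p
        elif rank[r][p] < rank[r][current]:
            # r prefers p and releases the current partner.
            engaged_to[r] = p
            free.append(current)
        else:
            # r rejects p, who will try the next receiver on the list.
            free.append(p)

    return {p: r for r, p in engaged_to.items()}


proposers = {"A": ["X", "Y", "Z"], "B": ["Y", "X", "Z"], "C": ["X", "Y", "Z"]}
receivers = {"X": ["B", "A", "C"], "Y": ["A", "B", "C"], "Z": ["A", "B", "C"]}
matching = gale_shapley(proposers, receivers)
print(matching)  # pairs A with X, B with Y, C with Z
```

The matching is stable because no proposer and receiver both prefer each other to their assigned partners: here C would rather have X or Y, but X prefers A and Y prefers B.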

WoE Transformation for Loss Given Default Models

In the intro section of my MOB package, the reasons for and benefits of using WoE transformations in the context of logistic regressions with binary outcomes were discussed. What’s more, a similar idea can easily be generalized to other statistical models in the credit risk area, such as LGD (Loss Given Default) models with fractional … Read more

Categories R Tags ExcerptFavorite

Two New Ways to Make DNS over HTTPS Queries in R

A fair bit of time ago the {gdns} package made its way to CRAN to give R users the ability to use Google’s (at that time) nascent support for DNS over HTTPS (DoH). A bit later on Cloudflare also provided a global DoH endpoint and that begat the (not-on-CRAN) {dnsflare} package. There are actually two … Read more

Categories R Tags ExcerptFavorite

Turn A Square: generative aRt

A while back I visited Artistes & Robots in Paris. Part of the exhibition was on the origins of computer-based art. Nowadays this is referred to as generative art, where computers generate artwork according to rules specified by the programmer. I wanted to emulate some of the early generative artwork I saw there, using R. Some … Read more

Categories R Tags ExcerptFavorite

Predictability of Tennis Grand Slams

The European tennis season is in full swing, with Roland Garros starting today and Wimbledon taking place in a few weeks. For a sports buff like me, it is the essence of summer (together with the Tour de France). Time to dive into some tennis data. As a follower of both the men’s and the … Read more

Categories R Tags ExcerptFavorite

Bayes’ Theorem — Some Perspectives

“When you change the way you look at things, the things you look at change.” ―Wayne Dyer. Garychl, May 26. Recently I have been reading books on the history of mathematics, which remind me that all complicated-looking equations started from a small, real-world problem. The benefit of studying a small problem is that it … Read more

10 New Things I Learnt from v3

0. & Transfer Learning “It’s always good to use transfer learning [to train your model] if you can.” — Jeremy Howard is synonymous to transfer learning and achieving great results in a short amount of time. The course really lives up to its name. Transfer learning and experimentalism are the two key ideas that Jeremy Howard keeps … Read more

AI investment activity – trends of 2018

AI hype slowdown, building cognitive tech stack, vertical integration and other observations. This review highlights trends in launching/investing in artificial intelligence (AI) startups from 2018. It contains an analysis of 47 AI startups that were launched in 2018 and managed to raise at least $1M each. Also, it includes a … Read more

Belief Propagation in Bayesian Networks

Bayesian Network Inference: In this article, I’ll be using Belief Propagation (BP) with some example data. I presume that you already know about Bayesian Networks (BN). This post explains how to calculate the beliefs of different variables in a BN, which helps with reasoning. Belief Propagation: I created a repository with the … Read more

Forecasting: how to detect outliers?

(The article below is an extract from the book Data Science for Supply Chain Forecast, available here.) “I shall not today attempt further to define this kind of material (…), and perhaps I could never succeed in intelligibly doing so. But I know it when I see it.” (Potter Stewart) In 1964, Potter Stewart was a … Read more

Machine Learning has never been this easy: Feature Engineering Concepts in 6 questions

Key terms: feature normalization, categorical features, one hot representation, feature crosses, text representation, TFIDF, N-gram, Word2Vec. This article is written for people who are keen to master machine learning concepts and skills required for machine learning jobs quickly by going through a set of popular and useful questions. Any comments and suggestions are welcome. Background … Read more

10 Python image manipulation tools

4. PIL/Pillow: PIL (Python Imaging Library) is a free library for the Python programming language that adds support for opening, manipulating, and saving many different image file formats. However, its development has stagnated, with its last release in 2009. Fortunately, there is Pillow, an actively developed fork of PIL which is easier to install; runs on … Read more

Dracarys!— Use Docker Machine, PyTorch & Gigantum for Portable & Reproducible GPU Workflows

TL;DR Manually creating portable & reproducible GPU workflows is fragile, skill intensive & laborious, even with containers. Luckily, you can more or less automate things using Docker Machine, PyTorch & Gigantum. We use these three things to demonstrate a robust system to create workflows that move seamlessly between laptop and cloud, CPU & GPU. Assumptions — You … Read more

Classification: Sigmoid vs. Softmax

When designing a model to perform a classification task (e.g. classifying diseases in a chest x-ray or classifying handwritten digits) we want to tell our model whether it is allowed to choose many answers (e.g. both pneumonia and abscess) or only one answer (e.g. the digit “8”). This post will discuss how we can achieve … Read more
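
To make the “many answers” versus “only one answer” distinction concrete, here is a small Python sketch using only the standard library. The logit values are made up for illustration, and the helper names `sigmoid` and `softmax` are our own rather than the post's code.

```python
import math


def sigmoid(logits):
    """Score each class independently; the outputs need not sum to 1."""
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]


def softmax(logits):
    """Turn the logits into one probability distribution over classes."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw model outputs for three classes.
logits = [2.0, 1.0, -1.0]

multi_label = sigmoid(logits)   # several classes can be "on" at once
single_label = softmax(logits)  # classes compete for probability mass

print("sigmoid:", [round(p, 3) for p in multi_label])
print("softmax:", [round(p, 3) for p in single_label])
```

With sigmoid, each score is an independent yes/no probability, so two diseases can both score highly. With softmax, raising one class's probability necessarily lowers the others, which suits tasks with exactly one correct label.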

Visualizing Musical Performance

As a musician and a data scientist, I have been intrigued by the thought of visualizing musical performances. In this post, I outline how to visualize piano performance recordings from the MAESTRO dataset. Examples are provided throughout the post. Below, I lay out step-by-step instructions, with code, for opening, cleaning, and visualizing piano performances … Read more

Do Stocks Provide a Positive Expected Return?

Testing our Hypothesis with some Simulations: Instead of calculating test statistics and running a formal hypothesis test, let’s visualize the process by running some simulations (a very similar analysis in spirit). I ran 5,000 one-year simulations with the following assumptions: Stock returns are normally distributed with an expected return of 10.9% and standard deviation … Read more

Developing the Simplex Method with NumPy and Matrix Operations

Upon taking classes in operations research or optimization (particularly at the undergraduate level) and reviewing the resources available online that cover the Simplex Method, one will almost certainly be introduced to the tableau method for solving linear programming problems with the Simplex method. Some examples of solving linear programming problems with the tableau method are … Read more

Uploading Large Files to GitHub

3. Git LFS: You might have noticed that the above-mentioned methods both avoid uploading the large files. What if you really want to upload them so that you can access them on another device? Git Large File Storage lets you store them on a remote server such as GitHub. Download and install git-lfs by … Read more

Poll position: statistics and the Australian federal election

One of the few people in Australia who did not write off a possible Coalition win at the recent federal election was Peter Ellis. We’ve invited him to come and give a talk about making sense of opinion polls and the Australian federal election on Friday this week at Monash University. Visitors are welcome. Here … Read more

Categories R Tags ExcerptFavorite

Rapid Progress or Shallow Understanding?

If there’s one thing we all hate, it’s formatting bibliographies. Yes, you can think it’s important to credit others’ work and yes, it can be satisfying to be organised, but does anyone really care about having exactly the right words italicised? Of course, you could avoid this problem entirely by using automatic online tools, but … Read more

RStudio in Docker – now share your R code effortlessly!

If you are a full-time data science practitioner and have passed through the stages of starting out with the Titanic dataset and working through the various exercises in Kaggle, you would know by now that we wish real-world data problems were that simple, but they are not! This post is about just one … Read more

Categories R Tags ExcerptFavorite

Data Science Job in 90 days – Book Review

Are you an R programmer or data science enthusiast looking for a break in the data science field? If so, my latest book “Data Science Jobs – land a lucrative job in 90 days” will help you find one quickly. [Author’s note – The ebook is FREE ONLY until midnight this Sunday (May 26th). So hurry and grab … Read more

Categories R Tags ExcerptFavorite