Job @ Oxford

Boby Mihaylova has two exciting posts available at the Health Economics Research Centre at the University of Oxford. In particular, she is looking for two R-minded researchers/analysts to develop work on disease modelling/cost-effectiveness using large individual-patient databases. In fact, I think it’s really good that they are explicitly including knowledge of R as part of … Read more Job @ Oxford

When and when not to A/B test

Experiment: Split vs. Bandit. Setup: We are going to test two different methods (the chi-squared split test – “Split” hereafter – and the Thompson beta bandit – “Bandit”) with the objective of maximizing the cumulative successes (e.g. ad clicks) by sequentially (over several periods, e.g. days) choosing trials (e.g. presenting ads to users) from multiple options … Read more When and when not to A/B test
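To make the "Thompson beta bandit" named in the excerpt concrete, here is a minimal sketch of Thompson sampling with Beta posteriors over Bernoulli arms. It is not the linked post's code: the function name `thompson_bandit`, the uniform Beta(1, 1) prior, the three click rates and the trial count are all assumptions chosen for illustration.

```python
import random


def thompson_bandit(true_rates, n_trials, seed=0):
    """Thompson sampling over Bernoulli arms with Beta(1, 1) priors.

    true_rates holds hypothetical success probabilities, used only to
    simulate outcomes; the sampler itself never reads them directly.
    """
    rng = random.Random(seed)
    successes = [0] * len(true_rates)
    failures = [0] * len(true_rates)
    for _ in range(n_trials):
        # Draw one plausible success rate per arm from its Beta posterior.
        draws = [
            rng.betavariate(s + 1, f + 1)
            for s, f in zip(successes, failures)
        ]
        # Play the arm whose sampled rate is highest.
        arm = max(range(len(draws)), key=draws.__getitem__)
        # Simulate the trial outcome and update that arm's counts.
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures


# Hypothetical click rates for three ads (assumed values).
successes, failures = thompson_bandit([0.05, 0.10, 0.30], n_trials=5000)
pulls = [s + f for s, f in zip(successes, failures)]
```

Here `pulls` records how often each option was chosen. Because the sampler shifts trials toward options that look better as evidence accumulates, the arm with the highest assumed rate ends up with most of the trials, whereas a split test allocates trials evenly until the test concludes.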

Viewing text through the eyes of a machine

We have been able to lift the lid on Convolutional Neural Networks (CNN) in computer vision tasks for a number of years now. This has brought with it significant improvements to the field through: Increased robustness of models; Visibility of, and reduction of model bias; and A better understanding of how adversarial images can alter … Read more Viewing text through the eyes of a machine

Analyzing the Twitter Profile of India’s Newly Elected PM

Disclaimer: This blog has nothing to do with politics and has been written just out of academic interest. For quite some time, I have wanted to write a blog about using Python to analyze real user data to brush up my concepts and explore new topics. Coincidentally, last week, 600 million Indians voted to elect Mr. Narender … Read more Analyzing the Twitter Profile of India’s Newly Elected PM

Creating data frame using structure() function in R

The structure() function is a simple yet powerful function that describes a given object with given attributes. It is part of the base R library, so there is no need to load any additional library. Because the function was part of the S language, it has been in the base library since the earliest versions, making it … Read more Creating data frame using structure() function in R

Artificial Intelligence perspectives on the Cinematographic Industry

“Slowly I learnt the ways of humans: how to ruin, how to hate, how to debase, how to humiliate. And at the feet of my Master I learnt the highest of human skills, the skill no other creature owns: I finally learnt how to lie.” ― Nick Dear, Frankenstein, Based on the Novel by Mary … Read more Artificial Intelligence perspectives on the Cinematographic Industry

Apache Spark MLlib Tutorial — Part 2: Feature Transformation

Note: This article is part of a series. Check out the full series: Part 1: Regression, Part 2: Feature Transformation, Part 3 and up are coming soon. In the previous article we talked about MLlib and how to use it for training a regression model. This article focuses on Feature Transformation. We will understand the … Read more Apache Spark MLlib Tutorial — Part 2: Feature Transformation

SatRday comes to the Baltic Sea for the first time @Gdańsk

There are probably thousands of things to do in the middle of May at the Baltic seaside, but in 2019 the place to be was Gdańsk. Despite the dreadful weather, Gdańsk offered the hottest R event in Europe at the time: the very first edition of the SatRday Gdańsk. Appsilon had the pleasure to not … Read more SatRday comes to the Baltic Sea for the first time @Gdańsk

Autonomous Agents And Multi-Agent Systems 101: Agents And Deception

This article provides a brief introduction to the area of autonomous agents and multi-agent systems. Furthermore, a perspective on the deception mechanisms used by agents is presented. Photo by Debby Hudson on Unsplash Humans use deception mechanisms to gain an advantage over other humans 😶. Some of the most typical mechanisms are (1) not sharing their beliefs … Read more Autonomous Agents And Multi-Agent Systems 101: Agents And Deception

Linear Regression with Healthcare Data for Beginners in R

In this post I will show how to build a linear regression model. As an example, I will evaluate the association between vitamin D and calcium in the blood; because the variable of interest (i.e., calcium level) is continuous, linear regression analysis is used. I will … Read more Linear Regression with Healthcare Data for Beginners in R

Get Started With TensorFlow 2.0 and Linear Regression

TensorFlow 2.0 has been a major breakthrough in the TensorFlow family. It’s completely new and refurbished and also less creepy! We’ll create a simple Linear Regression model in TensorFlow 2.0 to explore some new changes. So, open up your code editors and let’s get started! Also, open up this notebook for an interactive learning experience. … Read more Get Started With TensorFlow 2.0 and Linear Regression

An introduction to Convolutional Neural Networks

Describing what Convolutional Neural Networks are, how they function, how they can be used and why they are so powerful. A convolutional neural network (CNN) is a neural network that has one or more convolutional layers and is used mainly for image processing, classification, segmentation and other autocorrelated data. A convolution is essentially … Read more An introduction to Convolutional Neural Networks

Real Time Anomaly Detection with AWS

As a result, our pipeline is like below. You might disconnect Lambda functions, or SNS, and replace with another service you want. This approach offers flexibility while it keeps self-management and durability thanks to AWS tools. Data Analytics App Creating a Data Analytics app in Kinesis is fairly easy: We select the app engine, either … Read more Real Time Anomaly Detection with AWS

Predicting Customer Churn with Neural Networks in Keras

This is a big one for organisations everywhere and one of the main areas in which we see a high adoption rate of machine learning; this is probably down to the fact that we are predicting customer behaviour. “Churn” is the term used to describe when a customer stops using a certain organisation’s services. This is … Read more Predicting Customer Churn with Neural Networks in Keras

Reinforcement Learning is full of Manipulative Consultants

When there are variance differences in environments used to train reinforcement learning algorithms, weird things happen. Value estimation networks prefer low-variance areas regardless of the rewards, which makes them “Manipulative Consultants”. Q-learning algorithms get stuck in the “Boring Areas Trap” and can’t get out due to the low variance. Reward noising can help but … Read more Reinforcement Learning is full of Manipulative Consultants

10 Jobs for R users from around the world (2019-05-27)

To post your R job on the next post, just visit this link and post a new R job to the R community. You can post a job for free (and there are also “featured job” options available for extra exposure). Current R jobs: Full-Time Customer Success Representative RStudio – Posted by beckybajan Seattle, Washington, United States 6 May 2019 Full-Time Data Scientist @ … Read more 10 Jobs for R users from around the world (2019-05-27)

startup – run R startup files once per hour, day, week, …

New release: startup 0.12.0 is now on CRAN. This version introduces support for processing some of the R startup files with a certain frequency, e.g. once per day, once per week, or once per month. See below for two examples. startup::startup() is cross platform. The startup package makes it easy to split up a long, … Read more startup – run R startup files once per hour, day, week, …

Predicting Changes in the Zillow Home Value Index by ZIP Code

Photo by Breno Assis on Unsplash In this piece, I detail the process and results of modeling the Zillow Home Value Index (ZHVI) by ZIP Code looking forward three, six, and twelve months. The models presented here may serve as a useful tool for people considering buying or selling a home in the next twelve months … Read more Predicting Changes in the Zillow Home Value Index by ZIP Code

nanotime 0.2.4

Another minor maintenance release of the nanotime package for working with nanosecond timestamps arrived on CRAN yesterday. nanotime uses the RcppCCTZ package for (efficient) high(er) resolution time parsing and formatting up to nanosecond resolution, and the bit64 package for the actual integer64 arithmetic. Initially implemented using the S3 system, it now uses a more rigorous … Read more nanotime 0.2.4

The ‘see’ package: beautiful figures for easystats

The see package We have recently decided to collaborate around the new easystats project, a set of packages designed to make your life easier. This project encompasses several packages, devoted for instance to model access or Bayesian analysis, indices of model performance or visualisation. Without further ado, please let us introduce the latest addition to … Read more The ‘see’ package: beautiful figures for easystats

Demystifying Regular Expressions in R

Introduction In this post, we will learn about using regular expressions in R. While it is aimed at absolute beginners, we hope experienced users will find it useful as well. The post is broadly divided into 3 sections. In the first section, we will introduce the pattern matching functions such as grep, grepl etc. in base R as we … Read more Demystifying Regular Expressions in R

A Deep Dive Into Imbalanced Data: Over-Sampling

Learn how to use imbalanced-learn to improve your performance When implementing classification algorithms, the structure of your data is of great significance. Specifically, the balance between the number of observations for each potential output heavily influences your prediction’s performance (I intentionally avoided using the word “accuracy” for reasons I will later elaborate on in … Read more A Deep Dive Into Imbalanced Data: Over-Sampling

In 12 minutes: Stocks Analysis with Pandas and Scikit-Learn

Analyse, Visualize and Predict stock prices quickly with Python. Predicting Stocks with Data Analysis One day, a friend of mine told me that the key to financial freedom is investing in stocks. While this is largely true during a market boom, trading stocks part time still remains an attractive option today. Given the easy access … Read more In 12 minutes: Stocks Analysis with Pandas and Scikit-Learn

Google Coral USB Accelerator Introduction

Live Classification/Object Detection and External Camera Support Coral also provides us with a live image classification script called classify_capture.py which uses the PiCamera library to get images from a webcam which will then be displayed with their respective label. The only problem with this script is that it can only be used with a PiCamera. In … Read more Google Coral USB Accelerator Introduction

Gale–Shapley algorithm simply explained

From this article, you will learn about the stable pairing, or stable marriage, problem. You will learn how to solve that problem using Game Theory and the Gale-Shapley algorithm in particular. We will use Python to create our own solution using the theorem from the original 1962 paper. What is a stable marriage or pairing problem? In … Read more Gale–Shapley algorithm simply explained
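As a rough illustration of the algorithm the excerpt names, the sketch below implements the standard deferred-acceptance procedure. It is not the linked article's solution: the function name `gale_shapley`, the dictionary-based input format and the three-by-three example preferences are assumptions, and it presumes equally sized groups with complete preference lists.

```python
def gale_shapley(proposer_prefs, receiver_prefs):
    """Return a stable matching as a dict mapping proposer -> receiver.

    Both arguments map each participant to a list of everyone on the
    other side, ordered from most to least preferred (assumed complete).
    """
    # rank[r][p] is the position of proposer p in receiver r's list
    # (lower means more preferred).
    rank = {
        r: {p: i for i, p in enumerate(prefs)}
        for r, prefs in receiver_prefs.items()
    }
    next_choice = {p: 0 for p in proposer_prefs}  # next receiver to try
    engaged_to = {}  # receiver -> proposer currently held
    free = list(proposer_prefs)

    while free:
        p = free.pop()
        r = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        current = engaged_to.get(r)
        if current is None:
            engaged_to[r] = p          # r was unmatched and accepts
        elif rank[r][p] < rank[r][current]:
            engaged_to[r] = p          # r trades up; current is free again
            free.append(current)
        else:
            free.append(p)             # r rejects p; p tries the next one

    return {p: r for r, p in engaged_to.items()}


# Hypothetical preferences, invented for this example.
proposers = {"a": ["x", "y", "z"], "b": ["y", "x", "z"], "c": ["x", "y", "z"]}
receivers = {"x": ["b", "a", "c"], "y": ["a", "b", "c"], "z": ["a", "b", "c"]}
matching = gale_shapley(proposers, receivers)
# matching == {"a": "x", "b": "y", "c": "z"}
```

In this example `c` is rejected by both `x` and `y`, each of whom holds a proposer they rank higher, so no pair would prefer each other over their assigned partners. The result is the matching most favourable to the proposing side, regardless of the order in which free proposers are processed.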

WoE Transformation for Loss Given Default Models

In the intro section of my MOB package (https://github.com/statcompute/MonotonicBinning#introduction), the reasons for and benefits of using WoE transformations in the context of logistic regressions with binary outcomes were discussed. What’s more, a similar idea can be easily generalized to other statistical models in the credit risk area, such as LGD (Loss Given Default) models with fractional … Read more WoE Transformation for Loss Given Default Models

Two New Ways to Make DNS over HTTPS Queries in R

A fair bit of time ago the {gdns} package made its way to CRAN to give R users the ability to use Google’s (at that time) nascent support for DNS over HTTPS (DoH). A bit later on Cloudflare also provided a global DoH endpoint and that begat the (not-on-CRAN) {dnsflare} package. There are actually two … Read more Two New Ways to Make DNS over HTTPS Queries in R

Bayes’ Theorem — Some Perspectives

“When you change the way you look at things, the things you look at change.” ―Wayne Dyer Garychl, May 26. Recently I have been reading books on the history of mathematics, which remind me that all complicated-looking equations started from a small, real-world problem. The benefits of studying a small problem are that it … Read more Bayes’ Theorem — Some Perspectives

10 New Things I Learnt from fast.ai v3

0. Fast.ai & Transfer Learning “It’s always good to use transfer learning [to train your model] if you can.” — Jeremy Howard Fast.ai is synonymous with transfer learning and achieving great results in a short amount of time. The course really lives up to its name. Transfer learning and experimentalism are the two key ideas that Jeremy Howard keeps … Read more 10 New Things I Learnt from fast.ai v3

AI investment activity – trends of 2018

AI hype slowdown, building cognitive tech stack, vertical integration and other observations Photo by Markus Spiske on Unsplash This review highlights trends in launching/investing in artificial intelligence (AI) startups from 2018. It contains an analysis of 47 AI startups that were launched in 2018 and managed to raise at least $1M each. Also, it includes a … Read more AI investment activity – trends of 2018

Belief Propagation in Bayesian Networks

Bayesian Network Inference In this article, I’ll be using Belief Propagation (BP) with some example data. I presume that you already know about Bayesian Networks (BN). This post explains how to calculate the beliefs of different variables in a BN, which helps with reasoning. Photo by Clint Adair on Unsplash Belief Propagation I created a repository with the … Read more Belief Propagation in Bayesian Networks

Machine Learning has never been this easy: Feature Engineering Concepts in 6 questions

Key terms: feature normalization, categorical features, one hot representation, feature crosses, text representation, TFIDF, N-gram, Word2Vec This article is written for people who are keen to master machine learning concepts and skills required for machine learning jobs quickly by going through a set of popular and useful questions. Any comments and suggestions are welcome. Background … Read more Machine Learning has never been this easy: Feature Engineering Concepts in 6 questions

10 Python image manipulation tools

4. PIL/Pillow PIL (Python Imaging Library) is a free library for the Python programming language that adds support for opening, manipulating, and saving many different image file formats. However, its development has stagnated, with its last release in 2009. Fortunately, there is Pillow, an actively-developed fork of PIL which is easier to install; runs on … Read more 10 Python image manipulation tools

Dracarys!— Use Docker Machine, PyTorch & Gigantum for Portable & Reproducible GPU Workflows

TL;DR Manually creating portable & reproducible GPU workflows is fragile, skill intensive & laborious, even with containers. Luckily, you can more or less automate things using Docker Machine, PyTorch & Gigantum. We use these three things to demonstrate a robust system to create workflows that move seamlessly between laptop and cloud, CPU & GPU. Assumptions — You … Read more Dracarys!— Use Docker Machine, PyTorch & Gigantum for Portable & Reproducible GPU Workflows

Classification: Sigmoid vs. Softmax

When designing a model to perform a classification task (e.g. classifying diseases in a chest x-ray or classifying handwritten digits) we want to tell our model whether it is allowed to choose many answers (e.g. both pneumonia and abscess) or only one answer (e.g. the digit “8.”) This post will discuss how we can achieve … Read more Classification: Sigmoid vs. Softmax
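The distinction the excerpt draws (many answers allowed versus exactly one) can be seen directly in how the two functions treat the same raw scores. The following is a minimal sketch using only the standard library; the logit values are invented for illustration and the helper names are not taken from the linked post.

```python
import math


def sigmoid(logits):
    """Score each class independently; the outputs need not sum to 1."""
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]


def softmax(logits):
    """Make classes compete; the outputs form one probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw model outputs for three classes.
logits = [2.0, 1.0, -1.0]

multi_label = sigmoid(logits)   # several classes can exceed 0.5 at once
single_label = softmax(logits)  # probabilities sum to 1, pick the argmax
```

With these logits, the first two sigmoid outputs are both above 0.5, so a multi-label model could report both classes, while the softmax outputs sum to one and single out the first class as the most likely answer.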

Visualizing Musical Performance

As a musician and a data scientist, I have been intrigued by the thought of visualizing musical performances. In this post, I outline how to visualize piano performance recordings from the MAESTRO dataset. Examples are provided throughout the post. Below, I lay out step-by-step instructions, with code, for opening, cleaning, and visualizing piano performances … Read more Visualizing Musical Performance

Do Stocks Provide a Positive Expected Return?

Testing our Hypothesis with some Simulations Instead of calculating test statistics and running a formal hypothesis test, let’s visualize the process by running some simulations (a very similar analysis in spirit). I ran 5,000 one-year simulations with the following assumptions: Stock returns are normally distributed with an expected return of 10.9% and standard deviation … Read more Do Stocks Provide a Positive Expected Return?

Developing the Simplex Method with NumPy and Matrix Operations

Upon taking classes in operations research or optimization (particularly at the undergraduate level) and reviewing the resources available online that cover the Simplex Method, one will almost certainly be introduced to the tableau method for solving linear programming problems with the Simplex method. Some examples of solving linear programming problems with the tableau method are … Read more Developing the Simplex Method with NumPy and Matrix Operations

Poll position: statistics and the Australian federal election

One of the few people in Australia who did not write off a possible Coalition win at the recent federal election was Peter Ellis. We’ve invited him to come and give a talk about making sense of opinion polls and the Australian federal election on Friday this week at Monash University. Visitors are welcome. Here … Read more Poll position: statistics and the Australian federal election

Rapid Progress or Shallow Understanding?

If there’s one thing we all hate, it’s formatting bibliographies. Yes, you can think it’s important to credit others’ work and yes, it can be satisfying to be organised, but does anyone really care about having exactly the right words italicised? Of course, you could avoid this problem entirely by using automatic online tools, but … Read more Rapid Progress or Shallow Understanding?

RStudio in Docker – now share your R code effortlessly!

If you are a full-time data science practitioner and have passed through the stages of starting out with the Titanic dataset and working through the various exercises in Kaggle, you will know by now that, much as we wish real-world data problems were that simple, they are not! This post is about just one … Read more RStudio in Docker – now share your R code effortlessly!

Data Science Job in 90 days – Book Review

Are you an R programmer or data science enthusiast looking for a break in the data science field? If so, my latest book “Data Science Jobs – land a lucrative job in 90 days” will help you find one quickly. [Author’s note – The ebook is FREE ONLY until midnight this Sunday (May 26th). So hurry and grab … Read more Data Science Job in 90 days – Book Review