My presentations on ‘Elements of Neural Networks & Deep Learning’ - Parts 1, 2, 3

I will be uploading a series of presentations on ‘Elements of Neural Networks and Deep Learning’. In these video presentations I discuss the derivations of L-Layer Deep Learning Networks, starting from the basics. The corresponding implementations in vectorized R, Python and Octave are available in my book ‘Deep Learning from first principles:Second … Read more

Categories R Tags ExcerptFavorite

vitae: Dynamic CVs with R Markdown

Why vitae? The process of maintaining a CV can be tedious. It’s a task I often forget about – that is, until someone requests it and I find that my latest is woefully out of date. To make matters worse, these professional updates often need repeating across a variety of sites (such as ORCID and LinkedIn). … Read more

Considering sensitivity to unmeasured confounding: part 2

In part 1 of this 2-part series, I introduced the notion of sensitivity to unmeasured confounding in the context of an observational data analysis. I argued that an estimate of an association between an observed exposure \(D\) and outcome \(Y\) is sensitive to unmeasured confounding if we can conceive of a reasonable alternative data generating … Read more

baRcodeR 0.1.2 release – new linear barcodes

baRcodeR 0.1.2 is released on CRAN today! Download and install with install.packages("baRcodeR"). (Image: example linear barcode.) The major feature of this release is the ability to print linear (a.k.a. normal) barcodes by specifying type = "linear" in create_PDF() rather than type = "matrix", which prints the usual QR code. The GitHub repository is at yihanwu/baRcodeR. Minor … Read more

A Look Back on 2018: Part 1

Welcome to Reproducible Finance 2019! It’s a new year, a new beginning, the Earth has completed one more trip around the sun, and that means it’s time to look back on the previous January to December cycle. Today and next time, we’ll explore the returns and volatilities of various market sectors in 2018. We might … Read more

Updated Review: jamovi User Interface to R

Introduction jamovi (spelled with a lower-case “j”) is a free and open source graphical user interface for the R software that targets beginners looking to point-and-click their way through analyses. It is available for Windows, Mac, Linux, and even ChromeOS. Versions are also planned for servers and tablets. This post is one of a series of reviews which … Read more

Having Fun with TextBlob

A Python library for processing textual data, NLP framework, sentiment analysis. As an NLP library for Python, TextBlob has been around for a while. After hearing many good things about it, such as part-of-speech tagging and sentiment analysis, I decided to give it a try. This is the first time I am using TextBlob … Read more

Learn Enough Docker to be Useful

Part 1: The Conceptual Landscape Containers are hugely helpful for improving security, reproducibility, and scalability in software development and data science. Their rise is one of the most important trends in technology today. Docker is a platform to develop, deploy, and run applications inside containers. Docker is essentially synonymous with containerization. If you’re a current … Read more

Viral Fashion, Networks & Statistical Physics

Simulating why Germans gave up ties with an interacting agent approach. “All models are wrong, but some are useful” — George Box. When German Chancellor Angela Merkel met with Dieter Zetsche, the CEO of German carmaker Daimler and Mercedes-Benz, to lay the foundation stone of a new factory in May 2017, some in the German business world were not … Read more

EdTech & Algorithmic Transparency

The recent news surrounding the investigation of Florida high-school student Kamilah Campbell’s SAT score, which was flagged by The College Board for possible cheating, offers an interesting perspective into issues surrounding algorithmic transparency in the growing EdTech sector. While it’s likely that only a portion of the process used to flag the test was automated, … Read more

Machine Learning Security — A Growing Societal Problem

A Growing Societal Problem This article was originally published at Please go there to subscribe. As more and more systems leverage ML models in their decision-making processes, it will become increasingly important to consider how malicious actors might exploit these models, and how to design defenses against those attacks. The purpose of this post is … Read more

On the Road to 0.8.0 — Some Additional New Features Coming in the sergeant Package

It was probably not difficult to discern from my previous Drill-themed post that I’m fairly excited about the Apache Drill 1.15.0 release. I’ve rounded out most of the existing corners for it in preparation for a long-overdue CRAN update and have been concentrating on two helper features: configuring & launching Drill embedded Docker containers and … Read more

Qrash Course: Deep Q Networks from the Ground Up in 10 Minutes

This article assumes no prior knowledge in Reinforcement Learning, but it does assume some basic understanding of neural networks. Out of all the different types of Machine Learning fields, the one fascinating me the most is Reinforcement Learning. For those who are less familiar with it — while Supervised Learning deals with predicting values or classes based … Read more

R NewYorkers Feeling the Holiday Spirit? Here’s Your Tip

Combining Pivot Billions with R to dive into whether the holiday spirit inspires bigger tips and which parts of New York experience this effect the most. The holiday season brings with it a degree of cheer and joy that many claim makes people act friendlier towards each other. I wanted to see how this effect … Read more

Animating Data Transformations: Part II

In our previous series on Animating Data Transformations, we showed you how to use gganimate to construct an animation which illustrates the process of going between tall and wide representations of data. Today, we will show the same procedure for constructing an animation of the unnest() function. The unnest() function takes a tibble containing a … Read more
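As a rough illustration of what unnesting means, here is a minimal Python sketch using plain lists and dicts. It is only an analogue of the idea (one output row per element of a list-valued column), not tidyr's implementation; the function name `unnest` and the sample data are made up for this example, and rows whose list is empty are simply dropped.

```python
def unnest(rows, column):
    """Expand a list-valued column so each element gets its own row."""
    flat = []
    for row in rows:
        for item in row[column]:
            new_row = dict(row)      # copy the other columns unchanged
            new_row[column] = item   # replace the list with a single element
            flat.append(new_row)
    return flat


# Hypothetical nested data, made up purely for illustration.
nested = [
    {"id": 1, "values": [10, 20]},
    {"id": 2, "values": [30]},
]

flat = unnest(nested, "values")
for row in flat:
    print(row)
```

Each input row is repeated once per element of its list, which is the step an animation of the transformation would need to show.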

An Introduction to R— Merging and filtering data— Part 1

Data understanding by filtering and merging the 2019 Australian Tennis Open data for the Men’s tour. (Photo by Christopher Burns on Unsplash.) You know it’s summer when the Australian Tennis Open visits Melbourne and everyone is excited that Roger and Serena are in town. Problem: I am interested in predicting who might win the 2019 Australian … Read more

Will my Customer Come Back : Playing with CLV

I’ve been tinkering with customer lifetime value modeling the past few days since the Olist dataset in Kaggle went up. In particular, I wanted to explore the tried and tested probabilistic models, BG/NBD and GammaGamma to forecast future purchases and profits. I also wanted to see if the machine learning approach could do well — simply predicting … Read more

Data Science in International Development. Part I: Working with Text

Part I: Working with Text Co-written by Kelsey Barton-Henry. Today, headlines are filled with claims about the power of Artificial Intelligence (AI) to do things only humans could do before. Recognizing objects in images, responding to voice queries, or interpreting complex text instances, to mention a few. But how do AI applications work? What are the … Read more

The Golden AI Glacier: Rethinking Roger’s Bell Curve for Healthcare

“One reason why there is so much interest in the diffusion of innovations is because getting a new idea adopted, even when it has obvious advantages, is often very difficult,” said Everett Rogers, ostensibly the pioneer on the topic, in the introduction to the 3rd edition of his seminal work, Diffusion of Innovations, published in 1983 … Read more

Understanding the maths of Computed Tomography (CT) scans

Noseman is having a headache and, as an old-school hypochondriac, he goes to see his doctor. His doctor is quite worried and makes an appointment with a radiologist for Noseman to get a CT scan. (Image: modern CT scanner from Siemens.) Because Noseman always wants to know how things work, he asks the radiologist about the … Read more

A deep dive into glmnet: offset

I’m writing a series of posts on various function options of the glmnet function (from the package of the same name), hoping to give more detail and insight beyond R’s documentation. In this post, we will look at the offset option. For reference, here is the full signature of the glmnet function: glmnet(x, y, family=c("gaussian","binomial","poisson","multinomial","cox","mgaussian"), … Read more

Dow Jones Stock Market Index (4/4): Trade Volume GARCH Model

This is the final part of this four-part series. In this fourth post, I am going to build an ARMA-GARCH model for Dow Jones Industrial Average (DJIA) daily trade volume log ratio. You can read the other three parts in the following links: part 1, … Read more

An even better rOpenSci website with Hugo

A bit more than one year ago, rOpenSci launched its new website design, by the designer Maru Lango. Not only did the website appearance change (for the better!), but the underlying framework did too. The site is powered by Hugo, like blogdown! Over the last few months, we’ve made the best of this framework, hopefully improving your … Read more

How do Convolutional Neural Nets (CNNs) learn? + Keras example

In this lesson, I am going to explain how computers learn to see; that is, how do they learn to recognize images or objects in images? One of the most commonly used approaches to teaching computers “vision” is Convolutional Neural Nets. This lesson builds on top of two other lessons: Computer Vision Basics and Neural Nets. … Read more

Animating the Traveling Salesman Problem

Lessons Learned From Animating Models Animation can be a powerful tool. It is one thing to explain a complex topic in words or even in pictures, but visuals in motion have an amazing quality to bring abstract ideas to life. This can be especially helpful in complex areas of computer science like optimization and machine … Read more

Predicting Russian Trolls Using Reddit Comments

Using Machine Learning to Predict Russian Trolls. Introduction. (Image: Reddit logo.) Russia has long maintained a contentious relationship with countries in the West. Vladimir Putin, the Russian President, has long been known as a Russian nationalist who will do anything to advance the interests of his country (Marten 2018). This has … Read more

ONNX.js: Universal Deep Learning Models in The Browser

An Introduction to The Universal Open Standard Deep Learning Format and Using It In The Browser (ONNX/ONNX.js). (Photo by Franck V. on Unsplash.) Running deep learning models in the client-side browser is not something new. In early 2018, Google released TensorFlow.js. It is an open-source library that is used to define, train, and run machine learning (ML) … Read more

Soft Actor-Critic Demystified

An intuitive explanation of the theory and a PyTorch implementation guide. Soft Actor-Critic, the new Reinforcement Learning Algorithm from the folks at OpenAI and UC Berkeley, has been making a lot of noise recently. The algorithm not only boasts of being more sample efficient than traditional RL algorithms but also promises to be robust … Read more

You did a sentiment analysis with tidytext but you forgot to do dependency parsing to answer WHY is something positive/negative

A small note on the growing list of users of the udpipe R package. In the last month of 2018, we updated the package on CRAN with some noticeable changes. The default models downloaded with the function udpipe_download_model are now models built on Universal Dependencies 2.3 (released on 2018-11-15). This means udpipe … Read more

French Baccalaureate Results

I. Context. The French Baccalaureate (BAC) is the final exam all French students must pass to graduate from high school. Not only is it necessary to graduate, but a student’s performance on the BAC is also the French equivalent of an American student’s performance on the ACT/SAT for college applications. As I am myself a product of the … Read more

A Complete View of Decision Trees and SVM in Machine Learning

Tree-based Methods. Tree-based methods have been favorite techniques in many industries, with proven successful cases for prediction. These methods are considered non-parametric, making no assumption about the distribution of the data or the structure of the true model. They require less data cleaning and are, to a fair extent, not influenced by outliers and multicollinearity. The … Read more

Analysis of South African Funds

Packages used in this post. Disclaimer: I am no financial advisor, have never been, and you should not take any of this analysis as investment advice. These thoughts are my own; please don’t mail me about your money strategies/problems. I enjoy numbers, scraping and data analysis, and that is what this post is about. Also, … Read more

AzureR packages now on CRAN

The suite of AzureR packages for interfacing with Azure services from R is now available on CRAN. If you missed the earlier announcements, this means you can now use the install.packages function in R to install these packages, rather than having to install from the GitHub repositories. Updated versions of these packages will also be … Read more

A Beautiful 2 by 2 Matrix Identity

While working on a variation of the RcppDynProg algorithm we derived the following beautiful identity of 2 by 2 real matrices: The superscript “top” denoting the transpose operation, the ||.||^2_2 denoting sum of squares norm, and the single |.| denoting determinant. This is derived from one of the check equations for the Moore–Penrose inverse and … Read more

Deep learning from a programmer’s perspective (aka Differentiable Programming)

Or why neural networks are not-so-neural anymore. The main lesson from 2018: deep learning is “cool”. One of the main reasons is that the basic problems faced by DL are general enough to be of interest to an insane number of disciplines, from computer vision to neural machine translation to voice interfaces. More importantly, DL … Read more

Are Our Thoughts Really Dot Products?

How AI Research Revived Pythagoreanism and Confused Science with Philosophy. (Image: the mathematician, philosopher, and number-religion leader Pythagoras.) Recently, I wrote an article about how deep learning might be hitting its limitations and posed the possibility of another AI winter. I closed that article with a question about whether AI’s limitations are defined just as … Read more

How to A/B test without spending a dime

Get statistically significant results without paying for a testing platform. By Lisa Xu, Jan 8. A/B testing is an integral part of the product development process and is used by everyone from growth marketers to designers. However, not everyone has a proper A/B testing platform. Maybe you can’t afford a system that can cost up to $100,000/yr or … Read more
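To make "statistically significant" concrete, here is a minimal sketch of a two-proportion z-test in plain Python, one common way to compare conversion rates between two variants. It is not necessarily the method the post goes on to use; the function name and the visitor and conversion counts are hypothetical and chosen only for illustration.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis of no difference.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts, made up purely for illustration.
z, p_value = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=260, n_b=5000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value (conventionally below 0.05) suggests the difference between variants is unlikely to be due to chance alone, provided the sample size was fixed before the test started.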

Implementing a Profitable Promotional Strategy for Starbucks with Machine Learning (Part 1)

V. Data Preprocessing: Generating Monthly Data. In order to transform the datasets into something useful, we will have to perform a substantial amount of data cleaning and pre-processing. At the end of this section, we will generate a dataset that looks like this: (Figure: snapshot of monthly data after data preprocessing.) The primary task will be to identify … Read more

Sound UX: Sound Representation of Machine Learning Estimation on Image and Temperature Data by…

The purpose of this research is to represent a captured image and temperature data through sonification. Sonification is the technique of converting various kinds of data into sound and is used for accessibility, media art and interaction design. We propose a system that generates sound from minimum distances between moving objects and path prediction by … Read more

How to give an effective presentation at a Meetup or Conference

Essential skills to help you prepare for your first presentation at a Meetup or Conference. (Photo by Teemu Paananen on Unsplash.) We learn about the technical side of data science and spend weeks learning to code and exploring linear regression, logistic regression, PCA, clustering, ridge regression, lasso, decision trees, and random forests. However, communication skills are also … Read more

Does pay impact loyalty in tech? A study in simple data visualization.

The tech scene in San Francisco is famously agile: companies pivot, products iterate, and employees change jobs. It is not uncommon to work for a firm that celebrates an employee when they hit short one- or two-year anniversaries. This happens at startups (typically only a few years old themselves) but also at the larger behemoths. … Read more

Where does .Renviron live on Citrix?

At one of my clients I run RStudio under Citrix in order to have access to their data. For the most part this works fine. However, every time I visit them I spend the first few minutes of my day installing packages because my environment does not seem to be persisted from one session to … Read more

Dow Jones Stock Market Index (3/4): Log Returns GARCH Model

In this third post, I am going to build an ARMA-GARCH model for Dow Jones Industrial Average (DJIA) daily log-returns. You can read the first and second parts, which I published previously. Packages: the packages used in this post series are listed here. suppressPackageStartupMessages(library(lubridate)) … Read more
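For readers unfamiliar with the term, a daily log-return is the natural log of the ratio of consecutive closing prices, ln(P_t / P_{t-1}). The post itself works in R; the following is only a small Python sketch of that definition, with a hypothetical function name and made-up prices rather than DJIA data.

```python
import math


def log_returns(prices):
    """Return the log returns ln(P_t / P_{t-1}) for consecutive prices."""
    if any(p <= 0 for p in prices):
        raise ValueError("prices must be strictly positive")
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]


# Hypothetical closing prices, made up for illustration (not DJIA data).
closes = [100.0, 102.0, 101.0, 103.5]
returns = log_returns(closes)
print([round(r, 4) for r in returns])
```

One reason log-returns are convenient for time-series models is that they add up over time: the sum of the daily values equals the log-return over the whole period.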
