Taking Google Sheets to (a) Class.

I am currently building a Flask app for teachers. Since teachers have adopted Google Drive, they also use Google Sheets. One of my app’s features lets teachers easily copy and paste a sheet link into the app and submit it through a form. It will then convert it … Read more

RQuantLib 0.4.8: Small updates

A new version 0.4.8 of RQuantLib reached CRAN and Debian. This release was triggered by a CRAN request for an update to the configure.ac script which was easy enough (and which, as it happens, did not result in changes in the configure script produced). I also belatedly updated the internals of RQuantLib to follow suit … Read more

Categories R Tags ExcerptFavorite

Machine Learning Models as Micro Services in Docker

One of the biggest underrated challenges in machine learning development is deploying trained models in production, and doing so in a scalable way. One joke I have read about it is: “Most common way, Machine Learning gets deployed today is powerpoint slides :)”. Why Docker? Docker is a containerization platform which packages an application … Read more

How to setup the PySpark environment for development, with good software engineering practices

In this article we will discuss how to set up our development environment in order to create good quality Python code and how to automate some of the tedious tasks to speed up deployments. We will go over the following steps: set up our dependencies in an isolated virtual environment with pipenv; how to set up … Read more

Convolutional Neural Network: A Step By Step Guide

“Artificial Intelligence, deep learning, machine learning — whatever you’re doing if you don’t understand it — learn it. Because otherwise, you’re going to be a dinosaur within three years” — Mark Cuban, a Serial Entrepreneur Hello and welcome, aspirant! If you are reading this and interested in the topic, I’m assuming that you are familiar with the basic concepts of deep … Read more

Let’s build an Article Recommender using LDA

Out of a keen interest in learning new topics, I decided to work on a project where a Latent Dirichlet Allocation (LDA) model recommends Wikipedia articles based on a search phrase. This article explains my approach to building the project in Python. Check out the project on GitHub below. Structure Photo by Ricardo Cruz on Unsplash … Read more

Rcpp 1.0.1: Updates

Following up on the 10th anniversary and the 1.0.0 release, we are excited to share the news of the first update release 1.0.1 of Rcpp. The package turned ten on Monday, and we used the opportunity to mark the current version as 1.0.0! It arrived at CRAN overnight, Windows binaries have already been built and I will follow … Read more

Object Detection On Aerial Imagery Using RetinaNet

ESRI Data Science Challenge 2019 3rd place solution. (Left) The original image. (Right) Car detections using RetinaNet, marked in green boxes. Detecting cars and swimming pools using RetinaNet. Introduction: For tax assessment purposes, surveys are usually conducted manually on the ground. These surveys are important to calculate the true value of properties. For example, having a swimming … Read more

Light on Math ML: Attention with Keras

Why Keras? With the unveiling of TensorFlow 2.0 it is hard to ignore the conspicuous attention (no pun intended!) given to Keras. There was greater focus on advocating Keras for implementing deep networks. Keras in TensorFlow 2.0 will come with three powerful APIs for implementing deep networks. Sequential API — This is the simplest API where you … Read more

Why you should be a Generalist first, Specialist later as a Data Scientist?

So what’s a Generalist and a Specialist? Before going any further, let’s first understand what we mean when we talk about being a generalist and a specialist in data science. A generalist is someone that has knowledge in many areas whereas a specialist knows a lot in one area. Simple as that. Particularly in data … Read more

Tipster Season

So the AFL men's season is approaching, which means that soon everyone's Twitter feed, Facebook and emails will get clogged up with various tipsters: people saying they won 60% of the time over last season and therefore you should pay them money and follow their tips! But how can you assess the accuracy … Read more

Who are Independent Voters?

The differences between people who identify with a party “not very strongly” and those who identify as independent but “are closer to” a party. The data we are using is polling conducted by YouGov Blue and from the progressive data organization Data For Progress. It consists of 3,215 voters and is then weighted by “age, sex, … Read more


I would like to once again point our readers to our note on wrapr::let(), an R function that can help you eliminate many problematic NSE (non-standard evaluation) interfaces (and their associated problems) from your R programming tasks. The idea is to imitate the following lambda-calculus idea: let x be y in z := ( λ … Read more

Data Scientist Knowledge and Skills

A data scientist creates knowledge from data and has skills in statistics, programming, and the domain under study. A data scientist creates knowledge from data through quantitative and programming methods and the knowledge of the domain under study. Data science is the field in which data scientists work. A data scientist should have skills and knowledge in … Read more

Robotic Control with Graph Networks

Exploiting relational inductive bias to improve generalization and control. Machine learning is helping to transform many fields across diverse industries, as anyone interested in technology undoubtedly knows. Things like computer vision and natural language processing were changed dramatically due to deep learning algorithms in the past few years, and the effects of that change are … Read more

FWIW “The chance of the setwd() command having the desired effect – making the file paths work –…

FWIW “The chance of the setwd() command having the desired effect – making the file paths work – for anyone besides its author is 0%.” You should look into using here::here() instead. https://www.tidyverse.org/articles/2017/12/workflow-vs-script/

PCA and SVD explained with numpy

How exactly are principal component analysis and singular value decomposition related, and how can they be implemented using NumPy? Principal component analysis (PCA) and singular value decomposition (SVD) are commonly used dimensionality reduction approaches in exploratory data analysis (EDA) and machine learning. They are both classical linear dimensionality reduction methods that attempt to find linear combinations of … Read more
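
The relationship the excerpt alludes to can be checked numerically. The following is a minimal sketch, not the linked article's code: the toy data, the mixing matrix, and all variable names are assumptions made for illustration. It computes PCA once through the eigendecomposition of the sample covariance matrix and once through the SVD of the centered data, then compares the two.

```python
import numpy as np

# Hypothetical toy data (not from the article): 200 samples, 3 correlated features.
rng = np.random.default_rng(0)
mixing = np.array([[2.0, 0.0, 0.0],
                   [0.5, 1.0, 0.0],
                   [0.0, 0.3, 0.2]])
X = rng.normal(size=(200, 3)) @ mixing

# Both routes assume zero-mean columns, so center the data first.
Xc = X - X.mean(axis=0)
n = Xc.shape[0]

# Route 1: PCA via eigendecomposition of the sample covariance matrix.
cov = Xc.T @ Xc / (n - 1)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]        # reorder to descending variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Route 2: SVD of the centered data matrix, Xc = U @ diag(S) @ Vt.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

# Covariance eigenvalues follow from singular values: lambda_i = s_i**2 / (n - 1).
svd_eigvals = S**2 / (n - 1)

# The principal axes agree up to the sign of each vector.
same_axes = np.allclose(np.abs(Vt), np.abs(eigvecs.T), atol=1e-6)

# The projected data (scores) can be obtained either way: Xc @ V equals U * S.
scores_projection = Xc @ Vt.T
scores_svd = U * S
```

In this sketch the rows of `Vt` play the role of the principal axes, and comparing absolute values sidesteps the sign ambiguity that both decompositions share.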

Hyper-parameter Tuning Techniques in Deep Learning

The process of setting the hyper-parameters requires expertise and extensive trial and error. There are no simple and easy ways to set hyper-parameters, specifically learning rate, batch size, momentum, and weight decay. Deep learning models are full of hyper-parameters, and finding the best configuration for these parameters in such a high-dimensional space is not a … Read more

Crash Course in Quantum Computing Using Very Colorful Diagrams

Representing information in quantum computing: In a traditional computer, information is represented using bits that can only hold a value of 1 or 0, not both at the same time. In quantum computers, we represent information using qubits (quantum bits). We can represent qubits using bra-ket notation: |0⟩ or |1⟩, pronounced ‘ket … Read more
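
As a rough illustration of the notation, the basis states |0⟩ and |1⟩ are conventionally written as two-component vectors. The sketch below assumes that standard convention; the helper name `measurement_probabilities` and the equal-superposition example are hypothetical and do not come from the excerpt.

```python
import numpy as np

# Computational basis states as vectors (standard convention, assumed here).
ket0 = np.array([1.0, 0.0], dtype=complex)  # |0>
ket1 = np.array([0.0, 1.0], dtype=complex)  # |1>


def measurement_probabilities(state):
    """Return (P(0), P(1)) for a single-qubit state vector.

    Hypothetical helper: normalizes the state, then takes the squared
    magnitude of each amplitude.
    """
    state = np.asarray(state, dtype=complex)
    norm = np.linalg.norm(state)
    if norm == 0:
        raise ValueError("state vector must be non-zero")
    amplitudes = state / norm
    return tuple(float(p) for p in np.abs(amplitudes) ** 2)


# An equal superposition (|0> + |1>) / sqrt(2), chosen purely for illustration.
plus = (ket0 + ket1) / np.sqrt(2)
p0, p1 = measurement_probabilities(plus)
```

A classical bit corresponds to one of the two basis vectors, while a qubit can be any normalized combination of them.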

How to ask good questions?

What is a good question? Why ask a good question? How to do so? This story explores these three questions. Why ask good questions? Before diving into the how part, you might want to know why you should care about asking good questions so let us take a moment to understand why. Questions are intended … Read more

Identifying the Sources of Winter Air Pollution in Bangkok Part I

Air pollution map near Bangkok in January 2019. Air pollution is a serious environmental threat in many Asian countries. In Thailand, this issue has recently gained prominence due to the high levels of air pollution in Bangkok during the winter of 2019. Air pollution is reported through the air quality index (AQI), with higher values indicating … Read more

Web Traffic Forecasting

Motivation: Time series, despite being an important concept in statistics and machine learning, are often less explored by data enthusiasts like us. To change that, we decided to work on one of the most pressing time series problems of today, “predicting web traffic”. This blog mirrors our brainstorming involved in Web Traffic … Read more

Measuring Financial Turbulence and Systemic Risk

This project illustrates 2 unique approaches for measuring financial risk. The Financial Turbulence Indicator measures the turbulence of global financial markets across time. This matters because: We can predict the future path of financial turbulence, since financial turbulence is highly persistent across time. You can … Read more

Developing a DCGAN Model in Tensorflow 2.0

Introduction In early March 2019, TensorFlow 2.0 was released and we decided to create an image generator based on Taehoon Kim’s implementation of DCGAN. Here’s a tutorial on how to develop a DCGAN model in TensorFlow 2.0. “To avoid the fast convergence of D (discriminator) network, G (generator) network is updated twice for each D … Read more

Version 0.7.1 of NIMBLE released

We’ve released the newest version of NIMBLE on CRAN and on our website. NIMBLE is a system for building and sharing analysis methods for statistical models, especially for hierarchical models and computationally-intensive methods (such as MCMC and SMC). Version 0.7.1 is primarily a maintenance release with a couple important bug fixes and a few additional … Read more

Adding Custom Fonts to ggplot in R

ggplot – You can spot one from a mile away, which is great! And when you do it’s a silent fist bump. But sometimes you want more than the standard theme. Fonts can breathe new life into your plots, helping to match the theme of your presentation, poster or report. This is always a second … Read more

littler 0.3.7: Small tweaks

The eighth release of littler as a CRAN package is now available, following in the thirteen-ish year history as a package started by Jeff in 2006, and joined by me a few weeks later. littler is the first command-line interface for R and predates Rscript. And it is (in my very biased eyes) better as … Read more

Scraping old player data

As it's been pointed out to me, it would be handy if fitzRoy contained past players' data from footywire. So here is roughly how to do that. library(rvest) ## Loading required package: xml2 library(tidyverse) ## ── Attaching packages ──────────────── tidyverse 1.2.1 ── ## ✔ ggplot2 3.1.0 ✔ purrr 0.3.0 … Read more

Basic Binary Sentiment Analysis using NLTK

“Your most unhappy customers are your greatest source of learning.” (Bill Gates) So what does the customer say? In today’s context, it turns out, a LOT. Social media has opened the floodgates of customer opinions, which now flow freely in mammoth proportions for businesses to analyze. Today, using machine learning, companies are able to extract … Read more

How I implemented googleSignIn in R (shiny) and lived

Knowing the user's identity when building Shiny apps can sometimes come in really handy. While you can implement your own user login, for instance using cookies, you can also use some of the services which authenticate a user for you, such as Google. This way, you don’t have to handle cookies or passwords, just a small part … Read more

The sexiest job of the 22nd century

Three questions you should ask in a data science interview Data science has been called “the sexiest job of the 21st century” — a sentiment I’d believe if I saw more business leaders hiring data scientists into environments where we can be effective. Instead, many of us feel misunderstood and invisible. The world isn’t ready for us … Read more

IT Support Ticket Classification and Deployment using Machine Learning and AWS Lambda

IT Support Ticket Classification and Deployment. IT Ticket Classification. Project description and initial assumptions: As a part of our final project for Cognitive Computing, we decided to address a real-life business challenge, for which we chose IT Service Management. Of all the business cases, we were interested in four use cases that might befitting … Read more

Enterprise Technology 101: How Five Practices Can Make Your Organization a Leader or a Loser

Why me? One of the many interesting things about being a technologist — leading the design, development, and deployment of custom software solutions for nearly 20 years — has been the opportunity to experientially learn by observing trials and errors. The diversity of these opportunities has enhanced that learning. Projects have involved many types of technologies: client-server, application development, … Read more

Those Racist Robots…

(Source: https://phys.org/news/2017-06-robots-children-autism.html) ARTIFICIAL INTELLIGENCE (AI) is one of the hottest topics out there, especially with the whole debate over whether or not robots are likely to take over the world. Regardless of our view on Artificial Intelligence being an actual advancement in our history or just another reckless, clumsy integration of accumulated knowledge, examining this … Read more

Questions pairs identification

Background: You have a burning question. You log in to Quora, post your question and wait for responses. There is a chance that what you asked is truly unique, but more often than not, if you have a question, someone has had it too. Did you notice that Quora tells you that a similar question has been … Read more

Data Science with no Math

Using AI to Build Mathematical Datasets This is an addendum to my last article, in which I had to add a caveat at the end that I was not a mathematician, and I was new at Python. I added this because I struggled to come up with a mathematical formula to generate patient data that … Read more

R and Python: Using reticulate to get the best of both worlds

It’s March 15th and that means it’s World Sleep Day (WSD). Don’t snooze off just yet! We’re about to check out a package that can make using R and Python a dream. It’s called reticulate and we’ll use it to train a Support Vector Machine for a simple classification task. I discovered WSD on the … Read more

Software Dependencies and Risk

Dirk Eddelbuettel just shared an important point on software and analyses: dependencies are hard-to-manage risks. If your software or research depends on many complex and changing packages, you have no way to establish that your work is correct. This is because to establish the correctness of your work, you would need to also establish … Read more

The Clash of the Titans in Test and ODI cricket

Looking at the cumulative average runs, we can see a gradual drop in the cumulative average for Tendulkar, while Kohli and Gavaskar’s performance seems to be getting better. 13. Cumulative average strike rate of batsmen: Tendulkar’s strike rate is better than Kohli’s and Gavaskar’s. par(mfrow=c(2,2)); par(mar=c(4,4,2,2)); batsmanCumulativeStrikeRate("./tendulkar.csv","Tendulkar"); batsmanCumulativeStrikeRate("./kohli.csv","Kohli"); batsmanCumulativeStrikeRate("./gavaskar.csv","Gavaskar") 14. Performance forecast of batsmen: The … Read more

How do I know if my AI idea is possible?

One of the questions that I get asked often as an AI consultant is, in some ways, the most simple: Is this possible? People will come to me with some very vague notion of something they want automated or some sort of AI product they want to create. They usually don’t come from a technology … Read more

How to Learn Data Science

Photo by Paul Schafer on Unsplash Why most online data science courses will fail to teach you the skills you need Right now, I’m in a fairly unique position. On the one hand I’m writing a book (The Science of Data Science), which I hope will be as inclusive and as easy to read as possible. On … Read more

#20: Dependencies. Now with badges!

Welcome to post number twenty in the randomly redundant R rant series of posts, or R4 for short. It has been a little quiet since the previous post last June as we’ve been busy with other things but a few posts (or ideas at least) are queued. Dependencies. We wrote about this a good year … Read more

What is the difference between AI, machine learning, and deep learning?

People like to throw buzzwords like artificial intelligence, machine learning, and deep learning into conversations. I plead guilty. They accurately describe the work I do. That does not offer an excuse to hide behind buzzwords without understanding what they mean. So, let’s go over what they mean so you know when to use each in … Read more