Everything you need to know about Scatter Plots for Data Visualisation

If you’re a Data Scientist, there’s no doubt that you’ve worked with scatter plots before. Despite their simplicity, scatter plots are a powerful tool for visualising data. A lot of options, flexibility, and representational power come with the simple change of a few parameters like color, size, shape, and regression plotting. Here you’ll … Read more

Making Music with Machine Learning

Image from https://www.maxpixel.net/Circle-Structure-Music-Points-Clef-Pattern-Heart-1790837 Music is not just an art, music is an expression of the human condition. When an artist is making a song you can often hear the emotions, experiences, and energy they have in that moment. Music connects people all over the world and is shared across cultures. So there is no way … Read more

Exploration of the Social News TV: The Communication Behavior of #ajnewsgrid

What is NewsGrid? NewsGrid is a young news program broadcast globally by Al Jazeera since 2016. It is Al Jazeera’s first interactive news hour. The show is produced in three parts: top stories of the day presented by one presenter, stories that create a huge social reaction on Twitter presented by a social media presenter, and the … Read more

Hey, Who Moved the Goalposts?

Part of the “10 reasons why Software Development projects fail” series. The most successful software development projects have a timeline and a series of milestones to accomplish that project within a set period of time. Those milestones are critical, because they help to divide a large project into a series of much smaller projects, and they … Read more

Forecast Framework Demo

(This article was first published on – R, and kindly contributed to R-bloggers) Want to learn how to do some forecasting with R? Here’s your chance to try out a new time-series forecasting package for R whose aim is to standardize and simplify the process of making and evaluating forecasts! The Reich Lab uses an … Read more

Watch if R is running from Shiny

Today I discovered that the tag of a Shiny App gets the shiny-busy class when computation is done in the R process. Which means that you can potentially watch with JavaScript if the R process is running. TIL: Shiny Apps switch to the ‘shiny-busy’ class when R is performing computation in the background. So it’s basically possible to use JavaScript … Read more

Playing Around with Phyllotactic Spirals

I wanted to figure out how to create a GIF animation using magick, so I decided to try that out with ggplot2 spiral art. Loading up packages. I’m definitely in love with “magick” right now. library(tidyverse) ## for pretty much everything… library(magick) ## I’m now a magick fan!!! library(scales) ## Handy when it comes … Read more

Binary Tree: The Diameter.

Dynamic programming sequences sub-problems together, having each sub-problem lead to the solution. With dynamic programming we no longer have to visit paths we’ve been down before; instead we can prune the shorter branches and track the diameter at each step. With the dynamic approach the algorithm travels down the tree and counts lengths on the … Read more

Revisiting Adam Smith’s Invisible Hand in the Data Economy

Fundamental paradigms of the free market should also be scrutinized by data science. An unobservable market force that helps the demand and supply of goods in a free market to reach equilibrium automatically and efficiently is what we call the invisible hand. But I am a data scientist, I don’t deal in unobservable forces, no observations … Read more

Set Theory — Cardinality & Power Sets

With basic notation & operations cleared in articles one & two in this series, we’ve now built a fundamental understanding of Set Theory. This third article further compounds this knowledge by zeroing in on the most important property of any given set: the total number of unique elements it contains. Also known as the cardinality, … Read more

Fast, static D3 maps built with Turf.js and the command-line

Combining Mike Bostock’s command-line cartography tutorial with the flexibility of Node.js. Figure: Estimated percent of undocumented residents in U.S. metro areas. Source: Pew Research Center. Recently, I needed to build a handful of U.S. state bubble maps to be embedded in a story for the San Antonio Express-News. I wanted to use D3 but was concerned about slow asset … Read more

Introduction to Unsupervised Learning

Understand principal component analysis (PCA) and clustering methods. Photo by Oscar Keys on Unsplash. Unsupervised learning is a set of statistical tools for scenarios in which there is only a set of features and no targets. Therefore, we cannot make predictions, since there are no associated responses for each observation. Instead, we are interested in finding … Read more

The Basics of Cryptography

With Applications in R. Have you ever wondered how companies securely store your passwords? Or how your credit card information is kept private when making online purchases? The answer is cryptography. The vast majority of internet sites now use some form of cryptography to ensure the privacy of their users. Even information such as emails … Read more

Assessing NHL award winners using K-means

Data sets The final data-set used is a combination of traditional and advanced player metrics. Traditional statistics concern metrics like goals and assists (total being known as points), plus-minus, penalty minutes and time on ice, whilst advanced player metrics deal more with player behavior and puck possession. Using Python’s beautifulsoup library, I scraped more traditional … Read more

Reinforcement Learning and Deep Reinforcement Learning with Tic Tac Toe

In this article I want to share my project on implementing reinforcement learning and deep reinforcement learning methods on a Tic Tac Toe game. The article contains: 1. Rigorous definition of the game as a Markov decision process. 2. How to implement the reinforcement learning method, called TD(0), to create an agent that plays the … Read more

On Canonical Companies

In Information Technology and tech startups, we talk about “systems of record”. A system of record is the source for a particular piece of data that may also exist in other systems. That “system of record” is the ultimate source of truth. It’s the canonical record of data. We often talk about these systems as the place … Read more

H2O for Inexperienced Users

Some background: I am a senior in high school, and in the summer of 2018 I interned at H2O.ai. With no ML experience beyond Andrew Ng’s Introduction to Machine Learning course on Coursera and a couple of his deep learning courses, I initially found myself slightly overwhelmed by the variety of new algorithms H2O has to offer … Read more

Software 2.0 — Playing with Neural Networks (Part 1)

In this article we are going to discuss neural networks (from scratch), the innovative concept that has taken the world by storm. I will assume that the reader is already familiar with the following concepts: Cost function (MSE and Cross Entropy), Gradient Descent, Logistic regression, Activation Function, Binary Classification. Particularly, this article will try … Read more

Sentiment Analysis with Word Bags and Word Sequences

For generic text, word bag approaches are very efficient at text classification. For a binary text classification task studied here, LSTM working with word sequences is on par in quality with SVM using tf-idf vectors. But performance is a different matter… The bag-of-words approach to turning documents into numerical vectors ignores the sequence of words … Read more

Data Visualization in Music

Last fall I went to an Edward Tufte lecture where he began with a very effective, very sweet video of music visualized in a sequence: Tufte knows this is a great and charming start to a lecture. He knows it provides a welcome change from the outside world; an elegant fusion of music, color, and … Read more

The Science Behind AlphaStar

How DeepMind Uses Reinforcement Learning to Beat Human Pros in StarCraft II. Long-term strategic planning has long been considered a unique quality of the human mind that would be very difficult to imitate by artificial intelligence (AI) agents. Conceptually, strategic thinking involves evaluating a large number of data points in the present in order to … Read more

Scaling H2O analytics with AWS and p(f)urrr (Part 3)

This is the final installment of a three part series that looks at how we can leverage AWS, H2O and purrr in R to build analytical pipelines. In the previous posts I looked at starting up the environment through the EC2 dashboard on AWS’ website. The other aspect we looked at, in Part II, was … Read more

Main benefits of using a Chatbot for your business

I’m going to tell you about the future: messenger chatbots. What is a Messenger chatbot, and why is it crucial for your business? Basically, it’s a digital assistant, most of the time based on AI, that follows a set of commands to hold what feels like a natural-sounding conversation with your customers. Now you have the opportunity … Read more

Uncertainty estimation for Neural Network — Dropout as Bayesian Approximation

Uncertainty Estimation. One of the key distinctions of the Bayesian approach is that parameters are distributions instead of fixed weights. Error = Model uncertainty + Model misspecification + Inherent noise. The Bayesian neural network decomposes uncertainty into model uncertainty, model misspecification, and inherent noise. MCDropout. One of the keys here in Bayesian is that everything is … Read more

SatRday LA – R Conference Announcement

Come and have fun with local useRs at SatRday LA on April 6, 2019. This is the first SatRday in LA, and the second in the states. If you have not heard of SatRday, it is a one-day affordable, inclusive, non-profit R conference organized by local R users. Important Details Who: All levels of R users When: April … Read more

Machine Learning Techniques applied to Stock Price Prediction

Image generated using Neural Style Transfer. Machine learning has many applications, one of which is to forecast time series. One of the most interesting (or perhaps most profitable) time series to predict is, arguably, stock prices. Recently I read a blog post applying machine learning techniques to stock price prediction. You can read it here. … Read more

Sliding Puzzle – Solving Search Problem with Iterative Deepening A*

Now that we are more familiar with the game, let’s solve it! Search Algorithms. Let’s begin our Graph Traversal journey with visualizing and setting our problem. “A goal properly set is halfway reached.” (Zig Ziglar) Problem: Given a board state, find a combination of moves that leads to the final state. Graph Representation: Now that we … Read more

What follows AlphaStar for Academic AI Researchers?

DeepMind continues making progress, but the path forward for AI researchers in academia is unclear. Ten years ago I challenged AI researchers across the globe to build a professional-level bot for StarCraft 1. The Brood War API was recently released, and for the first time academics and professionals could test out AI systems on a highly-competitive … Read more

Using AI For Good

How to Help Developing Countries with Artificial Intelligence. By CE Kan, Jan 27. Recently, I have come across quite a few articles stating how artificial intelligence may threaten the developing world by eliminating the need for repetitive, labor-intensive manufacturing roles. Automation of factories can potentially lead to higher unemployment rates in poorer nations, thereby disrupting local … Read more

Hierarchical Bayesian Modeling for Ford GoBike Ridership with PyMC3 — Part II

Photo by sabina fratila on Unsplash. In the first part of this series, we explored the basics of using a Bayesian-based machine learning model framework, PyMC3, to construct a simple Linear Regression model on Ford GoBike data. In this example problem, we aimed to forecast the number of riders that would use the bike share tomorrow … Read more

Graphing My Daily Phone Use

How many times do I look at my phone? I set up a small program on my phone to count the screen activations and logged to a file. In this post I show what went wrong and how to plot the results. The data: I set up a small program on my phone that counts every day … Read more

Mathematical Notation in Online R/exams

Many R/exams exercises employ mathematical notation that needs to be converted and rendered suitably for inclusion in online exams. While R/exams attempts to set suitable defaults, an overview is provided of possible adjustments and when these might be useful or even necessary. Overview A popular use case of the R/exams package is the generation of … Read more

Handling imbalanced datasets in machine learning

Reworking the problem is better. Up to now the conclusion is pretty disappointing: if the dataset is representative of the true data, if we can’t get any additional features and if we target a classifier with the best possible accuracy, then a “naive behaviour” (answering always the same class) is not necessarily a problem and should … Read more

Interactive Controls for Jupyter Notebooks

How to use interactive IPython widgets to enhance data exploration and analysis. There are few actions less efficient in data exploration than re-running the same cell over and over again, each time slightly changing the input parameters. Despite knowing this, I still find myself repeatedly executing cells just to make the slightest change, for example, choosing … Read more

Building Big Shiny Apps — A Workflow (1/2)

During rstudio::conf(2019L), I presented an e-poster called “Building Big Shiny Apps — A Workflow”. You can find the poster here, and this blog post is an attempt at a transcription of what I talked about while presenting the poster. As this is a rather long topic, I’ve divided this post into two parts: … Read more

Degrees of Freedom and Sudoku

Intuitive explanation of Degrees of Freedom and how Degrees of Freedom affect Sudoku. Image source: Pixabay. A lot of aspiring Data Scientists take courses on statistics and get befuddled with the concept of Degrees of Freedom. Some memorize it by rote as ‘n-1’. But there is an intuitive reason why it is ‘n-1’. The Intuitive … Read more

10 Tips for Choosing the Optimal Number of Clusters

By Matt.0, Jan 27. Photo by Pakata Goh on Unsplash. Clustering is one of the most common unsupervised machine learning problems. Similarity between observations is defined using some inter-observation distance measures or correlation-based distance measures. There are 5 classes of clustering methods: Hierarchical Clustering, Partitioning Methods (k-means, PAM, CLARA), Density-Based Clustering, Model-based Clustering, and Fuzzy Clustering. My … Read more

R tips and tricks – higher-order functions

A higher-order function is a function that takes one or more functions as arguments, and/or returns a function as its result. This can be super handy in programming when you want to tilt your code towards readability and still keep it concise. Consider the following code: # Generate some fake data > eps <- rnorm(10, sd= … Read more

A Gentle Introduction to Deep Learning : Part 3

PCA & Linear Algebra (Advanced). Photo by Antoine Dautry. “You can’t build a great building on a weak foundation”. This quote truly justifies what I am trying to do here: you cannot learn the true form of machine learning or deep learning until you have the knowledge of some of the important mathematical concepts like linear algebra … Read more

Data Augmentation for Natural Language Processing

Lessons learned from a hate speech detection task to improve supervised NLP models. Note: this post is mainly targeted at an audience unfamiliar with Natural Language Processing and will hence cover some basic concepts before moving on to data augmentation. Image source: Harvard Political Review. Natural Language Processing (NLP) has become increasingly popular in both academia and … Read more

Learning to Drive Smoothly in Minutes

Learning to Drive in Minutes: The Updated Approach. Although the Wayve.ai technique may work in principle, it has some issues that need to be addressed to apply it to a self-driving RC car. First, because the feature extractor (VAE) is trained after each episode, the distribution of features is not stationary. That is to say, the features are … Read more

The New Dawn of AI: Federated Learning

The emerging AI market model is dominated by tech giants such as Google, Amazon and Microsoft, who offer cloud-based AI solutions and APIs. This model offers users little control over the usage of AI products and their own data that is collected from their devices, locations etc. In the long run, such a centralized model … Read more

Analytics Building Blocks: Regression

A modularized notebook to tune and compare 11 regression algorithms with minimal coding in a control panel fashion. This article summarizes and explains key modules of my regression block (one of the simple modularized notebooks I am developing to execute common analysis tasks). The notebook is intended to facilitate quicker experimentation for the users with … Read more