10 Steps to Teaching Data Science Well

1. Ensure everyone in your class feels comfortable participating. Create an inclusive learning environment by ensuring everyone feels like they belong in the classroom and can participate, whether it’s through answering questions in front of everyone or approaching the instructor one-on-one. Prioritize people of color, women, LGBTQIA+ people, and folks with disabilities when assessing participation … Read more

Will your income be more than $50K/yr? Machine Learning can tell

Machine learning is breaking new ground in numerous fields, including finance. What if we could use machine learning models to identify the incomes of individuals? I found just the right dataset for this, called the Census Income Dataset. I used the information in the dataset to predict if someone would earn an income greater than $50K/yr. I collected … Read more

Semantic search

Bonus: how does it work? Google included a Knowledge Graph in 2012, an ontology, a representation of semantic relations between people, places and things in a graph format. These relations can be synonyms, homonyms, etc. With the Hummingbird update in 2013, Google had a huge knowledge graph of its collection of around 570 million concepts and … Read more

Location Location Location

How to create geographic area embeddings using Machine Learning and a little black magic wizardry. The Zoopla.co.uk journey is heavily focused on geographic areas: it’s the only mandatory input to the Zoopla search. And there are a lot of unique geographic areas consumers reference in their Zoopla searches, 32,237 over the last year to be … Read more

Back to Basics: Part 1

Permutations and Combinations. It seems that no matter how many advanced stats classes I take, or how many times I revisit this topic, permutations and combinations continue to elude me. This post is the first in a series of “Back to Basics” blog posts, which aims to build up the mathematical foundations of Data Science. … Read more
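
As a quick refresher on the distinction the excerpt above refers to, here is a minimal sketch using only the Python standard library. It is not taken from the linked post; the helper names and the four-letter example are assumptions made for illustration. Permutations count ordered selections, n! / (n - k)!, while combinations ignore order, so each unordered group is counted once instead of k! times.

```python
import math
from itertools import combinations, permutations


def n_permutations(n, k):
    """Number of ordered selections of k items from n: n! / (n - k)!."""
    return math.factorial(n) // math.factorial(n - k)


def n_combinations(n, k):
    """Number of unordered selections: permutations divided by k! orderings."""
    return n_permutations(n, k) // math.factorial(k)


# Hypothetical example: choose 2 letters out of 4.
letters = ["A", "B", "C", "D"]
ordered = list(permutations(letters, 2))    # ("A", "B") and ("B", "A") both appear
unordered = list(combinations(letters, 2))  # only ("A", "B") appears

print(n_permutations(4, 2), len(ordered))
print(n_combinations(4, 2), len(unordered))
```

Enumerating the selections with itertools alongside the closed-form counts is a handy way to check the formulas on small cases.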

Building a Content Based Recommender System for Hotels in Seattle

Photo Credit: Pixabay. How to use the description of a hotel to recommend similar hotels. The cold start problem is a well-known and well-researched problem for recommender systems, where the system is not able to recommend items to users. It arises in three different situations: for new users, for new products and for new websites. Content-based filtering … Read more

Monotonic Binning with GBM

In addition to the monotonic binning algorithms introduced in my previous post (https://statcompute.wordpress.com/2019/03/10/a-summary-of-my-home-brew-binning-algorithms-for-scorecard-development), two more functions based on Generalized Boosted Regression Models have been added to my GitHub repository, gbm_bin() and gbmcv_bin(). The function gbm_bin() estimates a GBM model without cross validation and tends to generate a more granular binning outcome. The function gbmcv_bin() estimates … Read more

Graphing Brexit: Clustering Edition

This is the 2nd in a series of posts showing how to analyse Brexit with graphs. In this post we cluster MPs based on voting records. Brexit — EU On Friday I wrote a blog post showing how to do graph analysis of Brexit data using Neo4j, and towards the end of the first post I showed … Read more

Multivariate Time Series Forecasting Using Random Forest

Introduction In my earlier post (Understanding Entity Embeddings and It’s Application) [1], I’ve talked about solving a forecasting problem using entity embeddings — basically using tabular data that have been represented as vectors and using them as input to a neural network based model to solve a forecasting problem. This time around though, I’ll be doing the … Read more

Improving user experience with AI on mobile

For many users, their mobile device is their preferred interface to the online world. On-device intelligence is nothing new — think smart assistants, predictive text — but these systems have tended to involve simple predictive models or rule-based systems. While deep neural networks are the current state-of-the-art for image and video analysis, their complex structures impose a heavy computational … Read more

A Brief History of ASR: Automatic Speech Recognition

This moment has been a long time coming. The technology behind speech recognition has been in development for over half a century, going through several periods of intense promise — and disappointment. So what changed to make ASR viable in commercial applications? And what exactly could these systems accomplish, long before any of us had heard of … Read more

Tips for R to Python and Vice-Versa seamlessly

When we at TATVA AI visit our clients, both data scientists and higher management often ask us how we deal with Python and R simultaneously for client requests, as there is no universal preference among clients. Though the solution is not straightforward, I suggest exploiting common libraries for quick deployments, such as dfply (Python) … Read more

Matrix-style screensaver in R

This post shares a short code snippet to make your own screensaver in R, The Matrix-style. The code takes a few seconds to complete: nx = 100; ny = 80; kk <- 110; x = sample(x = 1:nx, size = kk, replace = TRUE); y = seq(-1, -ny, length = kk); codes <- matrix(0:127, 8, … Read more

Using R: plotting the genome on a line

Imagine you want to make a Manhattan-style plot or anything else where you want a series of intervals laid out on one axis after one another. If it’s actually a Manhattan plot you may have a friendly R package that does it for you, but here is how to cobble the plot together ourselves with … Read more

FastAI Image Segmentation

Getting our data For this tutorial, we will use the CamVid data-set which is a really high-quality road-segmentation data-set provided by the University of Cambridge. Another nice thing about the data-set is that we don’t need to download it manually because it is included in the FastAI library and so we can simply download it using … Read more

Building efficient data pipelines using TensorFlow

Having efficient data pipelines is of paramount importance for any machine learning model. In this blog, we will learn how to use TensorFlow’s Dataset module tf.data to build efficient data pipelines. https://www.tensorflow.org/guide/performance/datasets Motivation: Most of the introductory articles on TensorFlow introduce you to the feed_dict method of feeding the data to the model. feed_dict … Read more

mapedit 0.5.0 and Leaflet.pm

In our last post, mapedit and leaflet.js >1.0, we discussed remaining tasks for the RConsortium funded project mapedit. mapedit 0.5.0 fixes a couple of lingering issues, but primarily focuses on bringing the power of Leaflet.pm as an alternate editor. Leaflet.draw, the original editor in mapedit provided by leaflet.extras, is a wonderful tool but struggles with snapping and those pesky holes that we commonly … Read more

See you at useR! Toulouse

Hey all, just a quick post to give you some details about my workshop at useR! 2019, in Toulouse! Hacking RStudio: Advanced Use of your Favorite IDE About Have you ever wanted to become more productive with RStudio? Then this workshop is made for you! You’ve been wandering the web for a while now, reading … Read more

Step-by-Step Guide to Creating R and Python Libraries

R and Python are the bread and butter of today’s machine learning languages. R provides powerful statistics and quick visualizations, while Python offers an intuitive syntax, abundant support, and is the choice interface to today’s major AI frameworks. In this article we’ll look at the steps involved in creating libraries in R and Python. This … Read more

Machine Learning and Discrimination

Optimizing for Fairness: Building machine learning algorithms that optimize for non-discrimination can be done in 4 ways: formalizing a non-discrimination criterion, demographic parity, equalized odds, and well-calibrated systems. We will discuss each of these in turn. Formalizing a non-discrimination criterion is essentially what the other 3 approaches involve; they are types of criteria which aim to … Read more
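
To make two of the criteria named above concrete, here is a minimal sketch in plain Python. It is not from the linked article: the function names, the toy labels, and the two groups "a" and "b" are all assumptions chosen for illustration. It uses the common definitions that demographic parity compares the rate of positive predictions across groups, while equalized odds compares true positive and false positive rates across groups.

```python
def rate(values):
    """Fraction of 1s in a list of 0/1 values (NaN if the list is empty)."""
    return sum(values) / len(values) if values else float("nan")


def group_rates(y_true, y_pred, group, g):
    """Return (positive prediction rate, TPR, FPR) for members of group g."""
    idx = [i for i, a in enumerate(group) if a == g]
    positive_rate = rate([y_pred[i] for i in idx])
    tpr = rate([y_pred[i] for i in idx if y_true[i] == 1])
    fpr = rate([y_pred[i] for i in idx if y_true[i] == 0])
    return positive_rate, tpr, fpr


# Hypothetical toy data: true labels, model predictions, protected attribute.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 1, 0, 1, 1, 0, 0]
group = ["a", "a", "a", "a", "b", "b", "b", "b"]

rates_a = group_rates(y_true, y_pred, group, "a")
rates_b = group_rates(y_true, y_pred, group, "b")

# Demographic parity gap: difference in positive prediction rates.
parity_gap = abs(rates_a[0] - rates_b[0])
# Equalized odds gaps: differences in TPR and in FPR.
tpr_gap = abs(rates_a[1] - rates_b[1])
fpr_gap = abs(rates_a[2] - rates_b[2])

print(parity_gap, tpr_gap, fpr_gap)
```

In this toy data both groups receive positive predictions at the same rate, so demographic parity holds, yet the error rates differ between groups, so equalized odds does not. The criteria are distinct and one can be satisfied without the other.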

The inner path to becoming SensAI

The inner path to becoming SensAI Recently I did a proof of concept hackathon with Yama Saraj for his startup SensAI. Such hackathon forces founders to express their vision on their business model and their technology stack. There is a big difference running a social entreprise on DIY technology or a startup with a stack in … Read more

Applied AI: Going From Concept to ML Components

Opening your mind to different ways of applying machine learning to the real world. By Abraham Kang with special thanks to Kunal Patel and Jae Duk Seo for being a sounding board and providing input for this article. Photo by Franck V. on Unsplash Executive Summary Candidate Problem Many people are interested in automating redundant processes … Read more

What library can load image in Python and what are their difference?

from skimage import io; img = io.imread(img_dir) Colour channel: after loading the image, plt.imshow(img) is usually used to plot it. Let’s plot some doge! You may spot that the OpenCV image above looks odd. It is because matplotlib, PIL and skimage represent images in RGB (Red, Green, Blue) order, while OpenCV is in … Read more

Trust and interpretability in machine learning

Do machine learning models always need to be interpretable? Given a choice between an interpretable model that is inaccurate and a non-interpretable model that is accurate, wouldn’t you rather choose the non-interpretable but accurate model? In other words, is there any reason for sacrificing accuracy at the altar of interpretability? Before going any further we … Read more

Implementing MACD in Python

MACD is a popular technical indicator used in trading stocks, currencies, cryptocurrencies, etc. Basics of MACD: MACD is used and discussed in many different trading circles. Moving Average Convergence Divergence (MACD) is a trend-following indicator. MACD can be calculated very … Read more
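
For readers who want to see the calculation, here is a minimal sketch in plain Python. It is not the linked post's implementation. It assumes the conventional 12/26/9 period settings, which the excerpt does not state, and an exponential moving average seeded with the first price. The price series is hypothetical.

```python
def ema(values, span):
    """Exponential moving average with smoothing alpha = 2 / (span + 1),
    seeded with the first value."""
    alpha = 2 / (span + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def macd(prices, fast=12, slow=26, signal=9):
    """Return (MACD line, signal line, histogram) as lists.

    fast/slow/signal default to the conventional 12/26/9 periods
    (an assumption, not taken from the original post).
    """
    fast_ema = ema(prices, fast)
    slow_ema = ema(prices, slow)
    macd_line = [f - s for f, s in zip(fast_ema, slow_ema)]
    signal_line = ema(macd_line, signal)
    histogram = [m - s for m, s in zip(macd_line, signal_line)]
    return macd_line, signal_line, histogram


# Hypothetical, steadily rising closing prices.
prices = [100 + i for i in range(40)]
macd_line, signal_line, histogram = macd(prices)
print(macd_line[-1], signal_line[-1], histogram[-1])
```

In an uptrend the fast average sits above the slow one, so the MACD line is positive. Crossings of the MACD line and the signal line are what traders typically watch.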

Microsoft Introduction to AI — Part 1

Machine Learning: Are you a bit like me and have wanted to learn about Artificial Intelligence but felt a little intimidated by the maths involved? Maybe you thought the concepts were too difficult to understand and you would be out of your depth. I recently completed the Microsoft Introduction to AI course and wrote course … Read more

Repetition in Songs: A Python Tutorial

One of Ed Sheeran’s songs as a case study. Credit: Unsplash. Everyone has heard a song or knows what a song sounds like. I can carelessly say everyone can define a song …in their own words. Just for the benefit of the doubt, a song (according to Wikipedia) is a single work of music that is typically … Read more

Getting Started with Google BigQuery’s Machine Learning — Titanic Dataset

While still in Beta, BigQuery ML has been available since mid last year; however, I didn’t get around to working with this Google cloud-based Machine Learning offering until recently. As a non-data scientist, my first impression: what’s not to like? After all, the ability to run ML models from the comfort of a web-based SQL editor is … Read more

The problem with data science job postings

Every once in a while, you notice something that you realize you probably should have noticed a long time ago. You start to see it everywhere. You wonder why more people aren’t talking about it. For me, “every once in a while” was yesterday when I was scrolling through the #jobs channel in the SharpestMinds … Read more

Website with Australian federal election forecasts by @ellis2013nz

The election forecasts Building on my recent blog posts, I’ve put up a page dedicated to forecasts of the coming Australian federal election. It takes the state space model of two-party-preferred vote from my first blog on polls leading up to this election, and combines it with a more nuanced understanding of the seats actually … Read more

We are All Baby Shark (in Data Tracking)

A look at telemetry, fitness trackers, and depressing loads of data. Data is amazing. You know that already. You’re told it every moment of every day. We are literally told or shown by ESPN, our kids’ report cards, our treadmills, our wristwatch, our Alexa, our Google, our Siri, our phones, and our apps that data is … Read more

Analyzing performances of cricketers using cricketr template

The cricketr package has several functions that perform different analyses on both batsmen and bowlers. The package has functions that plot percentage frequency of runs or wickets, runs likelihood for a batsman, relative run/strike rates of batsmen, and relative performance/economy rate for bowlers. Other interesting functions include batting performance moving average, forecast and … Read more

Wrapping up the stars project

Summary: This is the fourth blog on the stars project, and it completes the R-Consortium funded project for spatiotemporal tidy arrays with R. It reports on the current status of the project, and current development directions. Although this project ends, with the release of stars 0.3 on CRAN, the adoption, update, enthusiasm and participation in the development of the stars … Read more

Quick Control Charts for AFL

To leave a comment for the author, please follow the link and comment on their blog: Analysis of AFL. … Read more

no country for old liars

A puzzle from the Riddler about a group of five persons, A, …, E, where all and only people strictly older than L are liars, all making statements about others’ ages: A: B>20 and D>16; B: C>18 and E<20; C: D<22 and A=19; D: E≠20 and B=20; E: A>21 and C<18. The Riddler is asking for the … Read more

Attention and its Different Forms

An overview of generalised attention with its different types and uses. I assume you are already familiar with Recurrent Neural Networks (including the seq2seq encoder-decoder architecture). The Bottleneck Problem In the encoder-decoder architecture, the complete sequence of information must be captured by a single vector. This poses problems in holding on to information at the beginning … Read more
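
To illustrate how attention sidesteps the single-vector bottleneck described above, here is a minimal sketch of dot-product attention in plain Python. It is not code from the linked post. The function names, the tiny two-dimensional encoder states, and the query vector are assumptions made for illustration. Instead of relying on one fixed vector, the decoder scores every encoder state against its current query and takes a weighted average.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def attend(query, encoder_states):
    """Dot-product attention: score each encoder state against the query,
    normalise the scores, and return (weights, context vector)."""
    scores = [dot(query, h) for h in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [
        sum(w * h[j] for w, h in zip(weights, encoder_states))
        for j in range(dim)
    ]
    return weights, context


# Hypothetical encoder hidden states (one per input token) and decoder query.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 1.0]
weights, context = attend(query, encoder_states)
print(weights, context)
```

The state most similar to the query receives the largest weight, so information from any position in the input, including the beginning, can reach the decoder directly.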

Yes, Rep. Borowicz invocation was unusual

On Monday, Pa. State Rep. Stephanie Borowicz gave an invocation on the House floor that started causing backlash before she even said “Amen.” The prayer, which coincided with the swearing in of the state’s first Muslim female representative, has been labeled as Islamophobic and xenophobic by critics. What was so controversial? Filled with calls for … Read more

Graphing Brexit

Along with many of my countrymen, I spent Wednesday night watching UK MPs vote ‘No’ to a series of potential Brexit options, and then read analysis on various websites of how different individuals and parties had voted. While very interesting, all the analysis felt very tabular to me, and I was curious whether we … Read more

drat 0.1.5: New release

A new version of drat just arrived on CRAN. And like the last time in December 2017 it went through as an automatically processed upgrade directly from the CRAN prechecks. Being a simple package can have its upsides… And like the last time, this release once again draws largely upon contributed pull requests. Neal Fultz … Read more

Cross Platform Super Dark IDE Theme, R-Studio Server

A recent post over at r-bar.net demonstrated how to use a Windows system utility to achieve a super dark RStudio theme. Well, if you liked that post but are on Linux or OSX or use RStudio Server, then we have a cross platform solution for you! RStudio is a web browser, … Read more

Explore your Researcher Degrees of Freedom

I am an applied economist working in the area of accounting and corporate transparency. I work with observational data a lot, meaning with data that is already available and not under my control. Whenever I set sail to design a test, there are a lot of decisions to make: Which sample should I use? What … Read more
