10 Steps to Teaching Data Science Well

1. Ensure everyone in your class feels comfortable participating. Create an inclusive learning environment by ensuring everyone feels like they belong in the classroom and can participate, whether it’s through answering questions in front of everyone or approaching the instructor one-on-one. Prioritize people of color, women, LGBTQIA+, and folks with disabilities when assessing participation …

Will your income be more than $50K/yr? Machine Learning can tell

Machine learning is breaking new ground in numerous fields, including finance. What if we could use machine learning models to identify incomes of individuals? I found just the right dataset for this, called the Census Income Dataset. I used the information in the dataset to predict if someone would earn an income greater than $50K/yr. I collected …

Semantic search

Bonus: how does it work? Google introduced its Knowledge Graph in 2012: an ontology, a representation of semantic relations between people, places and things in a graph format. These relations can be synonyms, homonyms, etc. With the Hummingbird update in 2013, Google had a huge knowledge graph of its collection of around 570 million concepts and …

Location Location Location

How to create geographic area embeddings using Machine Learning and a little black magic wizardry. The Zoopla.co.uk journey is heavily focused on geographic areas: it’s the only mandatory input to the Zoopla search. And there are a lot of unique geographic areas consumers reference in their Zoopla searches, 32,237 over the last year to be …

Building a Content Based Recommender System for Hotels in Seattle

How to use the description of a hotel to recommend similar hotels. The cold start problem is a well-known and well-researched problem for recommender systems, where the system is not able to recommend items to users. It arises in three different situations: for new users, for new products and for new websites. Content-based filtering …

Monotonic Binning with GBM

In addition to the monotonic binning algorithms introduced in my previous post (https://statcompute.wordpress.com/2019/03/10/a-summary-of-my-home-brew-binning-algorithms-for-scorecard-development), two more functions based on Generalized Boosted Regression Models have been added to my GitHub repository, gbm_bin() and gbmcv_bin(). The function gbm_bin() estimates a GBM model without cross-validation and tends to generate a more granular binning outcome. The function gbmcv_bin() estimates …

Multivariate Time Series Forecasting Using Random Forest

Introduction: In my earlier post (Understanding Entity Embeddings and It’s Application) [1], I talked about solving a forecasting problem using entity embeddings: basically, using tabular data that have been represented as vectors as input to a neural network based model to solve a forecasting problem. This time around though, I’ll be doing the …

Get rid of your fear and conquer Git in less than five minutes.

Why write another guide? We wrote another because, to really understand how Git works and what it can do for us, we need a slightly deeper understanding. Let me put it this way, dear readers: my friend and I decide to draw a picture together on a …

Improving user experience with AI on mobile

For many users, their mobile device is their preferred interface to the online world. On-device intelligence is nothing new (think smart assistants, predictive text), but these systems have tended to involve simple predictive models or rule-based systems. While deep neural networks are the current state-of-the-art for image and video analysis, their complex structures impose a heavy computational …

A Brief History of ASR: Automatic Speech Recognition

This moment has been a long time coming. The technology behind speech recognition has been in development for over half a century, going through several periods of intense promise, and disappointment. So what changed to make ASR viable in commercial applications? And what exactly could these systems accomplish, long before any of us had heard of …

Why Sublime Text for Data Science is Hotter than Jennifer Lawrence?

1. Create A Dictionary/List or Whatever: How many times does it happen that we want to make a list or dictionary for our Python code from a list we got in an email text? I bet numerous times. How do we do this? We haggle with Excel by loading that text into it and then trying …

Tips for R to Python and Vice-Versa seamlessly

When we at TATVA AI visit our clients, both data scientists and higher management often ask how we deal with Python and R simultaneously for client requests, as there is no universal preference among clients. Though the solution is not straightforward, I suggest exploiting common libraries for quick deployments, such as dfply (Python) …

Healthcare tweet Extraction, Visualisation and Particle Swarm Optimisation using Python

Abstract: We all live in a world where analyzing a massive set of unstructured data is becoming a business need. And the time we spend on the internet is basically the time we spend on social media. Even our daily life is affected by the people around us. And we tend to change our …

Building efficient data pipelines using TensorFlow

Having efficient data pipelines is of paramount importance for any machine learning model. In this blog, we will learn how to use TensorFlow’s Dataset module tf.data to build efficient data pipelines. https://www.tensorflow.org/guide/performance/datasets Motivation: Most introductory articles on TensorFlow introduce you to the feed_dict method of feeding data to the model. feed_dict …

Get text from pdfs or images using OCR: a tutorial with {tesseract} and {magick}

In this blog post I’m going to show you how you can extract text from scanned pdf files, or pdf files where no text recognition was performed. (For pdfs where text recognition was performed, you can read my other blog post). The pdf I’m going to use can be downloaded from here. It’s a poem titled, D’Léierchen (Dem …

mapedit 0.5.0 and Leaflet.pm

In our last post mapedit and leaflet.js >1.0 we discussed remaining tasks for the RConsortium funded project mapedit. mapedit 0.5.0 fixes a couple of lingering issues, but primarily focuses on bringing the power of Leaflet.pm as an alternate editor. Leaflet.draw, the original editor in mapedit provided by leaflet.extras, is a wonderful tool but struggles with snapping and those pesky holes that we commonly …

Identifying the Sources of Winter Air Pollution in Bangkok Part II

Mae Fah Luang University Campus in March 2019. (Photo by MFU Photoclub with permission) In the previous blog, I looked at the winter air pollution in Bangkok. The main source of pollution is particles smaller than 2.5 micrometers (PM 2.5 particles). These particles are smaller than the width of a human hair and can …

Step-by-Step Guide to Creating R and Python Libraries

R and Python are the bread and butter of today’s machine learning languages. R provides powerful statistics and quick visualizations, while Python offers an intuitive syntax and abundant support, and is the interface of choice for today’s major AI frameworks. In this article we’ll look at the steps involved in creating libraries in R and Python. This …

Cleaning, Analyzing, and Visualizing Survey Data in Python

A tutorial using pandas, matplotlib, and seaborn to produce digestible insights from dirty data. If you work in data at a D2C startup, there’s a good chance you will be asked to look at survey data at least once. And since SurveyMonkey is one of the most popular survey platforms out there, there’s a good chance …

Machine Learning and Discrimination

Optimizing for Fairness: Building machine learning algorithms that optimize for non-discrimination can be done in 4 ways: formalizing a non-discrimination criterion, demographic parity, equalized odds, and well-calibrated systems. We will discuss each of these in turn. Formalizing a non-discrimination criterion is essentially what the other 3 approaches involve; they are types of criteria which aim to …

The inner path to becoming SensAI

Recently I did a proof of concept hackathon with Yama Saraj for his startup SensAI. Such a hackathon forces founders to express their vision on their business model and their technology stack. There is a big difference between running a social enterprise on DIY technology and a startup with a stack in …

Applied AI: Going From Concept to ML Components

Opening your mind to different ways of applying machine learning to the real world. By Abraham Kang with special thanks to Kunal Patel and Jae Duk Seo for being a sounding board and providing input for this article. Executive Summary. Candidate Problem: Many people are interested in automating redundant processes …

Which libraries can load images in Python and what are their differences?

from skimage import io; img = io.imread(img_dir) Colour channel: After loading the image, plt.imshow(img) is usually used to plot it. Let’s plot some doge! You may spot that the OpenCV image above looks odd. That is because matplotlib, PIL and skimage represent images in RGB (Red, Green, Blue) order, while OpenCV is in …
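The channel-order mismatch described in this teaser can be illustrated without any imaging library. The sketch below assumes an image stored as nested lists of three-channel pixels, and `bgr_to_rgb` is a hypothetical helper name, not a function from any of the libraries mentioned. OpenCV stores channels in BGR order, so reversing each pixel's channels yields RGB; with NumPy arrays the same reversal is commonly written as `img[..., ::-1]`.

```python
def bgr_to_rgb(image):
    """Return a copy of `image` with every pixel's channels reversed.

    `image` is assumed to be a list of rows, each row a list of
    3-element pixels. Reversing a pixel turns BGR into RGB (and back).
    """
    return [[pixel[::-1] for pixel in row] for row in image]


# A made-up 1x2 image in BGR order: a pure blue pixel, then a pure red pixel.
bgr_image = [[[255, 0, 0], [0, 0, 255]]]

# After the swap, the blue pixel has its 255 in the last (B) slot of RGB.
rgb_image = bgr_to_rgb(bgr_image)
```

Because the operation is its own inverse, applying it twice gives back the original image, which is a quick sanity check when an image plotted with matplotlib shows swapped reds and blues.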

Trust and interpretability in machine learning

Do machine learning models always need to be interpretable? Given a choice between an interpretable model that is inaccurate and a non-interpretable model that is accurate, wouldn’t you rather choose the non-interpretable but accurate model? In other words, is there any reason for sacrificing accuracy at the altar of interpretability? Before going any further we …

Implementing MACD in Python

MACD is a popular technical indicator for trading stocks, currencies, cryptocurrencies, etc. Basics of MACD: MACD is used and discussed in many different trading circles. Moving Average Convergence Divergence (MACD) is a trend-following indicator. MACD can be calculated very …
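The teaser stops before showing the calculation, so the following is only a minimal sketch of the conventional definition, not the post's own code. It assumes the widely used 12/26/9 periods and an exponential moving average seeded with the first price; the function names `ema` and `macd` and the price series are made up for illustration.

```python
def ema(values, span):
    """Exponential moving average with smoothing factor 2 / (span + 1).

    Assumes `values` is a non-empty list; the first value seeds the average.
    """
    alpha = 2 / (span + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def macd(prices, fast=12, slow=26, signal=9):
    """Return (macd_line, signal_line, histogram) for a list of prices.

    The 12/26/9 defaults are the conventional choice, assumed here.
    """
    fast_ema = ema(prices, fast)
    slow_ema = ema(prices, slow)
    # MACD line: fast EMA minus slow EMA
    macd_line = [f - s for f, s in zip(fast_ema, slow_ema)]
    # Signal line: EMA of the MACD line
    signal_line = ema(macd_line, signal)
    # Histogram: distance between the MACD line and its signal line
    histogram = [m - s for m, s in zip(macd_line, signal_line)]
    return macd_line, signal_line, histogram


# Made-up, steadily rising prices for illustration only.
prices = [float(p) for p in range(100, 140)]
macd_line, signal_line, histogram = macd(prices)
```

On a steadily rising series the fast average stays above the slow one, so the MACD line is positive; on a flat series all three outputs stay at zero.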

Microsoft Introduction to AI — Part 1

Machine Learning: Are you a bit like me, wanting to learn about Artificial Intelligence but feeling a little intimidated by the maths involved? Maybe you thought the concepts were too difficult to understand and you would be out of your depth. I recently completed the Microsoft Introduction to AI course and wrote course …

Getting Started with Google BigQuery’s Machine Learning — Titanic Dataset

While still in Beta, BigQuery ML has been available since the middle of last year; however, I didn’t get around to working with this Google cloud-based Machine Learning offering until recently. As a non-data scientist, my first impression: what’s not to like? After all, the ability to run ML models from the comfort of a web-based SQL editor is …

The problem with data science job postings

Every once in a while, you notice something that you realize you probably should have noticed a long time ago. You start to see it everywhere. You wonder why more people aren’t talking about it. For me, “every once in a while” was yesterday when I was scrolling through the #jobs channel in the SharpestMinds …

Website with Australian federal election forecasts by @ellis2013nz

The election forecasts: Building on my recent blog posts, I’ve put up a page dedicated to forecasts of the coming Australian federal election. It takes the state space model of two-party-preferred vote from my first blog on polls leading up to this election, and combines it with a more nuanced understanding of the seats actually …

Should you start your R blog now? 6 reasons I found in my first year of R blogging

It has been a year since I posted the first post on this blog. Since that time, I have learned many lessons, but the main one is probably that blogging has never been as accessible as it is now. In this anniversary post, I would like to give you a few reasons to start your …

We are All Baby Shark (in Data Tracking)

A look at telemetry, fitness trackers, and depressing loads of data. Data is amazing. You know that already. You’re told it every moment of every day. We are literally told or shown by ESPN, our kids’ report cards, our treadmills, our wristwatch, our Alexa, our Google, our Siri, our phones, and our apps that data is …

Analyzing performances of cricketers using cricketr template

The cricketr package has several functions that perform different analyses on both batsmen and bowlers. The package has functions that plot percentage frequency of runs or wickets, runs likelihood for a batsman, relative run/strike rates of batsmen and relative performance/economy rate for bowlers. Other interesting functions include batting performance moving average, forecast and …

Wrapping up the stars project

Summary: This is the fourth blog on the stars project, and it completes the R-Consortium funded project for spatiotemporal tidy arrays with R. It reports on the current status of the project, and current development directions. Although this project ends, with the release of stars 0.3 on CRAN, the adoption, update, enthusiasm and participation in the development of the stars …

Quick Control Charts for AFL

Attention and its Different Forms

An overview of generalised attention with its different types and uses. I assume you are already familiar with Recurrent Neural Networks (including the seq2seq encoder-decoder architecture). The Bottleneck Problem: In the encoder-decoder architecture, the complete sequence of information must be captured by a single vector. This poses problems in holding on to information at the beginning …

Yes, Rep. Borowicz’s invocation was unusual

On Monday, Pa. State Rep. Stephanie Borowicz gave an invocation on the House floor that started causing backlash before she even said “Amen.” The prayer, which coincided with the swearing in of the state’s first Muslim female representative, has been labeled as Islamophobic and xenophobic by critics. What was so controversial? Filled with calls for …

Better predictions for AFL from adjusted Elo ratings by @ellis2013nz

Warning – this post discusses gambling odds and even describes me placing small $5 bets, which I can easily afford to lose. In no way should this be interpreted as advice to anyone else to do the same, and I accept absolutely no liability for anyone who treats this blog post as a basis for …

Graphing Brexit

Along with many of my countrymen, I spent Wednesday night watching UK MPs vote ‘No’ to a series of potential Brexit options, and then read analysis on various websites of how different individuals and parties had voted. While very interesting, all the analysis felt very tabular to me, and I was curious whether we …

Securing a dockerized plumber API with SSL and Basic Authentication

The use of docker containers is by now a well-established technique to make the deployment of R scripts to a stable environment incredibly easy and reliable. In cases where you dockerize a shiny app or want to provide a REST API with plumber, it is often mandatory to somehow restrict the access to the …

drat 0.1.5: New release

A new version of drat just arrived on CRAN. And like the last time in December 2017 it went through as an automatically processed upgrade directly from the CRAN prechecks. Being a simple package can have its upsides… And like the last time, this release once again draws largely upon contributed pull requests. Neal Fultz …