A Glimpse of TensorFlow

TensorFlow is a popular open-source software library from Google. Originally it was developed by the Google Brain team for internal Google use. As the AI research community got more and more collaborative, TensorFlow was released under the Apache 2.0 open source license. Detailed study of TensorFlow can take months. But a glimpse of its power …

NLP Learning Series Part 1: Text Preprocessing Methods for Deep Learning

Recently, I started on an NLP competition on Kaggle called the Quora Question Insincerity challenge. It is an NLP challenge on text classification, and as the problem became clearer after working through the competition and going through the invaluable kernels put up by the Kaggle experts, I thought of sharing …

Automated Dashboard for Classification Neural Network in R

In this article, you learn how to make an automated dashboard for a classification neural network in R. First you need to install the `rmarkdown` package into your R library. Assuming that you have installed `rmarkdown`, next you create a new `rmarkdown` script in R. After this …

“Interesting” Projections — Where PCA Fails.

An attractive alternative for exploratory data analysis. Most data scientists are familiar with principal components analysis (PCA) as an exploratory data analysis tool. A recap for the uninitiated: researchers often use PCA for dimensionality reduction in hopes of revealing useful information in their data (e.g. disease vs. non-disease class separation). PCA does this through finding …

My course on Hyperparameter Tuning in R is now on Data Camp!

I am very happy to announce that (after many months) my interactive course on Hyperparameter Tuning in R has now been officially launched on Data Camp! Course Description: For many machine learning problems, simply running a model out-of-the-box and getting a prediction is not enough; you want the best model with the most accurate prediction. …

ROC Curves

I have been thinking about writing a short post on R resources for working with ROC curves, but first I thought it would be nice to review the basics. In contrast to the usual (usual for data scientists anyway) machine learning point of view, I’ll frame the topic closer to its historical origins as a …

RStudio Connect 1.7.0

RStudio Connect is the publishing platform for everything you create in R. In conversations with our customers, R users were excited to have a central place to share all their data products, but were facing a tough problem. Their colleagues working in Python didn’t have the same option, leaving their work stranded on their desktops. Today, we are excited …

Image Classification using SSIM

Simple Image Classifier with OpenCV. Find the Differences: As humans, we are generally very good at finding the difference in a picture. For example, let’s look at the above picture and see how they are different. For one, the fruits, ice-creams and drinks have obviously changed. That was pretty easy, right? However, for computers, this is …

Recursive Programming

How to solve a problem by pretending you already have. Despite often being introduced early on in most ventures into programming, the concept of recursion can seem strange and potentially off-putting upon first encountering it. It seems almost paradoxical: how can we find a solution to a problem using the solution to the same problem? Recursion can …

Evaluating A Real-Life Recommender System, Error-Based and Ranking-Based

A recommender system aims to find and suggest items of likely interest based on the users’ preferences. Recommender systems are one of the most valuable applications in machine learning today. Amazon attributes 35% of its revenue to its recommender system. Evaluation is an integral part of researching and developing any recommender system. Depends on your …

AI, Machine Learning and Data Science Roundup: January 2019

A monthly roundup of news about Artificial Intelligence, Machine Learning and Data Science. This is an eclectic collection of interesting blog posts, software announcements and data applications from Microsoft and elsewhere that I’ve noted over the past month or so. Open Source AI, ML & Data Science News: Preview of TensorFlow 2.0 (the public preview …

The Myth of Us vs Them

The Data: For this article, we’ll rely primarily on Hans Rosling’s book Factfulness: Ten Reasons We’re Wrong About the World. Other sources are the World Bank Poverty Page, the YouTube video “Debunking Third World Myths with the Best Stats You’ve Never Seen”, and the article “Should We Continue to Use the Term ‘Developing World’?”. I …

Everything You Need to Know About Decision Trees

Intro to decision trees, random forests, bagging, boosting, and the underlying theory. Tree-based methods can be used for regression or classification. They involve segmenting the predictor space into a number of simple regions. The set of splitting rules can be summarized in a tree, hence the name decision tree methods. …

Analyzing Information Flow within a Twitter (Ego-)Community

Topic Modelling & Analysis: Topic-Term Profile. Put simply, topic modelling takes a collection of ‘documents’ (tweets in our case), which are made of various ‘terms’ (words in the tweets), and finds N (the number of topics) unique weighting strategies to apply to the terms such that each ‘document’ is categorized into a mixture of the …

The Data Fabric for Machine Learning. Part 1.

How the new advances in semantics and the data fabric can help us be better at Machine Learning. Also, a new definition of machine learning. Introduction: If you search for machine learning online you’ll find around 2,050,000,000 results. Yeah, for real. It’s not easy to find that description or definition that fits every use or …

Using DataCamp reduces anxiety about learning R!

I used DataCamp’s excellent Introduction to R as Essential Prior Independent Study and found it made people a bit less worried about a term of R! I have a lot of fun teaching first year biology undergraduates but there are a few challenges in teaching data skills when they are not (perceived as) a student’s core discipline …

Decision Boundary Visualization (A-Z)

Meaning, Significance, Implementation. Classification problems are very common and essential in the field of Data Science. For example: Diabetic Retinopathy, Mood or Sentiment Analysis, Digit Recognition, Cancer-Type prediction (Malignant or Benign) etc. These problems are often solved by Machine Learning or Deep Learning. Also in Computer Vision, projects like Diabetic Retinopathy or Glaucoma Detection, …

London Design Festival (Part 3): Computer Vision

Part 3: Analysing 3K images from Twitter using computer vision. Introduction: In this final blog post of the series, I apply computer vision techniques to understand 3,300 images about the London Design Festival 2018, a nine-day design festival that happened from 15 to 23 September 2018. London Design Festival 2018 (LDF18) had a very active events …

How to cut out the SQL middle-person in analytics

Constantly manually executing SQL queries for your clients? Here’s a way to get them to help themselves. One of my current goals in life is to help data analysts cut out the boring, mind-numbing aspects of their work so that they can focus on more interesting, useful and cool stuff. One common situation I see …

The power of subjectivity in the age of robot journalism

If you were born before 1990, there is a good chance you remember the guy with whom you interacted in video rental shops. Let’s call him Bob. When you met Bob, he took your previous movie back, recommended a new one to you and finally took your money. What is interesting about the Bobs is …

Exploring Australian Open Tennis data with Tableau

Using Tableau to understand Australian Open Tennis data for the Men’s tour 2000 to 2018. Tableau is like visual SQL. It’s a heatwave this week in Sydney with 30+ degrees. When is winter arriving? I have been missing in action with my posts; I continue to write to my future self and continue the …

NHL player rating using standard and advanced hockey stats

Playing EA Sports (EAS) NHL (or “Chel” as it is known) over the years, I have often wondered how they come up with their player ratings. According to each player bio and profile, the total score (usually running from around 70 to 100) is derived from a combination of various parameters attributed to each player. For …

Automated Dashboard for Credit Modelling with Decision trees and Random forests in R

In this article, you learn how to make an automated dashboard for credit modelling with decision trees and random forests in R. First you need to install the `rmarkdown` package into your R library. Assuming that you have installed `rmarkdown`, next you create a new `rmarkdown` script …

Lecture slides: Real-World Data Science (Fraud Detection, Customer Churn & Predictive Maintenance)

These are slides from a lecture I gave at the School of Applied Sciences in Münster. In this lecture, I talked about Real-World Data Science and showed examples on Fraud Detection, Customer Churn & Predictive Maintenance. The slides were created with xaringan.

When Automation Bites Back

In the light of these examples of clumsy and dishonest automation, what concerns me is that many engineers, data scientists, designers and decision-makers bring these frictions into people’s everyday life because they do not employ approaches to foresee the limits and implications of their work. Apart from the engineering of efficient solutions, automation requires professionals …

An Overview of Categorical Input Handling for Neural Networks

A quick guide to summarize many approaches for handling categorical data (both low and high cardinality) when preprocessing data for neural network based predictors. In the context of a coding exercise in 2018, I was asked to write a sklearn pipeline and a TensorFlow estimator for a dataset that describes employees and their wages. The …

Use foreach with HPC schedulers thanks to the future package

The future package is a powerful and elegant cross-platform framework for orchestrating asynchronous computations in R. It’s ideal for working with computations that take a long time to complete; that would benefit from using distributed, parallel frameworks to make them complete faster; and that you’d rather not have locking up your interactive R session. You can …

An intuitive guide to Gaussian processes

What is machine learning? Machine learning is linear regression on steroids. Machine learning is using data we have (known as training data) to learn a function that we can use to make predictions about data we don’t have yet. The simplest example of this is linear regression, where we learn the slope and intercept of …
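
As a rough illustration of the idea in this excerpt, learning a slope and intercept from training data and then predicting an unseen point, here is a minimal sketch using ordinary least squares. The function name `fit_line` and the toy data are assumptions made for illustration; they do not come from the linked article.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalised).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data generated from y = 2x + 1.
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(train_x, train_y)

# Use the learned function on a point we have not seen yet.
prediction = slope * 5.0 + intercept
print(slope, intercept, prediction)
```

The learned function is just `slope * x + intercept`; more flexible models replace that fixed form with something richer.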

Curse of Dimensionality

In Machine Learning, we often have high-dimensional data. If we’re recording 60 different metrics for each of our shoppers, we’re working in a space with 60 dimensions. If we’re analyzing grayscale images sized 50×50, we’re working in a space with 2,500 dimensions. If the images are RGB-colored, the dimensionality increases to 7,500 dimensions (one dimension …
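
The image arithmetic in this excerpt can be checked with a few lines of code. This is only a sketch; the helper name `flattened_dimensions` is an assumption made for illustration.

```python
def flattened_dimensions(height: int, width: int, channels: int = 1) -> int:
    """Number of features when an image is flattened into a single vector."""
    return height * width * channels


# Grayscale: one value per pixel.
gray_dims = flattened_dimensions(50, 50)

# RGB: one value per pixel per color channel.
rgb_dims = flattened_dimensions(50, 50, channels=3)

print(gray_dims, rgb_dims)  # 2500 7500
```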

Detecting Credit Card Fraud Using Machine Learning

Catching Bad Guys with Data Science. This article describes my machine learning project on credit card fraud. If you are interested in the code, you can find my notebook here. Introduction: Ever since starting my journey into data science, I have been thinking about ways to use data science for good while generating value …

Classification of Signature and Text images using CNN and Deploying the model on Google Cloud ML…

II. Training and Deploying the model on Google Cloud ML Engine. Cloud ML Engine helps to train your machine learning models at scale, to host the trained model in the cloud, and to use the model to make predictions about new data. Data: The data has been prepared by taking the signature images and text images …

Markov Ventures — Generating Venture Firms Using Markov Chains

An exploration of generating venture capital firm names using Markov chains. Venture capitalists are not very creative with naming, so I decided to try using a Markov chain to generate some names. (I’m a helper.) First, I tried the fairly vanilla Markov chain described in Towards Data Science. Basically, I take my data set of investor …
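
For readers unfamiliar with the technique, a word-level Markov chain for name generation might look like the sketch below. This is not the author's code: the sample names, the `START`/`END` tokens and the helper functions are all assumptions made for illustration.

```python
import random
from collections import defaultdict

START, END = "<s>", "</s>"  # hypothetical boundary tokens


def build_chain(names):
    """Map each word to the list of words observed right after it."""
    chain = defaultdict(list)
    for name in names:
        tokens = [START] + name.split() + [END]
        for current, following in zip(tokens, tokens[1:]):
            chain[current].append(following)
    return chain


def generate(chain, rng, max_words=6):
    """Walk the chain from START until END is drawn or max_words is reached."""
    words, current = [], START
    while len(words) < max_words:
        current = rng.choice(chain[current])
        if current == END:
            break
        words.append(current)
    return " ".join(words)


# Made-up firm names, used only to exercise the code.
sample_names = [
    "Acme Capital Partners",
    "Birch Ventures",
    "Cedar Capital",
    "Acme Ventures",
]

chain = build_chain(sample_names)
rng = random.Random(0)  # seeded so repeated runs give the same names
generated = [generate(chain, rng) for _ in range(5)]
print(generated)
```

Because each next word depends only on the current word, the chain can recombine fragments into names that never appeared in the input, such as "Cedar Capital Partners".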

Supervised Learning: Basics of Linear Regression

1. Introduction: Regression analysis is a subfield of supervised machine learning. It aims to model the relationship between a certain number of features and a continuous target variable. In regression problems we try to come up with a quantitative answer, predicting the price of a house or the number of seconds that someone will spend …

Exploratory Design in Data Visualization

Understanding and leveraging chart similarity. This article is a collaboration between Jason Forrest and me; if you haven’t already, check out his article exploring your relationship with your audience so that they trust you enough to collaborate. One of the most challenging tasks for a data visualization designer is convincing their stakeholders that an unfamiliar …

Master Python through building real-world applications (Part 6)

Scraping data from FIFA.com using BeautifulSoup. Most people think data science is about cool machine learning algorithms and self-driving cars. Let me tell you something: it’s not. Almost 80% of the time you are searching for and cleaning data and, if successful, you spend the remaining 20% on the cool stuff you see up front. “Find data and play …

Feature Selection using Genetic Algorithms in R

This is a post about feature selection using genetic algorithms in R, in which we will do a quick review of: What are genetic algorithms? GA in ML? What does a solution look like? GA process and its operators. The fitness function. Genetic Algorithms in R! Try it yourself. Relating concepts. Animation source: “Flexible Muscle-Based …

Transfer Learning in NLP for Tweet Stance Classification

Method 1: ULMFiT. ULMFiT has been entirely implemented in v1 of the fastai library (see fastai.text on their GitHub repo). Version 1 of fastai is built on top of PyTorch v1, so having some knowledge of PyTorch objects is beneficial to get started. In this post, we cover some of the techniques that fastai has developed …

Using clusterlab to benchmark clustering algorithms

Clusterlab is a CRAN package (https://cran.r-project.org/web/packages/clusterlab/index.html) for the routine testing of clustering algorithms. It can simulate positive controls (data-sets with >1 cluster) and negative controls (data-sets with 1 cluster). Why test clustering algorithms? Because they often fail in identifying the true K in practice, published algorithms are not always well tested, and we need to know …

Analytics — 5 mistakes that companies make

Wasting perfectly good data. Lots of wasted potential all around. Companies that are not data-driven are missing out on a lot of opportunities. Management is so busy with operations that they constantly overlook the value of their data. There should be a permanent watch for analytics potential within the company. Useful data is not hard to …

Mango Solutions contributes to technology partners RStudio conference

As a leading advanced analytics partner for RStudio, Mango Solutions are delighted to be contributing to the upcoming rstudio::conf programme with a workshop and a talk. Two of Mango’s senior consultants, Aimée Gott, Education Practice Lead, and Mark Sellors, Head of Data Engineering, will be sharing their R expertise with delegates. Aimée Gott will be delivering the Intermediate …

Neural Text Modelling with R package ruimtehol

Last week the R package ruimtehol was released on CRAN (https://github.com/bnosac/ruimtehol) allowing R users to easily build and apply neural embedding models on text data. It wraps the ‘StarSpace’ library (https://github.com/facebookresearch/StarSpace), allowing users to calculate word, sentence, article, document, webpage, link and entity ‘embeddings’. By using the ‘embeddings’, you can perform text based multi-label classification, …

Understanding the Magic of Neural Networks

Everything “neural” is (again) the latest craze in machine learning and artificial intelligence. Now what is the magic here? Let us dive directly into a (supposedly a little silly) example: we have three protagonists in the fairy tale Little Red Riding Hood: the wolf, the grandmother and the woodcutter. They all have certain qualities and little …

How Do People Feel About Saving Sea Turtles?

Like many of my peers, I always knew and cared about climate change and environmental sustainability, but failed to act on them until I saw a particular viral video of a plastic straw being extracted from a sea turtle’s nostril. (It’s as horrifying as it sounds.) Since then, I’ve been forced to think more about …

Build A Trust Infrastructure Between Your Data Team and Your Audience

You have to want to earn it. There are many ways to build trust with your audience, but the most important is that you want to earn it. It sounds minor, but the moment you decide that you want to earn your audience’s trust, you shift away from a reactive mindset and towards a collaborative one. …

My presentations on ‘Elements of Neural Networks & Deep Learning’ -Parts 4,5

This is the next set of presentations on “Elements of Neural Networks and Deep Learning”. In the 4th presentation I discuss and derive the generalized equations for a multi-unit, multi-layer Deep Learning network. The 5th presentation derives the equations for a Deep Learning network when performing multi-class classification along with the derivations for cross-entropy loss. The corresponding …

Can You Enhance That? — Image Restoration With 1 Training Image

Movies are awesome. But computer vision can make them even awesome-er (we’ll just make that a word)! Movies use computer vision for all kinds of things like motion capture, special effects, and Computer Generated Imaging (CGI). One of the most common and clichéd uses is the good old “can you enhance that?” in action films. …

How to Get a Data Science Interview in 2019

Ken Jee, Jan 14. There is almost nothing you can do that will guarantee you an interview or a job in data science. On the other hand, there are many things that you can do that can increase your probability of getting noticed. I have created this step by step guide to help data science …