Machine Learning: Dimensionality Reduction With Principal Component Analysis Explained

Principal Component Analysis (PCA) is used to reduce the number of features without losing too much information. The problem with having too many dimensions is that it makes the data difficult to visualize and makes training models more computationally expensive. To give us an intuitive understanding … Read more

Deep (learning) like Jacques Cousteau – Part 2 – Scalars

(TL;DR: Scalars are single numbers.) “I have so many scalars!” (Opie, the open source snake) “I’m sorry…that was lame…” (Me) Last time, we covered some basic concepts regarding sets on our journey to understanding vectors and matrices. Let’s do this! What’s a scalar? A scalar is a single number! This seems very simple (and it is). But we need to know … Read more

Machine Learning Classification: The Success of Kickstarter Tech Projects

As of April 2019, over 400,000 projects have been launched on Kickstarter. With crowdfunding becoming an ever-increasingly popular method of raising capital, I thought it would be interesting to explore the data behind Kickstarter projects and also apply a machine learning model to predict whether or not a project will be successful based on its … Read more

Free online R course

Recently a young relative mentioned that the campus R course she hoped to attend was full. What online alternatives did she have? So, I decided to start one of my own! https://github.com/matloff/fasteR  Designed for complete beginners. I now have six lessons up on the site. I hope to add one new lesson per week. Related … Read more

Why Use Weight of Evidence?

I had been asked why I spent so much effort on developing SAS macros and R functions to do monotonic binning for the WoE transformation, given the availability of other cutting-edge data mining algorithms that will automatically generate the prediction with whatever predictors are fed into the model. Nonetheless, what really distinguishes a good modeler from … Read more

Optimal Control: LQR

In this article, I am going to talk about optimal control. More specifically I am going to talk about the unbelievably awesome Linear Quadratic Regulator that is used quite often in the optimal control world and also address some of the similarities between optimal control and the recently hyped reinforcement learning. It is fascinating that … Read more

Trick Out Your Terminal in 10 Minutes or Less

The time you spend staring at your terminal doesn’t have to be painful. Practically without time or effort, you can transform your terminal from a frustrating white block of pain into a beautiful, fast, fun, easy-to-use seat of power. The terminal is an incredible tool. It’s the magical box from which you … Read more

6 bits of advice for Data Scientists

Syndromes, Hypotheses, Fallacies, Lies, Awareness, and Probabilities. Ask Questions!!! To err is human. And who is more human than us data scientists when measured by that metric? The important thing is to look at our mistakes. And learn from them. A data scientist needs to be critical and always on the lookout for something that … Read more

The Ethics of People Analytics and AI in the Workplace: Four Dimensions of Trust

http://www.joshbersin.com AI and People Analytics have taken off. As I’ve written about in the past, the workplace has become a highly instrumented place. Companies use surveys and feedback tools to get our opinions, new tools monitor emails and our network of communications (ONA), we capture data on travel, location, and mobility, and organizations now have … Read more

Data Science & NLP for an effective content strategy

Harnessing the power of Process excellence, Analytics & NLP to design the content strategy for a data science knowledge portal Background The demand for skill-sets in data science and AI has been exponentially increasing over the past few years. However the supply of skilled data scientists is not increasing at the same pace, thereby leading … Read more

Data Driven Growth with Python  — Part 1: Know Your Metrics

Data Driven Growth with Python: learn what and how to track with Python. Introduction: This series of articles was designed to explain how to use Python in a simple way to fuel your company’s growth by applying the predictive approach to all your actions. It will be a combination of programming, data analysis and machine learning. I … Read more

The Road To Enlightenment

THE BRAIN is wider than the sky, For, put them side by side, The one the other will include, With ease, and you beside. The brain is deeper than the sea, For, hold them, blue to blue, The one the other will absorb, As sponges, buckets do. The brain is just the weight of God, For, lift … Read more

Fast food, causality and R packages, part 2

I am currently working on a package for the R programming language; its initial goal was to simply distribute the data used in the Card and Krueger 1994 paper that you can read here (PDF warning). However, I decided that I would add code to perform diff-in-diff. In my previous blog post I showed how to set up the … Read more

Reinforcement Learning for Real-World Robotics

Ideas from the literature on RL for real-world robot control. Robots: The Promise. Robots are pervasive throughout modern industry. Unlike most science-fiction works of the previous century, humanoid robots are still not doing our dirty dishes and taking out the trash, nor are Schwarzenegger-looking terminators fighting on the battlefields (at least for now…). But, in almost … Read more

Understanding when Simple and Multiple Linear Regression give Different Results

Simple and multiple linear regression are often the first models used to investigate relationships in data. If you play around with them for long enough you’ll eventually realize they can give different results. Relationships that are significant when using simple linear regression may no longer be when using multiple linear regression and vice-versa, insignificant relationships … Read more

Five Methods to Debug your Neural Network

A lot of us are trying to understand machine learning algorithms, but sometimes we run into bugs in our algorithms, and that is what we are going to look at here: how to debug your neural network. This article is short but documents the debugging process well. So, I expect all of you … Read more

Artificial Curiosity

How can artificial intelligence become curious? Many people become interested in Machine Learning (or Artificial Intelligence as a more general field of study) through some extraordinary plots presented in books, movies and TV series. It is indeed very fascinating to study and implement algorithms that can specialize and outperform humans in … Read more

Extreme Rare Event Classification using Autoencoders in Keras

In this post, we will learn how we can use a simple dense-layer autoencoder to build a rare event classifier. The purpose of this post is to demonstrate the implementation of an Autoencoder for extreme rare-event classification. We will leave the exploration of different architectures and configurations of the Autoencoder to the user. Please … Read more

Backpropagation in simple terms

As I learned about neural networks (NN), I struggled to understand what backpropagation was doing and why it made sense. This post is a follow-up to the Neural Networks Demystified story. So, if you don’t know what gradient descent is, or what forward propagation is, I’d recommend checking it out. Why do we care about … Read more

Weekly Selection — May 3, 2019

The Fastest Way to Learn Data Science, by Rebecca Vickery (5 min read). When I first started writing blogs about data science on Medium I wrote a series of posts describing a complete roadmap for learning data science. I am largely self-taught in data science and over the last few years have, through trial and error, found … Read more

Comparing and Matching Column Values in Different Excel Files using Pandas

Pandas for column matching Often, we may want to compare column values in different Excel files against one another to search for matches and/or similarity. Using the Pandas library from Python, this is made an easy task. To demonstrate how this is possible, this tutorial will focus on a simple genetic example. No genetic knowledge … Read more

Converting D3.js to PDF to PowerPoint

A simple way to use D3.js visualizations in business reports. After creating a beautiful visualization in D3.js I often get the same question from my colleagues in marketing: «Could you send me a PowerPoint of this?». I usually explain that this is a programmed chart, similar to a website, and that it is unfortunately not possible to … Read more

One Step to Quickly Improve the Readability and Visual Appeal of ggplot Graphs

There’s something wonderful about a graph that communicates a point clearly. You know it when you see it. It’s the kind of graph that makes you pause and say ‘wow!’. There are all kinds of different graphs that fit this description, but they usually have a few things in common: Clarity: The message of the … Read more

Speedup your CNN using Fast Dense Feature Extraction and PyTorch

What are patch-based methods, and what is the problem? Patch-based CNNs are usually applied to single patches of an image, where each patch is classified separately. This approach is often used when trying to execute the same CNN several times on neighboring, overlapping patches in an image. This includes task-based feature extraction like camera … Read more

Using AI to Predict Rothko Paintings’ Auction Prices

Mark Rothko’s hovering rectangles of color suspended within monochromatic fields are among the most recognizable paintings produced in the 20th century. Working within a highly constricted format, he produced a surprisingly varied body of work coveted by collectors and museums around the world. His widespread popularity and secure position in the canon of art history, … Read more

Availability of Microsoft R Open 3.5.2 and 3.5.3

It’s taken a little bit longer than usual, but Microsoft R Open 3.5.2 (MRO) is now available for download for Windows and Linux. This update is based on R 3.5.2, and accordingly fixes a few minor bugs compared to MRO 3.5.1. The main change you will note is that new CRAN packages released since R … Read more

If you like to travel, let Python help you scrape the best fares!

Well, every Selenium project starts with a webdriver. I’m using Chromedriver, but there are other alternatives. PhantomJS or Firefox are also popular. After downloading it, place it in a folder and that’s it. These first lines will open a blank Chrome tab. Please bear in mind I’m not breaking new ground here. There are way … Read more

Natural Language Processing — Event Extraction

Extracting events from news articles. The amount of text generated every day is mind-blowing. Millions of data feeds are published in the form of news articles, blogs, messages, manuscripts and countless more, and the ability to automatically organize and handle it is becoming indispensable. With improvements in neural network algorithms, significant increases in computing power and easy … Read more

Kaplan Meier curves

Kaplan-Meier curves are widely used in clinical and fundamental research, but there are some important pitfalls to keep in mind when making or interpreting them. In this short post, I’m going to give a basic overview of how data is represented on the Kaplan Meier plot. The Kaplan-Meier estimator is used to estimate the survival … Read more

Make your own Super Pandas using Multiproc

Parallelization is awesome. We data scientists have got laptops with quad-core, octa-core, turbo-boost. We work with servers with even more cores and computing power. But do we really utilize the raw power we have at hand? Instead, we wait for time-consuming processes to finish. Sometimes for hours, when urgent deliverables are at hand. Can … Read more

DeepPiCar — Part 3 Make PiCar See and Think

DeepPiCar Series Set up computer vision (OpenCV) and deep learning software (TensorFlow). Turn the PiCar into a DeepPiCar. Executive Summary Welcome back! If you have been following my previous two posts (Part 1 and Part 2) on DeepPiCar, you should have a running robotic car that can be controlled via Python. In the article, we … Read more

Deep (learning) like Jacques Cousteau – Part 1 – Sets

(TL;DR: I’m going to go deep into deep learning. Sets are collections of things.) I will be using a lot of LaTeX rendered with MathJax which doesn’t show up in the RSS feed. Please visit my site directly to see equations and all that goodness! Here I go, deep type flow / Jacques Cousteau could never get … Read more

Clearing the Water Around A.I.

Big Data or Big Hype? Nearly everyone today has been experiencing some effects and new ideas about artificial intelligence. Most companies, banks, retail stores, etc., are focusing on ways artificial intelligence will expand their market and lead them to more successful ventures. I’m sure you’ve all once had or heard a conversation with people with no … Read more

R for Data Science in a Day

Hi everyone, Do you want to know more about R, get hands-on experience and build your first Data Science project? Free up your schedule and save this date, 13th May! R for Data Science in a Day: a one-day workshop during which you’ll learn the basics of the R language, so you’ll feel confident and … Read more

How Microsoft Azure Machine Learning Studio Clarifies Data Science

Simple to use, but serious data science knowledge still required. Two great tastes that taste great together: Azure model construction + data science knowledge. I’ve been dying to test drive one of the many recent tools on the market targeted at the “citizen data scientists” like DataRobot, H2O Driverless AI and Microsoft’s new product in the cloud … Read more

Zalando Dress Recommendation and Tagging

In Artificial Intelligence, Computer Vision techniques are widely applied. A nice field of application (one of my favourites) is the fashion industry. The availability of resources in terms of raw images makes it possible to develop interesting use cases. Zalando knows this (I suggest taking a look at their GitHub repository) and frequently develops amazing AI solutions, … Read more

Yet Another Full Stack Data Science Project — A CRISP-DM Implementation

Data Preparation The data preparation phase covers all activities to construct the final dataset from the initial raw data. Data preparation is 80% of the process. Data wrangling and Data Analysis are the core activities in the Data Preparation phase of the CRISP-DM model and are the first logical programming steps. Data Wrangling is a … Read more

Under Pi : gganimate test around quadrature of the circle

An updated look on squaring the circle using gganimate and R code. It gives a geometric and visual construction, a good and practical representation of what Pi is. As n becomes larger, segments become smaller and smaller, Pi can then be seen as perfection and we almost intuit infinity. require(sp) require(rgeos) library(ggplot2) library(dplyr, warn.conflicts = … Read more

Are we Asking too Much of Algorithms?

Over the past week Google has been under fire as a former police chief accused the internet giant of pushing extremist content, with a search for “British Muslim spokesperson” returning content from a jailed radical cleric as the top search result. Last month Facebook was criticised for not being able to guarantee that the recent … Read more

Queensland road accidents mapped with Shiny and leaflet in R

The Queensland government collects data on road accidents dating back to 1st January 2001 and details characteristics of the incident including: location of the crash (lat / long coordinates); ABS statistical area codes (SA2-4, LGA, remoteness); atmospheric and road conditions (weather, lighting, sealed / unsealed roads, speed limit zone, etc.); severity of the incident (minor … Read more

[R]eady for production: The Data Science Event 2019 with eoda and RStudio

eoda and RStudio invite the German-speaking R community to the Data Science Event 2019 in Frankfurt on June 13th, the event for the productive use of R. Learn how you can seamlessly implement your analysis solutions with the optimal IT infrastructure into your business processes. Discover best practice approaches in productive data science architectures, … Read more

Achieving a top 5% position in an ML competition with AutoML

AutoML pipelines are a hot topic. The general goal is simple: enable everyone to train high-quality models specific to their business needs by relying on state-of-the-art machine learning models, hyper-tuning techniques and large volumes of compute. In this blog post I will be applying Microsoft’s AutoML pipeline to a public ML competition, and by dissecting … Read more

Advanced candlesticks for machine learning (ii): volume and dollar bars

In this article we will learn how to build volume and dollar bars and we will explore what advantages they offer with respect to traditional time-based candlesticks and tick bars. Finally, we will analyze two of their statistical properties (autocorrelation and normality of returns) in a large dataset of 16 cryptocurrency trading pairs. Introduction: In a previous post we … Read more