Regression: Kernel and Nearest Neighbor Approach

Nadaraya-Watson Kernel-Weighted Average Regression: In the above method, one of the major drawbacks was the equal assignment of weights. This method assigns a weight to each point in a window around the query point, based on a specific kernel. The main intuition is that weights should decrease as distance increases, and more weight should be assigned for …
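The kernel-weighted average described in this teaser can be sketched in a few lines of Python. This is a minimal illustration, not the original post's code: the choice of the Epanechnikov kernel, the function names, and the `bandwidth` parameter are all assumptions made for the example.

```python
def epanechnikov(u):
    """Epanechnikov kernel: positive inside the window |u| < 1, zero outside."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def nadaraya_watson(x_query, xs, ys, bandwidth=1.0):
    """Kernel-weighted average of ys, weighted by closeness of xs to x_query.

    bandwidth (assumed name) sets the half-width of the window around x_query.
    """
    # Weight of each training point shrinks as its distance to the query grows.
    weights = [epanechnikov((x_query - x) / bandwidth) for x in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no training points with positive weight in the window")
    return sum(w * y for w, y in zip(weights, ys)) / total


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0]
    ys = [0.0, 1.0, 2.0]
    # The two outer points get equal, smaller weights than the centre point.
    print(nadaraya_watson(1.0, xs, ys, bandwidth=2.0))
```

A compact-support kernel such as this one matches the "window" wording; a Gaussian kernel would instead give every point some weight, however small.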

Neural-Symbolic VQN — Disentangled Reasoning — Or — The answer: disentanglement

An explanation of an interpretable deep learning system. Nearly every technological step forward starts with an example from science fiction. So before I explain what these scientists built, I want you to watch part of an episode of the classic TV series Star Trek: The Next Generation, The Identity Crisis. Play the embedded video from …

Customers who bought…

One of the classic examples in data science (called data mining at the time) is the beer and diapers example: when a big supermarket chain started analyzing their sales data they encountered not only trivial patterns, like toothbrushes and toothpaste being bought together, but also quite strange combinations like beer and diapers. Now, the trivial …

Using RStudio Jobs for training many models in parallel

Recently, RStudio added the Jobs feature, which allows you to run R scripts in the background. Computations are done in a separate R session that is not interactive, but just runs the script. In the meantime your regular R session stays live so you can do other work while waiting for the Job to complete. …

Data Analysis of 10.000 AI Startups

Extracting insights from AngelList companies. Introduction: AngelList is a place that connects startups to investors and job candidates looking to work at startups. Their goal is to democratize the investment process, helping startups with both fundraising and talent. Be it to find a job, investors for a startup, or even if just to make connections, …

Making thematic maps for Belgium

Video Games as a Perfect Playground for Artificial Intelligence

Why are video games so good for machine learning? One of the first reasons behind the popularity of video games among AI researchers is the tendency of video games to mimic real life in many ways. This idea is not very straightforward when it comes to older games, as they have arcade-style graphics and physics. …

Simple guide for ensemble learning methods

What, why, how, and Bagging-Boosting demystified, explained rather unconventionally; read on :) By Juhi, Feb 25. Before this post, I published “Holy grail for Bias variance trade-off, Overfitting and Underfitting”. This comprehensive article serves as an important prequel to this post if you are a newbie or would just like to brush up on the concepts of …

RcppStreams 0.1.3: Keeping CRAN happy

Not unlike the Rblpapi release on Thursday and the RVowpalWabbit release on Friday (both of which dealt with the upcoming staged install), we now have another CRAN-requested maintenance release. This time it is RcppStreams which got onto CRAN as of early this morning. RcppStreams brings the excellent Streamulus C++ template library for event stream processing …

Gartner’s 2019 Take on Data Science Software

I’ve just updated The Popularity of Data Science Software to reflect my take on Gartner’s 2019 report, Magic Quadrant for Data Science and Machine Learning Platforms. To save you the trouble of digging through all 40+ pages of my report, here’s just the updated section: IT Research Firms IT research firms study software products and …

An example in causal inference designed to frustrate: an estimate pretty much guaranteed to be biased

I am putting together a brief lecture introducing causal inference for graduate students studying biostatistics. As part of this lecture, I thought it would be helpful to spend a little time describing directed acyclic graphs (DAGs), since they are an extremely helpful tool for communicating assumptions about the causal relationships underlying a researcher’s data. The …

stats19: a package for road safety research

Introduction: stats19 is a new R package enabling access to and working with Great Britain’s official road traffic casualty database, STATS19. We started the package in late 2018 following three main motivations: The release of the 2017 road crash statistics, which showed worsening road safety in some areas, increasing the importance of making the data more accessible. The realisation …

Logistic regression in R using blorr package

We are pleased to introduce the blorr package, a set of tools for building and validating binary logistic regression models in R, designed keeping in mind beginner/intermediate R users. The package includes: comprehensive regression output; variable selection procedures; bivariate analysis, model fit statistics and model validation tools; various plots and underlying data. If you know how to …

Bolsonaro’s First Job Approval Ratings

President Jair Bolsonaro’s job approval ratings average 39.5% during his first quarter in office so far (from January through late February). Compared to the former presidents for whom I have estimates, his quarterly job approval ratings are above the overall average for the inauguration term (31%). However, his ratings trail quarterly averages of the Workers’ Party …

Why Doing Good Science is Hard and How to Do it Better

Doing good science is hard and a lot of experiments fail. Although the scientific method helps to reduce uncertainty and leads to discoveries, its path is full of potholes. In this post, you’ll learn about common p-value misinterpretations, p-hacking, and the problem with performing multiple hypothesis tests. Of course, not …

Rating Sports Teams — Elo vs. Win-Loss

Which is better? Introduction: There are many ways to determine who is the best team or player in any sport. You can look at the last 5 games. The last 10 games. You can use score differential. You can rate them on which teams “feel” the best. You can look at …

Understand how your TensorFlow Model is Making Predictions

Introduction: Machine learning can answer questions more quickly and accurately than ever before. As machine learning is used in more mission-critical applications, it is increasingly important to understand how these predictions are derived. In this blog post, we’ll build a neural network model using the Keras API from TensorFlow, an open-source machine learning framework. One …

Build Your First Open Source Python Project

A step-by-step guide to a working package. Every software developer and data scientist should go through the exercise of making a package. You’ll learn so much along the way. Making an open source Python package may sound daunting, but you don’t need to be a grizzled veteran. You also don’t need an elaborate product idea. You …

Remote Sensing Basics: Normalized Difference Vegetation Index

Applications of Satellite Imagery for Ecology Research. Figure: NDVI visualization of continental US. If you haven’t already, please check out my previous post that summarizes my capstone project from General Assembly’s Data Science Immersive course: Land-use and Deforestation in the Brazilian Amazon. It is a good introduction to some of my interests in machine learning, remote sensing, …

Four Dataviz Posters

I was asked for some examples of posters I’ve made using R and ggplot. Here are four. Some of these are done from start to finish in R, others involved some post-processing in Illustrator, usually to adjust some typographical elements or add text in a sidebar. I’ve linked to a PDF of each one, along …

Get started with Apache Spark and TensorFlow on Azure Databricks

TensorFlow is now available on the Apache Spark framework, but how do you get started? It is called TensorFrame. TL;DR: This is a step-by-step tutorial on how to get the new Spark TensorFrame library running on Azure Databricks. Big Data is a huge topic that consists of many domains and areas of expertise. All the way from DevOps, …

Reshama Shaikh discusses women in machine learning and data science.

Hugo Bowne-Anderson, the host of DataFramed, the DataCamp podcast, recently interviewed Reshama Shaikh, organizer of the meetup groups Women in Machine Learning & Data Science (otherwise known as WiMLDS) and PyLadies. Here is the podcast link. Hugo: Hi there, Reshama, and welcome to DataFramed. Reshama: Hello, Hugo. Thank you for inviting me. Hugo: It’s such …

An Exercise on Basic R: How’s Kickstarter Doing These Days?

Basic Data Manipulation and Visualization with tidyverse and ggplot2, published with mediumR. It’s a practice story! I didn’t realize the tables/tibbles would be poorly shaped after importing from R directly to Medium. If anyone has ever faced this, please drop a link for me to refer to! I got my hands on the January 2018 Kickstarter dataset from …

State of Data Science & Machine Learning

Data Scientist Arsenal: The data science and machine learning technology landscape is ever expanding. It is not humanly possible to be an expert in all the available frameworks, platforms and methodologies. The survey has captured the Programming Languages, Frameworks, Tools & Platforms that are used and suggested by the participants. Ignoring the edge cases, this should give a …

MLB run scoring trends: Shiny app update

The new Major League Baseball season will soon begin, which means it’s time to look back and update my run scoring trends data visualization application, built using RStudio’s shiny package. You can find the app here: https://monkmanmh.shinyapps.io/MLBrunscoring_shiny/ The github repo for this app is https://github.com/MonkmanMH/MLBrunscoring_shiny This update gave me the opportunity to make some cosmetic tweaks to the …

From archaeology to data science: the joy of iterative career paths

Discovering my love of all things data. At school I hadn’t planned on doing anything particularly technical as a career. I took a maths A-level largely as a refreshing break from the essay-writing of history and English lit, and the time-consuming creativity of fine art. I went to Cambridge to study Archaeology and Anthropology (another iterative …

Why and how global brands like Facebook and Danone invest in market research

Last week, I received the following email from Facebook: “Hi Joei, Facebook is seeking candid feedback from individuals who create online videos on Facebook, Instagram and other platforms. Please help us do that by taking a simple survey here… Thanks, The Facebook Research Team” This made me think: if Facebook, one of the world’s fastest-growing …

AI Gets Creative Thanks To GANs Innovations

For an Artificial Intelligence (AI) professional, or data scientist, the barrage of AI-marketing can evoke very different feelings than for a general audience. For one thing, the AI industry is incredibly broad and has many different forms and functions, so industry professionals tend to focus more deeply on which branches of AI are being hyped …

Python, Oracle ADWC and Machine Learning

How to use Open Source tools to analyze data managed through Oracle Autonomous Data Warehouse Cloud (ADWC). Introduction: Oracle Autonomous Database is the latest, modern evolution of Oracle Database technology. It is a technology that makes managing and analyzing large volumes of data in the cloud easier, faster, and more powerful. ADWC is the specialization of this technology …

A beginner’s guide to Linear Regression in Python with Scikit-Learn

Simple Linear Regression. While exploring the Aerial Bombing Operations of World War Two dataset and recalling that the D-Day landings were nearly postponed due to poor weather, I downloaded these weather reports from the period to compare with missions in the bombing operations dataset. You can download the dataset from here. The dataset …

A Tale of Two (Small Belgian) Cities with Open Data: Official Crime Statistics and Self-Reported Feelings of Safety in Leuven and Vilvoorde

In this post, we will analyze government data from the Flemish region in Belgium on A) official crime statistics and B) self-reported feelings of safety among residents of Flanders. We will focus our analysis on two cities in the province of Flemish Brabant: Leuven and Vilvoorde. A key question of this analysis is: do the …

Deploy ML/DL Models to Production via Panini

What is Panini? Panini is a platform that serves ML/DL models at low latency and cuts the time to deploy an ML model to production from a few days to a few minutes. Once the model is deployed on Panini’s server, Panini will provide you with an API key to run inference on the model. The Panini query engine is developed in C++, which provides …

Bayesian Optimization for Hyper-Parameter

In the past several weeks, I spent a tremendous amount of time reading literature about automatic parameter tuning in the context of Machine Learning (ML), most of which can be classified into two major categories, i.e. search and optimization. Searching mechanisms, such as grid search, random search, and Sobol sequence, can be somewhat computationally expensive. …

‚Arrest this man, he talks in maths‘ – Animating ten years of listening history on Last.FM

Previously, when Rcrastinate was still on blogspot.com, I had a first look at ten years of my playback history on Last.FM. But there is still a lot one can do with this dataset. I wanted to try {gganimate} for a long time and this nice longitudinal dataset gives me the opportunity. First, I am loading …

January 2019: “Top 40” New CRAN Packages

One hundred and fifty-three new packages made it to CRAN in January. Here are my “Top 40” picks in eight categories: Computational Methods, Data, Machine Learning, Medicine, Science, Statistics, Utilities, and Visualization. Computational Methods cPCG v1.0: Provides a function to solve systems of linear equations using a (preconditioned) conjugate gradient algorithm. The vignette shows how …

Le Monde puzzle [#1087]

A board-like Le Monde mathematical puzzle in the digit category: Given a (k,m) binary matrix, what is the maximum number S of entries with only one neighbour equal to one? Solve for k=m=2,…,13, and k=6,m=8. For instance, for k=m=2, the matrix is producing the maximal number 4. I first attempted a brute force random filling …
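For very small boards the puzzle statement can be checked by exhaustive search. The sketch below is not the author's random-filling approach: it enumerates every binary matrix instead. It assumes that "neighbour" means the four horizontally and vertically adjacent cells, and that every entry (zero or one) is counted when exactly one of its neighbours equals one. Under those assumptions it reproduces the stated maximum of 4 for k=m=2. The function names are made up for illustration.

```python
from itertools import product


def count_single_neighbour(grid, k, m):
    """Count entries of a k-by-m 0/1 grid with exactly one 4-neighbour equal to 1."""
    count = 0
    for i in range(k):
        for j in range(m):
            ones = 0
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < k and 0 <= nj < m:
                    ones += grid[ni][nj]
            if ones == 1:
                count += 1
    return count


def max_single_neighbour(k, m):
    """Exhaustive maximum over all 2**(k*m) binary grids (tiny boards only)."""
    best = 0
    for bits in product((0, 1), repeat=k * m):
        # Slice the flat tuple of bits into k rows of length m.
        grid = [bits[r * m:(r + 1) * m] for r in range(k)]
        best = max(best, count_single_neighbour(grid, k, m))
    return best


if __name__ == "__main__":
    print(max_single_neighbour(2, 2))  # 4, matching the value quoted in the puzzle
```

The search space doubles with every added cell, so this only works for boards of a few cells per side; the larger cases in the puzzle (up to k=m=13) need a smarter method.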

Deep Active Noise Cancellation

RNN predicts a structured noise to suppress it in a complex acoustic environment. In my previous post I described my Active Noise Cancellation system based on a neural network. Here I outline the experiments with sound prediction using recurrent neural networks that I made to improve my denoiser. The noise sound prediction …

Cryptocurrency Analysis with Python — MACD

I’ve decided to spend the weekend learning about cryptocurrency analysis. I’ve hacked together the code to download daily Bitcoin prices and apply a simple trading strategy to it. Note that there already exist tools for performing this kind of analysis, e.g. tradeview, but this way enables more in-depth analysis. Disclaimer: I am not a trader …

Tips & Tricks in Multiple Linear Regression

Gathered methods to analyse data, diagnose models and visualize results. This analysis was a project which I decided to undertake for the Regression Analysis module in school. I have learnt and gathered several methods you can use in R to take your depth of analysis further. As usual, I always learn the most discovering on …

Convolutional Neural Network

Learn convolutional neural networks from the basics, along with their implementation in Keras. Table of contents: What is CNN? Why should we use CNN? Few Definitions; Layers in CNN; Keras Implementation. 1. What is CNN? Computer vision is evolving rapidly day by day. One of the reasons is deep learning. When we talk about computer vision, a term convolutional neural …

Sentiment Analysis with Deep Learning

Recognize and Classify Human Emotions in Netflix Reviews. In this article, I will cover the topic of Sentiment Analysis and how to implement a Deep Learning model that can recognize and classify human emotions in Netflix reviews. One of the most important elements for businesses is being in touch with their customer base. It is vital …

Data Pre-processing with Pandas on Trending YouTube Video Statistics 〠 ❤︎ ✔︎

The purpose of this article is to provide a standardized data pre-processing solution that could be applied to any type of dataset. You will learn how to convert data from its initial raw form to another format, in order to prepare the data for exploratory analysis and machine learning models. Overview of the data: This dataset is …