Simple guide for ensemble learning methods

What, why, how, and Bagging-Boosting demystified, explained rather unconventionally; read on :) Juhi, Feb 25. Before this post, I published a “Holy grail for Bias variance trade-off, Overfitting and Underfitting”. This comprehensive article serves as an important prequel to this post if you are a newbie or would just like to brush up on the concepts of … Read more

RcppStreams 0.1.3: Keeping CRAN happy

Not unlike the Rblpapi release on Thursday and the RVowpalWabbit release on Friday (both of which dealt with the upcoming staged install), we now have another CRAN-requested maintenance release. This time it is RcppStreams which got onto CRAN as of early this morning. RcppStreams brings the excellent Streamulus C++ template library for event stream processing … Read more

Categories R Tags ExcerptFavorite

Gartner’s 2019 Take on Data Science Software

I’ve just updated The Popularity of Data Science Software to reflect my take on Gartner’s 2019 report, Magic Quadrant for Data Science and Machine Learning Platforms. To save you the trouble of digging through all 40+ pages of my report, here’s just the updated section: IT Research Firms IT research firms study software products and … Read more

An example in causal inference designed to frustrate: an estimate pretty much guaranteed to be biased

I am putting together a brief lecture introducing causal inference for graduate students studying biostatistics. As part of this lecture, I thought it would be helpful to spend a little time describing directed acyclic graphs (DAGs), since they are an extremely helpful tool for communicating assumptions about the causal relationships underlying a researcher’s data. The … Read more

stats19: a package for road safety research

Introduction stats19 is a new R package enabling access to and working with Great Britain’s official road traffic casualty database, STATS19. We started the package in late 2018 following three main motivations: The release of the 2017 road crash statistics, which showed worsening road safety in some areas, increasing the importance of making the data more accessible. The realisation … Read more

Logistic regression in R using blorr package

We are pleased to introduce the blorr package, a set of tools for building and validating binary logistic regression models in R, designed keeping in mind beginner/intermediate R users. The package includes: comprehensive regression output; variable selection procedures; bivariate analysis, model fit statistics and model validation tools; various plots and underlying data. If you know how to … Read more

Bolsonaro’s First Job Approval Ratings

President Jair Bolsonaro’s job approval ratings have averaged 39.5% during his first quarter in office so far (from January through late February). Compared with the former presidents for whom I have estimates, his quarterly job approval ratings are above the overall average for the inauguration term (31%). However, his ratings trail quarterly averages of the Workers’ Party … Read more

Why Doing Good Science is Hard and How to Do it Better

Photo by Steve Johnson on Unsplash Doing good science is hard and a lot of experiments fail. Although the scientific method helps to reduce uncertainty and lead to discoveries, its path is full of potholes. In this post, you’ll learn about common p-value misinterpretations, p-hacking, and the problem with performing multiple hypothesis tests. Of course, not … Read more

Understand how your TensorFlow Model is Making Predictions

Introduction Machine learning can answer questions more quickly and accurately than ever before. As machine learning is used in more mission-critical applications, it is increasingly important to understand how these predictions are derived. In this blog post, we’ll build a neural network model using the Keras API from TensorFlow, an open-source machine learning framework. One … Read more

Build Your First Open Source Python Project

A step-by-step guide to a working package Every software developer and data scientist should go through the exercise of making a package. You’ll learn so much along the way. Making an open source Python package may sound daunting, but you don’t need to be a grizzled veteran. You also don’t need an elaborate product idea. You … Read more

Remote Sensing Basics: Normalized Difference Vegetation Index

Applications of Satellite Imagery for Ecology Research NDVI visualization of continental US If you haven’t already, please check out my previous post that summarizes my capstone project from General Assembly’s Data Science Immersive course: Land-use and Deforestation in the Brazilian Amazon. It is a good introduction to some of my interests in machine learning, remote sensing, … Read more

Four Dataviz Posters

I was asked for some examples of posters I’ve made using R and ggplot. Here are four. Some of these are done from start to finish in R, others involved some post-processing in Illustrator, usually to adjust some typographical elements or add text in a sidebar. I’ve linked to a PDF of each one, along … Read more

Reshama Shaikh discusses women in machine learning and data science.

Hugo Bowne-Anderson, the host of DataFramed, the DataCamp podcast, recently interviewed Reshama Shaikh, organizer of the meetup groups Women in Machine Learning & Data Science (otherwise known as WiMLDS) and PyLadies. Here is the podcast link. Hugo: Hi there, Reshama, and welcome to DataFramed. Reshama: Hello, Hugo. Thank you for inviting me. Hugo: It’s such … Read more

R Journal Volume 10/2, December 2018 is out!

We forgot to say: R Journal Volume 10/2, December 2018 is out! A huge thanks to the editors who work very hard to make this possible. And a big “thank you” to the editors, referees, and journal for helping improve, and for including, our note on pipes in R. Related To leave a comment for the … Read more

An Exercise on Basic R: How’s Kickstarter Doing These Days?

Basic Data Manipulation and Visualization with tidyverse and ggplot2, published with mediumR It’s a practice story! I didn’t realize the tables/tibbles would be poorly shaped after importing from R directly to Medium. If anyone has ever faced this, please drop a link for me to refer to! I got my hands on 2018 January Kickstarter data-set from … Read more

State of Data Science & Machine Learning

Data Scientist Arsenal The data science and machine learning technology landscape is ever expanding. It is not humanly possible to be an expert in all the available frameworks, platforms and methodologies. The survey has captured the Programming Languages, Frameworks, Tools & Platforms that are used and suggested by the participants. Ignoring the edge cases, this should give a … Read more

Introducing Rank Data Analysis with Arkham Horror Data

Last week I analyzed player rankings of the Arkham Horror LCG classes. This week I explain what I did in the data analysis. As I mentioned, this is the first time that I attempted inference with rank data, and I discovered how rich the subject is. A lot of the tools for the analysis I … Read more

MLB run scoring trends: Shiny app update

The new Major League Baseball season will soon begin, which means it’s time to look back and update my run scoring trends data visualization application, built using RStudio’s shiny package. The app and the GitHub repo for it are linked from the original post. The update gave me the opportunity to make some cosmetic tweaks to the … Read more

From archaeology to data science: the joy of iterative career paths

Discovering my love of all things data At school I hadn’t planned on doing anything particularly technical as a career. I took a maths A-level largely as a refreshing break from the essay-writing of history and English lit, and the time-consuming creativity of fine art. I went to Cambridge to study Archaeology and Anthropology (another iterative … Read more

Why and how global brands like Facebook and Danone invest in market research

Last week, I received the following email from Facebook: “Hi Joei, Facebook is seeking candid feedback from individuals who create online videos on Facebook, Instagram and other platforms. Please help us do that by taking a simple survey here… Thanks, The Facebook Research Team” This made me think: if Facebook, one of the world’s fastest-growing … Read more

My Journey From Commerce To Data Science

Image from Pexels Even though I was sure that I wouldn’t enjoy a job as an accountant or a commerce lecturer, I completed my Master’s degree in Commerce in 2016. After completing my studies, I joined a one-year fellowship (TIFP) at KSUM, a government agency of our state. I had no idea what career I … Read more

AI Gets Creative Thanks To GANs Innovations

For an Artificial Intelligence (AI) professional, or data scientist, the barrage of AI-marketing can evoke very different feelings than for a general audience. For one thing, the AI industry is incredibly broad and has many different forms and functions, so industry professionals tend to focus more deeply on which branches of AI are being hyped … Read more

Python, Oracle ADWC and Machine Learning

How to use Open Source tools to analyze data managed through Oracle Autonomous Data Warehouse Cloud (ADWC). Introduction Oracle Autonomous Database is the latest, modern evolution of Oracle Database technology. It is a technology that makes managing and analyzing large volumes of data in the Cloud easier, faster and more powerful. ADWC is the specialization of this technology … Read more

A beginner’s guide to Linear Regression in Python with Scikit-Learn

Simple Linear Regression Linear Regression While exploring the Aerial Bombing Operations of World War Two dataset and recalling that the D-Day landings were nearly postponed due to poor weather, I downloaded these weather reports from the period to compare with missions in the bombing operations dataset. You can download the dataset from here. The dataset … Read more

A Tale of Two (Small Belgian) Cities with Open Data: Official Crime Statistics and Self-Reported Feelings of Safety in Leuven and Vilvoorde

In this post, we will analyze government data from the Flemish region in Belgium on A) official crime statistics and B) self-reported feelings of safety among residents of Flanders. We will focus our analysis on two cities in the province of Flemish Brabant: Leuven and Vilvoorde. A key question of this analysis is: do the … Read more

Bayesian Optimization for Hyper-Parameter

In the past several weeks, I spent a tremendous amount of time reading literature about automatic parameter tuning in the context of Machine Learning (ML), most of which can be classified into two major categories, i.e. search and optimization. Searching mechanisms, such as grid search, random search, and Sobol sequence, can be somewhat computationally expensive. … Read more
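
As a rough illustration of the search category mentioned in the excerpt above, here is a minimal sketch of grid search and random search. It is not taken from the post: the `objective` function, the parameter names `learning_rate` and `depth`, and their ranges are all hypothetical stand-ins for an expensive model fit.

```python
import itertools
import random


def objective(learning_rate, depth):
    """Hypothetical validation loss; stands in for an expensive model fit."""
    return (learning_rate - 0.1) ** 2 + (depth - 6) ** 2 / 100.0


def grid_search(rates, depths):
    """Evaluate every combination on the grid and return the best pair."""
    return min(itertools.product(rates, depths), key=lambda p: objective(*p))


def random_search(n_trials, seed=0):
    """Evaluate n_trials randomly drawn pairs and return the best one."""
    rng = random.Random(seed)
    candidates = [
        (rng.uniform(0.01, 0.3), rng.randint(2, 10)) for _ in range(n_trials)
    ]
    return min(candidates, key=lambda p: objective(*p))


best_grid = grid_search([0.01, 0.1, 0.3], [2, 6, 10])  # 9 evaluations
best_random = random_search(20)                        # 20 evaluations
print(best_grid, best_random)
```

Both approaches call the objective once per candidate, which is where the computational cost comes from when each call means training a model.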

Titanic: Love in Data Analytics

Let’s delve right into the code. Note that there are many ways to execute Python code. For simplicity and ease of use, I prefer Python’s Jupyter notebook. Data Preparation The first task is to load the training dataset from the file. Python offers a great and easy way to load and manipulate data sets using … Read more

January 2019: “Top 40” New CRAN Packages

One hundred and fifty-three new packages made it to CRAN in January. Here are my “Top 40” picks in eight categories: Computational Methods, Data, Machine Learning, Medicine, Science, Statistics, Utilities, and Visualization. Computational Methods cPCG v1.0: Provides a function to solve systems of linear equations using a (preconditioned) conjugate gradient algorithm. The vignette shows how … Read more

Le Monde puzzle [#1087]

A board-like Le Monde mathematical puzzle in the digit category: Given a (k,m) binary matrix, what is the maximum number S of entries with only one neighbour equal to one? Solve for k=m=2,…,13, and k=6,m=8. For instance, for k=m=2, the matrix is producing the maximal number 4. I first attempted a brute force random filling … Read more
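
The puzzle statement above can be checked exhaustively for very small matrices. The sketch below is not the post's code and rests on two assumptions: "neighbour" means the up/down/left/right cells, and every entry (whether 0 or 1) counts when exactly one of its neighbours equals one. Under that reading it reproduces the maximum of 4 quoted for k=m=2; the function names are illustrative.

```python
from itertools import product


def count_single_neighbour_entries(matrix):
    """Count entries with exactly one orthogonal neighbour equal to 1."""
    k, m = len(matrix), len(matrix[0])
    total = 0
    for i in range(k):
        for j in range(m):
            ones = 0
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < k and 0 <= nj < m:
                    ones += matrix[ni][nj]
            if ones == 1:
                total += 1
    return total


def brute_force_max(k, m):
    """Try all 2**(k*m) binary matrices; only feasible for tiny k and m."""
    best = 0
    for bits in product((0, 1), repeat=k * m):
        matrix = [bits[r * m:(r + 1) * m] for r in range(k)]
        best = max(best, count_single_neighbour_entries(matrix))
    return best


print(brute_force_max(2, 2))  # prints 4 under the assumptions above
```

Exhaustive enumeration grows as 2^(k·m), so it is already out of reach well before k=m=13, which is presumably why the post moves on to random filling.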

Deep Active Noise Cancellation

RNN predicts a structured noise to suppress it in a complex acoustic environment Flickr, CC BY-NC 2.0 In my previous post I described my Active Noise Cancellation system based on a neural network. Here I outline the sound prediction experiments with recurrent neural networks that I ran to improve my denoiser. The noise sound prediction … Read more

Brief introduction to Markov chains

Markov Chains properties In this section, we will only give some basic Markov chain properties or characterisations. The idea is not to go deeply into mathematical details but rather to give an overview of the points of interest that need to be studied when using Markov chains. As we have seen that in … Read more

AI and the “Useless” Class

Human robots will take your job before AI. The human robot is you, and you will help AI steal your job tomorrow. Will you become “useless”? Photo by Alex Iby on Unsplash Will artificial intelligence (AI) produce a “useless” class of people that have no value to offer society? This is a possibility that Yuval Noah … Read more

AI & Architecture

B. Layout Assistant Layout Assistant, a Step by Step Pipeline | Source: Author In this section, we offer a multi-step pipeline, integrating all the necessary steps to draw a floor plan. Jumping across scales, it emulates the process taken by an architect and tries to encapsulate each step into one specific model, trained to perform a … Read more

Cryptocurrency Analysis with Python — MACD

I’ve decided to spend the weekend learning about cryptocurrency analysis. I’ve hacked together the code to download daily Bitcoin prices and apply a simple trading strategy to them. Note that there already exist tools for performing this kind of analysis, e.g. tradeview, but this way enables more in-depth analysis. Disclaimer I am not a trader … Read more

Tips & Tricks in Multiple Linear Regression

Gathered methods to analyse data, diagnose models and visualize results This analysis was a project which I decided to undertake for the Regression Analysis module in school. I have learnt and gathered several methods you can use in R to take your depth of analysis further. As usual, I always learn the most discovering on … Read more

Convolutional Neural Network

Learn Convolutional Neural Networks from the basics, with an implementation in Keras Table of contents: What is CNN? Why should we use CNN? Few Definitions; Layers in CNN; Keras Implementation. 1. What is CNN? Computer vision is evolving rapidly day by day. One of the reasons is deep learning. When we talk about computer vision, a term convolutional neural … Read more

Sentiment Analysis with Deep Learning

Recognize and Classify Human Emotions in Netflix Reviews In this article, I will cover the topic of Sentiment Analysis and how to implement a Deep Learning model that can recognize and classify human emotions in Netflix reviews. One of the most important elements for businesses is being in touch with their customer base. It is vital … Read more

Data Pre-processing with Pandas on Trending YouTube Video Statistics 〠 ❤︎ ✔︎

The purpose of this article is to provide a standardized data pre-processing solution that could be applied to any type of dataset. You will learn how to convert data from its initial raw form to another format, in order to prepare the data for exploratory analysis and machine learning models. Overview of the data This dataset is … Read more

Data Science for Fitness: 50 is the new 30 — Part I

The following article will try to explain a very interesting experience for me that, along with the algorithmic music composition algorithms (sans neural nets) I developed in 2013–2014, is one of the most rewarding projects I have undertaken: Data Science for Fitness. In this series of practical applications of Data Science (aren’t you tired of … Read more

Solving the Mystery of Backpropagation

The most clever thing about backpropagation seems to be the method used to calculate the partial derivatives of the cost function with respect to each weight and bias in the network. This paves the way to ponder even how this elegant algorithm was found for the first time. But if you carefully look at the … Read more

Job Satisfaction and Success: How to succeed in the coding world

A data-driven approach: Success Mantras Using Stack Overflow Survey Data from 2017 Introduction Success may have different meanings for different people. So, what does it take to become a successful developer? The short answer would be that it varies from person to person. Success in this article’s context, though, implies job satisfaction and higher salary. There … Read more

Strategic Data Science: Creating Value With Data Big and Small

The post Strategic Data Science: Creating Value With Data Big and Small appeared first on The Lucid Manager. Data science is without a doubt the most popular business fad of the past decade. The promise of machine learning blinds many managers so they forget about deploying these new approaches strategically. This article provides a framework … Read more

Deep learning based super resolution, without using a GAN

This article describes the techniques for training a deep learning model for image improvement, image restoration, inpainting and super resolution. It utilises many techniques taught in the Fastai course and makes use of the Fastai software library. This method of training a model is based upon methods and research by very talented AI researchers, I’ve … Read more