Be careful of NA/NaN/Inf values when using base R’s plotting functions!

I was recently working on a supervised learning problem (i.e., building a model that uses some features to predict a response variable) with a fairly large dataset. I used base R’s plot and hist functions for exploratory data analysis, and everything looked fine. However, when I started building my models, I began to run into errors. …
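The excerpt stops before the fix, but the pitfall is easy to reproduce. Below is a minimal sketch using a hypothetical toy vector, not the author's data. hist() quietly discards NA, NaN and Inf values before binning, so the plot looks normal. The problem only appears when lm() meets the Inf values, which its default na.action does not remove. Counting the non-finite values up front makes the issue visible. count_non_finite() is a hypothetical helper written for this illustration.

```r
# Hypothetical toy feature containing all three kinds of problem values
x <- c(1.2, 3.4, NA, 2.2, NaN, Inf, 5.1, -Inf, 4.0)
y <- seq_along(x) * 2  # hypothetical response, all finite

# hist() keeps only finite values, so the histogram is built from 5 of the 9 entries
h <- hist(x, plot = FALSE)
sum(h$counts)

# Tally each kind of problem value before modelling
count_non_finite <- function(v) {
  c(na     = sum(is.na(v) & !is.nan(v)),  # is.na() is also TRUE for NaN, so exclude it
    nan    = sum(is.nan(v)),
    inf    = sum(is.infinite(v)),          # counts both Inf and -Inf
    finite = sum(is.finite(v)))
}
count_non_finite(x)

# lm()'s default na.action drops the NA/NaN rows, but the Inf rows remain and trigger an error
fit <- tryCatch(lm(y ~ x), error = function(e) conditionMessage(e))
fit
```

Filtering with is.finite(), or deciding how to impute these values, before both plotting and modelling keeps the two views of the data consistent.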

Cyclists – London Ride 100 – Analysis for riders and clubs using Shiny/R

Introduction: The Prudential Ride London is an annual summer cycling weekend, and within it I will focus on the Ride London-Surrey 100, a 100-mile route open to the public that starts at the Stratford Olympic Park in East London and finishes in front of Buckingham Palace. I wrote an initial analysis using R in 2016 …

How to Automate EDA with DataExplorer in R

EDA (Exploratory Data Analysis) is one of the key steps in any data science project: the better the EDA, the better the feature engineering that follows. From modelling to communication, EDA has many hidden benefits that are rarely emphasised when data science is taught to beginners. The Problem That …

Do you love Data Science? I mean, the Data part in it

Last week, we talked all about Artificial Intelligence (and Artificial Stupidity), which led me to think about the foundation of Data Science: the data itself. I think data is the least appreciated entity in the data science value chain. You might agree with me if you do data science outside competitive platforms like Kaggle …

Synthesizing population time-series data from the USA Long Term Ecological Research Network

Introduction: Large quantities of freely available data are revolutionizing ecological research. Open data maximizes the opportunities to perform comparative analyses and meta-analyses. Such synthesis efforts will increasingly exploit “population data”, which we define here as time series of population abundance. Such population data play a central role in testing …

Local randomness in R

[This article was first published on rstats on QuestionFlow, and kindly contributed to R-bloggers]. Prologue: Let’s say we have a deterministic (non-random) problem for which …

Plumber Logging

[This article was first published on R Views, and kindly contributed to R-bloggers]. The plumber R package is used to expose R functions as API …

AI, Machine Learning and Data Science Roundup: July/August 2019

A mostly monthly roundup of news about Artificial Intelligence, Machine Learning and Data Science. This is an eclectic collection of interesting blog posts, software announcements and data applications from Microsoft and elsewhere that I’ve noted over the past month or so. Open Source AI, ML & Data Science News: StanfordNLP, a pure-Python package for grammatical …

Can we use a neural network to generate Shiny code?

Many news reports warn that machines will take over our jobs in the not-too-distant future. Commonly cited targets include truck drivers, lawyers and accountants. In this article we explore how far machines are from replacing us (R programmers) at writing Shiny code. Spoiler alert: you should not be …

Vectors and Functions

[This article was first published on R-exercises, and kindly contributed to R-bloggers]. In the previous set we started with arithmetic operations on vectors. We’ll take …

tsbox 0.2: supporting additional time series classes

[This article was first published on R – usefulr, and kindly contributed to R-bloggers]. The tsbox package makes life with time series in R easier. …

Visualization of Red Tide in the Gulf of Mexico

The Red Tide Visualization App offers a quick, interactive snapshot of harmful algae blooms, or ‘Red Tide,’ observed in the Gulf of Mexico from 2000 to 2018. The app lets the user examine 6,000 algal blooms recorded by the National Oceanic and Atmospheric Administration’s harmful algae bloom observation system, along with their corresponding water temperatures, water …

vtreat up on PyPI

[This article was first published on R – Win-Vector Blog, and kindly contributed to R-bloggers]. I am excited to announce vtreat is now available for …

Building a data pipeline – uploading external data in AWS S3

[This article was first published on Stories Data Speak, and kindly contributed to R-bloggers]. Introduction: Recently, I stepped into the AWS ecosystem to learn and …

Returning to Tides

[This article was first published on R – Win-Vector Blog, and kindly contributed to R-bloggers]. Fred Viole shared a great “data only” R solution to …

Using parallelization, multiple git repositories and setting permissions when automating R applications with Jenkins

[This article was first published on Jozef’s Rblog, and kindly contributed to R-bloggers]. In the previous post, we focused on setting up declarative Jenkins pipelines …

Road to Rugby World Cup 2019: Rugby scores decomposition

With the Rugby World Cup 2019 Japan starting on 20th September, I thought I’d look at the tournament from a few different statistical angles. In this post I examine the following problem: given a rugby score, how can we decompose it into the possible combinations of tries, conversions, penalties and dropped goals? Context …
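The excerpt does not show the author's method. Below is a minimal brute-force sketch of the same question. It assumes the standard values (try 5, conversion 2, penalty goal 3, drop goal 3) and that every conversion follows a try. Penalty tries are ignored. decompose_score() is a hypothetical helper, not code from the post.

```r
# Enumerate every (tries, conversions, penalties, drop_goals) combination for a score.
# Assumed values: try 5, conversion 2, penalty goal 3, drop goal 3; penalty tries ignored.
decompose_score <- function(score) {
  combos <- list()
  for (tries in 0:(score %/% 5)) {
    for (conversions in 0:tries) {             # a conversion can only follow a try
      rest <- score - 5 * tries - 2 * conversions
      if (rest < 0 || rest %% 3 != 0) next      # the remainder must be made of 3-pointers
      kicks <- rest %/% 3
      for (penalties in 0:kicks) {              # split the 3-pointers between the two kick types
        combos[[length(combos) + 1]] <- c(tries = tries, conversions = conversions,
                                          penalties = penalties,
                                          drop_goals = kicks - penalties)
      }
    }
  }
  if (length(combos) == 0) {                    # e.g. a score of 1 or 2 is impossible
    return(data.frame(tries = integer(), conversions = integer(),
                      penalties = integer(), drop_goals = integer()))
  }
  as.data.frame(do.call(rbind, combos))
}

decompose_score(10)
```

Here decompose_score(10) returns three rows: two unconverted tries, or a converted try plus either a penalty or a drop goal. Penalties and drop goals are both worth 3 points, so the number of decompositions grows quickly with the score.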

How to generate meaningful fake data for learning, experimentation and teaching

The Problem: There’s one thing about R that is top-of-mind for a lot of people: the black-and-white plot of the iris dataset, which is a thoroughly boring view of R. It’s boring partly because of its aesthetics, but also because it’s such a clichéd example, used over and over again. The other problem is finding …

How to Do Mediation Scientifically

Mediation analysis has been around a long time, though its popularity has varied between disciplines and over the years. While some fields have been attracted to the potential of mediation models to identify pathways, or mechanisms, through which an independent variable affects an outcome, others have been skeptical that the analysis of mediated relationships can …

Confusion matrices – evaluating your classification models

As part of my company’s commitment to giving our users access to some of our code, we have created a visual way to assess your model’s accuracy using a confusion matrix. The post below was published on our blog and shows how to interpret a confusion matrix. I hope the attached article is useful. …
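The post's visual tool is not reproduced here. Below is a base-R sketch with made-up labels. It builds a confusion matrix with table() and reads the usual rates off its cells. The labels and the choice of "yes" as the positive class are assumptions for illustration only.

```r
# Hypothetical labels for a binary classifier (not the data from the post)
actual    <- factor(c("yes", "no", "yes", "yes", "no", "no", "yes", "no"),
                    levels = c("no", "yes"))
predicted <- factor(c("yes", "no", "no",  "yes", "no", "no", "yes", "no"),
                    levels = c("no", "yes"))

# Rows = predicted class, columns = actual class
cm <- table(Predicted = predicted, Actual = actual)
cm

tp <- cm["yes", "yes"]  # predicted yes, actually yes
tn <- cm["no",  "no"]   # predicted no, actually no
fp <- cm["yes", "no"]   # predicted yes, actually no
fn <- cm["no",  "yes"]  # predicted no, actually yes

accuracy    <- (tp + tn) / sum(cm)  # share of all predictions that are correct
sensitivity <- tp / (tp + fn)       # recall: share of actual "yes" cases caught
specificity <- tn / (tn + fp)       # share of actual "no" cases correctly rejected
c(accuracy = accuracy, sensitivity = sensitivity, specificity = specificity)
```

With these labels, 7 of 8 predictions are correct, but one real "yes" is missed. That pattern is exactly the kind a single accuracy figure hides and a confusion matrix exposes.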

A Flurry of Facets

The remainder of the release centers on facets and a few geoms that have been made specifically for them. Enter the matrix: The biggest news is undoubtedly the introduction of facet_matrix(), a facet that allows you to create a grid of panels with different data columns in the different rows and columns of the grid. Examples of such …

In search of the perfect partial plot

[This article was first published on Artful Analytics, and kindly contributed to R-bloggers]. Introduction: Partial dependence (PD) plots are essential for interpreting Random Forest models. …

A Comprehensive Introduction to Working with Databases using R

[This article was first published on Rsquared Academy Blog, and kindly contributed to R-bloggers]. Introduction: In a previous post, we briefly looked at connecting …

B3 is NOT shutting down its ftp site, for now.

[This article was first published on R on msperlin, and kindly contributed to R-bloggers]. Surprise, surprise. B3’s ftp site is still up and running. Following …

tikzDevice v0.12.3

[This article was first published on R on Ralf Stubner, and kindly contributed to R-bloggers]. Yesterday tikzDevice version 0.12.3 made it onto CRAN and is now propagating …

Community Call – Reproducible Workflows at Scale with drake

Ambitious workflows in R, such as machine learning analyses, can be difficult to manage. A single round of computation can take several hours to complete, and routine updates to the code and data tend to invalidate hard-earned results. You can enhance the maintainability, hygiene, speed, scale, and reproducibility of such projects with the drake R …

Freeing the data scientist mind from the curse of vectoRization

Julia to the rescue! Nowadays, most data scientists use either Python or R as their main programming language. That was also true for me until I met Julia earlier this year. Julia promises performance comparable to statically typed compiled languages (like C) while keeping the rapid development features of interpreted …

Data Warehousing and Data Mining

The relationship between data mining tools and data warehousing systems is most easily seen in the connector options of popular analytics software packages. For example, the image below right shows the many source options in Tableau Desktop for pulling in data from warehouse backends. Microsoft Power BI includes similar interface options. There …

R collation order

[This article was first published on R – David’s blog, and kindly contributed to R-bloggers]. You need to declare generic functions in S4 before you …
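The excerpt is cut off, but the ordering rule it states can be illustrated. Below is a minimal sketch using the methods package, with a hypothetical Circle class and area() generic that are not taken from the post. setMethod() only works once its generic exists. That is why, in a package, the file that declares a generic must be collated before the files that add methods to it. The DESCRIPTION Collate field controls that order, and roxygen2 can generate it from @include tags.

```r
library(methods)  # S4 classes, generics and methods

# Hypothetical class used only for this illustration
setClass("Circle", slots = c(r = "numeric"))

# Registering a method before its generic exists fails
# ("perimeter" is a hypothetical name with no existing definition)
too_early <- tryCatch(
  setMethod("perimeter", "Circle", function(shape) 2 * pi * shape@r),
  error = function(e) conditionMessage(e)
)
too_early

# Declaring the generic first...
setGeneric("area", function(shape) standardGeneric("area"))

# ...lets the method be registered and dispatched
setMethod("area", "Circle", function(shape) pi * shape@r^2)
area(new("Circle", r = 2))
```

Inside a single script the fix is simply to reorder the calls. Across package files, the same ordering has to be expressed through collation.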

Extract Top Reddit Posts of #rstats in 3 lines of R Code

This post is kept (literally) minimal to demonstrate how simple this hack is in R (though it could be just as simple in other languages). It also makes the point that R has use cases beyond statistics and data mining. Objective: the rstats subreddit is one of the popular sources of R-related information and discussion on …

Ei Gude! – Data science courses with R in Frankfurt

Unlock the potential of data science with the free R programming language for advanced analytics and data visualization. In our popular R training courses, you will learn how to implement data science in your company. With over 1,500 satisfied participants, eoda’s R courses are among the leading courses in the German-speaking world. This time we bring …

Introducing correlationfunnel v0.1.0 – Speed Up Exploratory Data Analysis by 100X

[This article was first published on business-science.io, and kindly contributed to R-bloggers]. I’m pleased to announce the introduction of correlationfunnel version 0.1.0, which officially hit …

Dollar Signs and Percentages – 3 Different Ways to Convert Data Types in R

[This article was first published on R – data technik, and kindly contributed to R-bloggers]. Working with percentages in R can be a little tricky, …
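The excerpt does not show the author's three methods. Below is one common base-R approach, using made-up values, and it is not necessarily one of the three. It strips the symbols with a regular expression, then converts with as.numeric(). readr's parse_number() is another well-known option that is not shown here.

```r
# Hypothetical character columns, as they often arrive from spreadsheet exports
price   <- c("$1,250.00", "$89.99", "$0.50")
percent <- c("12.5%", "100%", "3%")

# as.numeric() on the raw strings would give NA (with a warning), so strip the symbols first
price_num   <- as.numeric(gsub("[$,]", "", price))                     # drop "$" and thousands separators
percent_num <- as.numeric(sub("%", "", percent, fixed = TRUE)) / 100   # store as proportions

price_num
percent_num

# Converting back to display strings for reports
sprintf("$%.2f", price_num)
sprintf("%.1f%%", percent_num * 100)
```

Storing percentages as proportions (0.125 rather than 12.5) keeps later arithmetic straightforward. The "%" sign is added back only when formatting for display.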