An API for @racently

[This article was first published on R | datawookie, and kindly contributed to R-bloggers]. @racently is a side project that I have been nursing along …

Using R and H2O Isolation Forest For Data Quality

[This article was first published on R-Analytics, and kindly contributed to R-bloggers]. suppressWarnings( suppressMessages( library( h2o ) ) ) suppressWarnings( suppressMessages( library( dygraphs ) ) …

A comparison of methods for predicting clothing classes using the Fashion MNIST dataset in RStudio and Python (Part 1)

[This article was first published on R Views, and kindly contributed to R-bloggers]. Florianne Verkroost is a PhD candidate at Nuffield College at the University …

Statistical uncertainty with R and pdqr

General description: Statistical estimation usually has the following setup. There is a sample (an observed, usually randomly chosen, set of values of measurable quantities) from some general population (the whole set of values of the same measurable quantities). We need to draw conclusions about the general population based on the sample. This is done by computing summary …
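
To make this setup concrete, here is a base-R sketch (it does not use pdqr's own functions) that treats a simulated vector as the general population, repeatedly draws samples from it and computes one summary statistic, the sample mean, for each sample. The population distribution, the sample size of 50 and the 2,000 repetitions are arbitrary assumptions; the spread of the resulting estimates is one way to see the statistical uncertainty the post is concerned with.

```r
# Base-R sketch (not the pdqr API): approximate the sampling distribution
# of the sample mean by drawing many samples from a known population.
set.seed(2019)

# Hypothetical "general population": 100,000 values from a skewed distribution
population <- rexp(1e5, rate = 1)

# Draw one sample of size n (without replacement) and summarise it by its mean
sample_mean <- function(pop, n) mean(sample(pop, size = n))

# Repeat the sampling many times to see how much the estimate varies
n <- 50
estimates <- replicate(2000, sample_mean(population, n))

# Centre and spread of the estimates; the spread quantifies the uncertainty
c(population_mean   = mean(population),
  mean_of_estimates = mean(estimates),
  sd_of_estimates   = sd(estimates))

# Central 95% range of the simulated estimates
quantile(estimates, c(0.025, 0.975))
```

The standard deviation of the estimates should come out close to the population standard deviation divided by the square root of the sample size, which is the usual standard error of a mean.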

Cleaning the Table

[This article was first published on R on kieranhealy.org, and kindly contributed to R-bloggers]. While I’m talking about getting data into R this weekend, here’s …

Dangerous streets of Bratislava! Animated maps using open data in R

[This article was first published on Peter Laurinec, and kindly contributed to R-bloggers]. At work recently, I wanted to make some interesting start-up pitch …

future 1.15.0 – Lazy Futures are Now Launched if Queried

[This article was first published on JottR on R, and kindly contributed to R-bloggers]. No dogs were harmed while making this release. future 1.15.0 is …

Reading in Data

[This article was first published on R on kieranhealy.org, and kindly contributed to R-bloggers]. Here’s a common situation: you have a folder full of similarly-formatted …
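
One common base-R way to handle this situation, sketched here under the assumption that the files are CSVs sharing the same columns, is to list the files, read each one and bind the pieces together. This is not necessarily the approach taken in the post; the helper `read_folder()`, the folder and the file names are hypothetical.

```r
# Hypothetical helper: read every CSV in a folder and stack the results.
# Assumes all files have identical column names and types.
read_folder <- function(dir, pattern = "\\.csv$") {
  files <- list.files(dir, pattern = pattern, full.names = TRUE)
  pieces <- lapply(files, function(f) {
    df <- read.csv(f, stringsAsFactors = FALSE)
    df$source_file <- basename(f)  # remember which file each row came from
    df
  })
  do.call(rbind, pieces)
}

# Usage example: write two small files to a temporary folder, then read them back
tmp <- file.path(tempdir(), "survey_files")
dir.create(tmp, showWarnings = FALSE)
write.csv(data.frame(id = 1:2, value = c(10, 20)),
          file.path(tmp, "wave1.csv"), row.names = FALSE)
write.csv(data.frame(id = 3:4, value = c(30, 40)),
          file.path(tmp, "wave2.csv"), row.names = FALSE)

combined <- read_folder(tmp)
combined
```

Keeping a `source_file` column is a design choice: once the files are stacked, it is the only record of where each row came from.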

Using Spark from R for performance with arbitrary code – Part 4 – Using the lower-level invoke API to manipulate Spark’s Java objects from R

[This article was first published on Jozef’s Rblog, and kindly contributed to R-bloggers]. In the previous parts of this series, we have shown how to …

Instrumental variable regression and machine learning

Intro: Just as the question “what’s the difference between machine learning and statistics” has spilled a lot of ink (since at least Breiman (2001)), the same question with statistics replaced by econometrics has led to a lot of discussion as well. I like this presentation by Hal Varian from almost 6 years ago. …

Learning Linux – the wrong way – day 2

[This article was first published on HighlandR, and kindly contributed to R-bloggers]. Unborking the borked laptop – Recap: I’m trying to learn some Linux. Ostensibly …

A small simple random sample will often be better than a huge not-so-random one by @ellis2013nz

An interesting big data thought experiment. The other day on Twitter I saw someone referencing a paper or a seminar or something that was reported to examine the following situation: if you have an urn with a million balls in it of two colours (say red and white) and you want to estimate the proportion …
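
The thought experiment is easy to simulate. The sketch below rests on assumptions of our own, not necessarily the setup in the paper or the post: an urn that is exactly half red, a simple random sample of 1,000 balls, and a huge sample in which red balls are included with probability 0.6 and white balls with probability 0.4.

```r
set.seed(42)

# Hypothetical urn: one million balls, exactly half red (TRUE), half white (FALSE)
N <- 1e6
urn <- rep(c(TRUE, FALSE), each = N / 2)
true_p <- mean(urn)  # true proportion of red balls

# Small simple random sample: every ball is equally likely to be drawn
srs <- sample(urn, size = 1000)
srs_est <- mean(srs)

# Huge non-random sample: assumed inclusion probability 0.6 for red balls
# and 0.4 for white balls, which gives roughly 500,000 balls in total
include_prob <- ifelse(urn, 0.6, 0.4)
big_biased <- urn[runif(N) < include_prob]
biased_est <- mean(big_biased)

c(true = true_p, srs_1000 = srs_est,
  biased_size = length(big_biased), biased = biased_est)
```

Under these assumptions the huge sample's estimate settles near 0.6 (0.3 / (0.3 + 0.2)) however many balls it holds, whereas the simple random sample of 1,000 usually lands within a few percentage points of the true 0.5.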

OddsPlotty – the first official package I have ‘officially’ launched

Motivation for this: The background to this package is linked to a project I undertook about a year ago. The video relates to the project and how R really sped up the process. The exam question was to use a regression model to predict admissions, and we had to evaluate, as a consequence, 60 different …

Tidyverse evolutions: curly-curly operator and pivoting (feat. tidytuesday data & leaflet visuals)

The tidyverse ecosystem is steadily growing and adapting to the needs of its users. As part of this evolution, existing tools are being replaced by new and better methods. As useful as this flexibility is to the strength of the system, sometimes it can be hard to keep track of all the changes. This blogpost …

Package Manager 1.1.0 – No Interruptions

[This article was first published on RStudio Blog, and kindly contributed to R-bloggers]. No interruptions. That was our team’s goal for RStudio Package Manager 1.1.0 …

3D GPS data animation – virtually climb the Alps

[This article was first published on Sebastian Wolf blog, and kindly contributed to R-bloggers]. Using the amazing package rayshader I wanted to render a video …

Combining Price Elasticities and Sales Forecasting for Sales Improvement

[This article was first published on r-bloggers | STATWORX, and kindly contributed to R-bloggers]. In our blog, we have talked a lot about calculating elasticities. …

styler 1.2.0

We are pleased to announce that styler 1.2.0 is now available on CRAN. All the features below were added after styler 1.1.0, except the ones listed under Other changes, which were added somewhere between 1.0.0 and 1.1.0. Let’s get started:

install.packages("styler")
library(styler)

styler can finally detect aligned code and keep it aligned! For example, the following code won’t be modified by …

rOpenSci Announces a New Award From The Gordon and Betty Moore Foundation to Improve the Scientific Package Ecosystem for R

Today we are pleased to announce that we have received new funding from the Gordon and Betty Moore Foundation. The $894k grant will help us improve infrastructure for R packages and enable us to move towards a science-first package ecosystem for the R community. You may have already noticed some developments on this front …

KGC Climate Classification and Solar Irradiance through R Packages

I obviously haven’t been blogging lately, but that doesn’t mean that I haven’t been thinking about what ought to be my next blog post. Fortunately, I’ve had the chance to get to know two particularly impressive R packages which are available to the scientific community through CRAN. Together, the two packages can be used to …

D&D — Cut-off analysis over the Armour Class Score AC

Cut-off analysis is commonly used in medicine with diagnostic tests that discriminate between diseased and healthy populations using biomarker analysis. In this case, the cut-off value defines the positive and negative test results. Cut-off analysis is also used in finance, especially in credit scoring …
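
As a hypothetical illustration of how a cut-off splits results into positives and negatives, the base-R sketch below scores every candidate cut-off by its sensitivity (share of diseased subjects at or above the cut-off) and specificity (share of healthy subjects below it). The biomarker values are made up, and choosing the cut-off that maximises Youden's index (sensitivity + specificity - 1) is one common criterion assumed here, not necessarily the one used in the post.

```r
# Made-up biomarker values for two groups
diseased <- c(7.1, 8.4, 6.2, 9.0, 5.5, 7.8)
healthy  <- c(4.2, 5.1, 3.8, 6.5, 4.9, 5.0)

# Classify "positive" when the value is at or above the cut-off
cutoff_performance <- function(cutoff, pos, neg) {
  sensitivity <- mean(pos >= cutoff)  # diseased correctly flagged positive
  specificity <- mean(neg < cutoff)   # healthy correctly flagged negative
  c(cutoff = cutoff, sensitivity = sensitivity, specificity = specificity,
    youden = sensitivity + specificity - 1)
}

# Evaluate every observed value as a candidate cut-off
cutoffs <- sort(unique(c(diseased, healthy)))
results <- as.data.frame(t(sapply(cutoffs, cutoff_performance,
                                  pos = diseased, neg = healthy)))

# Keep the cut-off with the largest Youden index
best <- results[which.max(results$youden), ]
best
```

With these invented values the chosen cut-off is 5.5, which catches every diseased case (sensitivity 1) while labelling five of the six healthy cases negative (specificity 5/6).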

Implicit Tax Rates on Consumption and Labor in Europe

The aim of this blog post is to compute the implicit tax rates (ITR) on consumption, labour and corporate income for France, Italy, Spain, Germany and the Euro Area since 1995. We use as reference the report on Taxation trends in the European Union (2019) from the European Commission, and the previous reports since 2014. This …
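
An implicit tax rate is, broadly, the tax revenue from a category divided by a proxy for its tax base; for consumption, the Commission's taxation trends reports divide consumption tax revenue by the final consumption expenditure of households. The sketch below assumes that simple ratio and uses invented figures, not actual data for any country in the post.

```r
# Implicit tax rate (%) = tax revenue / proxy tax base * 100
implicit_tax_rate <- function(tax_revenue, tax_base) {
  100 * tax_revenue / tax_base
}

# Invented figures in billions of euros (not real data)
itr <- data.frame(
  year = c(2017, 2018),
  consumption_taxes = c(240, 250),
  household_consumption = c(1180, 1200)
)

# One rate per year, computed from the matching revenue and base
itr$itr_consumption <- implicit_tax_rate(itr$consumption_taxes,
                                         itr$household_consumption)
itr
```

The labour and corporate rates follow the same pattern; only the revenue and base definitions change.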

Data Science on Rails: Analyzing Customer Churn

Customer Relationship Management (CRM) is not only about acquiring new customers but especially about retaining existing ones. That is because acquisition is often much more expensive than retention. In this post, we learn how to analyze the reasons for customer churn (i.e. customers leaving the company). We do this with a very convenient point-and-click interface …

RSiteCatalyst Version 1.4.16 Release Notes

[This article was first published on randyzwitch.com, and kindly contributed to R-bloggers]. It’s been a while since the last update, but RSiteCatalyst is still going …

Data Scientist or Data Engineer – what’s the difference?

When it was floated that I should write this article, I approached it with trepidation. There is no better way to start an argument in the world of data than by trying to define what a Data Scientist is or isn’t, and adding in the complication of the relatively new role of Data …

November Thanksgiving – Data Science Style!

Hello All, November is the month of Thanksgiving, vacations and, of course, deals galore! As a way of saying thanks to my loyal readers, here are some deals for data science professionals and students that you should definitely not miss out on. Book deals: If you are exploring Data Science careers or preparing for interviews …

Why empathy is key for Data Science initiatives

When we think of empathy in a career, we perhaps think of a nurse with a good bedside manner, or perhaps a particularly astute manager or HR professional. Data science is probably one of the last disciplines where empathy would seem to be important. However, this misconception is one that frequently leads to the failure of data …

{tvthemes 1.1.0} is on CRAN: Creating a {pkgdown} website, Gravity Falls palette, and more!

[This article was first published on R by R(yo), and kindly contributed to R-bloggers]. The newest version of {tvthemes} is now on CRAN! v1.1.0 features …

R Owl of Athena

After developing the package RAthena, I stumbled quite accidentally into the R SDK for AWS, paws. As RAthena utilises Python’s SDK boto3, I thought the development of another AWS Athena package couldn’t hurt. As mentioned in my previous blog post, the paws syntax is very similar to boto3, so a lot of my RAthena code was …

Re-creating survey microdata from marginal totals by @ellis2013nz

I recently did some pro bono work for Gun Control NZ reviewing the analysis by a market research firm of the survey that led to this media release: “Most New Zealanders back stronger gun laws”. The analysis all checked out ok. The task at that time was to make sure that any claims about different …

Multiple data imputation and explainability

Introduction: Imputing missing values is quite an important task, but in my experience it is very often performed using very simplistic approaches. The basic approach is to impute missing values for numerical features using the average of each feature, or using the mode for categorical features. There are better ways of imputing missing values, for instance by predicting …
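
For reference, the simplistic baseline described here (the mean for numeric columns, the mode for categorical ones) can be sketched in base R as follows. The helpers `stat_mode()` and `impute_simple()` and the toy data are hypothetical, and the sketch deliberately does not implement the better, prediction-based approaches the post goes on to discuss.

```r
# Most frequent non-missing value (ties broken by first appearance)
stat_mode <- function(x) {
  x <- x[!is.na(x)]
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Replace NAs column by column: mean for numeric columns, mode otherwise
impute_simple <- function(df) {
  for (col in names(df)) {
    miss <- is.na(df[[col]])
    if (!any(miss)) next
    if (is.numeric(df[[col]])) {
      df[[col]][miss] <- mean(df[[col]], na.rm = TRUE)
    } else {
      df[[col]][miss] <- stat_mode(df[[col]])
    }
  }
  df
}

# Toy data with one missing value per column
toy <- data.frame(
  age = c(23, NA, 41, 35),
  colour = c("red", "blue", NA, "red"),
  stringsAsFactors = FALSE
)
imputed <- impute_simple(toy)
imputed
```

Here the missing age becomes 33, the mean of the observed ages, and the missing colour becomes "red", the most frequent category; every imputed row ends up looking like an "average" observation, which is part of why the post calls this approach simplistic.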

Command Centre amplification with predictive analytics and machine learning

Recently, our team at Draper and Dash has been busy creating an NHS operational command centre. This command centre is different, as it uses a collection and ensemble of cutting-edge predictive and machine learning techniques. To read the blog, you can access it below: We have really enjoyed the process and we are in …

LongCART – Regression tree for longitudinal data

[This article was first published on R-posts.com, and kindly contributed to R-bloggers]. Longitudinal changes in a population of interest are often heterogeneous and may be …

New package: simfinR

Example 01 – Apple’s Quarterly Net Profit. The first step in using simfinR is finding information about available companies:

library(simfinR)
library(tidyverse)
# You need to get your own api key at https://simfin.com/
my_apy_key <- readLines('~/Dropbox/.api_key_simfin.txt')
# get info
df_info_companies <- simfinR_get_available_companies(my_apy_key)
# check it
glimpse(df_info_companies)
## Observations: 2,564
## Variables: 3
## $ simId 171401, …

Offensive Programming in action (part III)

[This article was first published on NEONIRA, and kindly contributed to R-bloggers]. This is the third post on offensive programming, dedicated to using offensive programming …