A brief primer on Variational Inference

[This article was first published on Fabian Dablander, and kindly contributed to R-bloggers]. Bayesian inference using Markov chain Monte Carlo methods can be notoriously slow. …

You Shouldn’t Call Yourself a Data Scientist if You Don’t Know This

Today I want to break down the central limit theorem and how it relates to so much of the work that a data scientist performs. First things first: a core tool for any data scientist is a very simple chart type called a histogram. While you’re sure to have seen many a histogram, we often …

81st TokyoR Meetup Roundup: A Special Session in {Shiny}!

[This article was first published on R by R(yo), and kindly contributed to R-bloggers]. As another sweltering summer ends, another TokyoR Meetup! With global warming in …

The Mysterious Case of the Ghost Interaction

This spooky post was written in collaboration with Yoav Kessler (@yoav_kessler) and Naama Yavor (@namivor). Experimental psychology is moving away from repeated-measures ANOVAs (rmANOVA) and towards linear mixed models (LMM). LMMs have many advantages over rmANOVA, including (but not limited to): analysis of single-trial data (as opposed to aggregated means per condition); specifying more than one …

Non-randomly missing data is hard, or why weights won’t solve your survey problems and you need to think generatively

Throw this onto the big pile of stats problems that are a lot more subtle than they seem at first glance. This all started when Lauren pointed me at the post “Another way to see why mixed models in survey data are hard” on Thomas Lumley’s blog. Part of the problem is all the jargon …

Extracting basic Plots from Novels: Dracula is a Man in a Hole

In 1965 the University of Chicago rejected Kurt Vonnegut’s college thesis, which claimed that all stories share common structures, or “shapes”, including “Man in a Hole”, “Boy gets Girl” and “Cinderella”. Many years later, the by-then legendary Vonnegut gave a hilarious lecture on this idea. Before reading on, please watch it …

(Re)introducing skimr v2 – A year in the life of an open source R project

Theme song: PSA by Jay-Z. We announced the testing version of skimr v2 on June 19, 2018. After more than a year of (admittedly intermittent) work, we’re thrilled to be able to say that the package is ready to go to CRAN. So, what happened over the last year? And why are we so excited for v2? Wait, what …

An Amazon SDK for R!?

For a long time I have found it difficult to appreciate the benefits of “cloud compute” in my R model builds. This was due to my initial lack of understanding of how to set up R in cloud compute environments. When I noticed that AWS was bringing out a new product, AWS SageMaker, the …

Mocking is catching

[This article was first published on Posts on R-hub blog, and kindly contributed to R-bloggers]. When writing unit tests for a package, you might find …

Any one interested in a function to quickly generate data with many predictors?

[This article was first published on ouR data generation, and kindly contributed to R-bloggers]. A couple of months ago, I was contacted about the possibility …

Dogs of New York

[This article was first published on R on kieranhealy.org, and kindly contributed to R-bloggers]. The other week I took a few publicly available datasets that I …

The Chaos Game: an experiment about fractals, recursivity and creative coding

[This article was first published on R – Fronkonstin, and kindly contributed to R-bloggers]. Mathematics, rightly viewed, possesses not only truth, but supreme beauty (Bertrand …

How to Grow Your Own Data Scientists – a practical guide for the data-driven C-Suite

Data today is the fuel driving the modern business world. It therefore stands to reason that the ability to read and speak data should be a fairly mainstream skill. Except it isn’t, at least not yet. A 2018 report by Qlik suggests that just 24% of business decision-makers were fully confident in their abilities with data. This is despite the …

New Introduction to rquery

rquery is a data wrangling system designed to express complex data manipulation as a series of simple data transforms. This is in the spirit of R’s base::transform() or dplyr’s dplyr::mutate(), and it uses a pipe in the style popularized in R by magrittr. The operators themselves follow the selections in Codd’s relational algebra, with the …

GRNN vs. GAM

In practice, GRNN (General Regression Neural Network) is very similar to GAM (Generalized Additive Models) in the sense that both offer the flexibility to approximate non-linear functions. In the example below, both GRNN and GAM were applied to the Kyphosis data, which has been widely used in GAM examples, and revealed very similar patterns of functional relationships …

Accupedo vs. Fitbit Part 1: Convergent Validity of Hourly Step Counts with R

[This article was first published on Method Matters Blog, and kindly contributed to R-bloggers]. In this post, we will investigate the relationship between hourly step …

A duck. Giving a look at DuckDB since MonetDBLite was removed from CRAN

[This article was first published on Guillaume Pressiat, and kindly contributed to R-bloggers]. You may know that MonetDBLite was removed from CRAN. DuckDB is coming up. …

A Comprehensive Introduction to Command Line for R Users

[This article was first published on Rsquared Academy Blog, and kindly contributed to R-bloggers]. In this tutorial, you will be introduced to the command line. …

RAthena 1.3.0 has arrived

RAthena is an R package that interfaces with Amazon Athena. However, it doesn’t use the standard ODBC and JDBC drivers like AWR.Athena and metis. Instead, RAthena utilises Boto3, Amazon’s SDK (software development kit) for Python. It does this through the reticulate package, which provides an interface to Python. What this means is that …

Pivoting tidily

One of the fun bits of my job is that I have actual time dedicated to helping colleagues and grad students with statistical or computational problems. Recently I’ve been helping one of our Lab Instructors with some data from their Plant Physiology Lab course. Whilst I was writing some R code to import the …

Words that will inspire, a data science project of TED Talks

“Words that will Inspire” is an analysis of 2,500+ TED talks using text analytics and machine learning in R to find the factors that make some talks more popular than others. What was the motivation for doing this project? I am part of a meetup group called Data Scientist speakers in London that meets regularly to …

The politics of New Mexico: a brief historical-visual account

[This article was first published on Jason Timm, and kindly contributed to R-bloggers]. In this post, we piece together a brief political history of New …

Text Mining with the Democratic Debates

There have been six official Democratic debates so far, with more to come. The debates are long and frankly very boring. Their purpose is to introduce the candidates to America and to help voters understand their policy positions. However, America has a short attention span, and with a 3-hour run time, that’s a lot of time …

Decision Making Support Systems #3: Differences between IA and AI

The differences between Artificial Intelligence and Augmented Intelligence. In previous posts, we looked at the definition of Artificial Intelligence (AI) and the definition of Intelligence Augmentation (IA). So, what are the differences between the two? Intelligence Augmentation has always been concerned with aiding human decision making and keeping humans in the loop, whereas the AI endeavor seeks to …

Horizontal scaling of data science applications in the cloud

Prediction models, machine learning algorithms, and scripts for data storage: modern data science applications are not only increasingly complex, but also put existing infrastructure to the test with temporary resource peaks. In this article, we will show how tools such as the RStudio Job Launcher in conjunction with a Kubernetes cluster …

Super Solutions for Shiny Architecture #5 of 5: Automated Tests

TL;DR: Describes best practices for setting up an automated test architecture for Shiny apps. Automate and test early and often with unit tests, user interface tests, and performance tests. Best Practices for Testing Your Shiny App. Even your best apps will break down at some point during development or during User Acceptance Tests. I can bet …

Understanding Blockchain Technology by building one in R

By now you will know that it is a good tradition of this blog to explain stuff by rebuilding toy examples of it in R (see e.g. “Understanding the Maths of Computed Tomography (CT) scans”, “So, what is AI really?” or “Google’s Eigenvector… or how a Random Surfer finds the most relevant Webpages”). This time …

Ordering Sentinel-2 products from Long Term Archive with sen2r

Until August 2019, all Sentinel-2 satellite data could be directly downloaded from the ESA Data Hub, either through the interactive Open Hub or using an API interface. Recently this policy was changed: typically, only the most recent products are available for direct download, while older ones (level 2A archives older than 18 months and level 1C older than one year) …

Trends in U.S. Border Crossing Entry since 1996

Since the 2016 election, U.S. border security has been a huge topic. Construction of the new border wall has started, and tension between Mexico and the U.S. has intensified along with it. Many people predicted not only a decrease in the number of illegal border entries but also a decrease in the number of …

Avoiding embarrassment by testing data assumptions with expectdata

[This article was first published on Dan Garmat’s Blog — R, and kindly contributed to R-bloggers]. Expectdata is an R package that makes it easy …

rmangal: making ecological networks easily accessible

In early September, version 2.0.0 of rmangal was approved by rOpenSci, and four weeks later it made it to CRAN. Following up on our experience, we detail below the reasons why we wrote rmangal, why we submitted our package to rOpenSci, and how the peer review improved our package. Mangal, a database for ecological networks. Ecological networks are defined …