Setting up RStudio Server on a Cloud for Collaboration and Reproducibility

Roland Stevenson is a data scientist and consultant who may be reached on LinkedIn. When setting up R and RStudio Server on a cloud Linux instance, some thought should be given to implementing a workflow that facilitates collaboration and ensures R project reproducibility. There are many possible workflows to accomplish this. In this post, we … Read more

Categories R Tags ExcerptFavorite

Vectorizing functions in R is easy

Imagine you have a function that only takes one argument, but you would really like to work on a vector of values. Here is a short example of how the function Vectorize() can accomplish this. Let’s say we have a data.frame xy <- data.frame(sample = c("C_pre_sample1", "C_post_sample1", "T_pre_sample2", "T_post_sample2", "NA_pre_sample1"), value = runif(5)) # sample value # 1 … Read more
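The idea in this excerpt can be sketched as follows. This is only an illustration, not the code from the linked post: the helper first_field() and its vectorized wrapper first_field_v() are hypothetical names, and the sample labels are borrowed from the data.frame above.

```r
# Hypothetical scalar-only helper: returns the text before the first underscore.
# It deliberately rejects vectors to mimic a function written for one value.
first_field <- function(x) {
  stopifnot(length(x) == 1)
  strsplit(x, "_", fixed = TRUE)[[1]][1]
}

# Vectorize() wraps the function so it is applied to each element in turn
# (internally via mapply); USE.NAMES = FALSE keeps the result unnamed.
first_field_v <- Vectorize(first_field, USE.NAMES = FALSE)

samples <- c("C_pre_sample1", "C_post_sample1", "T_pre_sample2")
groups <- first_field_v(samples)
print(groups)
```

Calling first_field(samples) directly would stop with an error, while the wrapped version returns one result per element.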


Two interesting facts about high-dimensional random projections

John Cook recently wrote an interesting blog post on random vectors and random projections. In the post, he states two surprising facts of high-dimensional geometry and gives some intuition for the second fact. In this post, I will provide R code to demonstrate both of them. Fact 1: Two randomly chosen vectors in a high-dimensional … Read more
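The excerpt is cut off before Fact 1 is fully stated. Assuming it refers to the well-known result that two random vectors in a high-dimensional space are very likely to be nearly orthogonal, a minimal simulation might look like this. The dimension, the seed, and the helper cosine_similarity() are choices made here for illustration, not taken from the post.

```r
set.seed(42)  # fixed seed so the run is repeatable

# Cosine similarity between two numeric vectors of equal length.
cosine_similarity <- function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

d <- 10000       # assumed dimension; larger values push the angle closer to 90
u <- rnorm(d)    # standard normal entries give a uniformly random direction
v <- rnorm(d)

cos_sim <- cosine_similarity(u, v)
angle_deg <- acos(cos_sim) * 180 / pi

cat("cosine similarity:", cos_sim, "\n")
cat("angle in degrees:", angle_deg, "\n")
```

For independent Gaussian vectors the cosine similarity has a standard deviation of roughly 1/sqrt(d), so at d = 10000 it is expected to sit within a few hundredths of zero.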


Controlling Data Layout With cdata

Here is an example of how easy it is to use cdata to re-layout your data. Tim Morris recently tweeted the following problem (corrected). Please will you take pity on me #rstats folks? I only want to reshape two variables x & y from wide to long! Starting with: d xa xb ya yb 1 1 … Read more



Customize Your Interactive EDA: Explore the Fuel Economy of the U.S. Car Market

Interactive EDA is nice but customized interactive EDA is even nicer. To celebrate the new CRAN version of my ‘ExPanDaR’ package I prepare a customized variant of ‘ExPanD’ to explore the U.S. EPA data on fuel economy. Our objective is to develop an interactive display that guides the reader on how to explore the fuel … Read more


Writing a letter to DataCamp

Since 2017 I have been an instructor for DataCamp, the VC-backed online data science education platform. What this means is that I am not an employee, but I have developed content for the company as a contractor. I have two courses there, one on text mining and one on practical supervised machine learning. About two … Read more


Even with randomization, mediation analysis can still be confounded

Randomization is super useful because it usually eliminates the risk that confounding will lead to a biased estimate of a treatment effect. However, this only goes so far. If you are conducting a mediation analysis in the hopes of understanding the underlying causal mechanism of a treatment, it is important to remember that the mediator … Read more


The sinh-arcsinh normal distribution

This month’s issue of Significance magazine has a very nice summary article of the sinh-arcsinh normal distribution. (Unfortunately, the article seems to be behind a paywall.) This distribution was first introduced by Chris Jones and Arthur Pewsey in 2009 as a generalization of the normal distribution. While the normal distribution is symmetric and has light … Read more


BayesComp 20 [full program]

The full program is now available on the conference webpage of BayesComp 20, taking place 7-10 Jan 2020. There are eleven invited sessions, including one j-ISBA session, and a further thirteen contributed sessions were selected by the scientific committee. Calls are still open for tutorials on Tuesday 07 January (with two already planned on Nimble and … Read more


Bioconductor S4 classes for high-throughput omics data

Motivation: Multi-omics data integration and analysis. What a beast! It is one of the major challenges in the era of personalized/precision medicine (or whatever you want to call it). Definitely mine, as someone who is expected to grab such messy data from multiple sources, align and annotate it … Read more


R Programmers Earn More than Python Programmers

At least globally, that is. According to the 2019 Stack Overflow Developer Survey, R users globally reported earning an average of $64k per year, $1k more than the $63k reported by Python developers. In the United States, that situation reverses, with Python programmers earning $116k and R programmers $108k. Global Average Salaries by Technology United … Read more


Describe and understand Bayesian models and posteriors using bayestestR

The Bayesian framework is quickly gaining popularity among scientists, leading to the growing popularity of packages to fit Bayesian models, such as rstanarm or brms. However, extracting summary indices from these models to report them in your manuscript can be quite challenging, especially for new users. To address this, please let us introduce bayestestR! bayestestR … Read more


{attempt} 0.3.0 is now on CRAN

Last week, a new version of {attempt} was published on CRAN. This version includes some improvements in the current code base, and the addition of new functions. You can get it with our old friend install.packages: install.packages("attempt") News in version 0.3.0 library(attempt) packageVersion("attempt") ## [1] '0.3.0' Newcomers in this version: The is_try_error() function, that tests if an … Read more


New package: GetBCBData

The Central Bank of Brazil (BCB) offers access to its SGS system (sistema gerenciador de series temporais) with an official API available here. With time, I find myself using more and more of the available datasets in my regular research and studies. Over the last weekend I decided to write my own API package that would … Read more


Tidying Video Game Metadata: A Case Study

This article was jointly written by Arvid J. Kingl & Viktor Konakovsky. The Battle for Wesnoth is an open-source, turn-based strategy game. The game world is rich, with several factions, maps and literally hundreds of available units. In this tutorial, you will … Read more


Understanding Bayesian Inference with a simple example in R!

 Hi there! Last summer, the Royal Botanical Garden (Madrid, Spain) hosted the first edition of MadPhylo, a workshop about Bayesian Inference in phylogeny using RevBayes. It was a pleasure for me to be part of the organization staff with John Huelsenbeck, Brian Moore, Sebastian Hoena, Mike May, Isabel Sanmartin and Tamara Villaverde. Next edition of … Read more


Piping is Method Chaining

What R users now call piping, popularized by Stefan Milton Bache and Hadley Wickham, is inline function application (this is notationally similar to, but distinct from the powerful interprocess communication and concurrency tool introduced to Unix by Douglas McIlroy in 1973). In object oriented languages this sort of notation for function application has been called … Read more
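To make "inline function application" concrete, here is a small base-R sketch. The operator %then% is a hypothetical stand-in defined only for this example; it is not the magrittr pipe and it handles only the simplest case, where the right-hand side is a bare function.

```r
# Hypothetical minimal pipe: apply the function on the right to the value
# on the left. Real pipe implementations do considerably more than this.
`%then%` <- function(lhs, rhs) rhs(lhs)

values <- c(1, 4, 9)

# Piped form: read left to right, like a chain of method calls.
piped <- values %then% sqrt %then% sum

# Equivalent nested form: read inside out.
nested <- sum(sqrt(values))

print(piped)
print(identical(piped, nested))
```

Both forms compute the same thing; the piped form simply moves the argument in front of the function name, which is what makes it resemble method chaining.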


tsbox 0.1: class-agnostic time series

The R ecosystem knows a vast number of time series classes: ts, xts, zoo, tsibble, tibbletime or timeSeries. The plethora of standards causes confusion. As different packages rely on different classes, it is hard to use them in the same analysis. tsbox provides a set of tools that make it easy to switch between these classes. It also allows … Read more


MailR SMTP Setup (Gmail, Outlook, Yahoo) | STARTTLS

The mailR package allows you to easily send e-mails with R, but you need the right mailR SMTP settings. Getting the SMTP settings just right to establish a connection to e-mail hosts like Gmail, Outlook, or Yahoo can be challenging. This is especially true when there are some settings you need to change on the … Read more


Batch Processing of Monotonic Binning

In my GitHub repository (https://github.com/statcompute/MonotonicBinning), multiple R functions have been developed to implement the monotonic binning by using either iterative discretization or isotonic regression. With these functions, we can run the monotonic binning for one independent variable at a time. However, in a real-world production environment, we often would want to apply the binning algorithm … Read more


historical word embeddings & lexical semantic change

I have developed a GitHub guide that demonstrates a simple workflow for sampling Google n-gram data and building historical word embeddings with the aim of investigating lexical semantic change. Here, we build on this workflow, and unpack some methods presented in Hamilton, Leskovec, and Jurafsky (2016) & Li et al. (2019) for aligning historical … Read more


Introducing graphlayouts with Game of Thrones

This post introduces the new R package graphlayouts, which has been available on CRAN for a few days. We will use network data from the Game of Thrones TV series (seemed timely at the time of writing) to illustrate the core layout algorithms of the package. Most of the algorithms use stress majorization as their basis, which I described … Read more


Discrete Event Simulation (DES) Metamodeling – Splines with R and Arena

Simulation Metamodeling – building and using surrogate models that can approximate results from more complicated simulation models – is an interesting approach to analyze results from complicated, computationally expensive simulation models. Metamodels are useful because they can yield good approximations of the original simulation model response variables using less computational resources. For an introduction to … Read more


R as GIS for ecologists

Working with spatial data is a key feature in ecological research. Using R to handle this type of data has the great advantage of keeping both variable extraction and modelling in the same environment, instead of resorting to external GIS software to compute some variables and then turning to R for modelling. In this example … Read more


How to easily automate R analysis, modeling and development work using CI/CD, with working examples

Automating the execution, testing and deployment of R work is a very powerful tool to ensure the reproducibility, quality and overall robustness of the code that we are building, be it for data analysis and modeling purposes, developing R packages or even blogging. Modern tools also provide a free and easy-to-use way of … Read more


A follow up note on our web scraping tutorial

We published a web scraping tutorial a couple of days back and it received a good response from the #rstats community. While we thank you for that, we made a mistake in choosing one of the case studies, as pointed out by @hrbrmstr in this tweet: Whomever runs “R Squared Academy” needs to _really_ learn more … Read more


BatchGetSymbols is now parallel!

BatchGetSymbols is my most downloaded package by any count. Computation time, however, has always been an issue. While downloading data for 10 or fewer stocks is fine, doing it for a large amount of tickers, say the SP500 composition, gets very boring. I’m glad to report that time is no longer an issue. Today I … Read more


Big Data: On RDDs, Dataframes,Hive QL with Pyspark and SparkR-Part 3

Out[90]: [['Runs', 'Mins', 'BF', '4s', '6s', 'SR', 'Pos', 'Dismissal', 'Inns', 'Opposition', 'Ground', 'Start Date'], ['15', '28', '24', '2', '0', '62.5', '6', 'bowled', '2', 'v Pakistan', 'Karachi', '15-Nov-89'], ['DNB', '-', '-', '-', '-', '-', '-', '-', '4', 'v Pakistan', 'Karachi', '15-Nov-89'], ['59', '254', '172', '4', '0', '34.3', '6', 'lbw', '1', 'v Pakistan', 'Faisalabad', '23-Nov-89'], ['8', '24', … Read more


From Data to Insights: photos and resources

Hi R-Lovers! It was great to meet you at our last event, we hope you felt the same! For those of you who couldn’t make it, here’s a brief summary of what we talked about. From Data to insights: an overview What’s the real value of a data product? Can Design and Development walk hand in … Read more


Setting up an R Admin Group

When I set up an R server for clients they often want to be able to install packages so that all users on the machine have access to them. This requires them to be able to install the packages onto the root filesystem rather than under their individual home directories. It would be easy enough … Read more


Practical Introduction to Web Scraping in R

As mentioned earlier, we will first check if we can scrape data from the webpage using paths_allowed() from the robotstxt package. We need to specify the url of the web page using the paths argument. If we can access the web page, paths_allowed() will return TRUE, else FALSE. Since it has returned TRUE, let us go ahead … Read more


A Shiny Classroom Experiment with Real-Time Results Presentation

Overview Today, I used a shiny app to run a classroom experiment in the first class of my introductory cost accounting course. I uploaded code, data and materials to github so that everybody can reuse it to construct similar experiments and, of course, to replicate the results from our experiment. The experiment tests whether cost … Read more


ViennaR Meetup March – Impressions

For all who couldn’t make it to our last ViennaR Meetup on March 18, 2019 at Webster Vienna Private University, here is just a short summary of the talks and takeaways. R-Ladies Laura Vana introduced the R-Ladies Vienna – a program initiated by the R-Consortium to achieve proportionate representation by encouraging, inspiring, and empowering the minorities … Read more


100 Days of Code – Completed!

I finished the #100DaysOfCode challenge and it feels great! I will tell you a little bit about my experience. Top 5 Takeaways: Sitting down and writing code every day is not easy; Planning is critical to your success; Staying motivated requires effort; Being excited about your project makes a world of difference; Learning takes … Read more


R Photo

A good friend is now a professor at the University of Auckland and knew to photograph and send us this. Thanks!!! (From the R – Win-Vector Blog.) … Read more


Lost In [SQL] Translation: Charting d[b]plyr Mapped SQL Function Support Across All Backends

Like more posts than I care to admit, this one starts innocently enough with a tweet by @gshotwell: Is there a reference document somewhere of which dplyr commands work on various database backends? #rstats — Gordon Shotwell (@gshotwell) April 9, 2019 Since I use at least 4 different d[b]plyr backends every week, this same question … Read more


The evolution of my academic career as seen through posters and talks thanks to hugo academic 4.1

The hugo-academic theme which powers my website is active and frequently updated. I don’t update my website that frequently anymore, but I recently found out about many of their changes when I made the CDSB website. We are delighted to share with you our new webpage at https://t.co/rNuiRlNixV with both English and Spanish support Estamos encantados … Read more


Visualising Model Response with easyalluvial

In this tutorial I want to show how you can use alluvial plots to visualise model response in up to 4 dimensions. easyalluvial generates an artificial data space using fixed values for unplotted variables or uses the partial dependence plotting method. It is model agnostic but offers some convenient wrappers for caret models. Taking a … Read more


Testing numeric variables for NA/NaN/Inf

In R, a numeric variable is either a number (like 0, 42, or -3.14), or one of 4 special values: NA, NaN, Inf or -Inf. It can be hard to remember how the is.x functions treat each of the special values, especially NA and NaN! The table below summarizes how each of these values is … Read more
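The table itself is cut off in this excerpt, but the behaviour it describes can be reproduced with a few lines of base R. The following is a sketch using only is.na(), is.nan(), is.finite() and is.infinite(); the column layout is a choice made here and may differ from the table in the post.

```r
# One ordinary number plus the four special values named in the excerpt.
x <- c(0, NA, NaN, Inf, -Inf)

# Apply each base-R test to every value and collect the results side by side.
checks <- data.frame(
  value       = as.character(x),
  is.na       = is.na(x),
  is.nan      = is.nan(x),
  is.finite   = is.finite(x),
  is.infinite = is.infinite(x)
)

print(checks)
```

The asymmetry worth remembering is that is.na() is TRUE for both NA and NaN, whereas is.nan() is TRUE only for NaN. Also, is.finite() is FALSE for all four special values, so it is not simply the negation of is.infinite().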


Garmonbozia: Using R to look at Garmin CSV data

Garmin Connect has a number of plots built in, but to take a deeper dive into all your fitness data, you need to export a CSV and fire up R. This post is a quick guide to some possibilities for running data. There are a few things that I wanted to look at. For example, how … Read more


Register by 14th April! R Trainings in Hamburg

„Introduction to R“ and „Introduction to Machine Learning with R“ in Hamburg! Register for our popular eoda courses by April 14th and become an R expert in May. What can you look forward to? Our program at a glance: May 14th – 15th | Introduction to R With practical tips and exercises, this introductory … Read more


Easily Build and Share Custom R Menu Scripts With Bio7

09.04.2019 Bio7 can be extended with R packages, Eclipse plugins and ImageJ plugins. Another very easy option to extend the Bio7 Graphical User Interface with R actions is dynamic menus, which can be used, e.g., for personalized workflows or repeating tasks. The provision of new menus and nested menus is simple and can be arranged for … Read more


The Ultimate Opinionated Guide to Base R Date Format Functions

When I was first learning R, working with dates was one of the hardest and most time consuming tasks I dealt with. There are so many things to learn! What do I do with as.POSIXct(), as.POSIXlt(), strftime(), strptime(), format(), and as.Date()? R date formats were confusing, and it seemed no matter what I did I … Read more
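As a rough orientation to how the functions listed here relate, the following base-R sketch converts between strings, Date, POSIXct and POSIXlt. The dates, format strings and variable names are arbitrary examples chosen here, not taken from the guide.

```r
# as.Date() parses ISO-style text into a Date (days, no time of day).
d <- as.Date("2019-04-09")

# format() turns a date back into text using strftime-style codes.
label <- format(d, "%d.%m.%Y")

# A non-ISO string needs an explicit format to be parsed.
parsed <- as.Date("09.04.2019", format = "%d.%m.%Y")

# as.POSIXct() stores a date-time as seconds since 1970; set tz explicitly
# so the result does not depend on the machine's local time zone.
ct <- as.POSIXct("2019-04-09 14:30:00", tz = "UTC")
hour_min <- format(ct, "%H:%M")

# as.POSIXlt() exposes the components as list fields (year counts from 1900).
lt <- as.POSIXlt(ct)
year <- lt$year + 1900

print(label)
print(hour_min)
print(year)
```

Numeric-only format codes such as %d, %m, %Y, %H and %M are used here because month and weekday names depend on the locale.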


Learning and Teaching R | Get to the Plot

As an experienced R user, I have seen numerous social media posts and perhaps even a few stackexchange requests over the years asking the question, “how do I learn R?”. This is a great question. A related one is, “how do I teach or coach R?” To the would-be student, I ask, “What attracted you … Read more


Community Call – Security for R

“Security” can be a daunting, scary, and (frankly) quite often a very boring topic. BUT!, we promise that this Community Call on May 7th will be informative, engaging, and enlightening (or, at least not boring)! Applying security best practices is essential not only for developers or sensitive data storage but also for the everyday R … Read more
