Collecting Content Security Policy Violation Reports in S3 (‘Effortlessly’/‘Freely’)

In the previous post I tried to explain what Content Security Policies (CSPs) are and how to work with them in R. In case you didn’t RTFPost, the TL;DR is that CSPs give you control over what can be loaded along with your web content and can optionally be configured to generate a violation report …

Using R to track NHS winter pressures

1. R packages we used: The tidyverse collection of R packages is a useful set of tools for data analysis and visualisation that are designed to work together. It contains the ggplot2 package, which we use for visualisations. (Need help getting started with R and the tidyverse? Try the R for Data Science website.) We use …

Pi day quiz

Today is Pi Day, an annual celebration of the mathematical constant Pi. It is observed every year on March 14, since Pi can be approximated by 3.14 and this date is written as 3/14 in the month/day format. To celebrate this day, here is a short quiz about Pi. You can find …

Simulating the Six Nations 2019 Rugby Tournament in R: Final Round Update

In an earlier post I blogged how I had made a Monte Carlo simulation model of the Six Nations Rugby Tournament. With the final round of the tournament approaching this Saturday, I decided to do a quick update. Who can win at this stage? Wales, England, or Ireland can still win. Scotland, France and Italy do not have …

Visually explore Probability Distributions with vistributions

We are happy to introduce the vistributions package, a set of tools for visually exploring probability distributions.
Installation:
    # Install release version from CRAN
    install.packages("vistributions")
    # Install development version from GitHub
    # install.packages("devtools")
    devtools::install_github("rsquaredacademy/vistributions")
Shiny App: vistributions includes a shiny app which can be launched using vdist_launch_app() or try the live version here. …

Zotero hacks: unlimited synced storage and its smooth use with rmarkdown

Here is a slightly refreshed translation of my 2015 blog post, initially published on the Russian blog platform habr.com. The post shows how to organize a personal academic library of unlimited size for free. This is a funny case of a self-written manual which I came back to multiple times myself and many many more …

Unit Tests in R

I am collecting here some notes on testing in R. There seems to be a general (false) impression among non R-core developers that to run tests, R package developers need a test management system such as RUnit or testthat. And a further false impression that testthat is the only R test management system. This is …

Unpacking immigration collocations

As part of our road to detecting metaphors we got stuck on a simple problem: compound nouns. If you take the sentence “series of immigration policy changes”, “series” modifies “changes” in reference to “immigration policy”, which is a compound noun. “Series of changes” is not what we would consider metaphorical usage, but our detector would label …

An Interesting Subtlety of Statistics: The Hot Hand Fallacy Fallacy

Last week I stumbled across a very interesting recent Econometrica article by Joshua Miller and Adam Sanjurjo. I was really surprised by the statistical result they discovered and guess the issue may even have fooled Nobel Prize-winning behavioral economists. Before showing the statistical subtlety, let me briefly explain the Hot Hand Fallacy. Consider a …

“X affects Y”. What does that even mean?

In my last post I gave an intuitive demonstration of what causal inference is and how it differs from classic ML. After receiving some feedback, I realized that while the post was easy to digest, some confusion remains. …

World population growth through time

A few months ago I made an attempt to visualize the world population changes from 1800 to 2100: “Inspired by @MaxCRoser and @jkottke, I’ve tried to visualize the world population changes from 1800 to 2100. My new blog post at https://t.co/XpBpkZLO9s describes how this animation was made using #rstats and #OpenData. pic.twitter.com/WI3gj0xUwU” (Jakub …

RStudio Package Manager 1.0.6 – README

The 1.0.6 release of RStudio Package Manager helps R users understand packages. The primary feature in this release is embedded package READMEs, detailed below. If you’re new to Package Manager, it is an on-premise product built to give teams and organizations reliable and consistent package management. Download an evaluation today. View package READMEs in Package Manager. Package READMEs …

R 3.5.3 now available

The R Core Team announced yesterday the release of R 3.5.3, and updated binaries for Windows and Linux are now available (with Mac sure to follow soon). This update fixes three minor bugs (to the functions writeLines, setClassUnion, and stopifnot), but you might want to upgrade just to avoid the “package built under R 3.5.4” …

Installing Socviz

I’ve gotten a couple of reports from people having trouble installing the socviz library that’s meant to be used with Data Visualization: A Practical Introduction. As best as I can tell, the difficulties are being caused by GitHub’s rate limits. The symptom is that, after installing the tidyverse and devtools libraries, you try install_github("kjhealy/socviz") and …

Uber/Lyft Maximization: More Money for The Time

Motivation: Uber and Lyft, the main ridesharing companies, can make more money at a faster rate by filling their cars with passengers at peak times when they are on the road. The typical Uber/Lyft driver normally has a full-time job, is a full-time student, or is in between jobs. Being an Uber/Lyft driver to make …

DALEX has a new skin! Learn how it was designed at gdansk2019.satRdays

DALEX is an R package for visual explanation, exploration, diagnostics and debugging of predictive ML models (aka XAI – eXplainable Artificial Intelligence). It has a bunch of visual explainers for different aspects of predictive models. Some of them are useful during model development, some for fine-tuning, model diagnostics or model explanations. Recently Hanna Dyrcz …

Binning Data with rbin

We are happy to introduce the rbin package, a set of tools for binning/discretization of data, designed keeping in mind beginner/intermediate R users. It comes with two RStudio addins for interactive binning.
Installation:
    # Install release version from CRAN
    install.packages("rbin")
    # Install development version from GitHub
    # install.packages("devtools")
    devtools::install_github("rsquaredacademy/rbin")
RStudio Addins: rbin includes two RStudio addins for …

A case where prospective matching may limit bias in a randomized trial

Analysis is important, but study design is paramount. I am involved with the Diabetes Research, Education, and Action for Minorities (DREAM) Initiative, which is, among other things, estimating the effect of a group-based therapy program on weight loss for patients who have been identified as pre-diabetic (which means they have elevated HbA1c levels). The original …

Statistics Sunday: Scatterplots and Correlations with ggpairs

As I conduct some analysis for a content validation study, I wanted to quickly blog about a fun plot I discovered today: ggpairs, which displays scatterplots and correlations in a grid for a set of variables. To demonstrate, I’ll return to my Facebook dataset, which I used for some of last year’s R analysis demonstrations. …

Data Science at AT&T Labs Research

Hugo Bowne-Anderson, the host of DataFramed, the DataCamp podcast, recently interviewed Noemi Derzsy, a Senior Inventive Scientist at AT&T Labs Research within the Data Science and AI Research organization.
Hugo: Hi there, Noemi, and welcome to DataFramed.
Noemi: Hi. Thank you for having me.
Hugo: It’s a real pleasure to have you on the show, …

The Future is now: Advanced Machine Learning in R

Machine learning, also referred to as artificial intelligence, is based on algorithms that learn from examples and experience without being explicitly programmed. Instead of writing code, you feed data into the algorithm, which then “learns” relationships in the data (hence the term “machine learning”). In recent …

Ranking places with Google to create maps

Today we’re going to use the googleway R package, which allows its users to make requests to the Google Maps Places API. The goal is to create maps of specific places (restaurants, museums, etc.) with information from Google Maps rankings (number of stars given by other people). I already discussed this in French here to rank …

This is not normal(ised)

“Sydney stations where commuters fall through gaps, get stuck in lifts” blares the headline. The story tells us that: “Central Station, the city’s busiest, topped the list last year with about 54 people falling through gaps”. Wow! Wait a minute… “Central Station, the city’s busiest”. Some poking around in the NSW Transport Open Data portal …

Bayesian Statistics: Analysis of Health Data

The premise of Bayesian statistics is that distributions are based on a personal belief about the shape of such a distribution, rather than the classical assumption which does not take such subjectivity into account. In this regard, Bayesian statistics defines distributions in the following …

Community Call – Research Applications of rOpenSci Taxonomy and Biodiversity Tools

Our next Community Call, on March 27th, aims to help people learn about using rOpenSci’s R packages to access and analyze taxonomy and biodiversity data, and to recognize the breadth and depth of their applications. We also aim to learn from the discussion how we might improve these tools. Presentations will start with an introduction …

Assessing Causality from Observational Data using Pearl’s Structural Causal Models

SCMs are graphs with nodes, directed edges, and functions mapping exogenous variables to endogenous ones. Denote \(U\) as the set of exogenous variables, \(V\) as the set of endogenous variables, and \(F\) as the set of functions mapping \(U\) to \(V\). A concrete example is: \(U = \{X, Y\}\), \(V = \{Z\}\), \(F = \{f_z\}\) …

A Summary of My Home-Brew Binning Algorithms for Scorecard Development

Thus far, I have published four different monotonic binning algorithms for scorecard development and think it might be the right time for a quick summary. R functions for these binning algorithms are also available on https://github.com/statcompute/MonotonicBinning. The first one was posted back in 2017 (https://statcompute.wordpress.com/2017/01/22/monotonic-binning-with-smbinning-package) based on my SAS macro (https://statcompute.wordpress.com/2012/06/10/a-sas-macro-implementing-monotonic-woe-transformation-in-scorecard-development) that …

Exploring swings in Australian federal elections by @ellis2013nz

Swings are far from uniform. Last week I introduced a Bayesian state space model of two-party-preferred voting intention for Australian federal elections. It treats the surveys from various polling firms as imperfect (potentially systematically imperfect) measurements of an unobserved latent variable of “true” voting intention, which manifests itself only every few years in the form …

Fantasy Tips From the Fantasy King

AFL fantasy scores, as opposed to Supercoach scores, can be fully derived from the box-score statistics. This is good because it means that, if we wanted to (and I guess we do for the purposes of this post), we could do a few things. library(tidyverse) …

(2/2) Book promotion (paperback edition) – “Processing and Analyzing Financial Data with R”

I received many messages regarding my book promotion (see previous post). I’ll use this post to answer the most frequent questions: Does the paperback edition have a discount? No. The price drop is only valid for the ebook edition, but not by choice. Unfortunately, Amazon does not let me do countdown promotions for the …

Wrapper of knitr::include_graphics to Handle URLs & PDF Outputs

Those who use knitr::include_graphics() frequently in their R Markdown files may discover some inconsistencies (from the user point of view) if the same Rmd is used for multiple output formats, especially when PDF (LaTeX) is involved. The following code works fine for HTML outputs but fails when the outputs are PDFs:
    knitr::include_graphics('local.gif')
    knitr::include_graphics('https://commonmark.org/images/markdown-mark.png')
The first …

Building tidy tools workshop

Join RStudio Chief Data Scientist Hadley Wickham for his popular “Building tidy tools” workshop in Sydney, Australia! If you missed the sold-out course at rstudio::conf 2019, now is your chance. Register here: https://www.rstudio.com/workshops/building-tidy-tools/ You should take this class if you have some experience programming in R and you want to learn how to tackle …