pcLasso: a new method for sparse regression

I’m excited to announce that my first package has been accepted to CRAN! The package pcLasso implements principal components lasso, a new method for sparse regression which I’ve developed with Rob Tibshirani and Jerry Friedman. In this post, I will give a brief overview of the method and some starter code. (For an in-depth description …

R Coding Style Guide

Language is a tool that allows human beings to interact and communicate with each other. The more clearly we express ourselves, the better the idea is transferred from our mind to another’s. The same applies to programming languages: concise, clear and consistent code is easier to read and edit. It is especially important if you …

Interpreting the coefficients of linear regression

Nowadays there is a plethora of machine learning algorithms we can try out to find the best fit for our particular problem. Some of the algorithms have a clear interpretation; others work as a black box, and we can use approaches such as LIME or SHAP to derive some interpretations. In this article I would …

colorspace: New Tools for Colors and Palettes

A major update (version 1.4.0) of the R package colorspace has been released to CRAN, enhancing many of the package’s capabilities, e.g., more refined palettes, named palettes, ggplot2 color scales, visualizations for assessing palettes, shiny and Tcl/Tk apps, color vision deficiency emulation, and much more. Overview The colorspace package provides a broad toolbox for selecting …

Travis CI for R — Advanced guide

Continuous integration for building an R project in Travis CI, including code coverage, pkgdown documentation, osx and multiple R versions. Travis CI is a common tool to build R packages. It is in my opinion the best platform to use R in continuous integration. Some of the …

Showing a difference in means between two groups

Visualising a difference in means between two groups isn’t as straightforward as it should be. After all, it’s probably the most common quantitative analysis in science. There are two obvious options: we can either plot the data from the two groups separately, or we can show the estimate of the difference with an interval around it. …
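As a rough illustration of the second option, the sketch below computes the estimated difference in means and an interval around it in base R. The simulated groups `a` and `b`, their sizes, and the use of Welch's t-test are assumptions made for this example, not details taken from the post.

```r
# Hypothetical data: two simulated groups (not from the original post).
set.seed(1)
a <- rnorm(30, mean = 10, sd = 2)
b <- rnorm(30, mean = 11, sd = 2)

# t.test() defaults to Welch's two-sample test; the confidence interval
# it reports is for mean(b) - mean(a) because b is passed first.
tt <- t.test(b, a)

# Point estimate of the difference and its 95% confidence interval.
estimate <- unname(tt$estimate[1] - tt$estimate[2])
ci <- as.numeric(tt$conf.int)

cat(sprintf("difference = %.2f, 95%% CI [%.2f, %.2f]\n",
            estimate, ci[1], ci[2]))
```

The first option, plotting the two groups separately, could start from something like `boxplot(list(a = a, b = b))` on the same data.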

Medium + r-bloggers — How to integrate?

Build a PHP script that allows you to post your Medium articles on r-bloggers.com. The script filters an RSS feed by item tags. Motivation: I started my blog about R on Medium. Medium is a wonderful platform with a great user interface. The idea to …

XmR Chart | Step-by-Step Guide by Hand and with R

Is your process in control? The XmR chart is a great statistical process control (SPC) tool that can help you answer this question, reduce waste, and increase productivity. We’ll cover the concepts behind XmR charting and explain the XmR control constant with some super simple R code. Lastly, we’ll cover how to make the XmR …
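For readers who want a feel for the calculation, here is a minimal base R sketch of XmR limits. The measurements in `x` are made up, and the constants are the standard ones for moving ranges of size two (2.66 is approximately 3 / 1.128, and 3.267 is the usual upper limit factor for the range chart); the post's own code may well differ.

```r
# Hypothetical individual measurements (not from the original post).
x <- c(10, 12, 11, 13, 12, 14, 11, 12)

# Moving ranges: absolute differences between consecutive observations.
mr <- abs(diff(x))

x_bar  <- mean(x)   # centre line of the X (individuals) chart
mr_bar <- mean(mr)  # centre line of the mR (moving range) chart

# The XmR control constant: 3 / d2, where d2 = 1.128 for subgroups of size 2.
xmr_constant <- 2.66

limits <- c(
  lower  = x_bar - xmr_constant * mr_bar,
  centre = x_bar,
  upper  = x_bar + xmr_constant * mr_bar
)

# Upper control limit for the moving range chart (D4 = 3.267 for n = 2).
mr_upper <- 3.267 * mr_bar

# Flag points that fall outside the natural process limits.
out_of_control <- x < limits[["lower"]] | x > limits[["upper"]]

print(round(limits, 3))
print(which(out_of_control))
```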

Generating Synthetic Data Sets with ‘synthpop’ in R

Synthpop – A great music genre and an aptly named R package for synthesising population data. I recently came across this package while looking for an easy way to synthesise unit record data sets for public release. The goal is to generate a data set which contains no real units and is therefore safe for public release …

Making sense of the METS and ALTO XML standards

Last week I wrote a blog post where I analyzed one year of newspaper ads from 19th century newspapers. The data is made available by the National Library of Luxembourg. In this blog post, which is part 1 of a 2 part series, I extract data from the 257 GB archive, which contains 10 years of publications of the L’Union, …

Practical Data Science with R, 2nd Edition discount!

Please help share our news and this discount. The second edition of our best-selling book Practical Data Science with R2, Zumel, Mount is featured as deal of the day at Manning. The second edition isn’t finished yet, but chapters 1 through 4 are available in the Manning Early Access Program (MEAP), and we have finished …

I walk the (train) line – part deux – the weight loss continues

(TL;DR: author continues to use his undiagnosed OCD for good. Breadth-first search introduced on simple graph.) We learnt how to get OpenStreetMap data into R last time. And I said that we will be doing a little bit of this: So what the hell is this? This is an example of breadth-first search of a …
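To make the idea concrete, here is a small base R sketch of breadth-first search on a toy graph. The adjacency list `graph` and the helper `bfs()` are hypothetical names invented for this example; the post itself works with OpenStreetMap data and may use a different representation.

```r
# Hypothetical undirected graph stored as a named adjacency list.
graph <- list(
  A = c("B", "C"),
  B = c("A", "D"),
  C = c("A", "D"),
  D = c("B", "C", "E"),
  E = c("D")
)

# Breadth-first search: visit every node reachable from `start`,
# nearest nodes first, using a first-in first-out queue.
bfs <- function(graph, start) {
  visited <- start
  queue <- start
  while (length(queue) > 0) {
    node <- queue[1]      # take the oldest node in the queue
    queue <- queue[-1]
    for (nb in graph[[node]]) {
      if (!(nb %in% visited)) {
        visited <- c(visited, nb)  # mark when enqueued, so no duplicates
        queue <- c(queue, nb)
      }
    }
  }
  visited
}

print(bfs(graph, "A"))
```

Growing vectors with `c()` is fine for a toy graph like this one; a larger network would likely call for a dedicated graph package.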

10 years of playback history on Last.FM: “Just sit back and listen”

Alright, seems like this is developing into a blog where I am increasingly investigating my own music listening habits. Recently, I’ve come across the analyzelastfm package by Sebastian Wolf. I used it to download my complete listening history from Last.FM for the last ten years. That’s a complete dataset from 2009 to 2018 with exactly 65,356 …

How to combine Multiple ggplot Plots to make Publication-ready Plots

The life cycle of data science can never be completed without communicating the results of the analysis/research. In fact, data visualization is one of the areas where R as a language for data science has an edge over the much-celebrated Python. With ggplot2 …

GetDFPData Ver 1.4

I just released a major update to the package GetDFPData. Here are the main changes: Naming conventions for the caching system are improved so that they reflect different versions of FRE and DFP files. This means the old caching system no longer works. If you have built your own cache folder with many companies, do clean …

Parallelize a For-Loop by Rewriting it as an Lapply Call

A commonly asked question in the R community is: How can I parallelize the following for-loop? The answer almost always involves rewriting the for (…) { … } loop into something that looks like a y <- lapply(…) call. If you can achieve that, you can parallelize it via for instance y <- future.apply::future_lapply(…) or …
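A minimal sketch of that rewrite, using only base R: the toy task (squaring each element of `x`) is an assumption chosen for brevity, not an example from the post.

```r
# Hypothetical toy task: square each element of a vector.
x <- 1:5

# Original for-loop: preallocate a list, then fill it element by element.
y_loop <- vector("list", length(x))
for (i in seq_along(x)) {
  y_loop[[i]] <- x[[i]]^2
}

# Same computation as an lapply() call: the loop body becomes a
# function of a single element, with no shared state between iterations.
y_lapply <- lapply(x, function(xi) xi^2)

print(identical(y_loop, y_lapply))

# With the future.apply package installed and a parallel plan set,
# the lapply() call could be swapped for (not run here):
#   y_par <- future.apply::future_lapply(x, function(xi) xi^2)
```

The key step is making each iteration independent of the others; once the body is a pure function of its input, the parallel variant can be a near drop-in replacement.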

Satellite imagery generation with Generative Adversarial Networks (GANs)

What are GANs? Some time ago, I showed you how to create a simple Convolutional Neural Network (ConvNet) for satellite imagery classification using Keras. ConvNets are not the only cool thing you can do in Keras; they are actually just the tip of the iceberg. Now, I think it’s about time to show you something more! …

pinp 0.0.7: More small YAML options

A good six months after the previous release, another small feature release of our pinp package for snazzier one or two column Markdown-based pdf vignettes got onto CRAN minutes ago as another [CRAN-pretest-publish] release indicating a fully automated process (as can be done for packages free of NOTES, WARNING, ERRORS, and without ‘changes to worse’ …

Visualizing the Asian Cup with R!

Another year, another big soccer/football tournament! This time it’s the top international competition in Asia, the Asian Cup hosted in the U.A.E. In this blog post I’ll be covering (responsible) web-scraping, data wrangling (tidyverse FTW!), and of course, data visualization with ggplot2. Let’s get started! Packages pacman::p_load(tidyverse, scales, lubridate, ggrepel, stringi, magick, glue, extrafont, rvest, ggtextures, cowplot, ggimage, …

epubr 0.6.0 CRAN release

The epubr R package provides functions supporting the reading and parsing of internal e-book content from EPUB files. It has been updated to v0.6.0 on CRAN. This post highlights new functionality. The key improvements focus on cases where EPUB files have poorly arranged text when loaded into R as a result of their metadata entries …

Roll Your Own Federal Government Shutdown-caused SSL Certificate Expiration Monitor in R

By now, even remote villages on uncharted islands in the Pacific know that the U.S. is in the midst of a protracted partial government shutdown. It’s having real impacts on the lives of Federal government workers but they aren’t the only ones. Much of the interaction Federal agencies have with the populace takes place online …

Waffle Geoms & Other Miscellaneous In-Development Package Updates

More than just sergeant has been hacked on recently, so here’s a run-down of various updates: waffle The square pie chart generating waffle package now contains a nascent geom_waffle() so you can do things like this: library(hrbrthemes) library(waffle) library(tidyverse) tibble( parts = factor(rep(month.abb[1:3], 3), levels=month.abb[1:3]), values = c(10, 20, 30, 6, 14, 40, 30, 20, …

Linguistic Signals of Album Quality: A Predictive Analysis of Pitchfork Review Scores Using Quanteda

In this post we will return to the Pitchfork music review data, parts of which I’ve analyzed in previous posts. Our goal here will be to use text mining and natural language processing (NLP) to understand linguistic signals of album quality. This type of analysis helps us understand what Pitchfork reviewers appreciate or dislike, and …

My presentations on ‘Elements of Neural Networks & Deep Learning’ -Part1,2,3

I will be uploading a series of presentations on ‘Elements of Neural Networks and Deep Learning’. In these video presentations I discuss the derivations of L-Layer Deep Learning Networks, starting from the basics. The corresponding implementations in vectorized R, Python and Octave are available in my book ‘Deep Learning from first principles: Second …

Considering sensitivity to unmeasured confounding: part 2

In part 1 of this 2-part series, I introduced the notion of sensitivity to unmeasured confounding in the context of an observational data analysis. I argued that an estimate of an association between an observed exposure \(D\) and outcome \(Y\) is sensitive to unmeasured confounding if we can conceive of a reasonable alternative data generating …

baRcodeR 0.1.2 release – new linear barcodes

baRcodeR 0.1.2 is released on CRAN today! Download and install with install.packages("baRcodeR"). The major feature of this release is the ability to print linear (a.k.a. normal) barcodes by specifying type = "linear" in create_PDF() rather than type = "matrix", which prints the usual QR code. The GitHub repository is at yihanwu/baRcodeR. Minor …

Updated Review: jamovi User Interface to R

Introduction jamovi (spelled with a lower-case “j”) is a free and open source graphical user interface for the R software that targets beginners looking to point-and-click their way through analyses. It is available for Windows, Mac, Linux, and even ChromeOS. Versions are also planned for servers and tablets. This post is one of a series of reviews which …

On the Road to 0.8.0 — Some Additional New Features Coming in the sergeant Package

It was probably not difficult to discern from my previous Drill-themed post that I’m fairly excited about the Apache Drill 1.15.0 release. I’ve rounded out most of the existing corners for it in preparation for a long-overdue CRAN update and have been concentrating on two helper features: configuring & launching Drill embedded Docker containers and …

R NewYorkers Feeling the Holiday Spirit? Here’s Your Tip

Combining Pivot Billions with R to dive into whether the holiday spirit inspires bigger tips and which parts of New York experience this effect the most. The holiday season brings with it a degree of cheer and joy that many claim makes people act friendlier towards each other. I wanted to see how this effect …

Animating Data Transformations: Part II

In our previous series on Animating Data Transformations, we showed you how to use gganimate to construct an animation which illustrates the process of going between tall and wide representations of data. Today, we will show the same procedure for constructing an animation of the unnest() function. The unnest() function takes a tibble containing a …

An Introduction to R— Merging and filtering data— Part 1

Data understanding by filtering and merging the 2019 Australian Tennis Open data for the Men’s tour. You know it’s summer when the Australian Tennis Open visits Melbourne and everyone is excited that Roger and Serena are in town. Problem: I am interested in predicting who might win the 2019 Australian …

Understanding the maths of Computed Tomography (CT) scans

Noseman is having a headache and as an old-school hypochondriac he goes to see his doctor. His doctor is quite worried and makes an appointment with a radiologist for Noseman to get a CT scan. Because Noseman always wants to know how things work he asks the radiologist about the …

A deep dive into glmnet: offset

I’m writing a series of posts on various function options of the glmnet function (from the package of the same name), hoping to give more detail and insight beyond R’s documentation. In this post, we will look at the offset option. For reference, here is the full signature of the glmnet function: glmnet(x, y, family=c("gaussian","binomial","poisson","multinomial","cox","mgaussian"), …

Dow Jones Stock Market Index (4/4): Trade Volume GARCH Model

This is the final part of a four-part series of posts. In this fourth post, I am going to build an ARMA-GARCH model for Dow Jones Industrial Average (DJIA) daily trade volume log ratio. You can read the other three parts in the following links: part 1, …

An even better rOpenSci website with Hugo

A bit more than one year ago, rOpenSci launched its new website design, by the designer Maru Lango. Not only did the website appearance change (for the better!), but the underlying framework too. ropensci.org is powered by Hugo, like blogdown! Over the last few months, we’ve made the best of this framework, hopefully improving your …

How do Convolutional Neural Nets (CNNs) learn? + Keras example

In this lesson, I am going to explain how computers learn to see; that is, how do they learn to recognize images or objects in images? One of the most commonly used approaches to teach computers “vision” is Convolutional Neural Nets. This lesson builds on top of two other lessons: Computer Vision Basics and Neural Nets. …

You did a sentiment analysis with tidytext but you forgot to do dependency parsing to answer WHY is something positive/negative

A small note on the growing list of users of the udpipe R package. In the last month of 2018, we’ve updated the package on CRAN with some noticeable changes: the default models which are downloaded with the function udpipe_download_model are now models built on Universal Dependencies 2.3 (released on 2018-11-15). This means udpipe …

A Beautiful 2 by 2 Matrix Identity

While working on a variation of the RcppDynProg algorithm we derived the following beautiful identity of 2 by 2 real matrices: The superscript “top” denoting the transpose operation, the ||.||^2_2 denoting sum of squares norm, and the single |.| denoting determinant. This is derived from one of the check equations for the Moore–Penrose inverse and …

Dow Jones Stock Market Index (3/4): Log Returns GARCH Model

In this third post, I am going to build an ARMA-GARCH model for Dow Jones Industrial Average (DJIA) daily log-returns. You can read the first and second part which I published previously. Packages: The packages used in this post series are listed here. suppressPackageStartupMessages(library(lubridate)) …