Cluster multiple time series using K-means

I was recently confronted with the problem of finding similarities among time series and thought about using k-means to cluster them. To illustrate the method, I’ll be using data from the Penn World Tables, readily available in R (inside the {pwt9} package): `library(tidyverse)` `library(lubridate)` `library(pwt9)` `library(brotools)` First of all, let’s select only the needed columns: `pwt <- …`

Using Spark from R for performance with arbitrary code – Part 3 – Using R to construct SQL queries and let Spark execute them

[This article was first published on Jozef’s Rblog, and kindly contributed to R-bloggers]. In the previous part of this series, we looked at writing R …

Explaining Predictions: Boosted Trees Post-hoc Analysis (Xgboost)

[This article was first published on R on notast, and kindly contributed to R-bloggers]. We’ve covered various approaches in explaining model predictions globally. Today we …

Back in the GSSR

[This article was first published on R on kieranhealy.org, and kindly contributed to R-bloggers]. The General Social Survey, or GSS, is one of the cornerstones …

rBokeh – Tipps and Tricks with JavaScript and beyond!

You want to have a nice interactive visualization, e.g., in RShiny, where you can zoom in and out, subset the plot or even have hover and click effects? I guess your first shot would be plotly, and rightfully so. However, there is also a great alternative: rBokeh. Lately, I had a project where the functionality …

Building Interactive World Maps in Shiny

Florianne Verkroost is a PhD candidate at Nuffield College at the University of Oxford. With a passion for data science and a background in mathematics and econometrics, she applies her interdisciplinary knowledge to computationally address societal problems of inequality. In this post, I will show you how to create interactive world maps and how to …

cran checks API: an update

If you have an R package on CRAN, you probably know about CRAN checks. Each package on CRAN that is not archived has a checks page, like this one for ropenaq: https://cloud.r-project.org/web/checks/check_results_ropenaq.html The table on that page shows the results of running R CMD check on the package on a combination of different operating systems, R versions …

Wikipedia Page View Statistics Late 2007 and Beyond

[This article was first published on petermeissner, and kindly contributed to R-bloggers]. This blog post covers the major release of the {wikipediatrend} package – namely …

Part II: Deploying a Dash Application to Operationalize Machine Learning Models

[This article was first published on R – Modern Data, and kindly contributed to R-bloggers]. Note: This guide is intended for use with version 3.3 …

`MCMCvis` 0.13.1 – HPD intervals, multimodel visualization, and more!

[This article was first published on R – Lynch Lab, and kindly contributed to R-bloggers]. MCMCvis version 0.13.1 is now on CRAN, complete with new …

Climbing Mt. Whitney with web browser automation and R

Mount Whitney is the tallest mountain in the contiguous United States, and you need a permit to climb it. These permits are limited. But from time to time, somebody will return their permit, and it will show up on the permit website recreation.gov. I wanted to get one of those and will tell you how. A …

Programming for the rest of us: Introducing a new R package to help solve problems

As a self-trained, informal, fly-by-the-seat-of-my-pants programmer, I am hardly the “best” person to give advice about teaching others how to program. I have no formal background in computer science and have never had anyone teach me to program; it’s just been something that I’ve played around in by …

Super Solutions for Shiny Architecture 3/5: Softcoding Constants in the App

TL;DR: Two methods for keeping your Shiny app organized while avoiding hardcoding. Softcoding Constants in a Shiny App: They can be found everywhere. Text on buttons, URLs to be linked, some numeric thresholds, a font to be used on ggplot, technical IDs to business names mapping, column names from datasets… We are all (at least …

Cambridge Analytica: Microtargeting or How to catch voters with the LASSO

The two most disruptive political events of the last few years are undoubtedly the Brexit referendum to leave the European Union and the election of Donald Trump. Both are commonly associated with the political consulting firm Cambridge Analytica and a technique known as Microtargeting. If you want to understand the data science behind the Cambridge …

AI, Machine Learning and Data Science Roundup: September/October 2019

A roundup of news about Artificial Intelligence, Machine Learning and Data Science. This is an eclectic collection of interesting blog posts, software announcements and data applications from Microsoft and elsewhere that I’ve noted recently. Open Source AI, ML & Data Science News: TensorFlow 2.0.0 has been released. This major update makes many changes to improve …

Web Scraping Product Data in R with rvest and purrr

[This article was first published on business-science.io, and kindly contributed to R-bloggers]. This article comes from Joon Im, a student in Business Science University. Joon …

The 5 Packages You Should Know for Text Analysis with R

A Complete Overview of the Most Useful Packages in R Data Scientists Should Know About for Text Analysis. `install.packages("quanteda")` `library(quanteda)` Quanteda is the go-to package for quantitative text analysis. Developed by Kenneth Benoit and other contributors, this package is a must for any data scientist doing text analysis. Why? Because …

Model Explanation with BMuCaret Shiny Application using the IML and DALEX Packages

With the BMuCaret Shiny app post you can perform a comparison of caret models and find the one that achieves the best performance for a given data set. But isn’t it true that “All models are wrong”? By conducting comparison of predictive models (e.g. by using my tool BMuCaret) the goal of this experience is …

Using R: When weird errors occur in packages that used to work, check that you’re not feeding them a tibble

There are some things that are great about the tidyverse family of R packages and the style they encourage. There are also a few gotchas. Here’s a reminder to myself about this phenomenon: tidyverse-style data frames (“tibbles”) do not simplify to vectors upon extracting a single column with hard bracket indexing. Because some packages rely …

vtreat Cross Validation

[This article was first published on R – Win-Vector Blog, and kindly contributed to R-bloggers]. Nina Zumel finished new documentation on how vtreat’s cross validation …

Exploring Highcharts in R

Visualizing trends & patterns using data from ‘How I met your mother’. This article originally started out as an exploratory quest to understand data visually in R. What subsequently followed were scrambles of experimenting, testing and making leaps into the dark. During this period I also re-watched my favorite show — How I met your mother, …

Updated AUPolitics project to work with S3 instead of DB

Some time ago I developed a little project that collects Aussie politicians’ tweets and presents several visualizations. It’s available at https://rserv.levashov.biz/shiny/rstudio/ At that time I had quite generous credits from AWS, so I didn’t worry too much about costs. Unfortunately, the credits are about to expire, so I had to optimize the tech stack a bit to reduce …

Estimating the carbon cost of psycholinguistics conferences

Shravan Vasishth, 10/5/2019. Note: If I have made some calculation error, please point it out and I will fix it. At the University of Potsdam we are discussing how to reduce our carbon footprint in science-related work. One thought I had was that we could reduce our carbon …

Split-apply-combine for Maximum Likelihood Estimation of a linear model

Intro: Maximum likelihood estimation is a very useful technique for fitting a model to data. It is used a lot in econometrics and other sciences but seems, at least to my knowledge, not to be so well known by machine learning practitioners (but I may be wrong about that). Other useful techniques to confront models to data used in econometrics …

A full RStudio Server setup for Data Science in 5 minutes

[This article was first published on Pachá, and kindly contributed to R-bloggers]. At the end of the post there is a promotional link for free …

Expanding binomial counts to binary 0/1 with purrr::pmap()

Data on successes and failures can be summarized and analyzed as counted proportions via the binomial distribution or as long format 0/1 binary data. I most often see summarized data when there are multiple trials done within a study unit; for example, when tallying up the number of dead trees out of the total number …

Colonizing Franky

[This article was first published on R – Fronkonstin, and kindly contributed to R-bloggers]. And once again I set off slowly, feeling that I need nothing …

Forecast Stability Guidance for Model Selection

In real-world forecasting tasks, we don’t have the luxury of actuals in hand for better model selection; in such realistic situations, forecast stability can guide us to some extent. Forecast stability, in simple terms, is all about how forecasts behave versus forecasts; we can measure it with a simple coefficient of variation. This measure also helps …

R Forwards Workshop: Package Development for women and other underrepresented groups

[This article was first published on Emma R, and kindly contributed to R-bloggers]. Date: Tuesday, 7th January, 2020, 10:30-15:30 Location: University of York, UK Forwards …