Should Old Acquaintance be Forgot: Tidying up Mac Mail

As the year winds down, why not spend some of your free time exploring your email data using R and the tidyverse? When I learned that Mac OS Mail stores its internal data in a SQLite database file I was hooked. A quick dive into your email archive might uncover some of your … Read more

Categories R Tags ExcerptFavorite

Some fun with {gganimate}

In this short blog post I show you how you can use the {gganimate} package to create animations from {ggplot2} graphs with data from UNU-WIDER. WIID data Just before Christmas, UNU-WIDER released a new edition of their World Income Inequality Database: *NEW #DATA* We’ve just released a new version … Read more

Le Monde puzzle [#1076]

A cheezy Le Monde mathematical puzzle : (which took me much longer to find [in the sense of locating] than to solve, as Warwick U does not get a daily delivery of the newspaper [and this is pre-Brexit!]): Take a round pizza (or a wheel of Gruyère) cut into seven identical slices and turn one … Read more

Very shiny holidays!

How could I go without programming just a little bit during the holiday season? But I didn’t want to work on something serious, so I decided to check out some groundwork on R-Shiny + JQuery + CSS. The results are some nice holiday greetings inside a shiny app: An app to greet your family with shiny I … Read more

Finally, You Can Plot H2O Decision Trees in R

Creating and plotting decision trees (like the one below) for the models created in H2O will be the main objective of this post: Figure 1. Decision Tree Visualization in R Decision Trees with H2O With release 3.22.0.1 H2O-3 (a.k.a. open source H2O or simply H2O) added to its family of tree-based algorithms (which already included DRF, GBM, … Read more

Statistical Assessments of AUC

In scorecard development, the area under the ROC curve, also known as AUC, has been widely used to measure the performance of a risk scorecard. All else being equal, the scorecard with a higher AUC is considered more predictive than the one with a lower AUC. However, little attention has been paid to the statistical … Read more

Rolling Origins and Fama French

Today, we continue our work on sampling so that we can run models on subsets of our data and then test the accuracy of the models on data not included in those subsets. In the machine learning prediction world, these two data sets are often called training data and testing data, but we’re not going … Read more

Optimism corrected bootstrapping: a problematic method

There are lots of ways to assess how predictive a model is while correcting for overfitting. In Caret the main methods I use are leave one out cross validation, for when we have relatively few samples, and k fold cross validation when we have more. There is also another method called ‘optimism corrected bootstrapping’, that … Read more

Pivot Billions and Deep Learning enhanced trading models achieve 100% net profit

Deep Learning has revolutionized the fields of image classification, personal assistance, competitive board game play, and many more. However, the financial currency markets have been surprisingly stagnant. In our efforts to create a profitable and accurate trading model, we came upon the question: what if financial currency data could be represented as an image? The … Read more

Dreaming of a white Christmas – with ggmap in R

With the holidays approaching, one of the most discussed questions at STATWORX was whether we’ll have a white Christmas or not. And what better way to get our hopes up, than by taking a look at the DWD Climate Data Center’s historic data on the snow depth on the past ten Christmas Eves? But how … Read more

The Need for Speed Part 2: C++ vs. Fortran vs. C

In my previous post, I described the method I use for compiling Fortran (or C) into an R package using the .Call interface. This post will compare the speed of various implementations of the layer loss cost function. Often, insurance or reinsurance is bought in stratified horizontal layers. For example, an auto policy with a … Read more

Objects types and some useful R functions for beginners

This blog post is an excerpt of my ebook Modern R with the tidyverse that you can read for free here. This is taken from Chapter 2, which explains the different R objects you can manipulate as well as some functions to get you started. Objects, types and useful R functions to get started All objects in … Read more

Text classification with tidy data principles

I am an enthusiastic proponent of using tidy data principles for dealing with text data. This kind of approach offers a fluent and flexible option not just for exploratory data analysis, but also for machine learning for text, including both unsupervised machine learning and supervised machine learning. I haven’t written much about supervised machine learning … Read more

R 101

HAPPY HOLIDAYS!!! ⛄❄ In the spirit of the coming new year and new beginnings, we created a tutorial for getting started or restarted with R. If you are new to R or have dabbled in R but haven’t used it much recently, then this post is for you. We will focus on data classes and types, … Read more

Certifiably Gone Phishing

Phishing is [still] the primary way attackers either commit a primary criminal act (i.e. phish a target to, say, install ransomware) or gain an initial foothold in an organization so they can perform other criminal operations to achieve some goal. As such, security teams, vendors and active members of the … Read more

Custom JavaScript, CSS and HTML in Shiny

In this tutorial, I will cover how to include custom JavaScript, CSS and HTML code in your R shiny app. By including them, you can make a very powerful, professional web app using R. First let’s understand the basics of a web page In general, a web page contains the following sections. Content (Header, Paragraph, Footer, … Read more

ShinyProxy Christmas Release

ShinyProxy is a novel, open source platform to deploy Shiny apps for the enterprise or larger organizations. Five releases have taken place since our previous blog post, so it is time to provide the ‘state of affairs’ before venturing into the New Year. Kerberos and Co To some Kerberos is a multi-headed dog that guards the gates of … Read more

The Bear is Here

October and December have been devastating for stocks. It wasn’t until Friday though that we officially reached the depths of a bear market. There are different theories; the most common is a 20% pullback in an index. As readers of this blog are aware, I follow a slightly different definition, based on Jack Schannep’s work. Based … Read more

Simulating Persian Monarchs gameplay by @ellis2013nz

‘I can teach you in a minute…’ In a recent post I simulated some simple dice games and promised (or threatened) that this was the first of a series of posts about games of combined luck and chance. The main aim of that post was to show how even simple probabilistic games can become complicated … Read more

Day 22 – little helper get_files

We at STATWORX work a lot with R and we often use the same little helper functions within our projects. These functions ease our daily work life by reducing repetitive code parts or by creating overviews of our projects. At first, there was no plan to make a package, but soon I realised that it … Read more

Re-creating a Voronoi-Style Map with R

I’ve written some “tutorial”-like content recently (see here, here, and here), but I’ve been lacking ideas for “original” content since then. With that said, I thought I would try to re-create something with R. (Not too long ago I saw that Andrew Heiss did something akin to this with Charles Minard’s well-known visualization of Napoleon’s 1812.) The focus of my re-creation here is … Read more

Bubble Packed Chart with R using packcircles package

Tableau has a chart type called “Packed Bubble Chart”. While I haven’t really used packed bubble charts much, I always thought they were fun and beautiful. I wanted to try creating the same chart using R, and I came across a package called packcircles. Reading the vignettes was really helpful for figuring out how to use the package!!– introduction … Read more

The Riddler: Santa Needs Some Help With Math

For the last Riddler of the year I attempt to solve both the Express and Classic Riddlers! Riddler Express: How Long Will it Take Santa to Place his Reindeer? We need to find out how long it will take Santa to place his reindeer in the correct sled positions, if he proceeds at random. It … Read more

Does imputing model labels using the model predictions can improve it’s performance?

In some scenarios a data scientist may want to train a model for which there exists an abundance of observations, but only a small fraction of them is labeled, making the sample size available to train the model rather small. Although there’s plenty of literature on the subject (e.g. “Active learning”, “Semi-supervised learning” etc) one may … Read more

Gold-Mining Week 16 (2018)

The post Gold-Mining Week 16 (2018) appeared first on Fantasy Football Analytics. … Read more

November 2018: “Top 40” New Packages

Having absorbed an average of 181 new packages each month over the last 28 months, CRAN is still growing at a pretty amazing rate. The following plot shows the number of new packages since I started keeping track in August 2016. This November, 171 new packages stuck to CRAN. Here is my selection for the … Read more

Blogdown – shortcode for radix-like Bibtex

In the spirit of trying out new things in Hugo since my last post on modifying the RSS feed for this website, I attempted to implement the new citation feature from the new radix package by RStudio. Essentially, I tried using a custom hugo shortcode to replicate the text and BibTex citation at the bottom … Read more

R 3.5.2 now available

R 3.5.2, the latest version of the R language for statistical computation and graphics from the R Foundation, was released today. (This release is codenamed “Eggshell Igloo”, likely in reference to this or this Peanuts cartoon.) Compared to R 3.5.1, this update includes only bug fixes, so R scripts and packages compatible with R 3.5.0 … Read more

Your AI journey… and Happy Holidays!

I want to draw your attention to a very valuable (and short!) whitepaper from my colleague, Professor Andrew Ng, where he shares important insights on how to lead companies into the AI era. The five steps outlined in the paper are: Execute pilot projects to gain momentum Build an in-house AI team Provide broad AI … Read more

Day 20 – little helper char_replace

We at STATWORX work a lot with R and we often use the same little helper functions within our projects. These functions ease our daily work life by reducing repetitive code parts or by creating overviews of our projects. At first, there was no plan to make a package, but soon I realised that it … Read more

BH 1.69.0-0 pre-releases and three required changes

Our BH package provides a sizeable portion of the Boost C++ libraries as a set of template headers for use by R. It is quite popular, and frequently used together with Rcpp. The BH CRAN page shows e.g. that it is used by rstan, dplyr as well as a few other packages. The current count … Read more

Examining the Tweeting Patterns of Prominent Crossfit Gyms

A. Introduction The growth of Crossfit has been one of the biggest developments in the fitness industry over the past decade. Promoted as both a physical exercise philosophy and also as a competitive fitness sport, Crossfit is a high-intensity fitness program incorporating elements from several sports and exercise protocols such as high-intensity interval training, Olympic weightlifting, … Read more

Spelling 2.0: Improved Markdown and RStudio Support

We have released updates for the rOpenSci text analysis tools. This technote will highlight some of the major improvements in the spelling package and also the underlying hunspell package, which provides the spelling engine for the spelling package. install.packages("spelling") Update to the latest versions to use these cool new features! Upcoming version of #rstats spelling … Read more

How to Scrape Data from a JavaScript Website with R

In September 2017, I found myself working on a project that required odds data for football. At the time I didn’t know about resources such as Football-Data or the odds-api, so I decided to build a scraper to collect data directly from the bookmakers. However, most of them used JavaScript to display their odds, so … Read more

Data, movies and ggplot2

Yet another boring barplot? No! I’ve asked my students from MiNI WUT to visualize some data about their favorite movies or series. Results are pretty awesome. Believe it or not, charts in these posters are created with ggplot2 (most of them)! Star Wars Fan of StaR WaRs? Find out which color is the most popular for lightsabers! Yes, these … Read more

Spinning Pins

Condenado a estar toda la vida, preparando alguna despedida (Desarraigo, Extremoduro) I live just a few minutes from the Spanish National Museum of Science and Technology (MUNCYT), where I go from time to time with my family. The museum is full of interesting artifacts, from a portrait of Albert Einstein made with thousands … Read more

My R take on Advent of Code – Day 2

This is my second blog post from the series of My R take on Advent of Code. If you’d like to know more about Advent of Code, check out the first post from the series or simply go to their website. Below you’ll find the challenge from Day 2 and the solution that worked for … Read more

All the (NBA) box scores you ever wanted

In this previous post, I showed how one can scrape top-level NBA game data from BasketballReference.com. In the post after that, I demonstrated how to scrape play-by-play data for one game. After writing those posts, I thought to myself: why not do both? And that is what I did: scrape all the box scores for … Read more

So you want to play a pRank in R…?

So…you want to play a pRank with R? This short post will give you a fun function you can use in R to help you out! How to change a file’s modified time with R Let’s say we have a file, test.txt. What if we want to change the last modified date of the file … Read more

Alternative approaches to scaling Shiny with RStudio Shiny Server, ShinyProxy or custom architecture.

Shiny is a great tool for fast prototyping. When a data science team creates a Shiny app, sometimes it becomes very popular. From that point this app becomes a tool used in production by many people, and it should be reliable and fast for many concurrent users. There are many ways to optimize a Shiny app like … Read more

vtreat Variable Importance

vtreat’s purpose is to produce pure numeric R data.frames that are ready for supervised predictive modeling (predicting a value from other values). By ready we mean: a purely numeric data frame with no missing values and a reasonable number of columns (missing values re-encoded with indicators, and high-degree categoricals re-encoded by effects codes or impact codes). … Read more

Statistics in Glaucoma: Part III

Samuel Berchuck is a Postdoctoral Associate in Duke University’s Department of Statistical Science and Forge-Duke’s Center for Actionable Health Data Science. Joshua L. Warren is an Assistant Professor of Biostatistics at Yale University. Looking Forward in Glaucoma Progression Research The contribution of the womblR package and corresponding statistical methodology is a technique for correctly accounting … Read more

rcites – The story behind the package

The Ecology Hackathon Almost one year ago now, ecologists filled a room for the “Ecology Hackathon: Developing R Packages for Accessing, Synthesizing and Analyzing Ecological Data” that was co-organised by rOpenSci Fellow, Nick Golding and Methods in Ecology and Evolution. This hackathon was part of the “Ecology Across Borders” Joint Annual Meeting 2017 of BES, … Read more

An R Shiny app to recognize flower species

Introduction Playing around with PyTorch and R Shiny resulted in a simple Shiny app where the user can upload a flower image and the system then predicts the flower species. Steps that I took Download labeled flower data from the Visual Geometry Group, Install PyTorch and download their transfer learning tutorial script, You need to … Read more

Day 17 – little helper to_na

We at STATWORX work a lot with R and we often use the same little helper functions within our projects. These functions ease our daily work life by reducing repetitive code parts or by creating overviews of our projects. At first, there was no plan to make a package, but soon I realised that it … Read more

Phillips-Ouliaris Test For Cointegration

In a project of developing PPNR balance projection models, I tried to use the Phillips-Ouliaris (PO) test to investigate the cointegration between the historical balance and a set of macro-economic variables and noticed that implementation routines of PO test in various R packages, e.g. urca and tseries, would give different results. After reading through the … Read more

My R take on Advent of Code – Day 1

Ho, ho, ho! It’s almost Christmas time and I don’t know about you, but I can’t wait for it! And what can be a better way of killing the waiting time (advent!) than participating in the excellent Advent of Code. Big thanks to Colin Fay for telling me about it! It’s a series of coding riddles, … Read more
