Permutation Feature Importance (PFI) of GRNN

In the post https://statcompute.wordpress.com/2019/10/13/assess-variable-importance-in-grnn, it was shown how to assess the variable importance of a GRNN by the decrease in GoF statistics, e.g. AUC, after averaging or dropping the variable of interest. The permutation feature importance evaluates the variable importance in a similar manner by permuting values of the variable, which attempts to break the …

Partial Dependence Plot (PDP) of GRNN

The function grnn.margin() (https://github.com/statcompute/yager/blob/master/code/grnn.margin.R) was my first attempt to explore the relationship between each predictor and the response in a General Regression Neural Network, which is usually considered a black-box model. The idea is as follows: first, train a GRNN with the original training dataset; then create an artificial dataset from the training data by keeping …

How confident are you? Assessing the uncertainty in forecasting

Introduction Some people think that the main idea of forecasting is in predicting the future as accurately as possible. I have bad news for them. The main idea of forecasting is in decreasing the uncertainty. Think about it: any event that we want to predict has some systematic components \(\mu_t\), which could potentially be captured …

Vignette: Google Trends with the gtrendsR package

Background Google Trends is a well-known, free tool provided by Google that allows you to analyse the popularity of top search queries on its Google search engine. In market exploration work, we often use Google Trends to get a very quick view of what behaviours, language, and general things are trending in a market. And …

Practical Data Science with R 2nd Edition update

[This article was first published on R – Win-Vector Blog, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here) Want to share your content on R-bloggers? click here if you have a blog, or here if you don’t. We are in the last stages of proofing the galleys/typesetting …

Job: Junior Systems Administrator (with a focus on R/Python)

[This article was first published on r – Jumping Rivers, and kindly contributed to R-bloggers]. Jumping Rivers is a data science consultancy company focused on …

rBokeh – Don’t be stopped by missing arguments!

[This article was first published on r-bloggers | STATWORX, and kindly contributed to R-bloggers]. In my last article on the STATWORX blog, I have guided …

Repetitive Q: Reading Multiple Files in the Zip Folder

Dear Readers, I keep seeing the same question, both sent to me and across various forums, on how to read multiple files in a zip folder, whether they share the same separator or use different separators. Again, let’s not compromise on speed. The solution is to use the easycsv package in R, which in turn uses the data.table package function “fread”. Find below …

Non-Gaussian forecasting using fable

[This article was first published on R on Rob J Hyndman, and kindly contributed to R-bloggers].
library(tidyverse)
library(tsibble)
library(lubridate)
library(feasts)
library(fable)
In my previous post …

Map coloring: the color scale styles available in the tmap package

This vignette builds on the making maps chapter of the Geocomputation with R book. Its goal is to demonstrate all possible map styles available in the tmap package. Prerequisites The examples below assume the following packages are attached:
library(spData) # example datasets
library(tmap) # map creation
library(sf) # spatial data reprojection
The world object containing a …

2 Months in 2 Minutes – rOpenSci News, October 2019

rOpenSci HQ What would you like to hear about in an rOpenSci Community Call? We are soliciting your “votes” and new ideas for Community Call topics and speakers. Find out how you can influence us by checking out our new Community Calls repository. Videos, speaker’s slides, resources and collaborative notes from our Community Call on …

Advancing Text Mining with R and quanteda

Known categories: Dictionaries Dictionaries contain lists of words that correspond to different categories. If we apply a dictionary approach, we count how often words that are associated with different categories are represented in each document. These dictionaries help us to classify (or categorize) the speeches based on the frequency of the words that they contain. …

Using bwimge R package to describe patterns in images of natural structures

[This article was first published on R-posts.com, and kindly contributed to R-bloggers]. This tutorial illustrates how to use the bwimge R package (Biagolini-Jr 2019) to …

New package: GetEdgarData

Example 01 – Apple's Quarterly Net Profit The first step in using GetEdgarData is finding information about available companies:
library(GetEdgarData)
library(tidyverse)
my_year <- 2018
type_form <- '10-K'
df_info <- get_info_companies(years = my_year, type_data = 'yearly', type_form = type_form)
glimpse(df_info)
## Observations: 450
## Variables: 13
## $ current_name "AIR PRODUCTS & CHEMICALS INC /DE/", "ALICO …

Le Monde puzzle [#1114]

[This article was first published on R – Xi’an’s Og, and kindly contributed to R-bloggers]. Another very low-key arithmetic problem as Le Monde current mathematical …

upcoming AI-related courses

I forgot to do some marketing for the following upcoming AI-related courses which will be given in Leuven, Belgium by BNOSAC:
2019-10-17&18: Statistical Machine Learning with R: Subscribe here
2019-11-14&15: Text Mining with R: Subscribe here
2019-12-17&18: Applied Spatial Modelling with R: Subscribe here
2020-02-19&20: Advanced R programming: Subscribe here
2020-03-12&13: Computer Vision with R …

Free R/datascience Extract: Evaluating a Classification Model with a Spam Filter

[This article was first published on R – Win-Vector Blog, and kindly contributed to R-bloggers]. We are excited to share a free extract of Zumel, …

Parsing Sda Pages

[This article was first published on R on kieranhealy.org, and kindly contributed to R-bloggers]. SDA is a suite of software developed at Berkeley for the …

Super Solutions for Shiny Apps #4 of 5: Using R6 Classes

TL;DR Why use object-oriented programming in Shiny applications? It’ll help organize the code in your application! Organize Your Shiny Code with Object-Oriented Programming Classes are used widely in all R programming, usually the S3 ones. Even if you’ve never heard of them, as an R user you’re for sure familiar with object classes …

JAMA retraction after miscoding – new Finalfit function to check recoding

[This article was first published on R – DataSurg, and kindly contributed to R-bloggers]. Riinu and I are sitting in Frankfurt airport discussing the paper …

Merge MLP And CNN in Keras

In the post (https://statcompute.wordpress.com/2017/01/08/an-example-of-merge-layer-in-keras), it was shown how to build a merge-layer DNN by using the Keras Sequential model. In the example below, I tried to build a merge-layer DNN from scratch with the Keras functional API in both R and Python. In particular, the merge-layer DNN is the average of a multilayer perceptron network and a …

Strange Attractors: an R experiment about maths, recursivity and creative coding

[This article was first published on R on Coding Club UC3M, and kindly contributed to R-bloggers]. by Antonio Sánchez Learning to code can be quite …

What are Your Use Cases for rOpenSci Tools and Resources?

We want to know how you use rOpenSci packages and resources so we can give them, their developers, and your examples more visibility. It’s valuable to both users and developers of a package to see how it has been used “in the wild”. This goes a long way to encouraging people to keep up development …

Shiny 1.4.0

[This article was first published on RStudio Blog, and kindly contributed to R-bloggers]. Shiny 1.4.0 has been released! This release mostly focuses on under-the-hood fixes, …

Automatic data types checking in predictive models

The problem: We have data, and we need to create models (xgboost, random forest, regression, etc). Each one of them has its constraints regarding data types. Many strange errors appear when we are creating models just because of data format. The new version of funModeling 1.9.3 (Oct 2019) aimed to provide quick and clean assistance on …

Did Russia Use Manafort’s Polling Data in 2016 Election?

Introduction: On August 2, 2016, then Trump campaign manager Paul Manafort gave polling data to Konstantin Kilimnik, a Russian widely assumed to be a spy. Before then, Manafort ordered his protégé, Rick Gates, to share polling data with Kilimnik. Gates periodically did so starting in April or May. The Mueller Report stated it did not know …

Rename Columns | R

Often the data you’re working with has abstract column names, such as (x1, x2, x3…). Typically, the first step I take when renaming columns with R is opening my web browser. For some reason, no matter how many times I do this, it’s just one of those things. (Hoping that writing about it will change that) …

easyMTS: My First R Package (Story, and Results)

This weekend I decided to create my first R package… it’s here! https://github.com/NicoleRadziwill/easyMTS Although I’ve been using R for 15 years, developing a package has been the one thing slightly out of reach for me. Now that I’ve been through the process once, with a package that’s not completely done (but at least has a …

Hyper-Parameter Optimization of General Regression Neural Networks

A major advantage of General Regression Neural Networks (GRNN) over other types of neural networks is that there is only a single hyper-parameter, namely the sigma. In the previous post (https://statcompute.wordpress.com/2019/07/06/latin-hypercube-sampling-in-hyper-parameter-optimization), I’ve shown how to use the random search strategy to find a close-to-optimal value of the sigma by using various random number generators, including …

My First R Package (Part 2)

In Part 1, I set up RStudio with usethis, and created my first Minimum Viable R Package (MVRP?) which was then pushed to GitHub to create a new repository. I added a README:
> use_readme_rmd()
✔ Writing ‘README.Rmd’
✔ Adding ‘^README\\.Rmd$’ to ‘.Rbuildignore’
● Modify ‘README.Rmd’
✔ Writing ‘.git/hooks/pre-commit’
Things were moving along just fine, …

A Shiny Intro Survey to an Open Science Course

Last week, we started a new course titled “Statistical Programming and Open Science Methods”. It is being offered under the research program of TRR 266 “Accounting for Transparency” and enables students to conduct data-based research so that others can contribute and collaborate. This involves making research data and methods FAIR (findable, accessible, interoperable and reusable) and results reproducible. All the materials …

Cluster multiple time series using K-means

I was recently confronted with the issue of finding similarities among time series and thought about using k-means to cluster them. To illustrate the method, I’ll be using data from the Penn World Tables, readily available in R (inside the {pwt9} package):
library(tidyverse)
library(lubridate)
library(pwt9)
library(brotools)
First of all, let’s only select the needed columns:
pwt <- …

Autumn Barnsley Fern

Intro I was playing around generating fractals in R when I realized the monochromatic green Barnsley Fern I had on my screen didn’t quite look like the leaves I could see outside my window. It was already Fall. In this post I describe a technique to generate a Barnsley Fern with autumn foliage. The Barnsley …