Circular regression trees and forests

[This article was first published on Achim Zeileis, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here) Want to share your content on R-bloggers? click here if you have a blog, or here if you don’t. A flexible framework for probabilistic forecasting of circular data is introduced, using …

Data science trainings in Berlin & Hamburg

R is one of the leading programming languages for data analysis. In April and October 2020 we will bring our popular trainings “Introduction to R” and “Machine Learning with R” to Berlin and Hamburg. Save one of the coveted places and become a data science expert with R! Berlin: Introduction to R, 21.04.–22.04.2020 …

Spatial predictions with GAMs and rasters

[This article was first published on Bluecology blog, and kindly contributed to R-bloggers]. One powerful use of GAMs is for interpolating to unsampled locations. We can …

ChemoSpec2D Update

I’m pleased to announce that ChemoSpec2D, a package for exploratory data analysis of 2D NMR spectra, has been updated on CRAN and is coming to a mirror near you. Barring user reports to the contrary, I feel like the package has pretty much stabilized and is pretty robust. The main area for future expansion is …

Working with Statistics Canada Data in R, Part 4: Canadian Census Data – cancensus Package Setup

[This article was first published on Data Enthusiast’s Blog, and kindly contributed to R-bloggers]. Back to Working with Statistics Canada Data in R, Part 3. …

Dataviz Workshop at RStudio::conf

[This article was first published on R on kieranhealy.org, and kindly contributed to R-bloggers]. Workshop materials are available here: https://rstd.io/conf20-dataviz. Consider buying the book; it’s good: …

What and who is IT community? What does it take to be part?

[This article was first published on R – TomazTsql, and kindly contributed to R-bloggers]. This blog post is long overdue and has been rattling …

How is information gain calculated?

This post will explore the mathematics behind information gain. We’ll start with the base intuition behind information gain, but then explain why it has the calculation that it does. What is information gain? Information gain is a measure frequently used in decision trees to determine which variable to split the input dataset on at each …
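Information gain is usually computed as the entropy of the parent node minus the size-weighted average entropy of the child nodes produced by a split. The minimal Python sketch below illustrates that calculation for categorical labels; the function names `entropy` and `information_gain` are hypothetical and not taken from the post.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    counts = Counter(labels)
    # sum of -p * log2(p) over the observed classes; a pure node gives 0.0
    return sum(-(c / n) * log2(c / n) for c in counts.values())


def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of its children."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


parent = ["yes", "yes", "no", "no"]
print(entropy(parent))                                           # 1.0
print(information_gain(parent, [["yes", "yes"], ["no", "no"]]))  # 1.0 (perfect split)
print(information_gain(parent, [["yes", "no"], ["yes", "no"]]))  # 0.0 (useless split)
```

A split that separates the classes perfectly recovers all of the parent's entropy, while a split that leaves each child as mixed as the parent gains nothing.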

Lasso Regression (home made)

To compute Lasso regression, i.e. to minimize $\frac{1}{2n}\|\mathbf{y}-\mathbf{X}\mathbf{\beta}\|_{\ell_2}^2+\lambda\|\mathbf{\beta}\|_{\ell_1}$, define the soft-thresholding function $S(z,\gamma)=\text{sign}(z)\cdot(|z|-\gamma)_+=\begin{cases}z-\gamma&\text{if }\gamma<|z|\text{ and }z>0\\z+\gamma&\text{if }\gamma<|z|\text{ and }z<0\\0&\text{if }\gamma\geq|z|\end{cases}$ The R function would be … To solve our optimization problem, set $\mathbf{r}_j=\mathbf{y}-\left(\beta_0\mathbf{1}+\sum_{k\neq j}\beta_k\mathbf{x}_k\right)=\mathbf{y}-\widehat{\mathbf{y}}^{(j)}$ so that the optimization problem in $\beta_j$ can be written, equivalently, $\min_{\beta_j}\left\lbrace\frac{1}{2n}\sum_{i=1}^n [r_{i,j}-\beta_j x_{i,j}]^2+\lambda|\beta_j|\right\rbrace$, hence (dropping the term that does not depend on $\beta_j$) $\min_{\beta_j}\left\lbrace\frac{1}{2n}\left(\beta_j^2\|\mathbf{x}_j\|^2-2\beta_j\mathbf{r}_j^T\mathbf{x}_j\right)+\lambda|\beta_j|\right\rbrace$, and one gets $\beta_{j,\lambda}=\frac{1}{\|\mathbf{x}_j\|^2}S(\mathbf{r}_j^T\mathbf{x}_j,n\lambda)$ or, if we develop, \beta_{j,\lambda} = \frac{1}{\sum_i …
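The coordinate-wise update above can be sketched in plain Python. This is not the post's R function (which is not preserved here) but a minimal illustration, assuming the $\frac{1}{2n}$ scaling used in the coordinate-wise derivation and an unpenalized intercept $\beta_0$ refitted as a mean after each sweep; the names `soft_threshold` and `lasso_cd` are hypothetical.

```python
def soft_threshold(z, gamma):
    """S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for (1/2n)||y - b0 - X b||^2 + lam * ||b||_1.

    X is a list of n rows with p columns; the intercept b0 is not penalized.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    b0 = sum(y) / n
    for _ in range(n_iter):
        for j in range(p):
            # partial residual r_j = y - (b0 + sum_{k != j} beta_k x_k)
            r = [y[i] - b0 - sum(beta[k] * X[i][k] for k in range(p) if k != j)
                 for i in range(n)]
            xj = [X[i][j] for i in range(n)]
            rx = sum(ri * xi for ri, xi in zip(r, xj))  # r_j^T x_j
            xx = sum(xi * xi for xi in xj)              # ||x_j||^2
            beta[j] = soft_threshold(rx, n * lam) / xx
        # intercept: mean of y minus the fitted linear part
        b0 = sum(y[i] - sum(beta[k] * X[i][k] for k in range(p)) for i in range(n)) / n
    return b0, beta


X = [[1.0], [2.0], [3.0], [4.0]]
y = [2.0, 4.0, 6.0, 8.0]
print(lasso_cd(X, y, lam=0.0))    # close to the least-squares fit b0 = 0, beta = [2]
print(lasso_cd(X, y, lam=100.0))  # (5.0, [0.0]): a large penalty zeroes the slope
```

With `lam=0` the update reduces to ordinary least squares by coordinate descent; once $n\lambda \geq |\mathbf{r}_j^T\mathbf{x}_j|$ the soft-thresholding sets $\beta_j$ exactly to zero, which is the source of the Lasso's sparsity.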

Hyperparameter tuning and #TidyTuesday food consumption

[This article was first published on Rstats on Julia Silge, and kindly contributed to R-bloggers]. Last week I published a screencast demonstrating how to use …

Clustered randomized trials and the design effect

[This article was first published on ouR data generation, and kindly contributed to R-bloggers]. I am always saying that simulation can help illuminate interesting statistical …

Part 6: How not to validate your model with optimism corrected bootstrapping

When evaluating a machine learning model, using the same data both to train and to test it hides overfitting: the model appears to perform much better in predictive ability than it would on completely new data, because the model uses random noise within the data to learn …
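The effect described above is easy to reproduce. The Python sketch below is an illustration rather than the post's own experiment: it uses a hypothetical 1-nearest-neighbour classifier on pure noise (features and labels are independent), so no model can truly beat 50% accuracy. Scored on its own training data it looks perfect; scored on fresh data it is no better than a coin flip.

```python
import random


def one_nn_predict(train_X, train_y, x):
    """Label of the nearest training point (squared Euclidean distance)."""
    best = min(range(len(train_X)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
    return train_y[best]


def accuracy(train_X, train_y, X, y):
    """Share of points in (X, y) that the 1-NN model fitted on the training data gets right."""
    hits = sum(one_nn_predict(train_X, train_y, xi) == yi for xi, yi in zip(X, y))
    return hits / len(y)


rng = random.Random(42)


def make_noise_data(n, p=5):
    # features and labels are generated independently: there is no signal to learn
    X = [[rng.random() for _ in range(p)] for _ in range(n)]
    y = [rng.randint(0, 1) for _ in range(n)]
    return X, y


train_X, train_y = make_noise_data(200)
test_X, test_y = make_noise_data(1000)

apparent = accuracy(train_X, train_y, train_X, train_y)  # tested on the training data
holdout = accuracy(train_X, train_y, test_X, test_y)     # tested on new data
print(apparent, holdout)  # apparent is 1.0; holdout is close to 0.5
```

Each training point is its own nearest neighbour, so the apparent accuracy is exactly 1.0 even though the model has learned nothing but noise.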

Creating MS Word reports using the officer package

Commonly, the final product that a data scientist or a statistician generates is a report, usually in MS Word format. The officer package enables generating such a report from within R. It also enables generating PowerPoint presentations, but this is beyond the scope of this post. While the package has many great features, using the …

new release of offensive programming packages

[This article was first published on NEONIRA, and kindly contributed to R-bloggers]. The offensive programming ecosystem has been upgraded. You may now use new versions of …

BuyingAHouse

[This article was first published on Blog – healthcare.ai, and kindly contributed to R-bloggers]. The post BuyingAHouse appeared first on healthcare.ai.

The Premier Machine Learning Conference (15% discount code)

[This article was first published on R-posts.com, and kindly contributed to R-bloggers]. Attend any or all of the five jointly scheduled events! Get an additional …

Git/Github for contributing to package development

Disclaimer: I have no affiliation with Microsoft’s GitHub, GitLab, Codecademy or D2L team. Last week, I presented feedback forms for the tools I’m actively maintaining. Forms are the best way to interact with or contribute to these projects if you’re not familiar with Git/GitHub yet. A link to these forms can be found in each …

Dynamic discrete choice models, reinforcement learning and Harold, part 2

In this blog post, I present a paper that has really interested me for a long time. This is part 2, where I will briefly present the model of the paper and try to play around with the data. If you haven’t, I suggest you read part 1, where I provide more context. Rust’s model. Welcome to part 2 …

The complete guide to clustering analysis

k-means and hierarchical clustering by hand and in R. Clustering analysis is a form of exploratory data analysis in which observations are divided into different groups that share common characteristics. The purpose of cluster analysis (also known as unsupervised classification) is to construct groups (or classes or clusters) while ensuring the …
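As a companion to the k-means part of the post, here is a minimal Python sketch of Lloyd's algorithm, the standard alternating scheme: assign each point to its nearest center, move each center to the mean of its points, repeat until the centers stop moving. The function name `kmeans` and the toy points are hypothetical; the initial centers are passed explicitly, as in an exercise done by hand.

```python
def kmeans(points, centers, max_iter=100):
    """Lloyd's k-means with explicitly given initial centers.

    points and centers are lists of coordinate tuples.
    Returns (labels, centers) after convergence or max_iter iterations.
    """
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        # assignment step: nearest center by squared Euclidean distance
        labels = [min(range(len(centers)),
                      key=lambda k: sum((a - b) ** 2 for a, b in zip(pt, centers[k])))
                  for pt in points]
        # update step: each center moves to the mean of its assigned points
        new_centers = []
        for k in range(len(centers)):
            members = [pt for pt, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[k])  # an empty cluster keeps its center
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
labels, centers = kmeans(points, centers=[(1, 1), (8, 8)])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Because the result depends on the starting centers, R's `kmeans()` also offers several random starts (`nstart`); this sketch deliberately keeps a single, user-chosen start.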

Visualizing the Tallest Building in Each State

[This article was first published on Deeply Trivial, and kindly contributed to R-bloggers]. Via Digg: This data visualization, put together by takeasecond on Reddit, shows …

Analysis of the Kenya 2019 Census using R Shiny.

This project provides actionable insights on the Kenya 2019 housing and population census. [Image: Census enumerators in one of the Kenyan constituencies. Getty Images.] INTRODUCTION. The first post-independence census was undertaken in 1969, when 10.9 million Kenyans were enumerated. Since then, the country has conducted decennial Population and Housing censuses at midnight on 24th/25th August. …

8 reasons why you should submit an abstract for EARL

Abstract submissions for the Enterprise Applications of the R Language Conference are now open! We are back at the Tower Hotel London, from 8-10 September 2020. If you’re considering submitting an abstract and need a little more persuading, let us give you eight reasons why you should apply to present: Networking – EARL attracts over …

taxadb: A High-Performance Local Taxonomic Database Interface

Dealing with taxonomic inconsistencies within and across datasets is a fundamental challenge of ecology and evolutionary biology. Accounting for species synonyms, taxa splitting and unification is especially important as aggregation of data across time and different data sources becomes increasingly common. One potentially powerful approach for addressing these issues is to resolve scientific names to …

Photo Mosaics in R

[This article was first published on R Views, and kindly contributed to R-bloggers]. Harrison Schramm, CAP, PStat, is a Senior Fellow at the Center for …

The complete guide to clustering analysis: k-means and hierarchical clustering by hand and in R

Perform by hand the k-means algorithm for the points shown in the graph below, with k = 2 and with the points i = 5 and i = 6 as initial centers. Compute the quality of the partition you just found and then check your answers in R. Assume that the variables have the same …
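A common way to measure the quality of a k-means partition is the ratio of the between-cluster sum of squares to the total sum of squares (BSS/TSS), the same ratio R's `kmeans()` printout reports as `between_SS / total_SS`. The Python sketch below computes it for any labelled set of points; the function name and the toy one-dimensional data are hypothetical, not the exercise's points. A value close to 1 means the clusters explain most of the spread.

```python
def partition_quality(points, labels):
    """BSS / TSS: between-cluster sum of squares over total sum of squares."""
    n, dims = len(points), len(points[0])
    grand_mean = [sum(pt[d] for pt in points) / n for d in range(dims)]
    # total sum of squares around the overall mean
    tss = sum(sum((pt[d] - grand_mean[d]) ** 2 for d in range(dims)) for pt in points)
    # between sum of squares: cluster size times squared distance of centroid to grand mean
    bss = 0.0
    for k in set(labels):
        members = [pt for pt, lab in zip(points, labels) if lab == k]
        centroid = [sum(pt[d] for pt in members) / len(members) for d in range(dims)]
        bss += len(members) * sum((centroid[d] - grand_mean[d]) ** 2 for d in range(dims))
    return bss / tss


q = partition_quality([(0.0,), (2.0,), (10.0,), (12.0,)], [0, 0, 1, 1])
print(round(q, 4))  # 0.9615, i.e. 100 / 104
```

Since TSS = BSS + WSS, maximizing this ratio is equivalent to minimizing the within-cluster sum of squares that k-means targets.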

Foreign Demand for France, Germany, Italy and Spain

Our purpose is to build the quarterly foreign demand for France, Germany, Italy and Spain. To construct these series we use data from DBnomics, through the rdbnomics package. All the code is written in R, thanks to the R Core Team (2016) and RStudio Team (2016). For each country, we proceed in three steps: we calculate the growth of imports in …

Video Tutorial: Create and Customize a Simple Shiny Dashboard

By Dominik Krzemiński and Krzysztof Sprycha. Data can be very powerful, but it’s useless if you can’t interpret it or navigate through it. For this reason, it’s crucial to have an interactive and understandable visual representation of your data. To achieve this, we frequently use dashboards. Dashboards are the perfect tool for creating …

Monitoring for Changes in Distribution with Resampling Tests

[This article was first published on R – Win-Vector Blog, and kindly contributed to R-bloggers]. A client recently came to us with a question: what’s …