A Newbie’s Guide to Making A Pull Request (for an R package)

I had the wonderful opportunity to participate in the {tidyverse} Developer Day the day after rstudio::conf 2019 officially wrapped up. One of the objectives of the event was to encourage open-source contributor newbies (like me) to gain some experience, namely through submitting pull requests to address issues with {tidyverse} packages. Having only ever worked with my own packages/repos before, I found this was …

GeoPAT2: Entropy calculations for local landscapes

GeoPAT 2 is open-source software written in C and dedicated to pattern-based spatial and temporal analysis. Four main types of analysis are available in GeoPAT 2: (i) search, (ii) change detection, (iii) segmentation, and (iv) clustering. However, additional applications are also possible, including extracting information about spatial patterns. Global landscape diversity (based on Shannon entropy of …
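To show what an entropy-based diversity measure computes for a local landscape, here is a minimal base-R sketch. It is not GeoPAT 2's C implementation; the helper `shannon_entropy()` and the 4 × 4 patch of land-cover codes are hypothetical, chosen only to illustrate the formula H = -Σ p log2(p) over category frequencies.

```r
# Shannon entropy (in bits) of the categories found in a local window.
# Hypothetical helper for illustration only; GeoPAT 2 computes this in C.
shannon_entropy <- function(x) {
  p <- table(x) / length(x)   # relative frequency of each observed category
  -sum(p * log2(p))           # only observed categories appear, so p > 0
}

# Hypothetical 4 x 4 patch of land-cover codes
patch <- matrix(c(1, 1, 2, 2,
                  1, 1, 2, 2,
                  3, 3, 4, 4,
                  3, 3, 4, 4), nrow = 4, byrow = TRUE)

shannon_entropy(patch)        # 2: four equally common categories
shannon_entropy(rep(1, 16))   # 0: a homogeneous patch
```

Higher values indicate a more diverse (less homogeneous) local landscape.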

Create R Markdown reports and presentations even better with these 3 practical tips

Including R Markdown in the workflow for presenting and publishing analyses that use code in R or other languages is a great way to make presentations, dashboards or reports good-looking, reproducible and version-controllable. In this post, we will look at three simple ways to improve that workflow even further with methods that are …

simmer 4.2.1

The 4.2.1 release of simmer, the Discrete-Event Simulator for R, is on CRAN with quite interesting new features and fixes. As discussed on the mailing list, there is a way to handle the specific case in which an arrival is rejected because a queue is full:
library(simmer)
reject <- trajectory() %>% log_("kicked off...")
patient <- …

Extracting colours from your images with Image Quantization

magick really does the “Magic!” I have been playing around a bit with the “magick” package, and I think I am now hooked, although I haven’t been able to understand everything written in the vignette just yet. One function I got really excited about is image_quantize(). This function reduces the number of unique colours used in the …

Window Aggregate operator in batch mode in SQL Server 2019

This came as a surprise while I was calculating simple statistics on my dataset, in particular min, max and median. The first two are trivial; the last one is what caught my attention. While looking for the fastest way to calculate the median for a given dataset, I stumbled upon an interesting …

Rcrastinate is moving.

Hi all, this is just an announcement. I am moving Rcrastinate to a blogdown-based solution and am therefore leaving blogger.com. If you’re interested in the new setup and how you could do the same yourself, please check out the shiny new Rcrastinate over at http://rcrastinate.rbind.io/. In my first post over there, I am …

Factor Analysis in R with Psych Package: Measuring Consumer Involvement

The post Factor Analysis in R with Psych Package: Measuring Consumer Involvement appeared first on The Lucid Manager. The first step for anyone who wants to promote or sell something is to understand the psychology of potential customers. Getting into the minds of consumers is often problematic because measuring psychological traits is a complex task. …
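The post uses the psych package. As a dependency-free sketch of the same idea, base R's `factanal()` (maximum-likelihood factor analysis, a different function from psych's `fa()`) can be run on the `ability.cov` covariance matrix that ships with R. The dataset and the choice of two factors are assumptions for illustration, not the post's consumer-involvement model.

```r
# Maximum-likelihood factor analysis with base R's stats::factanal().
# ability.cov is a built-in list holding a covariance matrix ($cov) and
# the number of observations ($n.obs) for six ability tests.
fa2 <- factanal(factors = 2, covmat = ability.cov, rotation = "varimax")

# Loadings show how strongly each test relates to each latent factor;
# loadings below the cutoff are blanked out by the print method.
print(fa2$loadings, cutoff = 0.3)

# Uniquenesses: the share of each test's variance not explained by the factors
round(fa2$uniquenesses, 2)
```

With survey data, the same call would take the item responses (or their covariance matrix) in place of `ability.cov`.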

Are you parallelizing your raster operations? You should!

If you plan to do anything with the raster package, you should definitely consider parallelizing all your processes, especially if you are working with very large image files. I couldn’t find any blog post describing how to parallelize with the raster package (it is well documented in the package documentation, though), so here are my notes. Load …
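The post's notes rely on the raster package's own cluster helpers. As a package-free illustration of the underlying pattern (split the grid into blocks, process the blocks on several cores, reassemble them), here is a sketch using base R's parallel package on a plain matrix standing in for a raster. The simulated data, the `rescale()` function and the choice of two workers are hypothetical.

```r
library(parallel)

# Stand-in "raster": a plain numeric matrix of simulated values (hypothetical data)
set.seed(1)
img <- matrix(runif(2000 * 500), nrow = 2000)

# The per-cell operation to apply; any function of a block of cells would do
rescale <- function(block) (block - 0.5) * 2

# Split the rows into 4 contiguous blocks, similar to a raster's row chunks
row_groups <- split(seq_len(nrow(img)), cut(seq_len(nrow(img)), 4, labels = FALSE))
blocks <- lapply(row_groups, function(rows) img[rows, , drop = FALSE])

cl <- makeCluster(2)                        # two worker R processes
processed <- parLapply(cl, blocks, rescale) # each block is processed on a worker
stopCluster(cl)

# Reassemble the blocks in their original row order
out <- do.call(rbind, unname(processed))
```

Whether this pays off depends on how expensive the per-block work is compared with shipping blocks to the workers; for a trivial operation like this one, the overhead can dominate.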

RcppArmadillo 0.9.200.7.0

A new RcppArmadillo bugfix release arrived at CRAN today. Version 0.9.200.7.0 is another minor bugfix release, based on the new Armadillo bugfix release 9.200.7 from earlier this week. I also just uploaded the Debian version, and Uwe’s systems have already created the CRAN Windows binary. Armadillo is a powerful and expressive C++ template …

forecast 8.5

The latest minor release of the forecast package has now been approved on CRAN and should be available in the next day or so. Version 8.5 contains the following new features:
- Updated tsCV() to handle exogenous regressors.
- Reimplemented naive(), snaive() and rwf() for substantial speed improvements.
- Added support for passing arguments to auto.arima() unit root tests.
…
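To make the idea behind tsCV() concrete, here is a base-R sketch of rolling-origin evaluation: for each time t, a forecast is made using only the observations up to t, and the error against the value h steps ahead is recorded. The helper `ts_cv_errors()` and the naive forecaster are hypothetical and only mirror the concept; this is not the forecast package's implementation.

```r
# Rolling-origin (time-series cross-validation) errors: e[t] is the h-step
# forecast error of a forecast made using only y[1:t].
# Hypothetical helper sketching the concept, not forecast::tsCV() itself.
ts_cv_errors <- function(y, forecastfunction, h = 1) {
  n <- length(y)
  e <- rep(NA_real_, n)
  for (t in seq_len(n - h)) {
    fc <- forecastfunction(y[1:t], h)   # forecast from data up to time t
    e[t] <- y[t + h] - fc[h]            # compare with the value h steps ahead
  }
  e
}

# A naive forecast: repeat the last observed value h times
naive_forecast <- function(x, h) rep(tail(x, 1), h)

y <- c(1, 3, 6, 10)
ts_cv_errors(y, naive_forecast)   # 2 3 4 NA
```

Summaries such as the root mean squared error (ignoring the trailing NA values) then give an out-of-sample accuracy estimate.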

Make Teaching R Quasi-Quotation Easier

To make teaching R quasi-quotation easier, it would be nice if R string-interpolation and quasi-quotation both used the same notation. They are related concepts, so some commonality of notation would actually be clarifying and would help teach the concepts. We will define both of the above terms and demonstrate the relation between the two concepts. String-interpolation …
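To make the two terms concrete with tools that already exist in base R (not the notation the post goes on to propose), the sketch below splices the same values into an expression with `bquote()` and into a string with `sprintf()`. The mismatch between `.()` and `%s`/`%g` is the kind of notational difference the post is about. The use of `mtcars`, the `mpg` column and the threshold of 25 are arbitrary choices for illustration.

```r
# Quasi-quotation with base R's bquote(): .() splices the *value*
# of a variable into an otherwise quoted expression.
col <- as.name("mpg")
threshold <- 25
expr <- bquote(subset(mtcars, .(col) > .(threshold)))
expr                     # subset(mtcars, mpg > 25)
nrow(eval(expr))         # number of cars with mpg > 25

# String interpolation: the same substitution idea, applied to text.
msg <- sprintf("cars with %s > %g", as.character(col), threshold)
msg                      # "cars with mpg > 25"
```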

Winter courses opening! R for Data Science & Statistics for Data Science

We’re back from the Christmas holidays with a lot of news! We’ll start the New Year with a new edition of our most requested live courses: R for Data Science and Statistics for Data Science. There are still some seats available, so be quick! Down below, you can find a quick description of …

Automated Dashboard for Classification Neural Network in R

In this article, you learn how to make an automated dashboard for a classification neural network in R. First, you need to install the `rmarkdown` package into your R library. Assuming that you have installed `rmarkdown`, you next create a new R Markdown script in R. After this …

My course on Hyperparameter Tuning in R is now on Data Camp!

I am very happy to announce that (after many months) my interactive course on Hyperparameter Tuning in R has now been officially launched on Data Camp! Course description: For many machine learning problems, simply running a model out-of-the-box and getting a prediction is not enough; you want the best model with the most accurate prediction. …

ROC Curves

I have been thinking about writing a short post on R resources for working with Receiver Operating Characteristic (ROC) curves, but first I thought it would be nice to review the basics. In contrast to the usual (usual for data scientists, anyway) machine learning point of view, I’ll frame the topic closer to its historical origins as a …
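As a refresher on the basics, here is a base-R sketch (not one of the R packages the post goes on to review) that builds an empirical ROC curve by sweeping the decision threshold from the highest score to the lowest, then computes the area under it with the trapezoidal rule. The scores and labels are made up, and the sketch assumes there are no tied scores.

```r
# Empirical ROC curve: sweep the threshold from high to low and record the
# false positive rate (FPR) and true positive rate (TPR) at each step.
# Assumes no tied scores; labels are TRUE for positives.
roc_points <- function(scores, labels) {
  labels <- labels[order(scores, decreasing = TRUE)]
  data.frame(
    fpr = c(0, cumsum(!labels) / sum(!labels)),
    tpr = c(0, cumsum(labels) / sum(labels))
  )
}

# Area under the curve by the trapezoidal rule
auc <- function(roc) {
  sum(diff(roc$fpr) * (head(roc$tpr, -1) + tail(roc$tpr, -1)) / 2)
}

# Hypothetical classifier scores and true classes
scores <- c(0.9, 0.8, 0.7, 0.6, 0.55, 0.4)
labels <- c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE)

roc <- roc_points(scores, labels)
auc(roc)   # 8/9: the share of (positive, negative) pairs ranked correctly
```

The equality between the area and the fraction of correctly ordered positive/negative pairs is the same quantity as the Mann-Whitney statistic, which is one reason the AUC is read as a ranking measure.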

RStudio Connect 1.7.0

RStudio Connect is the publishing platform for everything you create in R. In conversations with our customers, R users were excited to have a central place to share all their data products, but were facing a tough problem. Their colleagues working in Python didn’t have the same option, leaving their work stranded on their desktops. Today, we are excited …

AI, Machine Learning and Data Science Roundup: January 2019

A monthly roundup of news about Artificial Intelligence, Machine Learning and Data Science. This is an eclectic collection of interesting blog posts, software announcements and data applications from Microsoft and elsewhere that I’ve noted over the past month or so. Open Source AI, ML & Data Science News: Preview of TensorFlow 2.0 (the public preview …

Using DataCamp reduces anxiety about learning R!

I used DataCamp’s excellent Introduction to R as Essential Prior Independent Study and found it made people a bit less worried about a term of R! I have a lot of fun teaching first-year biology undergraduates, but there are a few challenges in teaching data skills when they are not (perceived as) a student’s core discipline …

Automated Dashboard for Credit Modelling with Decision trees and Random forests in R

In this article, you learn how to make an automated dashboard for credit modelling with decision trees and random forests in R. First, you need to install the `rmarkdown` package into your R library. Assuming that you have installed `rmarkdown`, you next create a new R Markdown script …

Lecture slides: Real-World Data Science (Fraud Detection, Customer Churn & Predictive Maintenance)

These are slides from a lecture I gave at the School of Applied Sciences in Münster. In this lecture, I talked about Real-World Data Science and showed examples on Fraud Detection, Customer Churn & Predictive Maintenance. The slides were created with xaringan.

Use foreach with HPC schedulers thanks to the future package

The future package is a powerful and elegant cross-platform framework for orchestrating asynchronous computations in R. It’s ideal for working with computations that take a long time to complete; that would benefit from using distributed, parallel frameworks to make them complete faster; and that you’d rather not have locking up your interactive R session. You can …

Feature Selection using Genetic Algorithms in R

This is a post about feature selection using genetic algorithms in R, in which we will do a quick review of:
- What are genetic algorithms?
- GA in ML?
- What does a solution look like?
- GA process and its operators
- The fitness function
- Genetic Algorithms in R!
- Try it yourself
- Relating concepts
Animation source: “Flexible Muscle-Based …
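For readers who want the moving parts of the outline in one place (fitness function, selection, crossover, mutation), here is a compact, from-scratch sketch in base R. It is not the post's code, which may rely on a dedicated GA package. The use of `mtcars`, negative AIC as the fitness function, tournament selection, and all the tuning constants are assumptions chosen for illustration.

```r
set.seed(42)
X <- mtcars[, -1]            # candidate predictors
y <- mtcars$mpg              # response
p <- ncol(X)

# Fitness of a chromosome (logical vector: which predictors are used):
# negative AIC of the corresponding linear model, so higher is better.
fitness <- function(chrom) {
  if (!any(chrom)) return(-Inf)   # an empty model is not allowed
  dat <- cbind(y = y, X[, chrom, drop = FALSE])
  -AIC(lm(y ~ ., data = dat))
}

pop_size <- 20; generations <- 30; p_mut <- 0.05
pop <- matrix(runif(pop_size * p) < 0.5, nrow = pop_size)   # random initial population

for (g in seq_len(generations)) {
  fit <- apply(pop, 1, fitness)
  # Tournament selection: the fitter of two random individuals wins
  pick <- function() { i <- sample(pop_size, 2); i[which.max(fit[i])] }
  children <- t(replicate(pop_size, {
    a <- pop[pick(), ]; b <- pop[pick(), ]
    cut_at <- sample(p - 1, 1)                  # single-point crossover
    child <- c(a[1:cut_at], b[(cut_at + 1):p])
    flip <- runif(p) < p_mut                    # bit-flip mutation
    child[flip] <- !child[flip]
    child
  }))
  children[1, ] <- pop[which.max(fit), ]        # elitism: keep the best individual
  pop <- children
}

best <- pop[which.max(apply(pop, 1, fitness)), ]
names(X)[best]                                  # selected predictors
```

Because the search is stochastic, the selected subset can change with the seed, population size and number of generations.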

Using clusterlab to benchmark clustering algorithms

Clusterlab is a CRAN package (https://cran.r-project.org/web/packages/clusterlab/index.html) for the routine testing of clustering algorithms. It can simulate positive controls (data sets with more than one cluster) and negative controls (data sets with a single cluster). Why test clustering algorithms? Because they often fail to identify the true K in practice, published algorithms are not always well tested, and we need to know …
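The idea of a positive control can be illustrated without clusterlab itself. The sketch below uses base R only (it does not use clusterlab's API). It simulates three well-separated Gaussian clusters with known membership and checks whether k-means recovers them; the centres, cluster size and seed are hypothetical.

```r
set.seed(123)
k_true <- 3
n_per <- 50
centers <- matrix(c(0, 0,
                    10, 0,
                    0, 10), ncol = 2, byrow = TRUE)   # well-separated cluster means

# Positive control: three Gaussian blobs with known membership
truth <- rep(seq_len(k_true), each = n_per)
x <- centers[truth, ] + matrix(rnorm(k_true * n_per * 2), ncol = 2)

# Run the algorithm under test (here k-means) with the known K
km <- kmeans(x, centers = k_true, nstart = 10)

# Cross-tabulate: perfect recovery puts each true cluster in exactly one column
tab <- table(truth, km$cluster)
tab
```

A negative control would replace the three centres with a single one and check whether a method for choosing K still reports one cluster.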

Mango Solutions contributes to technology partners RStudio conference

As a leading advanced analytics partner for RStudio, Mango Solutions are delighted to be contributing to the upcoming rstudio::conf programme with a workshop and a talk. Two of Mango’s senior consultants, Aimée Gott, Education Practice Lead, and Mark Sellors, Head of Data Engineering, will be sharing their R expertise with delegates. Aimée Gott will be delivering the Intermediate …

Neural Text Modelling with R package ruimtehol

Last week the R package ruimtehol was released on CRAN (https://github.com/bnosac/ruimtehol), allowing R users to easily build and apply neural embedding models on text data. It wraps the ‘StarSpace’ library (https://github.com/facebookresearch/StarSpace), allowing users to calculate word, sentence, article, document, webpage, link and entity ‘embeddings’. By using the ‘embeddings’, you can perform text-based multi-label classification, …

Understanding the Magic of Neural Networks

Everything “neural” is (again) the latest craze in machine learning and artificial intelligence. Now what is the magic here? Let us dive directly into a (seemingly a little silly) example: in the fairy tale Little Red Riding Hood, we have three protagonists: the wolf, the grandmother and the woodcutter. They all have certain qualities and little …

My presentations on ‘Elements of Neural Networks & Deep Learning’ -Parts 4,5

This is the next set of presentations on “Elements of Neural Networks and Deep Learning”. In the 4th presentation, I discuss and derive the generalized equations for a multi-unit, multi-layer Deep Learning network. The 5th presentation derives the equations for a Deep Learning network performing multi-class classification, along with the derivations for cross-entropy loss. The corresponding …
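A standard result in this setting is that the gradient of softmax cross-entropy loss with respect to the logits is simply the predicted probabilities minus the one-hot target. That result can be checked numerically. The base-R sketch below uses made-up logits and is not taken from the presentations.

```r
# Softmax turns a vector of scores (logits) into class probabilities;
# subtracting max(z) avoids overflow without changing the result.
softmax <- function(z) {
  e <- exp(z - max(z))
  e / sum(e)
}

# Multi-class cross-entropy loss for one example with one-hot target y
cross_entropy <- function(p, y) -sum(y * log(p))

z <- c(2, 1, 0.1)          # hypothetical logits for three classes
y <- c(0, 1, 0)            # the true class is the second one

p <- softmax(z)
loss <- cross_entropy(p, y)

# Gradient of the loss with respect to the logits: softmax output minus target
analytic <- p - y

# Central finite-difference check of that gradient
h <- 1e-6
numeric_grad <- sapply(seq_along(z), function(i) {
  zp <- z; zp[i] <- zp[i] + h
  zm <- z; zm[i] <- zm[i] - h
  (cross_entropy(softmax(zp), y) - cross_entropy(softmax(zm), y)) / (2 * h)
})
max(abs(analytic - numeric_grad))   # close to zero
```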

splashr 0.6.0 Now Uses the CRAN-nascent stevedore Package for Docker Orchestration

The splashr package [srht|GL|GH], an alternative to Selenium for JavaScript-enabled/browser-emulated web scraping, is now at version 0.6.0 (still in dev mode but on its way to CRAN in the next 14 days). The major change from version 0.5.x (which never made it to CRAN) is a swap-out of the reticulated docker package with …

ggeffects 0.8.0 now on CRAN: marginal effects for regression models #rstats

I’m happy to announce that version 0.8.0 of my ggeffects-package is on CRAN now. The update fixes some bugs from the previous version and comes with many new features and improvements. One major area addressed in the latest version is fixes and improvements for mixed models, especially zero-inflated mixed models (fitted …

pcLasso: a new method for sparse regression

I’m excited to announce that my first package has been accepted to CRAN! The package pcLasso implements principal components lasso, a new method for sparse regression which I’ve developed with Rob Tibshirani and Jerry Friedman. In this post, I will give a brief overview of the method and some starter code. (For an in-depth description …

R Coding Style Guide

Language is a tool that allows human beings to interact and communicate with each other. The clearer we express ourselves, the better the idea is transferred from our mind to the other. The same applies to programming languages: concise, clear and consistent code is easier to read and edit. It is especially important if you …

Interpreting the coefficients of linear regression

Nowadays there is a plethora of machine learning algorithms we can try out to find the best fit for our particular problem. Some of the algorithms have a clear interpretation, while others work as a black box, and for those we can use approaches such as LIME or SHAP to derive some interpretations. In this article I would …
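For a plain linear model, the coefficient of a predictor is the change in the predicted outcome for a one-unit increase in that predictor, with the other predictors held constant. The base-R sketch below checks that reading directly on the built-in `mtcars` data; the model and the reference car are chosen only for illustration and are not taken from the article.

```r
# Linear model of fuel economy on weight (in 1000 lbs) and horsepower
fit <- lm(mpg ~ wt + hp, data = mtcars)
coef(fit)

# Interpretation of the wt coefficient: the change in predicted mpg for a
# one-unit (1000 lb) increase in weight, holding hp fixed.
car <- data.frame(wt = 3, hp = 150)            # an arbitrary reference car
heavier <- transform(car, wt = wt + 1)         # same car, 1000 lbs heavier
delta <- predict(fit, heavier) - predict(fit, car)
delta                                          # equals coef(fit)["wt"]
```

Because the model is linear and has no interactions, the same difference appears whatever reference car is chosen.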

colorspace: New Tools for Colors and Palettes

A major update (version 1.4.0) of the R package colorspace has been released to CRAN, enhancing many of the package’s capabilities, e.g., more refined palettes, named palettes, ggplot2 color scales, visualizations for assessing palettes, shiny and Tcl/Tk apps, color vision deficiency emulation, and much more. Overview: The colorspace package provides a broad toolbox for selecting …

Travis CI for R — Advanced guide

Continuous integration for building an R project in Travis CI, including code coverage, pkgdown documentation, osx and multiple R versions. Travis CI is a common tool to build R packages. In my opinion, it is the best platform to use R in continuous integration. Some of the …

Showing a difference in means between two groups

Visualising a difference in means between two groups isn’t as straightforward as it should be. After all, it’s probably the most common quantitative analysis in science. There are two obvious options: we can either plot the data from the two groups separately, or we can show the estimate of the difference with an interval around it. …
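Here is a base-R sketch of the second option. It uses the built-in `mtcars` data as stand-in groups (automatic versus manual transmission) rather than anything from the post. A Welch t-test supplies both the estimated difference and its 95% confidence interval, which are then drawn as a point with an error bar.

```r
# Two independent groups: fuel economy of automatic (am = 0) vs manual (am = 1) cars
res <- t.test(mpg ~ am, data = mtcars)             # Welch two-sample t-test

est <- unname(res$estimate[1] - res$estimate[2])   # difference in means (automatic - manual)
ci  <- res$conf.int                                # 95% interval for that difference

# Show the estimate of the difference with an interval around it
plot(1, est, xlim = c(0.5, 1.5), ylim = range(c(ci, 0)), pch = 19,
     xaxt = "n", xlab = "", ylab = "Difference in mean mpg (automatic - manual)")
segments(1, ci[1], 1, ci[2])
abline(h = 0, lty = 2)                             # reference line: no difference
```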

Medium + r-bloggers — How to integrate?

Build a PHP script that allows you to post your Medium articles on r-bloggers.com. The script filters an RSS feed by item tags. Motivation: I started my blog about R on Medium. Medium is a wonderful platform with a great user interface. The idea to …

XmR Chart | Step-by-Step Guide by Hand and with R

Is your process in control? The XmR chart is a great statistical process control (SPC) tool that can help you answer this question, reduce waste, and increase productivity. We’ll cover the concepts behind XmR charting and explain the XmR control constant with some super simple R code. Lastly, we’ll cover how to make the XmR …
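The usual XmR limits place the individuals chart at the mean plus or minus 2.66 times the average moving range. The constant 2.66 is 3 divided by the bias-correction constant d2 = 1.128 for ranges of two consecutive points. Here is a minimal base-R sketch; it is not the post's code, and the measurements are made up.

```r
# XmR (individuals and moving range) control limits.
# The constant 2.66 is 3 / d2, with d2 = 1.128 for moving ranges of size 2.
xmr_limits <- function(x) {
  mr <- abs(diff(x))        # moving ranges between consecutive points
  x_bar <- mean(x)          # center line of the individuals chart
  mr_bar <- mean(mr)        # average moving range
  c(lower = x_bar - 2.66 * mr_bar,
    center = x_bar,
    upper = x_bar + 2.66 * mr_bar)
}

x <- c(10, 12, 11, 13, 12)  # hypothetical process measurements
limits <- xmr_limits(x)
limits                      # lower 7.61, center 11.6, upper 15.59
which(x < limits["lower"] | x > limits["upper"])   # points outside the limits
```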

Generating Synthetic Data Sets with ‘synthpop’ in R

Synthpop – A great music genre and an aptly named R package for synthesising population data. I recently came across this package while looking for an easy way to synthesise unit record data sets for public release. The goal is to generate a data set which contains no real units and is therefore safe for public release …

Making sense of the METS and ALTO XML standards

Last week I wrote a blog post where I analyzed one year of newspaper ads from 19th-century newspapers. The data is made available by the National Library of Luxembourg. In this blog post, which is part 1 of a 2-part series, I extract data from the 257 GB archive, which contains 10 years of publications of L’Union, …

Practical Data Science with R, 2nd Edition discount!

Please help share our news and this discount. The second edition of our best-selling book Practical Data Science with R (Zumel, Mount) is featured as the deal of the day at Manning. The second edition isn’t finished yet, but chapters 1 through 4 are available in the Manning Early Access Program (MEAP), and we have finished …

I walk the (train) line – part deux – the weight loss continues

(TL;DR: author continues to use his undiagnosed OCD for good. Breadth-first search introduced on a simple graph.) We learnt how to get OpenStreetMap data into R last time, and I said that we would be doing a little bit of this: So what the hell is this? This is an example of breadth-first search of a …
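For readers new to the algorithm, here is a minimal base-R sketch of breadth-first search on a small adjacency-list graph. The graph and station names are hypothetical, and this is not the post's OpenStreetMap code.

```r
# Breadth-first search on an adjacency list: visit the start node, then all
# of its neighbours, then their unvisited neighbours, and so on.
bfs <- function(graph, start) {
  visited <- start
  queue <- start
  while (length(queue) > 0) {
    node <- queue[1]
    queue <- queue[-1]                   # dequeue from the front
    for (nb in graph[[node]]) {
      if (!(nb %in% visited)) {
        visited <- c(visited, nb)        # mark as seen when first discovered
        queue <- c(queue, nb)            # enqueue at the back
      }
    }
  }
  visited
}

# Hypothetical undirected graph of five stations
graph <- list(A = c("B", "C"), B = c("A", "D"), C = c("A", "D"),
              D = c("B", "C", "E"), E = "D")
bfs(graph, "A")   # "A" "B" "C" "D" "E"
```

Marking nodes as visited when they are enqueued, rather than when they are dequeued, ensures each node enters the queue at most once.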

10 years of playback history on Last.FM: “Just sit back and listen”

Alright, it seems like this is developing into a blog where I am increasingly investigating my own music listening habits. Recently, I’ve come across the analyzelastfm package by Sebastian Wolf. I used it to download my complete listening history from Last.FM for the last ten years. That’s a complete dataset from 2009 to 2018 with exactly 65,356 …