Window Aggregate operator in batch mode in SQL Server 2019

This came as a surprise when I was calculating simple statistics on my dataset, in particular the min, max and median. The first two are trivial. The last one is what caught my attention. While looking for the fastest way to calculate the median for a given dataset, I stumbled upon an interesting … Read more

Categories R Tags ExcerptFavorite

Rcrastinate is moving.

Hi all, this is just an announcement. I am moving Rcrastinate to a blogdown-based solution and am therefore leaving blogger.com. If you're interested in the new setup and how you could do the same yourself, please check out the shiny new Rcrastinate over at http://rcrastinate.rbind.io/. In my first post over there, I am … Read more

Factor Analysis in R with Psych Package: Measuring Consumer Involvement

The post Factor Analysis in R with Psych Package: Measuring Consumer Involvement appeared first on The Lucid Manager. The first step for anyone who wants to promote or sell something is to understand the psychology of potential customers. Getting into the minds of consumers is often problematic because measuring psychological traits is a complex task. … Read more

Are you parallelizing your raster operations? You should!

If you plan to do anything with the raster package, you should definitely consider parallelizing all your processes, especially if you are working with very large image files. I couldn't find any blog post describing how to parallelize with the raster package (though it is well documented in the package documentation). So here are my notes. Load … Read more

RcppArmadillo 0.9.200.7.0

A new RcppArmadillo bugfix release arrived at CRAN today. Version 0.9.200.7.0 is another minor bugfix release, based on the new Armadillo bugfix release 9.200.7 from earlier this week. I also just uploaded the Debian version, and Uwe's systems have already created the CRAN Windows binary. Armadillo is a powerful and expressive C++ template … Read more

forecast 8.5

The latest minor release of the forecast package has now been approved on CRAN and should be available in the next day or so. Version 8.5 contains the following new features: updated tsCV() to handle exogenous regressors; reimplemented naive(), snaive() and rwf() for substantial speed improvements; added support for passing arguments to the auto.arima() unit root tests. … Read more

Make Teaching R Quasi-Quotation Easier

To make teaching R quasi-quotation easier it would be nice if R string-interpolation and quasi-quotation both used the same notation. They are related concepts. So some commonality of notation would actually be clarifying, and help teach the concepts. We will define both of the above terms, and demonstrate the relation between the two concepts. String-interpolation … Read more

Automated Dashboard for Classification Neural Network in R

In this article, you will learn how to make an automated dashboard for a classification neural network in R. First, install the `rmarkdown` package into your R library. Once `rmarkdown` is installed, create a new `rmarkdown` script in R. After this … Read more

RStudio Connect 1.7.0

RStudio Connect is the publishing platform for everything you create in R. In conversations with our customers, R users were excited to have a central place to share all their data products, but were facing a tough problem. Their colleagues working in Python didn't have the same option, leaving their work stranded on their desktops. Today, we are excited … Read more

My course on Hyperparameter Tuning in R is now on Data Camp!

I am very happy to announce that (after many months) my interactive course on Hyperparameter Tuning in R has now been officially launched on Data Camp! Course description: For many machine learning problems, simply running a model out-of-the-box and getting a prediction is not enough; you want the best model with the most accurate prediction. … Read more

ROC Curves

I have been thinking about writing a short post on R resources for working with Receiver Operating Characteristic (ROC) curves, but first I thought it would be nice to review the basics. In contrast to the usual (usual for data scientists, anyway) machine learning point of view, I'll frame the topic closer to its historical origins as a … Read more

AI, Machine Learning and Data Science Roundup: January 2019

A monthly roundup of news about Artificial Intelligence, Machine Learning and Data Science. This is an eclectic collection of interesting blog posts, software announcements and data applications from Microsoft and elsewhere that I’ve noted over the past month or so. Open Source AI, ML & Data Science News Preview of Tensorflow 2.0 (the public preview … Read more

Using DataCamp reduces anxiety about learning R!

I used DataCamp's excellent Introduction to R as Essential Prior Independent Study and found it made people a bit less worried about a term of R! I have a lot of fun teaching first-year biology undergraduates, but there are a few challenges in teaching data skills when they are not (perceived as) a student's core discipline … Read more

Automated Dashboard for Credit Modelling with Decision trees and Random forests in R

In this article, you will learn how to make an automated dashboard for credit modelling with decision trees and random forests in R. First, install the `rmarkdown` package into your R library. Once `rmarkdown` is installed, create a new `rmarkdown` script … Read more

Lecture slides: Real-World Data Science (Fraud Detection, Customer Churn & Predictive Maintenance)

These are slides from a lecture I gave at the School of Applied Sciences in Münster. In this lecture, I talked about Real-World Data Science and showed examples on Fraud Detection, Customer Churn & Predictive Maintenance. The slides were created with xaringan. … Read more

Use foreach with HPC schedulers thanks to the future package

The future package is a powerful and elegant cross-platform framework for orchestrating asynchronous computations in R. It’s ideal for working with computations that take a long time to complete; that would benefit from using distributed, parallel frameworks to make them complete faster; and that you’d rather not have locking up your interactive R session. You can … Read more

Feature Selection using Genetic Algorithms in R

This is a post about feature selection using genetic algorithms in R, in which we will do a quick review of: what genetic algorithms are; GA in ML; what a solution looks like; the GA process and its operators; the fitness function; genetic algorithms in R; try it yourself; and related concepts. Animation source: "Flexible Muscle-Based … Read more

Using clusterlab to benchmark clustering algorithms

Clusterlab is a CRAN package (https://cran.r-project.org/web/packages/clusterlab/index.html) for the routine testing of clustering algorithms. It can simulate positive controls (data sets with >1 cluster) and negative controls (data sets with 1 cluster). Why test clustering algorithms? Because they often fail to identify the true K in practice, published algorithms are not always well tested, and we need to know … Read more

Selecting ‘special’ photos on your phone

At the beginning of the new year I always want to clean up my photos on my phone. It just never happens. So now (like so many others I think) I have a lot of photos on my phone from the last 3.5 years. The iPhone photos app helps you a bit to go through … Read more

Mango Solutions contributes to technology partner RStudio's conference

As the leading advanced analytics partner for RStudio, Mango Solutions is delighted to be contributing to the upcoming rstudio::conf programme with a workshop and a talk. Two of Mango's senior consultants, Aimée Gott, Education Practice Lead, and Mark Sellors, Head of Data Engineering, will be sharing their R expertise with delegates. Aimée Gott will be delivering the Intermediate … Read more

Neural Text Modelling with R package ruimtehol

Last week the R package ruimtehol was released on CRAN (https://github.com/bnosac/ruimtehol), allowing R users to easily build and apply neural embedding models on text data. It wraps the 'StarSpace' library (https://github.com/facebookresearch/StarSpace), allowing users to calculate word, sentence, article, document, webpage, link and entity 'embeddings'. By using the 'embeddings', you can perform text-based multi-label classification, … Read more

Understanding the Magic of Neural Networks

Everything "neural" is (again) the latest craze in machine learning and artificial intelligence. Now, what is the magic here? Let us dive directly into a (supposedly a little silly) example: we have three protagonists in the fairy tale Little Red Riding Hood (the wolf, the grandmother and the woodcutter). They all have certain qualities and little … Read more

Scaling H2O analytics with AWS and p(f)urrr (Part 2)

This is the second installment in a three-part series on integrating H2O, AWS and p(f)urrr. In Part II, I will showcase how we can combine purrr and h2o to train and stack ML models. In the first post we looked at starting up an AMI on AWS which acts as the infrastructure upon which … Read more

My presentations on ‘Elements of Neural Networks & Deep Learning’ -Parts 4,5

This is the next set of presentations on "Elements of Neural Networks and Deep Learning". In the 4th presentation I discuss and derive the generalized equations for a multi-unit, multi-layer Deep Learning network. The 5th presentation derives the equations for a Deep Learning network when performing multi-class classification, along with the derivations for cross-entropy loss. The corresponding … Read more

splashr 0.6.0 Now Uses the CRAN-nascent stevedore Package for Docker Orchestration

The splashr package [srht|GL|GH] — an alternative to Selenium for javascript-enabled/browser-emulated web scraping — is now at version 0.6.0 (still in dev-mode but on its way to CRAN in the next 14 days). The major change from version 0.5.x (which never made it to CRAN) is a swap out of the reticulated docker package with … Read more

R Tip: Use Inline Operators For Legibility

R Tip: use inline operators for legibility. A Python feature I miss when working in R is the convenience of Python's inline + operator. In Python, + does the right thing for some built-in data types: It concatenates lists: [1,2] + [3] is [1, 2, 3]. It concatenates strings: 'a' + 'b' is 'ab'. … Read more
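For illustration, here is a minimal base-R sketch of the idea, not the post's own solution: R lets you define your own infix operators of the form `%name%`. The operator name `%++%` is hypothetical and chosen only for this example (note that ggplot2 already exports a `%+%`, so a distinct name avoids clashes).

```r
# Hypothetical infix operator (name chosen for illustration only):
# concatenates strings with no separator, mimicking Python's 'a' + 'b'.
`%++%` <- function(a, b) paste0(a, b)

s <- "a" %++% "b"
print(s)

# Custom %op% operators are left-associative, so they chain naturally
s3 <- "a" %++% "b" %++% "c"
print(s3)

# For vectors, base R's c() already plays the role of Python's list +
v <- c(c(1, 2), 3)
print(v)
```

Because `paste0()` is vectorized, the same operator also works element-wise on character vectors.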

ggeffects 0.8.0 now on CRAN: marginal effects for regression models #rstats

I'm happy to announce that version 0.8.0 of my ggeffects package is on CRAN now. The update fixes some bugs from the previous version and comes with many new features and improvements. One major focus of the latest version is fixes and improvements for mixed models, especially zero-inflated mixed models (fitted … Read more

pcLasso: a new method for sparse regression

I’m excited to announce that my first package has been accepted to CRAN! The package pcLasso implements principal components lasso, a new method for sparse regression which I’ve developed with Rob Tibshirani and Jerry Friedman. In this post, I will give a brief overview of the method and some starter code. (For an in-depth description … Read more

rOpenSci’s new Code of Conduct

We are pleased to announce the release of our new Code of Conduct. rOpenSci’s community is our best asset and it’s important that we put strong mechanisms in place before we have to act on a report. As before, our Code applies equally to members of the rOpenSci team and to anyone from the community … Read more

R Coding Style Guide

Language is a tool that allows human beings to interact and communicate with each other. The clearer we express ourselves, the better the idea is transferred from our mind to the other. The same applies to programming languages: concise, clear and consistent code is easier to read and edit. It is especially important if you … Read more

colorspace: New Tools for Colors and Palettes

A major update (version 1.4.0) of the R package colorspace has been released to CRAN, enhancing many of the package’s capabilities, e.g., more refined palettes, named palettes, ggplot2 color scales, visualizations for assessing palettes, shiny and Tcl/Tk apps, color vision deficiency emulation, and much more. Overview The colorspace package provides a broad toolbox for selecting … Read more

Travis CI for R — Advanced guide

Continuous integration for building an R project in Travis CI, including code coverage, pkgdown documentation, OS X and multiple R versions. Travis CI is a common tool for building R packages. It is, in my opinion, the best platform for using R in continuous integration. Some of the … Read more

Showing a difference in means between two groups

Visualising a difference in means between two groups isn't as straightforward as it should be. After all, it's probably the most common quantitative analysis in science. There are two obvious options: we can either plot the data from the two groups separately, or we can show the estimate of the difference with an interval around it. … Read more

Medium + r-bloggers — How to integrate?

Build a PHP script that allows you to post your Medium articles on r-bloggers.com. The script filters an RSS feed by item tags. Motivation: I started my blog about R on Medium. Medium is a wonderful platform with a great user interface. The idea to … Read more

XmR Chart | Step-by-Step Guide by Hand and with R

Is your process in control? The XmR chart is a great statistical process control (SPC) tool that can help you answer this question, reduce waste, and increase productivity. We’ll cover the concepts behind XmR charting and explain the XmR control constant with some super simple R code. Lastly, we’ll cover how to make the XmR … Read more
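As a rough illustration of where the XmR control constant comes from, here is a minimal base-R sketch of the standard textbook calculation (not the post's own code; the data vector is invented). The natural process limits for the X chart are the mean plus or minus 2.66 times the average moving range, where 2.66 is approximately 3 / d2 and d2 = 1.128 is the bias-correction constant for ranges of two consecutive points; the upper limit for the moving-range chart uses D4 = 3.268.

```r
# Toy data, made up for illustration (not from the original post)
x <- c(10, 12, 11, 13, 12)

# Moving ranges: absolute differences between consecutive observations
mr <- abs(diff(x))
mr_bar <- mean(mr)
x_bar <- mean(x)

# E2 = 3 / d2, with d2 = 1.128 for subgroups of size 2 (about 2.66)
E2 <- 3 / 1.128

# Natural process limits for the individuals (X) chart
limits <- c(lower  = x_bar - E2 * mr_bar,
            centre = x_bar,
            upper  = x_bar + E2 * mr_bar)
print(round(limits, 2))

# Upper limit for the moving-range (mR) chart, D4 = 3.268 for n = 2
mr_upper <- 3.268 * mr_bar
print(mr_upper)
```

Points outside `limits` (or moving ranges above `mr_upper`) would be flagged as signals of a process that may not be in control.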

Generating Synthetic Data Sets with ‘synthpop’ in R

Synthpop – A great music genre and an aptly named R package for synthesising population data. I recently came across this package while looking for an easy way to synthesise unit record data sets for public release. The goal is to generate a data set which contains no real units and is therefore safe for public release … Read more

Making sense of the METS and ALTO XML standards

Last week I wrote a blog post where I analyzed one year of newspaper ads from 19th century newspapers. The data is made available by the National Library of Luxembourg. In this blog post, which is part 1 of a 2-part series, I extract data from the 257 GB archive, which contains 10 years of publications of L'Union, … Read more

Practical Data Science with R, 2nd Edition discount!

Please help share our news and this discount. The second edition of our best-selling book Practical Data Science with R (Zumel, Mount) is featured as the deal of the day at Manning. The second edition isn't finished yet, but chapters 1 through 4 are available in the Manning Early Access Program (MEAP), and we have finished … Read more

10 years of playback history on Last.FM: “Just sit back and listen”

Alright, it seems like this is developing into a blog where I am increasingly investigating my own music listening habits. Recently, I came across the analyzelastfm package by Sebastian Wolf. I used it to download my complete listening history from Last.FM for the last ten years. That's a complete dataset from 2009 to 2018 with exactly 65,356 … Read more

How to combine Multiple ggplot Plots to make Publication-ready Plots

The data science life cycle is never complete without communicating the results of the analysis or research. In fact, data visualization is one of the areas where R, as a language for data science, has an edge over the much-celebrated Python. With ggplot2 … Read more

GetDFPData Ver 1.4

I just released a major update to package GetDFPData. Here are the main changes: naming conventions for the caching system are improved so that they reflect different versions of FRE and DFP files. This means the old caching system no longer works. If you have built your own cache folder with many companies, do clean … Read more

Parallelize a For-Loop by Rewriting it as an Lapply Call

A commonly asked question in the R community is: How can I parallelize the following for-loop? The answer almost always involves rewriting the for (…) { … } loop into something that looks like a y <- lapply(…) call. If you can achieve that, you can parallelize it via for instance y <- future.apply::future_lapply(…) or … Read more
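To make the rewrite concrete, here is a minimal base-R sketch (not the post's own example) of turning a for-loop into an equivalent `lapply()` call. The function `work()` is a hypothetical stand-in for whatever the loop body computes; the key requirement is that each iteration is independent of the others.

```r
# Hypothetical stand-in for the real per-iteration computation
work <- function(i) {
  i^2
}

# Original for-loop: pre-allocate a list and fill it element by element
y_loop <- vector("list", 5)
for (i in 1:5) {
  y_loop[[i]] <- work(i)
}

# Same computation as an lapply() call: the loop body becomes a function
# of the loop index, and lapply() collects the results into a list
y_lapply <- lapply(1:5, work)

print(identical(y_loop, y_lapply))

# With the future.apply package installed, the post's suggested drop-in
# replacement would run the iterations in parallel:
# future::plan(future::multisession)
# y_par <- future.apply::future_lapply(1:5, work)
```

If the loop body updates variables outside the loop (for example, a running total), it first has to be refactored so that each iteration only returns a value.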

R Tip: Use seqi() For Indexes

R Tip: use seqi() for indexing. R's "1:0 trap" is a mal-feature that confuses newcomers and is a reliable source of bugs. This note will show how to use seqi() to write more reliable code and document intent. The issue is, contrary to expectations (formed in working with other programming languages) the sequence 1:0 is … Read more
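The trap itself is easy to reproduce. The sketch below uses only base R's `seq_len()` and `seq_along()` as a safe alternative; it does not use `seqi()` (from the wrapr package), whose exact semantics are described in the post itself.

```r
n <- 0  # e.g. the length of an empty input

# The trap: 1:n with n == 0 counts *down*, yielding c(1, 0)
bad <- 1:n
print(bad)

# Base-R alternative: seq_len(n) is empty when n == 0
good <- seq_len(n)
print(good)

# A loop over seq_along() correctly runs zero times on empty input
x <- numeric(0)
count <- 0
for (i in seq_along(x)) count <- count + 1
print(count)
```

A loop written as `for (i in 1:length(x))` would instead run twice on the empty `x`, with `i` taking the values 1 and 0.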

pinp 0.0.7: More small YAML options

A good six months after the previous release, another small feature release of our pinp package for snazzier one or two column Markdown-based pdf vignettes got onto CRAN minutes ago as another [CRAN-pretest-publish] release indicating a fully automated process (as can be done for packages free of NOTES, WARNING, ERRORS, and without ‘changes to worse’ … Read more

Add a static pdf vignette to an R package

Most vignettes are built when a package is built, but there are occasions where you just want to include a pdf. For example when you want to include a paper. Of course there is a package supporting this, but in this post I will show you how to do it yourself with ease. The idea … Read more

epubr 0.6.0 CRAN release

The epubr R package provides functions supporting the reading and parsing of internal e-book content from EPUB files. It has been updated to v0.6.0 on CRAN. This post highlights new functionality. The key improvements focus on cases where EPUB files have poorly arranged text when loaded into R as a result of their metadata entries … Read more
