Dollar Signs and Percentages- 3 Different Ways to Convert Data Types in R

[This article was first published on R – data technik, and kindly contributed to R-bloggers]. Working with percentages in R can be a little tricky, …
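As a rough illustration of the kind of conversion the title refers to, here is a minimal base-R sketch. The input strings and variable names are hypothetical, and this is only one possible approach, not necessarily one of the three the article covers.

```r
# Hypothetical input: currency and percentage values stored as text
prices <- c("$1,234.50", "$99", "$0.75")
shares <- c("45%", "12.5%", "100%")

# Strip "$" and thousands separators, then convert to numeric
price_num <- as.numeric(gsub("[$,]", "", prices))

# Strip "%" and divide by 100 to get proportions
share_num <- as.numeric(sub("%", "", shares, fixed = TRUE)) / 100

print(price_num)
print(share_num)
```

Whether a percentage should become a proportion (0.45) or stay on the 0-100 scale (45) depends on the downstream analysis; the division by 100 here is an assumption.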

How to make Square (Pie) Charts for Infographics in R

Are you looking for a unique way of visualizing your numbers instead of simply using bar charts, which can bore the audience if used slide after slide? Here’s the Square Pie / Waffle Chart for you. The Waffle Chart or, as it is technically known, the Square Pie Chart is just a pie …

#23: Debugging with Docker and Rocker – A Concrete Example helping on macOS

[This article was first published on Thinking inside the box, and kindly contributed to R-bloggers]. Welcome to the 23rd post in the rationally reasonable …

Insurance data science : use and value of unusual data #1

Next week, with , I will be at the Summer School of the Swiss Association of Actuaries, in Lausanne, with Jean-Philippe Boucher (UQAM) and Ewen Gallic (AMSE). I will give an introductory talk on Monday morning, and the slides are now available. There will be some hands-on applications in R. I will share some code …

Bayes models for estimation in stepped-wedge trials with non-trivial ICC patterns

[This article was first published on ouR data generation, and kindly contributed to R-bloggers]. Continuing a series of posts discussing the structure of intra-cluster correlations …

How to create unigrams, bigrams and n-grams of App Reviews

This is one of the most frequent questions I’ve heard from first-time NLP / Text Analytics programmers (or, as the world likes to call them, “Data Scientists”). Prerequisite: For simplicity, this post assumes that you already know how to install a package and so you’ve got tidytext installed on your R machine. …

Check your (Mixed) Model for Multicollinearity with ‘performance’

[This article was first published on R on easystats, and kindly contributed to R-bloggers]. The goal of performance is to provide lightweight tools to assess …

mlr-2.15.0

We just released mlr v2.15.0 to CRAN. This version includes some breaking changes and the usual bug fixes from the last three months. We made good progress on the goal of cleaning up the GitHub repo. We processed nearly all open pull requests (around 40). In the next months we will focus on cleaning up the issue tracker …

Working With Vectors

[This article was first published on R-exercises, and kindly contributed to R-bloggers]. In the previous exercise set we practised vectors as a data structure. As …

Clustering Frankenstein

To sit down, I need a little tree in this open field (Desarraigo, Extremoduro) From time to time I come back to experiment with this stunning photograph of Boris Karloff as Frankenstein’s monster. I have done several of them previously: from decomposing it into Voronoi regions, to drawing it as a single-line portrait using an algorithm to …

RTutor: Improving Interactive Problem Sets by Analyzing Submissions

At Ulm University, we currently use RTutor for several elective courses where students solve interactive problem sets at home. They can test their solutions and get automatic hints, and then submit their solution. Starting next year, we plan to use RTutor in a new compulsory data science project course in our business and economics bachelor. …

Stock Market Data Scenario Set Generation – S&P 100

I just love to create portfolio optimization models based on optimization theory. Such models require a well-defined return scenario set, which is nothing more than a matrix holding a joint set of possible returns for all the assets under consideration. The easiest way is to use historical data for this purpose. While …

Finding duplicates in data frame across columns and replacing them with unique values using R

Suppose you have a dataset with many variables, and you want to check whether there are any duplicated values within each observation and replace the duplicates with a random value from the pool of existing values. To that end, let’s create a sample dataset: df <- structure(list( v1 = c(10,20,30,40,50,60,70,80) ,v2 = c(5,7,6,8,6,8,9,4) ,v3 = c(2,4,6,6,7,8,8,4) ,v4 = …
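The article's own dataset is cut off in this teaser, so the following is a minimal sketch on a small hypothetical data frame rather than the author's method. The helper name `replace_row_duplicates` and the choice to draw replacements only from pool values not already present in the row are assumptions made for illustration.

```r
set.seed(42)  # make the random replacements reproducible

# Hypothetical data (not the article's truncated dataset)
df <- data.frame(
  v1 = c(10, 20, 30, 40),
  v2 = c(5, 20, 6, 8),
  v3 = c(2, 4, 6, 8)
)

# Pool of candidate replacements: every distinct value in the data frame
pool <- unique(unlist(df, use.names = FALSE))

# Hypothetical helper: replace repeated values within one row
replace_row_duplicates <- function(row, pool) {
  for (i in which(duplicated(row))) {
    # Only consider pool values that do not already occur in this row
    candidates <- setdiff(pool, row)
    # Index with sample.int() because sample(x, 1) on a length-1 numeric
    # vector would sample from 1:x instead of returning x
    row[i] <- candidates[sample.int(length(candidates), 1)]
  }
  row
}

# apply() returns one column per input row, so transpose back
fixed <- as.data.frame(t(apply(df, 1, replace_row_duplicates, pool = pool)))
names(fixed) <- names(df)

# Check that no row contains duplicates afterwards
print(any(apply(fixed, 1, anyDuplicated) > 0))
```

The first occurrence of a value in a row is kept and only later repeats are replaced, which follows from how `duplicated()` flags elements.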

The Shiny Developer Series

Shiny is one of the best ways to build interactive documents, dashboards, and data science applications. But advancing your skills with Shiny does not come without challenges. Shiny developers often have a stronger background in applied statistics than in areas useful for optimizing an application, like programming, web development, and user-interface design. Though there are …

Likert Scale Survey: from googleform to #rstats graph

Many Biology students are interested in science communication or the public understanding of science and undertake projects in these areas. They often conduct surveys which include Likert-scale questions. This workflow will teach you how to set up a Google Forms survey with Likert-scale questions, read the responses into R and report on the results. It …

How to Build an Automated Trading System using R

For all R zealots, we know that we can build any data product very efficiently using R. An automated trading system is not an exception. Whether you are doing high-frequency trading, day trading, swing trading, or even value investing, you can use R to build a trading robot …

Interactive Visualization in R with apexcharter

Interactive visualizations are powerful these days because they are made for the web, which is simply a combination of HTML, CSS, and JavaScript. This has paved the way for many JavaScript charting libraries such as highcharts.js and apexcharts.js. Thanks to R’s htmlwidgets, many R developers have started porting those JavaScript charting libraries to …

Program Evaluation: Regression Discontinuity Design in R

Regression analysis is one of the most requested machine learning methods in 2019. One family of regression methods for measuring effects and evaluating a policy program is Regression Discontinuity Design. This method is well suited for benchmarking and finding improvements for optimization in organizations. It can, therefore, be used to design organizations …

RcppCCTZ 0.2.6

A shiny new release 0.2.6 of RcppCCTZ is now at CRAN. RcppCCTZ uses Rcpp to bring CCTZ to R. CCTZ is a C++ library for translating between absolute and civil times using the rules of a time zone. In fact, it is two libraries. One for dealing with civil time: human-readable dates and times, and …

Time Series Analysis with Wind Resource Assessment in R

One of the sectors with a huge demand for data science/analysis is the energy sector. A branch of this sector where demand is high is the green wind energy turbine sector. In this analysis, you will learn to do Time Series Analysis with Wind Resource Assessment in R. Wind Resource Assessment: The wind …

Programmatically extract TIOBE Index Ratings

The TIOBE Index is an index (ranking) that claims to represent the popularity of programming languages. Yihui (the creator of the blogdown package) recently wrote a blog post titled “On TIOBE Index and the era of decision fatigue”, and I strongly recommend that you read it before continuing with this post. So the disclaimer goes like this: …

R Markdown Workshop

This is an unusual post for me: I have avoided writing about R Markdown because there are so many resources already available on the topic (e.g., here, here, and here). However, I recently ran a session on using R Markdown for my colleagues in the Centre for Social Issues Research. The aim of this was to …

Machine Learning Training – Draper and Dash Healthcare Predictive Analytics – Summer Discount

My company, Draper & Dash, is offering a discount on ML training for your organisation. Contact the sales team to find out more about this training opportunity. Plus, you get to meet our great data science team at Draper and Dash. Below are my colleagues and me in action …

simmer 4.3.0 + JSS publication

The 4.3.0 release of simmer, the Discrete-Event Simulator for R, is on CRAN. Along with this update, we are very glad to announce that our homonymous paper finally appeared in the Journal of Statistical Software. Please use the following reference for citations (see citation("simmer")): Ucar I, Smeets B, Azcorra A (2019). “simmer: Discrete-Event Simulation for R.” Journal of Statistical …

Updating MetaLandSim…

A new development version is now available for the package MetaLandSim (v. 1.0.6). It can be downloaded from GitHub. The user has to run the following code in order to install the package from GitHub: library(devtools); install_github("FMestre1/MetaLandSim") These changes were required because of recent improvements made to the package rgrass7 (which allows one to work on GRASS …

dime: Deep Interactive Model Explanations

Hubert Baniecki created an awesome package, dime, for serverless HTML interactive model exploration. The experimental version is on GitHub; here is the pkgdown website. It is a part of the DrWhy.AI project. How does it work? With the DALEX package you can create local and global model explanations for machine learning models. Each explanation can …

Propensity Score Matching in R

Regression analysis is one of the most requested machine learning methods in 2019. One family of regression methods for measuring effects and evaluating the statistical effect of covariates is Propensity Score Matching (PSM). This method is well suited to investigating whether the covariates change the effects of the estimates in the …

XY problems

The so-called XY problem is a classic situation on Q&A sites such as Stack Overflow. While answering a recent question there, I thought I had spotted such a case and addressed the actual problem. As it turns out, I had stopped too early when going for the root cause. The question asker wanted to speed …

80th #TokyoR Meetup Roundup: Econometrics vs. ML, Python with R, & Translating tidyverse.org into Japanese!

Within a typhoon, another TokyoR Meetup! … Well, not really: it turned out to be a false alarm, and the weather was a wonderful 30 degrees Celsius with 800% humidity, as usual in Tokyo. My gripes with the weather aside, this month’s meetup was held at Cresco, an IT management strategy company, in their headquarters in Shinagawa, Tokyo. In …

Data on demand: Azure SQL Database in serverless mode

Azure SQL Database has a new “serverless” mode in preview that eliminates compute costs when not in use. In this post, I’ll show how you can set up a serverless database instance, and access data stored in it from R. I’m working on a demo that I’ll be giving at several upcoming conferences, and for …

Learn AB Testing in R to Revolutionize Your Product

When it comes to your typical product or engineering org, team members are often left wondering whether the thing they did had an impact, or whether the option they chose from among many design options was actually the best. As these organizations want to move towards data-informed design decisions, AB testing is first …

Learn about XAI in R with “Predictive Models: Explore, Explain, and Debug”

XAI (eXplainable artificial intelligence) is a fast-growing and super interesting area. Working with complex models generates lots of problems with model validation (performance on test data is great but drops in production), model bias, lack of stability and many others. We need more than just local explanations for predictive models. The more complex the models …

useR!2019 roundup: workflow, reproducibility and friends!

Earlier this month I, together with two other Mangoes, made my way to France for the 2019 edition of useR!. useR! brings together users and developers both from academia and the industry. This year it was hosted in Toulouse and together with side-events covered the second week of July. It was my first time attending …

Announcing pdqr

Prologue: I have been working on the ‘pdqr’ package for quite some time now. Initially it was intended only for creating custom distribution functions (analogues of base “p”, “d”, “q”, and “r” functions) from a numeric sample. However, after a couple of breakthrough ideas, it also became a set of tools for transforming and summarizing distributions. Now I would …

How to reshape a dataframe from wide to long or long to wide format

Reshaping a dataframe / table from long to wide format or wide to long format is one of the daily tasks of a Data Analyst / Data Scientist. The long format is similar to the tidy format that the tidyverse advocates. Even though it’s a very common task, the tidyr package’s …

Getting started with Tensorflow, Keras in Python and R

The Pale Blue Dot “From this distant vantage point, the Earth might not seem of any particular interest. But for us, it’s different. Consider again that dot. That’s here, that’s home, that’s us. On it everyone you love, everyone you know, everyone you ever heard of, every human being who ever was, lived out their …

mlr3-0.1.0

The mlr-org team is very proud to present the initial release of the mlr3 machine-learning framework for R. mlr3 comes with a clean object-oriented design using the R6 class system. With this, it overcomes the limitations of R’s S3 classes. It is a rewrite of the well-known mlr package which provides a convenient way of accessing many algorithms …

Validating Type I and II Errors in A/B Tests in R

In this post, we seek to develop an intuitive sense of what type I (false-positive) and type II (false-negative) errors represent when comparing metrics in A/B tests, in order to gain an appreciation for “peeking”, one of the major problems plaguing the analysis of A/B tests today. To better understand what “peeking” is, it helps …

Network model trees

The effect of covariates on correlations in psychometric networks is assessed with either model-based recursive partitioning (MOB) or conditional inference trees (CTree). Citation: Jones PJ, Mair P, Simon T, Zeileis A (2019). “Network Model Trees”, OSF ha4cw, OSF Preprints. doi:10.31219/osf.io/ha4cw Abstract: In many areas of psychology, correlation-based network approaches (i.e., psychometric networks) have become a …

Microsoft ML Server 9.4 now available

To leave a comment for the author, please follow the link and comment on their blog: Revolutions.

Dash has gone full R

This is a reblog of the “Announcing Dash for R” announcement originally published July 10. Dash, the fastest growing framework for building analytic web applications on top of Python models, is now available for the R programming language. Dash was released in 2017 as the latest evolution in Plotly’s open-source analytics tools. At the time, Plotly was …

Grades Aren’t Normal

This article is also available in PDF form. A while back someone posted on Reddit about the grading policies of their academic department. Specifically, the department chair made a statement claiming that grades should be Normally distributed with a C average. I responded, claiming that no statistician would ever take the idea that grades follow …