R Forwards Workshop: Package Development for women and other underrepresented groups

[This article was first published on Emma R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here) Want to share your content on R-bloggers? click here if you have a blog, or here if you don’t. Date: Tuesday, 7th January, 2020, 10:30-15:30 Location: University of York, UK Forwards …

A data science approach to personality models

I have always found the differences in how people are and act interesting, and this study lets us learn a little more about them. We will gain insights into how a person’s characteristics relate to their personality traits and habits, and even be able to make predictions based on …

The best of both worlds: R meets Python via reticulate

As far as rivalries go, R vs Python can almost reach the levels of the glory days of Barca vs Madrid, Stones vs Beatles, or Sega vs Nintendo. Almost. Just dare to venture onto Twitter asking which language is best for data science to witness two tightly entrenched camps. Or at least that’s what seemingly …

There’s always at least two ways to do the same thing: an example generating 3-level hierarchical data using simstudy

[This article was first published on ouR data generation, and kindly contributed to R-bloggers]. “I am working on a simulation study that requires me to …

Le Monde puzzle [#1112]

[This article was first published on R – Xi’an’s Og, and kindly contributed to R-bloggers]. Another low-key arithmetic problem as the current Le Monde mathematical puzzle: …

Evaluating Model Performance by Building Cross-Validation from Scratch

Cross-validation is a widely used technique to assess the generalization performance of a machine learning model. Here at STATWORX, we often discuss performance metrics and how to incorporate them efficiently in our data science workflow. In this blog post, I will introduce the basics of cross-validation, provide guidelines to tweak its parameters, and illustrate how …

Kannada MNIST Prediction Classification using H2O AutoML in R

The Kannada MNIST dataset is another MNIST-type digits dataset, this one for the Kannada (Indian) language. All details of the dataset curation have been captured in the paper titled “Kannada-MNIST: A new handwritten digits dataset for the Kannada language” by Vinay Uday Prabhu. The author’s GitHub repo can be found here. The objective of this post is …

Goodbye, Disqus! Hello, Utterances!

Removing Disqus from my blogdown blog had been on my mind for a while, ever since I saw Bob Rudis’ tweet enjoining Noam Ross to not use it for his brand-new website. The same Twitter thread introduced me to Utterances, a “lightweight comments widget built on GitHub issues”, which I have at last installed to my blog in lieu of Disqus. How …

Building Data Science Infrastructure at an Enterprise Level with RStudio and ProCogia

[This article was first published on RStudio Blog, and kindly contributed to R-bloggers]. We’re hosting a free, half-day event with one of our Full Service …

Part I: Operationalizing R models with Dash Enterprise and Microsoft Azure

[This article was first published on R – Modern Data, and kindly contributed to R-bloggers]. While R offers excellent support for machine learning, the process …

New vtreat Documentation (Starting with Multinomial Classification)

[This article was first published on R – Win-Vector Blog, and kindly contributed to R-bloggers]. Nina Zumel finished some great new documentation showing how to …

ODSC West 2019 Talks and Workshops to Expand and Apply R Skills (20% discount)

[This article was first published on R-posts.com, and kindly contributed to R-bloggers]. Go HERE to learn more about the ODSC West 2019 conference with a …

Five levels of analytical automation

I have been thinking more about how programming that requires minimal human input is a virtue in computer science, and hence machine learning, circles. Although there’s no doubt that is one of the central goals of programming a computer in general, I’m not convinced this extends to data analysis, which needs some thought, contextual knowledge …

Super Solutions for Shiny Architecture 2/5: Javascript Is Your Friend

TL;DR: Three methods for using JavaScript code in Shiny applications to build faster apps, avoid unnecessary re-rendering, and add components beyond Shiny’s limits. Part 2 of a five-part series on super solutions for Shiny architecture. Why JavaScript + Shiny? Many Shiny creators have a data science background rather than a programming background and are …

Notes from a panel II: Value of successful BBSRC grants

[This article was first published on Rstats – quantixed, and kindly contributed to R-bloggers]. This post follows on from the last post on BBSRC Responsive …

Learning Data Science: The Supermarket knows you are pregnant before your Dad does

A few months ago I posted about market basket analysis (see Customers who bought…). In this post we will see another form of it, done with logistic regression, so read on… A big supermarket chain wanted to target (wink, wink) certain customer groups better. In this special case we are talking about pregnant women. The …

Fast adaptive spectral clustering in R (brain cancer RNA-seq)

Spectral clustering refers to a family of algorithms that cluster eigenvectors derived from the matrix that represents the input data’s graph. An important step in this method is applying a kernel function to the input data to generate an N×N similarity matrix or graph (where N is the number of input observations). …

New package: GetQuandlData

Example 01 – Inflation in the US

Let’s download and plot information about inflation in the US:

    library(GetQuandlData)
    library(tidyverse)

    my_id <- c('Inflation USA' = 'RATEINF/INFLATION_USA')
    my_api <- readLines('~/Dropbox/.quandl_api.txt') # you need your own API key (get it at https://www.quandl.com/sign-up-modal?defaultModal=showSignUp)
    first_date <- '2000-01-01'
    last_date <- Sys.Date()
    df <- get_Quandl_series(id_in = my_id, api_key = my_api, first_date = first_date, …

Cleaning Anomalies to Reduce Forecast Error by 9% with anomalize

[This article was first published on business-science.io, and kindly contributed to R-bloggers]. In this tutorial, we’ll show how we used clean_anomalies() from the anomalize package …

More models, more features: what’s new in ‘parameters’ 0.2.0

[This article was first published on R on easystats, and kindly contributed to R-bloggers]. The easystats project continues to grow, expanding its capabilities and features, …

Understanding Bootstrap Confidence Interval Output from the R boot Package

Nuances of Bootstrapping: Most applied statisticians and data scientists understand that bootstrapping is a method that mimics repeated sampling by drawing some number of new samples (with replacement) from the original sample in order to perform inference. However, it can be difficult to understand output from the software that carries out the bootstrapping without a …

bamlss: A Lego Toolbox for Flexible Bayesian Regression

[This article was first published on Achim Zeileis, and kindly contributed to R-bloggers]. Modular R tools for Bayesian regression are provided by bamlss: From classic …

Does news coverage boost support for presidential candidates in the Democratic primary?

[This article was first published on R on Jacob Long, and kindly contributed to R-bloggers]. Matt Grossmann noted the close relationship between the amount of news …

Announcing tidyUSDA: An R Package for Working with USDA Data

I’m proud to announce the release of an R package that has cured one of my own personal itches: pulling and working with USDA data, specifically Quick Stats data from NASS. tidyUSDA is a minimal package for doing just that. The following is cut out from the package vignette, which you can find here: https://github.com/bradlindblad/tidyUSDA …

Harry Potter and the Power of Bayesian Constrained Inference

[This article was first published on Fabian Dablander, and kindly contributed to R-bloggers]. If you are reading this, you are probably a Ravenclaw. Or a …

Coding algorithms in R for models written in Stan

[This article was first published on R – Statisfaction, and kindly contributed to R-bloggers]. Stanislaw Ulam’s autobiography, “Adventures of a Mathematician”, originally published in 1976 …

Mapping the Underlying Social Structure of Reddit

Reddit is a popular website for opinion sharing and news aggregation. The site consists of thousands of user-made forums, called subreddits, which cover a broad range of subjects, including politics, sports, technology, personal hobbies, and self-improvement. Given that most Reddit users contribute to multiple subreddits, one might think of Reddit as being organized into many …

Handling dates and times in R: a free online course

[This article was first published on Revolutions, and kindly contributed to R-bloggers]. If you ever need to work with data involving dates, times or durations …

101 Data Science Interview Questions, Answers, and Key Concepts

In October 2012, the Harvard Business Review described “Data Scientist” as the “sexiest” job of the 21st century. Well, as we approach 2020 the description still holds true! The world needs more data scientists than there are available for hire. All companies – from the smallest to the biggest – want to hire for a …

Updates to the rOpenSci image suite: magick, tesseract, and av

Image processing is one of the core focus areas of rOpenSci. Over the last few months we have released several major upgrades to core packages in our imaging suite, including magick, tesseract, and av. This post highlights a few cool new features. Magick 2.2: The magick package is one of the most powerful packages for …

More exploratory plots with ggplot2 and purrr: Adding conditional elements

This summer I was asked to collaborate on an analysis project with many response variables. As usual, I planned on automating my initial graphical data exploration through the use of functions and purrr::map() as I’ve written about previously. However, this particular project was a follow-up to a previous analysis. In the original analysis, different variables …

Four Reasons to Apply Early to Data Science Bootcamps

So you’ve decided you want to enroll in a data science bootcamp. The application deadlines aren’t for another month. Should you apply now? It might be easy to put off the application process until a later date. So, how do you decide? It turns out there are some important benefits to applying early. Here are …

Illuminating the Illuminated: A First Look at the Voynich Manuscript

[This article was first published on Weird Data Science, and kindly contributed to R-bloggers]. While the world abounds with strange phenomena ripe for analysis in …