Poisson point processes, mass shootings and clumping by @ellis2013nz

Did the average rate of Australian mass shootings decline after 1996, or was the drop just chance? I recently came across this letter to the Annals of Internal Medicine by Simon Chapman, Michael Stewart, Philip Alpers and Michael Jones: Fatal Firearm Incidents Before and After Australia’s 1996 National Firearms Agreement Banning Semiautomatic Rifles, via this piece …
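As a rough illustration of the "just chance" question, here is a minimal sketch that assumes events arrive as a homogeneous Poisson process with a constant rate. The symbols λ (events per year) and t (years observed) are introduced here for illustration only; this is not necessarily the model used in the post or in the letter.

```latex
% Assumption: N(t) counts events in a window of length t under a constant rate \lambda.
N(t) \sim \mathrm{Poisson}(\lambda t), \qquad
P\bigl(N(t) = k\bigr) = \frac{(\lambda t)^{k} e^{-\lambda t}}{k!}, \qquad
P\bigl(N(t) = 0\bigr) = e^{-\lambda t}.
```

If the rate had not changed, the last expression gives the probability of seeing no events at all in a window of length t. A very small value would suggest the drop is unlikely to be chance alone, provided the constant-rate assumption holds.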

Predictive Solutions Series – Draper & Dash’s Stranded and Super Stranded Patient Module

Lots of exciting things are happening with Draper and Dash at the moment. We have been working with key healthcare partners to design some core predictive machine learning algorithms to enable trusts to more effectively manage their performance, demand and capacity pressures. This series focuses on Stranded Patients and Super Stranded patients – these are …

Including Optional Functionality from Other Packages in Your Code

[This article was first published on Random R Ramblings, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here) Want to share your content on R-bloggers? click here if you have a blog, or here if you don’t. Let’s say you want to write a function with optional functionality …

One week as a Shiny dev, seen through Google search

[This article was first published on Colin Fay, and kindly contributed to R-bloggers]. Some days ago I read an article on dev.to, entitled something like “One …

UCSCXenaTools: Retrieve Gene Expression and Clinical Information from UCSC Xena for Survival Analysis

The UCSC Xena platform provides an unprecedented resource for public omics data from big projects like The Cancer Genome Atlas (TCGA). However, it is hard for users to incorporate multiple datasets or data types, integrate the selected data with popular analysis tools or homebrewed code, and reproduce analysis procedures. To address this issue, we developed an R …

Deadline extended for rstudio::conf(2020) abstract submissions

[This article was first published on RStudio Blog, and kindly contributed to R-bloggers]. rstudio::conf, the conference on all things R and RStudio, will take place …

Simulating the Rugby World Cup 2019 Japan in R

I really like running simulation models before sporting events because they can give you a much greater depth of understanding of team performance compared to the ‘raw’ odds that you might get from the media or bookmakers, or the often varied opinions of different sports pundits. Yes, Ireland usually get knocked out in the Quarter …

{tvthemes 1.0.0} is on CRAN: Code improvements, Kim Possible, Stannis Baratheon, Hilda palettes/themes, and more!

[This article was first published on R by R(yo), and kindly contributed to R-bloggers]. tvthemes v1.0.0 is finally on CRAN! After a long summer of procrastination, …

Delivering Business Impact with Analytics, Quickly

The data set deals with agency performance for a set of property and casualty insurance agencies. The data contains, among other things, a list of agencies over the periods 2005–2015, their premiums by product, and losses incurred by product. To have a business impact, it’s important to understand the context of the data and the …

Bucketing and highlighting dominant predictors in your ML models

Our company has been at it again, this time with work from my colleague and fellow Data Scientist Alfonso Portabales. For this post we look at Draper and Dash’s custom method of highlighting dominant predictors in your machine learning models. If you are interested in our solutions in healthcare then please contact [email protected]. Additionally, we are also …

How to build analytics platforms – Part 4: Database scalability and business models

Analytics platforms such as YUNA automatically detect machine downtimes or underutilized capacities. The identified, unused resources can be offered as services to other companies by the owners of the machine parks and thus generate new revenues. Even more important: The sustainability of the companies is also promoted in this way. In addition, these platforms put companies in …

Moderated Network Models for Continuous Data

[This article was first published on Jonas Haslbeck – r, and kindly contributed to R-bloggers]. Statistical network models have become a popular exploratory data analysis …

Bayesian Linear Mixed Models: Random Intercepts, Slopes, and Missing Data

[This article was first published on R on Will Hipson, and kindly contributed to R-bloggers]. This past summer, I watched a brilliant lecture series by …

Advanced Data Reshaping in Python and R

{ row_record <- wrapr::qchar_frame( "id" , "Species", "Petal.Length", "Petal.Width", "Sepal.Length", "Sepal.Width" | . , . , Petal.Length , Petal.Width , Sepal.Length , Sepal.Width ) row_keys <- c('id', 'Species') # becomes block_record <- wrapr::qchar_frame( "id" , "Species", "Part" , "Measure", "Value" | . , . , "Petal", "Length" , Petal.Length | . , . , "Petal", …

Easy data access: The advantages of a unique database connection with ODBC and DBI

Every developer has his favorite tools and frameworks to work with when connecting to the database. The number of different front end and back end combinations increases the complexity of the analysis. Any Relational Database Management System (RDBMS) that needs its …

How to do Tamil Text Analysis & NLP in R

udpipe is a beautiful R package for Text Analytics and NLP and helps in Topic Extraction. While most Text Analytics resources online are only about English, this post picks up a different language, Tamil, and fortunately udpipe has got a Tamil Language Model. Loading library(udpipe) Tamil Text Below is part extracted from a Tamil …

Grids and graticules in the tmap package

This vignette builds on the making maps chapter of the Geocomputation with R book. Its goal is to demonstrate how to set and modify grids and graticules in the tmap package. Prerequisites The examples below assume the following packages are attached: library(spData) # example datasets library(tmap) # map creation (>=2.3) library(sf) # spatial data classes The …

Data Chats: From Physics student to Data Science Consultant

Introduction How do you begin a career in analytics and data science? What’s the best way of learning R? Should I still bother with Excel? Arguably, these are some questions that you can gain more insights on by speaking to people than running models. This week, I have the pleasure of speaking with Abhishek Modi, …

Forget about Excel, Use these R Shiny Packages Instead

tl;dr Transferring your Excel sheet to a Shiny app can be the easiest way to create an enterprise-ready dashboard. In this post, I present 6 Shiny alternatives for the table-like data that Excel users love. Intro Excel has its limitations regarding advanced statistics and calculations, quality and version control, user experience and scalability. …

Analyzing a binary outcome arising out of within-cluster, pair-matched randomization

[This article was first published on ouR data generation, and kindly contributed to R-bloggers]. A key motivating factor for the simstudy package and much of …

Spatial regression in R part 1: spaMM vs glmmTMB

Many datasets these days are collected at different locations over space which may generate spatial dependence. Spatial dependence (observations close together are more correlated than those further apart) violates the assumption of independence of the residuals in regression models and requires the use of a special class of models to draw the valid …

Delivering Data Science Without Delivering Software

Data Science Tools (Photo: Author) Do you always need to deliver complete software? From time to time debates such as ‘R vs Python’ or ‘Software Skills vs Statistics Skills’ rear their heads in the Data Science world. These debates sometimes appear to have the hidden assumption that the only possible deliverable for a data scientist …

colorspace @ useR! 2019

[This article was first published on Achim Zeileis, and kindly contributed to R-bloggers]. Conference presentation about the colorspace toolbox for manipulating and assessing color palettes …

Use ExPanD to Create a Notebook for Your EDA

The ‘ExPanDaR’ package offers a toolbox for interactive exploratory data analysis (EDA). You can read more about it here. The ‘ExPanD’ shiny app allows you to customize your analysis to some extent but often you might want to continue and extend your analysis with additional models and visualizations that are not part of the ‘ExPanDaR’ …

Estimating variance: should I use n or n – 1? The answer is not what you think

Estimates of population parameters based on samples are not exact: there is always some error involved. In principle, one can estimate a population parameter with any estimator, but some will be better than others. There is one particular case which was always very confusing to me (because of the multiple alternatives) and that is the …
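For reference, the two estimators named in the title can be written out as follows. This is a sketch assuming an i.i.d. sample x_1, …, x_n with finite variance σ²; the notation is introduced here for illustration, and it does not reproduce the article's own argument or conclusion.

```latex
% Assumption: x_1, \dots, x_n are i.i.d. with mean \mu and variance \sigma^2.
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i, \qquad
s_{n-1}^{2} = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^{2}, \qquad
s_{n}^{2} = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^{2}

\mathbb{E}\bigl[s_{n-1}^{2}\bigr] = \sigma^{2}, \qquad
\mathbb{E}\bigl[s_{n}^{2}\bigr] = \frac{n-1}{n}\,\sigma^{2}
```

The n – 1 version is unbiased, while the n version is biased downward by the factor (n – 1)/n. Which one counts as "better" depends on the criterion used, since unbiasedness is only one way to compare estimators.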

Using Spark from R for performance with arbitrary code – Part 1 – Spark SQL translation, custom functions, and Arrow

Apache Spark is a popular open-source analytics engine for big data processing and thanks to the sparklyr and SparkR packages, the power of Spark is also available to R users. This series of articles will attempt to provide practical insights into using the sparklyr interface to gain the benefits of Apache Spark while still retaining …

Implementing Prophet Time Series Forecasting Model

A step-by-step approach to predict the Bitcoin price for the dummies Photo by Aleksi Räisä on Unsplash Understanding time series data is critical to any kind of business. If you are working with numbers and analytics, more often than not, you will need to solve questions like how many customers will continue buying in …

Explaining Predictions: Random Forest Post-hoc Analysis (randomForestExplainer package)

[This article was first published on R on notast, and kindly contributed to R-bloggers]. Similar to the previous posts, the Cleveland heart dataset will be …

‘There is a game I play’ – Analyzing Metacritic scores for video games

[This article was first published on Rcrastinate, and kindly contributed to R-bloggers]. There is a game I play / try to make myself okay / …

Lesser known dplyr functions

The dplyr package is an essential tool for manipulating data in R. The “Introduction to dplyr” vignette gives a good overview of the common dplyr functions (list taken from the vignette itself): filter() to select cases based on their values. arrange() to reorder the cases. select() and rename() to select variables based on their names. mutate() and transmute() to add new variables that …

Seeking postdoc (or contractor) for next generation Stan language research and development

The Stan group at Columbia is looking to hire a postdoc* to work on the next generation compiler for the Stan open-source probabilistic programming language. Ideally, a candidate will bring language development experience and also have research interests in a related field such as programming languages, applied statistics, numerical analysis, or statistical computation. The language …

Why R?

[This article was first published on R – Win-Vector Blog, and kindly contributed to R-bloggers]. I was working with our copy editor on Appendix A …

It is Time for CRAN to Ban Package Ads

[This article was first published on R – Win-Vector Blog, and kindly contributed to R-bloggers]. NPM (a popular Javascript package repository) just banned package advertisements. …

Break up with Excel: Intro and Advanced R Data Science Courses at MSACL.org Salzburg Austria, September 21–23, 2019

[This article was first published on The Lab-R-torian, and kindly contributed to R-bloggers]. MSACL Conference There are two RStats Data Science courses happening in Salzburg …