PlayerIds

Something that has been talked about a bit recently on Twitter is the use of unique playerIDs so that fan analysts, punters and bloggers can track players through time. There are some things that need to be thought about when creating unique playerIDs for analysis. Let's say you are a user of #fitzRoy we … Read more PlayerIds

Meetup 02-2019 Minutes

Self Service Data Preparation and Data Science (Peter Jeitschko): Peter presented Alteryx, a platform built for Business Analysts to master tasks like data management, data cleaning and modelling. The tool is Windows-only and will be ported to Linux soon. It can connect to multiple data sources and helps Business Analysts to deploy models in … Read more Meetup 02-2019 Minutes

Gabriel and Hugo discuss his role on helping to make the BBC more data informed.

Hugo Bowne-Anderson, the host of DataFramed, the DataCamp podcast, recently interviewed Gabriel Straub, the Head of Data Science and Architecture at the BBC. Here is the podcast link. Hugo: Hi there Gabriel, and welcome to DataFramed. Gabriel: Hello, thanks a lot Hugo for having me. Hugo: Such a pleasure to have you on the show. … Read more Gabriel and Hugo discuss his role on helping to make the BBC more data informed.

First post!

You’ll find this post in your _posts directory. Go ahead and edit it and re-build the site to see your changes. You can rebuild the site in many different ways, but the most common way is to run jekyll serve, which launches a web server and auto-regenerates your site when a file is updated. To … Read more First post!

Direct Optimization of Hyper-Parameter

In the previous post (https://statcompute.wordpress.com/2019/02/03/sobol-sequence-vs-uniform-random-in-hyper-parameter-optimization), it is shown how to identify the optimal hyper-parameter in a General Regression Neural Network by using the Sobol sequence and the uniform random generator respectively through the N-fold cross validation. While the Sobol sequence yields a slightly better performance, outcomes from both approaches are very similar, as shown below … Read more Direct Optimization of Hyper-Parameter

How did Axios rectangle Trump’s PDF schedule? A try with R

Last week, Axios published a very interesting piece reporting on Trump’s private schedule thanks to an insider’s leak. The headlines all were about Trump’s spending more than 60% of his time in “executive time” which admittedly was indeed the most important aspect of the story. I, however, also got curious about Axios’ work to go from the PDF schedules to the … Read more How did Axios rectangle Trump’s PDF schedule? A try with R

Multilevel Modelling in R: Analysing Vendor Data

One of the main limitations of regression analysis is when one needs to examine changes in data across several categories. This problem can be resolved by using a multilevel model, i.e. one that varies at more than one level and allows for variation between … Read more Multilevel Modelling in R: Analysing Vendor Data

Manipulating strings with the {stringr} package

{stringr} contains functions to manipulate strings. In Chapter 10, I will teach you about regular expressions, but the functions contained in {stringr} allow you to already do a lot of work on strings, without needing to be a regular expression expert. I will discuss the most common string operations: detecting, locating, matching, searching and replacing, and extracting/removing strings. … Read more Manipulating strings with the {stringr} package
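
As a rough illustration of the operations listed in that excerpt, here is a small sketch using {stringr}. It assumes the package is installed, and the example strings are made up for this listing rather than taken from the post.

```r
library(stringr)

# Hypothetical example data
x <- c("apple pie", "banana split", "cherry tart")

# Detecting: which strings contain the substring "an"?
has_an <- str_detect(x, "an")

# Locating: start and end position of the first "a" in each string
pos_a <- str_locate(x, "a")

# Replacing: swap the first space for an underscore
underscored <- str_replace(x, " ", "_")

# Extracting: the leading run of lowercase letters (the first word)
first_word <- str_extract(x, "^[a-z]+")

# Removing: drop every lowercase vowel
no_vowels <- str_remove_all(x, "[aeiou]")
```

Only the last two calls lean on a regular expression, and a very simple one; the others work with plain substrings.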

Quick Hit: Speeding Up a Slow/Mundane Task with a Little Rcpp

Over at $DAYJOB’s blog I’ve queued up a post that shows how to use our new opendata package to work with our Open Data portal’s API. I’m not super-sure when it’s going to be posted so keep an RSS reader fixed on https://blog.rapid7.com/ if you’re interested in seeing it (I may make a small note … Read more Quick Hit: Speeding Up a Slow/Mundane Task with a Little Rcpp

Inserting “Edit on GitHub” Buttons in a Single R Markdown Document

As the R Markdown ecosystem becomes larger, users now may encounter situations where they have to make decisions on which output format of R Markdown to use. One may find none of the formats suitable – the features essential to the output document one wants may be scattered across different output formats of R Markdown. Here is … Read more Inserting “Edit on GitHub” Buttons in a Single R Markdown Document

Where the German Companies Are

Last week, the German NGO Open Knowledge Foundation Deutschland e.V. made German Trade Register data available via the project OffeneRegister.de, together with the British NGO opencorporates. While the data from the German Trade Register is publicly available in principle, retrieving the data is a case-by-case activity and is very cumbersome (try for yourself if you … Read more Where the German Companies Are

Real Net Profit: 150% in just 4 Months

Developing a post-commission profitable currency trading model using Pivot Billions and R. Needle, meet haystack. Searching for the right combination of features to make a consistent trading model can be quite difficult and takes many, many iterations. By incorporating Pivot Billions and R into my research process, I was able to dramatically improve the efficiency … Read more Real Net Profit: 150% in just 4 Months

Benchmarking cast in R from long data frame to wide matrix

In my daily work I often have to transform a long table to a wide matrix to accommodate some function. At some stage in my life I came across the reshape2 package, and I have been with that philosophy ever since – I find it makes data wrangling easy and straightforward. I particularly like … Read more Benchmarking cast in R from long data frame to wide matrix
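
To make the long-to-wide transformation concrete, here is a minimal base R sketch. It is not the benchmark from the post and does not use reshape2; the data frame and its column names (sample, gene, value) are hypothetical.

```r
# Long table: one row per (sample, gene) observation (made-up data)
long <- data.frame(
  sample = c("s1", "s1", "s2", "s2", "s3"),
  gene   = c("a",  "b",  "a",  "b",  "a"),
  value  = c(1.5,  2.0,  0.5,  3.0,  4.0)
)

# Wide matrix: one row per sample, one column per gene.
# Combinations that never occur in the long table become NA.
wide <- tapply(long$value, list(long$sample, long$gene), sum)
```

With reshape2 installed, the same shape would typically come from acast(long, sample ~ gene, value.var = "value").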

Deploying an R Shiny App With Docker

If you haven’t heard of Docker, it is a system that allows projects to be split into discrete units (i.e. containers) that each operate within their own virtual environment. Each container has a blueprint written in its Dockerfile that describes all of the operating parameters including operating system and package dependencies/requirements. Docker images are easily … Read more Deploying an R Shiny App With Docker

NSERC – Discovery Grants Program, over the past 5 years

    library(XML)
    library(stringr)
    # Download the NSERC results page and read its HTML tables
    url="http://www.nserc-crsng.gc.ca/NSERC-CRSNG/FundingDecisions-DecisionsFinancement/ResearchGrants-SubventionsDeRecherche/ResultsGSC-ResultatsCSS_eng.asp"
    download.file(url,destfile = "GSC.html")
    tables=readHTMLTable("GSC.html")
    # First two rows are headers; drop them from both columns
    GSC=tables[[1]]$V1
    GSC=as.character(GSC[-(1:2)])
    namesGSC=tables[[1]]$V2
    namesGSC=as.character(namesGSC[-(1:2)])
    # Strip dollar signs and thousands separators, then convert to numeric
    Correction = function(x) as.numeric(gsub('[$,]', '', x))
    YEAR=2013:2018
    for(i in 1:length(YEAR)){ …
Read more NSERC – Discovery Grants Program, over the past 5 years

Launching codecentric.AI Bootcamp course!

Today, I am happy to announce the launch of our codecentric.AI Bootcamp! This bootcamp is a free online course for everyone who wants to learn hands-on machine learning and AI techniques, from basic algorithms to deep learning, computer vision and NLP. However, the course language is German only, but for every chapter I did, you … Read more Launching codecentric.AI Bootcamp course!

Liverpool is the Most Popular City in the World (relative to use as password per inhabitant)

The API of pwnedpasswords.com is quite remarkable. It not only allows you to fetch the results generally obtained by typing in your e-mail into the browser interface and finding out whether or not you’ve been pwned from the comfort of your shell. It further allows you to very simply check whether a certain password has … Read more Liverpool is the Most Popular City in the World (relative to use as password per inhabitant)

Introducing olsrr

I am pleased to announce the olsrr package, a set of tools for improved output from linear regression models, designed keeping in mind beginner/intermediate R users. The package includes: comprehensive regression output; variable selection procedures; heteroskedasticity, collinearity diagnostics and measures of influence; various plots and underlying data. If you know how to build models using lm(), you … Read more Introducing olsrr

“Correlation is not causation”. So what is?

Machine learning applications have been growing in volume and scope rapidly over the last few years. What’s Causal inference, how is it different than plain good ole’ ML and when should you consider using it? In this report I try giving a short and concrete answer by using an example. Imagine we’re tasked by the … Read more “Correlation is not causation”. So what is?

Create data visualizations like BBC News with the BBC’s R Cookbook

If you’re looking for a guide to making publication-ready data visualizations in R, check out the BBC Visual and Data Journalism cookbook for R graphics. Announced in a BBC blog post this week, it provides scripts for making line charts, bar charts, and other visualizations like those below used in the BBC’s data journalism. The cookbook … Read more Create data visualizations like BBC News with the BBC’s R Cookbook

Statswars

I am stuck at home sick today, so I decided to provide a relational analysis of the Stats Package Wars that have been bubbling away for the past week. True in all its details. If you want something slightly more constructive, consider The Plain Person’s Guide to Plain-Text Social Science. … Read more Statswars

An absolute beginner’s guide to creating data frames for a Stack Overflow [r] question

For better or worse I spend some time each day at Stack Overflow [r], reading and answering questions. If you do the same, you probably notice certain features in questions that recur frequently. It’s as though everyone is copying from one source – perhaps the one at the top of the search results. And it … Read more An absolute beginner’s guide to creating data frames for a Stack Overflow [r] question

Investigating words distribution with R – Zipf’s law

Hello again! Typically I would start by describing a complicated problem that can be solved using machine or deep learning methods, but today I want to do something different, I want to show you some interesting probabilistic phenomena! Have you heard of Zipf’s law? I hadn’t until recently. Zipf’s law is an empirical law that … Read more Investigating words distribution with R – Zipf’s law
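
The excerpt is cut off before it states the law, so the following is only a sketch under the usual formulation of Zipf's law: a word's frequency is roughly inversely proportional to its frequency rank, which shows up as a negative slope (near -1 for large corpora) when log frequency is regressed on log rank. The toy sentence is made up and far too short to show the effect properly.

```r
# Toy text (hypothetical); a real analysis would use a large corpus
text <- "the cat and the dog and the bird saw the cat"

# Split into lowercase words and drop empty tokens
words <- strsplit(tolower(text), "[^a-z]+")[[1]]
words <- words[nzchar(words)]

# Word frequencies, most frequent first, and their ranks
freq <- sort(table(words), decreasing = TRUE)
rank <- seq_along(freq)

# Fit a straight line on the log-log scale; the slope summarises
# how quickly frequency falls off with rank
fit <- lm(log(as.numeric(freq)) ~ log(rank))
slope <- unname(coef(fit)[2])
```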

PDSwR2: New Chapters!

We have two new chapters of Practical Data Science with R, Second Edition online and available for review! The newly available chapters cover: Data Engineering And Data Shaping – Explores how to use R to organize or wrangle data into a shape useful for analysis. The chapter covers applying data transforms, data manipulation packages, and … Read more PDSwR2: New Chapters!

Visualizing New York City WiFi Access with K-Means Clustering

Visualization has become a key application of data science in the telecommunications industry. Specifically, telecommunication analysis is highly dependent on the use of geospatial data. This is because telecommunication networks in themselves are geographically dispersed, and analysis of such dispersions can yield valuable insights regarding … Read more Visualizing New York City WiFi Access with K-Means Clustering

rstudio::conf 2019 Workshop materials now available

rstudio::conf 2019 featured 15 workshops on tidyverse, Shiny, R Markdown, modeling and machine learning, deep learning, big data, and what they forgot to teach you about working with R. Some of the new workshops for this year touched on topics like putting Shiny applications into production at scale and R & Tensorflow. The conference also … Read more rstudio::conf 2019 Workshop materials now available

R for Quantitative Health Sciences: An Interview with Jarrod Dalton

This interview came about through researching R-based medical applications in preparation for the upcoming R/Medicine conference. When we discovered the impressive number of Shiny-based Risk Calculators developed by the Cleveland Clinic and implemented in public-facing sites, we wanted to learn more about the influence of R Language in the development of statistical science at this … Read more R for Quantitative Health Sciences: An Interview with Jarrod Dalton

R for trial and model-based cost-effectiveness analysis

9 July 2019, University College London Training event (8 July): Torrington (1-19) B07 – Teal Room in Torrington Place, 1-19, University College London, United Kingdom Main workshop (9 July): Anatomy G29 J Z Young Lecture Theatre, UCL Medical Sciences and Anatomy (https://goo.gl/maps/biryoFc9CiL2), University College London, United Kingdom. Background and objectives It is our pleasure … Read more R for trial and model-based cost-effectiveness analysis

Ideally, this shouldn’t be happening for such a deep network.

Ideally, this shouldn’t be happening for such a deep network. … Read more Ideally, this shouldn’t be happening for such a deep network.

Version 0.7.0 of NIMBLE released

We’ve released the newest version of NIMBLE on CRAN and on our website. NIMBLE is a system for building and sharing analysis methods for statistical models, especially for hierarchical models and computationally-intensive methods (such as MCMC and SMC). Version 0.7.0 provides a variety of new features, as well as various bug fixes. New features include: … Read more Version 0.7.0 of NIMBLE released

The finalfit tables gallery has all the variations you could possibly want

The new finalfit tables gallery vignette is an excellent reference and quick tutorial describing the variety of table outputs available. It focuses on crosstables and regression tables, and demonstrates how to easily generate results in R and export them to Word, PDF or HTML. https://finalfit.org/articles/tables_gallery.html … Read more The finalfit tables gallery has all the variations you could possibly want

Using the uniform sum distribution to introduce probability

I’ve never taught an intro probability/statistics course. If I ever did, I would certainly want to bring the underlying wonder of the subject to life. I’ve always found it almost magical the way mathematical formulation can be mirrored by computer simulation, the way proof can be guided by observed data generation processes, and the way … Read more Using the uniform sum distribution to introduce probability
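
To illustrate how simulation can mirror the mathematics here, a small sketch follows. It is not the post's own code. It relies on the standard result that the sum of n independent Uniform(0, 1) variables has mean n/2 and variance n/12; the values of n, the number of replications and the seed are arbitrary choices.

```r
set.seed(1)

n    <- 3        # number of uniforms added together
reps <- 100000   # number of simulated sums

# Each row holds n independent Uniform(0, 1) draws; sum across each row
sums <- rowSums(matrix(runif(n * reps), ncol = n))

# Simulated moments
sim_mean <- mean(sums)
sim_var  <- var(sums)

# Theoretical moments of the uniform sum distribution
theo_mean <- n / 2
theo_var  <- n / 12
```

Comparing sim_mean with theo_mean and sim_var with theo_var shows the simulation settling close to the formulas as reps grows.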

Brandeis and Hugo discuss people of color and under-represented groups in data science.

Hugo Bowne-Anderson, the host of DataFramed, the DataCamp podcast, recently interviewed Brandeis Marshall, Associate Professor of Computer Science in the Computer and Information Sciences Department at Spelman College. Here is the podcast link. Hugo: Hi there, Brandeis, and welcome to DataFramed. Brandeis: Well, thank you. Wonderful to be here. Hugo: It’s such a pleasure to … Read more Brandeis and Hugo discuss people of color and under-represented groups in data science.

Organizing R Research Projects: CPAT, A Case Study

Months ago, I asked a question to the community: how should I organize my R research projects? After writing that post, doing some reading, then putting a plan in practice, I now have my own answer. First, some background. In the early months of 2016 I began a research project with my current Ph.D. advisor … Read more Organizing R Research Projects: CPAT, A Case Study

An overview of the NLP ecosystem in R (#nlproc #textasdata)

At BNOSAC, R is used a lot to perform text analytics as it is an excellent tool that provides everything a data scientist needs to perform data analysis on text in a business setting. For users unfamiliar with all the possibilities that the wealth of R packages offers regarding text analytics, we’ve made this small … Read more An overview of the NLP ecosystem in R (#nlproc #textasdata)

Dashboard for Sales Trends in Retail

Overview: Retail is probably the most talked about industry when it comes to disruption these days. Empty malls are a common blog topic and an unusually high number of bankruptcies spans all subsectors. Some of the familiar names that filed for bankruptcy in the last few years range from the well-known Sears, ToysRUs and Limited Brands to … Read more Dashboard for Sales Trends in Retail

WooCommerce Image Gallery | Step by Step, Automate with R

Setting up a WooCommerce image gallery for your shop is a grueling process if you use the online forms. Thankfully, you can import goods and set up an image gallery using a simple CSV file. Now, if you have a few products and a few images for each product, preparing the CSV for bulk import using … Read more WooCommerce Image Gallery | Step by Step, Automate with R

Sobol Sequence vs. Uniform Random in Hyper-Parameter Optimization

Tuning hyper-parameters might be the most tedious yet crucial task in various machine learning algorithms, such as neural networks, SVM, or boosting. The configuration of hyper-parameters not only impacts the computational efficiency of a learning algorithm but also determines its prediction accuracy. Thus far, manual tuning and grid searching are still the most prevailing strategies. In … Read more Sobol Sequence vs. Uniform Random in Hyper-Parameter Optimization

The Face of (Dis)Agreement – Intraclass Correlations

I was recently introduced to Google Dataset Search, an extension that searches for open access datasets. There I stumbled upon this dataset on children’s and adults’ ratings of facial expressions. The data comes from a published article by Vesker et al. (2018). Briefly, this study involved having adults and 9-year-old children rate a series of … Read more The Face of (Dis)Agreement – Intraclass Correlations

Building a shiny app to explore historical newspapers: a step-by-step guide

Step 2: Joining the data and the metadata Now that I extracted the data from the ALTO files, and the metadata from the METS files, I still need to join both data sets and do some cleaning. What is the goal of joining these two sources? Remember, by doing this I will know which words come from … Read more Building a shiny app to explore historical newspapers: a step-by-step guide

Send UDP Probes (with payloads) and Receive/Process Responses in R

We worked pretty hard over at $DAYJOB on helping to quantify and remediate a fairly significant configuration weakness in Ubiquiti network gear attached to the internet. Ubiquiti network gear (routers, switches, wireless access points, etc.) are enterprise-grade components and are a joy to work with. Our home network is liberally populated … Read more Send UDP Probes (with payloads) and Receive/Process Responses in R

Function Objects and Pipelines in R

Composing functions and sequencing operations are core programming concepts. Some notable realizations of sequencing or pipelining operations include: The idea is: many important calculations can be considered as a sequence of transforms applied to a data set. Each step may be a function taking many arguments. It is often the case that only one of … Read more Function Objects and Pipelines in R
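
The idea of a calculation as a sequence of transforms can be sketched in base R as below. This is an illustration only, not the post's implementation; the helpers compose, scale_by and clip_at are hypothetical names. Steps that need several arguments have their extra arguments fixed up front, so each step takes only the data.

```r
# Build one function out of several single-argument steps,
# applied left to right
compose <- function(...) {
  steps <- list(...)
  function(x) Reduce(function(acc, f) f(acc), steps, x)
}

# Step factories: fix the extra arguments now, pass the data later
scale_by <- function(k) function(x) x * k
clip_at  <- function(lo, hi) function(x) pmin(pmax(x, lo), hi)

# Pipeline: double the values, clip them to [0, 5], then add them up
pipeline <- compose(scale_by(2), clip_at(0, 5), sum)
result <- pipeline(c(-1, 1, 2, 4))
```

Because the pipeline is itself an ordinary function, it can be stored, passed around and reused on other data.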

Retail Data Visualization with R and Shiny

Introduction: Because of my marketing background, finding information hidden within a marketing dataset is always an interesting topic to me. It gives me a sense of accomplishment when I clean up a very messy large dataset and finally discover some insights from it. Therefore, I’ve decided to practice my skills of data cleaning and … Read more Retail Data Visualization with R and Shiny