Gold-Mining Week 15 (2018)

The post Gold-Mining Week 15 (2018) appeared first on Fantasy Football Analytics. Related R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and … Read more Gold-Mining Week 15 (2018)

RTutor: Better Incentive Contracts For Road Construction

For about two weeks now, I have faced a long additional traffic jam every morning because of a construction site on the road. When I pass the site, often only a few people, or sometimes nobody, seem to be working there. Being an economist, I really wonder how many such traffic jams could be avoided with better … Read more RTutor: Better Incentive Contracts For Road Construction

Recreating the NBA lead tracker graphic

For each NBA game, nba.com has a really nice graphic that tracks the point differential between the two teams throughout the game. Here is the lead tracker graphic for the game between the LA Clippers and the Phoenix Suns on 10 Dec 2018 (taken from https://www.nba.com/games/20181210/LACPHX#/matchup). I thought it would be cool to try recreating … Read more Recreating the NBA lead tracker graphic
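The quantity behind a lead tracker is just a running sum of signed scoring events. A minimal Python sketch, assuming a hypothetical input format of `(elapsed_seconds, team, points)` tuples (this is not nba.com's data format, and `lead_tracker` and `ref_team` are illustrative names):

```python
def lead_tracker(events, ref_team):
    """Return (elapsed_seconds, lead) pairs, where lead = ref_team score minus opponent score.

    events: iterable of (elapsed_seconds, team, points) scoring events (hypothetical format).
    """
    lead = 0
    series = [(0, 0)]  # tip-off: the game starts level
    for t, team, pts in sorted(events):  # process events in time order
        lead += pts if team == ref_team else -pts
        series.append((t, lead))
    return series
```

Because the lead only changes at scoring events, the resulting series is naturally drawn as a step plot (for example with matplotlib's `step`).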

Twins on the up

Are multiple births on the increase? My twin boys turned 5 years old today. Wow, time flies. Life is never dull, because twins are still seen as something of a novelty, so wherever we go, we find ourselves in conversation with strangers, who are intrigued by the whole thing. In order to save time if … Read more Twins on the up

My introductory course on Bayesian statistics

So, after having held workshops introducing Bayes for a couple of years now, I finally pulled myself together and completed my DataCamp course: Fundamentals of Bayesian Data Analysis in R! While it’s called a course, it’s more like a 4-hour workshop and, without requiring anything but basic R skills and a vague … Read more My introductory course on Bayesian statistics

Teaching and Learning Materials for Data Visualization

Data Visualization: A Practical Introduction will begin shipping next week. I’ve written an R package that contains datasets, functions, and a course packet to go along with the book. The socviz package contains about twenty-five datasets and a number of utility and convenience functions. The datasets range in size from things with just a … Read more Teaching and Learning Materials for Data Visualization

Scraping the Turkey Accordion

From the blog R on datawookie. Read more Scraping the Turkey Accordion

Reading List Faster With parallel, doParallel, and pbapply

I have several tables that I would like to load as a single data frame. Functions derived from read.table() have a lot of convenient features, but there seem to be many steps in their implementation that could slow things down. The gain in performance of reading 29 CSV files (about … Read more Reading List Faster With parallel, doParallel, and pbapply
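The underlying pattern is "read each file independently, in parallel, then stack the results". The post does this in R with parallel, doParallel and pbapply; as a rough Python analogue (an assumption, not the author's code), the sketch below reads files concurrently with a thread pool and concatenates the rows. `read_all`, `read_one` and the glob pattern are illustrative names.

```python
import csv
import glob
from concurrent.futures import ThreadPoolExecutor


def read_one(path):
    """Read a single CSV file into a list of dicts keyed by the header row."""
    with open(path, newline="") as fh:
        return list(csv.DictReader(fh))


def read_all(pattern, workers=4):
    """Read every file matching pattern concurrently and stack the rows into one list."""
    paths = sorted(glob.glob(pattern))  # deterministic file order
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in input order, so rows keep the file order
        return [row for table in pool.map(read_one, paths) for row in table]
```

Threads suit this mostly I/O-bound step; for heavy per-file parsing, a process pool (closer to R's parallel clusters) may pay off instead.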

Using ggplot2 for functional time series

I spoke yesterday about using ggplot2 for functional data graphics, rather than the custom-built plotting functionality available in the many functional data packages, including my own rainbow package written with Hanlin Shang. It is a much more powerful and flexible way to work, so I thought it would be useful to share some examples. French … Read more Using ggplot2 for functional time series

Network Centrality in R: New ways of measuring Centrality

This is the third post of a series on the concept of “network centrality” with applications in R and the package netrankr. The last part introduced the concept of neighborhood-inclusion and its implications for centrality. In this post, we extend the concept to a broader class of dominance relations by deconstructing indices into a series of building blocks and … Read more Network Centrality in R: New ways of measuring Centrality

Code for case study – Customer Churn with Keras/TensorFlow and H2O

The code below can be used to recreate all figures and analyses from this book chapter. Because the content is exclusively for the book, my descriptions around the code had to be minimal. But I’m sure you can get the gist, even without the book. Thank you to the following people for … Read more Code for case study – Customer Churn with Keras/TensorFlow and H2O

Geocomputation with R – the afterword

I am extremely proud to announce that Geocomputation with R is complete. It took Robin, Jannes, and me almost 2 years of collaborative planning, writing, refinement, and deployment to make the book available for anyone interested in open source, command-line approaches for handling geographic data. We’re very happy that it’s now ready to present to the world … Read more Geocomputation with R – the afterword

Sharing Modeling Pipelines in R

Reusable modeling pipelines are a practical idea that gets re-developed many times in many contexts. wrapr supplies a particularly powerful pipeline notation, and a pipe-stage re-use system (notes here). We will demonstrate this with the vtreat data preparation system. Our example task is to fit a model on some arbitrary data. Our model will try … Read more Sharing Modeling Pipelines in R

Le Monde puzzle [#1075]

A new Le Monde mathematical puzzle in the digit category: Find the largest number such that each of its internal digits is strictly less than the average of its two neighbours. Same question when all digits differ. For instance, n=96433469 is such a number. When trying pure brute force (with the usual integer2digits function!) le=solz=3 … Read more Le Monde puzzle [#1075]
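The condition stays in integer arithmetic: a digit x is strictly below the mean of its neighbours a and b exactly when 2x < a + b, i.e. when the step from a to x is smaller than the step from x to b. So the successive differences between digits must strictly increase, which bounds how long a qualifying number can be. Rather than the pure brute force mentioned in the post, the sketch below (an assumed approach, not the author's R code) enumerates all qualifying digit strings depth-first; the function names are illustrative.

```python
def is_valid(n: int) -> bool:
    """True if every internal digit of n is strictly below the mean of its two neighbours."""
    d = [int(c) for c in str(n)]
    # x < (a + b) / 2  is equivalent to  2x < a + b
    return all(2 * d[i] < d[i - 1] + d[i + 1] for i in range(1, len(d) - 1))


def valid_digit_strings():
    """Depth-first enumeration of every qualifying digit string (no leading zero)."""
    found = []

    def extend(seq):
        found.append(seq)
        for nxt in range(10):
            # Appending nxt turns seq[-1] into an internal digit, so only it needs checking.
            if len(seq) < 2 or 2 * seq[-1] < seq[-2] + nxt:
                extend(seq + [nxt])

    for first in range(1, 10):
        extend([first])
    return found


def largest(distinct: bool = False) -> int:
    """Largest qualifying number, optionally requiring all digits to differ."""
    best = 0
    for seq in valid_digit_strings():
        if distinct and len(set(seq)) != len(seq):
            continue
        best = max(best, int("".join(map(str, seq))))
    return best
```

With this enumeration, `largest()` returns 96433469, the post's own example, while `largest(distinct=True)` addresses the second question.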

DB connected R application on open-source Shiny server, part 1

As a follow-up to my previous study of Australian politicians on Twitter, I’ve decided to build a more sophisticated, autonomous solution. The idea at a glance: regularly collect tweets from Members of the Australian Parliament, store them in a database, and visualize the findings (kept up to date) in a web dashboard. The goal here is to build a solution that … Read more DB connected R application on open-source Shiny server, part 1

Enter the #DataFramedChallenge for a chance to be on an upcoming podcast segment.

We’ll be back with Season 2 early in 2019 and to keep you thinking, curious and data focused in between seasons, we’re having a DataFramed challenge. The winner will get to join me on a segment here on DataFramed: the challenge is to listen to as many episodes as you can & to tweet excerpts … Read more Enter the #DataFramedChallenge for a chance to be on an upcoming podcast segment.

Reflections on the 10th anniversary of the Revolutions blog

On December 9 2008, very nearly ten years ago, the first post on Revolutions was published. Way back then, this blog was part of a young startup called Revolution Computing, which later became Revolution Analytics. (That name persists to this day in the URL of this blog.) The idea at that time was to introduce … Read more Reflections on the 10th anniversary of the Revolutions blog

5½ Reasons to Ditch Spreadsheets for Data Science: Code is Poetry

The post 5½ Reasons to Ditch Spreadsheets for Data Science: Code is Poetry appeared first on The Lucid Manager. When I studied civil engineering some decades ago, we solved all our computing problems by writing code. Writing in BASIC or PASCAL, I could quickly perform fundamental engineering analysis, such as reinforced concrete beams, with my … Read more 5½ Reasons to Ditch Spreadsheets for Data Science: Code is Poetry

The ‘knight on an infinite chessboard’ puzzle: efficient simulation in R

Previously in this series: I’ve recently been enjoying The Riddler: Fantastic Puzzles from FiveThirtyEight, a wonderful book from 538’s Oliver Roeder. Many of the probability puzzles can be productively solved through Monte Carlo simulations in R. Here’s one that caught my attention: Suppose that a knight makes a “random walk” on an infinite chessboard. Specifically, … Read more The ‘knight on an infinite chessboard’ puzzle: efficient simulation in R

Great post Yash!

Great post Yash! For those readers interested in getting data from the fitbit API using R, I’ve documented the process here: https://towardsdatascience.com/the-gamification-of-fitbit-how-an-api-provided-the-next-level-of-training-eaf7b267af00 Read more Great post Yash!

The Need for Speed Part 1: Building an R Package with Fortran (or C)

Everyone who has ever used R has, at one time or another, wished for an increase in R’s speed. If you haven’t, you’re not using R hard enough! Recently, as part of some research on credibility, I was calculating layer loss costs for millions of simulated loss observations. As I progressed, the R markdown document … Read more The Need for Speed Part 1: Building an R Package with Fortran (or C)

An 8-hour course on R and Data Mining

I will run an 8-hour course on R and Data Mining at Black Mountain, CSIRO, Australia on 10 & 13 December 2018. The course materials, including slides, R scripts and datasets, are available at http://www.rdatamining.com/training/course. Below is an outline of the course. Part I – R Programming: basics of the R language and programming, parallel computing, and data … Read more An 8-hour course on R and Data Mining

CRAN Release of R/exams 2.3-2

A new minor release of the R/exams package is on CRAN, containing a range of smaller improvements and bug fixes. Notably, scanning of written NOPS exams is enhanced and made more reliable, and a new exercise template demonstrates how to use advanced processing of numeric answers in Moodle. Version 2.3-2 of the one-for-all exams generator R/exams has … Read more CRAN Release of R/exams 2.3-2

Canada Map

I taught my Data Visualization seminar in Philadelphia this past Friday and Saturday. It covers most of the content of my book, including a unit on making maps. The examples in the book are from the United States. But what about other places? Two of the participants were from Canada, and so here’s an example … Read more Canada Map

Smartly select and mutate data frame columns, using dict

Motivation: The dplyr functions select and mutate are nowadays commonly used to perform data.frame column operations, frequently combined with magrittr’s forward pipe %>%. While they work well interactively, these methods often require additional checks when used in “serious” code, for example to catch column name clashes. In principle, the container package provides a dict-class … Read more Smartly select and mutate data frame columns, using dict

It was twenty years ago …

… this week that I made a first cameo in the debian/changelog for the Debian R package:

    r-base (0.63.1-1) unstable; urgency=low
      New upstream release
      Linked html directory to /usr/doc/r-base/doc/html (Dirk Eddelbuettel)
     -- Douglas Bates  Fri, 4 Dec 1998 14:22:19 -0600

For the next few years I assisted Doug here and there, and then formally … Read more It was twenty years ago …

Automated Dashboard visualizations with distribution in R

In this article, you learn how to make Automated Dashboard visualizations with distribution in R. First you need to install the `rmarkdown` package into your R library. Assuming you have installed `rmarkdown`, you next create a new `rmarkdown` script in R. After this you type … Read more Automated Dashboard visualizations with distribution in R

“Increase sample size until statistical significance is reached” is not a valid adaptive trial design; but it’s fixable.

TLDR: Begin with N of 10, increase by 10 until p < 0.05 or max N reached. This design has inflated type-I error. Lower p-value threshold needed to ensure specified type-I error rate. The number of interim analyses and max N affect the type-I error rate. Threshold can be identified using simulation. A recent Facebook … Read more “Increase sample size until statistical significance is reached” is not a valid adaptive trial design; but it’s fixable.
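The TLDR describes a concrete procedure, so its type-I error can be checked by simulation. A minimal sketch, assuming a one-sample two-sided z-test on standard normal data with known variance (the post’s actual test and settings may differ; all function names are illustrative). Because a trial stops at the first look with p below the threshold, it rejects exactly when its smallest interim p-value is below the threshold, so a calibrated threshold is simply a lower quantile of the simulated minimum p-values.

```python
import random
from statistics import NormalDist

_PHI = NormalDist()  # standard normal distribution


def min_p_values(n_sims, start=10, step=10, max_n=100, seed=1):
    """For each simulated null trial, return the smallest p-value across all interim looks."""
    rng = random.Random(seed)
    looks = list(range(start, max_n + 1, step))
    out = []
    for _ in range(n_sims):
        total, n, best = 0.0, 0, 1.0
        for look in looks:
            while n < look:
                total += rng.gauss(0.0, 1.0)  # data under H0: mean 0, sd 1
                n += 1
            z = total / n ** 0.5  # mean * sqrt(n) / sigma, with sigma = 1
            p = 2 * (1 - _PHI.cdf(abs(z)))  # two-sided p-value
            best = min(best, p)
        out.append(best)
    return out


def type1_error(min_ps, threshold=0.05):
    """Share of null trials that would stop for 'significance' at some look."""
    return sum(p < threshold for p in min_ps) / len(min_ps)


def calibrated_threshold(min_ps, alpha=0.05):
    """A nominal per-look threshold keeping the simulated type-I error at or below alpha."""
    s = sorted(min_ps)
    return s[int(alpha * len(s))]
```

With a single look (`max_n=10`) the simulated error sits near the nominal 0.05; with ten looks it is well above it, and `calibrated_threshold` returns the lower per-look cut-off needed to restore the target rate.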

Automated Dashboard Visualizations with Ranking in R

In this article, you learn how to make Automated Dashboard Visualizations with Ranking in R. First you need to install the `rmarkdown` package into your R library. Assuming you have installed `rmarkdown`, you next create a new `rmarkdown` script in R. After this you type … Read more Automated Dashboard Visualizations with Ranking in R

Shinyfit: Advanced regression modelling in a shiny app

Many of our projects involve getting doctors, nurses, and medical students to collect data on the patients they are looking after. We want to involve many of them in data analysis, without the requirement for coding experience or access to statistical software. To achieve this we have built Shinyfit, a shiny app for linear, logistic, … Read more Shinyfit: Advanced regression modelling in a shiny app

R community update: announcing useR Delhi December meetup and CFP

Time really does fly. It’s been 5 months since the Delhi NCR useR group came into being and held its first meetup. It was a successful event, which included sessions featuring an R-core member and a veteran data scientist. More importantly, the 50+ community members who’d turned up took part in stimulating discussions and got to … Read more R community update: announcing useR Delhi December meetup and CFP

Map the solar system to a place near you –A NatGeo’s MARS inspired Shiny web app

Nov 27, 2018. I recently leveled up to fatherhood. That’s why I am currently on 5 months of parental leave (thanks to the awesome team @store2be for going along with this!). Every morning at around 5am, I leave the bedroom with my son for the kitchen so his mom can have two real hours of … Read more Map the solar system to a place near you –A NatGeo’s MARS inspired Shiny web app

R Functions for Bayesian Stats and Summaries

A new update of my sjstats package just arrived on CRAN. This blog post demonstrates the functions of the sjstats package that deal specifically with Bayesian models. The update contains some new and some revised functions to compute summary statistics of Bayesian models, which are now described in more detail.

R vs Python: Image Classification with Keras

Many data professionals are strict about the language used for ANN models, limiting their development environment exclusively to Python. I decided to test the performance of Python vs. R in terms of the time required to train a convolutional neural network model for image recognition. As the starting point, I took the blog post … Read more R vs Python: Image Classification with Keras

Automatic GPUs

A reproducible R / Python approach to getting up and running quickly on GCloud with GPUs in Tensorflow. Backstory: After recently completing Google’s excellent Data Engineering Certified Specialization on Coursera (which I highly recommend), I … Read more Automatic GPUs