Data, movies and ggplot2

Yet another boring barplot? No! I’ve asked my students from MiNI WUT to visualize some data about their favorite movies or series. The results are pretty awesome. Believe it or not, the charts in these posters were created with ggplot2 (most of them)! Star Wars Fan of StaR WaRs? Find out which color is the most popular for lightsabers! Yes, these … Read more Data, movies and ggplot2

Spinning Pins

Condemned to spend my whole life preparing some farewell (Desarraigo, Extremoduro) I live just a few minutes from the Spanish National Museum of Science and Technology (MUNCYT), where I go from time to time with my family. The museum is full of interesting artifacts, from a portrait of Albert Einstein made with thousands … Read more Spinning Pins

Alternative approaches to scaling Shiny with RStudio Shiny Server, ShinyProxy or custom architecture.

Shiny is a great tool for fast prototyping. When a data science team creates a Shiny app, it sometimes becomes very popular. From that point on, the app is a tool used in production by many people, so it should be reliable and fast for many concurrent users. There are many ways to optimize a Shiny app like … Read more Alternative approaches to scaling Shiny with RStudio Shiny Server, ShinyProxy or custom architecture.

vtreat Variable Importance

vtreat’s purpose is to produce pure numeric R data.frames that are ready for supervised predictive modeling (predicting a value from other values). By ready we mean: a purely numeric data frame with no missing values and a reasonable number of columns (missing values re-encoded with indicators, and high-degree categorical variables re-encoded by effects codes or impact codes). … Read more vtreat Variable Importance

Statistics in Glaucoma: Part III

Samuel Berchuck is a Postdoctoral Associate in Duke University’s Department of Statistical Science and Forge-Duke’s Center for Actionable Health Data Science. Joshua L. Warren is an Assistant Professor of Biostatistics at Yale University. Looking Forward in Glaucoma Progression Research The contribution of the womblR package and corresponding statistical methodology is a technique for correctly accounting … Read more Statistics in Glaucoma: Part III

rcites – The story behind the package

The Ecology Hackathon Almost one year ago now, ecologists filled a room for the “Ecology Hackathon: Developing R Packages for Accessing, Synthesizing and Analyzing Ecological Data” that was co-organised by rOpenSci Fellow, Nick Golding and Methods in Ecology and Evolution. This hackathon was part of the “Ecology Across Borders” Joint Annual Meeting 2017 of BES, … Read more rcites – The story behind the package

An R Shiny app to recognize flower species

Introduction Playing around with PyTorch and R Shiny resulted in a simple Shiny app where the user can upload a flower image and the system will then predict the flower species. Steps that I took Download labeled flower data from the Visual Geometry Group, Install PyTorch and download their transfer learning tutorial script, You need to … Read more An R Shiny app to recognize flower species

Phillips-Ouliaris Test For Cointegration

In a project developing PPNR balance projection models, I tried to use the Phillips-Ouliaris (PO) test to investigate the cointegration between the historical balance and a set of macro-economic variables, and noticed that implementations of the PO test in various R packages, e.g. urca and tseries, give different results. After reading through the … Read more Phillips-Ouliaris Test For Cointegration

2018-13 Rendering HTML Content in R Graphics

This report describes several R packages that allow HTML content to be rendered as part of an R plot. The core package is called ‘layoutEngine’, but that package requires a “backend” package to perform HTML layout calculations. Three example backends are demonstrated: ‘layoutEngineCSSBox’, ‘layoutEnginePhantomJS’, and ‘layoutEngineDOM’. We also introduce two new font packages, ‘gyre’ and … Read more 2018-13 Rendering HTML Content in R Graphics

Minimum CRPS vs. maximum likelihood

In a new paper in Monthly Weather Review, minimum CRPS and maximum likelihood estimation are compared for fitting heteroscedastic (or nonhomogeneous) regression models under different response distributions. Minimum CRPS is more robust to distributional misspecification, while maximum likelihood is slightly more efficient under correct specification. An R implementation is available in the crch package. Citation … Read more Minimum CRPS vs. maximum likelihood
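The two estimation criteria named in the excerpt can be sketched in base R. This is a simplified illustration, not the crch API and not the paper's setup: it assumes a normal response with constant mean and scale (no regressors), simulated data with made-up parameter values, and hypothetical helper names (`crps_norm`, `mean_crps`, `neg_loglik`). The CRPS formula used is the standard closed form for a normal forecast.

```r
set.seed(1)
y <- rnorm(500, mean = 2, sd = 1.5)   # simulated data; values are hypothetical

# Closed-form CRPS of a normal forecast N(mu, sigma^2) evaluated at y.
crps_norm <- function(y, mu, sigma) {
  z <- (y - mu) / sigma
  sigma * (z * (2 * pnorm(z) - 1) + 2 * dnorm(z) - 1 / sqrt(pi))
}

# Both objectives take par = c(mu, log(sigma)); the log keeps sigma positive.
mean_crps  <- function(par) mean(crps_norm(y, par[1], exp(par[2])))
neg_loglik <- function(par) -sum(dnorm(y, par[1], exp(par[2]), log = TRUE))

start    <- c(mean(y), log(sd(y)))
fit_crps <- optim(start, mean_crps)    # minimum CRPS estimate
fit_ml   <- optim(start, neg_loglik)   # maximum likelihood estimate

# Collect both estimates on the original (mu, sigma) scale.
est <- rbind(crps = c(fit_crps$par[1], exp(fit_crps$par[2])),
             ml   = c(fit_ml$par[1],   exp(fit_ml$par[2])))
colnames(est) <- c("mu", "sigma")
```

Under a correctly specified normal model, as here, the two rows of `est` should be close to each other; the robustness difference described in the excerpt would only show up when the assumed distribution is wrong.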

Quoting Concatenate

In our last note we used wrapr::qe() to help quote expressions. In this note we will discuss quoting and code-capturing interfaces (interfaces that capture user source code) a bit more. My position on code-capturing interfaces (or non-standard-evaluation/NSE) is: if poorly handled, they can be a large interface price/risk to pay for the minor convenience of … Read more Quoting Concatenate

Word associations from the Small World of Words

Do you subscribe to the Data is Plural newsletter from Jeremy Singer-Vine? You probably should, because it is a treasure trove of interesting datasets arriving in your email inbox. In the November 28 edition, Jeremy linked to the Small World of Words project, and I was entranced. I love stuff like that, all about words … Read more Word associations from the Small World of Words

Request for comments on planned features for futile.logger 1.5

I will be pushing a new version of futile.logger (version 1.5) to CRAN in January. This version introduces a number of enhancements and fixes some bugs. It will also contain at least one breaking change. I am making the release process public, since the package is now used in a number of other packages. If … Read more Request for comments on planned features for futile.logger 1.5

Advent of Code: Most Popular Languages

You might have heard of the Advent of Code, a 25-day challenge involving a programming puzzle a day, to be solved with the language of your choice. I’ve noted the popularity of this activity in my Twitter timeline but also in my GitHub timeline where I’ve seen the creation of a few advent-of-code or so repositories. AoC is largely … Read more Advent of Code: Most Popular Languages

Learning R: A gentle introduction to higher-order functions

Have you ever thought about why the definition of a function in R is different from many other programming languages? The part that causes the biggest difficulties (especially for beginners of R) is that you state the name of the function at the beginning and use the assignment operator – as if functions were like … Read more Learning R: A gentle introduction to higher-order functions
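A minimal base R sketch of the idea in the excerpt: a function is a value bound to a name with the assignment operator, so it can be passed to and returned from other functions. The names `square`, `apply_twice` and `make_adder` are illustrative and do not come from the original post.

```r
# A function is an ordinary value bound to a name with the assignment operator.
square <- function(x) x^2

# A higher-order function takes another function as an argument ...
apply_twice <- function(f, x) f(f(x))

# ... or returns a new function (here, a closure that remembers n).
make_adder <- function(n) function(x) x + n

squares  <- sapply(1:3, square)     # apply square to each element
twice    <- apply_twice(square, 3)  # square(square(3))
add_five <- make_adder(5)
shifted  <- Map(add_five, 1:3)      # list with one shifted value per input
```

`sapply` and `Map` are themselves higher-order functions from base R, which is why no loop is needed here.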

In case you missed it: November 2018 roundup

Related To leave a comment for the author, please follow the link and comment on their blog: Revolutions. R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, … Read more In case you missed it: November 2018 roundup

My book ‘Deep Learning from first principles: Second Edition’ now on Amazon

The second edition of my book ‘Deep Learning from first principles: Second Edition - In vectorized Python, R and Octave’ is now available on Amazon, in both paperback ($14.99) and Kindle ($9.99/Rs449/-) versions. Since this book is almost 70% code, all functions and code snippets have been formatted to use the fixed-width font ‘Lucida Console’. In addition … Read more My book ‘Deep Learning from first principles: Second Edition’ now on Amazon

Pdftools 2.0: powerful pdf text extraction tools

A new version of pdftools has been released to CRAN. Go get it while it’s hot: install.packages("pdftools") This version has two major improvements: low level text extraction and encoding improvements. About PDF textboxes A PDF document may seem to contain paragraphs or tables in a viewer, but this is not actually true. PDF is a … Read more Pdftools 2.0: powerful pdf text extraction tools

Easy CI/CD of GPU applications on Google Cloud including bare-metal using Gitlab and Kubernetes

Summary Are you a data scientist who only wants to focus on modelling and coding and not on setting up a GPU cluster? Then this blog might be interesting for you. We developed an automated pipeline using GitLab and Kubernetes that is able to run code in two GPU environments, GCP and bare-metal; no need … Read more Easy CI/CD of GPU applications on Google Cloud including bare-metal using Gitlab and Kubernetes

Yet another visualization of the Bayesian Beta-Binomial model

The Beta-Binomial model is the “hello world” of Bayesian statistics. That is, it’s the first model you get to run, often before you even know what you are doing. There are many reasons for this: It only has one parameter, the underlying proportion of success, so it’s easy to visualize and reason about. It’s easy … Read more Yet another visualization of the Bayesian Beta-Binomial model
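As a rough illustration of why the model is easy to reason about, the conjugate update can be written in a few lines of base R. The prior and the data below are made-up numbers, and the variable names are illustrative rather than taken from the post.

```r
# Conjugate Beta-Binomial update (illustrative numbers, not from the post).
prior_a <- 1; prior_b <- 1      # Beta(1, 1): a uniform prior on the proportion
successes <- 7; trials <- 10    # hypothetical observed data

# The posterior is again a Beta distribution with updated shape parameters.
post_a <- prior_a + successes
post_b <- prior_b + (trials - successes)

post_mean <- post_a / (post_a + post_b)              # posterior mean
cred_int  <- qbeta(c(0.025, 0.975), post_a, post_b)  # central 95% interval

# Posterior density on a grid, ready for plotting,
# e.g. with plot(grid, dens, type = "l").
grid <- seq(0, 1, length.out = 101)
dens <- dbeta(grid, post_a, post_b)
```

Because the single parameter lives on the interval from 0 to 1, the whole posterior fits on one curve, which is what makes the model convenient to visualize.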

Reusable Pipelines in R

Pipelines in R are popular, the most popular one being magrittr as used by dplyr. This note will discuss the advanced re-usable piping systems: rquery/rqdatatable operator trees and wrapr function object pipelines. In each case we have a set of objects designed to extract extra power from the wrapr dot-arrow pipe %.>%. Piping Piping is … Read more Reusable Pipelines in R

R community update: announcing sessions for useR Delhi December meetup

As referenced in my last blog post, useR Delhi NCR is all set to host our second meetup on 15th December, i.e. upcoming Saturday. We’ve finalized two exciting speaker sessions for the same. They’re as follows: Basics of Shiny and geospatial visualizations by Sean Angiolillo Up in the air: cloud storage based workflows in R … Read more R community update: announcing sessions for useR Delhi December meetup

Gold-Mining Week 15 (2018)

The post Gold-Mining Week 15 (2018) appeared first on Fantasy Football Analytics. … Read more Gold-Mining Week 15 (2018)

RTutor: Better Incentive Contracts For Road Construction

For about two weeks, I have faced a large additional traffic jam every morning due to a construction site on the road. When passing the construction site, often only a few people or sometimes nobody seems to be working there. Being an economist, I really wonder how many such traffic jams could be avoided with better … Read more RTutor: Better Incentive Contracts For Road Construction

Recreating the NBA lead tracker graphic

For each NBA game, nba.com has a really nice graphic which tracks the point differential between the two teams throughout the game. Here is the lead tracker graphic for the game between the LA Clippers and the Phoenix Suns on 10 Dec 2018: Taken from https://www.nba.com/games/20181210/LACPHX#/matchup I thought it would be cool to try recreating … Read more Recreating the NBA lead tracker graphic

Twins on the up

Are multiple births on the increase? My twin boys turned 5 years old today. Wow, time flies. Life is never dull, because twins are still seen as something of a novelty, so wherever we go, we find ourselves in conversation with strangers, who are intrigued by the whole thing. In order to save time if … Read more Twins on the up

My introductory course on Bayesian statistics

So, after having held workshops introducing Bayes for a couple of years now, I finally pulled myself together and completed my DataCamp course: Fundamentals of Bayesian Data Analysis in R! While it’s called a course, it’s more like a 4-hour workshop and, without requiring anything but basic R skills and a vague … Read more My introductory course on Bayesian statistics

Teaching and Learning Materials for Data Visualization

Data Visualization: A Practical Introduction will begin shipping next week. I’ve written an R package that contains datasets, functions, and a course packet to go along with the book. The socviz package contains about twenty-five datasets and a number of utility and convenience functions. The datasets range in size from things with just a … Read more Teaching and Learning Materials for Data Visualization

Scraping the Turkey Accordion

To leave a comment for the author, please follow the link and comment on their blog: R on datawookie. … Read more Scraping the Turkey Accordion

Reading List Faster With parallel, doParallel, and pbapply

I have several tables that I would like to load as a single data frame. Functions derived from read.table() have a lot of convenient features, but it seems like there are a lot of steps in the implementation that would slow things down. The gain in performance of reading 29 CSV files (about … Read more Reading List Faster With parallel, doParallel, and pbapply
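The basic pattern behind loading several tables as one data frame can be sketched in base R. This is a serial baseline under assumed inputs, not the post's benchmark: the temporary files, their names and their columns are invented for the example.

```r
# Create three small CSV files in a temporary directory (stand-ins for real data).
dir <- file.path(tempdir(), "csv_demo")
dir.create(dir, showWarnings = FALSE)
for (i in 1:3) {
  write.csv(data.frame(id = i, value = c(10, 20) * i),
            file.path(dir, sprintf("part_%d.csv", i)), row.names = FALSE)
}

files <- list.files(dir, pattern = "^part_[0-9]+\\.csv$", full.names = TRUE)

# Read every file into a list of data frames, then stack them into one.
tables   <- lapply(files, read.csv)
combined <- do.call(rbind, tables)
```

The `lapply()` step is the part that parallel variants replace: `parallel::parLapply(cl, files, read.csv)` has the same shape given a cluster `cl` created with `parallel::makeCluster()`, and `pbapply::pblapply(files, read.csv)` adds a progress bar. Whether either is faster depends on file size and the number of cores.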

Using ggplot2 for functional time series

I spoke yesterday about using ggplot2 for functional data graphics, rather than the custom-built plotting functionality available in the many functional data packages, including my own rainbow package written with Hanlin Shang. It is a much more powerful and flexible way to work, so I thought it would be useful to share some examples. French … Read more Using ggplot2 for functional time series

Network Centrality in R: New ways of measuring Centrality

This is the third post of a series on the concept of “network centrality” with applications in R and the package netrankr. The last part introduced the concept of neighborhood-inclusion and its implications for centrality. In this post, we extend the concept to a broader class of dominance relations by deconstructing indices into a series of building blocks and … Read more Network Centrality in R: New ways of measuring Centrality

Code for case study – Customer Churn with Keras/TensorFlow and H2O

The code you find below can be used to recreate all figures and analyses from this book chapter. Because the content is exclusively for the book, my descriptions around the code had to be minimal. But I’m sure you can get the gist, even without the book. Thank you to the following people for … Read more Code for case study – Customer Churn with Keras/TensorFlow and H2O

Geocomputation with R – the afterword

I am extremely proud to announce that Geocomputation with R is complete. It took Robin, Jannes, and me almost 2 years of collaborative planning, writing, refinement, and deployment to make the book available for anyone interested in open source, command-line approaches for handling geographic data. We’re very happy that it’s now ready to present to the world … Read more Geocomputation with R – the afterword