Visualizing New York City WiFi Access with K-Means Clustering

Visualization has become a key application of data science in the telecommunications industry. Specifically, telecommunication analysis is highly dependent on the use of geospatial data. This is because telecommunication networks in themselves are geographically dispersed, and analysis of such dispersions can yield valuable insights regarding … Read more

Categories R Tags ExcerptFavorite

rstudio::conf 2019 Workshop materials now available

rstudio::conf 2019 featured 15 workshops on tidyverse, Shiny, R Markdown, modeling and machine learning, deep learning, big data, and what they forgot to teach you about working with R. Some of the new workshops for this year touched on topics like putting Shiny applications into production at scale and R & Tensorflow. The conference also … Read more

R for Quantitative Health Sciences: An Interview with Jarrod Dalton

This interview came about through researching R-based medical applications in preparation for the upcoming R/Medicine conference. When we discovered the impressive number of Shiny-based Risk Calculators developed by the Cleveland Clinic and implemented in public-facing sites, we wanted to learn more about the influence of the R language in the development of statistical science at this … Read more

R for trial and model-based cost-effectiveness analysis

9 July 2019, University College London. Training event (8 July): Torrington (1-19) B07 – Teal Room in Torrington Place, 1-19, University College London, United Kingdom. Main workshop (9 July): Anatomy G29 J Z Young Lecture Theatre, UCL Medical Sciences and Anatomy (https://goo.gl/maps/biryoFc9CiL2), University College London, United Kingdom. Background and objectives: It is our pleasure … Read more

Ideally, this shouldn’t be happening for such a deep network.

Ideally, this shouldn’t be happening for such a deep network. … Read more

Version 0.7.0 of NIMBLE released

We’ve released the newest version of NIMBLE on CRAN and on our website. NIMBLE is a system for building and sharing analysis methods for statistical models, especially for hierarchical models and computationally-intensive methods (such as MCMC and SMC). Version 0.7.0 provides a variety of new features, as well as various bug fixes. New features include: … Read more

The finalfit tables gallery has all the variations you could possibly want

The new finalfit tables gallery vignette is an excellent reference and quick tutorial describing the variety of table outputs available. It focuses on crosstables and regression tables, and demonstrates how to easily generate results in R and export them to Word, PDF or HTML. https://finalfit.org/articles/tables_gallery.html … Read more

Learning Data Science: Modelling Basics

Data Science is all about building good models, so let us start by building a very simple model: we want to predict monthly income from age (in a later post we will see that age is indeed a good predictor for income). For illustrative purposes we just make up some numbers for age and income, … Read more

Using the uniform sum distribution to introduce probability

I’ve never taught an intro probability/statistics course. If I ever did, I would certainly want to bring the underlying wonder of the subject to life. I’ve always found it almost magical the way mathematical formulation can be mirrored by computer simulation, the way proof can be guided by observed data generation processes, and the way … Read more

Brandeis and Hugo discuss people of color and under-represented groups in data science.

Hugo Bowne-Anderson, the host of DataFramed, the DataCamp podcast, recently interviewed Brandeis Marshall, Associate Professor of Computer Science in the Computer and Information Sciences Department at Spelman College. Here is the podcast link. Hugo: Hi there, Brandeis, and welcome to DataFramed. Brandeis: Well, thank you. Wonderful to be here. Hugo: It’s such a pleasure to … Read more

Organizing R Research Projects: CPAT, A Case Study

Months ago, I asked a question to the community: how should I organize my R research projects? After writing that post, doing some reading, then putting a plan in practice, I now have my own answer. First, some background. In the early months of 2016 I began a research project with my current Ph.D. advisor … Read more

An overview of the NLP ecosystem in R (#nlproc #textasdata)

At BNOSAC, R is used a lot to perform text analytics as it is an excellent tool that provides anything a data scientist needs to perform data analysis on text in a business setting. For users unfamiliar with all the possibilities that the wealth of R packages offers regarding text analytics, we’ve made this small … Read more

Dashboard for Sales Trends in Retail

Overview Retail is probably the most talked about industry when it comes to disruption these days. Empty malls are a common blog topic and an unusually high number of bankruptcies spans all subsectors. Some of the familiar names that filed for bankruptcy in the last few years range from the well-known Sears, ToysRUs, and Limited Brands to … Read more

WooCommerce Image Gallery | Step by Step, Automate with R

Setting up a WooCommerce image gallery for your shop is a grueling process if you use the online forms. Thankfully, you can import goods and set up an image gallery using a simple CSV file. Now, if you have a few products and a few images for each product, preparing the CSV for bulk import using … Read more

Sobol Sequence vs. Uniform Random in Hyper-Parameter Optimization

Tuning hyper-parameters might be the most tedious yet crucial step in various machine learning algorithms, such as neural networks, SVM, or boosting. The configuration of hyper-parameters not only impacts the computational efficiency of a learning algorithm but also determines its prediction accuracy. Thus far, manual tuning and grid searching are still the most prevalent strategies. In … Read more

The Face of (Dis)Agreement – Intraclass Correlations

I was recently introduced to Google Dataset Search, an extension that searches for open access datasets. There I stumbled upon this dataset on children’s and adults’ ratings of facial expressions. The data comes from a published article by Vesker et al. (2018). Briefly, this study involved having adults and 9-year-old children rate a series of … Read more

A little trick for debugging Shiny

This is gonna be a short post about a little trick I’ve been using while developing Shiny Apps. (Spoiler: nothing revolutionary) A browser anywhere, anytime The first thing to do is to insert an action button, and a browser() in the observeEvent() watching this button. This is a standard approach: at any time, you just … Read more

Send UDP Probes (with payloads) and Receive/Process Responses in R

We worked pretty hard over at $DAYJOB on helping to quantify and remediate a fairly significant configuration weakness in Ubiquiti network gear attached to the internet. Ubiquiti network gear (routers, switches, wireless access points, etc.) consists of enterprise-grade components that are a joy to work with. Our home network is liberally populated … Read more

Function Objects and Pipelines in R

Composing functions and sequencing operations are core programming concepts. Some notable realizations of sequencing or pipelining operations include: The idea is: many important calculations can be considered as a sequence of transforms applied to a data set. Each step may be a function taking many arguments. It is often the case that only one of … Read more

Retail Data Visualization with R and Shiny

Introduction Because of my marketing background, finding information hiding within a marketing dataset is always an interesting topic to me. It gives me a sense of accomplishment when I clean up a very messy, large dataset and finally discover some insights from it. Therefore, I’ve decided to practice my skills of data cleaning and … Read more

Building Our Own Open Source Supercomputer with R and AWS

How to build a scalable computing cluster on AWS and run hundreds or thousands of models in a short amount of time. We will completely rely on R and open source R packages. This is post 1 out of 2. Introduction An ever-increasing number of businesses is moving to the cloud and using platforms such as Amazon Web … Read more

R Package Update: urlscan

The urlscan package (an interface to the urlscan.io API) is now at version 0.2.0 and supports urlscan.io’s authentication requirement when submitting a link for analysis. The service is handy if you want to learn about the details — all the gory technical details — for a website. For instance, say you wanted to check on … Read more

Synthesising Multiple Linked Data Sets and Sequences in R

In my last post I looked at generating synthetic data sets with the ‘synthpop’ package, some of the challenges and neat things the package can do. It is simple to use which is great when you have a single data set with independent features. This post will build on the last post by tackling other … Read more

Multiple Data (Time Series) Streams Clustering

To leave a comment for the author, please follow the link and comment on their blog: Peter Laurinec. … Read more

Navigate through Decennial Census and American Community Survey

Finding the right content in census data can be daunting. Just to give you an idea of how complex the census data are, there are 1127 tables and 25070 columns of table contents in the 2012-2017 ACS 5-year summary file alone. 2010 decennial census summary file 1: 333 tables, 8959 columns. 2012-2017 5-year ACS summary file: 1127 tables, 25070 columns. 2017 … Read more

The power of tapping into your community for support

This week the owner of my favorite Mexican restaurant in Baltimore, Rosalyn Vera, got death and arson threats. I could have been a bystander, but I tapped into my network and asked for help and she has received it. It’s been great to see the power of the community in action. The backstory So, I … Read more

Homebrew 2.0.0 Released == homebrewanalytics package updated

A major new release of Homebrew has landed and now includes support for Linux as well as Windows (via the Windows Subsystem for Linux)! There are overall stability and speed improvements baked in as well. The aforelinked notification has all the info you need to see the minutiae. Unless you’ve been super-lax in updating, brew … Read more

Simulating the Six Nations 2019 Rugby Tournament in R

I really like running simulation models before sporting events because they can give you so much more depth of understanding compared to the ‘raw’ odds that you get from the media or bookmakers, etc. Yes, we might hear that a team has a “30% chance of winning a tournament”. But there might be another strong … Read more

Setting up your blog with RStudio and blogdown II: Workflow

Workflow In Part I of this series of posts we set up our new blog using blogdown and Hugo. Once the blog is configured, this is the typical workflow I follow to write new posts and update my blog online: Open your blog project with RStudio. Load the blogdown library and start the Hugo server and … Read more

Tutorial: Sequential Pattern Mining in R for Business Recommendations

by Allison Koenecke, Data Scientist, AI & Research Group at Microsoft, with acknowledgements to Amita Gajewar and John-Mark Agosta. In this tutorial, Allison Koenecke demonstrates how Microsoft could recommend to customers the next set of services they should acquire as they expand their use of the Azure Cloud, by using a temporal extension to conventional … Read more

Mandalaxies

One cannot escape the feeling that these mathematical formulas have an independent existence and an intelligence of their own, that they are wiser than we are, wiser even than their discoverers (Heinrich Hertz) I love spending my time doing mathematics: transforming formulas into drawings, experimenting with paradoxes, learning new techniques … and R is a perfect … Read more

dqrng v0.0.5: New and updated RNGs

A new version of dqrng has made it onto the CRAN servers after a brief hiccup. Thanks to the CRAN team in general and Uwe Ligges in particular for their relentless efforts. This version adds a new RNG to be used together with the provided distribution functions: The 64 bit version of the 20 rounds … Read more

recogeo: A new R package to reconcile changing geographies boundaries (and corresponding variables)

Demographic information is usually reported in relation to precise boundaries: administrative, electoral, statistical, etc. Comparing demographic information reported at different points in time is often problematic because boundaries keep changing. The recogeo package facilitates reconciling boundaries and their data by a spatial analysis of the boundaries of two different periods. In this post, I explain … Read more

Quantile regression in R

Quantile regression: what is it? Let y be some response variable of interest, and let x be a vector of features or predictors that we want to use to model the response. In linear regression, we are trying to estimate the conditional mean function, E[y | x], by a linear combination of the features. While the conditional mean function … Read more

rOpenSci Software Peer Review: Still Improving

rOpenSci’s suite of packages is comprised of contributions from staff engineers and the wider R community, bringing considerable diversity of skills, expertise and experience to bear on the suite. How do we ensure that every package is held to a high standard? That’s where our software review system comes into play: packages contributed by the … Read more

How GPL makes me leave R for Python :-(

Being a data scientist in a startup I can program with several languages, but often R is a natural choice. Recently I wanted my company to build a product based on R. It simply seemed like a perfect fit. But this turned out to be a slippery slope into the open-source code licensing field, which … Read more

Book review: Beyond Spreadsheets with R

Disclaimer: Manning publications gave me the ebook version of Beyond Spreadsheets with R – A beginner’s guide to R and RStudio by Dr. Jonathan Carroll free of charge. Beyond Spreadsheets with R shows you how to take raw data and transform it for use in computations, tables, graphs, and more. You’ll build on simple programming techniques … Read more

Announcing new software peer review editors: Melina Vidoni and Brooke Anderson

We are pleased to welcome Brooke Anderson and Melina Vidoni to our team of Associate Editors for rOpenSci Software Peer Review. They join Scott Chamberlain, Anna Krystalli, Lincoln Mullen, Karthik Ram, Noam Ross and Maëlle Salmon. With the addition of Brooke and Melina, our editorial board now includes four women and four men, located in … Read more

Using Data Science to read 10 years of Luxembourgish newspapers from the 19th century

I have been playing around with historical newspaper data (see here and here). I have extracted the data from the largest archive available, as described in the previous blog post, and now created a shiny dashboard where it is possible to visualize the most common words per article, as well as read a summary of each article. The summary was made … Read more

missing digit in a 114 digit number [a Riddler’s riddle]

A puzzling riddle from The Riddler (as Le Monde had a painful geometry riddle this week): this number with 114 digits 530,131,801,762,787,739,802,889,792,754,109,70?,139,358,547,710,066,257,652,050,346,294,484,433,323,974,747,960,297,803,292,989,236,183,040,000,000,000 is missing one digit and is a product of some of the integers between 2 and 99. By comparison, 76! and 77! have 112 and 114 digits, respectively, while 99! has 156 digits. … Read more

Fast Static Maps Built with R

Luke Whyte posted an article (apologies for a Medium link) over on Towards Data Science showing how to use a command line workflow involving curl, node and various D3 libraries and javascript source files to build a series of SVG static maps. It’s well written and you should give it a read especially since he … Read more

Price’s Protein Puzzle: 2019 update

Chains of amino acids strung together make up proteins and since each amino acid has a 1-letter abbreviation, we can find words (English and otherwise) in protein sequences. I imagine this pursuit began as soon as proteins were first sequenced, but the first reference to protein word-finding as a sport is, to my knowledge, “Price’s … Read more

R Markdown Template for Business Reports

In this post I’d like to introduce the R Markdown template for business reports by INWTlab. It’s been my aim to have a nice and clean template that is easy to customize in colors, cover and logo. I know there are quite a few templates available, but I was missing one to be used in … Read more

Quick Hit: Using seymour to Subscribe to your Git[la|hu]b Repo Issues in Feedly

The seymour Feedly API package has been updated to support subscribing to RSS/Atom feeds. Previously the package was intended to just treat your Feedly as a data source, but there was a compelling use case for enabling subscription support: subscribing to code repository issues. Sure, there’s already email notice integration for repository issues on most … Read more

New R package: load and chart oceanic storms

Mapping historical storms data is now a little bit easier. Off the back of this blog, I have authored an R package (available at basilesimon/noaastorms) that downloads, cleans and parses NOAA IBtrack data for you. The National Oceanic and Atmospheric Administration releases datasets known as International Best Track Archive for Climate Stewardship. These datasets are … Read more

Time Travel with RStudio Package Manager 1.0.4

We all love packages. We don’t love when broken package environments prevent us from reproducing our work. In version 1.0.4 of RStudio Package Manager, individuals and teams can navigate through repository checkpoints, making it easy to recreate environments and reproduce work. The new release also adds important security updates, improvements for Git sources, further access to retired packages, and beta … Read more

December 2018: “Top 40” New CRAN Packages

By my count, 157 new packages stuck to CRAN in December. Below are my “Top 40” picks in ten categories: Computational Methods, Data, Finance, Machine Learning, Medicine, Science, Statistics, Time Series, Utilities and Visualization. This is the first time I have used the Medicine category. I am pleased that a few packages that appear to … Read more
