First Mile

The Electric Pulse. Thomas Parker Electric Car (1895) | Fisker Karma (2012). Credit for inventing the first electric vehicle is also debated, because many scientists and tinkerers were working with various forms of electric sources (batteries and electric motors) around the same time. However, there is a prominent name in electric … Read more

The Kernel Trick

We have seen how higher dimensional transformations can allow us to separate data in order to make classification predictions. It seems that in order to train a support vector classifier and optimize our objective function, we would have to perform operations with the higher dimensional vectors in the transformed feature space. In real … Read more
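As a small illustration of the idea in this excerpt, the sketch below compares the two routes for one specific case: a degree-2 polynomial kernel on 2-D points. The feature map `phi`, the helper names, and the sample points are assumptions chosen for illustration; they are not taken from the post.

```python
import math

def dot(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))

def phi(x):
    """Explicit degree-2 feature map for a 2-D point (illustrative choice)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def poly_kernel(x, y):
    """Homogeneous degree-2 polynomial kernel, computed in the input space."""
    return dot(x, y) ** 2

# Hypothetical sample points.
x, y = (1.0, 2.0), (3.0, 4.0)

explicit = dot(phi(x), phi(y))   # inner product in the transformed space
implicit = poly_kernel(x, y)     # same quantity without ever building phi

print(explicit, implicit)
```

The two values agree up to floating-point rounding, which is the point of the trick: the kernel gives the inner product in the transformed space without constructing the higher dimensional vectors.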

Amazon Customer Analysis

User review networks for customer segmentation Over the past decade or two, Americans have continued to prefer payment methods that are traceable, providing retailers and vendors with a rich source of data on their customers. This data is used by data scientists to help businesses make more informed decisions with respect to inventory, marketing, and … Read more

A Guide for Building Convolutional Neural Networks

Computer Vision is at the forefront of advancements in Artificial Intelligence (AI). It’s moving fast with new research coming out each and every day allowing us to do truly amazing things that we couldn’t do before with computers and AI. Convolutional Neural Networks (CNNs) are the driving force behind every advancement in Computer Vision research … Read more

The invisible workers of the AI era

50 ways to label data There are different ways to get your data labeled. Some firms label their data themselves — although this can be costly, as hiring people simply for these tasks costs firms both money and flexibility. Other companies even find ways to get people to label their data for free. Ever wonder why Google’s reCAPTCHA … Read more

Day 12 – little helper dive

We at STATWORX work a lot with R and we often use the same little helper functions within our projects. These functions ease our daily work life by reducing repetitive code parts or by creating overviews of our projects. At first, there was no plan to make a package, but soon I realised that it … Read more

Categories: R

AI and Machine Learning: Moving from Training to Education

The debate over whether AI will ever achieve capabilities on par with or beyond human intelligence is ongoing. It has certainly intensified with the recent advancements in AI, Machine Learning (ML), and Deep Learning (DL), with some believing that the current technologies are already capable of paving the way for Artificial General Intelligence (AGI). You … Read more

Visualizing Hurricane Data with Shiny

Motivation for Project Around the time that I was selecting a topic for this project, my parents and my hometown found themselves in the path of a Category 1 hurricane. Thankfully, everyone was ok, and there was only minor damage to their property. But this event made me think about how long it had been … Read more

Categories: R

Scraping the Turkey Accordion

To leave a comment for the author, please follow the link and comment on their blog: R on datawookie. … Read more

Categories: R

Towards Ethical Machine Learning

I quit my job to enter an intensive data science bootcamp. I understand the value behind the vast amount of data available that enables us to create predictive machine learning algorithms. In addition to recognizing its value on a professional level, I benefit from these technologies as a consumer. Whenever I find myself in … Read more

Reading List Faster With parallel, doParallel, and pbapply

I have several tables that I would like to load as a single data frame. Functions derived from read.table() have a lot of convenient features, but it seems like there are a lot of steps in the implementation that would slow things down. The gain in performance of reading 29 CSV files (about … Read more

Categories: R

Using ggplot2 for functional time series

I spoke yesterday about using ggplot2 for functional data graphics, rather than the custom-built plotting functionality available in the many functional data packages, including my own rainbow package written with Hanlin Shang. It is a much more powerful and flexible way to work, so I thought it would be useful to share some examples. French … Read more

Categories: R

Network Centrality in R: New ways of measuring Centrality

This is the third post of a series on the concept of “network centrality” with applications in R and the package netrankr. The last part introduced the concept of neighborhood-inclusion and its implications for centrality. In this post, we extend the concept to a broader class of dominance relations by deconstructing indices into a series of building blocks and … Read more

Categories: R

Geocomputation with R – the afterword

I am extremely proud to announce that Geocomputation with R is complete. It took Robin, Jannes, and me almost 2 years of collaborative planning, writing, refinement, and deployment to make the book available for anyone interested in open source, command-line approaches for handling geographic data. We’re very happy that it’s now ready to present to the world … Read more

Categories: R

Sharing Modeling Pipelines in R

Reusable modeling pipelines are a practical idea that gets re-developed many times in many contexts. wrapr supplies a particularly powerful pipeline notation, and a pipe-stage re-use system (notes here). We will demonstrate this with the vtreat data preparation system. Our example task is to fit a model on some arbitrary data. Our model will try … Read more

Categories: R

Le Monde puzzle [#1075]

A new Le Monde mathematical puzzle in the digit category: Find the largest number such that each of its internal digits is strictly less than the average of its two neighbours. Same question when all digits differ. For instance, n=96433469 is such a number. When trying pure brute force (with the usual integer2digits function!) le=solz=3 … Read more
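The condition stated in the puzzle can be checked directly. The sketch below is a minimal illustration in Python, not the post's own code: `digits` is a hypothetical stand-in for the integer2digits helper the excerpt mentions, and `is_valid` only tests the first condition (it does not enforce that all digits differ).

```python
def digits(n):
    """Decimal digits of a non-negative integer, most significant first."""
    return [int(c) for c in str(n)]

def is_valid(n):
    """True if every internal digit is strictly less than the average
    of its two neighbours, i.e. 2 * d[i] < d[i-1] + d[i+1]."""
    d = digits(n)
    return all(2 * d[i] < d[i - 1] + d[i + 1] for i in range(1, len(d) - 1))

# The example from the excerpt satisfies the condition.
print(is_valid(96433469))
```

Comparing `2 * d[i]` with the sum of the neighbours avoids fractional averages, so the strict inequality is tested with integer arithmetic only.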

Categories: R

How to give money to the R project

by Mark Niemann-Ross, an author, educator, and writer who teaches about R and Raspberry Pi at LinkedIn Learning I spend a LOT of time at, in particular the sections for documentation and CRAN. But I hadn’t spent much time in the other areas: R Project, R Foundation, and links. When I recently wandered into the foundation area, … Read more

Parsing XML, Named Entity Recognition in One-Shot

Photo credit: Conditional Random Fields, Sequence Prediction, Sequence Labelling Parsing XML is a process that is designed to read XML and create a way for programs to use XML. An XML parser is the piece of software that reads XML files and makes the information from those files available to applications. While reading an … Read more

Pew Study Answers on Artificial Intelligence and the Future of Humans

The AI future is uncertain, but generally, I think it will improve life. I was one of the 900+ futurists interviewed for The Pew Research study released yesterday, “Artificial Intelligence and the Future of Humans.” Conducted with Elon University, the study revolved around AI and the 50th anniversary of the Internet. The report asked three questions … Read more

Classification (Part 2) — Linear Discriminant Analysis

An explanation of Bayes’ theorem and linear discriminant analysis. Overview: Previously, logistic regression was introduced for classification. Unfortunately, like any model, it presents some flaws: when classes are well separated, parameter estimates from logistic regression tend to be unstable; when the data set is small, logistic regression is also unstable … Read more

Day 11 – little helper trim

We at STATWORX work a lot with R and we often use the same little helper functions within our projects. These functions ease our daily work life by reducing repetitive code parts or by creating overviews of our projects. At first, there was no plan to make a package, but soon I realised that it … Read more

Categories: R

DB connected R application on open-source Shiny server, part 1

As a follow-up to my previous study of Australian politicians on Twitter, I’ve decided to build a more sophisticated, autonomous solution. The idea at a glance: regularly collect tweets from Members of the Australian Parliament; store them in the database; visualize findings (in an up-to-date state) in a web dashboard. A goal here is to build a solution that … Read more

Categories: R

AWS Architecture For Your Machine Learning Solutions

The Undertaking Recently, I was involved in developing a machine learning solution for one of the largest North American steel manufacturers. The company wanted to leverage the power of ML to get insights on customer segmentation, order prediction and product-volume recommendations. This article revolves around why and how we leveraged AWS for deploying our deliverables … Read more

How to tune a BigQuery ML classification model to achieve a desired precision or recall

Select the probability threshold based on the ROC curve BigQuery provides an incredibly convenient way to train machine learning models on large, structured datasets. In an earlier article, I showed you how to train a classification model to predict flight delays. Here’s the SQL query that will predict whether a flight is going to be late … Read more

Are we there yet?

Q: How does a data scientist manage projects and teams? How do you make duration and resource estimations? These are great questions, and I think people don’t ask them enough, which is one of the main reasons I wrote Think Like a Data Scientist. One full chapter is dedicated to project planning and another full … Read more

How to deploy a predictive service to Kubernetes with R and the AzureContainers package

It’s easy to create a function in R, but what if you want to call that function from a different application, with the scale to support a large number of simultaneous requests? This article shows how you can deploy an R fitted model as a Plumber web service in Kubernetes, using Azure Container Registry (ACR) and … Read more

Implementing Defensive Design in AI Deployments

A series of insights and battle scars from the world of medical device design With the upcoming launch of one of our AI products, there has been a repeating question that clients kept asking. This same question also shows up once in a while with our consulting engagements, to a lesser degree, but still demands an … Read more

Object detection and tracking in PyTorch

Detecting multiple objects in images and tracking them in videos In my previous story, I went over how to train an image classifier in PyTorch, with your own images, and then use it for image recognition. Now I’ll show you how to use a pre-trained classifier to detect multiple objects in an image, and later track … Read more

Reflections on the 10th anniversary of the Revolutions blog

On December 9 2008, very nearly ten years ago, the first post on Revolutions was published. Way back then, this blog was part of a young startup called Revolution Computing, which later became Revolution Analytics. (That name persists to this day in the URL of this blog.) The idea at that time was to introduce … Read more

Categories: R

5½ Reasons to Ditch Spreadsheets for Data Science: Code is Poetry

The post 5½ Reasons to Ditch Spreadsheets for Data Science: Code is Poetry appeared first on The Lucid Manager. When I studied civil engineering some decades ago, we solved all our computing problems by writing code. Writing in BASIC or PASCAL, I could quickly perform fundamental engineering analysis, such as reinforced concrete beams, with my … Read more

Categories: R

The ‘knight on an infinite chessboard’ puzzle: efficient simulation in R

I’ve recently been enjoying The Riddler: Fantastic Puzzles from FiveThirtyEight, a wonderful book from 538’s Oliver Roeder. Many of the probability puzzles can be productively solved through Monte Carlo simulations in R. Here’s one that caught my attention: Suppose that a knight makes a “random walk” on an infinite chessboard. Specifically, … Read more

Categories: R

Pitching Artificial Intelligence to Business People

From silver bullet syndrome to silver linings In this article I plan to share with you our recent experience pitching AI to business folk, and what lessons we learned along the way. As a small firm of AI experts, we follow an awareness marketing approach. Rather than relying solely on one marketing channel, we attend conferences … Read more

Great post Yash!

Great post Yash! For those readers interested in getting data from the Fitbit API using R, I’ve documented the process here: … Read more

Categories: R

ggmap Tutorial Updated!

Y’all it may have taken me a little time, but I did listen. Thank you for your emails. Because of you, I have now updated my ggmap tutorial to address the Google Static Map API service issues! For those of you who have been following along with issue #51 in the ggmap repo, you’ll notice … Read more

Categories: R

A Thought on Using Machine Learning Models

During my training classes, during or after the discussion of common machine learning models, I usually bring up one topic: how insights from these models are used, or how a model is implemented in a business or organizational process. For instance, we can get the most accurate model, one where it’s very good at ‘predicting’ which … Read more

Day 10 – little helper %nin%

We at STATWORX work a lot with R and we often use the same little helper functions within our projects. These functions ease our daily work life by reducing repetitive code parts or by creating overviews of our projects. At first, there was no plan to make a package, but soon I realised that it … Read more

Categories: R

The Need for Speed Part 1: Building an R Package with Fortran (or C)

Everyone who has ever used R has, at one time or another, wished for an increase in R’s speed. If you haven’t, you’re not using R hard enough! Recently, as part of some research on credibility, I was calculating layer loss costs for millions of simulated loss observations. As I progressed, the R markdown document … Read more

Categories: R

Improving Patient Flows With Data Science And Analytics

Reducing Costs By Improving Processes. Our team was recently asked how data analytics and data science can be used to reduce bottlenecks and improve patient flows in hospitals. Healthcare providers and hospitals can have very complex patient flows. Many steps can intertwine, resources have to shift between tasks all the time, and severity of patients … Read more

Physics-guided Neural Networks (PGNNs)

Imagine you have sent your alien friend (optimization algorithm) to a supermarket (hypothesis space) to buy your favorite cheese (solution). The only clue she has is the picture of the cheese (data) you gave her. Since she lacks the preconceptions we have about supermarkets she will have a hard time finding the cheese. She … Read more

Categories: Featured

An 8-hour course on R and Data Mining

I will run an 8-hour course on R and Data Mining at Black Mountain, CSIRO, Australia on 10 & 13 December 2018. The course materials, incl. slides, R scripts and datasets, are available at Below is an outline of the course. Part I: R Programming: basics of R language and programming, parallel computing, and data … Read more

Categories: R

CRAN Release of R/exams 2.3-2

New minor release of the R/exams package to CRAN, containing a range of smaller improvements and bug fixes. Notably scanning of written NOPS exams is enhanced and made more reliable and a new exercise template demonstrates how to use advanced processing of numeric answers in Moodle. Version 2.3-2 of the one-for-all exams generator R/exams has … Read more

Categories: R

How a High School Junior Made a Self-Driving Car

Questions related to this repository from a project I created almost three years ago are among the most numerous questions I receive. The repository itself is really nothing too special, just an implementation of an Nvidia paper that was released about a year prior. A graduate student later managed to implement my code in an … Read more

Simpson’s Paradox and Interpreting Data

The challenge of finding the right view through data Edward Hugh Simpson, a statistician and former cryptanalyst at Bletchley Park, described the statistical phenomenon that takes his name in a technical paper in 1951. Simpson’s paradox highlights one of my favourite things about data: the need for good intuition regarding the real world and how most … Read more

Categories: Featured

Word Representation in Natural Language Processing Part II

In the previous part (Part I) of the word representation series, I talked about fixed word representations that make no assumption about semantics (meaning) and similarity of words. In this part, I will describe a family of distributed word representations. The main idea is to represent words as feature vectors. Each entry in the vector stands … Read more