Amazon Customer Analysis

User review networks for customer segmentation
Over the past decade or two, Americans have continued to prefer payment methods that are traceable, providing retailers and vendors with a rich source of data on their customers. This data is used by data scientists to help businesses make more informed decisions with respect to inventory, marketing, and …

A Guide for Building Convolutional Neural Networks

Computer Vision is at the forefront of advancements in Artificial Intelligence (AI). It’s moving fast, with new research coming out every day, allowing us to do truly amazing things that we couldn’t do before with computers and AI. Convolutional Neural Networks (CNNs) are the driving force behind every advancement in Computer Vision research …

The invisible workers of the AI era

50 ways to label data
There are different ways to get your data labeled. Some firms label their data themselves — although this can be costly, as hiring people simply for these tasks costs firms both money and flexibility. Other companies even find ways to get people to label their data for free. Ever wonder why Google’s reCAPTCHA …

AI and Machine Learning: Moving from Training to Education

The debate over whether AI will ever achieve capabilities on par with or beyond human intelligence is ongoing. It has certainly intensified with the recent advancements in AI, Machine Learning (ML), and Deep Learning (DL), with some believing that the current technologies are already capable of paving the way for Artificial General Intelligence (AGI). You …

Scraping the Turkey Accordion

To leave a comment for the author, please follow the link and comment on their blog: R on datawookie. R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) …

Towards Ethical Machine Learning

https://initiatives.provost.uci.edu/event/philosophy-machine-learning-knowledge-causality/
I quit my job to enter an intensive data science bootcamp. I understand the value behind the vast amount of data available that enables us to create predictive machine learning algorithms. In addition to recognizing its value on a professional level, I benefit from these technologies as a consumer. Whenever I find myself in …

Reading List Faster With parallel, doParallel, and pbapply

I have several tables that I would like to load as a single data frame. Functions derived from read.table() have a lot of convenient features, but it seems like there are a lot of steps in the implementation that would slow things down. The gain in performance of reading 29 CSV files (about …

Using ggplot2 for functional time series

I spoke yesterday about using ggplot2 for functional data graphics, rather than the custom-built plotting functionality available in the many functional data packages, including my own rainbow package written with Hanlin Shang. It is a much more powerful and flexible way to work, so I thought it would be useful to share some examples. French …

Network Centrality in R: New ways of measuring Centrality

This is the third post of a series on the concept of “network centrality” with applications in R and the package netrankr. The last part introduced the concept of neighborhood-inclusion and its implications for centrality. In this post, we extend the concept to a broader class of dominance relations by deconstructing indices into a series of building blocks and …

Code for case study – Customer Churn with Keras/TensorFlow and H2O

The code you find below can be used to recreate all figures and analyses from this book chapter. Because the content is exclusively for the book, my descriptions around the code had to be minimal. But I’m sure you can get the gist, even without the book. Thank you to the following people for …

Geocomputation with R – the afterword

I am extremely proud to announce that Geocomputation with R is complete. It took Robin, Jannes, and me almost 2 years of collaborative planning, writing, refinement, and deployment to make the book available for anyone interested in open source, command-line approaches for handling geographic data. We’re very happy that it’s now ready to present to the world …

Sharing Modeling Pipelines in R

Reusable modeling pipelines are a practical idea that gets re-developed many times in many contexts. wrapr supplies a particularly powerful pipeline notation, and a pipe-stage re-use system (notes here). We will demonstrate this with the vtreat data preparation system. Our example task is to fit a model on some arbitrary data. Our model will try …

Le Monde puzzle [#1075]

A new Le Monde mathematical puzzle in the digit category: Find the largest number such that each of its internal digits is strictly less than the average of its two neighbours. Same question when all digits differ. For instance, n=96433469 is such a number. When trying pure brute force (with the usual integer2digits function!) le=solz=3 …

How to give money to the R project

by Mark Niemann-Ross, an author, educator, and writer who teaches about R and Raspberry Pi at LinkedIn Learning
I spend a LOT of time at r-project.org, in particular the sections for documentation and CRAN. But I hadn’t spent much time in the other areas: R Project, R Foundation, and links. When I recently wandered into the foundation area, …

Parsing XML, Named Entity Recognition in One-Shot

Conditional Random Fields, Sequence Prediction, Sequence Labelling
Parsing XML is a process that is designed to read XML and create a way for programs to use XML. An XML parser is the piece of software that reads XML files and makes the information from those files available to applications. While reading an …

An introduction to web scraping with Python

Introduction
As a data scientist, I often find myself looking for external data sources that could be relevant for my machine learning projects. The problem is that it is uncommon to find open source data sets that perfectly correspond to what you are looking for, or free APIs that give you access to data. In …

Top Examples of Why Data Science is Not Just .fit().predict()

In this post, I’m going to review some of the top concepts I learned that turned me from a technical data scientist to a good data scientist
Two months ago, I finished my second year as a data scientist at YellowRoad so I decided to do a retrospective analysis on my projects, what did I …

Pew Study Answers on Artificial Intelligence and the Future of Humans

The AI future is uncertain, but generally, I think it will improve life. I was one of the 900+ futurists interviewed for The Pew Research study released yesterday, “Artificial Intelligence and the Future of Humans.” Conducted with Elon University, the study revolved around AI and the 50th anniversary of the Internet. The report asked three questions …

Classification (Part 2) — Linear Discriminant Analysis

An explanation of Bayes’ theorem and linear discriminant analysis
Overview
Previously, logistic regression was introduced for classification. Unfortunately, like any model, it presents some flaws: when classes are well separated, parameter estimates from logistic regression tend to be unstable; when the data set is small, logistic regression is also unstable …

DB connected R application on open-source Shiny server, part 1

As a follow-up to my previous study of Australian politicians on Twitter, I’ve decided to build a more sophisticated, autonomous solution. The idea at a glance: regularly collect tweets from Members of the Australian Parliament; store them in the database; visualize findings (in an up-to-date state) in a web dashboard. A goal here is to build a solution that …

AWS Architecture For Your Machine Learning Solutions

The Undertaking
Recently, I was involved in developing a machine learning solution for one of the largest North American steel manufacturers. The company wanted to leverage the power of ML to get insights on customer segmentation, order prediction and product-volume recommendations. This article revolves around why and how we leveraged AWS for deploying our deliverables …

How to tune a BigQuery ML classification model to achieve a desired precision or recall

Select the probability threshold based on the ROC curve
BigQuery provides an incredibly convenient way to train machine learning models on large, structured datasets. In an earlier article, I showed you how to train a classification model to predict flight delays. Here’s the SQL query that will predict whether a flight is going to be late …

How to deploy a predictive service to Kubernetes with R and the AzureContainers package

It’s easy to create a function in R, but what if you want to call that function from a different application, with the scale to support a large number of simultaneous requests? This article shows how you can deploy an R fitted model as a Plumber web service in Kubernetes, using Azure Container Registry (ACR) and …

Implementing Defensive Design in AI Deployments

A series of insights and battle scars from the world of medical device design
With the upcoming launch of one of our AI products, there has been a recurring question that clients keep asking. This same question also shows up once in a while with our consulting engagements, to a lesser degree, but still demands an …

Object detection and tracking in PyTorch

Detecting multiple objects in images and tracking them in videos
In my previous story, I went over how to train an image classifier in PyTorch, with your own images, and then use it for image recognition. Now I’ll show you how to use a pre-trained classifier to detect multiple objects in an image, and later track …

10 Lessons Learned From Participating in Google AI Challenge

Key Points of My Work
Disclaimers: I will present only a portion of the code I wrote for this competition; my teammates are absolutely not responsible for my awful and buggy code. A portion of this code is inspired by great Kagglers sharing their insights and code in Kaggle kernels and forums. I hope I did …

Enter the #DataFramedChallenge for a chance to be on an upcoming podcast segment.

We’ll be back with Season 2 early in 2019 and to keep you thinking, curious and data focused in between seasons, we’re having a DataFramed challenge. The winner will get to join me on a segment here on DataFramed: the challenge is to listen to as many episodes as you can & to tweet excerpts …

AI: the silver bullet to stop Technical Debt from sucking you dry

You’ve heard a lot about student debt, but what about technical debt? It’s Friday evening in the Bahamas. You’re relaxing under a striped red umbrella with a succulent glass of wine and your favorite book — it’s a great read and you love the way the ocean breeze moves the pages like leaves on a tree. As …

Reflections on the 10th anniversary of the Revolutions blog

On December 9, 2008, very nearly ten years ago, the first post on Revolutions was published. Way back then, this blog was part of a young startup called Revolution Computing, which later became Revolution Analytics. (That name persists to this day in the URL of this blog.) The idea at that time was to introduce …

5½ Reasons to Ditch Spreadsheets for Data Science: Code is Poetry

The post 5½ Reasons to Ditch Spreadsheets for Data Science: Code is Poetry appeared first on The Lucid Manager. When I studied civil engineering some decades ago, we solved all our computing problems by writing code. Writing in BASIC or PASCAL, I could quickly perform fundamental engineering analysis, such as reinforced concrete beams, with my …

The ‘knight on an infinite chessboard’ puzzle: efficient simulation in R

I’ve recently been enjoying The Riddler: Fantastic Puzzles from FiveThirtyEight, a wonderful book from 538’s Oliver Roeder. Many of the probability puzzles can be productively solved through Monte Carlo simulations in R. Here’s one that caught my attention: Suppose that a knight makes a “random walk” on an infinite chessboard. Specifically, …

Pitching Artificial Intelligence to Business People

From silver bullet syndrome to silver linings
In this article I plan to share with you our recent experience pitching AI to business folk, and what lessons we learned along the way. As a small firm of AI experts, we follow an awareness marketing approach. Rather than relying solely on one marketing channel, we attend conferences …

Great post Yash!

Great post Yash! For those readers interested in getting data from the Fitbit API using R, I’ve documented the process here: https://towardsdatascience.com/the-gamification-of-fitbit-how-an-api-provided-the-next-level-of-training-eaf7b267af00

A Thought on Using Machine Learning Models

During my training classes, during or after the discussion of common machine learning models, I usually bring up one topic: the use of insights from these models, or the implementation of a model in a business or organizational process. For instance, we can get the most accurate model, where it’s very good at ‘predicting’ which …

The Need for Speed Part 1: Building an R Package with Fortran (or C)

Everyone who has ever used R has, at one time or another, wished for an increase in R’s speed. If you haven’t, you’re not using R hard enough! Recently, as part of some research on credibility, I was calculating layer loss costs for millions of simulated loss observations. As I progressed, the R markdown document …

Improving Patient Flows With Data Science And Analytics

Reducing Costs By Improving Processes
Our team was recently asked how data analytics and data science can be used to improve bottlenecks and patient flows in hospitals. Healthcare providers and hospitals can have very complex patient flows. Many steps can intertwine, resources have to shift in between tasks all the time, and severity of patients …

An 8-hour course on R and Data Mining

I will run an 8-hour course on R and Data Mining at Black Mountain, CSIRO, Australia on 10 & 13 December 2018. The course materials, incl. slides, R scripts and datasets, are available at http://www.rdatamining.com/training/course. Below is the outline of the course. Part I – R Programming: basics of the R language and programming, parallel computing, and data …

CRAN Release of R/exams 2.3-2

New minor release of the R/exams package to CRAN, containing a range of smaller improvements and bug fixes. Notably, scanning of written NOPS exams is enhanced and made more reliable, and a new exercise template demonstrates how to use advanced processing of numeric answers in Moodle. Version 2.3-2 of the one-for-all exams generator R/exams has …

How a High School Junior Made a Self-Driving Car

Questions related to this repository from a project I created almost three years ago are among the most numerous questions I receive. The repository itself is really nothing too special, just an implementation of an Nvidia paper that was released about a year prior. A graduate student later managed to implement my code in an …

Simpson’s Paradox and Interpreting Data

The challenge of finding the right view through data
Edward Hugh Simpson, a statistician and former cryptanalyst at Bletchley Park, described the statistical phenomenon that takes his name in a technical paper in 1951. Simpson’s paradox highlights one of my favourite things about data: the need for good intuition regarding the real world and how most …

Word Representation in Natural Language Processing Part II

In the previous part (Part I) of the word representation series, I talked about fixed word representations that make no assumption about semantics (meaning) and similarity of words. In this part, I will describe a family of distributed word representations. The main idea is to represent words as feature vectors. Each entry in the vector stands …

AlphaZero implementation and tutorial

A walk-through of implementing AlphaZero using custom TensorFlow operations and a custom Python C module
I describe here my implementation of the AlphaZero algorithm, available on GitHub, written in Python with custom TensorFlow GPU operations and a few accessory functions in C for the tree search. The AlphaZero algorithm has gone through three main iterations, first …