How to give money to the R project

by Mark Niemann-Ross, an author, educator, and writer who teaches about R and Raspberry Pi at LinkedIn Learning. I spend a LOT of time at, in particular the sections for documentation and CRAN. But I hadn’t spent much time in the other areas: R Project, R Foundation, and links. When I recently wandered into the foundation area, … Read more

Parsing XML, Named Entity Recognition in One-Shot

Photo credit: Conditional Random Fields, Sequence Prediction, Sequence Labelling. Parsing XML is the process of reading XML and making its contents usable by programs. An XML parser is the piece of software that reads XML files and makes the information from those files available to applications. While reading an … Read more
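
To make the parser's role concrete, here is a minimal sketch using Python's standard-library xml.etree.ElementTree module. The XML snippet, its tag names, and the parse_books helper are all made up for illustration; they are not taken from the original post.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document, invented for this example.
XML_TEXT = """
<library>
  <book id="1"><title>First Title</title><year>2001</year></book>
  <book id="2"><title>Second Title</title><year>2015</year></book>
</library>
"""


def parse_books(xml_text):
    """Parse an XML string and return its <book> elements as plain dictionaries."""
    root = ET.fromstring(xml_text.strip())
    books = []
    for book in root.findall("book"):
        books.append({
            "id": book.get("id"),                # attribute value
            "title": book.findtext("title"),     # text of the child element
            "year": int(book.findtext("year")),  # convert text to a number
        })
    return books


books = parse_books(XML_TEXT)
for entry in books:
    print(entry["id"], entry["title"], entry["year"])
```

The application code after parse_books never touches angle brackets; it only sees dictionaries, which is the sense in which a parser "makes the information available".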

Pew Study Answers on Artificial Intelligence and the Future of Humans

The AI future is uncertain, but generally, I think it will improve life. I was one of the 900+ futurists interviewed for The Pew Research study released yesterday, “Artificial Intelligence and the Future of Humans.” Conducted with Elon University, the study revolved around AI and the 50th anniversary of the Internet. The report asked three questions … Read more

Classification (Part 2) — Linear Discriminant Analysis

An explanation of Bayes’ theorem and linear discriminant analysis. Photo by Jerry Kiesewetter on Unsplash. Overview: Previously, logistic regression was introduced for classification. Unfortunately, like any model, it has some flaws: when classes are well separated, parameter estimates from logistic regression tend to be unstable; when the data set is small, logistic regression is also unstable … Read more

AWS Architecture For Your Machine Learning Solutions

The Undertaking. Recently, I was involved in developing a machine learning solution for one of the largest North American steel manufacturers. The company wanted to leverage the power of ML to get insights on customer segmentation, order prediction and product-volume recommendations. This article revolves around why and how we leveraged AWS for deploying our deliverables … Read more

How to tune a BigQuery ML classification model to achieve a desired precision or recall

Select the probability threshold based on the ROC curve. BigQuery provides an incredibly convenient way to train machine learning models on large, structured datasets. In an earlier article, I showed you how to train a classification model to predict flight delays. Here’s the SQL query that will predict whether a flight is going to be late … Read more
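
The idea of picking a probability threshold to hit a target precision can be sketched outside BigQuery in a few lines of plain Python. This is not the article's SQL and does not use any BigQuery API; the probabilities, labels, and function names below are hypothetical, and the scan over candidate thresholds is just one simple way to do it.

```python
def precision_recall_at(threshold, probs, labels):
    """Precision and recall when predicting positive for prob >= threshold."""
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    fn = sum(1 for p, y in zip(probs, labels) if p < threshold and y == 1)
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def lowest_threshold_for_precision(target, probs, labels):
    """Smallest candidate threshold whose precision meets the target, or None."""
    for t in sorted(set(probs)):
        precision, _ = precision_recall_at(t, probs, labels)
        if precision >= target:
            return t
    return None


# Hypothetical model outputs and true labels (1 = late flight).
probs = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.10]
labels = [1, 1, 1, 0, 1, 0, 0, 0]

threshold = lowest_threshold_for_precision(0.75, probs, labels)
precision, recall = precision_recall_at(threshold, probs, labels)
print(threshold, precision, recall)
```

Scanning from the lowest threshold upward returns the first one that meets the precision target, which also keeps recall as high as possible, because recall can only fall as the threshold rises.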

Are we there yet?

Q: How does a data scientist manage projects and teams? How do you make duration and resource estimations? These are great questions, and I think people don’t ask them enough, which is one of the main reasons I wrote Think Like a Data Scientist. One full chapter is dedicated to project planning and another full … Read more

How to deploy a predictive service to Kubernetes with R and the AzureContainers package

It’s easy to create a function in R, but what if you want to call that function from a different application, with the scale to support a large number of simultaneous requests? This article shows how you can deploy an R fitted model as a Plumber web service in Kubernetes, using Azure Container Registry (ACR) and … Read more

Implementing Defensive Design in AI Deployments

A series of insights and battle scars from the world of medical device design. With the upcoming launch of one of our AI products, there is one question that clients keep asking. The same question also shows up once in a while in our consulting engagements, to a lesser degree, but still demands an … Read more

Object detection and tracking in PyTorch

Detecting multiple objects in images and tracking them in videos. In my previous story, I went over how to train an image classifier in PyTorch, with your own images, and then use it for image recognition. Now I’ll show you how to use a pre-trained classifier to detect multiple objects in an image, and later track … Read more

Pitching Artificial Intelligence to Business People

From silver bullet syndrome to silver linings. In this article I plan to share with you our recent experience pitching AI to business folk, and what lessons we learned along the way. As a small firm of AI experts, we follow an awareness marketing approach. Rather than relying solely on one marketing channel, we attend conferences … Read more

A Thought on Using Machine Learning Models

In my training classes, during or after the discussion of common machine learning models, I usually bring up one topic: how insights from these models are used, or how the model is implemented in a business or organizational process. For instance, we can get the most accurate model, where it’s very good at ‘predicting’ which … Read more

Improving Patient Flows With Data Science And Analytics

Reducing Costs By Improving Processes. Our team was recently asked how data analytics and data science can be used to improve bottlenecks and patient flows in hospitals. Healthcare providers and hospitals can have very complex patient flows. Many steps can intertwine, resources have to shift in between tasks all the time, and severity of patients … Read more

Physics-guided Neural Networks (PGNNs)

Imagine you have sent your alien friend (optimization algorithm) to a supermarket (hypothesis space) to buy your favorite cheese (solution). The only clue she has is the picture of the cheese (data) you gave her. Since she lacks the preconceptions we have about supermarkets she will have a hard time finding the cheese. She … Read more

How a High School Junior Made a Self-Driving Car

Questions related to this repository from a project I created almost three years ago are among the most numerous questions I receive. The repository itself is really nothing too special, just an implementation of an Nvidia paper that was released about a year prior. A graduate student later managed to implement my code in an … Read more

Simpson’s Paradox and Interpreting Data

The challenge of finding the right view through data. Edward Hugh Simpson, a statistician and former cryptanalyst at Bletchley Park, described the statistical phenomenon that takes his name in a technical paper in 1951. Simpson’s paradox highlights one of my favourite things about data: the need for good intuition regarding the real world and how most … Read more

Word Representation in Natural Language Processing Part II

In the previous part (Part I) of the word representation series, I talked about fixed word representations that make no assumption about semantics (meaning) and similarity of words. In this part, I will describe a family of distributed word representations. The main idea is to represent words as feature vectors. Each entry in the vector stands … Read more

AlphaZero implementation and tutorial

A walk-through of implementing AlphaZero using custom TensorFlow operations and a custom Python C module. I describe here my implementation of the AlphaZero algorithm, available on GitHub, written in Python with custom TensorFlow GPU operations and a few accessory functions in C for the tree search. The AlphaZero algorithm has gone through three main iterations, first … Read more

TensorFlow Filesystem — Access Tensors Differently

Tensorflow is great. Really, I mean it. The problem is it’s great up to a point. Sometimes you want to do very simple things, but tensorflow is giving you a hard time. The motivation I had behind writing TFFS (TensorFlow File System) can be shared by anyone who has used tensorflow, including you. All I … Read more

Maximum Likelihood Estimation: How it Works and Implementing in Python

Previously, I wrote an article about estimating distributions using nonparametric estimators, where I discussed the various methods of estimating statistical properties of data generated from an unknown distribution. This article covers a very powerful method of estimating parameters of a probability distribution given the data, called the Maximum Likelihood Estimator. This article is part of … Read more
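
As a small illustration of what maximum likelihood estimation does, the sketch below fits a normal distribution to simulated data using the closed-form estimates. It is an assumption-laden example, not the article's implementation: the choice of a normal model, the true parameters (5.0 and 2.0), and the function names are all made up here.

```python
import math
import random


def gaussian_mle(samples):
    """Closed-form maximum likelihood estimates (mean, std) for a normal model."""
    n = len(samples)
    mu = sum(samples) / n
    # The MLE of the variance divides by n, not n - 1.
    var = sum((x - mu) ** 2 for x in samples) / n
    return mu, math.sqrt(var)


def log_likelihood(samples, mu, sigma):
    """Log-likelihood of the samples under Normal(mu, sigma)."""
    n = len(samples)
    squared_error = sum((x - mu) ** 2 for x in samples)
    return -0.5 * n * math.log(2 * math.pi * sigma ** 2) - squared_error / (2 * sigma ** 2)


# Simulated data with hypothetical true parameters mean=5.0, std=2.0.
rng = random.Random(0)
data = [rng.gauss(5.0, 2.0) for _ in range(2000)]

mu_hat, sigma_hat = gaussian_mle(data)
print(mu_hat, sigma_hat)
```

Evaluating log_likelihood at the estimates and at nearby parameter values is a quick sanity check: the estimates should give the largest value.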

A Data Analysis of Riding The Bus

What should I expect before a round of the popular drinking game? Recommended equipment for Ride The Bus. College. It’s a time for things like exploring your personality, finding your values, and making lifelong friends. Those are all well and good, but college is also a time for drinking games! There’s plenty of time in the … Read more

Building a molecular charge classifier

The intersection of Chemistry and A.I. A.I. has seen unprecedented growth in the past couple of years. Although machine learning architectures like Neural Networks (NN) have been known for a long time thanks to breakthroughs from top researchers like Geoffrey Hinton, only recently have NNs become powerful tools in an A.I. specialist’s toolbox. This is credited mainly … Read more

A gentle journey from linear regression to neural networks

Deep Learning: What are we talking about? A quick search on Google gives us the following definition of “deep learning”: “the ensemble of deep learning methods is a part of a broader family of machine learning methods that aims at modelling data with a high level of abstraction”. Here, we should understand that deep learning consists … Read more

Because it’s Saturday: Go Your Own Way

I was delivering a workshop for AI Live yesterday so I didn’t get the chance to do my Friday post, but I’m here at SatRDays DC and the playlist on the audio while we’re waiting for things to start is amazing. Fleetwood Mac’s Go Your Own Way just came on, which reminded me of this … Read more

Data network effects for an artificial intelligence startup

The artificial intelligence (AI) ecosystem is maturing, and it is becoming increasingly difficult to impress customers, investors, and potential acquirers by just attaching an .ai domain to whatever you are doing. Therefore, the significance of building a business model that is defensible in the long run becomes obvious. In this post, I explore how an AI startup may unlock various … Read more

R some blog 2018-12-08 04:19:00

Motivation: The dplyr functions select and mutate are nowadays commonly applied to perform data.frame column operations, frequently combined with magrittr’s forward %>% pipe. While they work well interactively, these methods often require additional checking if used in “serious” code, for example, to catch column name clashes. In principle, the container package provides a dict-class … Read more

How To Ask The Right Questions As A Data Scientist

How to define a problem statement by asking the right questions? Admit it or not, defining a problem statement (or data science problem) is one of the most important steps in the data science pipeline. “A problem well defined is a problem half-solved” (Charles Kettering). In the following part, we’ll go through the four … Read more

Day 08 – little helper intersect2

We at STATWORX work a lot with R and we often use the same little helper functions within our projects. These functions ease our daily work life by reducing repetitive code parts or by creating overviews of our projects. At first, there was no plan to make a package, but soon I realised that it … Read more

Feel discouraged on the sparse data in your hand? Give Factorization Machine a shot (2)

By laying a solid foundation in Matrix Factorization, your exploration of a series of advanced models derived from it, such as LDA, LSI, PLSA, and Tensor Factorization, will be much smoother. The models derived from the concept of Matrix Factorization. In the last session, we talked about the basic … Read more

Python Virtual Environment

Conda: How to set up virtual environments using conda for the Anaconda Python distribution. A virtual environment is a named, isolated, working copy of Python that maintains its own files, directories, and paths so that you can work with specific versions of libraries or Python itself without affecting other Python projects. Virtual environments … Read more

“Increase sample size until statistical significance is reached” is not a valid adaptive trial design; but it’s fixable.

TLDR: Begin with N of 10, increase by 10 until p < 0.05 or max N reached. This design has inflated type-I error. Lower p-value threshold needed to ensure specified type-I error rate. The number of interim analyses and max N affect the type-I error rate. Threshold can be identified using simulation. A recent Facebook … Read more
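
The inflation described in the TLDR can be checked with a short simulation. The sketch below is a simplified stand-in, not the post's own code: it assumes normally distributed data with known unit variance and uses a two-sided z-test, and the function names, the maximum N of 100, and the 4000 simulated trials are choices made here for illustration.

```python
import math
import random


def two_sided_p(sample_mean, n):
    """Two-sided p-value of a z-test of mean 0, assuming known unit variance."""
    z = abs(sample_mean) * math.sqrt(n)
    return math.erfc(z / math.sqrt(2))


def type_one_error(n_start, step, n_max, alpha, n_sims, seed=0):
    """Fraction of null trials that reach p < alpha at any interim look."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        total, n = 0.0, 0
        next_look = n_start
        while n < n_max:
            # Collect observations under the null until the next look (capped at n_max).
            while n < min(next_look, n_max):
                total += rng.gauss(0.0, 1.0)
                n += 1
            if two_sided_p(total / n, n) < alpha:
                rejections += 1
                break
            next_look += step
    return rejections / n_sims


# Start at N = 10, add 10 at a time, stop at p < 0.05 or N = 100.
sequential = type_one_error(n_start=10, step=10, n_max=100, alpha=0.05, n_sims=4000)
# Same maximum N, but a single analysis at the end.
single_look = type_one_error(n_start=100, step=10, n_max=100, alpha=0.05, n_sims=4000)
print(sequential, single_look)
```

Under these assumptions the single-look design should stay near the nominal 0.05, while the design with repeated looks rejects far more often. Rerunning type_one_error with smaller values of alpha is one way to search for a threshold that brings the overall rate back to the intended level.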

Shortcoming of Under-sampling Algorithms: CCMUT and E-CCMUT

What, Why, Possible Solution and Ultimate Utility. In one of my previous articles, “Under-sampling : A Performance Booster on Imbalanced Data”, I applied the Cluster Centroid based Majority Under-sampling Technique (CCMUT) on Adult Census Data and proved the model performance improvement w.r.t. the state-of-the-art model, “A Statistical Approach to Adult Census Income Level Prediction”[1]. But there are … Read more

“Artist” in Matplotlib — something I wanted to know before spending tremendous hours on googling…

Originally published at and modified a bit to fit Medium’s editing system. It’s true that matplotlib is a fantastic visualizing tool in Python. But it’s also true that tweaking details in matplotlib is a real pain. You may easily lose hours to find out how to change a small part of your plot. Sometimes … Read more

Avoiding Parking Tickets in San Francisco Using Data Analytics

Although still not a perfect predictor, this model was more accurate than the first. The streets identified as best also showed much less variability than those identified as worst. We could also reduce the number of tickets by over 50% if we chose the best population compared to the worst. Interestingly, parking density was … Read more

Comparative study on Classic Machine learning Algorithms

2. Logistic Regression. Just like linear regression, logistic regression is the right algorithm to start with for classification algorithms. Even though the name ‘Regression’ comes up, it is not a regression model but a classification model. It uses a logistic function to model a binary output. The output of the logistic regression will be a probability (0≤x≤1), … Read more
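
A minimal sketch of how a logistic function turns a linear score into a probability, and how a threshold turns that probability into a class label. The weights, bias, feature values, and the 0.5 threshold are hypothetical numbers chosen for illustration, not fitted parameters from the article.

```python
import math


def sigmoid(z):
    """Logistic function: maps a real-valued score to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(features, weights, bias):
    """Probability of the positive class for one example."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(score)


def predict_label(features, weights, bias, threshold=0.5):
    """Binary class label obtained by thresholding the probability."""
    return 1 if predict_proba(features, weights, bias) >= threshold else 0


# Hypothetical parameters, as if they had already been learned.
weights = [1.5, -0.5]
bias = -1.0

print(predict_proba([2.0, 1.0], weights, bias), predict_label([2.0, 1.0], weights, bias))
print(predict_proba([0.0, 0.0], weights, bias), predict_label([0.0, 0.0], weights, bias))
```

The model itself outputs a probability; it only becomes a classifier once a threshold is applied, which is why the name "regression" survives.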

How should we define AI?

In our very first section, we’ll become familiar with the concept of AI by looking into its definition and some examples. As you have probably noticed, AI is currently a “hot topic”: media coverage and public discussion about AI is almost impossible to avoid. However, you may also have noticed that AI means different things … Read more

F# Advent Calendar — A Christmas Classifier

The ML.NET Model. The model is defined in Program.fs. The dataLoader specifies the schema of the input data (Input Data Schema). The dataLoader is then used to load the training and test data views (Load Training and Test Data). The dataPipeline specifies the transforms that should be applied to the input tsv. Since this is a … Read more

Gender Diversity in the R and Python Communities

Many (if not most) tech communities have far more representation from men than from women (and even fewer from nonbinary folk). This is a shame, because everybody uses software, and these projects would self-evidently benefit from the talent and expertise from across the entire community. Some projects are doing better than others, though, and data … Read more

I Can Be Your Heroku, Baby

Deploying a Python app in Heroku! Do you like Data Science? <Shakes head up and down> Do you like Data Science DIY deployment? <Shakes head left and right> Me neither. One of the most frustrating parts of early data science learning or personal work is deploying an app through free cloud applications. Your code is juuust … Read more

Roadmap for Conquering Computer Vision

It has become quite a tradition to write blogs on giving guidelines to ace Machine learning. I have had a hard time finding any such roadmap and to-do list for computer vision. As a vision enthusiast and consultant, I have found a lot of people asking about a concrete roadmap (in terms of skills, courses … Read more

How to determine the best model?

Machine learning models play a critical role in many aspects of today’s business. The use of a predictive model can improve the business bottom line, and a slightly improved model can result in an increase of millions of dollars. Although you may not know all the popular algorithms (and more powerful algorithms in the future), … Read more

Image Processing Class (EGBE443) #3 — Point Operation

Applying a point operation affects the histogram. Raising the brightness shifts the histogram to the right, and increasing the contrast of the image expands the histogram. These point operations map intensities through a mapping function whose constants come from the image content, such as the highest and the lowest intensity. Automatic … Read more
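
The point operations mentioned above can be sketched on a flat list of 8-bit grey values. This is an illustrative assumption of how such mappings are commonly written (additive brightness, multiplicative contrast, and a linear stretch between the lowest and highest intensity); the function names and the sample pixel values are invented here and are not taken from the class material.

```python
def clamp(value, low=0, high=255):
    """Keep an intensity inside the valid 8-bit range."""
    return max(low, min(high, value))


def adjust_brightness(pixels, offset):
    """Add a constant to every pixel, shifting the histogram right (or left)."""
    return [clamp(p + offset) for p in pixels]


def adjust_contrast(pixels, factor):
    """Multiply every pixel by a constant, spreading the histogram out."""
    return [clamp(round(p * factor)) for p in pixels]


def auto_contrast(pixels, out_min=0, out_max=255):
    """Linearly stretch intensities so the lowest maps to out_min and the highest to out_max."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        return list(pixels)  # flat image: nothing to stretch
    scale = (out_max - out_min) / (hi - lo)
    return [clamp(round(out_min + (p - lo) * scale)) for p in pixels]


# Hypothetical grey values of a tiny image.
pixels = [50, 100, 150, 200]

print(adjust_brightness(pixels, 40))
print(adjust_contrast(pixels, 1.5))
print(auto_contrast(pixels))
```

In auto_contrast the two constants of the mapping, lo and hi, are read from the image itself, which is what makes the adjustment automatic.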