Practical Data Science with R 2nd Edition now in-stock at Amazon.com!

[This article was first published on R – Win-Vector Blog, and kindly contributed to R-bloggers]. Practical Data Science with R 2nd Edition is now in-stock …

Introducing The Amazon Builders’ Library

The Amazon Builders’ Library is a collection of living articles that take readers under the hood of how Amazon architects, releases, and operates the software underpinning Amazon.com and AWS. The Builders’ Library articles are written by Amazon’s senior technical leaders and engineers, covering topics across architecture, software delivery, and operations. For example, readers can see …

A Gentle Introduction to Probabilistic Programming Languages

Probabilistic programming is becoming one of the most active areas of development in the machine learning space. What are the top languages we should know about? Probabilistic thinking is an incredibly valuable tool for decision making. From economists to poker players, people who can think in terms of probabilities tend to make better decisions when …

5 Books To Improve Your Fast & Slow Thinking

Don’t let your System 1 judge this post. Here are 5 books that will strengthen your decision-making skills. They will hopefully improve both of your cognitive systems, enhancing your intuitive and your deliberate thinking. Below you will find my reasons why you should read each one of them, together with a quote …

Combo Charts with Seaborn and Python

Overlaying two plots to make one chart. Intro: The libraries, code, and visuals are further down, but first I wanted to offer a brief introduction as to why I decided to share this with everyone in this community. If you just want the tutorial, skip the intro. While …

Inequalities in English NHS talking therapy services: What can the data tell us?

In this blog, we’ve highlighted inequalities in access and outcomes within IAPT. This is well-established within the field and discussed within the IAPT manual, which outlines evidence-based guidance for effective and efficient delivery of IAPT services. The longitudinal evidence suggests improvements are being made. However, further progress is needed and while we focus on deprivation, …

Digital Skills as a Service (DSaaS)

On one side, we had the problems that businesses faced; on the other, the capabilities of AI technologies. To find the synergy between the two, I started thinking about how to answer the main question: how can I copy myself and sell the copy to every company in the world to cover repetitive work? …

3P Strategy to be successful for entry-level programmers

My advice to an entry-level programmer would be to stick with the following 3P strategy, which I formulated and which has really helped me in my career: Perseverance, Practice, and Project work. Perseverance: “Perseverance is not a long race; it is many races one after another.” (Walter Elliot) At the beginning, coding always seems …

Microsoft has validated the Lenovo ThinkSystem SE350 edge server for Azure Stack HCI

Do you need rugged, compact-sized hyperconverged infrastructure (HCI) enabled servers to run your branch office and edge workloads? Do you want to modernize your applications and IoT functions with container technology? Do you want to leverage Azure’s hybrid services such as backup, disaster recovery, update management, monitoring, and security compliance? Microsoft and Lenovo have teamed …

How to make a precision recall curve in R

Precision-recall (PR) curves are useful for machine learning model evaluation when there is an extreme imbalance in the data and the analyst is particularly interested in one class. A good example is credit card fraud, where the instances of fraud are extremely few compared with non-fraud. Here are some facts about PR curves. …

Regression — Why Mean Square Error?

In machine learning, our main goal is to minimize the error, which is defined by the loss function. Each type of algorithm has a different way of measuring the error. In this article I’ll go through some basic loss functions used in regression algorithms and why they take the form they do. Let’s begin. Suppose …
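As a rough illustration of the loss named in the title, the sketch below computes mean squared error for a handful of predictions. The function name `mean_squared_error` and the sample values are assumptions made up for this example; they do not come from the article.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical targets and predictions, chosen only for illustration.
y_true = [3.0, 5.0, 7.0]
y_pred = [2.0, 5.0, 9.0]

# Errors are 1, 0 and -2, so the squared errors are 1, 0 and 4.
mse = mean_squared_error(y_true, y_pred)
print(mse)
```

Squaring makes every error non-negative and penalises large misses more heavily than small ones, which is one common motivation for this loss.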

Proper Ways to Pass Environment Variables in JSON for cURL POST

The best practice is to use a data generation function; scroll to the bottom for the details. When we build an API-based web application (with the front end and back end separated), we usually want to see what the HTTP client is sending, or to inspect and debug webhook requests. There are two approaches: build an API server ourselves, or use …

Build A Commission-Free Algo Trading Bot By Machine Learning Quarterly Earnings Reports [Full…

Introduction: The following is a complete guide that will teach you how to create your own algorithmic trading bot that will make trades based on quarterly earnings reports (10-Q) filed to the SEC by publicly traded US companies. We will cover everything from downloading historical 10-Q filings, cleaning the text, and building your machine learning …

Performing a Time Series Analysis on the AAPL Stock Index.

ARIMA model: ARIMA stands for AutoRegressive Integrated Moving Average. It is specified by three ordered parameters (p, d, q). Here p is the order of the autoregressive model (the number of time lags), d is the degree of differencing (the number of times the data have had past values subtracted), and q is the order of the moving-average model. Before building …
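To make the differencing parameter d concrete, here is a minimal sketch that subtracts each previous value from the current one, d times over. The helper name `difference` and the sample series are hypothetical; the series is not real AAPL data.

```python
def difference(series, d=1):
    """Apply first-order differencing (y[t] - y[t-1]) d times."""
    result = list(series)
    for _ in range(d):
        # Each pass shortens the series by one element.
        result = [curr - prev for prev, curr in zip(result, result[1:])]
    return result


# Made-up values for illustration only.
prices = [100, 102, 105, 109, 114]

once = difference(prices, d=1)   # [2, 3, 4, 5]
twice = difference(prices, d=2)  # [1, 1, 1]
print(once, twice)
```

In this toy series the second difference is constant, which is the kind of stabilising effect differencing is meant to have before the autoregressive and moving-average terms are fitted.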

How to deploy ONNX models on NVIDIA Jetson Nano using DeepStream

One feature I particularly liked about DeepStream is that it optimally takes care of the entire I/O processing in a pipelined fashion. We can also stack multiple deep learning algorithms to process information asynchronously. This allows you to increase throughput without the hassle of manually creating and managing a multiprocessing system design. The best part …

Extract and query knowledge graphs using Apache Jena (SPARQL Engine)

AGROVOC is a controlled vocabulary covering all areas of interest of the Food and Agriculture Organization (FAO) of the United Nations, including food, nutrition, agriculture, fisheries, forestry, environment, etc. AGROVOC consists of Resource Description Framework (RDF) triples. Each triple consists of 3 parts (subject, predicate, and object) such as “ …

Machine Learning: Lincoln Was Ahead of His Time

In the 45th presidency, there is much talk about what constitutes presidential language. But it wasn’t always “fake news,” “haters and losers,” and “covfefe.” There is a long legacy of presidential language in the US, and machine learning can help us gain new insights into this very historical topic. …

What Is The Best Starter Model In Table Data ML?— Lessons from A High-rank Kagglers’ New Book

GBDT is a decision-tree-based model, so the core operation in model training is splitting a node into two branches. As a result, GBDT needs no variable scaling, needs no missing-value imputation (the split rule also determines which node a record goes to if its variable is missing), and is able to handle categorical …

Artificial Neural Networks are Reconstructing Human Thoughts in Realtime

How artificial neural networks and electroencephalography (EEG) are helping scientists to identify the extent of stroke-related brain damage. Earlier this year, researchers from Russia’s Neurobotics Corporation and a team at the Moscow Institute of Physics and Technology worked out how to visualize human brain activity by mimicking images observed in real time. This …

Gradient Descent Training With Logistic Regression

The gradient descent algorithm and its variants (Adam, SGD, etc.) have become very popular training (optimisation) algorithms in many machine learning applications. Optimisation algorithms can be informally grouped into two categories: gradient-based and gradient-free (e.g. particle swarm, genetic algorithms). As you can guess, gradient descent is a gradient-based algorithm. Why gradient is important …

An overview of model explainability in modern machine learning

Towards a better understanding of why machine learning models make the decisions they do, and why it matters. Model explainability is one of the most important problems in machine learning today. It’s often the case that certain “black box” models such as deep neural networks are deployed to production …

Solving TSP Using Dynamic Programming

While I was conducting research for another post in my transportation series (I, II, stay tuned for III), I was looking for a dynamic programming solution for the Traveling Salesperson Problem (TSP). I did find many resources, but none were to my liking. Either they were too abstract, too theoretical, presented in a long video …

Multi-Label Image Classification in TensorFlow 2.0

Instead of building and training a new model from scratch, you can use a pre-trained model in a process called transfer learning. The majority of pre-trained models for vision applications were trained on ImageNet, which is a large image database with more than 14 million images divided into more than 20 thousand categories. The idea …

Amazon API Gateway Offers Faster, Cheaper, Simpler APIs Using HTTP APIs (Preview)

To build RESTful APIs, you can use either HTTP APIs or REST APIs from API Gateway. REST APIs offer a wide variety of features for building and managing RESTful APIs. HTTP APIs are up to 71% cheaper compared to REST APIs, but offer only API proxy functionality. HTTP APIs are optimized for performance—they offer the …

parcats 0.0.1 released

[This article was first published on R on datistics, and kindly contributed to R-bloggers]. parcats was released on CRAN. It is an htmlwidget providing bindings …

R Shiny for beginners: annotated starter code

This week I decided to get started with the R shiny package for interactive web applications. As an absolute beginner, I want to document my learning journey in the hope that it will be useful for other first-time shiny users. This post assumes some basic familiarity with R and the tidyverse, but no prior knowledge …

Introducing the Amplify DataStore, a persistent storage engine that synchronizes data between apps and the cloud

Previously, AppSync addressed offline use cases by utilizing an on-device cache to store query results that have been previously returned from the cloud. AppSync’s implementation of on-device caching of query results enabled developers to create a broad range of offline capable apps. However, the data available to the app when a device was offline was …

The worst Data Science test

I recently received a promotional email claiming that I could challenge myself and accelerate my learning by taking the best-in-class Data Science Adaptive Test. The promotion also claimed that I could track and improve my performance, compare myself with professional Data Scientists and increase my chance of being hired in a company as Data …

The Unknown Benefits of using a Soft-F1 Loss in Classification Systems

Before training and evaluating a machine learning algorithm such as a neural network, we need to define two major functions: A loss function: It is the bread and butter of modern machine learning. We need it to measure the model error (cost) on training batches. It has to be differentiable in order to backpropagate the error …

Towards precision security

An elegant solution to this problem exists as well: fully homomorphic encryption (FHE). FHE is (sort of) a new encryption technology which enables mathematical computations to be made on encrypted data, without requiring any decryption in the process. The encrypted result (once decrypted) of these computations will be the same as if they were made …

Catch Me if You Can: Outlier Detection (Taxi Trajectory Streams)

Predicting Uber demand through historical analysis (Taxi Trajectory Streams). Outlier detection is an interesting data mining task that is used quite extensively to detect anomalies in data. Outliers are points that exhibit significantly different properties than the majority of the points. To this end, outlier detection has very …

The Beginning of Natural Language Processing

Let’s go back to the early stages of human life, when early humans communicated with hand gestures to convey messages to each other, and compare that with the present day, when more than 7,000 languages are spoken around the world. It’s quite an achievement for the early …

Why Data Scientists Must Speak the Language of Python

Since 1950, the world has seen the emergence of more than a few programming languages. Be it Java, C, C++, Python or C#, every language was designed to serve a purpose. Over time, people started to communicate with machines in these multiple languages. As a result, plenty of wonderful software applications were born …

Introduction to Artificial Neural Networks

Activation function: The main purpose of the activation function is to convert the input signal of a node in an ANN into an output signal. A neural network without an activation function is just a linear regression model. Hence, to learn complex, non-linear curves, we need activation functions. Properties that an activation function should follow: Non-Linear …
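The claim that a network without an activation function is just a linear model can be checked with a small sketch. It uses single-input layers with made-up weights; the names `linear`, `stacked`, `collapsed` and `with_relu` are hypothetical, and ReLU is simply one common non-linearity chosen for the illustration.

```python
def linear(w, b):
    """Return a one-input linear layer x -> w * x + b."""
    return lambda x: w * x + b


def relu(x):
    """Rectified linear unit, used here as an example non-linearity."""
    return max(0.0, x)


# Arbitrary weights chosen for illustration.
layer1 = linear(2.0, 1.0)
layer2 = linear(-3.0, 0.5)


def stacked(x):
    # Two layers with no activation in between.
    return layer2(layer1(x))


def collapsed(x):
    # -3 * (2x + 1) + 0.5 simplifies to the single linear map -6x - 2.5.
    return -6.0 * x - 2.5


def with_relu(x):
    # Same two layers, with a non-linearity between them.
    return layer2(relu(layer1(x)))


for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(x, stacked(x), collapsed(x), with_relu(x))
```

The stacked and collapsed outputs agree at every input, while the ReLU version departs from the straight line wherever the first layer's output is negative, which is what lets deeper networks represent non-linear curves.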

3 things you need to know before you Train-Test Split

Let’s assume you are doing multiclass classification and have an imbalanced dataset with 5 different classes. You do a simple random train-test split that totally disregards the distribution or proportions of the classes. What happens in this scenario is that you end up with a train and a test set …
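For contrast with a purely random split, the sketch below shows one way a split could respect class proportions (often called a stratified split). It is an illustration under assumptions, not the article's code: the function name `stratified_split`, the 20% test fraction, and the five made-up class sizes are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), sampling each class separately."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Give every class at least one test sample.
        n_test = max(1, round(len(indices) * test_fraction))
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return train_idx, test_idx


# An imbalanced, made-up dataset with 5 classes (100 samples in total).
labels = ["a"] * 50 + ["b"] * 25 + ["c"] * 15 + ["d"] * 5 + ["e"] * 5

train_idx, test_idx = stratified_split(labels)
test_counts = Counter(labels[i] for i in test_idx)
print(test_counts)
```

Because each class is shuffled and cut on its own, the rare classes "d" and "e" are guaranteed to appear in both sets here. A class with a single sample would end up only in the test set, so real code would need to handle that case.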

Bayesian Model Selection: As A Feature Reduction Technique

Bayesian model selection can be applied to situations where we have multiple competing models and need to select the best model. According to Bayes’s theorem, any model’s posterior probability can be written as: [Figure: Bayes’s formula for the probability of a model (M) being the true model given the data (D)]. Here, P(M|D) is the …
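The original figure is not reproduced in this excerpt. For reference, the standard statement of Bayes’s theorem in the excerpt's notation, with M for the model and D for the data, is:

```latex
P(M \mid D) = \frac{P(D \mid M)\, P(M)}{P(D)}
```

In the usual reading, P(M | D) is the posterior probability of the model, P(D | M) is the likelihood of the data under the model, P(M) is the prior, and P(D) is the evidence, which is the same for every candidate model fitted to the same data.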

Talking with BERT

Improving prompts to better understand language models. The growth of knowledge and research around language models has been amazing in the past few years. For BERT especially, we have seen some incredible uses for this massive pre-trained language model on tasks like text classification, prediction, …

A Step-by-Step Introduction to Starting nbdev — Exploratory Programming

A simplified Hello World example using nbdev. “I really do think [nbdev] is a huge step forward for programming environments”: Chris Lattner, inventor of Swift, LLVM, and Xcode and Swift Playgrounds. Jeremy Howard does not stop impressing us with his great libraries, e.g., Fastai (a high-level API for using …

Building a Custom Search Relevance Training Set

Up to 30k document classifications per month are free using Google’s Language API. It can be used to classify passages into 700+ categories, and it also reports confidence scores. You need to sign up for Google Cloud and authenticate your client first, see. Then run: We use vowpal-wabbit (VW) to build a binary text classifier that can …

A thousand ways to deploy Machine learning models — Part 1

“What use is a machine learning model if you don’t deploy to production?” (Anonymous) You have done great work building that awesome 99% accurate machine learning model, but most of the time your work is not done until you deploy it. Most times our models will be integrated with existing web …

Checking Analyzed Laboratory Data for Errors

Tutorial: Automatically Analyzing Laboratory Data to Create a Performance Map. How to Ensure That There Are No Errors in Laboratory Data Sets. It’s very common that scientists find themselves with large data sets. Sometimes they come in the form of gigabytes’ worth of data in a single file. Other times it’s hundreds of files, each …

Increase model performance by… removing data?

There’s been some interesting research on data valuation. The idea is that you look at a dataset, and rank data points based on their value with respect to a specific model or predictive task. “Data Shapley: Equitable Valuation of Data for Machine Learning” [2] by Ghorbani and Zou came out earlier this year. The authors …

Seven Important Predictions for Big Data in 2020

Data as a Service, Automation of Data Analysis, Data Governance, NLP, Conversational Analytics, and more. Big Data Analytics is transforming organizations and industries at an alarming rate. This type of technology has recently made a considerable shift, as businesses adopt it to enhance the way they analyze data. Enterprises across the globe …