A “Hello World” Into Image Recognition with MNIST

To begin, we’ll load the Keras library and the other necessary imports:

import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras import backend as K

Next, we’ll load the MNIST dataset and split it into X train, X test, Y train, and …
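A minimal sketch of the preprocessing that usually follows the split. It assumes channels-last images (the default `K.image_data_format()` in current Keras) and ten classes. To keep it runnable without a download, random `uint8` arrays stand in for the output of `mnist.load_data()`; in real Keras code, `keras.utils.to_categorical` performs the one-hot step.

```python
import numpy as np

def preprocess_mnist(x, y, num_classes=10):
    """Add a channel axis, scale pixels to [0, 1], and one-hot encode the labels."""
    x = x.reshape(x.shape[0], 28, 28, 1).astype("float32") / 255.0
    # Row i of the identity matrix is the one-hot vector for class i
    y_onehot = np.eye(num_classes, dtype="float32")[y]
    return x, y_onehot

# Stand-in arrays shaped like mnist.load_data() output: uint8 images, integer labels
rng = np.random.default_rng(0)
x_demo = rng.integers(0, 256, size=(5, 28, 28)).astype(np.uint8)
y_demo = np.array([0, 3, 9, 1, 3])
X_demo, Y_demo = preprocess_mnist(x_demo, y_demo)
print(X_demo.shape, Y_demo.shape)  # (5, 28, 28, 1) (5, 10)
```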

Create a YouTube Video from Code

With a C++ demo to create a fractal zoom. (Image created by Wolfgang Beyer with the program Ultra Fractal 3, CC BY-SA 3.0.) If you are a computer programmer who has ever thought of creating a video that contains a computer-generated animation, this article is for you. Here I will assume that you have existing code, …
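The article's demo is in C++. As a hedged illustration of one common pipeline (render each frame to an image file, then let a video encoder stitch the sequence), here is a Python sketch that renders a few Mandelbrot frames with a shrinking view and writes them as binary PGM files. The zoom center, frame size, and 5% zoom per frame are arbitrary choices for illustration, and the article's own approach may differ.

```python
import os
import tempfile
import numpy as np

def mandelbrot_frame(center, width, size=(160, 120), max_iter=100):
    """Grayscale Mandelbrot image around `center`, spanning `width` units on the real axis."""
    w, h = size
    height = width * h / w
    re = np.linspace(center.real - width / 2, center.real + width / 2, w)
    im = np.linspace(center.imag - height / 2, center.imag + height / 2, h)
    c = re[np.newaxis, :] + 1j * im[:, np.newaxis]   # complex grid, shape (h, w)
    z = np.zeros_like(c)
    counts = np.zeros(c.shape, dtype=np.int32)
    for _ in range(max_iter):
        alive = np.abs(z) <= 2.0                     # points that have not escaped yet
        z[alive] = z[alive] ** 2 + c[alive]
        counts[alive] += 1
    return (255 * counts // max_iter).astype(np.uint8)

def write_pgm(path, img):
    """Write an 8-bit image as binary PGM (P5), an image format ffmpeg can read."""
    h, w = img.shape
    with open(path, "wb") as f:
        f.write(f"P5 {w} {h} 255\n".encode("ascii"))
        f.write(img.tobytes())

out_dir = tempfile.mkdtemp()                         # replace with your own frames folder
center = complex(-0.743643887, 0.131825904)          # arbitrary point near the set's boundary
for i in range(3):                                   # a real zoom would render hundreds of frames
    frame = mandelbrot_frame(center, width=3.0 * 0.95 ** i)   # shrink the view 5% per frame
    last_path = os.path.join(out_dir, f"frame_{i:04d}.pgm")
    write_pgm(last_path, frame)
```

The numbered frames could then be encoded with a command such as `ffmpeg -framerate 30 -i frame_%04d.pgm -pix_fmt yuv420p zoom.mp4`, run inside the frames folder.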

Solving Satisfiability Problems with Grover’s Algorithm — Quantum Computing

Photo by Michael Dziedzic on Unsplash. Quantum Computing to Enhance Machine Learning. Besides database searches, Grover’s Algorithm has several applications, one of which is solving satisfiability problems. We’ll explore what satisfiability (SAT) problems are and how their solutions are documented in Qiskit, IBM’s Python library for Quantum Computing. A Boolean SAT problem is the problem …
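For a SAT instance, the states Grover's algorithm searches for are the satisfying assignments, which an oracle marks. The classical sketch below enumerates all 2^n assignments of a made-up three-variable formula to show exactly which states such an oracle would mark. It does not use Qiskit, and the DIMACS-style clause encoding (signed integers for literals) is an assumption chosen for readability. Grover's algorithm needs on the order of the square root of 2^n oracle queries, where this brute force checks all 2^n assignments.

```python
from itertools import product

def satisfying_assignments(clauses, n_vars):
    """Return every assignment of n_vars booleans that satisfies a CNF formula.

    Literals are DIMACS-style signed integers: 3 means x3, -3 means NOT x3.
    """
    solutions = []
    for bits in product([False, True], repeat=n_vars):
        # A clause holds if at least one literal is true; the formula needs every clause
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause) for clause in clauses):
            solutions.append(bits)
    return solutions

# (x1 OR x2) AND (NOT x1 OR x3) AND (NOT x2 OR NOT x3)
formula = [[1, 2], [-1, 3], [-2, -3]]
print(satisfying_assignments(formula, 3))  # [(False, True, False), (True, False, True)]
```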

BERT for Everyone

NLP models need to be pretrained: it takes a person several years to get a solid grasp of any language, and even with the speedup computers offer, models can’t learn a language in a few minutes or even a day. Pretraining prior to BERT was limited to word embeddings that mapped each word to a vector …

Deep Learning & Healthcare: All the Glitters Ain’t Gold

Why does everyone love Deep Learning? Unlike traditional Machine Learning (ML) algorithms, Deep Learning is fueled by massive amounts of data and requires high-end machines with powerful GPUs to run within a reasonable timeframe. Both of these requirements are expensive, so why do companies and research labs think the juice is worth the squeeze? In traditional …

Model deployment with Apache Beam and Dataflow

Operating your data science models in production can be stressful for data scientists. The more sophisticated your models are, the more you’ll struggle when it comes to productionizing them. Have you ever regretted ensembling 5 different models when developing a customer churn classifier? Don’t worry, Apache Beam comes to the rescue. Before getting started with …

How To Painlessly Analyze Your Time Series

An introduction to MPA: the Matrix Profile API. (Image source: Needpix.) We’re surrounded by time series data. From finance to IoT to marketing, many organizations produce thousands of these metrics and mine them to uncover business-critical insights. A Site Reliability Engineer might monitor hundreds of thousands of time series streams from a server farm, in …
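The matrix profile stores, for every length-m subsequence of a series, the z-normalized distance to its nearest non-trivial neighbor. Low values mark repeated patterns (motifs) and high values mark anomalies (discords). This is a deliberately brute-force sketch of that definition, not the MPA library's API, which uses much faster algorithms. The exclusion zone of m // 2 is an assumption of the sketch, and constant windows (zero standard deviation) are not handled.

```python
import numpy as np

def znorm_distance(a, b):
    """Euclidean distance between two z-normalized subsequences (assumes nonzero std)."""
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    return np.sqrt(np.sum((a - b) ** 2))

def naive_matrix_profile(ts, m):
    """Distance from each length-m subsequence to its nearest non-trivial match, plus that match's index."""
    ts = np.asarray(ts, dtype=float)
    n_sub = len(ts) - m + 1
    exclusion = m // 2                      # skip trivial matches that overlap subsequence i
    profile = np.full(n_sub, np.inf)
    index = np.full(n_sub, -1)
    for i in range(n_sub):
        for j in range(n_sub):
            if abs(i - j) <= exclusion:
                continue
            d = znorm_distance(ts[i:i + m], ts[j:j + m])
            if d < profile[i]:
                profile[i], index[i] = d, j
    return profile, index

# A periodic signal with one injected spike: the spike's windows get the largest profile values
t = np.arange(120)
ts = np.sin(2 * np.pi * t / 20)
ts[60] += 3.0
profile, index = naive_matrix_profile(ts, m=20)
```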

Writing good SQL

Further structuring the query language by adapting layers. (Photo by National Cancer Institute on Unsplash.) Do you want to write good SQL? Sure, but what does “good” actually mean? In certain real-time environments, only performance counts as “good”, and you measure execution time in milliseconds. In business intelligence and data warehouse environments, performance …
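One way to express "layers" inside a single query is a chain of common table expressions (CTEs), where each CTE has one job. The sketch below is an interpretation, not necessarily the article's layering scheme: staging cleans the raw table, a business-logic layer applies the rules, and a presentation layer aggregates. It runs against an in-memory SQLite database with a hypothetical `orders` table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER, customer TEXT, amount_cents INTEGER, status TEXT);
INSERT INTO orders VALUES
  (1, 'Alice', 1250, 'paid'),
  (2, 'alice',  800, 'cancelled'),
  (3, 'Bob',   4000, 'paid'),
  (4, 'bob',    500, 'paid');
""")

query = """
WITH
-- Layer 1, staging: clean and standardize the raw columns, nothing else
staged AS (
  SELECT id, LOWER(customer) AS customer, amount_cents / 100.0 AS amount, status
  FROM orders
),
-- Layer 2, business logic: apply the rule that defines a valid order
valid_orders AS (
  SELECT * FROM staged WHERE status = 'paid'
),
-- Layer 3, presentation: aggregate into the shape the report needs
revenue_per_customer AS (
  SELECT customer, SUM(amount) AS revenue, COUNT(*) AS n_orders
  FROM valid_orders
  GROUP BY customer
)
SELECT * FROM revenue_per_customer ORDER BY customer;
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('alice', 12.5, 1), ('bob', 45.0, 2)]
```

Because each layer is named, a rule change (for example, what counts as a valid order) touches exactly one CTE.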

Top Google AI Tools for Everyone

As more developers dive into the world of AI and see its potential, Google is catering to their dynamic needs by providing several powerful tools. The revolution is here! Welcome to TensorFlow 2.0. TensorFlow is Google’s offering to the world: an end-to-end open-source deep learning library that uses machine learning to improve the services provided …

Log transform or log link? And confounding variables. by @ellis2013nz

Last week I wrote about the relationship between weight and height in US adults, as seen in the US Centers for Disease Control and Prevention (CDC) Behavioral Risk Factor Surveillance System, an annual telephone survey with around 400,000 interviews per year. In particular, I tested the widely circulated claim that Body Mass Index (BMI) exaggerates the …
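BMI divides weight by height squared, which only removes the effect of height if weight scales roughly as height². A log transform makes that testable: regressing log weight on log height gives a slope that estimates the exponent. The sketch below uses synthetic data generated with an exponent of 2.5, chosen arbitrarily; it is not the post's BRFSS analysis, which is in R. A log-link GLM would instead model log E[weight], a different quantity from E[log weight]. That is presumably the distinction the post's title refers to.

```python
import numpy as np

def height_exponent(height_m, weight_kg):
    """Estimate a, b in weight = a * height**b by OLS on the log scale: log w = log a + b log h."""
    X = np.column_stack([np.ones_like(height_m), np.log(height_m)])
    coef, *_ = np.linalg.lstsq(X, np.log(weight_kg), rcond=None)
    log_a, b = coef
    return np.exp(log_a), b

# Synthetic adults: weight scales with height**2.5 times multiplicative (log-normal) noise
rng = np.random.default_rng(42)
height = rng.uniform(1.5, 2.0, size=5000)
weight = 21.0 * height ** 2.5 * np.exp(rng.normal(0, 0.1, size=5000))
a, b = height_exponent(height, weight)
print(round(b, 2))  # close to 2.5, so BMI's implicit exponent of 2 would not fit these data
```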

Boosting Machine Learning Models with Explainable AI (XAI)

Insights on Airbnb listings. With a typical machine learning model, traditional correlation-based feature importance analysis often has limited value. Does a data scientist’s toolkit contain reliable, systematic, model-agnostic methods that accurately measure each feature’s impact on a prediction? The answer is yes. Here we use a model built on Airbnb data to …
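SHAP-style explanations are built on Shapley values, which split one prediction's deviation from a baseline among the features. As a hedged illustration, here is a brute-force exact computation for a single prediction. A feature that is "absent" from a coalition takes a baseline value, which is one of several possible conventions. The price model, feature names, and numbers are hypothetical, not the article's Airbnb model. Enumeration costs 2^n model calls, so practical tools approximate.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values for one prediction; absent features take their baseline values."""
    n = len(x)

    def value(subset):
        # Features in `subset` take the instance's values, the rest take the baseline's
        return predict([x[i] if i in subset else baseline[i] for i in range(n)])

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for s in combinations(others, size):
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                phi[i] += weight * (value(set(s) | {i}) - value(set(s)))
    return phi

# Hypothetical nightly-price model with an interaction between bedrooms and entire-home listings
def predict(z):
    bedrooms, review_score, is_entire_home = z
    return 50 + 30 * bedrooms + 10 * review_score + 40 * is_entire_home * bedrooms

phi = shapley_values(predict, x=[2, 4.5, 1], baseline=[1, 4.0, 0])
print(phi)  # approximately [50.0, 5.0, 60.0]; they sum to predict(x) - predict(baseline) = 115
```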

What is the most important factor to graduate admission?

PDP contour (multidimensional PDP) plots are a special gem: they show how the interaction between two variables (hence their other name, PDP interaction plots) drives the predicted chance of admission. The following code generates contour plots of University Rating against every other feature:

for column in X_test.columns.drop('University Rating'):
    features = ['University Rating', column]
    inter1 …
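The plotting library does the heavy lifting in the article. As a sketch of what a two-feature partial dependence grid actually computes: for each pair of grid values, fix both features at those values for every row, then average the model's predictions. This NumPy version assumes `predict` accepts a 2D array and that features are addressed by column index rather than by name. The toy model in the demo is hypothetical.

```python
import numpy as np

def partial_dependence_2d(predict, X, col_a, col_b, grid_a, grid_b):
    """Average prediction over all rows of X with columns col_a, col_b fixed at each grid pair."""
    pd_vals = np.empty((len(grid_a), len(grid_b)))
    for i, va in enumerate(grid_a):
        for j, vb in enumerate(grid_b):
            X_mod = X.copy()
            X_mod[:, col_a] = va
            X_mod[:, col_b] = vb
            pd_vals[i, j] = predict(X_mod).mean()
    return pd_vals

# Toy model with an explicit interaction between features 0 and 1
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
grid = np.array([1.0, 2.0, 3.0])
pd_vals = partial_dependence_2d(lambda X: X[:, 0] * X[:, 1] + X[:, 2], X_demo, 0, 1, grid, grid)
```

The result can be drawn with matplotlib as `plt.contourf(grid_b, grid_a, pd_vals)`, since `contourf` expects `Z` with one row per y value.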

Extracting data from semi-structured tweets using Pandas and regex

Using Series string functions and regex to extract numeric data from text. (Washington State Ferry. Photo by oakie on Unsplash.) Today we are transforming Washington State Ferry tweets into wait times in hours. The tweets have some structure but don’t seem to be automated. The goal is to transform: Edm/King — Edmonds …
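A minimal sketch of the `Series.str.extract` approach, assuming wait times appear as a number followed by an hour or minute unit. The tweet texts below are hypothetical stand-ins, not real WSF tweets, and the regex and unit list are assumptions that real data would likely need to extend.

```python
import pandas as pd

# Hypothetical tweets in the spirit of the ones described
tweets = pd.Series([
    "Edm/King - Edmonds terminal: 2 hour wait for vehicles",
    "Edm/King - Kingston terminal: 90 minute wait",
    "Sea/Bainbridge - no wait reported",
])

# Named groups become columns; rows without a match get NaN in both columns
parts = tweets.str.extract(r"(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>hour|hr|minute|min)")
divisor = {"hour": 1, "hr": 1, "minute": 60, "min": 60}   # "minute" listed before "min" in the regex
wait_hours = parts["value"].astype(float) / parts["unit"].map(divisor)
print(wait_hours.tolist())  # [2.0, 1.5, nan]
```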

Machine Learning and Translational Research

The expansion of web services and recent advances in high-throughput technologies have made significant biological datasets easily accessible to the public, and especially to the scientific community. As a result, the ways we process, analyze, and infer knowledge from data have changed drastically in recent years, whether it is clinical data, sequencing data, electronic health records, and …

Playing God with Data

The Problem. Like a checkerboard, alternating 2-dimensional cells are labeled 0 and 1. All points labeled 1 (red) have x1 and x2 values of the same parity (either both even or both odd), while all points labeled 0 (blue) have x1 and x2 values of opposite parity. (Figure: the red and blue checkerboard.) This can …
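A minimal sketch of generating this data, assuming continuous coordinates whose cells are found by flooring (the article's exact construction is not shown). "Same parity" is equivalent to the sum of the two cell indices being even, which makes the label an XOR-like pattern that no single linear boundary can separate. The sampling range and sample size are arbitrary.

```python
import numpy as np

def checkerboard_label(x1, x2):
    """1 if the integer cells of x1 and x2 have the same parity, else 0."""
    cell1 = np.floor(x1).astype(int)
    cell2 = np.floor(x2).astype(int)
    return ((cell1 + cell2) % 2 == 0).astype(int)

rng = np.random.default_rng(1)
X = rng.uniform(0, 4, size=(1000, 2))      # a 4 x 4 board of unit cells
y = checkerboard_label(X[:, 0], X[:, 1])
print(checkerboard_label(np.array([0.5, 1.2]), np.array([0.5, 0.3])))  # [1 0]
```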

Creating a Serverless Python Chatbot API in Microsoft Azure from Scratch in 9 Easy Steps

Learn to create and deploy your own serverless chatbot application with Azure Function Apps that can be used in Slack, Skype, MS Teams, and others. Chatbots and serverless are two tech trends that have completely dominated the corporate world in 2020. Why not kill two birds with one stone and learn how to build your own …

Data Privacy in the Age of Big Data

In this section, I will introduce three techniques that can be used to reduce the probability that certain attacks can be performed. The simplest is k-anonymity, followed by l-diversity and then t-closeness. Other methods have been proposed, forming a sort of alphabet soup, but these are the three most …
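The first two definitions can be checked directly on a released table. A table is k-anonymous if every group of rows that shares the same quasi-identifier values (an equivalence class) has at least k rows. In the simple "distinct" variant, it is l-diverse if each such class contains at least l distinct sensitive values. The table below is hypothetical. It illustrates why l-diversity was proposed: the table is 3-anonymous, yet everyone in the second class has the same disease. t-closeness is not sketched here.

```python
from collections import defaultdict

def equivalence_classes(rows, quasi_identifiers):
    """Group rows by their combination of quasi-identifier values."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[q] for q in quasi_identifiers)].append(row)
    return groups

def k_anonymity(rows, quasi_identifiers):
    """Largest k such that every equivalence class has at least k rows."""
    return min(len(g) for g in equivalence_classes(rows, quasi_identifiers).values())

def l_diversity(rows, quasi_identifiers, sensitive):
    """Smallest number of distinct sensitive values in any equivalence class."""
    return min(len({r[sensitive] for r in g})
               for g in equivalence_classes(rows, quasi_identifiers).values())

# Hypothetical released table: ZIP codes truncated, ages bucketed
table = [
    {"zip": "130**", "age": "<30", "disease": "flu"},
    {"zip": "130**", "age": "<30", "disease": "flu"},
    {"zip": "130**", "age": "<30", "disease": "cancer"},
    {"zip": "148**", "age": ">=40", "disease": "heart"},
    {"zip": "148**", "age": ">=40", "disease": "heart"},
    {"zip": "148**", "age": ">=40", "disease": "heart"},
]
qi = ["zip", "age"]
print(k_anonymity(table, qi), l_diversity(table, qi, "disease"))  # 3 1
```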

Coronavirus in Wikipedia by language — visualized

Wikipedia pageviews by language for Coronavirus. Check the Wikipedia pageviews for each language to get a deeper look into how the news has spread and trended around the world. First we’ll extract the data from terabytes of Wikipedia pageviews to create a new dashboard. Stay until the end to see the secret to extremely configurable visualizations …

One Step Closer to Neuralink?

Researchers in the UK, Italy and Switzerland have created a network capable of transferring signals from biological to artificial neurons using the internet, making potentially significant progress towards ideas such as Elon Musk’s Neuralink. (Photo by Joshua Sortino on Unsplash.) In a study published by the University of Southampton this week, it has been demonstrated …

Drawdowns by the data

[This article was first published on R on OSM, and kindly contributed to R-bloggers.] We’re taking a break from our series on portfolio construction for …
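For reference, the usual definition of a drawdown is the percentage decline from the running peak, and the maximum drawdown is the worst such decline. The post itself works in R, and its exact definition may differ. This Python sketch uses a hypothetical price series.

```python
import numpy as np

def drawdowns(prices):
    """Drawdown at each point: fractional decline from the running peak (0 at new highs)."""
    prices = np.asarray(prices, dtype=float)
    running_peak = np.maximum.accumulate(prices)
    return prices / running_peak - 1.0

prices = [100, 110, 99, 105, 120, 90, 130]   # hypothetical index levels
dd = drawdowns(prices)
max_dd = dd.min()
print(max_dd)  # -0.25: the fall from 120 to 90
```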

The significance of the sector on the salary in Sweden, a comparison between different occupational groups, part 3

To complete the analysis of the significance of the sector for the salary of different occupational groups in Sweden, in this post I will examine the correlation between salary and sector using statistics for education. The F-value from the ANOVA table is used as the single value to discriminate how much the region and salary …
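The F-value in a one-way ANOVA table is the between-group mean square divided by the within-group mean square. A large F means group membership (here, sector) explains much of the variation in salary. The post works in R, where `anova()` on a fitted `lm` reports this value. This Python sketch computes it by hand for two hypothetical salary groups.

```python
import numpy as np

def one_way_anova_f(*groups):
    """F = (SS_between / (k - 1)) / (SS_within / (n - k))."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    all_values = np.concatenate(groups)
    grand_mean = all_values.mean()
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical monthly salaries (kSEK) in two sectors
public_sector = [30, 32, 34]
private_sector = [36, 38, 40]
print(one_way_anova_f(public_sector, private_sector))  # 13.5
```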

What to know before you adopt Hugo/blogdown

Fancy (re-)creating your website using Hugo, with or without blogdown? Feeling a bit anxious? This post aims to be the Hugo equivalent of “What to know before you adopt a pet”. We shall go through things that can or will break in the future, and what you can do to prevent future pain. I’m writing this post with R …

SR2 Chapter 2 Medium

[This article was first published on Brian Callander, and kindly contributed to R-bloggers.] Here are my solutions to the medium exercises in chapter 2 of McElreath’s …
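The first medium exercise in chapter 2 of Statistical Rethinking (2M1) asks for grid-approximate posteriors of the proportion of water p after tossing the globe and observing W,W,W; then W,W,W,L; then L,W,W,L,W,W,W, all under a flat prior. The post's solutions are in R. This Python sketch of the same grid approximation is not the author's code, and the 101-point grid is an arbitrary choice.

```python
import numpy as np

def grid_posterior(n_water, n_tosses, n_points=101, prior=None):
    """Grid approximation of the posterior for p, the proportion of water, under a binomial likelihood."""
    p_grid = np.linspace(0, 1, n_points)
    prior = np.ones(n_points) if prior is None else prior
    # Binomial kernel; the n-choose-k constant cancels when normalizing
    likelihood = p_grid ** n_water * (1 - p_grid) ** (n_tosses - n_water)
    unnormalized = likelihood * prior
    return p_grid, unnormalized / unnormalized.sum()

for data in ["WWW", "WWWL", "LWWLWWW"]:
    p_grid, post = grid_posterior(data.count("W"), len(data))
    print(data, "posterior mode:", p_grid[np.argmax(post)])
```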

Amazon EKS now available in the AWS China (Beijing) Region, operated by Sinnet, and the AWS China (Ningxia) Region, operated by NWCD

Amazon EKS is a managed Kubernetes service that makes it easy for you to run Kubernetes on AWS without needing to install, operate, and maintain your own Kubernetes control plane or worker nodes. Amazon EKS is certified Kubernetes conformant, so existing applications running on upstream Kubernetes are compatible with Amazon EKS. You can also easily …

Amazon AppStream 2.0 adds support for native application mode on Windows PCs

When AppStream 2.0 users start a streaming session in native application mode and open a streaming application, the application opens in its own window and functions in the same way as a locally installed application. Because AppStream 2.0 also supports file system redirection, users can share their local folders or drives with their streaming applications. …

Don’t Create or Scrape Fake Data

It’s already been done for you. A great rule of thumb for writing code, especially in Python, is to look for a module on PyPI, or simply search Google, before you start writing code yourself. If nobody else has done what you’re trying to do, you still might find articles, partial code, or general …

Hypothesis Testing Explained as Simply as Possible

One of the most important concepts for Data Scientists. (Image credits: PIRO4D from Pixabay.) Contents: Introduction; Terminology; Reject or Do not Reject?; What is the point of Significance Testing?; Steps for Hypothesis Testing. If you’ve heard of the terms null hypothesis, p-value, and alpha but don’t really know what they mean or how they’re related, then …
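To tie the three terms together, here is a minimal two-sided one-sample z-test. The null hypothesis fixes the population mean at `mu0`. The p-value is the probability, under that null, of a test statistic at least as extreme as the one observed. The null is rejected when the p-value falls below alpha. The sketch assumes the population standard deviation is known, and the numbers are hypothetical.

```python
from math import erf, sqrt

def two_sided_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-sided one-sample z-test with known population standard deviation sigma."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    # Standard normal CDF via the error function: Phi(z) = (1 + erf(z / sqrt(2))) / 2
    phi = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    p_value = 2 * (1 - phi)
    decision = "reject H0" if p_value < alpha else "do not reject H0"
    return z, p_value, decision

z, p_value, decision = two_sided_z_test(sample_mean=103.0, mu0=100.0, sigma=15.0, n=100)
print(z, round(p_value, 4), decision)  # 2.0 0.0455 reject H0
```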

Why our machine learning platform supports Python, not R

Machine learning engineering is maturing. (Image source: Python.) Disclaimer: The following is based on my observations, not an academic survey of the industry. For context, I’m a contributor to Cortex, an open source machine learning platform (the “our” in this article’s title). There are dozens of articles written comparing the relative merits of Python and …

Generating Fake Dating Profiles for Data Science

Forging Dating Profiles for Data Analysis by Webscraping. (Photo by Yogas Design on Unsplash.) Data is one of the world’s newest and most precious resources. Most data gathered by companies is held privately and rarely shared with the public. This data can include a person’s browsing habits, financial information, or passwords. In the case of …

Why do we need Tiny AI?

(Image by Worawut, licensed from Adobe Stock.) We all know that algorithms are getting smarter every day, but are they also getting greener? Not at all, and that’s becoming a significant problem. As a result, researchers are working hard to discover new ways of developing smaller algorithms. In this article, we’re going to discuss why …

What Is Data Management?

Stats about Data Management. Ninety-five percent of C-suite executives list data management as key to business strategy. Data management allows business leaders to leverage the data they collect from customers and suppliers to propel growth. Data management is how you extract answers and insights from raw data to meet your information needs. The proliferation of …

Who Is the Premier League’s Most Important Player?

The Premier League season is more than two-thirds done. Liverpool have the thing pretty much sewn up (see my earlier blog about how they’ve achieved such superiority), and the usual suspects are involved in the annual scrap to avoid relegation. In my ‘On Target’ blog series, I have been documenting my quest to ‘Moneyball’ Fantasy …

Solving Conditional Probability Problems with the Laws of Total Expectation, Variance, and…

In this article, we’ll see how to use the Laws of Total Expectation, Variance, and Covariance to solve conditional probability problems, such as those you might encounter in a job interview or while modeling business problems where random variables are conditional on other random variables. I am going to start by asking a couple of …
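A quick way to see the first two laws at work is to verify them exactly on a small discrete example. In the hypothetical setup below, X chooses a die (a fair 4-sided die with probability 1/3, a fair 6-sided die with probability 2/3) and Y is the roll. `Fraction` keeps the arithmetic exact, so the assertions compare equal rather than approximately. The covariance law is not covered by this sketch.

```python
from fractions import Fraction as F

p_x = {0: F(1, 3), 1: F(2, 3)}                       # X = 0: 4-sided die, X = 1: 6-sided die
y_given_x = {0: {y: F(1, 4) for y in range(1, 5)},
             1: {y: F(1, 6) for y in range(1, 7)}}

def mean(dist):
    return sum(p * y for y, p in dist.items())

def var(dist):
    m = mean(dist)
    return sum(p * (y - m) ** 2 for y, p in dist.items())

# Marginal distribution of Y, for the direct computation
p_y = {}
for x, px in p_x.items():
    for y, py in y_given_x[x].items():
        p_y[y] = p_y.get(y, 0) + px * py

# Law of total expectation: E[Y] = E[E[Y | X]]
cond_means = {x: mean(y_given_x[x]) for x in p_x}
total_mean = sum(p_x[x] * cond_means[x] for x in p_x)

# Law of total variance: Var(Y) = E[Var(Y | X)] + Var(E[Y | X])
expected_cond_var = sum(p_x[x] * var(y_given_x[x]) for x in p_x)
var_cond_mean = sum(p_x[x] * (cond_means[x] - total_mean) ** 2 for x in p_x)

assert mean(p_y) == total_mean
assert var(p_y) == expected_cond_var + var_cond_mean
print(total_mean, expected_cond_var + var_cond_mean)  # 19/6 31/12
```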

How to Acquire Large Satellite Image Datasets for Machine Learning Projects

Introduction. Historically, only governments and large corporations have had access to quality satellite images. In recent years, satellite image datasets have become available to anyone with a computer and an internet connection. The quality, quantity, and precision of these datasets are continuously improving, and there are many free and commercial platforms at your disposal to …

All you need to know on PCA …

[This article was first published on François Husson, and kindly contributed to R-bloggers.] All you need to do with PCA is in Factoshiny! PCA – …
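Factoshiny is a point-and-click interface to the FactoMineR R package. For readers who want to see what a PCA computes underneath, here is a Python sketch using the SVD of the centered data. By default it also scales each variable to unit variance, mirroring FactoMineR's `PCA()` default of `scale.unit = TRUE`. The data matrix is made up.

```python
import numpy as np

def pca(X, n_components=2, standardize=True):
    """PCA via SVD of the centred (and optionally unit-variance) data matrix."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    if standardize:
        Xc = Xc / Xc.std(axis=0)                       # analyse correlations rather than covariances
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]    # coordinates of the individuals
    axes = Vt[:n_components]                           # principal axes, one row per component
    explained = s ** 2 / np.sum(s ** 2)                # share of total inertia per component
    return scores, axes, explained[:n_components]

X_demo = np.array([[1.0, 2.0, 0.5], [2.0, 4.1, 0.1], [3.0, 5.9, 0.9], [4.0, 8.2, 0.3]])
scores, axes, explained = pca(X_demo)
print(explained)
```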

Machine Learning with R: A Hands-on Introduction from Robert Muenchen at Machine Learning Week, Las Vegas

[This article was first published on R-posts.com, and kindly contributed to R-bloggers.] Join Robert Muenchen’s workshop about Machine Learning with R at Machine Learning Week …

XGBoostLSS – An extension of XGBoost to probabilistic forecasting

[This article was first published on R-posts.com, and kindly contributed to R-bloggers.] Introduction. To reason rigorously under uncertainty, we need to invoke the language of …
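Distributional boosting of the XGBoostLSS kind fits every parameter of a distribution, not just the mean. Each parameter's trees are driven by per-observation gradients and Hessians of the negative log-likelihood (NLL). As a hedged illustration, not XGBoostLSS's API, here is that calculation for a Gaussian. The scale parameter is fit on the log scale so that sigma stays positive. Note that the log-sigma Hessian is zero when a residual is exactly zero, so a practical booster would need a safeguard there.

```python
import numpy as np

def gaussian_nll_grad_hess(y, mu, log_sigma):
    """NLL of N(mu, sigma^2) with sigma = exp(log_sigma), plus gradients and diagonal Hessians."""
    sigma2 = np.exp(2 * log_sigma)
    resid2 = (y - mu) ** 2
    nll = log_sigma + resid2 / (2 * sigma2) + 0.5 * np.log(2 * np.pi)
    grad_mu = -(y - mu) / sigma2                 # d NLL / d mu
    hess_mu = 1.0 / sigma2                       # d2 NLL / d mu2
    grad_log_sigma = 1.0 - resid2 / sigma2       # d NLL / d log_sigma
    hess_log_sigma = 2.0 * resid2 / sigma2       # d2 NLL / d log_sigma2
    return nll, (grad_mu, hess_mu), (grad_log_sigma, hess_log_sigma)

nll, (g_mu, h_mu), (g_ls, h_ls) = gaussian_nll_grad_hess(y=1.3, mu=0.4, log_sigma=-0.2)
```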

8 Common Data Structures every Programmer must know

Data structures are a specialized means of organizing and storing data in computers so that we can perform operations on the stored data more efficiently. Data structures have a wide and diverse scope of usage across the fields of Computer Science and Software Engineering. Data structures are used in almost every …

Simulating epidemics using Go and Python

Simulate and analyse different epidemic scenarios with Go and Jupyter Notebook. This is something that’s directly impacting me even as I am typing out this story. What started out as a small outbreak of a novel coronavirus in Wuhan, China, towards the end of December 2019 quickly spread to the rest of China and beyond …
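A common starting point for simulating epidemic scenarios is the compartmental SIR model, which splits the population into Susceptible, Infected, and Recovered. The article's own simulation is written in Go and may work differently. This is a textbook discrete-time SIR sketch in Python, and the population, `beta`, and `gamma` values are hypothetical.

```python
def simulate_sir(population, initial_infected, beta, gamma, days):
    """Discrete-time SIR model; returns a list of (S, I, R) tuples, one per day.

    beta: daily contacts that transmit, per infected person; gamma: daily recovery rate.
    """
    s, i, r = population - initial_infected, float(initial_infected), 0.0
    history = [(s, i, r)]
    for _ in range(days):
        new_infections = beta * s * i / population
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history

history = simulate_sir(population=1_000_000, initial_infected=10, beta=0.3, gamma=0.1, days=160)
peak_day = max(range(len(history)), key=lambda d: history[d][1])
print(peak_day, round(history[peak_day][1]))
```

Changing `beta` (for example, to model distancing) and rerunning is the simplest way to compare scenarios.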

QR Matrix Factorization

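Now that we know about the QR factorization, once we can actually compute it, we will be able to solve the least-squares (LS) problem as follows. Substituting $X = QR$, where $Q^\top Q = I$ and $R$ is upper triangular, into the normal equations $X^\top X\hat{\beta} = X^\top y$ gives $R^\top R\hat{\beta} = R^\top Q^\top y$; since $R$ is invertible when $X$ has full column rank, $R\hat{\beta} = Q^\top y$, so $\hat{\beta} = R^{-1}Q^\top y$. This means that all we need to do is find the inverse of R, transpose Q, and take the product. That will produce the OLS coefficients. We …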
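A minimal NumPy sketch of the QR route to OLS coefficients on synthetic data. Rather than forming the inverse of R explicitly, it solves the system R beta = Q^T y, which gives the same result and is the numerically preferred habit. Because R is upper triangular, back substitution would be enough; `np.linalg.solve` is used here only for brevity.

```python
import numpy as np

def ols_via_qr(X, y):
    """Least-squares coefficients from the reduced QR factorization X = QR."""
    Q, R = np.linalg.qr(X)               # reduced mode: Q is n x p, R is p x p upper triangular
    return np.linalg.solve(R, Q.T @ y)   # solve R beta = Q^T y instead of inverting R

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])   # intercept plus two predictors
y = X @ np.array([1.0, 2.0, -3.0]) + rng.normal(scale=0.1, size=50)
beta = ols_via_qr(X, y)
print(beta)  # close to [1, 2, -3]
```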

Python Numba or NumPy: understand the differences

Short description supported by examples. (Photo by Patrick Tomasso on Unsplash.) NumPy and Numba are two great Python packages for matrix computations. Both work efficiently on multidimensional arrays. In Python, lists are dynamic: appending values to a list grows it on the fly. …
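To make the contrast concrete, here are three ways to build the same array. The first grows a dynamic Python list and converts it at the end. The second preallocates a fixed-size NumPy array and fills it in a Python loop. The third vectorizes so that NumPy runs the loop in compiled code. Explicit loops like the second are what Numba targets: with Numba installed, decorating such a function with `numba.njit` is the usual way to compile it. That step is left out here so the sketch runs with NumPy alone.

```python
import numpy as np

def squares_with_list(n):
    """Grow a Python list one element at a time (dynamic resizing), then convert to an array."""
    values = []
    for i in range(n):
        values.append(i * i)
    return np.array(values)

def squares_preallocated(n):
    """Allocate a fixed-size NumPy array once and fill it in place."""
    out = np.empty(n, dtype=np.int64)
    for i in range(n):
        out[i] = i * i
    return out

def squares_vectorized(n):
    """Let NumPy perform the elementwise loop in compiled code."""
    i = np.arange(n, dtype=np.int64)
    return i * i

print(squares_vectorized(5))  # [ 0  1  4  9 16]
```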

Segmenting Your Customers on Many Dimensions (or Python for Wine Lovers)

Using K-means clustering on more than two or three attributes. I recently read a book on data analytics called Data Smart, written by John Foreman, head of product for MailChimp. This book is an excellent business analytics primer that walks you through a variety of machine learning use cases, complete with sample data sets and …
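Nothing in k-means is tied to two or three dimensions. Lloyd's algorithm only needs distances between points and centroids, so it runs unchanged on dozens of attributes. This NumPy sketch is illustrative only and does not reproduce the article's wine data or tooling. The five-dimensional demo data are synthetic. With real attributes on different scales, standardizing them first is usually advisable.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid updates."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centroid, shape (n, k)
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Keep a centroid in place if its cluster becomes empty
        new_centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                                  for j in range(k)])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Two well-separated synthetic groups in 5 dimensions
rng = np.random.default_rng(42)
X_demo = np.vstack([rng.normal(0, 0.5, size=(20, 5)), rng.normal(10, 0.5, size=(20, 5))])
labels, centroids = kmeans(X_demo, k=2)
```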

The Mechanics of Attention Mechanism

Firstly, in the attention mechanism, we are going to use H, the set of all h_j (the set of all the hidden states) instead of just the last one, so let’s keep it there. Secondly, to simplify things, we are going to focus on one RNN decoder unit, as shown on the right of the …
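For a single decoder step, attention scores every hidden state h_j in H against the current decoder state, turns the scores into weights with a softmax, and returns the weighted sum of the h_j as the context vector. The dot-product score used here is an assumption, since Bahdanau-style attention uses a small feed-forward network instead, and the article's scoring function may differ.

```python
import numpy as np

def attention_step(H, s):
    """One decoder step of dot-product attention.

    H: (T, d) array whose rows are the hidden states h_j.
    s: (d,) current decoder hidden state.
    Returns the context vector and the attention weights alpha_j.
    """
    scores = H @ s                           # e_j = h_j . s
    scores = scores - scores.max()           # subtract the max for numerical stability
    weights = np.exp(scores)
    alpha = weights / weights.sum()          # softmax over all T hidden states
    context = alpha @ H                      # sum_j alpha_j * h_j
    return context, alpha

H = np.eye(3)                                # three toy hidden states of dimension 3
context, alpha = attention_step(H, np.array([20.0, 0.0, 0.0]))
print(alpha.round(3))  # almost all weight on h_0, the state most aligned with s
```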