Source code chapter of ‘evidence-based software engineering’ reworked

[This article was first published on The Shape of Code » R, and kindly contributed to R-bloggers]. The Source code chapter of my evidence-based software …

Complete Guide to Data Visualization with Python

Let’s look at the main libraries for data visualization with Python and the types of charts each of them can produce. We will also see which library is recommended for each occasion and the unique capabilities of each one. We will start with the most basic visualization, which is looking at the …

A “Hello World” Into Image Recognition with MNIST

To begin, we’ll load the Keras library and the other necessary imports:

```python
import keras
from keras.datasets import mnist            # MNIST loader
from keras.models import Sequential         # linear stack of layers
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras import backend as K              # backend utilities (e.g. image data format)
```

Next, we’ll load the MNIST dataset and split it into X train, X test, Y train, and …

Solving Satisfiability Problems with Grover’s Algorithm — Quantum Computing

Quantum Computing to Enhance Machine Learning. Besides database searches, Grover’s Algorithm has several applications, one of which is solving satisfiability problems. We’ll explore what satisfiability (SAT) problems are and how their solutions are documented in Qiskit, IBM’s Python library for quantum computing. A Boolean SAT problem is the problem …
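To make the SAT problem concrete, here is a minimal classical sketch (not Grover's algorithm and not Qiskit code) that decides whether a formula in conjunctive normal form, an AND of OR-clauses, is satisfiable by trying all 2^n assignments. The clause encoding, where the integer i stands for variable i and -i for its negation, is an assumption borrowed from the DIMACS convention. This exhaustive search over 2^n candidates is the kind of unstructured search Grover's algorithm speeds up, needing on the order of √(2^n) oracle queries when there is a single solution.

```python
from itertools import product

def literal_true(lit, bits):
    """Evaluate a DIMACS-style literal against an assignment (bits[0] is variable 1)."""
    value = bits[abs(lit) - 1]
    return value if lit > 0 else not value

def satisfying_assignments(clauses, n_vars):
    """Return every assignment (tuple of bools) that satisfies the CNF formula."""
    solutions = []
    for bits in product([False, True], repeat=n_vars):
        # The formula holds if every clause has at least one true literal.
        if all(any(literal_true(lit, bits) for lit in clause) for clause in clauses):
            solutions.append(bits)
    return solutions

# (x1 OR x2) AND (NOT x1 OR x2) AND (x1 OR NOT x2)
formula = [[1, 2], [-1, 2], [1, -2]]
solutions = satisfying_assignments(formula, 2)
```

Here only x1 = x2 = True satisfies all three clauses, while a contradictory formula such as [[1], [-1]] has no solutions and is therefore unsatisfiable.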

Build a web data dashboard in just minutes with Python

Make your data visualizations more powerful and accessible by converting them into a web-based dashboard with Plotly Dash. Build a web data dashboard in just a few lines of Python code. I don’t know about you, but I occasionally find it a little intimidating to have to code something. This is doubly so when …

Attention Mechanism in Deep Learning: Simplified

What happened? Well, it’s easy enough to explain. You were ‘focusing’ on a smaller part of the whole thing because you knew the rest of the image/sentence was not useful to you at that particular moment. So when you were trying to figure out the color of the soccer ball, your mind was showing you …
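The ‘focusing’ idea can be sketched as scaled dot-product attention, the variant used in Transformers; the article may cover other forms. In this minimal pure-Python sketch, a query vector is compared with each key, the scores are turned into weights by a softmax, and the output is the weighted average of the values, so the most relevant items dominate. The toy vectors are made up for illustration.

```python
import math

def softmax(scores):
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    d = len(query)
    # Similarity of the query to each key, scaled by sqrt(d).
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    # The output is the attention-weighted average of the value vectors.
    dim = len(values[0])
    output = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
    return weights, output

keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
weights, out = attention([4.0, 0.0], keys, values)
```

The query points along the first axis, so the first and third keys receive equal, larger weights and the second key is largely ignored.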

Deep Learning & Healthcare: All the Glitters Ain’t Gold

Why does everyone love Deep Learning? Unlike traditional Machine Learning (ML) algorithms, Deep Learning is fueled by massive amounts of data and requires high-end machines with powerful GPUs to run within a reasonable timeframe. Both of these requirements are expensive, so why do companies and research labs think the juice is worth the squeeze? In traditional …

Model deployment with Apache Beam and Dataflow

Putting data science models into operation can be stressful for some data scientists. The more sophisticated your models are, the more you’ll struggle when it comes to productionizing them. Have you ever regretted ensembling five different models when developing a customer churn classifier? Don’t worry, Apache Beam comes to the rescue. Before getting started with …

How To Painlessly Analyze Your Time Series

An introduction to MPA: the Matrix Profile API. We’re surrounded by time series data. From finance to IoT to marketing, many organizations produce thousands of these metrics and mine them to uncover business-critical insights. A Site Reliability Engineer might monitor hundreds of thousands of time series streams from a server farm, in …
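The matrix profile records, for every subsequence of length m in a time series, the z-normalized Euclidean distance to its nearest non-trivial neighbour elsewhere in the series: low values mark repeated patterns (motifs) and high values mark anomalies (discords). The brute-force sketch below is O(n²·m) and only illustrates the definition; it is not the fast algorithms or the MPA API the article introduces. The exclusion zone of m // 2 positions around each window is an assumed, commonly used choice.

```python
import math

def znorm(seq):
    """Z-normalize a subsequence; constant subsequences map to all zeros."""
    mean = sum(seq) / len(seq)
    std = math.sqrt(sum((x - mean) ** 2 for x in seq) / len(seq))
    if std == 0:
        return [0.0] * len(seq)
    return [(x - mean) / std for x in seq]

def matrix_profile(series, m):
    """Brute-force matrix profile: nearest-neighbour distance for each length-m window."""
    n = len(series) - m + 1
    windows = [znorm(series[i:i + m]) for i in range(n)]
    exclusion = max(1, m // 2)  # skip j this close to i (trivial self-matches)
    profile, index = [], []
    for i in range(n):
        best, best_j = math.inf, -1
        for j in range(n):
            if abs(i - j) < exclusion:
                continue
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(windows[i], windows[j])))
            if d < best:
                best, best_j = d, j
        profile.append(best)
        index.append(best_j)
    return profile, index

series = [0, 1, 2, 0, 1, 2, 0, 1, 2, 10]  # repeating pattern, then a spike
profile, index = matrix_profile(series, m=3)
discord = max(range(len(profile)), key=profile.__getitem__)
```

Every window of the repeating pattern has an exact twin (distance 0), while the window containing the spike has no close match and becomes the discord.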

Writing good SQL

Further structuring the query language by adapting layers. Do you want to write good SQL? Sure, but what does “good” actually mean? In certain real-time environments only performance counts as “good”, and you measure your execution time in milliseconds. In business intelligence and data warehouse environments performance …

Top Google AI Tools for Everyone

As more developers dive into the world of AI and see its potential, Google is catering to their changing needs by providing several powerful tools, such as TensorFlow 2.0. The revolution is here! Welcome to TensorFlow 2.0. TensorFlow is Google’s end-to-end open-source deep learning library, utilizing machine learning to improve the services provided …

Log transform or log link? And confounding variables. by @ellis2013nz

Last week I wrote about the relationship between weight and height in US adults, as seen in the US Centers for Disease Control and Prevention (CDC) Behavioral Risk Factor Surveillance System, an annual telephone survey comprising around 400,000 interviews. In particular, I tested the widely circulated claim that Body Mass Index (BMI) exaggerates the …
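The distinction in the title shows up even in an intercept-only model. Fitting ordinary least squares to log(y) estimates the mean of log(y), so back-transforming gives the geometric mean; a GLM with a log link estimates log of the mean of y, so back-transforming gives the arithmetic mean. The weights below are hypothetical, not the BRFSS data, and the sketch only illustrates that the two targets differ whenever y varies (Jensen's inequality).

```python
import math
from statistics import mean

weights = [60.0, 70.0, 85.0, 120.0]  # hypothetical weights in kg

# Log transform: the model targets mean(log y); exp() of it is the geometric mean.
log_transform_estimate = math.exp(mean(math.log(w) for w in weights))

# Log link: the model targets log(mean y); exp() of it recovers the arithmetic mean.
log_link_estimate = math.exp(math.log(mean(weights)))

print(round(log_transform_estimate, 2), round(log_link_estimate, 2))
```

The log-transform estimate is always the smaller of the two unless all values are equal, which is why back-transformed predictions from a log-transformed model tend to sit below the raw mean.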

Boosting Machine Learning Models with Explainable AI (XAI)

Insights on Airbnb listings. With a typical machine learning model, traditional correlation-based feature importance analysis often has limited value. In a data scientist’s toolkit, are there reliable, systematic, model-agnostic methods that accurately measure each feature’s impact on the prediction? The answer is yes. Here we use a model built on Airbnb data to …

What is the most important factor to graduate admission?

PDP contour (multidimensional PDP) plots are a special gem: they show how the interaction between two variables (hence their other name, PDP interaction plots) relates to the predicted chance of admission. The following code generates contour plots of University Rating vs. all other features: for column in X_test.columns.drop('University Rating'): features = ['University Rating', column] inter1 …

Extracting data from semi-structured tweets using Pandas and regex

Using Series string functions and regex to extract numeric data from text. Today we are transforming Washington State Ferry tweets into the wait time in hours. The tweets have some structure to them but don’t seem to be automated. The goal is to transform: Edm/King — Edmonds …
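The extraction step can be sketched with Python's re module. The tweet wording below is hypothetical, since the original examples are cut off, and the article itself works with pandas Series string methods, which accept the same kind of pattern. The pattern captures a number followed by an hour or minute unit and converts it to hours; returning None when no wait time is found is an assumption of this sketch.

```python
import re

# Hypothetical tweet texts; the real ferry tweets may be phrased differently.
tweets = [
    "Edm/King - Edmonds 2 hour wait",
    "Kingston 90 minute wait",
    "Edm/King - no wait",
]

# A number (optionally decimal), then an hour/minute unit with an optional plural "s".
WAIT_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*(hour|hr|minute|min)s?\b", re.IGNORECASE)

def wait_hours(text):
    """Return the wait time in hours, or None if no wait time is found."""
    match = WAIT_PATTERN.search(text)
    if match is None:
        return None
    amount, unit = float(match.group(1)), match.group(2).lower()
    return amount if unit in ("hour", "hr") else amount / 60

hours = [wait_hours(t) for t in tweets]
```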

Machine Learning and Translational Research

The expansion of web services and recent advances in high-throughput technologies have made large biological datasets easily accessible to the public, and especially to the scientific community. As a result, the ways we process, analyze, and infer knowledge have changed drastically in recent years, whether it is clinical data, sequencing data, electronic health records, and …

Creating a Serverless Python Chatbot API in Microsoft Azure from Scratch in 9 Easy Steps

Learn to create and deploy your own serverless chatbot application with Azure Function Apps that can be used in Slack, Skype, MS Teams and others. Chatbots and serverless are two tech trends that have dominated the corporate world in 2020. Why not kill two birds with one stone and learn how to build your own …

Data Privacy in the Age of Big Data

In this section, I will introduce three techniques that can be used to reduce the probability that certain attacks can be performed. The simplest of these methods is k-anonymity, followed by l-diversity, and then followed by t-closeness. Other methods have been proposed to form a sort of alphabet soup, but these are the three most …
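k-anonymity requires that every combination of quasi-identifier values (attributes such as a generalized ZIP code or age band that could be linked to outside data) is shared by at least k records. The minimal sketch below checks that property, plus the simplest "distinct" form of l-diversity, which also requires at least l different sensitive values in each such group. The column names and records are hypothetical.

```python
from collections import defaultdict

def equivalence_classes(records, quasi_identifiers):
    """Group records that share the same quasi-identifier values."""
    groups = defaultdict(list)
    for r in records:
        groups[tuple(r[q] for q in quasi_identifiers)].append(r)
    return groups

def is_k_anonymous(records, quasi_identifiers, k):
    # Every group must contain at least k records.
    return all(len(g) >= k for g in equivalence_classes(records, quasi_identifiers).values())

def is_distinct_l_diverse(records, quasi_identifiers, sensitive, min_distinct):
    # Every group must contain at least min_distinct different sensitive values.
    return all(len({r[sensitive] for r in g}) >= min_distinct
               for g in equivalence_classes(records, quasi_identifiers).values())

records = [
    {"zip": "130**", "age": "<30", "disease": "flu"},
    {"zip": "130**", "age": "<30", "disease": "cancer"},
    {"zip": "148**", "age": ">=40", "disease": "flu"},
    {"zip": "148**", "age": ">=40", "disease": "flu"},
]
qi = ["zip", "age"]
```

This table is 2-anonymous, yet anyone known to fall in the "148**, >=40" group must have flu. That homogeneity attack is exactly what l-diversity is meant to catch.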

Coronavirus in Wikipedia by language — visualized

Wikipedia pageviews by language for Coronavirus. Check the Wikipedia pageviews by language to get a deeper look into how the news has spread and trended around the world. First we’ll extract the data from terabytes of Wikipedia pageviews to create a new dashboard. Stay until the end to see the secret to extremely configurable visualizations …

One Step Closer to Neuralink?

Researchers in the UK, Italy and Switzerland have created a network capable of transferring signals from biological to artificial neurons using the internet, making potentially significant progress towards ideas such as Elon Musk’s Neuralink. In a study published by the University of Southampton this week, it has been demonstrated …

Drawdowns by the data

[This article was first published on R on OSM, and kindly contributed to R-bloggers]. We’re taking a break from our series on portfolio construction for …

The significance of the sector on the salary in Sweden, a comparison between different occupational groups, part 3

To complete the analysis of the significance of the sector for the salaries of different occupational groups in Sweden, in this post I will examine the correlation between salary and sector using statistics for education. The F-value from the ANOVA table is used as the single value to discriminate how much the region and salary …
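The F-value in a one-way ANOVA table is the ratio of between-group to within-group variance, F = (SSB / (k − 1)) / (SSW / (N − k)) for k groups and N observations in total. A pure-Python sketch with hypothetical monthly salaries in thousands (not the post's data) shows the computation; a larger F means the grouping variable explains more of the spread in salary.

```python
def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA over a list of numeric groups."""
    all_values = [x for g in groups for x in g]
    n, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: spread of group means around the grand mean.
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of values around their own group mean.
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ssb / (k - 1)) / (ssw / (n - k))

public_sector = [30, 32, 34]   # hypothetical salaries, thousands per month
private_sector = [36, 38, 40]
f = one_way_anova_f([public_sector, private_sector])
```

Here SSB = 54 with 1 degree of freedom and SSW = 16 with 4 degrees of freedom, so F = 54 / 4 = 13.5.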

SR2 Chapter 2 Medium

[This article was first published on Brian Callander, and kindly contributed to R-bloggers]. Here are my solutions to the medium exercises in chapter 2 of McElreath’s …
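Chapter 2 of McElreath's Statistical Rethinking introduces grid approximation of the posterior for the proportion of water p on a globe, given tosses that land on water (W) or land (L). The book's own code is in R; this is a minimal Python sketch assuming a flat prior and a 20-point grid, applied to the sequence W, W, W, L.

```python
from math import comb

def grid_posterior(water, tosses, grid_size=20, prior=None):
    """Grid-approximate posterior for the proportion of water p."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    if prior is None:
        prior = [1.0] * grid_size  # flat prior
    # Binomial likelihood of `water` successes in `tosses` trials at each grid value of p.
    likelihood = [comb(tosses, water) * p**water * (1 - p) ** (tosses - water) for p in grid]
    unnormalized = [lk * pr for lk, pr in zip(likelihood, prior)]
    total = sum(unnormalized)
    return grid, [u / total for u in unnormalized]

# W, W, W, L: three water out of four tosses
grid, post = grid_posterior(water=3, tosses=4)
mode = grid[max(range(len(post)), key=post.__getitem__)]
```

The true posterior peaks at p = 0.75; on a 20-point grid the closest point with the highest mass is 14/19 ≈ 0.737.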

What to know before you adopt Hugo/blogdown

Fancy (re-)creating your website using Hugo, with or without blogdown? Feeling a bit anxious? This post is aimed at being the Hugo equivalent of “What to know before you adopt a pet”. We shall go through things that can/will break in the future, and what you can do to prevent future pain. I’m writing this post with R …

Amazon EKS now available in the AWS China (Beijing) Region, operated by Sinnet, and the AWS China (Ningxia) Region, operated by NWCD

Amazon EKS is a managed Kubernetes service that makes it easy for you to run Kubernetes on AWS without needing to install, operate, and maintain your own Kubernetes control plane or worker nodes. Amazon EKS is certified Kubernetes conformant, so existing applications running on upstream Kubernetes are compatible with Amazon EKS. You can also easily …

Amazon AppStream 2.0 adds support for native application mode on Windows PCs

When AppStream 2.0 users start a streaming session in native application mode and open a streaming application, the application opens in its own window and functions in the same way as a locally installed application. Because AppStream 2.0 also supports file system redirection, users can share their local folders or drives with their streaming applications. …

Hypothesis Testing Explained as Simply as Possible

One of the most important concepts for Data Scientists. Contents: Introduction; Terminology; Reject or Do not Reject?; What is the point of Significance Testing?; Steps for Hypothesis Testing. If you’ve heard of the terms null hypothesis, p-value, and alpha but don’t really know what they mean or how they’re related, then …
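The reject / do-not-reject decision can be shown with an exact one-sided binomial test, chosen here only because it needs nothing beyond the standard library. Under the null hypothesis that a coin is fair, the p-value is the probability of seeing at least as many heads as observed, and we reject the null when that p-value falls below alpha. The coin-flip counts and alpha = 0.05 are assumptions for illustration.

```python
from math import comb

def binomial_p_value_upper(successes, trials, p0=0.5):
    """One-sided p-value: P(X >= successes) when X ~ Binomial(trials, p0)."""
    return sum(comb(trials, k) * p0**k * (1 - p0) ** (trials - k)
               for k in range(successes, trials + 1))

def decide(p_value, alpha=0.05):
    """Compare the p-value with the significance level alpha."""
    return "reject H0" if p_value < alpha else "do not reject H0"

p = binomial_p_value_upper(successes=15, trials=20)
decision = decide(p)
```

Fifteen heads in twenty flips gives p = 21700 / 2^20 ≈ 0.021, below 0.05, so the fair-coin hypothesis is rejected; twelve heads gives p ≈ 0.25, and the null is not rejected.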

Why our machine learning platform supports Python, not R

Machine learning engineering is maturing. Disclaimer: The following is based on my observations, not an academic survey of the industry. For context, I’m a contributor to Cortex, an open source machine learning platform (the “our” in this article’s title). There are dozens of articles written comparing the relative merits of Python and …

Amazon FSx now enables you to create and use file systems in Shared Amazon Virtual Private Clouds (VPCs)

Amazon FSx, a fully managed service that makes it easy for you to launch and run feature-rich and highly performant file systems with just a few clicks, now enables you to create and use file systems in Shared Amazon Virtual Private Clouds (VPCs). This feature is available on both FSx for Windows File Server and …

Generating Fake Dating Profiles for Data Science

Forging Dating Profiles for Data Analysis by Web Scraping. Data is one of the world’s newest and most precious resources. Most data gathered by companies is held privately and rarely shared with the public. This data can include a person’s browsing habits, financial information, or passwords. In the case of …

Set up a Dask Cluster for Distributed Machine Learning

By the way, in how many ways can we spin up a Dask cluster? A Dask cluster can be spun up in multiple ways: using SSH connections; on a Hadoop cluster, essentially running with the power of HDFS and YARN; with the help of Kubernetes and a bunch of Docker containers; on a supercomputer; in …

What Is Data Management?

Stats about Data Management: Ninety-five percent of C-suite executives list data management as key to business strategy. Data management allows business leaders to leverage the data they collect from customers and suppliers to propel growth. Data management is how you extract answers and insights from raw data to meet your information needs. The proliferation of …

Who Is the Premier League’s Most Important Player?

The Premier League season is more than two-thirds done. Liverpool have the thing pretty much sewn up (see my earlier blog about how they’ve achieved such superiority), and the usual suspects are involved in the annual scrap to avoid relegation. In my ‘On Target’ blog series, I have been documenting my quest to ‘Moneyball’ Fantasy …

Solving Conditional Probability Problems with the Laws of Total Expectation, Variance, and…

In this article, we’ll see how to use the Laws of Total Expectation, Variance, and Covariance, to solve conditional probability problems, such as those you might encounter in a job interview or while modeling business problems where random variables are conditional on other random variables. I am going to start by asking a couple of …
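Two of the laws mentioned are E[X] = E[E[X | Y]] and Var(X) = E[Var(X | Y)] + Var(E[X | Y]). The sketch below checks both with exact fractions on a hypothetical two-stage experiment: choose a fair six-sided or four-sided die with probability 1/2 each, then roll it. It compares the law-based answers with a direct computation from the joint distribution.

```python
from fractions import Fraction

# Hypothetical experiment: Y = number of sides (6 or 4, each with prob 1/2), X = roll.
dice = {6: Fraction(1, 2), 4: Fraction(1, 2)}

def cond_mean(sides):
    return Fraction(sides + 1, 2)        # E[X | Y = sides] for faces 1..sides

def cond_var(sides):
    return Fraction(sides**2 - 1, 12)    # Var(X | Y = sides) for a fair die

# Law of total expectation: E[X] = E[E[X | Y]]
e_x = sum(p * cond_mean(s) for s, p in dice.items())
# Law of total variance: E[Var(X | Y)] + Var(E[X | Y])
var_of_cond_mean = sum(p * cond_mean(s) ** 2 for s, p in dice.items()) - e_x**2
var_x = sum(p * cond_var(s) for s, p in dice.items()) + var_of_cond_mean

# Direct computation from the marginal distribution of X, for comparison.
marginal = {}
for s, p in dice.items():
    for face in range(1, s + 1):
        marginal[face] = marginal.get(face, 0) + p * Fraction(1, s)
e_direct = sum(x * q for x, q in marginal.items())
var_direct = sum(x**2 * q for x, q in marginal.items()) - e_direct**2
```

Both routes give E[X] = 3 and Var(X) = 25/12 + 1/4 = 7/3, where 25/12 is the expected within-die variance and 1/4 the variance contributed by not knowing which die was picked.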

How to Acquire Large Satellite Image Datasets for Machine Learning Projects

Historically, only governments and large corporations have had access to quality satellite images. In recent years, satellite image datasets have become available to anyone with a computer and an internet connection. The quality, quantity, and precision of these datasets are continuously improving, and there are many free and commercial platforms at your disposal to …

Machine Learning with R: A Hands-on Introduction from Robert Muenchen at Machine Learning Week, Las Vegas

[This article was first published on R-posts.com, and kindly contributed to R-bloggers]. Join Robert Muenchen’s workshop about Machine Learning with R at Machine Learning Week …

XGBoostLSS – An extension of XGBoost to probabilistic forecasting

[This article was first published on R-posts.com, and kindly contributed to R-bloggers]. To reason rigorously under uncertainty we need to invoke the language of …

8 Common Data Structures every Programmer must know

Data structures are a specialized means of organizing and storing data in computers so that we can perform operations on the stored data more efficiently. Data structures have a wide and diverse scope of usage across Computer Science and Software Engineering. Data structures are used in almost every …

Simulating epidemics using Go and Python

Simulate and analyse different epidemic scenarios with Go and Jupyter Notebook. This is something that’s directly impacting me even as I am typing out this story. What started out as a small outbreak of a novel coronavirus in Wuhan, China, towards the end of December 2019 quickly spread to the rest of China and beyond …
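A common starting point for epidemic scenarios is the SIR compartmental model; the article's own simulation, written in Go, may take a different (for example agent-based) approach. This minimal discrete-time Python sketch moves people from Susceptible to Infected to Recovered once per day. The transmission rate beta, the recovery rate gamma, and the population size are hypothetical.

```python
def simulate_sir(population, initial_infected, beta, gamma, days):
    """Discrete-time SIR model; returns a list of (S, I, R) tuples, one per day."""
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    history = [(s, i, r)]
    for _ in range(days):
        new_infections = beta * s * i / population  # contacts between S and I
        new_recoveries = gamma * i                  # fraction of I recovering today
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history

# Hypothetical parameters: beta = 0.3/day, gamma = 0.1/day, so R0 = beta / gamma = 3.
history = simulate_sir(population=10_000, initial_infected=10, beta=0.3, gamma=0.1, days=160)
peak_day = max(range(len(history)), key=lambda d: history[d][1])
```

Different scenarios are then just different parameter choices: lowering beta (for example through distancing) delays and flattens the infection peak.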

Python Numba or NumPy: understand the differences

Short description supported by examples. NumPy and Numba are two great Python packages for matrix computations. Both of them work efficiently on multidimensional matrices. In Python, lists are dynamic: appending values to a list grows its size dynamically. …

Segmenting Your Customers on Many Dimensions (or Python for Wine Lovers)

Using K-means clustering on more than two or three attributes. I recently read a book on data analytics called Data Smart, written by John Foreman, head of product for MailChimp. This book is an excellent business analytics primer that walks you through a variety of machine learning use cases, complete with sample data sets and …
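K-means does not care how many attributes each point has: the distance is simply computed over more coordinates. A minimal pure-Python sketch of Lloyd's algorithm follows; the toy four-dimensional points, the fixed random seed, and the iteration cap are assumptions, not the book's wine data.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm for points with any number of dimensions."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assignment = [None] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        new_assignment = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                          for p in points]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Move each centroid to the mean of its assigned points (skip empty clusters).
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, assignment

# Two hypothetical customer segments described by four attributes each.
points = [(1, 0, 0, 1), (1, 1, 0, 1), (0, 1, 0, 1),
          (9, 9, 8, 9), (8, 9, 9, 9), (9, 8, 9, 8)]
centroids, assignment = kmeans(points, k=2)
```

K-means only finds a local optimum, so in practice it is usually run from several random starts and the lowest-error clustering is kept.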

Uncovering Government Bias with Statistical Modelling

A data-driven analysis of the Australian ‘Sports Rorts’ scandal. You can run, but you can’t hide (from statistics). The Australian Liberal party is about to find this out the hard way. In recent weeks, the Liberal party has been accused of using $100M of sporting grants to win votes in the lead-up to the …