Setting up GitHub Package Registry with Docker and Golang

Note: This was originally posted at martinheinz.dev. Generally, for any programming language, to run your application you need to create some kind of package (npm for JavaScript, NuGet for C#, …) and then store it somewhere. In the case of Docker, people usually just throw their images into Docker Hub, but we now have new …

Updated AUPolitics project to work with S3 instead of DB

Some time ago I developed a little project that collects Aussie politicians’ tweets and presents several visualizations. It’s available at https://rserv.levashov.biz/shiny/rstudio/. At that time I had quite generous credits from AWS, so I didn’t worry too much about costs. Unfortunately, the credits are about to expire, so I had to optimize the tech stack a bit to reduce …

Estimating the carbon cost of psycholinguistics conferences

Shravan Vasishth, 10/5/2019. Note: If I have made a calculation error, please point it out and I will fix it. At the University of Potsdam we are discussing how to reduce our carbon footprint in science-related work. One thought I had was that we could reduce our carbon …

Simple Transformers — Introducing The Easiest BERT, RoBERTa, XLNet, and XLM Library.

Want to use Transformer models for NLP? Pages of code got you down? Not anymore because Simple Transformers is on the job. Start, train, and evaluate Transformers with just 3 lines of code! The Simple Transformers library is built as a wrapper around the excellent Transformers library by Hugging Face. I am eternally grateful for …

Natural Language Processing (NLP) analysis of product reviews by online shoppers

This is my 4th project in the Metis Data Science Bootcamp. The goal is to use Natural Language Processing (NLP) to analyse product reviews submitted by online shoppers. I started working on this project with 3 business objectives in mind: to find principal components on the ratings, using NLP unsupervised machine learning to predict product ratings based on …

Regression Analysis on Life Expectancy

Picture taken from https://en.wikipedia.org/wiki/List_of_countries_by_life_expectancy. Python code is available on my GitHub (JNYH/Project-Luther): my second project in the Metis Data Science Bootcamp, asking whether dengue has affected the life expectancy of people in any country. The topic was chosen at random while I was exploring the dengue trend in Singapore. There has been a recent spike in dengue …

How to get a fuller picture of a model’s accuracy

Going back to the Advertising data set, with the same single-factor linear regression model (tv spend * radio spend), only a few changes are needed to implement k-fold cross-validation with cross_val_score.

# import necessary python modules and classes
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LinearRegression # your model here
from sklearn.model_selection import train_test_split
import pandas as pd
import numpy …
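To make concrete what cross_val_score computes, here is a dependency-free sketch of k-fold cross-validation for a single-factor regression on tv * radio. It assumes contiguous, unshuffled folds and R² scoring, which as far as I know is what cross_val_score(LinearRegression(), X, y, cv=5) does by default for a regressor. The data below are synthetic and hypothetical, not the Advertising set, and the helper names are illustrative.

```python
import random
from statistics import mean


def fit_simple_ols(xs, ys):
    """Closed-form least squares for the single-factor model y = a + b * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def r2_score(y_true, y_pred):
    """Coefficient of determination, the default score for a regressor."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def k_fold_scores(xs, ys, k=5):
    """Hold out each of k contiguous folds once: fit on the rest, score on the fold."""
    n = len(xs)
    # The first n % k folds get one extra sample, so every sample is tested exactly once.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    scores, start = [], 0
    for size in fold_sizes:
        test = range(start, start + size)
        train = [i for i in range(n) if not start <= i < start + size]
        intercept, slope = fit_simple_ols([xs[i] for i in train], [ys[i] for i in train])
        preds = [intercept + slope * xs[i] for i in test]
        scores.append(r2_score([ys[i] for i in test], preds))
        start += size
    return scores


# Hypothetical synthetic data standing in for the Advertising set.
random.seed(0)
tv = [random.uniform(0, 300) for _ in range(100)]
radio = [random.uniform(0, 50) for _ in range(100)]
x = [t * r for t, r in zip(tv, radio)]  # the single tv * radio factor
sales = [5 + 0.001 * xi + random.gauss(0, 1) for xi in x]

scores = k_fold_scores(x, sales, k=5)
print([round(s, 3) for s in scores], round(mean(scores), 3))
```

Reporting the mean and spread of the five fold scores, instead of the score from a single train/test split, is what gives the fuller picture of accuracy.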

How Are Insurance Companies Implementing Artificial Intelligence (AI)?

Insurers are using AI to provide better, faster and cheaper services to customers. Artificial Intelligence (AI) has become a buzzword in the insurance industry. Still, the industry has made significant progress in AI implementation, although we are still in the early days. In this article we will look at: Why the insurance industry needs AI …

How to Use Convolutional Neural Networks for Time Series Classification

A gentle introduction, state-of-the-art model overview, and a hands-on example. Photo by Christin Hume on Unsplash. A large amount of data is stored in the form of time series: stock indices, climate measurements, medical tests, etc. Time series classification has a wide range of applications: from identification of stock market anomalies to automated detection of …

How to get started with Data Science : A brief tutorial on using Anaconda, Python, Jupyter…

In this article, I want to write about my experience of overcoming the initial hurdle and getting started with learning Data Science. Learning data science is a journey, and you will keep learning once you get started. In this article we will go through the following 5 first steps for getting into the field of learning …

Python Tips and Tricks, You Haven’t Already Seen, Part 2

Note: This was originally posted at martinheinz.dev. A few weeks ago I posted an article (here) about some lesser-known Python features, and quite a few people seemed to like it, so here comes another round of Python features that you hopefully haven’t seen yet. Using lots of hardcoded index values can quickly become …

Data Science is Boring (Part 2)

In short, I argue that a) boring problems are good and b) we should apply ML to solve more boring problems, but innovatively. Why are boring problems good? Boring problems are good because they represent steady-state operational issues. These operations drive the core of businesses. The core of the business creates consistent and substantive value. …

AWS Backup Enhances SNS Notifications to filter on job status

AWS Backup offers a centralized, managed service to back up data across AWS services in the cloud and on premises using Storage Gateway. AWS Backup serves as a single dashboard for backup, restore, and policy-based retention of different AWS resources, including Amazon EBS volumes, Amazon RDS databases, Amazon DynamoDB tables, Amazon EFS file systems, and …

Archive

Easily access the thousands of incredible articles we’ve published through the years. Welcome to the Towards Data Science Archive! Over the years, we’ve published thousands of incredible pieces by talented and amazing writers. We’ve covered groundbreaking concepts, new technologies, cutting-edge research, and more. Unfortunately, we can’t keep everything on the front page forever. Our archives …

Split-apply-combine for Maximum Likelihood Estimation of a linear model

Maximum likelihood estimation is a very useful technique for fitting a model to data. It is used a lot in econometrics and other sciences but seems, at least to my knowledge, not to be so well known by machine learning practitioners (but I may be wrong about that). Other useful techniques for confronting models with data used in econometrics …
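As a hedged illustration of the idea in the title, the sketch below splits rows by a group key, fits a Gaussian linear model y = a + b·x + e, e ~ N(0, σ²), to each group by maximum likelihood, and combines the per-group estimates into one result. It uses the closed-form MLE (for normal errors this gives the least-squares coefficients and σ² = RSS/n) rather than a numerical optimiser, and the function names and (group, x, y) row layout are hypothetical.

```python
from collections import defaultdict
from math import log, pi
from statistics import mean


def gaussian_linear_mle(xs, ys):
    """MLE of y = a + b * x + e with e ~ N(0, sigma2).

    For Gaussian errors the likelihood-maximising (a, b) coincide with least squares,
    and the MLE of sigma2 is RSS / n (not the unbiased RSS / (n - 2)).
    """
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    a = y_bar - b * x_bar
    rss = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    sigma2 = rss / n
    # Log-likelihood at the maximum: -n/2 * (log(2*pi*sigma2) + 1)
    loglik = -0.5 * n * (log(2 * pi * sigma2) + 1)
    return {"intercept": a, "slope": b, "sigma2": sigma2, "loglik": loglik}


def split_apply_combine(rows):
    """Split (group, x, y) rows by group, fit each group by MLE, combine into one dict."""
    groups = defaultdict(lambda: ([], []))
    for group, x, y in rows:  # split
        groups[group][0].append(x)
        groups[group][1].append(y)
    # apply + combine
    return {g: gaussian_linear_mle(xs, ys) for g, (xs, ys) in groups.items()}


# Hypothetical data: two groups with different true slopes and small alternating noise.
rows = [("A", i, 2 + 3 * i + (0.1 if i % 2 == 0 else -0.1)) for i in range(10)]
rows += [("B", i, -1 + 0.5 * i + (0.1 if i % 2 == 0 else -0.1)) for i in range(10)]
fits = split_apply_combine(rows)
print(fits["A"]["slope"], fits["B"]["slope"])
```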

A full RStudio Server setup for Data Science in 5 minutes

[This article was first published on Pachá, and kindly contributed to R-bloggers]. At the end of the post there is a promotional link for free …

How Machine Learning is Transforming Healthcare at Google and Beyond

Google and others are recruiting algorithms to spot cancer in medical scans, predict the outcome of hospital visits, and more. Here’s how. Machine Learning — the art of using patterns in data to make predictions — stands to transform almost every industry, from finance, retail, and marketing to digital assistants and self-driving cars. But when …

Introducing Amazon SageMaker ml.p3dn.24xlarge instances, optimized for distributed machine learning with up to 4x the network bandwidth of ml.p3.16xlarge instances

The ml.p3dn.24xlarge instances provide up to 100 Gbps of networking throughput, 96 custom Intel® Xeon® Scalable (Skylake) vCPUs, 8 NVIDIA® V100 Tensor Core GPUs with 32 GB of memory each, 300 GB/s NVLINK GPU interconnect, and 1.8 TB of local NVMe-based SSD storage. Compared to the next largest P3 instance, the 4X increase in network …

Amazon EventBridge is now available in AWS Middle East (Bahrain) Region

Amazon EventBridge is a serverless event bus that delivers a stream of real-time data from event sources, including SaaS applications like Zendesk, Datadog, and PagerDuty, and routes that data to targets like AWS Lambda. You can set up routing rules to determine where to send your data to build application architectures that react in real …

Automated Machine Learning for Data Analysts & Business Users

Build your first machine learning model in less than five minutes. Automated Machine Learning (AutoML) represents a fundamental shift in the way organizations of all sizes approach machine learning and data science. Machine Learning is a branch of Artificial Intelligence based …

How to find Feature importances for BlackBox Models?

1. For sklearn models

The ELI5 library makes it quite easy for us to use permutation importance for sklearn models. First, we train our model:

from sklearn.ensemble import RandomForestClassifier
my_model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

Then we use the PermutationImportance class from the eli5.sklearn module:

from eli5.sklearn import PermutationImportance
import eli5
perm = PermutationImportance(my_model, n_iter=2).fit(X, y)
eli5.show_weights(perm, feature_names=X.columns.tolist())

The results look …
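The idea behind permutation importance can be sketched without any library: treat the model as a black box, shuffle one feature column n_iter times, and record the average drop in score. This is not ELI5's implementation; the accuracy metric, the toy model, and the data below are assumptions made purely for illustration.

```python
import random
from statistics import mean


def accuracy(model, X, y):
    """Share of rows where the model's prediction matches the label."""
    return mean(1.0 if model(row) == label else 0.0 for row, label in zip(X, y))


def permutation_importance(model, X, y, n_iter=2, seed=0):
    """Mean drop in accuracy when one feature column is shuffled, per feature."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_iter):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the target
            X_perm = [row[:j] + [value] + row[j + 1:] for row, value in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(mean(drops))
    return importances


# Hypothetical "black box": it only ever looks at feature 0.
def black_box(row):
    return int(row[0] > 0.5)


X = [[i / 19, (i * 7) % 5] for i in range(20)]
y = [black_box(row) for row in X]
print(permutation_importance(black_box, X, y, n_iter=2))
```

Because the model ignores feature 1, shuffling that column never changes a prediction and its importance comes out as exactly zero, while shuffling feature 0 costs accuracy.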

Pedestrian detection using Non Maximum Suppression

A complete pipeline for detecting pedestrians on the road. Pedestrian detection is still an unsolved problem in computer science. While many object detection algorithms like YOLO, SSD, R-CNN, Fast R-CNN and Faster R-CNN have been researched extensively with great success, pedestrian detection in crowded scenes remains an open challenge. In recent years, …
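Non-maximum suppression, the technique in the title, is typically applied after a detector emits many overlapping boxes for the same object. Below is a minimal greedy NMS sketch, assuming boxes given as (x1, y1, x2, y2) tuples with one confidence score each and an IoU threshold of 0.5; the post's actual pipeline and thresholds may differ.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the best-scoring box, drop boxes overlapping it too much, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


# Hypothetical detections: two overlapping boxes on one pedestrian, one on another.
boxes = [(0, 0, 10, 20), (1, 1, 11, 21), (30, 0, 40, 20)]
scores = [0.9, 0.75, 0.8]
print(non_max_suppression(boxes, scores))
```

A single fixed threshold is exactly where greedy NMS struggles in crowded scenes: two genuinely different pedestrians standing close together can exceed it, and one of them gets suppressed.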

Web App vs Cloud App: How Can You Choose The Right Technology?

Do you want to start your online venture, but are confused by two terms: cloud-based apps and web-based apps? Do you want to turn your idea into reality, but don’t know which way to go: cloud or web? Many people are in exactly this situation. Let’s find out the answer to these questions in this …

19 Innovative Ways to Use Information Visualization Across a Variety of Fields

When large amounts of data are presented as numbers on a spreadsheet, it’s not uncommon to hear groaning in the room. It’s worse if there are tons of variables and time frames. Information visualization can make a huge difference. Turning numbers, percentages, statistics, differences, ratios and other kinds of boring numerical data into a creative …

A closer look into the Spanish railway passenger transportation pricing

As someone who lives and works in a Spanish city 400 km away from home, I have found that the most convenient way to travel back and forth is by train. As a frequent user, I have grown baffled by the pricing pattern when buying tickets, with prices sometimes hovering around the same levels, …

How I Learned Data Science And The 1 Course That Changed Everything

If you don’t know where to start. If you are reading this, you have probably just started on your data science journey and are wondering what course to take to catapult you to the next level. I previously answered a question about this topic on Quora, which got a decent amount of views, so …

Transitioning a typical engineering ops team into an SRE powerhouse

Perpetually adding engineers to ops teams to meet customer growth doesn’t scale. Google’s Site Reliability Engineering (SRE) principles can help, bringing software engineering solutions to operational problems. In this post, we’ll take a look at how we transformed our global network ops team by abandoning traditional network engineering orthodoxy and replacing it with SRE. Read …

Last month today: GCP in September

Here at Google Cloud Platform (GCP), we welcomed fall and back-to-school season in September with new Anthos and Kubernetes features, and shared new customer stories. Here are the top stories from last month. Building your cloud, your way. A few new Anthos capabilities came out last month, adding even more flexibility to our hybrid …

Black Knight and the quest to conquer an ecosystem of partners, developers, and customers with Apigee

Editor’s note: Today we hear from Brad Homer, Senior API Strategy Product Manager at Black Knight Inc., on how the company uses the Apigee API Management Platform from Google Cloud to transform integrated software, data, and analytics solutions for the mortgage industry. Read on to learn how Black Knight uses APIs to facilitate and automate …

Build a Docker Container with Your Machine Learning Model

A Complete Guide with Template Scripts for Docker Beginners. Photo by Chris Leipelt on Unsplash. As a data scientist, I don’t have a lot of software engineering experience, but I have certainly heard a lot of great comments about containers. I have heard about how lightweight they are compared to traditional VMs and how good …

How Exactly UMAP Works

And why exactly it is better than tSNE. This is the twelfth post in the column Mathematical Statistics and Machine Learning for Life Sciences, where I try to cover analytical techniques common in Bioinformatics, Biomedicine, Genetics, etc. Today we are going to dive into an exciting dimension reduction technique called UMAP that dominates the Single …

Why your machine learning model may be melting icebergs.

Like many of you in Towards Data Science, I am self-taught in machine learning and can remember the day I first left my laptop training a model, fan whirring away, to return hours later and find it still buzzing along. “Hmm”, I thought, “maybe I need a GPU”. Fellow DIY thinkers: it is not a …

Know the Calculations behind Kernel Regression — with example and code

Photo by Markus Spiske on Unsplash. This article discusses how a kernel function is used as a weighting function to build a non-parametric regression model. It begins with a brief discussion of the properties of kernel functions and the steps to build kernels around data points. In non-parametric statistics, a kernel is …
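A hedged sketch of kernel regression follows, assuming a Gaussian kernel and the Nadaraya-Watson estimator, m̂(x) = Σ K((x - xᵢ)/h)·yᵢ / Σ K((x - xᵢ)/h); the article may use a different kernel or bandwidth h. Each prediction is a weighted average of the observed y values, and the kernel supplies the weights, so points near the query count most. The training data are hypothetical.

```python
from math import exp, pi, sqrt


def gaussian_kernel(u):
    """Standard normal density, used as the weighting function."""
    return exp(-0.5 * u * u) / sqrt(2 * pi)


def kernel_regression(x_train, y_train, x_query, bandwidth=1.0):
    """Nadaraya-Watson estimate: a kernel-weighted average of the training targets."""
    weights = [gaussian_kernel((x_query - xi) / bandwidth) for xi in x_train]
    total = sum(weights)
    return sum(w * yi for w, yi in zip(weights, y_train)) / total


x_train = [1.0, 2.0, 3.0, 4.0, 5.0]
y_train = [1.2, 1.9, 3.2, 3.8, 5.1]
for xq in (1.5, 3.0, 4.5):
    print(xq, round(kernel_regression(x_train, y_train, xq, bandwidth=0.5), 3))
```

The bandwidth controls smoothness: a very small h makes the estimate snap to the nearest observation, while a large h pulls every prediction towards the overall mean of y.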

The Unspoken Data-Science Soft Skills

One of the most unspoken, and most important, soft skills that every data scientist needs to master is proficient reading and writing. If you are really good with computers but failed English class, Data Science might not be for you. One of the most important things for any software engineer is being able to easily and quickly decipher …

Creating an automated framework for predicting sports results

How I built an automated machine learning framework to predict the results of rugby matches and tweet the outputs without any oversight. I recently decided to start a side project which combined my love of rugby with my love of Data Science – and so Mel Rugby was born. Mel is a framework I created …

A demonstration of carrying data analysis (Crimes in Denver EDA)

This is my second demonstration of carrying out data analysis using Python. My previous article was about New York City Airbnb Open Data. Please have a look and give me your comments and thoughts so I can keep improving. In this article, I will …

Simple neural network implementation in C

Let’s begin! We start by defining a couple of helper functions, including the activation function and its corresponding derivative. A third function is used to initialize weights between 0.0 and 1.0:

// Activation function and its derivative
double sigmoid(double x) { return 1 / (1 + exp(-x)); }
double dSigmoid(double x) { return x * (1 - …
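For readers following along in Python rather than C, a rough equivalent of the three helpers is sketched below. It assumes, as is common in hand-written networks, that the derivative helper receives the already-activated value s = sigmoid(x), since σ'(x) = σ(x)(1 - σ(x)); the function names are illustrative, not the article's.

```python
import math
import random


def sigmoid(x):
    """Logistic activation: 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))


def d_sigmoid(s):
    """Derivative of the sigmoid, written in terms of its output s = sigmoid(x)."""
    return s * (1.0 - s)


def init_weight(rng):
    """Random initial weight in [0.0, 1.0)."""
    return rng.random()


rng = random.Random(42)
weights = [init_weight(rng) for _ in range(3)]
activation = sigmoid(0.5)
print(weights, activation, d_sigmoid(activation))
```

Passing the activation rather than the raw input lets backpropagation reuse the value already computed in the forward pass instead of calling exp again.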

Expanding binomial counts to binary 0/1 with purrr::pmap()

Data on successes and failures can be summarized and analyzed as counted proportions via the binomial distribution or as long-format 0/1 binary data. I most often see summarized data when there are multiple trials done within a study unit; for example, when tallying up the number of dead trees out of the total number …
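The post does this expansion in R with purrr::pmap(); the same reshaping is sketched here in plain Python, assuming each summarized row is a (unit, successes, trials) tuple, a hypothetical layout. Each row becomes trials rows, successes of them coded 1 and the rest coded 0, so the proportion per unit is unchanged.

```python
def expand_binomial(rows):
    """Turn (unit, successes, trials) counts into one (unit, 0/1) row per trial."""
    long_rows = []
    for unit, successes, trials in rows:
        if not 0 <= successes <= trials:
            raise ValueError(f"invalid counts for {unit}: {successes}/{trials}")
        outcomes = [1] * successes + [0] * (trials - successes)
        long_rows.extend((unit, outcome) for outcome in outcomes)
    return long_rows


plots = [("plot_1", 2, 5), ("plot_2", 0, 3)]  # e.g. dead trees out of total trees
binary = expand_binomial(plots)
print(binary)
```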

Prim’s Minimum Spanning Tree Implementation

A graph is a non-linear data structure made up of nodes and edges. A minimum spanning tree is a set of edges in a connected, undirected, weighted graph that connects all the vertices with no cycles and with the minimum total edge weight. When the number of edges is high relative to the number of vertices (a dense graph), Prim’s algorithm is preferred over Kruskal’s. This content is …
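A minimal sketch of Prim’s algorithm follows, using Python’s heapq as the priority queue. The adjacency-list format and the example graph are assumptions for illustration; the tree grows from a start vertex by repeatedly taking the cheapest edge that reaches a vertex not yet in the tree.

```python
import heapq


def prim_mst(graph, start):
    """Lazy Prim's algorithm with a binary heap.

    graph maps each vertex to a list of (neighbor, weight) pairs; because the graph
    is undirected, every edge must be listed under both of its endpoints.
    Returns the MST edges as (u, v, weight) tuples in the order they were added.
    """
    visited = {start}
    frontier = [(w, start, v) for v, w in graph[start]]
    heapq.heapify(frontier)
    mst = []
    while frontier and len(visited) < len(graph):
        w, u, v = heapq.heappop(frontier)  # cheapest edge leaving the tree
        if v in visited:
            continue  # stale edge: both ends are already in the tree
        visited.add(v)
        mst.append((u, v, w))
        for nxt, nw in graph[v]:
            if nxt not in visited:
                heapq.heappush(frontier, (nw, v, nxt))
    return mst


graph = {
    "A": [("B", 1), ("C", 3)],
    "B": [("A", 1), ("C", 2)],
    "C": [("A", 3), ("B", 2), ("D", 4)],
    "D": [("C", 4)],
}
tree = prim_mst(graph, "A")
print(tree, sum(w for _, _, w in tree))
```

This heap-based version runs in O(E log E). The variant usually credited with Prim’s advantage on dense graphs instead scans an adjacency matrix in O(V²) time, which does not grow with the number of edges.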

Amazon Elasticsearch Service provides option to mandate HTTPS

The Require HTTPS configuration option is now available for Amazon Elasticsearch Service domains across 21 regions globally: US East (N. Virginia, Ohio), US West (Oregon, N. California), AWS GovCloud (US-Gov-East, US-Gov-West), Canada (Central), South America (Sao Paulo), EU (Ireland, London, Frankfurt, Paris, Stockholm), Asia Pacific (Singapore, Sydney, Tokyo, Seoul, Mumbai, Hong Kong), and China (Beijing – …

Colonizing Franky

[This article was first published on R – Fronkonstin, and kindly contributed to R-bloggers]. And once again I set off slowly, feeling that I need nothing …

Bias and Algorithmic Fairness

The modern business leader’s new responsibility in a brave new world ruled by data. As Data Science moves along the hype cycle and matures as a business function, so do the challenges that face the discipline. The problem statement for data science went from “we waste 80% of our time preparing data” via “production deployment …