Rethinking Continuous Integration for Data Science

Software Engineering for Data Science. A widely used practice in software engineering deserves its own flavor in our field. As data science and machine learning gain wider industry adoption, practitioners realize that deploying data products comes with a high (and often unexpected) maintenance cost. As Sculley and co-authors argue …

Roadmap to Machine Learning: Key Concepts Explained

What if our memory were a storage device? How much easier learning would be. But in reality, becoming an excellent professional in anything means taking a thorny path. You learn, you forget, you make mistakes, you learn again, absorb new things, and thus you form a picture of …

10 Minutes to Building a Fully-Connected Binary Image Classifier in TensorFlow

How to build a binary image classifier using fully-connected layers in TensorFlow/Keras. This is a short introduction to computer vision: how to build a binary image classifier using only fully-connected layers in TensorFlow/Keras, geared mainly towards new users. This easy-to-follow tutorial is broken down into 3 sections: …

beta: Evidence-based Software Engineering – book

[This article was first published on The Shape of Code » R, and kindly contributed to R-bloggers]. My book, Evidence-based software engineering: based on the …

K Nearest Neighbors by hand: A Computer Science exercise for the Data Scientist

Opening the “black box” and understanding the algorithm within. Data scientists sometimes talk about a “black box” approach to data science. That is, you understand the use cases for different machine learning algorithms and how to plug in the data, without understanding how the algorithm works beneath the surface. But the algorithms are just …
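To make the "by hand" idea concrete, here is a minimal sketch of a k-nearest-neighbors classifier in plain Python. It is not the article's code: the function name `knn_predict`, the toy points, and the choice of Euclidean distance with a simple majority vote are all assumptions made for illustration.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbours
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote over the neighbours' labels
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two small clusters labelled "a" and "b"
points = [(1.0, 1.0), (1.5, 2.0), (5.0, 5.0), (6.0, 5.5)]
labels = ["a", "a", "b", "b"]
print(knn_predict(points, labels, (1.2, 1.4), k=3))
```

With k=3, the query near the first cluster gets two "a" votes and one "b" vote, so the vote decides in favor of "a". A real implementation would also need a rule for tied votes and usually some feature scaling.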

New IT Cost Assessment program: Unlock value to reinvest for growth

If you’re in IT, chances are you’re under pressure to prioritize investments and optimize costs in response to the current economic climate. According to a recent survey of our customers, that situation describes 84% of IT decision makers. Likewise, Forrester Research has said CIOs could face a minimum of 5% budget cuts in 2020, and …

Measuring Financial Risk: A Step-by-Step Guide

To calculate our own VaR and ES, we’ll use data for the Wilshire 5000, a stock market index widely considered to be the broadest measure of U.S. stock prices. We can use quantmod to import our data from FRED (Federal Reserve Economic Data). We’ll also use ggplot2 to visualize our data. Let’s load our …

How to create LaTeX tables directly from Python code

Copying tables of results from the console into a LaTeX report can be tedious and error-prone, so why not automate it? Creating tables of results plays a major part in communicating the outcomes of experiments in data science. Various solutions …

Serverless BERT with HuggingFace and AWS Lambda

A typical transformers model consists of a pytorch_model.bin, config.json, special_tokens_map.json, tokenizer_config.json, and vocab.txt. The pytorch_model.bin has already been extracted and uploaded to S3. We are going to add config.json, special_tokens_map.json, tokenizer_config.json, and vocab.txt directly into our Lambda function because they are only a few KB in size. Therefore we create a model directory in our …

Machine Learning Basics: Multiple Linear Regression

Learn to implement multiple linear regression in Python. In the previous story, I gave a brief overview of linear regression and showed how to perform simple linear regression. In simple linear regression, we had one dependent variable (y) and one independent variable (x). What if the marks of the student depended on two or …

Deploying a Python script to a Docker container and connecting to an external SQL Server (in 10 minutes)

Finally we want to build and run the image.
    # Build the image
    docker build -t my-app .
    # Run it
    docker run my-app
    # Find container name
    docker ps --last 1
    # Check logs
    docker logs <container name>
If you want to explore the container and run the script manually, modify the last line of the Dockerfile, then build and run again:
    #CMD ["python","-i","main.py"]
    CMD tail -f …

How Can AI Boost Call Center Morale?

What if we used these technologies to actually make call center agents’ lives better? I don’t mean coaching them to do a better job. “Feedback overload” is already a recognized problem in call centers. I mean helping them cope with the fact that their job is emotionally draining. Remember how frustrating it was the last …

Build your own deep learning classification model in Keras

Step #6: Create our model. In this task we will build a classification convolutional neural network from scratch and train it to recognize the 20 target classes in the Pascal VOC dataset. Our model architecture will be based on the popular VGG-16 architecture. This is a CNN with a total of 13 convolutional layers (cfr. …

Anything2Vec: Mapping Reddit into Vector Spaces

Generalizing Word2Vec away from word embeddings. (Figure: “Subreddit Embedding” and the 100 closest subreddits to /r/nba.) A common problem in ML, natural language processing (NLP), and AI at large is representing objects in a way computers can process. And since computers understand numbers, which we have a common language for comparing, combining and manipulating, …

Azure Cost Management + Billing updates – June 2020

Whether you’re a new student, thriving startup, or the largest enterprise, you have financial constraints and you need to know what you’re spending, where, and how to plan for the future. Nobody wants a surprise when it comes to the bill, and this is where Azure Cost Management + Billing comes in. We’re always looking …

New Azure Firewall features in Q2 CY2020

We are pleased to announce several new Azure Firewall features that allow your organization to improve security, have more customization, and manage rules more easily. These new capabilities were added based on your top feedback: Custom DNS support now in preview. DNS Proxy support now in preview. FQDN filtering in network rules now in preview. IP …

Time Series Analysis: Forecasting Sales Data with Autoregressive (AR) Models

Forecasting the future has always been one of man’s biggest desires, and many approaches have been tried over the centuries. In this post we will look at a simple statistical method for time series analysis, the autoregressive (AR) model. We will use this method to predict future sales data and will rebuild it to …

[Paper Summary] Distilling the Knowledge in a Neural Network

The authors start the paper with a very interesting analogy to explain the notion that the requirements for training and inference can be very different. The analogy given is that of a larva and its adult form and the fact that the nourishment requirements of the two forms …

The Correct Way to Measure Inference Time of Deep Neural Networks

Network latency is one of the more crucial aspects of deploying a deep network into a production environment. Most real-world applications require blazingly fast inference time, varying anywhere from a few milliseconds to one second. But the task of correctly and meaningfully measuring the inference time, or latency, of a neural network requires profound …

How to scrape ANY website with Python and Beautiful Soup

Now, you don’t need to know how HTML/CSS works (although it can be really helpful if you do). The only thing that’s important to know is that you can think of every HTML tag as an object. These HTML tags have attributes that you can query, and each one is different. Each line of code …

Measuring Agreement with Cohen’s Kappa Statistic

This lesser-known metric can help you better evaluate how models perform on imbalanced data. To me, many of the most intriguing use cases for classification involve identifying outliers. The outlier may be a spam message in your inbox, a diagnosis of an extremely rare disease, or an equity portfolio with extraordinary …
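As a rough illustration of why the metric matters on imbalanced data, the sketch below computes Cohen's kappa from its standard definition, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The function name `cohen_kappa` and the toy labels are assumptions for this example, not code from the article.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa between two equal-length sequences of labels."""
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("inputs must be non-empty and of equal length")
    # Observed agreement: fraction of positions where the labels match
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the marginal frequencies, summed over classes
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when expected agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical imbalanced case: one rare positive, model always predicts 0
y_true = [0] * 9 + [1]
y_pred = [0] * 10
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy, cohen_kappa(y_true, y_pred))
```

Here accuracy is 0.9 even though the model never finds the outlier, while kappa is 0 because the agreement is no better than chance.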

Is Facial Recognition Technology Racist? State of the Art algorithms explained

Let’s break this function down component by component. The first component, face classification, simply penalizes the model for saying that there is a face at a location when no face exists in the image. “Face box regression” is a fancy term for the distance between the bounding box coordinates of the predicted face and the …

A Complete Beginner’s Guide to Dealing with NULL Values in SQL

We cannot use the comparison operators =, <, >, <> to test for NULL values. Instead, we have to use the IS NULL and IS NOT NULL predicates.
IS NULL: Returns rows that contain NULL values.
Syntax: expression IS NULL
    SELECT ID, Student, Email1, Email2
    FROM tblSouthPark
    WHERE Email1 IS NULL AND Email2 IS NULL
    ORDER BY ID
The above query yields all records where both Email1 …
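The difference between `= NULL` and `IS NULL` is easy to check with Python's built-in sqlite3 module. This is a minimal sketch: the `students` table and its rows are made up for illustration and are not the article's tblSouthPark data.

```python
import sqlite3

# In-memory database with a hypothetical table; None is stored as SQL NULL
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (id INTEGER, email TEXT)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?)",
    [(1, "a@example.com"), (2, None), (3, None)],
)

# A comparison with NULL evaluates to NULL (unknown), so no row passes the filter
equals_null = conn.execute(
    "SELECT id FROM students WHERE email = NULL"
).fetchall()

# The IS NULL / IS NOT NULL predicates test for NULL directly
is_null = conn.execute(
    "SELECT id FROM students WHERE email IS NULL ORDER BY id"
).fetchall()
is_not_null = conn.execute(
    "SELECT id FROM students WHERE email IS NOT NULL"
).fetchall()

print(equals_null)   # []
print(is_null)       # [(2,), (3,)]
print(is_not_null)   # [(1,)]
```

The `= NULL` query returns nothing even though two rows have no email, which is exactly the pitfall the predicates avoid.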

AI pseudoscience and scientific racism

Recent attempts to predict criminality from facial features recall a long tradition of unethical and racist pseudoscience. A recent paper about to be published by Harrisburg University caused quite a stir earlier this month. Titled “A Deep Neural Network Model to Predict Criminality Using Image Processing,” the paper promised: With 80 percent …

Coronavirus: Which country got it right?

Note from the editors: Towards Data Science is a Medium publication primarily based on the study of data science and machine learning. We are not health professionals or epidemiologists, and the opinions of this article should not be interpreted as professional advice. In this …

How (NOT) To Predict Stock Prices With LSTMs

Not so recently, a brilliant and ‘original’ idea suddenly struck me: what if I could predict stock prices using machine learning? After all, a time series can be easily modeled with an LSTM. I could see myself getting rich overnight! If this is so easy, why hasn’t anyone done it yet? Very excited at …

Why Build an AI Decentralized Autonomous Organization (AI DAO)

Beyond the already complex challenge of implementing AI, some companies have started analyzing the possible benefits of building AI Decentralized Autonomous Organizations (AI DAOs). During my latest mission, I had to help create new business models, identify the right AI approach, and create a roadmap for the creation of several AI DAO proof of …

The Bechdel test and the X-Mansion with tidymodels and #TidyTuesday

[This article was first published on rstats | Julia Silge, and kindly contributed to R-bloggers]. Lately I’ve been publishing screencasts demonstrating how to use the tidymodels framework, …

My Tableau dashboards sucked – until I started drawing them

Data visualization tools such as Tableau are loved and used because of how simple they make it to show correlations in large datasets. The exact reason they are used is also their biggest flaw. It’s too easy to simply click buttons until you find something that looks acceptable. Let’s look at some examples. I’ve recreated …

Why R? Webinar – JD Long – Helping drive data science adoption in organizations

[This article was first published on http://r-addict.com, and kindly contributed to R-bloggers]. July 2nd (8:00pm UTC+2) is the date of the last webinar at Why …

one bridge further

[This article was first published on R – Xi’an’s Og, and kindly contributed to R-bloggers]. Jackie Wong, Jon Forster (Warwick) and Peter Smith have just …

Find your most expensive lines of code and improve code quality with Amazon CodeGuru – now generally available

Developers can use Amazon CodeGuru Profiler to identify the most expensive lines of code by helping them understand the runtime behavior of their applications, identify and remove code inefficiencies, improve performance, and significantly decrease compute costs. Amazon CodeGuru Profiler provides visualizations and recommendations on how to fix performance issues and the estimated cost of running …

Amazon Virtual Private Cloud (VPC) customers can now use their own Prefix Lists to simplify the configuration of security groups and route tables

VPC security groups and route tables are used to control access and routing policies. Customers often have a common set of CIDR blocks for security group and route table configurations. Prefix Lists allow you to group multiple CIDR blocks into a single object, and use it as a reference in your security groups or route …

Learn How to Create Web Data Apps in Python

import streamlit as st
import pandas as pd
import plotly.express as px
import pydeck as pdk
import numpy as np
#Load and Cache the data
@st.cache(persist=True)
def getmedata():
    url = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv'
    df = pd.read_csv(url, delimiter=',', header='infer')
    df.rename(index=lambda x: df.at[x, 'Country/Region'], inplace=True)
    dft = df.loc[df['Province/State'].isnull()]
    dft = dft.transpose()
    dft = dft.drop(['Province/State', 'Country/Region', 'Lat', 'Long'])
    dft.index = pd.to_datetime(dft.index)
    return(dft, df)
df1 = getmedata()[0]
st.title('Building a Data Dashboard with Streamlit')
st.subheader('while exploring COVID-19 data')
#####In Scope Countries
countrylist …

Predicting Future Wars

Insights from Open Data and Machine Learning. I know what you are thinking: wars are rare and complicated events, and one can’t expect to account for their entire complexity. And you are right: they spring from an intricate array of political, economic, and historical reasons, not to mention a thick coat of randomness; thus they should …

Industrialize Analytics — How do we get there?

This article is Part 1 of the series “Winning in Analytics!”. Let’s look at the key enablers for scaling your AI initiatives successfully. Dear AI Enthusiasts, we would love to realize the full potential of our data! We would love to see our analytics proofs of concept become reality! But …

Neural Networks using Tensorflow via Keras in R – Video

[This article was first published on http://r-addict.com, and kindly contributed to R-bloggers]. On June 25th we had the pleasure of hosting a Why R? Webinar with …

Black-Scholes Option Pricing is Wrong

Theory, assumptions, problems, and solutions for practitioners. The equation offered by Black and Scholes (1973) is the standard theoretical pricing model for European options. The keyword is theoretical, as the Black-Scholes model makes some key assumptions that are immediately violated in practice. Key model assumptions: no transaction costs, no arbitrage …

Kernel Live Patching for Amazon Linux 2 is now generally available

Many AWS customers introduce security updates by rolling out patched machine images (AMIs) or by in-place patching instances followed by rolling restarts. This process is usually time-consuming and may result in disruptions to running applications. Kernel Live Patching in Amazon Linux provides a way to reduce disruption and accelerate a rollout by applying a …

How to Avoid Potential Machine Learning Pitfalls

This post is for all those data science aficionados out there who recently jumped onto the machine learning bandwagon. Whether they studied data science in college or are self-taught, most aspiring data scientists get a reality check when trying their hand at a machine learning project in a practical setting. I struggled with the …

A guide to setting up monitoring for object creation in Cloud Storage

Cloud Storage provides worldwide, highly durable object storage that scales to exabytes of data. This means customers of all sizes and industries can use it to store and protect any amount of data for a range of use cases, such as websites, mobile applications, backup and restore, archive, and big data analytics. The objects are …

Machine Learning Explainability Introduction via eli5

Tree-based Feature Importance. A machine learning model such as a random forest is typically treated as a black box. Why? A forest consists of a large number of deep trees, where each tree is trained on bagged data using a random selection of features. Gaining a full understanding by examining each tree would be close to impossible. For …