Visualising Data with Seaborn – Who Pays More For Health Insurance?

Seaborn is a data visualisation library in Python. This tutorial article demonstrates the application of various Seaborn plots in visualising the amount paid by different individuals for their health insurance. You can find the complete notebook here. A picture is worth a thousand words. I recently caught up with a data scientist working in the …

Linear regression and gradient descent for absolute beginners

Gradient Descent Algorithm. In machine learning terminology, the sum of squared errors is called the “cost”. This cost equation is: cost = (1/(2m)) × Σ (predicted value − actual value)². This equation is therefore roughly the “sum of squared errors”, as it computes the sum of (predicted value minus actual value) squared. The 1/(2m) is to “average” the squared error over the number of …
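The cost described in the excerpt above, the sum of squared errors scaled by 1/(2m), can be sketched in a few lines of plain Python. This is a minimal sketch rather than the article's own code: the single-feature model `theta0 + theta1 * x`, the function names, the learning rate, the step count, and the toy data are all assumptions made for illustration.

```python
def cost(theta0, theta1, xs, ys):
    """Sum of squared errors scaled by 1/(2m), where m is the number of samples."""
    m = len(xs)
    squared_errors = ((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys))
    return sum(squared_errors) / (2 * m)


def gradient_descent(xs, ys, learning_rate=0.1, steps=2000):
    """Fit y ~ theta0 + theta1 * x by repeatedly stepping against the gradient of cost()."""
    theta0, theta1 = 0.0, 0.0
    m = len(xs)
    for _ in range(steps):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        # Partial derivatives of the cost with respect to each parameter.
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Update both parameters together so each step uses the same errors.
        theta0 -= learning_rate * grad0
        theta1 -= learning_rate * grad1
    return theta0, theta1


# Toy data generated from y = 1 + 2x (hypothetical, not data from the article).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
theta0, theta1 = gradient_descent(xs, ys)
print(round(theta0, 3), round(theta1, 3), cost(theta0, theta1, xs, ys))
```

On this toy data the fitted parameters should approach 1 and 2, and the cost should approach zero. The factor of 2 in 1/(2m) cancels when the squared term is differentiated, which is why the gradients above are divided by m alone.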

Google Sheets to Google BigQuery — Move Your Data

[Source: Depositphotos] How to transfer data from Google BigQuery to Google Sheets and from Google Sheets to Google BigQuery without CSV files and paid services? If you are looking for a convenient way to transfer data between Google Sheets and Google BigQuery, this article is for you. Learn how you can build any …

Top 3 Business Intelligence Tools for Data Analysis and Visualization

PowerBI is a collection of software services, apps, and connectors that work together to turn unrelated sources into coherent, visually immersive, and interactive insights, at least according to Microsoft. [Figure: PowerBI Showcase: Microsoft’s Sales and Marketing dashboard (source)] PowerBI is one of the most widely recognized BI tools due to its intuitive interface, various visualization …

Learn to Write Functions Others Can Use in Python

Do One Thing At a Time. A common mistake many beginners make is writing functions that are too long and complicated. It is always recommended to design functions to perform only one specific task. Small and precise functions are easy to test and debug with modern IDEs and are more flexible. Now, you might be thinking: ‘I …

Why Election Forecasts Were Wrong: The Economist Model

It’s a simple case of Bayesian Statistics. In the days leading up to the big election, The Economist predicted that former vice-president Biden had a 97% chance of winning the U.S. presidential election. In 2016, the same publication predicted a 99% chance of winning for Hillary Clinton. As we know now, both predictions were way off, …

Introduction to f-strings

The old way. Do you remember when you first learned about strings? You probably defined a function hello_world similar to the code below: def hello_world(): print("Hello, world!") And then you progressed to writing a function that could greet not the whole world but a particular person. Something similar to the definition below: def hello_world(name): print("Hello, " …
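For comparison, here is a minimal sketch of the "old way" next to the f-string equivalent. The excerpt's second definition is cut off, so the concatenation shown here is an assumption about how it continues, and the function names `hello_concat` and `hello_fstring` are hypothetical.

```python
def hello_concat(name):
    """Build the greeting the pre-f-string way, by concatenating strings."""
    return "Hello, " + name + "!"


def hello_fstring(name):
    """Build the same greeting with an f-string (available since Python 3.6)."""
    return f"Hello, {name}!"


def price_label(item, price):
    """f-strings also accept format specifiers after a colon, here two decimals."""
    return f"{item}: {price:.2f}"


print(hello_concat("Ada"))       # Hello, Ada!
print(hello_fstring("Ada"))      # Hello, Ada!
print(price_label("tea", 3.5))   # tea: 3.50
```

Both greeting functions return the same string; the f-string version keeps the template readable as the number of inserted values grows.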

Autoencoders and the Denoising Feature: From Theory to Practice…

If we had to summarize them in one sentence, it would probably sound like: Autoencoders are neural networks trained in an unsupervised way to attempt to copy inputs to outputs. Yes, I know, it may seem quite easy and useless. However, we will see that this is neither trivial nor pointless. In fact, Autoencoders are …

Google Data Studio. It’s Free But Is It Any Good?

A product review based on my 6 major differentiators of BI tools. I’ll be evaluating Data Studio based on 6 different criteria. They are, for me, the 6 major differentiators of BI tools: connectors, data management, calculations, data visualisation, interactivity, and publishing. [Photo by fabio on Unsplash] So, connectors are the data sources that …

How to use Metabase Metadata to facilitate Data Discovery

Let’s start off with something simple to answer our first point above about monitoring self-serve: how many questions are being viewed and how many unique users are viewing questions? Use cases: as your team scales you can track adoption of Metabase; as you add new models you can see if adoption and utilization increase. The …

10 Awesome Real-World Applications Of Data Science And AI

[Photo by Austin Distel on Unsplash] The upsurge in the volume of unwanted emails, called spam, has created an intense need for the development of more dependable and robust antispam filters. Recent machine learning methods are being used to successfully detect and filter spam emails. Let us understand this concept with a simple example. …

Text Analysis in Foreign Languages

Project Background. Recently I started a for-fun project analyzing Twitter posts about a Japanese show I am watching. In my previous posts, I discussed the use of the Twint library to gather all show-related tweets, and some analysis of the number of tweets, tweet-related actions such as the number of replies, …

How to store financial data: a SQL vs No-SQL comparison

[Photo by Pierre Jarry on Unsplash] This article measures the performance of alternative solutions for storing Open, High, Low, Close (OHLC) prices and volume data, the kind of data used by candlestick charts. I measure used storage space and the speed of inserting, reading and deleting whole time series, and the speed of appending records …

3 Top Business Intelligence Tools Compared: Tableau, PowerBI, and Sisense

Business Intelligence (BI) is used to transform data into actionable insights that provide value to an organization and help achieve its business goals. Reports and dashboards are go-to approaches for modern-day business intelligence tools. At Appsilon, we are big advocates of R and R Shiny, but we also have significant experience with business intelligence tools. …

The Clinical Applications of NLP: Workshop at EMNLP 2020

1. Dr. Summarize: Global Summarization of Medical Dialogue by Exploiting Local Structures [Paper] This paper focuses on text summarization in the context of medical dialogue. The idea is that when a patient has a conversation with a doctor, you want to be able to automatically summarize what transpired in the conversation. So for example, if …

Is this the end for Convolutional Neural Networks?

From handcrafted feature detectors to vision transformers, we’ve come a long way… [Photo by Agence Olloweb on Unsplash] For almost a decade, convolutional neural networks have dominated computer vision research all around the globe. However, a new method is being proposed which harnesses the power of transformers to make sense of images. Transformers were …

A Quick Introduction to Time Series Analysis

Preliminary Details required for Forecasting. [Photo: Nathan Dumlao via Unsplash] In my first article on Time Series, I hope to introduce the basic ideas and definitions required to understand basic Time Series analysis. We will start with the essential mathematical definitions, which are required to implement more advanced models. The information will be introduced …

10 Neat Python Tricks and Tips Beginners Should Know

Tricks to Become a Unique Beginner. [Photo by Cristian Lopez on Unsplash] Python is a powerful general-purpose programming language. It is used in web development, data science, creating software prototypes, and so on. Fortunately for beginners, Python has a simple, easy-to-use syntax. This makes Python an excellent language for beginners learning to program. Python is …

Amazon Neptune releases graph notebook as an open-source project

The open-source graph notebook provides users the flexibility to run their queries and visualization from local desktops, EC2, or EMR in addition to using the Neptune Workbench on SageMaker. It is easily installed via the Python Package Installer (PIP). You can connect to graph databases that provide an endpoint that implements an Apache TinkerPop Gremlin …

I’m not a data scientist but made a COVID mask detector with Google AutoML and React — Doctor Masky

Using Google AutoML and React, I was able to set up a client-side object detection app, without any custom model code. [Image by author] Neural-network-based object detection is a powerful technique that’s getting easier and easier to take advantage of. With Google’s Cloud AutoML computer vision service (as well as similar services like Microsoft’s Custom …

Getting Started with Pytorch: How to Train a Deep Learning Model With Pytorch

Image classification with Pytorch using a Convolutional Neural Network. [Photo by Alina Grubnyak on Unsplash] Exploring the deep world of machine learning and artificial intelligence, today I will introduce my fellow AI enthusiasts to Pytorch. Primarily developed by Facebook’s AI Research Lab, Pytorch is an open-source machine learning library that aids in the production deployment …

Aesthetics Within the Computation World- Part 1: Evolutionary Games

The experimental setup in play will be built on top of the prisoner’s dilemma. The Prisoner’s dilemma. The Prisoner’s dilemma has received a lot of attention since the beginning of the concept of modeling of Game of Life. This is partially due to its simplicity as a model but also due to the paradox it …

How to Navigate Analytics Job Search During COVID-19

This analysis is part of our project in the Summer Data Competition 2020 hosted by Fuqua School of Business. I want to send my special thanks to my teammates, Yaqiong (Juno) Cao and Xinying (Silvia) Sun, for their great contribution. [Photo by Annie Spratt on Unsplash] Problem Definition. Are you an analytics master’s student …

PyTorch Lightning: Making your Training Phase Cleaner and Easier

The PyTorch Lightning project was started in 2016 by William Falcon when he was completing his PhD at NYU [1]. PyTorch Lightning was subsequently launched in March 2019 and made public in July of the same year. It was also in 2019 that PyTorch Lightning was adopted by the NeurIPS Reproducibility Challenge as the standard …

Tiny ML and the future of on-device AI

Jeremie (00:00): Hey everyone, Jeremie here. I’m the host of the Towards Data Science podcast and I’m also on the team over at the SharperScience mentorship program, and today I am really excited, because we’re talking to Matthew Stewart, who is a PhD student at Harvard. He’s working on a series of different problems in the …

Innovating with What You Have

Building Our Price Optimization Engine, Piece-by-Piece. [Photo by Xavi Cabrera on Unsplash] It’s an exciting time at Best Buy Canada: the holidays are fast approaching and we’re about to enter busy season. That means stress testing the website, finding Black Friday deals, and scouring warehouses for unsold PS5s. Meanwhile, the Digital Intelligence team is awaiting …

Fine-tuning a BERT model for search applications

How to ensure training and serving encoding compatibility. There are cases where the inputs to your Transformer model are pairs of sentences, but you want to process each sentence of the pair at different times due to your application’s nature. Search applications are one example. [Photo by Alice Dietrich on Unsplash] The search use case …

Managed Backup Retention for AWS CloudHSM

With today’s launch of Managed Backup Retention, you can now configure the retention period for CloudHSM backups. Expired backups are automatically purged for you, so you no longer have to build and maintain automation to delete old backups. With managed backup retention, you can change the cluster retention period at any time. You can also …

lmDiallel: a new R package to fit diallel models. The Hayman’s model (type 1)

In a previous post we presented our new ‘lmDiallel’ package (see this link here and see also the original paper in Theoretical and Applied Genetics). It provides several extensions to the lm() function in R, to fit a class of linear models of interest for plant breeders or geneticists, the so-called diallel models. For …

Brain Data & Business models: How Brain-computer interface combined with AI will fuel a lucrative…

These same large tech firms will also mostly benefit from new forms of BCIs. Indeed, some companies can already make a brain-computer interface in the form of headphones (6). Other forms of BCI will help bring the benefits of BCIs to the masses in a form factor that people wear on a regular basis. That …

How To Benchmark Any Models’ Inference Statistics For Production

Find pyinfer on GitHub: https://github.com/cdpierse/pyinfer Docs can be found here: https://pyinfer.readthedocs.io/en/latest/ When developing machine learning models, initial efforts are often put into measuring metrics that reflect how well a model performs for a given task. This step is of course crucially important, but when moving a model to production other factors come into play …

Wayfair delights suppliers and customers with help from Google Cloud
Associate Director, Analytics Infrastructure, Wayfair

At Wayfair, we use data to advance our business processes and help our suppliers work more efficiently, all with the end goal of delivering great customer experiences. As one of the world’s largest online destinations for the home, our massive scale allows us to use data to delight our customers and help our thousands of …

Practical Probability Theory: All About a Single Random Variable

6.1 Definition. The cumulative distribution function (CDF) is another way to describe how a random variable’s possible values are distributed. It is defined as the probability that X will take a value less than or equal to x: F(x) = P(X ≤ x). Equipped with this new concept, we can now express the probability P(a ≤ X ≤ b) in …
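As a concrete illustration of the definition above, the sketch below computes the CDF of a fair six-sided die and uses it to obtain an interval probability. The excerpt is cut off before it says how P(a ≤ X ≤ b) is expressed, so treating it as a difference of CDF values is an assumption; the die example and the function names `cdf` and `prob_between` are hypothetical. For an integer-valued variable the identity is P(a ≤ X ≤ b) = F(b) − F(a − 1), whereas for a continuous variable it is F(b) − F(a).

```python
from fractions import Fraction


def cdf(x):
    """F(x) = P(X <= x) for a fair six-sided die X with faces 1..6."""
    favourable = sum(1 for face in range(1, 7) if face <= x)
    return Fraction(favourable, 6)


def prob_between(a, b):
    """P(a <= X <= b) for integers a <= b, computed from the CDF alone."""
    # Subtract F(a - 1), not F(a), so that the outcome X = a stays included.
    return cdf(b) - cdf(a - 1)


print(cdf(3))              # 1/2
print(prob_between(2, 4))  # 1/2
```

Exact fractions are used instead of floats so the probabilities can be compared without rounding error.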

Fighting Churn with Data: An Interview with Zuora Chief Data Scientist Carl Gold

And why data reliability is top of mind for the Subscription Economy. [Image courtesy of Austin Distel on Unsplash.] We sat down with Carl Gold, Chief Data Scientist at subscription software company Zuora, to learn more about his new book, Fighting Churn with Data. The ability to attract and retain customers is fundamental to the …

Attention and Transformer Models

Say we want to calculate self-attention for the word “fluffy” in the sequence “fluffy pancakes”. First, we take the input vector x1 (representing “fluffy”) and multiply it with three different weight matrices Wq, Wk and Wv in order to get three different vectors: q1, k1 and v1. The exact same is done for the input …
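The projection step described above can be sketched in plain Python. The excerpt only covers computing q1, k1 and v1 (and the same for the second input); the remaining steps below follow the standard scaled dot-product attention formulation, which is an assumption about where the article goes next. The 2-dimensional vectors and weight values are made up for illustration, and the helper names `matvec`, `dot` and `softmax` are hypothetical.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Toy inputs and weights (illustrative values, not from the article).
x1 = [1.0, 0.0]  # "fluffy"
x2 = [0.0, 1.0]  # "pancakes"
Wq = [[1.0, 0.0], [0.0, 1.0]]
Wk = [[1.0, 0.0], [0.0, 1.0]]
Wv = [[0.5, 0.0], [0.0, 2.0]]

# Project each input into query, key and value vectors.
q1, k1, v1 = matvec(Wq, x1), matvec(Wk, x1), matvec(Wv, x1)
k2, v2 = matvec(Wk, x2), matvec(Wv, x2)  # q2 is only needed for "pancakes"

# Score "fluffy" against every word, scale by sqrt(d_k), and normalise.
d_k = len(k1)
scores = [dot(q1, k) / math.sqrt(d_k) for k in (k1, k2)]
weights = softmax(scores)

# The attention output for "fluffy" is the weighted sum of the value vectors.
z1 = [weights[0] * a + weights[1] * b for a, b in zip(v1, v2)]
print(weights, z1)
```

With these toy weights, "fluffy" attends more strongly to itself than to "pancakes", because its query aligns with its own key. Whether inputs multiply the weight matrices from the left or the right is a convention that differs between write-ups; the sketch uses matrix-times-vector.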

Machine Learning Case Study: Telco Customer Churn Prediction

[Photo by Fezbot2000 on Unsplash] For Telco companies it is key to attract new customers and at the same time avoid contract terminations (=churn) to grow their revenue-generating base. Looking at churn, different reasons trigger customers to terminate their contracts, for example better price offers, more interesting packages, bad service experiences or change of …

LondonR Talks – Computer Vision Classification – Turning a Kaggle example into a clinical decision making tool

I had the pleasure of speaking at the last LondonR event of 2020. What a strange year it has been! But this put the icing on the cake. The premise. The premise of my talk was to take a novel Kaggle parasite cell dataset and advocate how this type of classification task could be transported …

Battling label distribution shift in a dynamic world

Maximum likelihood with appropriate calibration goes a long way. By Amr M. Alexandari & Avanti Shrikumar. [Figure: U.S. COVID19 Cases (data downloaded from the NYTimes). Image by the authors.] In this tutorial, we will see how we can use a combination of model calibration and a simple iterative procedure to make our model predictions robust to …

Learning data visualization differently

As I just said, visualization of a SINGLE variable tells us how it is distributed with respect to the central tendency and dispersion (e.g. mean, median, quartiles etc.). We’ll pick the variable total_bill and see its distribution using different visualization techniques. In a way, all these techniques present similar kinds of information, with just some …

What is the number one gap I see across data scientists?

Your communication skills are both a product and a tool. [Photo by Kristina Flour on Unsplash] As a hiring manager, there are some essential skills and characteristics that we look for when hiring a data scientist, such as strong knowledge of Python, SQL, and RStudio, and skills in research, machine learning, and statistics. But the …

Practical Machine Learning Tutorial: Part.4 (Model Evaluation-2)

Multi-class Classification Problem: Geoscience example (Facies). In this part, we will elaborate on more model evaluation metrics specifically for multi-class classification problems. Learning curves will be discussed as a tool for deciding how to trade off bias and variance in model parameter selection. ROC curves for all classes in …

Drop Duplicates in Pandas | Dean McGrath

Learn how to drop duplicates from a Pandas DataFrame to improve your data quality. [Photo by Samantha Lam on Unsplash] Dropping duplicates from your data sets is a task you will regularly have to do as a Data Analyst. Whilst in some cases duplicates may be valid, frequently they have been created through lax data …