Red Wine Quality Prediction Using Regression Modeling and Machine Learning

This is my personal project, part of the Data Science course at Duke University, Fuqua School of Business, MQM Business Analytics Program. The red wine industry has recently shown exponential growth as social drinking is on the rise. Nowadays, industry players are using product quality certifications to promote …

From the Central Limit Theorem to the Z- and t-distributions

After running several statistical tests to assess my models, I decided to dig deeper into the theory and ask myself questions such as why the number of samples matters for a statistical test, why the standard error has the square root of the sample size in its denominator, or why statisticians differentiate between …
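The square root has a concrete meaning: the standard deviation of the sample mean (the standard error) is σ/√n, so quadrupling the sample size halves the spread of the mean. A minimal Python simulation, assuming a Uniform(0, 1) population chosen only for illustration (`sample_mean_spread` is a hypothetical helper), compares the empirical spread of sample means with σ/√n:

```python
import random
import statistics

def sample_mean_spread(n, trials=10000, seed=0):
    """Empirical standard deviation of the mean of n Uniform(0, 1) draws."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.random() for _ in range(n)) for _ in range(trials)]
    return statistics.stdev(means)

# Population standard deviation of Uniform(0, 1) is sqrt(1/12).
sigma = (1 / 12) ** 0.5

for n in (1, 4, 16, 64):
    # Columns: sample size, simulated spread of the mean, theoretical sigma / sqrt(n).
    print(n, round(sample_mean_spread(n), 4), round(sigma / n ** 0.5, 4))
```

The two numeric columns should agree closely for every n: the σ/√n scaling holds even though the uniform population is not normal.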

5 Ways To Become A Better Machine Learning Practitioner

This section is closely related to the points made in section 1. I’m not talking about reading news articles on the banning of facial recognition software or reading casual tech blogs. To keep up to date with the ‘real’ developments and progress occurring within the AI field, you have …

3 neural network architectures you need to know for NLP!

In this article, I will discuss what I think are the three most important architectures to be aware of for NLP. Recurrent Neural Network (RNN). Recurrent neural networks are special architectures that take into account temporal information. The hidden state of an RNN at time t takes …
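A common textbook form of the recurrence is h_t = tanh(W_x x_t + W_h h_{t−1} + b): the hidden state at time t combines the current input with the previous hidden state. The NumPy sketch below implements that vanilla (Elman) recurrence; the post’s exact formulation is cut off in the excerpt, so the weight names and the tanh nonlinearity are assumptions:

```python
import numpy as np

def rnn_forward(xs, W_x, W_h, b, h0=None):
    """Run a vanilla RNN over xs (shape (T, input_dim)).

    Applies h_t = tanh(W_x @ x_t + W_h @ h_{t-1} + b) step by step and
    returns every hidden state, shape (T, hidden_dim).
    """
    hidden_dim = W_h.shape[0]
    h = np.zeros(hidden_dim) if h0 is None else h0
    states = []
    for x_t in xs:
        # The previous hidden state carries information from earlier time steps.
        h = np.tanh(W_x @ x_t + W_h @ h + b)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
T, input_dim, hidden_dim = 5, 3, 4
xs = rng.normal(size=(T, input_dim))
W_x = rng.normal(scale=0.5, size=(hidden_dim, input_dim))
W_h = rng.normal(scale=0.5, size=(hidden_dim, hidden_dim))
b = np.zeros(hidden_dim)

H = rnn_forward(xs, W_x, W_h, b)
print(H.shape)  # (5, 4)
```

The same weights are reused at every time step, which is what lets the network handle sequences of any length.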

Attend the Create:Data free online event, December 7

The Microsoft Create: series is back again, now with Create: Data! Join us for a half-day of conversations at Microsoft Create: Data and connect with the experts and community to learn and discuss everything data, from upcoming trends to best practices and data for good. Join our virtual event Create: Data. Date: 7 December 2020. Time: 8:00 AM – 11:10 AM PST / 4:00 PM – 7:10 PM GMT …

HireAttorney — bringing a more focused market to defense attorneys

Identify defendants who are more likely to hire a private attorney using machine learning. I recently consulted for a law firm to identify defendants who are more likely to use their services. Previously, they tried to send emails to all defendants, but the response rate was close to zero. They also tried …

Counterfactual vs Contrastive Explanations in Artificial Intelligence

As defined in “Counterfactual explanations without opening the black box: Automated decisions and the GDPR” [17], counterfactual explanations differ little from contrastive explanations as defined in [4]. Both look for minimal changes (the latter restricts itself to a more constrained kind of change, namely additions) to the input for the decision of the …

Automate your job search with Python and GitHub Actions

A real-life example using Scrapy and GitHub Actions. Job hunting is a time-consuming task. Many different job search sites exist, but there is no “one size fits all” solution. Job openings are available in job aggregators, on LinkedIn, on the career pages of individual companies, even as tweets or …

Why data scientists and business executives struggle to work together

How to bridge the gap between business executives and data scientists to overcome challenges and move towards data- and business-literate professionals. When you think about data scientists*, most likely you are imagining someone working in a tech company that has data solutions at its core, like a software provider. Perhaps, …

Analysis of US Elections 2020 with Pandas

Number of counties won by each candidate. Let’s check how many counties were won by each candidate. I will show you two different ways to accomplish this task. The first way is to use the groupby function and count the number of “True” values for each candidate:

    winner = elections[['candidate', 'won', 'state']]\
        .groupby(['candidate', 'won'], as_index=False).count()
    winner = winner[winner.won]
    winner = winner.rename(columns={'state': 'won_county'})
    winner

…
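The groupby approach can be tried end to end on a tiny DataFrame. The column names follow the excerpt, but the rows are hypothetical, invented only for illustration. The last line is one possible alternative and not necessarily the post’s second way:

```python
import pandas as pd

# Hypothetical toy data: one row per (county, candidate) with a boolean "won" flag.
elections = pd.DataFrame({
    "state": ["A", "A", "A", "A", "B", "B"],
    "county": ["c1", "c1", "c2", "c2", "c3", "c3"],
    "candidate": ["X", "Y", "X", "Y", "X", "Y"],
    "won": [True, False, False, True, True, False],
})

# Group by candidate and won, count non-null "state" values, keep only won == True.
winner = (
    elections[["candidate", "won", "state"]]
    .groupby(["candidate", "won"], as_index=False)
    .count()
)
winner = winner[winner.won]
winner = winner.rename(columns={"state": "won_county"})
print(winner)

# Alternative: filter the winning rows first, then count them per candidate.
print(elections.loc[elections.won, "candidate"].value_counts())
```

Both routes count one row per county won, so they should agree: X wins two counties and Y wins one in this toy data.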

Here’s Why You Should Learn Docker as a Data Scientist

You’ll be surprised by how easy it is. To use Docker, you’ll need to install it: download Docker Desktop, install it, and open the application. Now create the following project structure anywhere on your computer: [Image 1: Directory structure for your Python app] Let’s start with what …

Warspeed 5 — priors and models continued

My last 4 posts have all focused on the vaccines being produced to fight COVID-19. They have primarily focused on Bayesian methods (or at least on comparing Bayesian to frequentist methods). This one follows that pattern and provides expanded coverage of the concept of priors in Bayesian thinking, how to operationalize them, and additional coverage of how to compare Bayesian regression …

Serverless load balancing with Terraform: The hard way

Earlier this year, we announced Cloud Load Balancer support for Cloud Run. You might wonder, aren’t Cloud Run services already load-balanced? Yes, each *.run.app endpoint load-balances traffic between an autoscaling set of containers. However, with the Cloud Load Balancing integration for serverless platforms, you can now fine-tune lower levels of your networking stack. In …

Paving the Way to Google!

Want to get hired? Work on your technical skills! Want a promotion? Learn institutional knowledge! Domain Knowledge & Institutional Knowledge: When my past company’s CEO encouraged me to work on Declined Transactions as my first project after joining the team, I was disappointed! Right out of academia, I was hoping for a complex project like …

the riddle(r) of the certain winner losing in the end

[This article was first published on R – Xi’an’s Og, and kindly contributed to R-bloggers]. Considering a binary random walk, starting at zero, what is …

Amazon CloudWatch Application Insights adds Automatic Application Discovery

The new discovery feature looks for identifying factors of your application or database and then applies the associated application tier automatically to set up the correct metrics, telemetry, logs and alerts. While you can still configure and change the selections, if you’re satisfied with the recommendation, it just takes a few confirmation clicks to complete …

Bandits, WebAssembly, and IoT

[Figure: A part of the Bootstrap Thompson Sampling algorithm.] An uncommon combination allows efficient sequential learning on the edge. The multi-armed bandit (MAB) problem is a relatively simple (to describe, that is, not to solve) formalization of a sequential learning problem that has many applications. In its canonical form, a gambler faces a …
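Bootstrap Thompson Sampling replaces the posterior draw of ordinary Thompson Sampling with a draw from a set of online bootstrap replicates. A minimal pure-Python sketch for Bernoulli (0/1) rewards, assuming 100 replicates per arm and “double-or-nothing” weights (each observation enters a replicate with weight 0 or 2 at random). The 1-success/2-trial starting value is an assumption made here to avoid division by zero, and `BootstrapThompson` and `simulate` are hypothetical names:

```python
import random

class BootstrapThompson:
    """Bootstrap Thompson Sampling for Bernoulli rewards (sketch).

    Each arm keeps n_replicates bootstrap replicates of [weighted successes,
    weighted trials]. Each observed reward is added to every replicate of the
    played arm with a random weight of 0 or 2 (double-or-nothing bootstrap).
    """

    def __init__(self, n_arms, n_replicates=100, seed=None):
        self.rng = random.Random(seed)
        # Assumed starting value: 1 success out of 2 trials (estimate 0.5).
        self.reps = [[[1.0, 2.0] for _ in range(n_replicates)] for _ in range(n_arms)]

    def select_arm(self):
        # Sample one replicate per arm; play the arm whose replicate looks best.
        estimates = []
        for arm_reps in self.reps:
            successes, trials = self.rng.choice(arm_reps)
            estimates.append(successes / trials)
        return max(range(len(estimates)), key=estimates.__getitem__)

    def update(self, arm, reward):
        for rep in self.reps[arm]:
            weight = 2.0 if self.rng.random() < 0.5 else 0.0
            rep[0] += weight * reward
            rep[1] += weight

def simulate(true_probs, steps=2000, seed=1):
    """Play a simulated Bernoulli bandit and return how often each arm was pulled."""
    env = random.Random(seed)
    agent = BootstrapThompson(len(true_probs), seed=seed)
    pulls = [0] * len(true_probs)
    for _ in range(steps):
        arm = agent.select_arm()
        reward = 1.0 if env.random() < true_probs[arm] else 0.0
        agent.update(arm, reward)
        pulls[arm] += 1
    return pulls

print(simulate([0.2, 0.5, 0.7]))
```

The update needs only two counters per replicate and no posterior formulas, which plausibly makes this family of methods attractive on constrained edge devices.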

Testing Streamlit Apps Using SeleniumBase

In the time I’ve worked at Streamlit, I’ve seen hundreds of impressive data apps ranging from computer vision applications to public health tracking of COVID-19 and even simple children’s games. I believe the growing popularity of Streamlit comes from the fast, iterative workflows through the Streamlit “magic” functionality and auto-reloading the front-end upon saving your …

Installing Jupyter Notebook for Different Environments in Windows 10

A virtual environment is an isolated region in which a particular version of Python and its packages are installed, so that different versions of Python can coexist on one machine. Each environment has its own files, directories, and paths. Thus a single system can cater to different projects that demand different Python versions. For Python installation and its basics, …
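As a concrete illustration of that isolation, Python’s standard-library `venv` module can create an environment programmatically. This is only a sketch: the excerpt does not show which tool the post uses, and `make_env` is a hypothetical helper:

```python
import pathlib
import sys
import tempfile
import venv

def make_env(path, with_pip=False):
    """Create an isolated virtual environment at `path` with the stdlib venv module.

    Returns the path of its pyvenv.cfg file and of its interpreter directory.
    """
    venv.EnvBuilder(with_pip=with_pip).create(path)
    env_dir = pathlib.Path(path)
    # venv uses "Scripts" on Windows and "bin" elsewhere.
    bin_dir = env_dir / ("Scripts" if sys.platform == "win32" else "bin")
    return env_dir / "pyvenv.cfg", bin_dir

with tempfile.TemporaryDirectory() as tmp:
    cfg, bin_dir = make_env(pathlib.Path(tmp) / "project-a")
    print(cfg.exists(), sorted(p.name for p in bin_dir.iterdir()))
```

Each environment gets its own `pyvenv.cfg` and interpreter directory. To expose an environment to Jupyter, one typically installs the `ipykernel` package inside it and registers it as a kernel.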

Basic Multipage Routing Tutorial for Shiny Apps: shiny.router

Basic Routing for Shiny Web Applications. Web applications couldn’t exist without routing. Think about it in terms of the Appsilon website – you’ve visited our home page, navigated to the blog, and opened this particular article. That’s routing in a nutshell – matching UI components to a URL. Appsilon released the open source shiny.router package …

Reverse Engineering AstraZeneca’s Vaccine Trial Press Release

In their press release, AstraZeneca provide the following information about an interim analysis of their vaccine trial: one dosing regimen (first a half dose and, at least a month later, a full dose) with 2741 participants showed 90% efficacy; another dosing regimen (two full doses at least one month apart) with 8896 participants showed 62% …
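Efficacy figures like these are one minus the relative risk of infection in the vaccine arm versus the control arm. A small helper makes the definition explicit. The case counts in the example are hypothetical, since the excerpt does not give the trial’s actual case split, and `vaccine_efficacy` is a name chosen here:

```python
def vaccine_efficacy(cases_vaccine, n_vaccine, cases_control, n_control):
    """Efficacy = 1 - relative risk, where risk = cases / participants in each arm."""
    risk_vaccine = cases_vaccine / n_vaccine
    risk_control = cases_control / n_control
    return 1 - risk_vaccine / risk_control

# Hypothetical counts for illustration only (not from the press release):
# 3 cases among 1000 vaccinated vs 30 cases among 1000 controls.
print(round(vaccine_efficacy(3, 1000, 30, 1000), 2))  # 0.9
```

Because the formula uses per-arm attack rates, unequal arm sizes are handled automatically; reverse engineering a reported efficacy therefore requires knowing both the case counts and the number of participants in each arm.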

Visualizing geospatial data in R—Part 1: Finding, loading, and cleaning data

This is part 1 of a 4-part series on how to build maps using R: (1) how to load geospatial data into your workspace and prepare it for visualization; (2) how to make static maps using ggplot2; (3) how to make interactive maps (pan, zoom, click) using leaflet; (4) how to add interactive maps to a Shiny …

Truth-Seeking in the Post-Truth Era: Tutorial at EMNLP 2020

AI for Assistance in Fact-Checking. Now that we have a basic understanding of the problem of fake news, how can we tackle it? Of course, there is always the path of manual fact-checking, such as when news organizations fact-check presidential candidates during a debate. At the opposite extreme is the path of automatic fact-checking, …

Do You Need a Master’s Degree in Data Science?

There appears to be a wide variety of answers to whether you need an advanced degree to get a data science job. Here, I review and explore the educational levels discussed by Robert Half, Burtch Works, and Indeed. As you begin your data science career, Robert Half and Indeed agree that you should have …

Evaluating linear relationships

How to use scatterplots, correlation coefficients, and linear regression effectively. One of the most common analyses conducted by data scientists is the evaluation of linear relationships between numeric variables. These relationships can be visualized using scatterplots, and this step should be taken regardless of any further analyses that are …
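Anscombe’s quartet is the classic argument for plotting first: its first two datasets have almost identical correlation coefficients and fitted lines (r ≈ 0.816, y ≈ 3.00 + 0.500x), yet the first is roughly linear and the second is clearly curved. A pure-Python sketch computing both summaries (`pearson_r` and `ols_line` are illustrative names, not from the post):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 points)")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def ols_line(xs, ys):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - slope * mx, slope

# Anscombe's quartet, datasets I and II (they share the same x values).
anscombe_x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
anscombe_y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
anscombe_y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]

for ys in (anscombe_y1, anscombe_y2):
    intercept, slope = ols_line(anscombe_x, ys)
    print(round(pearson_r(anscombe_x, ys), 3), round(intercept, 2), round(slope, 3))
```

The numbers alone cannot distinguish the two datasets; only a scatterplot reveals that a straight line is the wrong model for the second one.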

How to Collect Live Feed and Frequently Updated Data Using Cron

Cron allows you to schedule recurring tasks, making it a great tool for running data collection scripts. A major concern when collecting time series data is ensuring that all data is collected at equal time intervals. Without equal time intervals, you will be unable to use most methods for …

How to Code Ridge Regression from Scratch

Ridge Regression, like its sibling, Lasso Regression, is a way to “regularize” a linear model. In this context, regularization can be taken as a synonym for preferring a simpler model by penalizing larger coefficients. We can achieve this concretely by adding a measure of the size of our coefficients to our cost function, so that …
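With the squared L2 norm as the size measure, one from-scratch route is the closed-form solution w = (XᵀX + αI)⁻¹Xᵀy on centered data, which leaves the intercept unpenalized. This is a NumPy sketch under that assumption (the post may instead use gradient descent); `ridge_fit` and `alpha` are names chosen here, and the data are synthetic:

```python
import numpy as np

def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge regression.

    Minimizes ||y - X w - b||^2 + alpha * ||w||^2. The intercept b is not
    penalized: X and y are centered, w is solved for, then b is recovered.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    n_features = X.shape[1]
    # Solve (Xc^T Xc + alpha I) w = Xc^T yc rather than forming an explicit inverse.
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(n_features), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b

# Synthetic data: y = 1 + 2*x1 - 1*x2 + 0.5*x3 + small noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 1.0 + rng.normal(scale=0.1, size=50)

for alpha in (0.0, 1.0, 10.0, 100.0):
    w, b = ridge_fit(X, y, alpha)
    print(alpha, np.round(w, 3), round(float(b), 3), round(float(np.linalg.norm(w)), 3))
```

Setting alpha = 0 recovers ordinary least squares; as alpha grows, the coefficient norm shrinks toward zero, which is the “simpler model” preference described above.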

PokéGSQL: Create your First GSQL Algorithms with a Pokémon Dataset!

How to Write Basic GSQL Commands (Note: This is Part 3 of a series. Check out the previous blog post to load the data into your database!) Now that you have your data loaded, the next big step will be catching them all and extracting relevant information from your graph using queries. This blog will be …

Pokémon Lab Part II: Adding More Data

Adding to the Schema and Adding More Data (Note: This is Part II of a series. Please check out Part I here: https://towardsdatascience.com/using-api-data-with-tigergraph-ef98cc9293d3) In the last blog, we learned how to upsert API data into a graph for a basic schema. Now, we’re going to go deeper, making a more complex schema and adding …

Achieving 100 percent renewable energy with 24/7 monitoring in Microsoft Sweden

Earlier this year, we made a commitment to shift to 100 percent renewable energy supply in our buildings and datacenters by 2025. On this journey, we recognize that how we track our progress is just as important as how we get there. Today, we are announcing that Microsoft will be the first hyperscale cloud provider …

Creating Interactive Data Tables in Plotly Dash

How to add click actions and live updates to the Dash DataTable. Plotly Dash is an incredibly powerful framework that allows you to create fully-functional data visualization dashboards. Using Dash, you can create a full front-end experience using only Python. The library does a great job of abstracting away …

xkcd Comics as a Minimal Example for Calling APIs, Downloading Files and Displaying PNG Images with R

The xkcd webcomic is one of the institutions of the internet, especially for the nerd community. If you want to learn how to fetch JSON data from a REST API, download a file from the internet, and display a PNG file in an ultra-simple example, read on! Many services on the internet provide a web service …

To peek or not to peek after 32 cases? Exploring that question in Biontech/Pfizer’s vaccine trial

This is my 4th post about Biontech/Pfizer’s Covid-19 vaccine trial. If you need background, please first look at my 2nd and 3rd posts. Biontech/Pfizer originally planned to analyze the vaccine efficacy at 5 stages: after 32, 62, 92, 120, and finally 164 Covid-19 cases have been observed among the 43538 study participants that …
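Assuming 1:1 randomization, an interim look at 32 cases comes down to how those cases split between the two arms. With v cases in the vaccine arm and p in the placebo arm, efficacy is 1 − v/p, and the vaccine-arm share θ = v/(v + p) maps to efficacy via VE = 1 − θ/(1 − θ). A pure-Python sketch with an allocation ratio r = n_vaccine/n_placebo (function names are hypothetical; the 30% threshold in the example is illustrative only):

```python
def efficacy_from_split(cases_vaccine, cases_placebo, allocation_ratio=1.0):
    """VE = 1 - (attack rate in vaccine arm) / (attack rate in placebo arm).

    With r = n_vaccine / n_placebo this simplifies to 1 - v / (p * r).
    """
    return 1 - cases_vaccine / (cases_placebo * allocation_ratio)

def theta_threshold(ve, allocation_ratio=1.0):
    """Vaccine-arm share of all cases, theta, that corresponds to efficacy ve.

    From VE = 1 - theta / ((1 - theta) * r): theta = k / (1 + k), k = r * (1 - VE).
    """
    k = allocation_ratio * (1 - ve)
    return k / (1 + k)

# Efficacy implied by each possible split of 32 cases (first few rows).
for v in range(0, 8):
    print(v, 32 - v, round(efficacy_from_split(v, 32 - v), 3))

# Largest vaccine-arm share of cases still consistent with 30% efficacy.
print(round(theta_threshold(0.3), 4))  # 0.4118
```

Working in θ is convenient because, conditional on the total number of cases, the number of vaccine-arm cases is binomial with parameter θ, which is what a Bayesian interim analysis of this design typically models.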

A/B testing my resume

[This article was first published on R – David’s blog, and kindly contributed to R-bloggers]. Internet wisdom is divided on whether one-page resumes are more …
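If a resume experiment compares response rates between two versions, one standard frequentist check is the two-proportion z-test. A pure-Python sketch: the post’s actual analysis is not shown in the excerpt, the counts below are hypothetical, and `two_proportion_z_test` is a name chosen here:

```python
import math

def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for a difference between two proportions (pooled SE)."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical: 12 callbacks from 100 one-page resumes vs 6 from 100 two-page resumes.
z, p = two_proportion_z_test(12, 100, 6, 100)
print(round(z, 3), round(p, 3))
```

With samples this small, even a doubling of the response rate is not clearly significant, which is why such experiments usually need many applications per variant.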

Create “Interactive Globe + Earthquake Plot” in Python

Here we create an interactive globe like Google Earth using topography data and an amazing visualization tool, Plotly. We also plot the global earthquake distribution on this interactive globe. By creating this interactive plot, you can gain the following: – deeper insights and a more realistic application example to …

Linear regression made easy. How does it work and how to use it in Python?

All you need to know about building a Machine Learning model using the linear regression algorithm. [Figure: Multiple linear regression model.] Machine Learning is making huge leaps forward, with an increasing number of algorithms available to solve complex real-world problems. This story is part of a deep dive series explaining the …

The Top 5 Data Science Questions I Get Asked

Contents: Introduction; Why did you choose Data Science?; What is your everyday work like?; Does Data Science get easier?; Where do you see Data Science in 5 years?; Will you ever leave Data Science?; Summary; References. Data Science has become an incredibly popular field over the past few years. When I started applying to Master’s programs …

How to approach AutoML as a data scientist

It doesn’t replace your job; it only makes it a little easier. In the past five years, one trend that has made AI more accessible and acted as the driving force behind several companies is automated machine learning (AutoML). Many companies such as H2O.ai, DataRobot, Google, and SparkCognition have …

Deploying an R Shiny app on Heroku free tier

Continuous deployment made easy with GitHub Actions and Heroku. This article is a short guide on deploying a Shiny app on Heroku. Familiarity with Docker, Shiny, and GitHub is presumed. For an introduction to Docker, see Deploying a Shiny Flexdashboard with Docker. This article was also published on https://www.r-bloggers.com/. This article will show you what Heroku is and …

How To Create Differentially Private Synthetic Data

A practical guide to creating differentially private, synthetic data with Python and TensorFlow. In this post, we’ll train a synthetic data model on the popular Netflix Prize dataset, using a mathematical technique called differential privacy to protect the identities of anonymized users in the dataset from being discovered via known privacy attacks such as re-identification …
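The post trains a differentially private model with TensorFlow. A much simpler mechanism, shown here only to illustrate the ε-differential-privacy idea and explicitly not the post’s method, is the Laplace mechanism for a counting query. The function names and the example data are hypothetical:

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two Exponential(1/scale) draws."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def private_count(records, predicate, epsilon, rng=None):
    """epsilon-DP count via the Laplace mechanism.

    Adding or removing one person changes a count by at most 1 (sensitivity 1),
    so Laplace noise with scale 1 / epsilon gives epsilon-differential privacy.
    """
    rng = rng or random.Random()
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1 / epsilon, rng)

# Hypothetical data: 100 user ratings; count the users who gave 5 stars.
ratings = [(i % 5) + 1 for i in range(100)]
rng = random.Random(0)
print(private_count(ratings, lambda r: r == 5, epsilon=0.5, rng=rng))
```

Smaller ε means larger noise and stronger privacy. Any single released answer is perturbed, but the noise averages out across many answers, which is why repeated queries consume privacy budget.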

Closing the gap: Migration completeness when using Database Migration Service

Database Migration Service (DMS) provides high-fidelity, minimal downtime migrations for MySQL (Preview) and PostgreSQL (available in Preview by request) workloads to Cloud SQL. Since DMS is serverless, you don’t have to worry about provisioning, managing, or monitoring any migration-specific resources. In this post, we’ll focus on what is and is not included in database migration …

Data Mesh 2.0

The era of the monolithic data warehouse/data lake is coming to an end. Long live the decentralized data mesh! Oh, do not despair! All those person-years spent cleaning, transferring, and loading data into your centralized systems haven’t been in vain. With data mesh, you don’t have to start …

Forecasting Time Series ARIMA Models (10 Must-Know Tidyverse Functions #5)

[This article was first published on business-science.io, and kindly contributed to R-bloggers]. This article is part of R-Tips Weekly, a weekly video tutorial that …

Model Compression via Pruning

Pruning Neural Networks. To obtain fast and accurate inference on edge devices, a model has to be optimized for real-time inference. Fine-tuned state-of-the-art models like VGG16/19 and ResNet50 have 138+ million and 23+ million parameters, respectively, and inference with them is often expensive on resource-constrained devices. Previously I’ve talked about one model compression technique called “Knowledge Distillation” using …
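The simplest pruning variant is unstructured magnitude pruning: zero out the weights with the smallest absolute values and keep a mask so they stay zero during any later fine-tuning. A NumPy sketch on a single weight matrix (`magnitude_prune` is a hypothetical name; the post’s pruning schedule and framework are not shown in the excerpt):

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of entries.

    Returns (pruned_weights, mask), where mask is True for kept weights.
    Entries tied with the threshold magnitude are also removed.
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)  # number of weights to remove
    if k == 0:
        mask = np.ones(weights.shape, dtype=bool)
    else:
        # k-th smallest magnitude; everything at or below it is pruned.
        threshold = np.partition(flat, k - 1)[k - 1]
        mask = np.abs(weights) > threshold
    return weights * mask, mask

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64))
pruned, mask = magnitude_prune(W, 0.5)
print(mask.mean(), np.count_nonzero(pruned))
```

Unstructured sparsity like this shrinks the model once stored in a sparse format, but it speeds up inference only with sparse-aware kernels; structured pruning (removing whole filters or channels) trades some accuracy for speedups on ordinary hardware.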