Binary Neural Networks — Future of low-cost neural networks?

Can binary neural networks replace full-precision networks? (Photo by Alexander Sinn on Unsplash) Every year, deeper models are developed for tasks such as object detection and image segmentation, and they consistently beat the previous state-of-the-art models. But there has been increasing focus on making models lighter and more efficient such that they …
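Binary networks typically constrain weights and activations to +1 or -1. This lets a dot product be computed with a bitwise XNOR and a population count instead of multiplications. The minimal sketch below is plain Python. The function names and the bit-packing scheme are illustrative assumptions, not taken from the article. It checks that the XNOR-popcount result matches the ordinary dot product.

```python
def binarize(values):
    """Map real values to +1/-1 with the sign function (0 maps to +1)."""
    return [1 if v >= 0 else -1 for v in values]


def pack_bits(signs):
    """Pack a +1/-1 vector into an int: bit i is set when signs[i] == +1."""
    word = 0
    for i, s in enumerate(signs):
        if s == 1:
            word |= 1 << i
    return word


def xnor_popcount_dot(a_bits, b_bits, n):
    """Dot product of two packed +1/-1 vectors of length n."""
    mask = (1 << n) - 1
    # XNOR restricted to the lowest n bits, then count the matching positions
    matches = bin(~(a_bits ^ b_bits) & mask).count("1")
    # each match contributes +1 and each mismatch contributes -1
    return 2 * matches - n


activations = binarize([0.3, -1.2, 0.8, -0.1])
weights = binarize([0.5, 0.4, -0.9, -2.0])
direct = sum(a * w for a, w in zip(activations, weights))
fast = xnor_popcount_dot(pack_bits(activations), pack_bits(weights), len(activations))
```

On hardware, the packed words would be machine integers, so one XNOR plus one popcount replaces many multiply-accumulate operations. That speedup is the usual motivation for binarization.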

Top 5 Data Science Job Roles for 2021

The common first thought about machine learning engineers is that they need a strong background in both data science and software engineering. Still, those two backgrounds alone are not enough to make someone an exact fit for the role. It is one thing to know the theory behind machine learning models and to be capable of writing the cleanest …

Warning systems for a data warehouse

For the past couple of years, a group of us at Shopify, myself included, have been looking for a smart way to set warnings on specific tables. Why warnings? The reason is simple: we spend a lot of time building high-quality front room datasets, as I explained in a previous blog post. That being said, those …

Filtering Tweets by Location

Account Location metadata + Regular Expressions == Tweet Location Filtering. (Photo by Nathan Dumlao on Unsplash) In my latest project, I explored the question, “What is the public sentiment in the United States on K-12 learning during the COVID-19 pandemic?”. Using data collected from Twitter, Natural Language Processing, and Supervised Machine Learning, I created a …
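The subtitle's recipe pairs account location metadata with regular expressions. Here is a minimal sketch of the idea. The patterns, the handful of place names and state codes, and the `user_location` field name are hypothetical illustrations, not the author's actual rules. A real filter would need to cover every state and many more spellings.

```python
import re

# Hypothetical, deliberately incomplete patterns for a free-text profile location.
# Full names are matched case-insensitively; two-letter state codes are matched
# case-sensitively after a comma (e.g. "Austin, TX") to reduce false positives.
US_NAMES = re.compile(r"\b(usa|united states|california|new york|texas)\b", re.IGNORECASE)
US_STATE_CODES = re.compile(r",\s*(CA|NY|TX)\b")


def is_us_location(location):
    """Return True if the account location text looks like a US location."""
    if not location:
        return False
    return bool(US_NAMES.search(location) or US_STATE_CODES.search(location))


tweets = [
    {"text": "Remote learning again today", "user_location": "Austin, TX"},
    {"text": "School reopening news", "user_location": "London, UK"},
    {"text": "Hybrid classes start Monday", "user_location": "new york city"},
    {"text": "No location here", "user_location": None},
]
us_tweets = [t for t in tweets if is_us_location(t["user_location"])]
```

Abbreviations are inherently ambiguous. For example, "Toronto, CA" would still match as California. Any regex-based filter like this one trades some precision for simplicity.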

Find and remove duplicate images in your dataset

Improve your deep learning image datasets by automatically detecting and removing duplicate and near-duplicate images. (Figure: duplicate images in CIFAR-100 visualized in FiftyOne. Image by author.) State-of-the-art deep learning models are often trained on datasets with millions of images. Collecting a dataset of that size yourself is a difficult enough task in and of itself. …
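One common way to find near-duplicates, not necessarily the article's exact method, is to compare image embeddings with cosine similarity and flag pairs above a threshold. The sketch below makes a few assumptions:

- Embeddings have already been computed by some model; the toy 2-D vectors stand in for real ones.
- The 0.95 threshold is arbitrary.
- The function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def find_near_duplicates(embeddings, threshold=0.95):
    """Return (i, j) index pairs, i < j, whose similarity is at least `threshold`."""
    pairs = []
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            if cosine_similarity(embeddings[i], embeddings[j]) >= threshold:
                pairs.append((i, j))
    return pairs


def indices_to_remove(pairs):
    """Drop the higher-indexed image of every flagged pair."""
    return sorted({j for _, j in pairs})


embeddings = [[1.0, 0.0], [0.99, 0.05], [0.0, 1.0]]
pairs = find_near_duplicates(embeddings)
to_remove = indices_to_remove(pairs)
```

The pairwise loop is O(n²), which is fine for a sketch. For millions of images, you would typically use a nearest-neighbor index instead.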

Topic Modeling using LDA

Quickly get the gist of over 30,000 Tweets! (Photo by Jan Antonin Kolar on Unsplash) In my latest project, I explored the question, “What is the public sentiment in the United States on K-12 learning during the COVID-19 pandemic?”. Using data collected from Twitter, Natural Language Processing, and Supervised Machine Learning, I created a text …

Amazon GuardDuty introduces machine learning domain reputation model to expand threat detection and improve accuracy

Amazon GuardDuty introduces a new machine learning domain reputation model that can categorize previously unseen domains as highly likely to be malicious or benign based on their behavioral characteristics. GuardDuty uses this new capability to alert customers when an EC2 instance in their AWS environment is communicating with a domain identified as malicious and to …

Amazon SNS now supports 1-minute CloudWatch metrics

Amazon SNS is a fully managed publish/subscribe (pub/sub) messaging service for both application-to-application (A2A) and application-to-person (A2P) communications. Amazon CloudWatch is a monitoring and observability service that provides you with data and actionable insights to monitor your applications, respond to system-wide performance changes, optimize resource utilization, and get a unified view of operational health. …

BRIDGERTON: An analysis of Netflix’s most-streamed TV series

Analysis of over 300,000 tweets on the Bridgerton TV series using NLP techniques in Python & Tableau. (Photo by Charles Deluvio on Unsplash) After its release on Christmas Day, the drama series Bridgerton dominated TV screens across the world. Virtually every movie lover was talking about it, so much so that I was “coerced” into watching the series …

Four Things You Should Know in Python

Step (or Stride) Value

So far we’ve only specified start and/or stop values: the slice begins at the start value and ends right before the stop value (since the stop is exclusive). But what if we don’t want all the elements between those two points? What if we want every other element? That’s where the step …
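Python's slice syntax is `sequence[start:stop:step]`, where the stop index is exclusive. Here is a small illustration on a throwaway list:

```python
numbers = list(range(10))  # [0, 1, ..., 9]

# start=1, stop=8 (exclusive), step=2: every other element starting at index 1
every_other = numbers[1:8:2]      # [1, 3, 5, 7]

# omit start and stop: every other element of the whole list
evens = numbers[::2]              # [0, 2, 4, 6, 8]

# a negative step walks backwards through the list
reversed_numbers = numbers[::-1]  # [9, 8, ..., 0]
```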

Sorry, Data Lakes Are Not “Legacy”

(Image by Thomas Spicer) Why “data lakes are dead” talk contributes little to modern data architecture conversations. Last year, in a post about data lakes, we covered various “FUD” around lake architecture, strategy, and analytics. Fast forward a year, and it seems that data lakes are now definitively considered a “legacy” data architecture. During the Modern Data …

Amazon Elastic File System triples read throughput

Amazon Elastic File System (Amazon EFS) now allows you to drive up to 3x higher read throughput on your file system. For example, bursting mode file systems now provide 300 MB/s of bursting read throughput, or 300 MB/s per TiB of data stored in Amazon EFS Standard, whichever is higher. If you have configured 1 GB/s of …

Pandoc filters in Bookdown

[This article was first published on R, and kindly contributed to R-bloggers]. tldr: I wrote a Pandoc Lua filter to correctly format my Bookdown/Rmarkdown documents …

Linear Regression Explained (in R)

An explanation of residuals, the sum of squared residuals, simple linear regression, and multiple linear regression, with code in R. Linear regression is one of the first concepts we learn in data science and machine learning. Yet many people are confused by linear regression and its common terminology. In this article, we explore linear …
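The article's code is in R. As a language-neutral illustration, here is a minimal Python sketch of simple linear regression fitted by ordinary least squares. It computes the residuals (observed minus predicted values) and their sum of squares (SSR). The function names and the toy data are hypothetical.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx              # slope
    b0 = mean_y - b1 * mean_x   # intercept: the line passes through the means
    return b0, b1


def residuals(xs, ys, b0, b1):
    """Observed minus predicted value for each point."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


def sum_of_squared_residuals(res):
    return sum(r * r for r in res)


xs, ys = [1, 2, 3, 4], [2, 4, 5, 8]
b0, b1 = fit_simple_linear_regression(xs, ys)
res = residuals(xs, ys, b0, b1)
ssr = sum_of_squared_residuals(res)
```

OLS chooses `b0` and `b1` to minimize the SSR. With an intercept in the model, the residuals sum to zero, up to floating-point error.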

Why Babies Can Not Manage Crocodiles

A Cautionary Tale Told With The Wolfram Language. (Photo by vaun0815 on Unsplash) Few parents would be comfortable with putting their baby in a room with a crocodile. They understand that babies and crocodiles have very different opinions of one another. A baby is trusting and seeks to explore and make new friends. A crocodile …

An Introduction to Expected Goals

Using event data to visualize trends and probabilities associated with shots. (Image by Author) Here, I will introduce the concept of expected goals (xG) and explore event data. This is the first part of a three-part series on expected goals. Part II will center on constructing a machine learning model …

Using Deep Learning to Forecast a Wind Turbine’s Power Output

(7679, 336, 6) (7679,) (408, 336, 6) (408,)

Walk-forward validation logic

To make the most accurate forecast at time t, a model needs the latest time step, t-1. A walk-forward validation method follows this approach. The steps behind this method are as follows: first, build an LSTM model with the training data. …
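The LSTM itself is out of scope here. The sketch below shows only the walk-forward loop, using a hypothetical stand-in model (the mean of the last few observations) in place of the article's LSTM. At each step the model forecasts time t from data up to t-1. The true value at t is then appended to the history before the next forecast.

```python
def moving_average_forecast(history, window=3):
    """Hypothetical stand-in model: forecast the mean of the last `window` values."""
    recent = history[-window:]
    return sum(recent) / len(recent)


def walk_forward_validation(train, test, forecast=moving_average_forecast):
    """Forecast each test point from all observations before it.

    Returns the list of predictions and the mean absolute error.
    """
    history = list(train)
    predictions = []
    for actual in test:
        predictions.append(forecast(history))  # uses data up to t-1 only
        history.append(actual)                 # reveal the true value at t
    mae = sum(abs(p - a) for p, a in zip(predictions, test)) / len(test)
    return predictions, mae


predictions, mae = walk_forward_validation([1, 2, 3, 4, 5, 6], [7, 8, 9])
```

A full setup along the article's lines would swap in the trained LSTM as `forecast`. It might also periodically refit the model as the history grows, which this sketch omits.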

A new way to think about modeling for uncertain times

Modeling for uncertain times: Approaches, behaviors, and outcomes. (Photo by Viktor Forgacs on Unsplash) The spread of the COVID-19 pandemic, first in China and South Korea and then in Europe and the United States, was swift and caught most governments, companies, and citizens off guard. This global health crisis developed into an economic …

A Guide to Python Environment, Dependency and Package Management: Conda + Poetry

PYTHON: How to add packages to your environment files automatically without ever worrying about the dependencies. (All images used are by the author except where indicated otherwise.) If you work on multiple Python projects at different stages of development, you probably have several environments on your system. There are various tools for creating an isolated environment …

Unpivot a Pandas DataFrame | Dean McGrath

How to massage a Pandas DataFrame into the shape you need. (Photo by Todd Quackenbush on Unsplash) In the last article, we shared an embarrassing moment that encouraged us to learn to use Pandas to pivot a DataFrame. Today we are going to look at Pandas’ built-in .melt() function to reverse our pivoted data. The .melt() function …
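Here is a small illustration of `.melt()` reversing a pivot. The "wide" table is hypothetical, with one column per quarter. `id_vars` names the identifier column to keep, and `var_name`/`value_name` name the new columns that hold the former column headers and their values.

```python
import pandas as pd

# A hypothetical pivoted ("wide") table: one column per quarter
wide = pd.DataFrame({
    "region": ["North", "South"],
    "Q1": [100, 80],
    "Q2": [120, 90],
})

# Unpivot: keep "region" as the identifier and turn the Q1/Q2 columns into rows
long = wide.melt(id_vars="region", var_name="quarter", value_name="sales")
# Rows come out column by column: both Q1 rows first, then both Q2 rows
```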

The discovery of wine’s structural form

Hello there. Today I will present a guided tutorial for applying Kemp & Tenenbaum’s brilliant “form discovery” algorithm to a wine dataset. Ultimately, this provides a data-driven map for choosing wines based on our tastes. If, like me, you are fond of data science, machine learning, or cognition, or are a wine lover, then you might …

How do I extract Nested Data in Python?

Tutorial: Demystifying Python JSON, Dictionaries and Lists. (Figure: JSON dictionary and list data structures (types). Image by Author.) “Life is like an onion; you peel it off one layer at a time, and sometimes you weep.” ― Carl Sandburg. I suppose the same could be said of extracting values from nested JSON structures. Even the most …
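Here is a minimal sketch of peeling nested JSON one layer at a time with Python's standard `json` module. `json.loads` turns JSON objects into dicts and arrays into lists. The payload and the `get_path` helper are hypothetical, written for illustration only.

```python
import json

raw = """
{
  "user": {
    "name": "Ada",
    "orders": [
      {"id": 1, "items": [{"sku": "A1", "qty": 2}]},
      {"id": 2, "items": [{"sku": "B7", "qty": 1}, {"sku": "C3", "qty": 4}]}
    ]
  }
}
"""

data = json.loads(raw)  # objects -> dict, arrays -> list


def get_path(obj, *keys, default=None):
    """Follow dict keys / list indices layer by layer; return `default` if a step is missing."""
    for key in keys:
        try:
            obj = obj[key]
        except (KeyError, IndexError, TypeError):
            return default
    return obj


name = get_path(data, "user", "name")
first_sku = get_path(data, "user", "orders", 0, "items", 0, "sku")
missing = get_path(data, "user", "address", "city", default="n/a")

# Peel every layer at once: collect all SKUs across all orders
all_skus = [item["sku"] for order in data["user"]["orders"] for item in order["items"]]
```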

Upcoming Workshops Series March 2021

[This article was first published on Mirai Solutions, and kindly contributed to R-bloggers]. Finally, new dates! Take a look at our upcoming workshop offerings on …

AWS Glue DataBrew is now available in six additional AWS Regions

AWS Glue DataBrew, a visual data preparation tool that makes it easy for data analysts and data scientists to clean and normalize data for analytics and machine learning, is now available in the following six additional AWS Regions: US West (N. California), Asia Pacific (Singapore), Asia Pacific (Mumbai), EU (Stockholm), EU (London), and EU (Paris). AWS …

An Open Letter to the Data Science Community

Before you start reading this article, I want you to drop your guard. We are all aware of the power of artificial intelligence. I have studied and worked in this field for the past 15 years. So I obviously do not want to devalue AI in this letter; however, I simply want to advocate …

Importing CSV Files in Neo4j

A comparison of two different methods, designed for either simplicity or speed. (Figure by Martin Grandjean, CC BY-SA 3.0 <https://creativecommons.org/licenses/by-sa/3.0>, via Wikimedia Commons. I have not altered this image.) Graph-enabled data science and machine learning has become a very hot topic in a variety of fields lately, ranging from fraud detection to knowledge …

Sentence2MCQ using BERT Word Sense Disambiguation and T5 Transformer

A practical AI project using HuggingFace transformers and a Gradio app. (Image by Author; icon from Flaticon) Imagine a middle school English teacher preparing a reading comprehension quiz for the next day’s class. Instead of giving an outdated assessment, the teacher can quickly generate some assessments (MCQs) based on the trending news articles from that day. …

Best practices in designing conversational AI integrated with business software

(Image by Author) In this article, I will describe how to design conversational AI that integrates with back-end software. Real example: create a bot that lets employees log time activities. With a time activity, you can specify the date and the time spent on a task, assign it to a project or customer, and in addition to …

Risk, Relative Risk and Odds

Statistics: Understanding relative risk and odds ratio. According to Wikipedia, risk is the possibility of something bad happening. Therefore, when using the term risk, the number of negative cases should be considered. In other words, the risk depends on how many negative cases there are relative to all cases. Okay, …
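Here is a worked example of the standard definitions, using hypothetical counts for an exposed and an unexposed group of 100 people each:

- Risk is the fraction of a group that had the (bad) outcome.
- Odds compare cases with the outcome to cases without it.
- Relative risk and odds ratio are the ratios of these quantities between the two groups.

```python
def risk(events, total):
    """Risk: share of all cases in a group that had the (bad) outcome."""
    return events / total


def odds(events, total):
    """Odds: cases with the outcome divided by cases without it."""
    return events / (total - events)


# Hypothetical counts: (cases with the outcome, group size)
exposed_events, exposed_total = 20, 100
unexposed_events, unexposed_total = 10, 100

risk_exposed = risk(exposed_events, exposed_total)          # 0.2
risk_unexposed = risk(unexposed_events, unexposed_total)    # 0.1
relative_risk = risk_exposed / risk_unexposed               # 2.0

odds_exposed = odds(exposed_events, exposed_total)          # 20 / 80 = 0.25
odds_unexposed = odds(unexposed_events, unexposed_total)    # 10 / 90, about 0.111
odds_ratio = odds_exposed / odds_unexposed                  # about 2.25
```

When the outcome is rare, the odds ratio approximates the relative risk. Here the outcome is fairly common, so the two diverge (2.25 versus 2.0).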

How to embed interactive graphs on Medium

Learn how to use Datawrapper to embed interactive graphs in Medium posts. (Photo by Carlos Muza on Unsplash) As data scientists, we love a good graph. There are countless programming packages out there dedicated to data visualisation. In Python alone we have Bokeh, seaborn, Plotly, Geoplotlib, and many other libraries. Graphs are a great …

media bias & shared news on Twitter

[This article was first published on Jason Timm, and kindly contributed to R-bloggers]. Introduction: This post provides a brief description of methods for quantifying political …

Behind the magick: updates to imagemagick and beyond

It has been a while since we posted an update about magick, but behind the scenes we are constantly tweaking and improving this package, which has become a very mature and complete toolkit for image processing in R. Over the past year, we made 6 CRAN releases containing many small features and fixes, but perhaps …

Dec 2020: “Top 40” New CRAN Packages

[This article was first published on R Views, and kindly contributed to R-bloggers]. One hundred twenty-three new packages made it to CRAN in December. Here …

Understanding Mixed Precision Training

Mixed precision for training neural networks can reduce training time and memory requirements without affecting model performance. (Photo by patricia serna on Unsplash) As deep learning methodologies have developed, it has been generally agreed that increasing the size of a neural network improves performance. However, this comes at the expense of memory and compute requirements, …
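One reason mixed precision needs care is that float16 has a much narrower range than float32. Very small gradient values can therefore underflow to zero. A widely used remedy is loss scaling, mentioned here as general background rather than as this article's exact recipe. It multiplies the loss, and hence the gradients, by a constant before the float16 computation, then divides the constant back out in higher precision. The sketch uses the half-precision format code `'e'` of Python's `struct` module to round values to float16. The scale factor of 1024 is an arbitrary, hypothetical choice.

```python
import struct


def to_float16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


LOSS_SCALE = 1024.0  # hypothetical fixed scale; frameworks often adjust it dynamically

gradient = 1e-8  # tiny gradient, below float16's smallest subnormal (about 6e-8)

# Without scaling, the gradient underflows to zero in float16
unscaled_fp16 = to_float16(gradient)

# With loss scaling: scale up, store in float16, then unscale in higher precision
scaled_fp16 = to_float16(gradient * LOSS_SCALE)
recovered = scaled_fp16 / LOSS_SCALE
```

The recovered value is not exact, because the scaled gradient still lands in float16's low-precision subnormal range. It is close, though, whereas without scaling the update would be lost entirely.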

Bertelsmann Arvato Financial Solutions Customer Segmentation

(Blue vector created by vectorjuice, https://www.freepik.com/vectors/blue) Nowadays, with big data becoming a reality, people focus on how to use data to realize commercial value. One of the more mature areas is profiling potential customers or predicting customer behavior in order to target the market or customer …

Everything You Need to Know To Automate OSX

This next section will be different for everyone, since not all of us use the same applications.

# Required App
brew tap homebrew/cask
brew install mas
# List your preferred applications
brew install --cask intellij-idea
brew install --cask google-chrome
brew install --cask slack
brew install --cask spotify
brew install --cask spectacle
brew install --cask karabiner-elements
brew install jq
brew install awscli
brew install terraform
brew install packer
brew install docker-compose
brew install …

treeshap — explain tree-based models with SHAP values

An introduction to the package. This post is co-authored by Szymon Maksymiuk. For several months we have been working on treeshap, an R package that provides a fast method to compute SHAP values for tree ensemble models. The package is not yet fully developed, but it can already compute explanations for a range of …

Latest picks: Revisiting DCT Domain Deep Learning

We want to hear from you! Every week, we’re going to present a question from a TDS author, and we’d be thrilled to read your answers! Let’s chat, share information and learn from each other. You can find the first discussion here: Data Scientists, What is Your Current Tech Stack? by Terence Shin

When Google Analytics and Data Studio aren’t enough and it’s time to switch to Google BigQuery

(Source: Unsplash) Let’s figure out when it’s time to move away from standard Google Analytics and Google Data Studio solutions and consider a data warehouse instead. Business is increasingly moving online, and 2020 has shown that companies in many industries simply can’t survive without an online presence. Naturally, the more customers there are online, …

Automate application lifecycle management with GitHub Actions

Throughout 2021, we will release a monthly blog post covering that month’s webinar for the Low-code Application Development (LCAD) on Azure solution. LCAD on Azure is a new solution that demonstrates the robust development capabilities of integrating low-code Microsoft Power Apps with the Azure products you may be familiar with. This …

How To Actually Land a Data Science Job

A 5-step process to maximize your chances of success. (Photo by Christina @ wocintechchat.com on Unsplash) The most common question I receive is: how do I actually break into data science? So many people want to start a career in data science but struggle to take that first step. And I won’t lie, it is …