Essential Math for Data Science: Matrices as Linear Transformations

Understand and visualize matrices as space transformations. As you saw in Essential Math for Data Science and Essential Math for Data Science, being able to manipulate vectors and matrices is critical for creating machine learning and deep learning pipelines, for instance for reshaping your raw data before using it with machine learning libraries. The goal … Read more
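
To make the idea of a matrix as a transformation of space concrete, here is a minimal sketch in plain Python. It is not taken from the article: the helper name apply and the two example matrices are assumptions chosen for illustration. It multiplies a 2×2 matrix by a vector and shows where the standard basis vectors land, which is what the columns of the matrix record.

```python
def apply(matrix, vector):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return tuple(sum(a * x for a, x in zip(row, vector)) for row in matrix)


# Hypothetical example matrices (illustrative, not from the article).
ROTATE_90 = [[0, -1],
             [1, 0]]   # counter-clockwise quarter turn
SCALE_X2 = [[2, 0],
            [0, 1]]    # stretch the x axis by a factor of 2

e1, e2 = (1, 0), (0, 1)  # standard basis vectors

# Each basis vector lands on the corresponding column of the matrix.
rotated_e1 = apply(ROTATE_90, e1)
rotated_e2 = apply(ROTATE_90, e2)
stretched = apply(SCALE_X2, (3, 4))

print(rotated_e1, rotated_e2, stretched)
```

Reading a matrix column by column in this way is one common route to visualizing it: the first column is where (1, 0) goes, the second is where (0, 1) goes, and every other vector follows by linearity.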

Plotting Time Series in R (New Cyberpunk Theme)

[This article was first published on, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here) Want to share your content on R-bloggers? click here if you have a blog, or here if you don’t. This article is part of R-Tips Weekly, a weekly video tutorial that … Read more

Categories R Tags ExcerptFavorite

Amazon WorkDocs offers additional collaboration controls throughout its Android app

The Amazon WorkDocs Android application provides anytime, anywhere access to you and your team’s work documents. More specifically, it allows you to view, comment on, share, and download documents for which you have been given permissions. The Amazon WorkDocs Android application supports uploading content, offline access to files, content preview of more than 50 file types, … Read more

Categories AWS ExcerptFavorite

LSTM Networks

Understanding the intuition and mathematics. In my last blog post we discussed the shortcomings of RNNs: the vanishing gradient problem prevents them from learning longer sequences, which leaves them with only a short-term memory. LSTMs and GRUs are seen as a solution to this short-term memory. Now let’s look at how they function. These have … Read more

Introduction to USgas Package

While the first dataset describes only the US consumption, the second and third describe total and residential consumption by state, respectively. Visualize the demand for natural gas. The us_monthly dataset is a monthly series, representing the total demand (or consumption) of natural gas in the US since 2001: library(USgas) data("us_monthly") head(us_monthly) ## date y ## … Read more

A year in review

This blog post just contains the links I mention in my video that you can watch here. I mention the following books, packages, and people in my video: Many others created and shared amazing content during the year, so sorry I could not mention everyone! Happy new year to all and thank you for the … Read more

5 Practical Tips to be a Good Data Scientist

Photo by Karla Hernandez on Unsplash. For me, this is the most important thing to do as a data scientist. Maybe most of you already know that data scientists work closely with product/business teams to improve the performance metrics related to the product or business. That’s why most of the requests come from … Read more

Profiling in Tensorflow 2.x for efficient custom training loops

Installing Tensorflow has become relatively simple over the years; for example, installation with a package manager like Anaconda is quite simple. But to use Tensorflow’s profiler, we need another package called CUPTI (CUDA Profiling Tools Interface), built by Nvidia and leveraged by Tensorflow. To install this, follow either of the steps (only on Linux systems) … Read more

Chernobyl’s Lessons for Data Scientists

Wrangling through Dataland. Insights on decision-making amid uncertainty from the HBO-Sky TV series. Image by Денис Резник from Pixabay. Chernobyl (2019) is a mesmerizing drama of human incompetence, ingenuity and courage in the face of disaster. The show’s analytical examination of the swirling confusion and haphazard response in the aftermath of an unprecedented catastrophe also … Read more

The Tidyverse in a Table

[This article was first published on Publishable Stuff, and kindly contributed to R-bloggers]. This was my submission to the 2020 RStudio Table Contest. For many … Read more

Genetic Research with Computer Vision: A Case Study in Studying Seed Dormancy

Assisting genetic research with computer vision To improve the detection of genetic signatures in seeds correlated with their dormancy, we have trained computer vision models, which captured more than was previously understood about the mechanisms of dormancy. New to computer vision? Read our guide on getting started with fastai, ResNet, MobileNet, and more. Navigate to a … Read more

What is Model Complexity? Compare Linear Regression to Decision Trees to Random Forests

A machine learning model is a system that learns the relationship between the input (independent) features and the target (dependent) feature of a dataset to be useful in making predictions in the future. To test the effectiveness of the model, a completely new similar dataset is introduced which only contains the input features, and the … Read more

Finding it difficult to learn programming? Here’s why.

You’ve tried and given up way too many times. Maybe you’re just not cut out for it. Photo by Christopher Gower on Unsplash. You have spent countless hours doing YouTube tutorials, taking paid online courses, and reading introductory programming articles. Yet, it feels like there is a barrier you simply can’t break through. There are … Read more

Regular Expressions: Using the “re” Module to Extract Information From Strings

Differences between the findall(), match(), and search() functions in Python’s built-in Regular Expression module. Photo by Abigail Lynn on Unsplash. Regular Expressions, also known as Regex, come in handy in a multitude of text processing scenarios. You can search for patterns of numbers, letters, punctuation, and even whitespace. Regex is fast and helps avoid unnecessary loops … Read more
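
The difference between the three functions named above can be seen on a single string. The snippet below is a small sketch, not code from the article; the sample text and the pattern are made up for illustration. re.match() only tries to match at the very start of the string, re.search() returns the first match found anywhere, and re.findall() returns every non-overlapping match as a list.

```python
import re

# Hypothetical sample text and pattern, chosen only for illustration.
text = "Order 66 shipped on 2020-12-29; order 7 is pending."
pattern = r"\d+"  # one or more digits

# match() anchors at position 0, so it fails here: the text starts with "Order".
at_start = re.match(pattern, text)

# search() scans forward and stops at the first run of digits.
first_hit = re.search(pattern, text)

# findall() collects every non-overlapping run of digits as strings.
all_hits = re.findall(pattern, text)

print(at_start)           # None
print(first_hit.group())  # 66
print(all_hits)           # ['66', '2020', '12', '29', '7']
```

Because match() and search() return either a match object or None, it is usually worth checking the result before calling .group() on it.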

Writing a custom data augmentation layer in Keras

Subclass Layer, and implement call() with TensorFlow functions. Data augmentation can help an image ML model learn to handle variations of the image that are not in the training dataset. For example, it is likely that photographs provided to an ML model (especially if these are photographs by amateur photographers) will vary quite considerably in … Read more

Automating Sunday with Python, SQL, Jupyter Notebooks & Google Cloud Platform

Automating Sports Analytics with Data Science. Table of Contents: Automation, The Pipeline, Authentication, Extraction, Transformation, Presentation, Jupyter Notebook Link. “Automate everything until only the fun stuff is left.” This is a quote that guides me in almost everything I do. I’m not sure if it is an original … Read more

Advent of 2020, Day 29 – Performance tuning for Apache Spark

[This article was first published on R – TomazTsql, and kindly contributed to R-bloggers]. Series of Azure Databricks posts: Yesterday we looked into powershell automation … Read more

A short tutorial on screen command

Photo by Pixabay from Pexels. I came across the screen command earlier this year, when I was searching for a way to run my program persistently, even if my VM gets disconnected. I am not suggesting that you absolutely need it, but I strongly suggest learning it as you may require it at some point … Read more

Reinforcement Learning & Sushi Go!

Use a Reinforcement Learning algorithm to solve the popular card-drafting game Sushi Go! Reinforcement learning is an area of machine learning concerned with how an agent takes actions based on its environment to maximize its long-term reward. Although the concept has been around for a while, its application has not been nearly as successful as … Read more

My Journey into Data Science

Photo by Guido Coppa on Unsplash. In 2014, I graduated from the University of Pittsburgh with a Digital Media and Communications degree. I worked in social media marketing for a travel marketing agency and ended up getting let go a week after moving to Denver (after they assured me I’d be able to work remotely). … Read more

Training-serving skew

MLOps In Action. Training-serving skew is one of the most common problems when deploying ML models. This post explains what it is and how to prevent it. When training a Machine Learning model, we always follow the same series of steps: get data (usually from a database); clean it (e.g. fix/discard corrupted observations); generate features … Read more

R vs Python: Linear Regression

Demonstrating how to do Linear Regression in R and Python, and discussing the differences between the two. Photo by Christopher Gower on Unsplash. There are dozens of articles out there that compare R vs. Python from a subjective, opinion-based perspective. Both Python and R are great options for data analysis, or any work in … Read more

A Collection of Advanced Data Visualization in Matplotlib and Seaborn

Make Your Storytelling More Interesting. Python has a few data visualization libraries. Arguably, matplotlib is the most popular and widely used. I have written several tutorial articles on matplotlib before. This article will focus on some advanced visualization techniques. These plots and charts will provide you with some extra tools to make your reports or … Read more

Building High Performing Data Science Teams

What has changed is that, as a discipline, data science has matured. Many companies have gone from proof-of-concept mode to running multiple productionised machine learning models in a short span of time, with varying degrees of success. A lot of the reason behind this success or failure is not having the most highly qualified data … Read more

Announcing new AWS Wavelength Zones in Denver and Seattle

AWS Wavelength and Verizon 5G Edge bring the power of the world’s leading cloud closer to mobile and connected devices at the edge of the Verizon 5G Ultra Wideband network. Wavelength embeds AWS compute and storage services at the edge of communications service providers’ 5G networks while providing seamless access to cloud services running in … Read more

How to Become Fluent in Multiple Programming Languages

Nearly every article titled “Which Programming Language Should I Learn First?” suggests that Python is the perfect first language for someone to learn. While I agree that Python is a good first language due to its simple syntax and flexibility, I believe that several programming fundamentals that will be necessary later on won’t be learned. … Read more

Fast functions with pipes

[This article was first published on Bluecology blog, and kindly contributed to R-bloggers]. Pipes are becoming popular R syntax, because they help make code more … Read more

2020 recap, Gradient Boosting, Generalized Linear Models, AdaOpt with nnetsauce and mlsauce

A few highlights from 2020 in this blog include: the introduction of mlsauce’s AdaOpt and LSBoost; the introduction of Generalized Linear Models (GLMs) in nnetsauce. What are AdaOpt, LSBoost and nnetsauce’s GLMs? mlsauce’s AdaOpt is a probabilistic classifier based on a mix of multivariable optimization and a nearest neighbors algorithm. This document explains AdaOpt with … Read more

Docly: Generate Comments Automatically for Python Code

Artificial Intelligence-based code documentation assistant. Photo by Pakata Goh on Unsplash. Artificial Intelligence is extensively used for developing models to help developers write better source code. Various AI models are used to accelerate the work of developers, such as code autocompletion, autosuggestions, unit test assistance, bug detection, code summarization, etc. Documentation of code is essential … Read more

The Pursuit of Lift

The S.P.O.T. Framework. In working with companies and teams of various sizes and industries to operationalize ML, the issues converge on four (4) key areas. The S.P.O.T. Framework helps teams spot (pun intended) the gaps in each of the four areas. 1. Strategy (S). Do we have a clear strategy that Data Science (and other) … Read more

SyriaTel Customer Churn Analysis

For my third project at the Flatiron School, I chose to analyze the dataset on customer churn for a telecommunications company, SyriaTel. The objective was to build a classifier to determine if a customer would ‘soon’ leave SyriaTel, and to determine if there were predictable patterns. The data provided no time information but rather had … Read more

Text Generation With GPT-2 in Python

We can get some great results with very little code. Here are a few examples that should give you a better understanding of the impact of each argument in the .generate method. outputs = model.generate(inputs, max_length=200, do_sample=True); tokenizer.decode(outputs[0], skip_special_tokens=True) [Out]: “He began his premiership by forming a five-man war cabinet which included Chamerlain as Lord President of … Read more

A Visual Guide to Gradient Boosted Trees

An intuitive explanation of GBT using the MNIST database. Hi everyone, welcome back to another article in the Visual Guide to Machine Learning series! We’ll learn yet another popular model ensembling method called Gradient Boosted Trees. If you haven’t already, check out the previous article to learn about Random Forests, where we introduce the concept … Read more

When Your Fitbit Says Your Heart is Exploding Should You Care?

Exploring Fitbit Heart Rate Data. Recently, with the reduced workout options available under Covid-19, I have been focusing on doing high-intensity interval training (HIIT). There are different methods of HIIT workouts. For me, what this means is going flat out, as hard as I can, for about a minute, followed by a rest period, and … Read more

Data Quality from First Principles

The right way to think about Data Quality, from Kimball and Uber’s points of view. Photo by Maxime Agnelli on Unsplash. If you’ve spent any amount of time in business intelligence, you would know that data quality is a perennial challenge. It never really goes away. For instance, how many times have you been in … Read more

Internal Uniqueness Constraints — Object-Role Modeling

Internal Uniqueness Constraints made easy. Object-Role Modeling (ORM) is a graphical conceptual modelling technique used predominantly for database analysis and design, but applicable to any circumstance where you would like to document or define a data structure. The following is a typical Object-Role Model: An Object-Role Model. Image by author. The model above depicts a … Read more

A gentle introduction to the 5 Google Cloud BigQuery APIs

The principal API for core interaction. Using this API you can interact with core resources such as datasets, views, jobs, and routines. To date, there are 7 client libraries: C#, Go, Java, Node.js, PHP, Python, and Ruby. Example: for this example, I will use the Python client library for the BigQuery API on my personal computer. Consider … Read more

Advent of 2020, Day 28 – Infrastructure as Code and how to automate, script and deploy Azure Databricks with Powershell

[This article was first published on R – TomazTsql, and kindly contributed to R-bloggers]. Series of Azure Databricks posts: Yesterday we looked into bringing the … Read more

Insights from Visualizing Public Data on Twitch

Legend [By Author]. What is this image? This is a computer-generated network graph created from a one-week snapshot of Twitch viewership. A higher resolution version is available here. Each node represents a single streamer that appeared in the top 100 streams on Twitch during data collection. Each node is analogous to one TV show … Read more

Memory Efficiency of Common Python Data Structures

4. Implication of Over-Allocation. Why is it important to learn about over-allocation? It’s simple. Now that we know how Python over-allocates dynamic data structures, we can look into ways to improve our Python scripts’ memory efficiency, bringing us one step closer to becoming a Python master. Using tuple As Static Arrays: Imagine we are … Read more
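
Over-allocation can be observed directly with sys.getsizeof(). The following is a rough sketch rather than the article’s own code, and it assumes CPython, where lists reserve spare capacity as they grow; the variable names and the choice of 1000 elements are arbitrary. It grows a list one append at a time, records every distinct size the list object reports, and then compares the final list with a tuple holding the same items.

```python
import sys

grown = []          # list built one append at a time
sizes_seen = set()  # every distinct byte size the list object reported

for i in range(1000):
    grown.append(i)
    sizes_seen.add(sys.getsizeof(grown))

# A tuple is sized exactly for its contents, with no spare capacity.
frozen = tuple(grown)

list_bytes = sys.getsizeof(grown)
tuple_bytes = sys.getsizeof(frozen)
resize_count = len(sizes_seen)

print(f"list of 1000 items:  {list_bytes} bytes")
print(f"tuple of 1000 items: {tuple_bytes} bytes")
print(f"distinct list sizes seen during 1000 appends: {resize_count}")
```

The list reports far fewer distinct sizes than the number of appends because each resize reserves room for several future elements. Note that sys.getsizeof() measures only the container itself, not the integer objects it refers to, and the exact byte counts vary between Python versions and platforms.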

Build Dashboards in Less Than 10 Lines of Code!

Machine Learning Dashboards are a great way to interpret models. These usually describe the inner workings of the model and provide interactive plots to discover model performance, feature importances, or “what if” analysis! All this can be generated with a few lines of code, and if you want, you can customize all the elements of … Read more