How does sparse convolution work?

The question is whether we can compute the convolution efficiently on only the sparse data instead of scanning every image pixel or spatial voxel. Intuitively, regular image signals are stored as a matrix or tensor, and the corresponding convolution is calculated as a dense matrix multiplication. Sparse signals are normally represented as … Read more
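
As a rough illustration of the idea in this excerpt, the sketch below stores only the nonzero pixels in a dictionary keyed by coordinate and accumulates kernel contributions from those entries alone, then compares against a dense scan. The function names `dense_conv2d` and `sparse_conv2d`, the dictionary layout, and the odd-sized kernel with zero padding are assumptions made for illustration; they are not taken from the linked article.

```python
from itertools import product


def dense_conv2d(image, kernel):
    """Reference: 'same'-size sliding-window sum that scans every pixel."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2  # assumes odd kernel sizes
    out = [[0.0] * w for _ in range(h)]
    for y, x in product(range(h), range(w)):
        acc = 0.0
        for dy, dx in product(range(kh), range(kw)):
            iy, ix = y + dy - ph, x + dx - pw
            if 0 <= iy < h and 0 <= ix < w:  # zero padding outside the image
                acc += image[iy][ix] * kernel[dy][dx]
        out[y][x] = acc
    return out


def sparse_conv2d(active, shape, kernel):
    """Same result, but loops only over the nonzero inputs.

    active: dict mapping (row, col) -> nonzero value (hypothetical layout)
    shape:  (height, width) of the underlying image
    Returns a dict of (row, col) -> value for the outputs that were touched.
    """
    h, w = shape
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = {}
    for (iy, ix), value in active.items():
        for dy, dx in product(range(kh), range(kw)):
            # Input (iy, ix) feeds output (y, x) through kernel[dy][dx]
            # exactly when iy == y + dy - ph and ix == x + dx - pw.
            y, x = iy - dy + ph, ix - dx + pw
            if 0 <= y < h and 0 <= x < w:
                out[(y, x)] = out.get((y, x), 0.0) + value * kernel[dy][dx]
    return out
```

The work in `sparse_conv2d` scales with the number of nonzero inputs times the kernel size, rather than with the full image size.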

Custom Training Loops for Medical Image Segmentation in Tensorflow 2.x

Since neural networks are essentially a sequence of operations, one can visualize these operations as nodes on a graph. In TensorFlow 1.x, the way to run training was to write the relationships (or edges) of this computational graph (e.g. the layers of a neural net) and then compile it. Once compiled, you would provide … Read more

Optimization: A notorious road to Structured Inefficiency and transition to Combinatorial…

Inefficiencies and limitations faced by companies using traditional optimization methods, and how combinatorial optimization might be the future of the logistics industry. (Photo by Markus Spiske on Unsplash.) The title of this article is something of an oxymoron, putting optimization and inefficiency in the same context. But it is very true looking at the trend and current practices … Read more

How to get started with data science in 2021.

A step-by-step approach to getting started and developing your skills in this rapidly changing field. (Photo by Myriam Jessier on Unsplash.) For several years, Data Scientist was ranked as the best job in America by Glassdoor. Today it no longer holds the top spot in job rankings but it still ranks near the top of … Read more

Real-time Age, Gender and Emotion Prediction from Webcam with Keras and OpenCV

Find working code and trained models here. (Photo: Chinatown, Singapore; credit to Lily Banse on Unsplash.) Introduction: In the era of Covid-19, we have become more reliant on virtual interactions such as Zoom meetings and Teams chats. These livestreamed webcam videos have become a rich data source to explore. This article will explore the use … Read more

Summarise the 2020 with R and rgl

The end of the year is a great time to summarize the accomplishments of the team. This year at MI2DataLab we summarized the good things that happened in the form of baubles on the Christmas tree (yes, this is the only known exception for using 3D plots). Each color of a bauble represents a different kind of … Read more

Simulating the FIFA World Cup 2022

Who does the data choose to win the largest international football tournament yet? (Image by Michal Jarmoluk from Pixabay.) The grandest and most exciting of all football tournaments is still a ways off (2022), but in times like these I find solace in the fact that there are better things (like the next World Cup) … Read more

Jupyter Workflow for Data Scientists

Setup, debugging, version control, and deployment. (Photo by Greg Rakozy on Unsplash.) Many data scientists like to use Jupyter Notebook or JupyterLab to do their data explorations, visualizations, and model building. I know some data scientists refuse to use Jupyter Notebook. But I love to use Jupyter Notebook/Lab to do my experiments and explorations. Here … Read more

Data analytics helps warehouse management

(Image by author: total annual shipped goods in rolls and total weight, 2018 and 2019.) From the bar plot, we can conclude that the new workshop’s outbound amount has increased after pursuing a new production machine, in both roll count and total cloth weight. Since almost all the knitting machines run … Read more

Advent of 2020, Day 31 – Azure Databricks documentation, learning materials and additional resources

[This article was first published on R – TomazTsql, and kindly contributed to R-bloggers]. Series of Azure Databricks posts: Dec 01: What is Azure Databricks … Read more

Part 10: Discovering Multidimensional Time Series Motifs

Multidimensional Matrix Profiles with STUMPY. (Image by Farhan Azam; image by author.) STUMPY is a powerful and scalable Python library for modern time series analysis and, at its core, efficiently computes something called a matrix profile. The goal of this multi-part series is to explain what the matrix profile is and how you can start … Read more

Analyzing Customer Satisfaction of Apple AirPods Using Exploratory Data Analysis and…

In Part 1, I went through the statistics regarding the industry, Apple, and AirPods. In this article, I will focus on a more technical analysis of my survey data to help us understand how satisfied customers are with their AirPods. (Photo by Hasinteau on Unsplash.) The survey was conducted to identify the satisfaction level towards … Read more

Python Beginner Breakthroughs (List Comprehensions)

A quick Python “A-ha!” moment that can make you a more efficient and “Pythonic” coder for your data science endeavors. (Photo by Javier Esteban on Unsplash.) In the past twelve months, I have started transitioning my professional focus from a traditional engineering role into one that is looking to utilize data science and machine learning … Read more

The Most Feature-Rich ML Forecasting Methods Available: Compliments of RemixAutoML

This is my go-to method. The main difference between the CatBoost, XGBoost, and H2O versions relates to the ML parameters available for tuning. All functions listed in this blog have working examples in the GitHub README, the R help files (which can be opened in your R session), or the package reference manual. Five feature … Read more


From Installation to Implementation: Part 2. (Photo by Maarten van den Heuvel on Unsplash.) In part one, I started building a database to use for a monthly budgeting application. After installing MongoDB, we discussed a brief overview of how to create a database, our first collections, and inserting documents. With some general knowledge about MongoDB, the goal is to … Read more

AWS Control Tower console shows more detail about external AWS Config rules

With this feature, you now have a consolidated view of detective guardrails applied to your accounts so that you can easily track compliance and determine if additional guardrails are needed. AWS Control Tower is designed for organizations with multiple accounts and teams who are looking for the easiest way to set up their new or … Read more

LEFT/RIGHT in 5 languages (VBA/SQL/PYTHON/M query/DAX powerBI)

How to reproduce your favorite Excel feature in another analytics language (VBA/SQL/PYTHON/M query/DAX powerBI). (Photo by Nick Fewings on Unsplash.) Excel is a powerful spreadsheet tool used by most people working in data analysis. The growing volume of data and the development of user-friendly tools are an opportunity to improve Excel reports by mixing them with … Read more

Markov models and Markov chains explained in real life: probabilistic workout routine

Through the work of Claude Shannon and many others after him, we can conclude that Markov models describe the world in a more realistic way and are a useful tool for making long-term predictions about a system or process. Realistic tool to describe the world: Most real-world systems and phenomena involve multiple parts, which are rarely … Read more
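
To make the "long-term predictions" point concrete, here is a minimal sketch of a Markov chain in which a distribution over states is advanced by repeatedly applying a transition matrix. The three workout states and every probability below are invented for illustration; they are assumptions, not values from the linked article.

```python
# Hypothetical workout states and transition probabilities (assumptions).
STATES = ["run", "lift", "rest"]

# TRANSITIONS[i][j] = P(next state is STATES[j] | current state is STATES[i]).
# Each row sums to 1.
TRANSITIONS = [
    [0.1, 0.6, 0.3],
    [0.5, 0.1, 0.4],
    [0.4, 0.4, 0.2],
]


def advance(dist, transitions):
    """Apply one step of the chain to a probability distribution."""
    n = len(dist)
    return [sum(dist[i] * transitions[i][j] for i in range(n)) for j in range(n)]


def distribution_after(start, transitions, steps):
    """Distribution over states after the given number of steps."""
    dist = list(start)
    for _ in range(steps):
        dist = advance(dist, transitions)
    return dist
```

For example, `distribution_after([1.0, 0.0, 0.0], TRANSITIONS, 100)` gives the long-run share of each workout when starting from a run. Because every entry of this particular matrix is positive, the long-run distribution does not depend on the starting state.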

AWS CodePipeline supports deployments with CloudFormation StackSets

AWS CodePipeline has released two new actions for creating and deploying CloudFormation StackSets. The CloudFormationStackSet action dynamically creates and deploys an initial or updated stack set configuration. The CloudFormationStackInstance action safely rolls out the stack set changes to new or existing stack instances in the stack set, region by region, reducing the risk of failure. … Read more

Analyzing eBay’s AdWords Spending: Is This Extra Expense Worth It?

To approach this analysis, I began by first establishing a null hypothesis as a foundation for further hypothesis testing. In the case of the problem, this is the assertion that both the treatment and control groups would see the same revenue ratio before and after the experiment — in other words, the difference in revenue … Read more

Insights on Classifier Combination

As the arsenal of classification algorithms increased dramatically, it became more and more tempting to use several classifiers and then combine their decisions to gain in accuracy and avoid the burden of choosing the right one. Note that a combination of classifiers remains itself a classifier and the no free lunch theorem also applies to … Read more

AWS IoT SiteWise Monitor now supports AWS CloudFormation

Customers can now author CloudFormation templates to automate the creation and management of AWS IoT SiteWise Monitor resources for creating portals, projects, and dashboards, without having to write custom scripts, or manually use the dashboard creation process through the AWS IoT SiteWise Monitor portal console. Customers can also reuse these templates across AWS accounts and regions … Read more

Everything You Need to Know About TensorFlow

TensorFlow (Keras) provides us with two approaches for building our models. Those are the functional and sequential methods. A simple, single input-output, layer by layer architecture is perfect for the Sequential model. The Sequential model is used for simple, sequential stacks of layers where each layer has one input and one output. Architectures that require … Read more

Amazon Elastic Container Service launches new management console

On the cluster page, you can see the number of services and tasks in each cluster in your account and status for the resources. Clicking into a cluster lets you see all services and tasks along with which task definition family and revision is used. You can also see when each task started, letting you … Read more

Under the Hood: Using Gini impurity to your advantage in Decision Tree Classifiers

This article will serve as the first part of a potentially ongoing series, looking at the mathematical concepts that drive key parameters in the machine learning algorithms employed in data science. My goal in these posts will be to express key concepts in as simple and non-technical a language as possible, while … Read more

Dynamic Programming in RL

In this problem, we are given a grid (4 x 4 in this case). The goal is to reach either the top-left or the bottom-right square (gray colored) from any other square on the grid, with maximum reward. You can jump one square in any of the North, South, East, or West directions from any … Read more
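
The grid problem described here can be sketched with value iteration, one of the standard dynamic programming methods in RL. The excerpt does not state the reward scheme, so the sketch assumes a reward of -1 per move, no discounting, deterministic moves, and that a move off the grid leaves the agent where it is. These are assumptions in the style of the classic textbook gridworld, not details confirmed by the linked article.

```python
SIZE = 4
TERMINALS = {(0, 0), (SIZE - 1, SIZE - 1)}  # top-left and bottom-right squares
MOVES = [(-1, 0), (1, 0), (0, 1), (0, -1)]  # North, South, East, West
STEP_REWARD = -1.0  # assumed reward for every move


def step(state, move):
    """Deterministic transition; moving off the grid keeps the agent in place."""
    row, col = state[0] + move[0], state[1] + move[1]
    if 0 <= row < SIZE and 0 <= col < SIZE:
        return (row, col)
    return state


def value_iteration(tol=1e-9):
    """Return the optimal state values under the assumptions above."""
    values = {(r, c): 0.0 for r in range(SIZE) for c in range(SIZE)}
    while True:
        delta = 0.0
        for state in values:
            if state in TERMINALS:
                continue  # terminal squares keep value 0
            # Bellman optimality backup: best one-step reward plus next value.
            best = max(STEP_REWARD + values[step(state, m)] for m in MOVES)
            delta = max(delta, abs(best - values[state]))
            values[state] = best
        if delta < tol:
            return values
```

Under these assumptions, each square's value is the negative of the number of moves needed to reach the nearest terminal corner, so acting greedily with respect to the values follows a shortest path.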

Using MATLAB’s Deep Learning Toolbox | Part 1: Predicting Cancer Malignancy Using Shallow Neural…

A practical guide to getting started in Deep Learning. (Photo by Giorgio Grani on Unsplash.) What is Deep Learning? Deep learning is a subset of machine learning algorithms that use neural networks to learn complex patterns from large amounts of data. Due to advances in computing and the amount of data being acquired, these algorithms … Read more

R Shiny {golem} – Development to Production – Overview

This blog series follows the development and creation of an R Shiny application. For the purposes of keeping this focused on software development, we’ll be concentrating on Shiny rather than the business use case. As a background story, we’ll be creating an app for the hit TV show, The Office. We’ll assume the following scenario: … Read more

Visualisation of ranked choice voting in R

Tables with gt and animation with tweenr. I recently returned to the R package avr, which runs a range of alternative voting procedures, to add more functionality, and in the process, got to grips with two visualisation packages: gt and tweenr. Each provides a different solution to the problem I had, which was how to … Read more

MLxtend: A Python Library with interesting tools for data science tasks

The MLxtend library is developed by Sebastian Raschka (a professor of statistics at the University of Wisconsin-Madison). The library has nice API documentation as well as many examples. You can install the MLxtend package through the Python Package Index (PyPI) by running pip install mlxtend. In this post, I’m using the wine data set obtained from … Read more

Introducing AWS Data Exchange Publisher Coordinator and Subscriber Coordinator

Previously, AWS Data Exchange customers were required to manually upload and download their dataset revisions, or create and maintain their own solutions for automation. With the Publisher and Subscriber solutions, customers can now reduce the operational burden of manual processes, and bypass the engineering complexity of building custom automation. AWS Data Exchange Publisher Coordinator and … Read more

Trends in Data Science That Will Change Business Strategies

From individual skills to business development, data professionals have many opportunities in the next few years. (Photo by Paweł Czerwiński on Unsplash.) In response to an atypical year, companies rely on data and analytics leaders to accelerate innovation and create new routes to generate revenue. However, recent research involving business leaders in the U.S., U.K., … Read more

The Danger of Overfitting a Model

An Explanation for Splitting Data into Training and Testing Sets. (Photo by Isaac Smith on Unsplash.) On my first job out of college, I was tasked with streamlining how a company made purchases. While a big project that encompassed many factors, such as lead times and order quantities, the most challenging part was determining how … Read more

Python Numpy and Matrices Questions for Data Scientists

I’ve been preparing for Data Science interviews for a while, and the one thing that struck me the most is the lack of preparation for NumPy and matrix questions. Often, Data Scientists are asked to perform simple matrix operations in Python, which should be straightforward but, unfortunately, throw a lot of candidates off the … Read more

5 Books for Data Engineers

Data Engineering Books: building foundations and framing your viewpoint towards data engineering. (Photo by Ahmad Ossayli on Unsplash.) About 3 years ago, I started my IT career as a Data Engineer and tried to find day-to-day solutions and answers surrounding the data platform. And I always hope that there are some resources like the university … Read more

Breakthroughs in Time Series Forecasting at Neurips 2020

A deep dive into the latest literature in time series forecasting and how you can use it for your business use cases. (Photo by Brent Ninaber on Unsplash.) This year at the Neural Information Processing Conference, authors published a number of new papers focusing on time series forecasting and classification. Here I will briefly review … Read more

GridSearchCV for Beginners

The results of GridSearchCV can be somewhat misleading the first time around. The best combination of parameters found is more of a conditional “best” combination. This is due to the fact that the search can only test the parameters that you fed into param_grid. There could be a combination of parameters that further improves the … Read more
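
The point that the "best" combination is conditional on the grid can be shown with a small exhaustive search written in plain Python. This sketch does not use scikit-learn: `grid_search`, `toy_score`, and the parameter names `depth` and `rate` are hypothetical stand-ins for a cross-validated model score, chosen only to illustrate the behavior.

```python
from itertools import product


def grid_search(score_fn, param_grid):
    """Try every combination in param_grid; return (best_params, best_score)."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for combo in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(depth, rate):
    """Hypothetical score surface that peaks at depth=7, rate=0.3."""
    return 1.0 - 0.01 * (depth - 7) ** 2 - (rate - 0.3) ** 2


# Two grids over the same (made-up) parameters.
narrow_grid = {"depth": [2, 4], "rate": [0.01, 0.1]}
wide_grid = {"depth": [2, 4, 7, 10], "rate": [0.01, 0.1, 0.3]}

narrow_best = grid_search(toy_score, narrow_grid)
wide_best = grid_search(toy_score, wide_grid)
```

The narrow grid can only report the best of the four combinations it was given, so it never sees the higher-scoring region that the wider grid covers.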

Embedding and clustering combining KNIME and Python

UMAP dimension reduction and DBSCAN for clustering the MNIST database within KNIME. (Image by author: Clustering. Olives and leaves. Shapes and colours.) KNIME is a free and open-source data analytics, reporting and integration platform. KNIME integrates various components for machine learning and data mining through its modular data pipelining concept. For people like me, who do … Read more