## How does sparse convolution work?

The question is whether we can compute the convolution efficiently on the sparse data alone, instead of scanning every image pixel or spatial voxel. Intuitively, regular image signals are stored as matrices or tensors, and the corresponding convolution is computed as a dense matrix multiplication. The sparse signals are normally represented as … Read more
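As a rough illustration of the idea, the sketch below visits only the non-zero inputs and scatters each one into the outputs it touches. The dictionary-of-coordinates representation, the function name `sparse_conv2d`, and the cross-correlation convention are assumptions made for this example; the excerpt is cut off before it describes the actual representation.

```python
from itertools import product


def sparse_conv2d(active, kernel):
    """Cross-correlate a sparse 2D signal with a small dense kernel.

    active: dict mapping (row, col) -> value, holding non-zero inputs only
            (a hypothetical representation chosen for this sketch).
    kernel: square list of lists with an odd side length.
    Returns a dict mapping (row, col) -> output value for every site that
    receives a contribution from at least one non-zero input.
    """
    radius = len(kernel) // 2
    out = {}
    for (y, x), value in active.items():
        for dy, dx in product(range(-radius, radius + 1), repeat=2):
            weight = kernel[dy + radius][dx + radius]
            if weight == 0:
                continue
            # out[i][j] = sum over (dy, dx) of kernel[dy][dx] * in[i+dy][j+dx],
            # so the input at (y, x) lands on the output at (y - dy, x - dx).
            site = (y - dy, x - dx)
            out[site] = out.get(site, 0.0) + weight * value
    return out


result = sparse_conv2d({(1, 1): 2.0}, [[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```

The work here scales with the number of non-zero inputs times the kernel size, not with the full image or voxel grid.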

Categories Featured Excerpt

## RObservations #6- #TidyTuesday – Analyzing data on the Australian Bush Fires

[This article was first published on r – bensstats, and kindly contributed to R-bloggers]. Since April 2018 the R4DS community has been putting out unique … Read more

Categories R Tags Excerpt

## Custom Training Loops for Medical Image Segmentation in Tensorflow 2.x

Since neural networks are essentially a sequence of operations, one can visualize these operations as nodes on a graph. In Tensorflow 1.x, the way to execute your training was to write the relationships (or edges) of this computational graph (e.g. the layers of a neural net) and then compile it. Once compiled, you would provide … Read more

Categories Featured Excerpt

## Optimization: A notorious road to Structured Inefficiency and transition to Combinatorial…

Inefficiencies and limitations faced by companies using traditional optimization methods, and how combinatorial optimization might be the future of the logistics industry. Photo by Markus Spiske on Unsplash The title of this article sounds like an oxymoron: optimization and inefficiency in the same context. But it is very true looking at the trend and current practices … Read more

Categories Featured Excerpt

## Decision Trees — The Maths, The Theory, The Benefits

Components of a Tree: A decision tree has the following components: Node, a point in the tree between two branches, in which a rule is declared; Root Node, the first node in the tree; Branches, arrows connecting one node to another, the direction to travel depending on how the datapoint relates to the rule … Read more

Categories Featured Excerpt

## Complete Guide to Setting Up a New Python Environment For Data Science

When I recently upgraded to a new computer and my first macOS operating system, naturally on my very first boot, I looked up ways to set up a new Python environment from scratch for all my machine learning and data science needs. I came across many articles on the web and reading two and three … Read more

## How to get started with data science in 2021.

A step-by-step approach to getting started and developing your skills in this rapidly changing field. Photo by Myriam Jessier on Unsplash For several years, Data Scientist was ranked as the best job in America by Glassdoor. Today it no longer holds the top spot in job rankings but it still ranks near the top of … Read more

Categories Featured Excerpt

## Real-time Age, Gender and Emotion Prediction from Webcam with Keras and OpenCV

Find working code and trained models here Chinatown @ Singapore (Photo credit to Lily Banse on Unsplash) Introduction In the era of Covid-19, we have become more reliant on virtual interactions such as Zoom meetings / Teams chat. These livestream webcam videos have become a rich data source to explore. This article will explore the use … Read more

Categories Featured Excerpt

## How to Make Stunning Geomaps in R: A Complete Guide with Leaflet

Popups provide a neat and clean way of displaying more information whenever you click on a marker of interest. You’ll use them to add information on the time of the earthquake, its magnitude, depth, and place. You can use the `paste0` function to add the data. If you want something styled, you can use various … Read more

Categories Featured, R Excerpt

## Summarise the 2020 with R and rgl

The end of the year is a great time to summarize accomplishments of the team. This year in MI2DataLab we summarized good things that happened in the form of baubles on the Christmas tree (yes, this is the only known exception for using 3D plots). Each color of a bauble represents a different kind of … Read more

Categories R Tags Excerpt

## Simulating the FIFA World Cup 2022

Who does the data choose to win the largest international football tournament yet? Image by Michal Jarmoluk from Pixabay. The grandest and most exciting of all football tournaments is still a ways off (2022), but in times like these I find solace in the fact that there are better things (like the next World Cup) … Read more

Categories Featured Excerpt

## Jupyter Workflow for Data Scientists

setup, debug, version control, and deployment Photo by Greg Rakozy on Unsplash Many data scientists like to use Jupyter Notebook or JupyterLab to do their data explorations, visualizations, and model building. I know some data scientists refuse to use Jupyter Notebook. But, I love to use Jupyter Notebook/Lab to do my experiments and explorations. Here … Read more

Categories Featured Excerpt

## Data analytics helps warehouse management

image by Author: Total annual shipped goods in rolls and total weight in the year 2018 and 2019 From the bar plot above, we can conclude that the new workshop’s outbound amount has increased after pursuing a new production machine, both on roll number and total cloth weight. Since almost all the knitting machines run … Read more

## Advent of 2020, Day 31 – Azure Databricks documentation, learning materials and additional resources

[This article was first published on R – TomazTsql, and kindly contributed to R-bloggers]. Series of Azure Databricks posts: Dec 01: What is Azure Databricks … Read more

Categories R Tags Excerpt

## Part 10: Discovering Multidimensional Time Series Motifs

Multidimensional Matrix Profiles with STUMPY (Image by Farhan Azam) (Image by Author) STUMPY is a powerful and scalable Python library for modern time series analysis and, at its core, efficiently computes something called a matrix profile. The goal of this multi-part series is to explain what the matrix profile is and how you can start … Read more

## Create an Interactive Dashboard with Shiny, Flexdashboard, and Plotly

Initialize a Flexdashboard from RStudio using File > New File > R Markdown > From Template > Flex Dashboard, save, and knit the document. This creates a static, two-column dashboard with one chart on the left and two on the right: Step 1. flexdashboard 1. Add `runtime: shiny` to the YAML header at the … Read more

Categories Featured Excerpt

## Analyzing Customer Satisfaction of Apple AirPods Using Exploratory Data Analysis and…

In Part 1, I went through the statistics regarding the industry, Apple, and AirPods. In this article, I will focus on a more technical analysis of my survey data to help us understand how customers are satisfied with their AirPods. Photo by Hasinteau on Unsplash The survey was conducted to identify the satisfaction level towards … Read more

## Python Beginner Breakthroughs (List Comprehensions)

A quick Python “A-ha!” moment that can make you a more efficient and “Pythonic” coder for your data science endeavors. Photo by Javier Esteban on Unsplash In the past twelve months, I have started transitioning my professional focus from a traditional engineering role into one that is looking to utilize data science and machine learning … Read more

## The Most Feature-Rich ML Forecasting Methods Available: Compliments of RemixAutoML

This is my go-to method. The main difference between the CatBoost, XGBoost, and H2O versions relates to the ML parameters available for tuning. All functions listed in this blog have working examples in the GitHub README, the R help files (which can be opened in your R session) or the package reference manual. Five feature … Read more

Categories Featured Excerpt

## MongoDB

From Installation to Implementation: Part 2 Photo by Maarten van den Heuvel on Unsplash In part one, I started building a database to use for a monthly budgeting application. After installing MongoDB, we discussed a brief overview of how to create a database, our first collections, and inserting documents. With some general knowledge about MongoDB, the goal is to … Read more

Categories Featured Excerpt

## AWS Control Tower console shows more detail about external AWS Config rules

With this feature, you now have a consolidated view of detective guardrails applied to your accounts so that you can easily track compliance and determine if additional guardrails are needed. AWS Control Tower is designed for organizations with multiple accounts and teams who are looking for the easiest way to set up their new or … Read more

Categories AWS Excerpt

## LEFT/RIGHT in 5 languages (VBA/SQL/PYTHON/M query/DAX powerBI)

How to reproduce your favorite Excel feature in another analytic language (VBA/SQL/PYTHON/M query/DAX powerBI) Photo by Nick Fewings on Unsplash Excel is a powerful spreadsheet used by most people working in data analysis. The growth in data volume and the development of user-friendly tools are an opportunity to improve Excel reports by mixing them with … Read more
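For the Python case, a minimal sketch of LEFT and RIGHT can be built on string slicing. The helper names `left` and `right` are hypothetical, chosen for this example and not taken from the article; they mirror Excel's behaviour of returning the whole text when the requested length exceeds it.

```python
def left(text: str, n: int) -> str:
    """Return the first n characters, similar to Excel's LEFT(text, n)."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return text[:n]


def right(text: str, n: int) -> str:
    """Return the last n characters, similar to Excel's RIGHT(text, n)."""
    if n < 0:
        raise ValueError("n must be non-negative")
    # text[-n:] would return the whole string when n == 0, so compute
    # the start index explicitly and clamp it at zero.
    return text[max(len(text) - n, 0):]


prefix = left("Data2021", 4)
suffix = right("Data2021", 4)
```

The explicit start index in `right` avoids the common slip where `text[-0:]` returns the entire string instead of an empty one.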

## Markov models and Markov chains explained in real life: probabilistic workout routine

Through the work of Claude Shannon and many others after him, we can conclude that Markov models describe the world in a more realistic way and are a useful tool to make long-term predictions about a system or process. Realistic tool to describe the world: Most real-world systems and phenomena involve multiple parts, which are rarely … Read more

Categories Featured Excerpt

## AWS CodePipeline supports deployments with CloudFormation StackSets

AWS CodePipeline has released two new actions for creating and deploying CloudFormation StackSets. The CloudFormationStackSet action dynamically creates and deploys an initial or updated stack set configuration. The CloudFormationStackInstance action safely rolls out the stack set changes to new or existing stack instances in the stack set, region by region, reducing the risk of failure. … Read more

Categories AWS Excerpt

## Analyzing eBay’s AdWords Spending: Is This Extra Expense Worth It?

To approach this analysis, I began by first establishing a null hypothesis as a foundation for further hypothesis testing. In the case of the problem, this is the assertion that both the treatment and control groups would see the same revenue ratio before and after the experiment — in other words, the difference in revenue … Read more

Categories Featured Excerpt

## Insights on Classifier Combination

As the arsenal of classification algorithms increased dramatically, it became more and more tempting to use several classifiers and then combine their decisions to gain in accuracy and avoid the burden of choosing the right one. Note that a combination of classifiers remains itself a classifier and the no free lunch theorem also applies to … Read more
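One simple way to combine decisions is plurality (majority) voting, sketched below. This is only one of many combination rules and is not necessarily the one the article goes on to discuss; the function name `majority_vote` and the toy labels are assumptions for illustration.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine label predictions from several classifiers by plurality vote.

    predictions: list of per-classifier label lists, all of the same length.
    Returns one combined label per sample. Ties go to the label that was
    seen first among the tied ones.
    """
    combined = []
    for votes in zip(*predictions):
        label, _count = Counter(votes).most_common(1)[0]
        combined.append(label)
    return combined


# Three hypothetical classifiers voting on three samples.
votes = [
    ["a", "b", "a"],
    ["a", "a", "b"],
    ["b", "b", "b"],
]
combined = majority_vote(votes)
```

The combined output is itself a classifier's prediction, which matches the excerpt's point that a combination of classifiers remains a classifier.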

Categories Featured Excerpt

## AWS IoT SiteWise Monitor now supports AWS CloudFormation

Customers can now author CloudFormation templates to automate the creation and management of AWS IoT SiteWise Monitor resources for creating portals, projects, and dashboards, without having to write custom scripts, or manually use the dashboard creation process through the AWS IoT SiteWise Monitor portal console. Customers can also reuse these templates across AWS accounts and regions … Read more

Categories AWS Excerpt

## Everything You Need to Know About TensorFlow

TensorFlow (Keras) provides us with two approaches for building our models. Those are the functional and sequential methods. A simple, single input-output, layer by layer architecture is perfect for the Sequential model. The Sequential model is used for simple, sequential stacks of layers where each layer has one input and one output. Architectures that require … Read more

## The Year 2020: Analyzing Twitter Users’ Reflections using NLP

A Sentiment Analysis Project using Python and Tableau Photo by Claudio Schwarz on Unsplash A lot happened this year and if you watch the movie “Death to 2020” on Netflix, you will have an idea of the timeline of events. For this project, I thought it would be interesting to gain insights into what Twitter … Read more

## Amazon Elastic Container Service launches new management console

On the cluster page, you can see the number of services and tasks in each cluster in your account and status for the resources. Clicking into a cluster lets you see all services and tasks along with which task definition family and revision is used. You can also see when each task started, letting you … Read more

Categories AWS Excerpt

## Under the Hood: Using Gini impurity to your advantage in Decision Tree Classifiers

Photo by unsplash.com/@andrewtneel This article will serve as the first part of a potentially ongoing series, looking at the mathematical concepts that drive key parameters in the machine learning algorithms employed in data science. My goal in these posts will be to express key concepts in as simple and non-technical a language as possible, while … Read more

Categories Featured Excerpt

## How to Set Up a Foreign Data Wrapper in PostgreSQL

A foreign data wrapper is an extension available in PostgreSQL that allows you to access a table or schema in one database from another. Foreign data wrappers can serve all sorts of purposes: Completing a data flow cycle Your data may be segregated across databases, but still related in ways that make being able to … Read more

Categories Featured Excerpt

## Dynamic Programming in RL

In this problem, we are given a grid (4 x 4 in this case). The goal is to reach either the top-left or the bottom-right square (gray colored) from any other square on the grid, with maximum reward. You can jump one square in any of the North, South, East, or West directions from any … Read more
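The grid problem can be sketched with value iteration, one standard dynamic programming method. The excerpt does not state the reward scheme, so the code below assumes a reward of -1 per move, no discounting, and that a move off the grid leaves you in place; all names (`STEP_REWARD`, `step`, `value_iteration`) are hypothetical.

```python
N = 4
TERMINALS = {(0, 0), (N - 1, N - 1)}           # top-left and bottom-right squares
MOVES = [(-1, 0), (1, 0), (0, 1), (0, -1)]     # North, South, East, West
STEP_REWARD = -1.0                             # assumed; not given in the excerpt


def step(state, move):
    """Return the next square; moves that leave the grid keep you in place."""
    row, col = state[0] + move[0], state[1] + move[1]
    if 0 <= row < N and 0 <= col < N:
        return (row, col)
    return state


def value_iteration(tol=1e-9):
    """Compute the optimal value of every square by repeated Bellman backups."""
    values = {(r, c): 0.0 for r in range(N) for c in range(N)}
    while True:
        delta = 0.0
        for state in values:
            if state in TERMINALS:
                continue  # terminal squares keep value 0
            best = max(STEP_REWARD + values[step(state, m)] for m in MOVES)
            delta = max(delta, abs(best - values[state]))
            values[state] = best
        if delta < tol:
            return values


optimal_values = value_iteration()
```

Under these assumptions each square's optimal value is the negative of its distance, in moves, to the nearest gray square, so a greedy walk toward higher values reaches a terminal square with maximum reward.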

Categories Featured Excerpt

## Using MATLAB’s Deep Learning Toolbox | Part 1: Predicting Cancer Malignancy Using Shallow Neural…

A practical guide to getting started in Deep Learning Photo by Giorgio Grani on Unsplash What is Deep Learning? Deep learning is a subset of machine learning algorithms that use neural networks to learn complex patterns from large amounts of data. Due to advances in computing and the amount of data being acquired, these algorithms … Read more

Categories Featured Excerpt

## R Shiny {golem} – Development to Production – Overview

This blog series follows the development and creation of an R Shiny application. For the purposes of keeping this focused on software development, we’ll be concentrating on Shiny rather than the business use case. As a background story, we’ll be creating an app for the hit TV show, The Office.  We’ll assume the following scenario: … Read more

Categories R Tags Excerpt

A simple guide to help you create one. No Expertise Required! My Website. BG photo by Daniel Leone on Unsplash A personal portfolio website is like a digital resume. It can make a lot of difference in your career. It can be a great platform for you to publish your work and portfolio, it could … Read more

Categories Featured Excerpt

## Visualisation of ranked choice voting in R

Tables with gt and animation with tweenr I recently returned to the R package avr, which runs a range of alternative voting procedures, to add more functionality, and in the process, got to grips with two visualisation packages: gt and tweenr. Each provides a different solution to the problem I had, which was how to … Read more

Categories Featured Excerpt

## Advent of 2020, Day 30 – Monitoring and troubleshooting of Apache Spark

[This article was first published on R – TomazTsql, and kindly contributed to R-bloggers]. Series of Azure Databricks posts: Yesterday we looked into performance tuning … Read more

Categories R Tags Excerpt

## MLxtend: A Python Library with interesting tools for data science tasks

The MLxtend library is developed by Sebastian Raschka (a professor of statistics at the University of Wisconsin-Madison). The library has nice API documentation as well as many examples. You can install the MLxtend package through the Python Package Index (PyPI) by running `pip install mlxtend`. In this post, I’m using the wine data set obtained from … Read more

## Unlock M-Step in Expectation Maximization from GMM

Now that we’ve isolated each component of the equation, let’s combine them into some common mathy phrases that are important to conversing in the language of EM by examining the M-Step. Clusters, Gaussians, the letter J or K and sometimes C: these are all generally the same thing. If we have 3 clusters, then you … Read more

## Introducing AWS Data Exchange Publisher Coordinator and Subscriber Coordinator

Previously, AWS Data Exchange customers were required to manually upload and download their dataset revisions, or create and maintain their own solutions for automation. With the Publisher and Subscriber solutions, customers can now reduce the operational burden of manual processes, and bypass the engineering complexity of building custom automation. AWS Data Exchange Publisher Coordinator and … Read more

Categories AWS Excerpt

## Trends in Data Science That Will Change Business Strategies

From individual skills to business development, data professionals have many opportunities in the next few years. Photo by Paweł Czerwiński on Unsplash In response to an atypical year, companies rely on data and analytics leaders to accelerate innovation and create new routes to generate revenue. However, recent research involving business leaders in the U.S., U.K., … Read more

Categories Featured Excerpt

## The Danger of Overfitting a Model

An Explanation for Splitting Data into Training and Testing Sets Photo by Isaac Smith on Unsplash In my first job out of college, I was tasked with streamlining how a company made purchases. While a big project that encompassed many factors, such as lead times and order quantities, the most challenging part was determining how … Read more

Categories Featured Excerpt

## Python Numpy and Matrices Questions for Data Scientists

I’ve been preparing for Data Science interviews for a while, and the one thing that struck me the most is the lack of preparation for Numpy and Matrices questions. Often, Data Scientists are asked to perform simple matrix operations in Python, which should be straightforward but, unfortunately, throw a lot of candidates off the … Read more

## 5 Books for Data Engineers

Data Engineering Books Building foundations and framing your viewpoint towards data engineering Photo by Ahmad Ossayli on Unsplash About 3 years ago, I started my IT career as a Data Engineer and tried to find day-to-day solutions and answers surrounding the data platform. And, I always hope that there are some resources like the university … Read more

Categories Featured Excerpt

## Be Careful of This Data Science Mistake I Wasted 30 Hours Over

This had happened so many times that — machine learning models being naturally lazy — the model gave up trying to learn and instead memorized the data, something a neural network has no trouble doing at all. Hence, when presented with data it had truly never seen before, it flunked. The model had already seen … Read more

Categories Featured Excerpt

## Breakthroughs in Time Series Forecasting at Neurips 2020

A deep dive into the latest literature in time series forecasting and how you can use them for your business use cases Photo by Brent Ninaber on Unsplash This year at the Neural Information Processing Conference, authors published a number of new papers focusing on time series forecasting and classification. Here I will briefly review … Read more

Categories Featured Excerpt

## Prettify your Terminal Text With Termcolor and Pyfiglet

Bored with Your Terminal Output? Let’s Change its Color and Shape! If you are working with Python, you probably print the output on the terminal either to debug or to be informed of the process. However, if the output is lengthy, it is difficult to keep track of the output. Is there a way that … Read more

## GridSearchCV for Beginners

The results of GridSearchCV can be somewhat misleading the first time around. The best combination of parameters found is more of a conditional “best” combination. This is due to the fact that the search can only test the parameters that you fed into param_grid. There could be a combination of parameters that further improves the … Read more
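The "conditional best" point can be seen with a plain exhaustive search over a parameter grid. The sketch below is a pure-Python stand-in, not scikit-learn's `GridSearchCV` (it does no cross-validation), and the scoring function `toy_score` with its peak at `depth=7`, `rate=0.3` is invented for illustration.

```python
from itertools import product


def grid_search(score_fn, param_grid):
    """Try every combination in param_grid and return (best_params, best_score)."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for combo in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(depth, rate):
    """Hypothetical score surface whose true optimum is depth=7, rate=0.3."""
    return -((depth - 7) ** 2) - 10 * (rate - 0.3) ** 2


narrow_grid = {"depth": [2, 3, 4], "rate": [0.1, 0.2]}
wide_grid = {"depth": [2, 4, 7, 10], "rate": [0.1, 0.3, 0.5]}

narrow_best, narrow_score = grid_search(toy_score, narrow_grid)
wide_best, wide_score = grid_search(toy_score, wide_grid)
```

The narrow grid can only report the best of the values it was given, so it settles on the edge of its range, while the wider grid happens to contain the true optimum and scores higher.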

Categories Featured Excerpt

## Embedding and clustering combining Knime and Python

UMAP dimension reduction and DBSCAN for clustering MNIST database within KNIME Clustering. Olives and leaves. Shapes and colours. (Image by author) Knime is a free and open-source data analytics, reporting and integration platform. KNIME integrates various components for machine learning and data mining through its modular data pipelining concept. For people like me, who do … Read more

Categories Featured Excerpt