Starbucks Offer Dataset — Udacity Capstone

Business Understanding and Data Understanding Let’s first take a look at the data. From the datasets, it is clear that we would need to combine all three datasets in order to perform any analysis. Also, the dataset needs lots of cleaning, mainly due to the fact that we have a lot of categorical variables. I … Read more

Paper Summary & Implementation: Contrastive Learning of General-Purpose Audio Representations.

This post is a short summary and steps to implement the following paper: The objective of this paper is to learn self-supervised general-purpose audio representations using Discriminative Pre-Training. The authors train a 2D CNN EfficientNet-B0 to transform Mel-spectrograms into 1D-512 vectors. Those representations are then transferred to other tasks like Speaker Identification or Bird Song … Read more

An Ultimate Cheat Sheet for Data Visualization in Pandas

All the Basic Types of Visualization That Are Available in Pandas and Some Advanced Visualizations That Are Extremely Useful Time Savers We use Python’s pandas library primarily for data manipulation in data analysis. But we can use Pandas for data visualization as well. You do not even need to import the Matplotlib library for … Read more

The Top 10 Best Places to Find Datasets

Awesome Data is a GitHub repository with a seriously impressive list of datasets separated by category. It is updated regularly. Jeremy Singer-Vine’s Data Is Plural weekly newsletter has great fresh data sources. I’m always impressed by the quality. The archive is available here. In addition to competitions, Kaggle has a huge range of datasets. Kaggle … Read more

Distilling Knowledge in Neural Network

Introduction In traditional machine learning, to achieve state-of-the-art (SOTA) performance, we often train a series of ensemble models for combating the weakness of a single model. However, achieving SOTA performance often comes with large computation using big models with millions of parameters. SOTA models like VGG16/19, ResNet50 have 138+ million and 23+ million parameters respectively. … Read more

How To Build Your Own Chatbot Using Deep Learning

Required Packages The required Python packages are as follows (here I mention the packages with the versions that I used for development). Define Intents I will define a few simple intents and a bunch of messages that correspond to those intents, and also map some responses to each intent category. I will create a JSON … Read more

Sequence Mining My Browsing History with arulesSequences

[This article was first published on R | JLaw’s R Blog, and kindly contributed to R-bloggers]. Typically when thinking of pattern mining people tend to … Read more

How to Write Unstoppable Code With Exception Handling in Python

Fighting against errors in code is the most important part of every programmer’s life. In fact, the errors you make will be your best teacher in programming. Sometimes the code will run into unexpected situations while executing. When the interpreter or compiler is not able to proceed further, it will send … Read more
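
As a rough illustration of the idea in this excerpt, here is a minimal sketch of exception handling in Python. The function name `safe_divide` and the choice to return `None` on failure are assumptions made for this example; they are not taken from the article.

```python
def safe_divide(numerator, denominator):
    """Return numerator / denominator, or None when the division cannot be done.

    Hypothetical helper for illustration only.
    """
    try:
        result = numerator / denominator
    except ZeroDivisionError:
        # Raised when denominator is zero.
        return None
    except TypeError:
        # Raised when the operands do not support division, e.g. a str and an int.
        return None
    else:
        # Runs only when the try block raised no exception.
        return result


print(safe_divide(6, 3))
print(safe_divide(1, 0))
print(safe_divide("a", 2))
```

Catching specific exception types, rather than a bare `except`, keeps unrelated bugs visible instead of silently swallowing them.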

AI is Flawed — Here’s Why

Fairness in AI and AI bias Photo by Bill Oxford on Unsplash Artificial Intelligence has become an integral part of everyone’s lives. Starting from simple tasks like YouTube recommendations to complex life-saving tasks like generating drugs to cure illness, it has become omnipresent. It is influencing our lives in more ways than we realize. But, … Read more

GPT-3 vs PET: Not Big but Beautiful!

An introduction to Pattern Exploiting Training Ever since the advent of transfer learning in Natural Language Processing, larger and larger models have been presented, in order to make more and more complex language tasks possible. But the more complex the models, the more time and data they need to train. The … Read more

Getting Started Guide — Anaconda

Open Source Platform For Python Distribution Photo by Jan Kopřiva on Unsplash With the ever-increasing demand for the Python programming language, the first task that any beginner struggles with is setting up the right development environment. This tutorial aims to introduce you to the Anaconda Platform, a free and open-source distribution of Python and R … Read more

Clear Coding with Overloading in Python

With easy to understand examples! Photo by Zach Kadolph on Unsplash When your Python code grows in size, most probably it becomes unorganised over time. Keeping your code in the same file as it grows makes your code difficult to maintain. At this point, Python modules and packages help you to organize and group your … Read more

Robust Python with Type Hints

Type hints in Python are optional. You can annotate your functions and hint as many things as you want; your scripts will still run regardless of the presence of annotations because Python itself doesn’t use them. You can hint your objects using these three methods: Special comments. Function annotations. Stub files for modules. For demonstration, … Read more
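
The excerpt names three ways to add hints. The sketch below shows the first two (a special type comment and function annotations) and describes the third in a comment. The function names `mean` and `scale` are hypothetical and chosen only for this example.

```python
from typing import List


def mean(values: List[float]) -> float:
    """Function annotations: hints written directly in the signature."""
    return sum(values) / len(values)


def scale(values, factor):
    # type: (List[float], float) -> List[float]
    """Special comment: the older comment-based hint syntax read by type checkers."""
    return [v * factor for v in values]


# Stub files: the same signatures could instead live in a separate .pyi file
# next to the module, for example a line such as
#     def mean(values: List[float]) -> float: ...

# Annotations are stored on the function but not enforced at runtime,
# so passing ints where floats are hinted still works.
print(mean.__annotations__)
print(mean([1, 2, 3]))
print(scale([1.0, 2.0], 2.0))
```

A static checker such as mypy is what actually reads these hints; the interpreter itself only records them.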

There is an R in Reproducibility

We’ve all been there. You are asked to quickly rerun one of the complex analysis projects you completed a few months ago. A new cut of the data has arrived and the new results need to be generated quickly. Confident, you open the folder with all the analysis and output scripts and get to … Read more

All the ways to initialize your neural network

In this article, I evaluate the many ways of weight initialization and current best practices. Initializing weights to zero DOES NOT WORK. Then why have I mentioned it here? To understand the need for weight initialization, we need to understand why initializing weights to zero WON’T work. Fig 1. Simple Network. Image by the Author. … Read more

AI Understanding: What is an Elephant?

The picture above shows seven alternative elephant pictures and their DenseNet [9] classifications implemented in Keras with pre-trained ImageNet weights. As we can see, in many cases, the model recognizes the format (drawing, plush) instead of the animal in the image. Only two of the predictions have an elephant in the top-5 predictions. The images … Read more

AWS Database Migration Service now supports Amazon DocumentDB (with MongoDB compatibility) as a source

AWS Database Migration Service (AWS DMS) has expanded functionality by adding support for Amazon DocumentDB (with MongoDB compatibility) as a source. Amazon DocumentDB (with MongoDB compatibility) is a fast, scalable, highly available, and fully managed document database service that supports MongoDB workloads. As a document database, Amazon DocumentDB makes it easy to store, query, and … Read more

AWS Database Migration Service Now Supports Parallel Full Load for Amazon DocumentDB (with MongoDB compatibility) and MongoDB

AWS Database Migration Service (AWS DMS) helps you migrate databases to AWS quickly and securely. With this launch, AWS DMS now supports parallel full load with the range segmentation option when using Amazon DocumentDB (with MongoDB compatibility) and MongoDB as a source. You can accelerate the migration of large collections by splitting them into segments … Read more

Two Simple Methods to Sort a Python Dictionary

Photo by Aung Soe Min on Unsplash And the difference between sort() and sorted() A dictionary is an important data structure that stores data by mapping keys to values. Before Python 3.7, the default dictionaries in Python were unordered data structures; since 3.7 they preserve insertion order but are still not sorted. Like lists, we can use the sorted() function to sort the dictionary by keys. However, it will only … Read more
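
A minimal sketch of the two approaches follows, assuming Python 3.7 or later, where a `dict` keeps insertion order. The `prices` data is made up for this example.

```python
# Hypothetical example data.
prices = {"pear": 3, "apple": 5, "fig": 1}

# sorted() on a dictionary iterates over its keys and returns a new list.
sorted_keys = sorted(prices)

# To keep the values, sort the (key, value) pairs and rebuild a dict.
by_key = dict(sorted(prices.items()))
by_value = dict(sorted(prices.items(), key=lambda item: item[1]))

# sort() exists only on lists and sorts in place; dictionaries have no sort().
names = list(prices)
names.sort(reverse=True)

print(sorted_keys)
print(by_key)
print(by_value)
print(names)
```

The original `prices` dictionary is left unchanged by every call above, because `sorted()` always returns a new list.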

Teaching Binary Search to Someone who has No Technical Knowledge

Tip #1: Communicate the problem as if it has nothing to do with Programming Polina Tankilevitch | Pexels Teaching basic algorithms can be one of the most challenging parts of introducing someone to Computer Science. Between the hypothetical problems and abstract thinking, it can feel overwhelming. But it doesn’t have to be. Unpopular opinion: Any … Read more
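
For readers who do want to see the algorithm itself, here is a minimal sketch of iterative binary search over a sorted list. The function name, the `-1` return value for a missing item, and the sample data are assumptions made for this example.

```python
def binary_search(sorted_items, target):
    """Return the index of target in sorted_items, or -1 if it is absent.

    The list must already be sorted in ascending order.
    """
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            # The target can only be in the upper half.
            low = mid + 1
        else:
            # The target can only be in the lower half.
            high = mid - 1
    return -1


page_numbers = [2, 5, 8, 13, 21, 34]  # hypothetical sorted data
print(binary_search(page_numbers, 13))
print(binary_search(page_numbers, 4))
```

Each pass discards half of the remaining range, which is the same intuition as opening a book near the middle and deciding which half to keep.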

Computer Vision and Camera Calibration for Self Driving Cars

Pinhole Camera Model and Types of Distortion Before we start correcting for distortion, let’s get some intuition as to how this distortion occurs. Here’s a simple model of a camera called the pinhole camera model. When the camera forms an image, it’s looking at the world similar to how our eyes do. By focusing the light that’s … Read more

Amazon Chime SDK for JavaScript now enables meeting health monitoring and troubleshooting

Meeting events make it simple to automatically save to your AWS account seven event types and 22 attributes related to audio video sessions on the Amazon Chime SDK, including device and environment information, SDK version, and network conditions. Based on these analytics, you can create custom Amazon CloudWatch dashboards to monitor the performance of your … Read more

Amazon Cognito User Pools enables easy quota management and usage tracking

Amazon Cognito User Pools now enables you to manage quotas for commonly used operation categories, such as user creation and user authentication, as well as view quotas and usage levels in the AWS Service Quotas dashboard or in CloudWatch metrics. This update makes it simple to view your quota usage of and request rate increases … Read more

A few tips to improve your data presentations

Step 7: Add percentage of increase to answer reader’s questions We are now very tempted to quantify how much the 10% dropout accuracy helped over the 0% dropout setting. To help the reader with that, we add this percentage: Image by Author And we’re done! The final matplotlib code is: Conclusion … Read more

Implementing a stateful architecture with Streamlit

Using PostgreSQL to create stateful, multi-page applications with Streamlit Photo by Mitchell Luo on Unsplash Streamlit Streamlit has come a long way since its inception back in October of 2019. It has empowered the software development community and has effectively democratized the way we develop and deploy apps to the cloud. However as with all … Read more

5 Uncommon Storage Files in Python

How to download files from the web, Bonus 2 In daily work life, we don’t always have the files we want to be saved locally. We might have to download them from remote servers or interact with them in place. While it is possible to download the files by hand, it would be a shame … Read more

Accelerate Complicated Statistical Test for Data Scientist with Pingouin

According to the Pingouin homepage, this package is designed for users who want simple yet exhaustive stats functions. It was designed that way because some functions, such as the t-test in SciPy, return only the T-value and the p-value, when sometimes we want more explanation regarding the data. In the Pingouin package, the calculation … Read more

beakr – A small web framework for R

[This article was first published on R, and kindly contributed to R-bloggers]. What is beakr? beakr is an unopinionated and minimalist web framework for developing … Read more

World Cities Day 2020: Committing to sustainable urban environments

October 31 is World Cities Day, established to educate the public on issues of concern, to mobilize political will and resources to address global problems, and to celebrate and reinforce achievements of humanity. This year’s theme ‘Better City, Better Life: Valuing Our Communities and Cities’ is expected to greatly promote the international community’s interest in … Read more

These inspiring small and medium businesses are helping the world navigate COVID-19 together (Head of EMEA SMB Sales, Google Cloud)

No matter where you are in the world, we’ve all had to adapt our routines to navigate the “new normal” brought on by COVID-19. We’ve been inspired by the ways businesses and organizations have responded, from accelerating medical research using the latest AI technology to enabling remote healthcare to protect doctors and their patients. As … Read more


Rendezvous of Python, SQL, Spark, and Distributed Computing making Machine Learning on Big Data possible Photo by Ben Weber on Unsplash Shilpa was immensely happy with her first job as a data scientist in a promising startup. She was in love with SciKit-Learn libraries and especially Pandas. It was fun to perform data exploration using … Read more

Simple Edge Detection Model using Python

Now, we can move to the fun part, where we will write the edge detection function. You will be amazed at how simple it is using the OpenCV package. This OpenCV detection model is also known as the Canny edge detection model. Our function consists of three parts: edge detection, visualization, and lastly, saving … Read more

What is the K-Nearest Neighbor?

An introduction with a Python example Photo by Jon Tyson on Unsplash K-Nearest Neighbor (KNN) is an easy to understand, but essential and broadly applicable supervised machine learning technique. To understand the intuition behind KNN, examine the scatterplot below. The plot shows the relationship between two arbitrary dimensions, x and y. The blue points represent … Read more
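
The intuition can be sketched in a few lines of plain Python: find the k training points closest to a query point and take a majority vote over their labels. This is a minimal sketch, not the article's code. The function name `knn_predict`, the sample points, and the label "red" for the second class are assumptions made for this example, and `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of query by majority vote among its k nearest neighbors."""
    # Pair every training point's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    # Count the labels of the k closest points and return the most common one.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical 2D data: two well-separated clusters.
points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
labels = ["blue", "blue", "blue", "red", "red", "red"]

print(knn_predict(points, labels, (1.5, 1.5)))
print(knn_predict(points, labels, (6.5, 6.5)))
```

An odd k is a common choice for two-class problems because it avoids tied votes.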

Scaling Your Time Series Forecasting Project

Our Design — Based on separate building blocks working together For our system to scale and accommodate more features, more models and eventually more forecasters we divided it into the above building blocks. Each building block stands on its own but also knows how to communicate and work with others. A short explanation of each … Read more

Tableau’s AI-enabled Explain Data feature

A speedy intro to a Tableau feature that starts explaining the ‘Why’ of data points. No extra set up required. Image by chenspec from Pixabay One of the reasons data analysts create data visualizations is to find unexpected and unusual scenarios. The easier it is to get some more detailed information directly within the viz, … Read more

Compact prediction tree

Before entering into details of how it is used for making a prediction, let’s describe the different elements that compose a Compact Prediction Tree (CPT): A trie, for efficient storage of sequences. An inverted index, for constant time retrieving of sequences containing a certain word. A lookup table, for retrieving the sequence from the sequence … Read more
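
The three elements listed above can be sketched as plain Python structures. This is a minimal sketch that only builds the structures and reads one sequence back; it does not implement the prediction step. The names `TrieNode`, `build_cpt`, and `retrieve_sequence` are hypothetical, and mapping each sequence ID to its last trie node is an assumption about how the lookup table is organized.

```python
class TrieNode:
    """One node of the trie; each node holds a single item of a sequence."""

    def __init__(self, item=None, parent=None):
        self.item = item
        self.parent = parent
        self.children = {}


def build_cpt(sequences):
    """Build the trie, the inverted index, and the lookup table."""
    root = TrieNode()
    inverted_index = {}  # item -> set of IDs of the sequences that contain it
    lookup_table = {}    # sequence ID -> last trie node of that sequence
    for seq_id, sequence in enumerate(sequences):
        node = root
        for item in sequence:
            # Shared prefixes reuse existing nodes, which keeps storage compact.
            if item not in node.children:
                node.children[item] = TrieNode(item, node)
            node = node.children[item]
            inverted_index.setdefault(item, set()).add(seq_id)
        lookup_table[seq_id] = node
    return root, inverted_index, lookup_table


def retrieve_sequence(lookup_table, seq_id):
    """Rebuild a stored sequence by walking from its last node up to the root."""
    items = []
    node = lookup_table[seq_id]
    while node.parent is not None:
        items.append(node.item)
        node = node.parent
    return items[::-1]


# Hypothetical training sequences.
training = [["a", "b", "c"], ["a", "b", "d"], ["b", "c"]]
root, inverted_index, lookup_table = build_cpt(training)
print(inverted_index["b"])
print(retrieve_sequence(lookup_table, 1))
```

Looking up an item in `inverted_index` is a single dictionary access, which is where the constant-time retrieval of matching sequence IDs comes from.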

Extracting shift handover data from paper forms in a hospital environment

Now that the project has been deployed to production, it’s a good time to look at it retrospectively and reflect on the various lessons that I learnt from doing it. 1. The training process There should be 3 main stages in predicting the contents of a form using the supervised learning approach: detect the text elements (printed … Read more

A Complete Logistic Regression Algorithm From Scratch in Python: Step by Step

Developed the Algorithm Using a Real-World Dataset Logistic regression has been a popular method since the last century. It establishes the relationship between a categorical variable and one or more independent variables. This relationship is used in machine learning to predict the outcome of a categorical variable. It is widely used in many different fields such … Read more