Business Understanding and Data Understanding Let’s first take a look at the data. From the datasets, it is clear that we would need to combine all three datasets in order to perform any analysis. Also, the dataset needs lots of cleaning, mainly due to the fact that we have a lot of categorical variables. I … Read more Starbucks Offer Dataset — Udacity Capstone
In simple terms, a vector is a quantity that has both a magnitude and a direction (elementary physics/math). In SVM, the magnitude is the numerical value, which can be calculated by taking the modulus, that is, the square root of the sum of the squared x and y coordinates, similar to the hypotenuse in the Pythagorean theorem. … Read more Breaking Down the Support Vector Machine (SVM) Algorithm
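As a rough illustration of the magnitude and direction idea in the excerpt above, here is a minimal Python sketch. The function names `magnitude` and `direction` are made up for this example and do not come from the linked article.

```python
import math


def magnitude(x: float, y: float) -> float:
    """Length of the 2-D vector (x, y): the square root of x**2 + y**2."""
    return math.hypot(x, y)


def direction(x: float, y: float) -> tuple:
    """Unit vector pointing the same way as (x, y)."""
    m = magnitude(x, y)
    if m == 0:
        raise ValueError("the zero vector has no direction")
    return (x / m, y / m)


# A 3-4-5 right triangle: the vector (3, 4) has magnitude 5.
print(magnitude(3.0, 4.0))
print(direction(3.0, 4.0))
```

The magnitude is the hypotenuse of the right triangle whose legs are the x and y coordinates, which is why `math.hypot` fits here.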
This post is a short summary and steps to implement the following paper: The objective of this paper is to learn self-supervised general-purpose audio representations using Discriminative Pre-Training. The authors train a 2D CNN EfficientNet-B0 to transform Mel-spectrograms into 1D-512 vectors. Those representations are then transferred to other tasks like Speaker Identification or Bird Song … Read more Paper Summary & Implementation: Contrastive Learning of General-Purpose Audio Representations.
All the Basic Types of Visualization That Is Available in Pandas and Some Advanced Visualization That Are Extremely Useful and Time Saver We use Python’s pandas library primarily for data manipulation in data analysis. But we can use Pandas for data visualization as well. You do not even need to import the Matplotlib library for … Read more An Ultimate Cheat Sheet for Data Visualization in Pandas
Awesome Data is a GitHub repository with a seriously impressive list of datasets separated by category. It is updated regularly. Jeremy Singer-Vine’s Data Is Plural weekly newsletter has great fresh data sources. I’m always impressed by the quality. The archive is available here. In addition to competitions, Kaggle has a huge range of datasets. Kaggle … Read more The Top 10 Best Places to Find Datasets
Introduction In traditional machine learning, to achieve state-of-the-art (SOTA) performance, we often train a series of ensemble models for combating the weakness of a single model. However, achieving SOTA performance often comes with large computation using big models with millions of parameters. SOTA models like VGG16/19, ResNet50 have 138+ million and 23+ million parameters respectively. … Read more Distilling Knowledge in Neural Network
Required Packages The required Python packages are as follows (I have listed the packages with the versions I used for development): Define Intents I will define a few simple intents and a bunch of messages that correspond to those intents, and also map some responses to each intent category. I will create a JSON … Read more How To Build Your Own Chatbot Using Deep Learning
To get the benefits of measure theory for the box volume the mathematician starts to work his way through the checklist. As a first step, he defines his box measure mathematically. A box in three dimensions can be written as a set: where w₁ ≤ w₂, h₁ ≤ h₂, d₁ ≤ d₂ are all real … Read more How to make measure theory usable for your problem?
[This article was first published on R | JLaw’s R Blog, and kindly contributed to R-bloggers]. Typically when thinking of pattern mining people tend to … Read more Sequence Mining My Browsing History with arulesSequences
Fighting against the errors in code is the most important part of every programmer’s life. In fact, the errors you make will be your best teacher in programming. Sometimes, the code will run into unexpected situations while executing. When the interpreter or compiler is not able to proceed further it will send … Read more How to Write Unstoppable Code With Exception Handling in Python
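To make the idea of handling unexpected situations concrete, here is a small hedged sketch of Python's `try` / `except` / `else` blocks. The helper `safe_divide` is a hypothetical name chosen for this illustration and is not taken from the linked article.

```python
def safe_divide(a, b):
    """Return a / b, or None when the division cannot be carried out."""
    try:
        result = a / b
    except ZeroDivisionError:
        # Raised by the interpreter when b is zero.
        return None
    except TypeError:
        # Raised when the operands do not support division, e.g. a string.
        return None
    else:
        # Runs only when no exception was raised in the try block.
        return result


print(safe_divide(6, 3))
print(safe_divide(1, 0))
print(safe_divide("a", 1))
```

Catching specific exception types, rather than a bare `except`, keeps unrelated bugs visible instead of silently swallowing them.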
As a student about to graduate from college, I was all ready to start #adulting and a part of that was getting a new car. Of course, I soon realised that (a) I did not have any money, and (b) cars in Singapore are prohibitively expensive. So, I did what anyone in my situation would … Read more How much does a car’s brand affect its price?
Fairness in AI and AI bias Artificial Intelligence has become an integral part of everyone’s lives. Starting from simple tasks like YouTube recommendations to complex life-saving tasks like generating drugs to cure illness, it has become omnipresent. It is influencing our lives in more ways than we realize. But, … Read more AI is Flawed — Here’s Why
An introduction to Pattern Exploiting Training Ever since the advent of transfer learning in Natural Language Processing, larger and larger models have been presented, in order to make more and more complex language tasks possible. But the more complex the models, the more time and data they need to train. The … Read more GPT-3 vs PET: Not Big but Beautiful!
Doing cool things with data The 2020 elections in the US are around the corner. Fake News published on social media is a HUGE problem around the election time. While some of the Fake News is produced purposefully for skewing election results or to make a quick buck through advertisement, false information can also be … Read more Election Special: Detect Fake News using Transformers
Over the last few months, we’ve been collecting hundreds of COVID-19 blog posts from the R community. Today, we are excited to share this dataset publicly, to help bloggers who want to analyze COVID-19 data by unleashing R and the resources of its community by being able to research such posts. So far, we have … Read more COVID-19 Posts: A Public Dataset Containing 400+ COVID-19 Blog Posts
Open Source Platform For Python Distribution With the ever-increasing demand for the Python programming language, the first task that any beginner struggles with is setting up the right development environment. This tutorial aims to introduce you to the Anaconda Platform, a free and open-source distribution of Python and R … Read more Getting Started Guide — Anaconda
Some of you may have noticed that it’s been a while since my last article. That’s because I’ve become a dad in the meantime, and I’ve had to take a momentary break from my projects to deal with some parental tasks that can’t (yet) be automated. Or, can they? While we’re probably still a few … Read more Create your own smart baby monitor with a RaspberryPi and Tensorflow
Whether you’re starting on a fresh project, or running on a remote machine, you don’t want to waste time chasing down dependencies and installing software libraries. This tutorial will provide one of the fastest ways to get set up from a blank slate. I have tested this approach by completing it on a plain no-frills … Read more Setting Up a New PyTorch Deep Learning Environment
With easy to understand examples! When your Python code grows in size, it will most probably become unorganised over time. Keeping your code in the same file as it grows makes it difficult to maintain. At this point, Python modules and packages help you to organize and group your … Read more Clear Coding with Overloading in Python
Type hints in Python are optional. You can annotate your functions and hint as many things as you want; your scripts will still run regardless of the presence of annotations because Python itself doesn’t use them. You can hint your objects using these three methods: Special comments. Function annotations. Stub files for modules. For demonstration, … Read more Robust Python with Type Hints
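The three hinting methods listed in the excerpt above can be sketched roughly as follows. The function names and the stub file name `scale.pyi` are hypothetical, chosen only for this illustration.

```python
from typing import List


# 1. Special comments: the older, comment-based syntax.
def scale_comment(values, factor):
    # type: (List[float], float) -> List[float]
    return [v * factor for v in values]


# 2. Function annotations: hints written directly in the signature.
def scale_annotated(values: List[float], factor: float) -> List[float]:
    return [v * factor for v in values]


# 3. Stub files: a separate file (here a hypothetical scale.pyi) would
#    hold only the signature, for example:
#        def scale(values: List[float], factor: float) -> List[float]: ...

# Annotations are stored on the function but are not enforced at runtime,
# so a call with the "wrong" types still runs.
print(scale_annotated.__annotations__)
print(scale_annotated(["ab"], 2))
```

A static checker such as mypy would flag the last call, while the interpreter itself runs it without complaint.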
We’ve all been there. You are asked to quickly rerun one of the complex analysis projects you completed a few months ago. A new cut of the data has arrived and the new results need to be generated quickly. Confident, you open the folder with all the analysis and output scripts and get to … Read more There is an R in Reproducibility
In this article, I evaluate the many ways of weight initialization and current best practices. Initializing weights to zero DOES NOT WORK. Then why have I mentioned it here? To understand the need for weight initialization, we need to understand why initializing weights to zero WON’T work. Fig 1. Simple Network. Image by the Author. … Read more All the ways to initialize your neural network
The picture above shows seven alternative elephant pictures and their DenseNet classifications implemented in Keras with pre-trained ImageNet weights. As we can see, in many cases, the model recognizes the format (drawing, plush) instead of the animal in the image. Only two of the predictions have an elephant in the top-5 predictions. The images … Read more AI Understanding: What is an Elephant?
AWS Database Migration Service (AWS DMS) has expanded functionality by adding support for Amazon DocumentDB (with MongoDB compatibility) as a source. Amazon DocumentDB (with MongoDB compatibility) is a fast, scalable, highly available, and fully managed document database service that supports MongoDB workloads. As a document database, Amazon DocumentDB makes it easy to store, query, and … Read more AWS Database Migration Service now supports Amazon DocumentDB (with MongoDB compatibility) as a source
Prologue So, right now I want to test my skills using PyTorch. I am new to PyTorch; previously I used TensorFlow to build ML models. I have already read all the basic tutorials and am ready to build a real model using real data. What is your favorite site to find real data to test your … Read more Significant Wave Prediction Using Neural Network
We were excited to see the power of BigQuery receive a further boost this week with the release of 12 new BigQuery SQL features. Google Cloud describes these as “user-friendly SQL capabilities”. So, let’s take a look at what’s now possible. New to BigQuery is the ability to add new columns via the ALTER TABLE … Read more Just Released: BigQuery User friendly SQL Functions
AWS Database Migration Service (AWS DMS) helps you migrate databases to AWS quickly and securely. With this launch, AWS DMS now supports parallel full load with the range segmentation option when using Amazon DocumentDB (with MongoDB compatibility) and MongoDB as a source. You can accelerate the migration of large collections by splitting them into segments … Read more AWS Database Migration Service Now Supports Parallel Full Load for Amazon DocumentDB (with MongoDB compatibility) and MongoDB
And the difference between sort() and sorted() A dictionary is an important data structure that stores data by mapping keys to values. Dictionaries in Python are not sorted by key; since Python 3.7 they simply preserve insertion order. Like lists, we can use the sorted() function to sort the dictionary by keys. However, it will only … Read more Two Simple Method to Sort a Python Dictionary
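A short sketch of the sort() versus sorted() distinction mentioned above, using a made-up `prices` dictionary. It relies on dictionaries preserving insertion order, which is guaranteed from Python 3.7 onward.

```python
prices = {"pear": 3, "apple": 5, "fig": 1}

# sorted() on a dictionary iterates over its keys and returns a new list.
sorted_keys = sorted(prices)

# Sorting the (key, value) pairs and rebuilding a dict keeps the values.
by_key = dict(sorted(prices.items()))
by_value = dict(sorted(prices.items(), key=lambda pair: pair[1]))

# list.sort() sorts a list in place and returns None; dicts have no sort().
names = list(prices)
names.sort()

print(sorted_keys)
print(by_key)
print(by_value)
print(names)
```

The original `prices` dictionary is left untouched in every case, because `sorted()` always builds a new object.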
Tip #1: Communicate the problem as if it has nothing to do with Programming Teaching basic algorithms can be one of the most challenging parts of introducing someone to Computer Science. Between the hypothetical problems and abstract thinking, it can feel overwhelming. But it doesn’t have to be. Unpopular opinion: Any … Read more Teaching Binary Search to Someone who has No Technical Knowledge
Pinhole Camera Model and Types of Distortion Before we start correcting for distortion, let’s get some intuition as to how this distortion occurs. Here’s a simple model of a camera called the pinhole camera model. When the camera forms an image, it’s looking at the world similar to how our eyes do. By focusing the light that’s … Read more Computer Vision and Camera Calibration for Self Driving Cars
Amazon Cognito User Pools now enables you to manage quotas for commonly used operation categories, such as user creation and user authentication, as well as view quotas and usage levels in the AWS Service Quotas dashboard or in CloudWatch metrics. This update makes it simple to view your quota usage of and request rate increases … Read more Amazon Cognito User Pools enables easy quota management and usage tracking
Step 7: Add percentage of increase to answer reader’s questions We are now very tempted to quantify how much the 10% dropout accuracy helped over the 0% dropout setting. To help the reader with that, we add this percentage: And we’re done! The final matplotlib code is: Conclusion … Read more A few tips to improve you data presentations
Using PostgreSQL to create stateful, multi-page applications with Streamlit Streamlit has come a long way since its inception back in October of 2019. It has empowered the software development community and has effectively democratized the way we develop and deploy apps to the cloud. However, as with all … Read more Implementing a stateful architecture with Streamlit
How to write your first scraper with a few lines of code Data science is only possible with data, and in the real world, the data is usually not waiting for you in a .csv file. You have to go after it. That’s why web scraping is very important … Read more A Gentle Introduction to Web Scraping with Python
How to download files from the web, Bonus 2 In daily work life, we don’t always have the files we want to be saved locally. We might have to download them from remote servers or interact with them in place. While it is possible to download the files by hand, it would be a shame … Read more 5 Uncommon Storage Files in Python
According to the Pingouin homepage, this package is designed for users who want simple yet exhaustive stats functions. It was designed that way because some functions, such as the t-test from SciPy, return only the T-value and the p-value when sometimes we want more explanation regarding the data. In the Pingouin package, the calculation … Read more Accelerate Complicated Statistical Test for Data Scientist with Pingouin
[This article was first published on http://r-addict.com, and kindly contributed to R-bloggers]. A month ago we finished the Why R? 2020 conference. We had a pleasure … Read more Roger Bivand – Applied Spatial Data Analysis with R – retrospect and prospect
[This article was first published on R, and kindly contributed to R-bloggers]. What is beakr? beakr is an unopinionated and minimalist web framework for developing … Read more beakr – A small web framework for R
October 31 is World Cities Day, established to educate the public on issues of concern, to mobilize political will and resources to address global problems, and to celebrate and reinforce achievements of humanity. This year’s theme ‘Better City, Better Life: Valuing Our Communities and Cities’ is expected to greatly promote the international community’s interest in … Read more World Cities Day 2020: Committing to sustainable urban environments
No matter where you are in the world, we’ve all had to adapt our routines to navigate the “new normal” brought on by COVID-19. We’ve been inspired by the ways businesses and organizations have responded, from accelerating medical research using the latest AI technology to enabling remote healthcare to protect doctors and their patients. As … Read more These inspiring small and medium businesses are helping the world navigate COVID-19 together (Head of EMEA SMB Sales, Google Cloud)
Rendezvous of Python, SQL, Spark, and Distributed Computing making Machine Learning on Big Data possible Shilpa was immensely happy with her first job as a data scientist in a promising startup. She was in love with SciKit-Learn libraries and especially Pandas. It was fun to perform data exploration using … Read more PySpark
Now, we can move to the fun part, where we will write the edge detection function. You will be amazed at how simple it is using the OpenCV package. This OpenCV detection model is also known as the Canny edge detection model. Our function consists of three parts: edge detection, visualization, and lastly, saving … Read more Simple Edge Detection Model using Python
I recently adopted an extensive collection of notebooks that, combined, aid in the creation of analytics. It sounded like a daunting project to take on, but the more I read into the code, the more I realized it wasn’t all that bad. The notebooks looked overwhelming, but the code was relatively simple when broken down … Read more Keys to Success when Adopting a Pre-Existing Data Science Project
An introduction with a Python example K-Nearest Neighbor (KNN) is an easy to understand, but essential and broadly applicable supervised machine learning technique. To understand the intuition behind KNN, examine the scatterplot below. The plot shows the relationship between two arbitrary dimensions, x and y. The blue points represent … Read more What is the K-Nearest Neighbor?
Our Design: Based on separate building blocks working together For our system to scale and accommodate more features, more models and eventually more forecasters, we divided it into the above building blocks. Each building block stands on its own but also knows how to communicate and work with the others. A short explanation of each … Read more Scaling Your Time Series Forecasting Project
A speedy intro to a Tableau feature that starts explaining the ‘Why’ of data points. No extra set up required. One of the reasons data analysts create data visualizations is to find unexpected and unusual scenarios. The easier it is to get some more detailed information directly within the viz, … Read more Tableau’s AI-enabled Explain Data feature
Before going into the details of how it is used for making a prediction, let’s describe the different elements that compose a Compact Prediction Tree (CPT): A trie, for efficient storage of sequences. An inverted index, for constant-time retrieval of sequences containing a certain word. A lookup table, for retrieving the sequence from the sequence … Read more Compact prediction tree
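The three elements named in the excerpt above can be sketched as follows. This is a minimal illustration under stated assumptions, not the linked article's implementation: the names `TrieNode`, `build_cpt` and `retrieve` are hypothetical, and the lookup table is assumed to map a sequence id to the trie node that ends that sequence.

```python
from collections import defaultdict


class TrieNode:
    """One item in the trie, linked to its parent so a path can be walked back."""

    def __init__(self, item=None, parent=None):
        self.item = item
        self.parent = parent
        self.children = {}


def build_cpt(sequences):
    """Build the trie, inverted index and lookup table from training sequences."""
    root = TrieNode()
    inverted_index = defaultdict(set)  # item -> ids of sequences containing it
    lookup_table = {}                  # sequence id -> last node of that sequence
    for seq_id, sequence in enumerate(sequences):
        node = root
        for item in sequence:
            if item not in node.children:
                node.children[item] = TrieNode(item, node)
            node = node.children[item]
            inverted_index[item].add(seq_id)
        lookup_table[seq_id] = node
    return root, inverted_index, lookup_table


def retrieve(lookup_table, seq_id):
    """Rebuild a sequence by walking from its last node back up to the root."""
    node = lookup_table[seq_id]
    items = []
    while node.parent is not None:
        items.append(node.item)
        node = node.parent
    return items[::-1]


root, index, table = build_cpt([["a", "b", "c"], ["a", "b", "d"], ["b", "c"]])
print(sorted(index["c"]))
print(retrieve(table, 1))
```

Sequences that share a prefix, such as the first two above, share trie nodes, which is where the storage saving comes from.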
Now that the project has been deployed to production, it’s a good time to look at it retrospectively and reflect on the various lessons that I learnt from doing it. 1. The training process There should be 3 main stages in predicting the contents of a form using the supervised learning approach: detect the text elements (printed … Read more Extracting shift handover data from paper forms in a hospital environment
Developed the Algorithm Using a Real-World Dataset Logistic regression has been a popular method since the last century. It establishes the relationship between a categorical variable and one or more independent variables. This relationship is used in machine learning to predict the outcome of a categorical variable. It is widely used in many different fields such … Read more A Complete Logistic Regression Algorithm From Scratch in Python: Step by Step