Source: Unsplash (They kind of suck) Convolutional Neural Networks (CNNs) have shown impressive state-of-the-art performance on multiple standard datasets, and no doubt they have been instrumental in the development and research acceleration around the field of image processing. There’s one problem: they kind of suck. Researchers often have a problem of getting too wrapped in … Read more We Need to Rethink Convolutional Neural Networks
Photo by JESHOOTS.COM on Unsplash Generative Adversarial Networks (GANs) are generative models. They generate whole images in parallel. GANs consist of 2 networks: Discriminator & Generator networks Image from Source: Google Developers. Licensed under the Creative Commons Attribution 4.0 License. GANs use a differentiable function. This is usually a neural network. We call it the … Read more Generative Adversarial Networks
R Programming for Data Science by Roger D. Peng Description: This book brings the fundamentals of R programming to you, using the same material developed as part of the industry-leading Johns Hopkins Data Science Specialization. The skills taught in this book will lay the foundation for you to begin your journey learning data science. Exploratory … Read more The Best Free Data Science eBooks — 2020 Update
Bar Chart Race GIF COVID-19 has battered many countries for over eight months. Millions of cases have been confirmed, and the number keeps getting higher every day. Thanks to Google, which has provided the Corona dataset publicly and for free, it is possible for us to do our own analysis related to this pandemic, for … Read more Python Bar Chart Race Animation: COVID-19 Cases
How to access and map population data in Python Photo by Ryan Wilson on Unsplash Today I’m taking a look at the racial composition of Seattle, according to the 2010 Census. Towards this end, I’ll use Integrated Public Use Microdata Series (IPUMS) National Historical Geographic Information System (NHGIS). You can also use data.census.gov, which I … Read more Mapping Census Data
Biology is the study of living organisms. It’s huge. Live with it. Lucky for us though, your problem is much more specific. It’s an interesting question whether this is a function of the human preference for categorisation but if you go to the Wikipedia list of unsolved problems in biology you will see every problem … Read more Data Scientists: Think like Biologists!
“GPT-3 impressively explains the origin of everything” Kirk Ouimet’s article is a dialogue between himself and GPT-3, which is referred to as ‘Wise Being’. The dialogue covers the origin of the Big Bang and other associated topics such as time, space and the Universe. I was truly expecting to be bored … Read more Interesting AI/ML Articles You Should Read This Week (Sep 19)
It’s the 15th of February 2013. A bus-sized asteroid enters the atmosphere and explodes a few kilometres above the ground, causing a shock wave that damages property and injures several people in the Russian city of Chelyabinsk. Why did that happen? Well… our cosmic vicinity is populated with thousands or hundreds of thousands of so-called minor bodies; … Read more Space Science with Python — Asteroid Project (Part 1)
Long Short Term Memory (LSTM) models are a powerful type of neural network ideally suited to predict time-dependent data. Rhine water levels fit right into this category: they vary over time, depending on a range of variables such as rain, temperatures and snow cover in the Alps. The Rhine is Europe’s lifeblood. For centuries it … Read more Using machine learning to predict Rhine water levels
Step 1: Install OptimalFlow: If you haven’t installed OptimalFlow’s latest version, you should do so from PyPI using pip. To install OptimalFlow’s latest version, run this command in your terminal or prompt: pip install --upgrade optimalflow Step 2: Download Web App’s source code: There are 2 ways to download its source code: Method 1 … Read more Build No-code Automated Machine Learning Model with OptimalFlow Web App
Before we jump into this, let’s explain what we need to have in place — I’ll be quick, promise! Setup preparation Amazon S3 Amazon S3 is a storage service allowing us to store and protect our data in directories (Buckets). We will need this service to go forward Buckets: is a container for objects stored … Read more Machine Learning on AWS SageMaker
We take the average value of each layer, called μB. It is calculated as the sum of all values x_i of the layer divided by the number of values m. Mean. Image by the author. We then calculate the variance σ²B as follows: 1. Subtract μB from every value, which gives the deviation of … Read more What is batch normalization?
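The two quantities described in this excerpt can be sketched in a few lines of plain Python. This is a minimal illustration, not the article's own code: the input values are made up, and the variance line assumes the usual "average of squared deviations" form, since the excerpt cuts off before spelling it out.

```python
# Hypothetical activations for one layer; the numbers are invented for illustration.
x = [2.0, 4.0, 4.0, 6.0]
m = len(x)

# Mean (mu_B): sum of all values x_i divided by the number of values m.
mu_b = sum(x) / m

# Variance (sigma^2_B): average of the squared deviations from the mean.
# Assumption: this is the standard population variance, not confirmed by the excerpt.
var_b = sum((xi - mu_b) ** 2 for xi in x) / m

print(mu_b, var_b)
```

Dividing by m rather than m - 1 follows the excerpt's wording for the mean; a different estimator would only change the last line.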
This post is a continuation of the one mentioned below. In the above post, I presented some important programming takeaways to keep in mind while performing Machine Learning practices to make your implementation faster and more effective. Here we are going to see more of these hacks. Let us begin. The most … Read more ML Programming Hacks that every Data Engineer should know — Part 2
A bar graph 📊 (also known as a bar chart or bar diagram) is a visual tool that lets readers compare data across categories using bars. In this story, I try to introduce how we can draw a clear bar plot with Python. As a student or researcher, you have to publish your efforts … Read more How to draw a bar graph for your scientific paper with python
Data science enables many pretty amazing tasks for its practitioners, and has changed our lives in many ways, from small to big. When a business predicts demand for a product, when a company identifies fraudulent transactions online or when a streaming service recommends what to watch, data science is often the oil that enables these innovations. … Read more Learn linear regression using scikit-learn and NBA data: Data science with sports
In this guide, I want to show you how to make time-series predictions of revenues based on real-life retail data. For these tasks I will be using a very common library: Prophet, developed by scientists at Facebook. Why Prophet? According to the Prophet GitHub page: “A tool for producing high-quality forecasts for time series data that … Read more An End-To-End Time Series Data Science Project That Will Boost your portfolio
To set the expectation, you must be aware and ready for the following: Genuine curiosity. You must be curious about the other person — just like how you are curious about insights in data as a Data Scientist. Be patient. Building relationships takes time and interactions. Don’t expect to get referrals after the first interaction … Read more How to get job referrals as a shy data scientist?
In this post, I am sharing my exploration of Tabular Prediction (predicting the target column of a tabular dataset using the remaining columns) using Auto Machine Learning (AutoML), AutoGluon from AWS labs, and details around its internal workings. AutoML frameworks provide enticing options as they remove the barriers for novices to train high-quality models and … Read more Tabular Prediction using Auto Machine Learning (AutoGluon)
The original structure is split into two branches. Beige branch: predicts the confidence map Blue branch: predicts the PAF Both branches are organized as an iterative prediction architecture. The predictions from the previous stage are concatenated with the original feature F to produce more refined predictions. New Structure Image taken from “Realtime Multi-Person 2D Pose … Read more OpenPose Research Paper Summary: Realtime Multi-Person 2D Pose Estimation
Credit card fraud is a plague that all financial institutions are at risk of. In general, fraud detection is very challenging because fraudsters keep coming up with new and innovative ways of committing fraud in this digital world, so it is difficult to find a pattern that we can detect. For example, in the … Read more Credit card Fraud Detection with different sampling techniques
How to automate a machine learning workflow using Kubeflow Pipelines Why Machine Learning Pipelines? A lot of attention is being given now to the idea of Machine Learning Pipelines, which are meant to automate and orchestrate the various steps involved in training a machine learning model; however, it’s not always made clear what the benefits … Read more Machine Learning Pipelines with Kubeflow
I think this is the big question, but believe me, the answer is fairly simple. Although the proper answer should be something more thorough, I would like to mention only these two for simplicity: 1. Probability and Statistics (ProbStat) This is, I guess, what most of you would expect. Of course, you have to master … Read more Math for Machine Learning Motivation
Software development is the process followed by developers and programmers to design, write, document, and test code. Regardless of what programming language you use or what your target application field is, following the specific guidelines of good software development is essential in building a high-quality, maintainable project. Data science projects — may be more than … Read more 4 Software Development Techniques to Level up Your Data Science Project
A step-by-step guide to build a text classifier with CNNs implemented in PyTorch. Photo by Shelby Miller on Unsplash “Deep Learning is more than adding layers” The objective of this blog is to develop a text classifier step by step by implementing convolutional neural networks. So, this blog is divided into the following sections: Introduction … Read more Text Classification with CNNs in PyTorch
As we can see, a number of variables differ significantly between the Churn and Non-Churn group, so this dataset likely holds a good deal of useful intelligence. Training and Testing Data As with all Machine Learning models, we will split the data set into two parts: training and testing. We will use the training data … Read more Predicting and Preventing the Churn of High Value Customers Using Machine Learning
A practical example of Movies Recommendation with Recommender Systems Photo by Pankaj Patel on Unsplash Nowadays, almost every company applies Recommender Systems (RecSys), a subclass of information filtering systems that seek to predict the “rating” or “preference” a user would give to an item. They are primarily used in commercial … Read more How to run Recommender Systems in Python
Business Intelligence (BI) is a set of methodologies and resources (theoretical concepts, algorithms, software, tools, technologies) used in business environments and whose fundamental objective is to transform information into knowledge. The purpose of Business Intelligence is to support better business decision making to improve the productivity and performance of any business organization. BI relies heavily … Read more Gauge & Bullet Charts
Introduction Python, R, SAS, and SQL Matplotlib, Seaborn, tqdm sklearn, NumPy, and pandas Jupyter nbextensions Tableau and Google Data Studio Summary References The goal of this article is to give a general overview of the top Data Science tools and languages. I have either used these the most frequently out of others or have worked … Read more Top Data Science Tools and Languages
Use Knowi to connect to Couchbase, run queries, analyze & visualize your query results, and ask questions with search-based analytics. Photo by Corinne Kutz on Unsplash Couchbase is a powerful NoSQL database that empowers enterprises with the ability to store and query large collections of unstructured data. Couchbase’s scalability, flexible data model, and performance rate … Read more Analyzing & Visualizing Couchbase Data | Medium
Let us begin right off. It is an object in Python that can be used in matrix slicing in the NumPy package and for generic list slicing as well. The main aim of this object is to make multidimensional array handling easier. The multiple indices in the NumPy array can be replaced with … (Ellipsis … Read more Programming Hacks that every Data Engineer should know — Part 1
Step 1: Train Test Split: Split the given preprocessed dataset into train and test data. The training data is used to train the model, and the testing data is kept isolated to evaluate the performance of the final model. (Image by Author), Split of Data into Train and Test data This is not a … Read more Nested Cross-Validation — Hyperparameter Optimization and Model Selection
What tree models can see, and what they can’t When you understand how a model works, it becomes much easier to create successful features. This is because you can reason about the model’s strengths and weaknesses and prepare features accordingly. Let’s take a look together at what features can be understood by a tree-based … Read more Better Features for a Tree-Based Model
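To make the Ellipsis idea in this excerpt concrete, here is a small sketch with NumPy. The array and its shape are hypothetical, chosen only for illustration; the point is that `...` stands in for as many full slices (`:`) as are needed.

```python
import numpy as np

# Hypothetical 3-D array of shape (2, 3, 4), used only for illustration.
a = np.arange(24).reshape(2, 3, 4)

# Spelling out every leading axis with a full slice...
last_explicit = a[:, :, 0]

# ...selects the same elements as letting the Ellipsis cover those axes.
last_ellipsis = a[..., 0]

# The literal `...` is simply the built-in Ellipsis object.
same_object = ... is Ellipsis

print(last_ellipsis.shape, same_object)
```

The Ellipsis form stays the same if the array gains more leading dimensions, which is where it saves the most typing.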
A lot of events in our daily life follow the binomial distribution that describes the number of successes in a sequence of independent Bernoulli experiments. For example, assuming that the probability of James Harden making his shot is constant and each shot is independent, the number of field goals follows the binomial distribution. If we … Read more Why Is Logistic Regression the Spokesperson of Binomial Regression Models?
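As a rough sketch of the distribution described in this excerpt, the probability of exactly k successes in n independent attempts can be computed with the standard library. The shot count and success probability below are invented for illustration and are not taken from the article.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials,
    each succeeding with constant probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Hypothetical numbers: 10 shots, each made with probability 0.45.
n, p = 10, 0.45
probs = [binomial_pmf(k, n, p) for k in range(n + 1)]

# Expected number of made shots, computed directly from the distribution.
expected_makes = sum(k * pk for k, pk in enumerate(probs))

print(probs[4], expected_makes)
```

The two assumptions named in the excerpt, constant probability and independent shots, are exactly what allows the single formula inside `binomial_pmf` to be used for every k.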
Generate Hand-Drawn Sketches With the Sketch-RNN Neural Network Image by author based on a photo by Chunlea Ju on Unsplash You can’t teach an old dog new tricks, but maybe you can teach a neural network to draw a picture of a cat? The other day, when I was perusing the always interesting Wolfram Neural … Read more How To Draw a Cat and Other Silly Things With the Wolfram Language
Along with worked-through examples The expected value of a random variable is the weighted average of all possible values of the variable. The weight here means the probability of the random variable taking a specific value. What is the expected value of the length of a carrot? The random variable here is the length … Read more Expected Value of Random Variables -Explained Simply
Before jumping into Exploratory Data Analysis (EDA) you should always take a first look at the data. You should check a few things (data volume, the nature of different data coming from various sources, compatibility of the data sets, the mapping between them, data quality, data consistency) and, most importantly, the meaning of each variable … Read more How Data Scientists Build Machine Learning Models in Real Life
A high-level structural overview of classical Reinforcement Learning algorithms Image from https://unsplash.com/photos/iar-afB0QQw Reinforcement Learning (RL) is a growing subset of Machine Learning and one of the most important frontiers of Artificial Intelligence. It has gained great popularity in recent years, with many successful real-world applications in robotics, games and many other … Read more Introduction to Reinforcement Learning
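The weighted-average definition in this excerpt translates directly into code. The carrot lengths and their probabilities below are hypothetical, invented only to show the calculation.

```python
# Hypothetical carrot lengths (in cm) and the probability of each length.
lengths = [10, 12, 14]
probabilities = [0.2, 0.5, 0.3]

# The weights must form a valid probability distribution.
total_probability = sum(probabilities)

# Expected value: each possible value weighted by its probability.
expected_length = sum(
    length * prob for length, prob in zip(lengths, probabilities)
)

print(expected_length)
```

Here the result sits close to 12 cm because that length carries the largest weight.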
Source:Unsplash Learn how you can import cost data into Google Analytics manually, using an add-on for Google Sheets, with the API and Apps Script, and with out-of-the-box solutions If you use multiple ad services and platforms to advertise your products, it’s a good idea to combine all ad data in a single interface. This brings … Read more 4 Ways to Import Cost Data into Google Analytics
Using talib and yfinance Photo by NASA on Unsplash Machine Learning is computationally intensive, as the algorithm is not deterministic and therefore must be constantly tweaked over time. However, technical indicators are much quicker, as the equations do not change. This therefore improves their ability to be used for real-time trading. To create a program … Read more Algorithmic Trading with RSI using Python
What kind of story does the crime data tell about NYC in 2020 so far? These days, there are very conflicting views on the crime rate in New York City. Some politicians say violent crime is rampant in the Big Apple, while others say the city is safer than ever before due to the lockdown. With the … Read more Overlooking Crime in New York City amid the Pandemic and Protests
Like in any other machine learning algorithm, preparing data is probably the most important step you can take towards anomaly detection. On the positive side though, you’ll likely use only one column at a time. So unlike hundreds of features in other machine learning techniques, you can focus on only one column that is being … Read more Time series anomaly detection with “anomalize” library
Let’s now look at some of the useful sites for finding open and publicly available datasets, quickly and without much hassle. Screenshot of the Google Dataset Search page (Image by Author) Google Dataset Search is a search engine dedicated to finding datasets. It is a search engine over metadata from data providers. This implies that … Read more Useful sites for finding datasets for Data Analysis tasks
PCA in 2 Dimensions; All images are generated by the author What it is, Why it’s useful, and How to use it In real world data sets, many of our variables are unimportant or correlated with each other. If we are doing a supervised machine learning task, leaving in variables unrelated to our target variable … Read more Principal Components Analysis Explained
In the first image, we try to fit the data using a linear equation. The model is rigid and not at all flexible. Due to the low flexibility of a linear equation, it is not able to predict the samples (training data), therefore the error rate is high and it has a High Bias which … Read more Bias, Variance and How they are related to Underfitting, Overfitting
A curated list of the essential Java libraries in Java and JVM software development Photo by Min An from Pexels Java is the number one programming language in Business Application development. It is also one of the top programming languages. One of the key features of Java is that it has a feature-rich and vast … Read more Top 10 Libraries every Java Developer should know
A Data Analysis Based On Historical Storm Trajectories Photo by Shashank Sahay on Unsplash We’re in the midst of a very active hurricane season with hurricane Sally making landfall last night and several other tropical storms brewing in the Atlantic Ocean. The big question on everyone’s mind is always: “Will the next hurricane hit close … Read more Will The Next Hurricane Hit My Home?
Photo by Alexandre Debiève on Unsplash The development of a model involves a lot of repetitive and tedious tasks inside the Model Development Life Cycle (MDLC), such as tuning the hyper-parameters and generating and selecting features. These tasks consume a lot of time during the development as they are iterative and various permutations and combinations have to … Read more Will AutoML take away my job? What is it?
“Continuous learning ability is one of the hallmarks of human intelligence.” — Lifelong Machine Learning As the deep learning community aims to bridge the gap between human and machine intelligence, the need for agents that can adapt to continuously evolving environments is growing more than ever. This was evident at ICML 2020, which hosted … Read more Continual learning — where are we?
We all love Python, but how often do we use which mighty functionality? An article about my quest to figure it out The most frequently mentioned Python functions inside Python repositories, calculated via GitHub commits. Image by Author The other day, while I was running some zip() with some lists through a map(), I couldn’t stop … Read more My Odyssey, Finding The Most Popular Python Function
This updated equation, which fully describes the probabilities of moving between any two spaces on the board, is fairly easy to solve, since all terms for P(R) and P(M|R) have been determined earlier (shown in Figure 2 and Table 1). Transition matrix For any given space i, there are a total of 40 different destinations … Read more Oh, the Places You’ll Go in Monopoly