How to create a DataBlock for Multispectral Satellite Image Segmentation with the Fastai-v2…

Enjoy state-of-the-art results in your Earth Observation tasks with simple coding and Fastai’s deep learning API. Water pixel segmentation example in the Jirau reservoir (Brazil), using the U-Net architecture and Fastai-v2. Fastai is an open source deep learning library that adds higher-level functionality to PyTorch and makes it easier to achieve state-of-the-art results with …

Implementing AlexNet CNN Architecture Using TensorFlow 2.0+ and Keras

5. Model Implementation. In this section, we will implement the AlexNet CNN architecture from scratch. Using the Keras Sequential API, we can implement consecutive neural network layers that are stacked on top of one another within our models. Here are the types of layers the AlexNet CNN architecture is composed of, along with a brief …

Assessing Statistical Rigor of Employee Surveys

The human resources industry relies heavily on a wide range of assessments to support its functions. In fact, to ensure unbiased and fair hiring practices, the US Department of Labor maintains a set of guidelines (Uniform Guidelines) to aid HR professionals in their assessment development ventures. Personality assessments are often used in selection batteries to …

Natural Language Processing (NLP): Don’t Reinvent the Wheel

Now that we have discussed pre-processing methods and Python libraries, let’s put it all together with a few examples. For each, I’ll cover a couple of NLP algorithms, pick one based on our rapid development goals, and create a simple implementation using one of the libraries. Application #1: Pre-Processing. Pre-processing is a critical part of …

Visualizing the Philippines’ Population Density using GeoPandas

Population density is a crucial concept in urban planning. Theories on how it affects economic growth are divided. Some claim, as Rappaport does, that an economy is a form of “spatial equilibrium”: that net flows of residents and employment gradually move to be balanced with one another. The thought that density has some sort of …

How Social Media Companies Know What to Show You: Hello Logistic Regressions!

Examining why logistic regressions are more suitable for classification problems than linear regressions. (Image by Trist’n Joseph.) It is no longer a question as to whether data science has taken over the world. Data has grown to become one of the world’s most valuable resources, so much so that almost every decision made within big companies …

Building and Evaluating Classification ML Models

Must Read to Build Good Classification ML Models. (Photo by Martin Sanchez on Unsplash.) There are different types of problems in machine learning. Some might fall under regression (having continuous targets) while others might fall under classification (having discrete targets). Some might not have a target at all, where you are just trying to learn …

Making a RNN model learn Arithmetic Operations

Text Prediction using RNN. Problem: given the phrase “34+17”, the model should predict the next word in the sequence, “51”. The input and output are sequences of characters that together form an arithmetic expression of two numbers and its result. Thus our data is represented as a sequence of two words: expression and result. …

Manifold Learning [t-SNE, LLE, Isomap, +] Made Easy

The Heart of Dimensionality Reduction. Principal Component Analysis is a powerful method, but it often fails in that it assumes that the data can be modelled linearly. PCA expresses new features as linear combinations of existing ones by multiplying each by a coefficient. To address the limitations of PCA, various techniques have been created by …

How Machine Learning made AI forget about Knowledge Representation and Reasoning

A brief history of how I learned about the forgotten core of artificial intelligence. In my early days working as a data scientist in AI, I was taught one thing above all: you need more and better data to feed your learning systems. And I have dedicated myself to finding that data. I was lured …

Who is An Expert in Scientific Research?

Figures 2 and 3 present the percentiles of authors’ publications and citations, respectively. We see a similar positively skewed distribution in the CDFs for citations and publications. Most authors have published fewer than 200 scientific papers and amassed fewer than 2,500 citations across all of them. This makes intuitive sense; the majority of scientists are …

Extracting circles and long edges from images using OpenCV and Python

Using OpenCV for efficiently extracting objects of known shape from images. Welcome to the first post in this series of blogs on extracting features from images using OpenCV and Python. Feature extraction from images and videos is a common problem in the field of Computer Vision. In this post, we will consider the task of …

Build your first Random Forest classifier

Exploratory Data Analysis. Let’s do some basic data analysis. We are going to look at the distribution of variables using histograms first.

    import matplotlib.pyplot as plt
    plt.figure(figsize=(10,10))
    df.hist()
    plt.tight_layout()

What we can notice straight away is the fact that some variables are not continuous. Actually, only five features are continuous: ‘age’, ‘chol’, ‘oldpeak’, ‘thalach’, ‘trestbps’, whereas …

Introducing ObviouslyAI — No-Code Machine Learning Solution

I find it somewhat difficult to watch tools like this one automate machine learning, and decrease the need for machine learning engineers in small and medium-sized companies. The reasons are many, but the biggest is that the purpose of machine learning was to automate other professions, but we’ve managed to automate machine learning with machine …

Human Emotion and Gesture Detector Using Deep Learning: Part-2

With this, our exploratory data analysis (EDA) for the gestures dataset is complete. We can proceed to build the gestures training model for appropriate gesture prediction. Let us look at the code block below to understand the libraries we are importing as well as set the number of classes along with their dimensions and their …

Adversarial attacks on the human visual system

Summary of a research paper outlining how our vision systems can fall prey to adversarial attacks much like a neural net. (Image by Pixabay.) Neural networks are exceptionally good at recognizing objects shown in an image, and in many cases they have shown superhuman levels of accuracy (e.g., traffic sign recognition). But they are also known to …

Multiple linear regression with interactions unveiled by genetic programming

We have all had some sort of experience with linear regression. It’s one of the most widely used regression techniques. Why? Because it is simple to explain and easy to implement. But what happens when you have more than one variable? How can you deal with this increased complexity and still use an easy …

How to extract tables from PDF files with Camelot

If you are on Windows, make sure to install Ghostscript from here. You can still install camelot without first installing Ghostscript, but you will run into errors when trying to use camelot.

    conda install -c conda-forge camelot-py

or

    pip install "camelot-py[cv]"

or

    git clone https://www.github.com/camelot-dev/camelot
    cd camelot
    pip install ".[cv]"

We will first import camelot …

Origins of AutoML: Best Subset Selection

And the Perils of Post-Selection Inference. (All images are generated by the author.) As there is a lot of buzz about AutoML, I decided to write about the original AutoML: step-wise regression and best subset selection. Then I decided to ignore step-wise regression because it is bad and should probably stop being taught. That leaves …

Croston forecast model for intermittent demand

Initial Idea. In 1972, J.D. Croston published “Forecasting and Stock Control for Intermittent Demands,” an article that introduced a new technique to forecast products with intermittent demand. His idea could be summarized in three simple steps:
– Evaluate the average demand level when there is a demand occurrence.
– Evaluate the average time between two demand occurrences.
– …
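The steps in this excerpt can be sketched in a few lines of Python. This is a minimal sketch, not the article's own code: the function name `croston`, the smoothing factor `alpha`, and the way the first demand initializes both estimates are assumptions made for illustration. The final division of demand level by interval follows the commonly described formulation of Croston's method, since the third step is cut off in the excerpt.

```python
def croston(demand, alpha=0.1):
    """Return one-step-ahead forecasts for an intermittent demand series.

    Assumption: both estimates are initialized at the first non-zero demand,
    and both are updated with simple exponential smoothing.
    """
    level = None        # smoothed demand size, updated only when demand occurs
    interval = None     # smoothed number of periods between demand occurrences
    periods_since = 1   # periods elapsed since the last demand occurrence
    forecasts = []

    for d in demand:
        # Forecast for the current period, made before observing it.
        forecasts.append(None if level is None else level / interval)
        if d > 0:
            if level is None:
                level, interval = float(d), float(periods_since)
            else:
                level = alpha * d + (1 - alpha) * level
                interval = alpha * periods_since + (1 - alpha) * interval
            periods_since = 1
        else:
            periods_since += 1

    # Forecast for the first period after the end of the series.
    forecasts.append(None if level is None else level / interval)
    return forecasts


history = [0, 0, 6, 0, 0, 6]
print(croston(history, alpha=0.5))
```

Note that the estimates only change in periods with demand, so the forecast stays flat through runs of zeros.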

Passing the Turning Point of AI Transformation

How you would benefit from Milvus, an open-source project for data scientists. When talking about open-source AI projects, people tend to think of model framework projects like Google TensorFlow, PyTorch, etc. Since the model framework is the critical component when training AI models, those projects usually receive the most attention. But Artificial Intelligence …

Demystifying Interpolation Search

Summing up, learning how to apply Interpolation Search will enrich your knowledge not only about algorithms but also as a programmer. Undoubtedly, it will have a place in your programming toolkit. However, it should always be remembered: Similar to Binary Search, Interpolation Search only works on sorted arrays. You need to have direct access to …
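For readers who want to see the sorted-array requirement in practice, here is a minimal sketch of interpolation search in Python. It is an illustration rather than the article's implementation: the function name `interpolation_search` and the choice to return -1 for a missing value are assumptions, and the input is assumed to be a sorted list of numbers.

```python
def interpolation_search(arr, target):
    """Return an index of `target` in the sorted list `arr`, or -1 if absent."""
    lo, hi = 0, len(arr) - 1

    # The target can only be present while it lies inside [arr[lo], arr[hi]].
    while lo <= hi and arr[lo] <= target <= arr[hi]:
        if arr[lo] == arr[hi]:
            # All remaining values are equal; avoid dividing by zero.
            return lo if arr[lo] == target else -1

        # Estimate the position by linear interpolation between the bounds.
        pos = lo + (target - arr[lo]) * (hi - lo) // (arr[hi] - arr[lo])

        if arr[pos] == target:
            return pos
        if arr[pos] < target:
            lo = pos + 1
        else:
            hi = pos - 1

    return -1


values = [2, 4, 6, 8, 10, 12, 14]
print(interpolation_search(values, 10))
print(interpolation_search(values, 5))
```

The position estimate relies on indexing directly into the list, which is why the technique suits arrays rather than linked structures.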

The Fundamentals of Reinforcement Learning

The Markov decision process is the fundamental problem that we try to solve in reinforcement learning. But what is the definition of a Markov decision process? A Markov decision process, or MDP, is a formulation of sequential interaction between an agent and an environment. Here, the learner and decision maker is called the agent, whereas the thing it interacts with is …

Segmentation and Object Detection — Part 4

Single Shot Detectors. (Deep Learning at FAU. Image under CC BY 4.0 from the Deep Learning Lecture.) These are the lecture notes for FAU’s YouTube Lecture “Deep Learning”. This is a full transcript of the lecture video & matching slides. We hope you enjoy this as much as the videos. Of course, this transcript was …

Bayesian Statistics: Metropolis-Hastings from scratch in Python

You are here. If you’re reading this, odds are: (1) you’re interested in Bayesian statistics but (2) you have no idea how Markov Chain Monte Carlo (MCMC) sampling methods work, and (3) you realize that all but the simplest toy problems require MCMC sampling, so you’re a bit unsure of how to move forward. Not …

So you are ready to hire a data scientist?

Throughout my 10-year career, I have seen people often spend their time and energy in passionate debates about what data science can deliver, and what data scientists do or do not do. I submit that these are the wrong questions to focus on when you are looking to hire for your data department. In actuality, …

Create a Rust Client for ROS2 from Scratch. Part 0: Integrate C API to Create ROS2 Node

Rust uses FFI (Foreign Function Interface) to interact with C/C++, but if the library is large, writing FFI code becomes monotonous, since it involves simply copying and pasting every function from one place to another. We use the famous and useful bindgen crate in Rust to help us create the FFI code. To know more …

A list of things you can build with language models — not just GPT-3

Natural language processing is more approachable than ever. Natural language processing (NLP) is everywhere lately, with OpenAI’s GPT-3 generating as much hype as we’ve ever seen from a single model. As I’ve written about before, the flood of projects being built on GPT-3 is not down to just its computational power, but its accessibility. The …

Movie Recommendation System based on MovieLens

As the previous code snippet shows, I created the user/movie profile based on the existing users’ rating records in history. Nevertheless, it has not entirely solved the cold start problem yet, because the system still has no idea what to do for new users or with new movies. I will tell you how …

Introduction to Survival Analysis

(Source: Pixabay.) Understand the basic concepts of survival analysis and what tasks it can be used for! In our extremely competitive times, all businesses face the problem of customer churn/retention. To quickly give some context, churn happens when the customer stops using the services of a company (stops purchasing, cancels the subscription, etc.). Retention refers …

The Ultimate Out-of-the-box Automated Python Model Selection Methods

2. PyCaret. PyCaret is a simple, easy-to-use sequential pipeline that includes well-integrated preprocessing functions along with hyperparameter tuning and ensembling of trained models.

    # import libraries
    !pip install pycaret
    import pandas as pd
    from pycaret.classification import *
    # open the dataset
    dfn = pd.read_csv('../input/preprocess-choc/dfn.csv')
    # define target label and parameters
    exp1 = setup(dfn, target = 'review_date', feature_selection = True)

PyCaret preprocessing functions. All the preprocessing steps are …

The Box Plot Guide I Wish I Had When I Started Learning R

Informative data visualization is an arduous task in R! I provide useful tips for customizing your plots to make them more informative and colour-blind friendly. Customizing a graph to transform it into a beautiful figure in R isn’t alchemy. Nonetheless, it took me a lot of time (and frustration) to figure out how to make …

Three Methods for Comparing Text Data to Instantly Boost the Impact of Your Analysis

We’ll use gensim, nltk, and spaCy to create and visualize tokenized vectors, cosine similarities, and entity-target relationships for Indeed Data Analyst job postings. (The data and full Jupyter notebook walk-through can be found here.) If you’re looking for a job as a data analyst or scientist, as well as trying to learn NLP, get …

Scalable Machine Learning with Dask on Google Cloud

Dask has been reviewed by many and compared to various other tools, including Spark, Ray and Vaex. Developed in coordination with other community projects like Numpy, Pandas, and Scikit-Learn, it is definitely a great tool for scaling machine learning. Hence, the purpose of this article is not to compare the pros and cons of Dask …

Modeling with Bayesian Networks

From an introduction to modeling for medical diagnosis. (Photo by National Cancer Institute on Unsplash.) I am feeling sick. Fever. Cough. Stuffy nose. And it’s wintertime. Do I have the flu? Likely. Plus I have muscle pain. More likely. Bayesian networks are great for these types of inferences. We have variables, some whose values have …

Using FCNN Receptive Fields for Object Detection

Create an Object Detector from a pretrained Image Classifier in PyTorch. The task of Object Detection is to detect and spatially identify (using bounding boxes etc.) various objects in an image, whereas Image Classification tells whether or not an image contains certain objects without any notion of where exactly they are located. In this post, …

Which Countries are Concerned about Climate Change? — Mining BitTorrent

The first task is data engineering. I will not repeat the part where the BitTorrent network is mined using the distributed hash table and the other tasks related to that, which are covered in an earlier article (Towards Torrent Science: Data Engineering. Mining BitTorrent networks. towardsdatascience.com). BitTorrent sites like thepiratebay.org consist of torrents for …

Use Data Brick to verify Azure Explore (Kusto) Data Duplication issue

Connect to Kusto Cluster. Python has packages to connect to Kusto: the Azure Data Explorer Python SDK. Here, we use the azure-kusto-data package. The following code snippet allows us to create the KustoClient, which is used to query the Kusto cluster. Before we connect to Kusto, we need to create the AppId and register it with the …

How To Come Up With Amazing AI, ML or Data Science Project Ideas.

“Creativity involves breaking out of expected patterns in order to look at things in a different way.” (Edward de Bono) To come up with awesome and unique AI, ML, or data science ideas, one must be creative and willing to attempt things that have never been done. You should have the desire to birth …

Social media and topic modeling: how to analyze posts in practice

Practical use of topic modeling. We discuss the ways in which topic modeling can be used to analyze various text posts on social media and the benefits of doing so. (Image courtesy of Alina Grubnyak on Unsplash.) There is a substantial amount of data generated on the internet every second: posts, comments, photos, and …

Getting NASA data for your next geo-project

NASA provides an extensive library of data points that they’ve captured over the years from their satellites. These datasets include temperature, precipitation and more. NASA hosts this data on a website where you can search and grab information as needed, whether you want the data for the whole world or a specific area. The user …

Lambda Functions with Example and Error Handling

Error Handling. When you misuse a function, it should throw an error. For example, check out the function float, which returns a floating-point number from a number or string. When you pass float an integer value, the corresponding float is returned. Similarly, if you give it the string “8.7”, it will return the …
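The behaviour of float described in this excerpt can be tried directly. The snippet below is a small sketch: the built-in `float` and the exceptions it raises are standard Python, while the helper `safe_float` and its `default` parameter are hypothetical names introduced here to show one way of handling the error.

```python
def safe_float(value, default=None):
    """Return float(value), or `default` when the conversion fails."""
    try:
        return float(value)
    except (TypeError, ValueError):
        # ValueError: a string that is not a number, e.g. "eight".
        # TypeError: an unsupported type, e.g. None.
        return default


print(float(8))             # an integer is converted to the matching float
print(float("8.7"))         # a numeric string is parsed
print(safe_float("eight"))  # a misuse is caught instead of raising
```

Calling `float("eight")` without the wrapper raises a ValueError, which is the kind of error the excerpt refers to.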

Hidden Gems: Finding the Best Secret Trails in America

With Data Science. (Photo credit: David Marcu via Unsplash.) There are plenty of reasons why one would want to find solitude in the wilderness, from the therapeutic effects of being immersed in nature, to not wanting to contribute to trail degradation and soil erosion on busier trails. Now more than ever the reprieve of the …

Columnar Stores — When/How/Why?

Obvious, but let’s spell it out:
– Bad space utilization (numbers as strings waste space)
– No type or structural checking (chars could wind up in numeric fields)
– CSV: no metadata/header info; JSON: repeated meta/formatting
– No native compression of repeating values
– No native indexing/search ability
Note CSV and JSON have distinct advantages of being …

Crime Rate Prediction using Facebook Prophet

A Guideline to Make the Best Use of FB Prophet for Time Series Forecasting. (Image from Unsplash.) Time series prediction is one of the must-know techniques for any data scientist. Questions like predicting the weather, product sales, customer visits to a shopping center, or amount of inventory to maintain, etc. – all about …