Unraveling churn and its challenges

Photo by Jornada Produtora on Unsplash Understand why and how to mitigate your business churn rates Relationship management is one of the determining factors in business health. One of the most important aspects of this relationship is the ability to identify when a customer is likely to cancel a service. For that reason, it … Read more Unraveling churn and its challenges

How to create a DataBlock for Multispectral Satellite Image Segmentation with the Fastai-v2…

Enjoy state-of-the-art results in your Earth Observation tasks with simple coding and Fastai’s deep learning API. Water pixel segmentation example in Jirau reservoir (Brazil), using U-Net architecture and Fastai-v2. Fastai is an open source deep learning library that adds higher level functionalities to PyTorch and makes it easier to achieve state-of-the-art results with … Read more How to create a DataBlock for Multispectral Satellite Image Segmentation with the Fastai-v2…

Implementing AlexNet CNN Architecture Using TensorFlow 2.0+ and Keras

5. Model Implementation Within this section, we will implement the AlexNet CNN architecture from scratch. Using the Keras Sequential API, we can implement consecutive neural network layers that are stacked on top of each other. Here are the types of layers the AlexNet CNN architecture is composed of, along with a brief … Read more Implementing AlexNet CNN Architecture Using TensorFlow 2.0+ and Keras

Assessing Statistical Rigor of Employee Surveys

The human resources industry relies heavily on a wide range of assessments to support its functions. In fact, to ensure unbiased and fair hiring practices, the US Department of Labor maintains a set of guidelines (Uniform Guidelines) to aid HR professionals in developing their assessments. Personality assessments are often used in selection batteries to … Read more Assessing Statistical Rigor of Employee Surveys

Natural Language Processing (NLP): Don’t Reinvent the Wheel

Now that we have discussed pre-processing methods and Python libraries, let’s put it all together with a few examples. For each, I’ll cover a couple of NLP algorithms, pick one based on our rapid development goals, and create a simple implementation using one of the libraries. Application #1: Pre-Processing Pre-processing is a critical part of … Read more Natural Language Processing (NLP): Don’t Reinvent the Wheel

Visualizing the Philippines’ Population Density using GeoPandas

Population density is a crucial concept in urban planning. Theories on how it affects economic growth are divided. Some claim, as Rappaport does, that an economy is a form of “spatial equilibrium”: that net flows of residents and employment gradually move to be balanced with one another. The thought that density has some sort of … Read more Visualizing the Philippines’ Population Density using GeoPandas

How Social Media Companies Know What to Show You: Hello Logistic Regressions!

Examining why logistic regressions are more suitable for classification problems than linear regressions Image by Trist’n Joseph It is no longer a question as to whether data science has taken over the world. Data has grown to become one of the world’s most valuable resources, so much so that almost every decision made within big companies … Read more How Social Media Companies Know What to Show You: Hello Logistic Regressions!

Building and Evaluating Classification ML Models

Must Read to Build Good Classification ML Models Photo by Martin Sanchez on Unsplash There are different types of problems in machine learning. Some might fall under regression (having continuous targets) while others might fall under classification (having discrete targets). Some might not have a target at all where you are just trying to learn … Read more Building and Evaluating Classification ML Models

Making a RNN model learn Arithmetic Operations

Text Prediction using RNN Problem: Given the phrase “34+17”, the model should predict the next word in the sequence, “51”. The input and output are sequences of characters that together form an arithmetic expression of two numbers and its result. Thus our data is represented as a sequence of two words: expression and result. … Read more Making a RNN model learn Arithmetic Operations
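
The data representation described in this excerpt can be sketched as follows. This is a minimal illustration, not the article's own code: the vocabulary, the space padding, the two-digit operand limit, and the helper names `make_example` and `encode` are all assumptions introduced here.

```python
import random

# Hypothetical vocabulary: the ten digits, '+', and a space used for padding.
VOCAB = "0123456789+ "
CHAR_TO_ID = {ch: i for i, ch in enumerate(VOCAB)}


def make_example(rng, max_value=99):
    """Return one (expression, result) pair of strings, e.g. ("34+17", "51")."""
    a = rng.randint(0, max_value)
    b = rng.randint(0, max_value)
    return f"{a}+{b}", str(a + b)


def encode(text, length):
    """Right-pad text with spaces to a fixed length and map each character to an integer id."""
    return [CHAR_TO_ID[ch] for ch in text.ljust(length)]


rng = random.Random(0)
pairs = [make_example(rng) for _ in range(3)]
# With operands up to 99, the longest expression is "99+99" (5 characters)
# and the longest result is "198" (3 characters).
inputs = [encode(expr, 5) for expr, _ in pairs]
targets = [encode(res, 3) for _, res in pairs]
```

Each expression and each result becomes a fixed-length list of integer ids, which is the kind of character-level sequence a recurrent model could consume after one-hot encoding or embedding.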

Manifold Learning [t-SNE, LLE, Isomap, +] Made Easy

The Heart of Dimensionality Reduction Principal Component Analysis is a powerful method, but it often fails because it assumes that the data can be modelled linearly. PCA expresses new features as linear combinations of existing ones by multiplying each by a coefficient. To address the limitations of PCA, various techniques have been created by … Read more Manifold Learning [t-SNE, LLE, Isomap, +] Made Easy

How Machine Learning made AI forget about Knowledge Representation and Reasoning

A brief history of how I learned about the forgotten core of artificial intelligence. In my early days working as a data scientist in AI, I was taught one thing above all: you need more and better data to feed your learning systems. And I have dedicated myself to finding that data. I was lured … Read more How Machine Learning made AI forget about Knowledge Representation and Reasoning

Who is An Expert in Scientific Research?

Figures 2 and 3 present the percentiles of authors’ publications and citations, respectively. We see a similar positively skewed distribution on the CDFs for citations and publications. Most authors have published fewer than 200 scientific papers and amassed fewer than 2,500 citations across all of them. This makes intuitive sense; the majority of scientists are … Read more Who is An Expert in Scientific Research?

Amazon API Gateway now supports enhanced observability via access logs

Beginning today, customers can configure their HTTP, REST, and WebSocket APIs to include new variables in their access logs that provide enhanced observability of how API Gateway processes requests. The new access log variables provide customers with the information they need to troubleshoot issues with their API’s configuration, including latencies and status codes for each … Read more Amazon API Gateway now supports enhanced observability via access logs

Extracting circles and long edges from images using OpenCV and Python

Using OpenCV for efficiently extracting objects of known shape from images Welcome to the first post in this series of blogs on extracting features from images using OpenCV and Python. Feature extraction from images and videos is a common problem in the field of Computer Vision. In this post, we will consider the task of … Read more Extracting circles and long edges from images using OpenCV and Python

Build your first Random Forest classifier

Exploratory Data Analysis Let’s do some basic data analysis. We are going to look at the distribution of variables using histograms first.

    import matplotlib.pyplot as plt

    # df is the DataFrame loaded earlier in the article.
    # Passing figsize to df.hist() sizes the figure that pandas creates for the histograms.
    df.hist(figsize=(10, 10))
    plt.tight_layout()

What we can notice straight away is the fact that some variables are not continuous. Actually, only five features are continuous: ‘age’, ‘chol’, ‘oldpeak’, ‘thalach’, ‘trestbps’ whereas … Read more Build your first Random Forest classifier

Introducing ObviouslyAI — No-Code Machine Learning Solution

I find it somewhat difficult to watch tools like this one automate machine learning, and decrease the need for machine learning engineers in small and medium-sized companies. The reasons are many, but the biggest is that the purpose of machine learning was to automate other professions, but we’ve managed to automate machine learning with machine … Read more Introducing ObviouslyAI — No-Code Machine Learning Solution

Human Emotion and Gesture Detector Using Deep Learning: Part-2

With this our exploratory data analysis (EDA) for our gestures dataset is completed. We can proceed to build the gestures training model for appropriate gestures prediction. Let us look at the code block below to understand the libraries we are importing as well as set the number of classes along with their dimensions and their … Read more Human Emotion and Gesture Detector Using Deep Learning: Part-2

Adversarial attacks on the human visual system

Summary of a research paper outlining how our vision systems can fall prey to adversarial attacks much like a Neural Net Image by Pixabay Neural Networks are exceptionally good at recognizing objects shown in an image and, in many cases, they have shown superhuman levels of accuracy (e.g., traffic sign recognition). But they are also known to … Read more Adversarial attacks on the human visual system

6 tips for database pros to adapt to cloud data warehouses

Data warehouse modernization affects both people and technology. For IT professionals like database administrators (DBAs) who run legacy architectures, using a cloud data warehouse means you no longer have to spend the bulk of your time on tedious activities like testing, patching, upgrading, and managing backups. It can also free you from the impossible task … Read more 6 tips for database pros to adapt to cloud data warehouses

Building a Scalable Data Strategy with IPTOP: Infrastructure, People, Tools, Organization, and Processes

Many organizations today are adopting data science practices as part of their digital transformation initiatives. However, most of them won’t reap the rewards of mining their data without a data strategy and a clear-cut blueprint for scaling data science within their organization. McKinsey found that only eight out of 1,000 companies undergoing digital transformation initiatives … Read more Building a Scalable Data Strategy with IPTOP: Infrastructure, People, Tools, Organization, and Processes

Multiple linear regression with interactions unveiled by genetic programming

We have all had some sort of experience with linear regression. It’s one of the most widely used regression techniques. Why? Because it is simple to explain and easy to implement. But what happens when you have more than one variable? How can you deal with this increased complexity and still use an easy … Read more Multiple linear regression with interactions unveiled by genetic programming

How to extract tables from PDF files with Camelot

If you are on Windows, make sure to install Ghostscript from here. You can still install camelot without first installing Ghostscript, but you will run into errors when trying to use camelot.

    conda install -c conda-forge camelot-py

or

    pip install "camelot-py[cv]"

or

    git clone https://www.github.com/camelot-dev/camelot
    cd camelot
    pip install ".[cv]"

We will first import camelot … Read more How to extract tables from PDF files with Camelot

Origins of AutoML: Best Subset Selection

All images are generated by the author And the Perils of Post-Selection Inference As there is a lot of buzz about AutoML, I decided to write about the original AutoML: step-wise regression and best subset selection. Then I decided to ignore step-wise regression because it is bad and should probably stop being taught. That leaves … Read more Origins of AutoML: Best Subset Selection

Croston forecast model for intermittent demand

Initial Idea In 1972, J.D. Croston published “Forecasting and Stock Control for Intermittent Demands,” an article that introduced a new technique to forecast products with intermittent demand. His idea could be summarized in three simple steps:
– Evaluate the average demand level when there is a demand occurrence.
– Evaluate the average time between two demand occurrences.
– … Read more Croston forecast model for intermittent demand
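
The excerpt cuts off before its third step, so the sketch below does not reconstruct the article's wording. It follows the commonly described formulation of Croston's method, in which the forecast is the smoothed demand size divided by the smoothed interval between demands. The function name `croston`, the default `alpha`, and the initialisation from the first demand occurrence are assumptions made for illustration.

```python
def croston(demand, alpha=0.1):
    """Return a one-step-ahead forecast for an intermittent demand series.

    This is a textbook-style sketch, not the article's implementation.
    """
    level = None       # smoothed demand size, updated only when demand occurs
    interval = None    # smoothed number of periods between demand occurrences
    periods_since = 1  # periods elapsed since the last demand occurrence

    for d in demand:
        if d > 0:
            if level is None:
                # Assumed initialisation: start from the first observed demand.
                level = float(d)
                interval = float(periods_since)
            else:
                # Exponential smoothing of demand size and of inter-demand interval.
                level = alpha * d + (1 - alpha) * level
                interval = alpha * periods_since + (1 - alpha) * interval
            periods_since = 1
        else:
            periods_since += 1

    if level is None:
        return 0.0  # no demand was ever observed
    return level / interval
```

For a series such as `[0, 6, 0, 6, 0, 6]`, the smoothed demand size stays at 6 and the smoothed interval at 2, so the forecast is 3 units per period.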

Data Science Framework – YUNA elements free for download

A few weeks ago we released our new framework and, together with our customers, we have already implemented exciting use cases. YUNA elements is a lean, easy to learn, secure and scalable framework for the development, organization and administration of script-based data processing and analysis projects for the languages R, Python and JULIA. Besides the … Read more Data Science Framework – YUNA elements free for download

RvsPython #4: A Basic Search on Amazon.ca using R and Python

[This article was first published on r – bensstats, and kindly contributed to R-bloggers]. This post was inspired by one of Thomas Neitmann’s posts showing … Read more RvsPython #4: A Basic Search on Amazon.ca using R and Python

Passing the Turning Point of AI Transformation

How you would benefit from Milvus, an open-source project for data scientists When talking about open-source AI projects, people tend to think of model framework projects like Google TensorFlow, PyTorch, etc. Since the model framework is the critical component when training AI models, those projects usually receive the most attention. But Artificial Intelligence … Read more Passing the Turning Point of AI Transformation

Demystifying Interpolation Search

Summing up, learning how to apply Interpolation Search will enrich your knowledge not only about algorithms but also as a programmer. Undoubtedly, it will have a place in your programming toolkit. However, it should always be remembered: Similar to Binary Search, Interpolation Search only works in sorted arrays. You need to have direct access to … Read more Demystifying Interpolation Search
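
A minimal sketch of the technique, under the two conditions the excerpt names (a sorted array and direct index access). It additionally assumes integer values so that the position estimate can use integer division; the function name `interpolation_search` is chosen here for illustration.

```python
def interpolation_search(arr, target):
    """Return an index of target in the sorted list arr, or -1 if it is absent."""
    lo, hi = 0, len(arr) - 1
    # The target can only be present while it lies between the endpoint values.
    while lo <= hi and arr[lo] <= target <= arr[hi]:
        if arr[lo] == arr[hi]:
            # All remaining values are equal; this also avoids dividing by zero.
            return lo if arr[lo] == target else -1
        # Estimate the position by linear interpolation between the endpoints.
        pos = lo + (target - arr[lo]) * (hi - lo) // (arr[hi] - arr[lo])
        if arr[pos] == target:
            return pos
        if arr[pos] < target:
            lo = pos + 1
        else:
            hi = pos - 1
    return -1
```

Unlike Binary Search, which always probes the midpoint, the probe position here depends on the target's value relative to the endpoints, which is why the approach works best when values are spread roughly evenly.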

The Fundamentals of Reinforcement Learning

The Markov decision process is the fundamental problem we try to solve in reinforcement learning. But what is the definition of a Markov decision process? A Markov decision process, or MDP, is a formulation of the sequential interaction between an agent and an environment. Here, the learner and decision maker is called the agent, whereas the thing it interacts with is … Read more The Fundamentals of Reinforcement Learning
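
The sequential interaction between agent and environment can be sketched with a toy example. Everything below is hypothetical and chosen only for illustration: a four-state line world, two actions, a reward of 1 on reaching the last state, and the helper names `step` and `run_episode`.

```python
import random

# Hypothetical toy MDP: states 0..3 on a line; state 3 is terminal.
N_STATES = 4
ACTIONS = (-1, +1)  # move left or move right


def step(state, action):
    """Environment dynamics: return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def run_episode(policy, max_steps=100):
    """Let the agent act until the terminal state or the step limit; return the trajectory."""
    state, trajectory = 0, []
    for _ in range(max_steps):
        action = policy(state)                           # the agent decides
        next_state, reward, done = step(state, action)   # the environment responds
        trajectory.append((state, action, reward))
        state = next_state
        if done:
            break
    return trajectory


rng = random.Random(0)
random_trajectory = run_episode(lambda s: rng.choice(ACTIONS))
greedy_trajectory = run_episode(lambda s: +1)
```

Here the policy plays the role of the agent and `step` plays the role of the environment; each loop iteration is one state, action, reward exchange between the two.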

Segmentation and Object Detection — Part 4

Single Shot Detectors Deep Learning at FAU. Image under CC BY 4.0 from the Deep Learning Lecture These are the lecture notes for FAU’s YouTube Lecture “Deep Learning”. This is a full transcript of the lecture video & matching slides. We hope you enjoy this as much as the videos. Of course, this transcript was … Read more Segmentation and Object Detection — Part 4

Bayesian Statistics: Metropolis-Hastings from scratch in Python

You are here If you’re reading this, odds are: (1) you’re interested in Bayesian statistics but (2) you have no idea how Markov Chain Monte Carlo (MCMC) sampling methods work, and (3) you realize that all but the simplest toy problems require MCMC sampling, so you’re a bit unsure of how to move forward. Not … Read more Bayesian Statistics: Metropolis-Hastings from scratch in Python

So you are ready to hire a data scientist?

Throughout my 10-year career, I have seen people often spend their time and energy in passionate debates about what data science can deliver, and what data scientists do or do not do. I submit that these are the wrong questions to focus on when you are looking to hire for your data department. In actuality, … Read more So you are ready to hire a data scientist?

Create a Rust Client for ROS2 from Scratch. Part 0: Integrate C API to Create ROS2 Node

Rust uses FFI (Foreign Function Interface) to interact with C/C++, but if the library is large, writing FFI code becomes monotonous, since it involves simply copying and pasting every function from one place to another. We use the famous and useful bindgen crate in Rust to help us create the FFI code. To know more … Read more Create a Rust Client for ROS2 from Scratch. Part 0: Integrate C API to Create ROS2 Node

A list of things you can build with language models — not just GPT-3

Natural language processing is more approachable than ever Natural language processing (NLP) is everywhere lately, with OpenAI’s GPT-3 generating as much hype as we’ve ever seen from a single model. As I’ve written about before, the flood of projects being built on GPT-3 is not down to just its computational power, but its accessibility. The … Read more A list of things you can build with language models — not just GPT-3

Movie Recommendation System based on MovieLens

As the previous code snippet shows, I created the user/movie profile based on the existing users’ historical rating records. Nevertheless, this has not entirely solved the cold start problem, because the system still has no idea what to do for new users or new movies. I will tell you how … Read more Movie Recommendation System based on MovieLens

Introduction to Survival Analysis

Source: pixabay Understand the basic concepts of survival analysis and what tasks it can be used for! In our extremely competitive times, all businesses face the problem of customer churn/retention. To quickly give some context, churn happens when the customer stops using the services of a company (stops purchasing, cancels the subscription, etc.). Retention refers … Read more Introduction to Survival Analysis

The Ultimate Out-of-the-box Automated Python Model Selection Methods

2. Pycaret PyCaret is a simple, easy-to-use sequential pipeline that includes well-integrated preprocessing functions along with hyperparameter tuning and ensembling of trained models.

    # import libraries
    !pip install pycaret
    import pandas as pd
    from pycaret.classification import *

    # open the dataset
    dfn = pd.read_csv('../input/preprocess-choc/dfn.csv')

    # define target label and parameters
    exp1 = setup(dfn, target='review_date', feature_selection=True)

Pycaret preprocessing functions All the preprocessing steps are … Read more The Ultimate Out-of-the-box Automated Python Model Selection Methods

AWS Lambda now supports custom runtimes on Amazon Linux 2

Amazon Linux 2 provides a secure, stable, and high performance execution environment to develop and run cloud-native applications. With Amazon Linux 2, you get an application environment that offers long term support with access to the latest innovations in the Linux ecosystem, at no additional charge.  To get started, upload your code through the AWS … Read more AWS Lambda now supports custom runtimes on Amazon Linux 2

The Box Plot Guide I Wish I Had When I Started Learning R

Informative data visualization is an arduous task in R! I provide useful tips for customizing your plots to make them more informative and colour-blind friendly. Customizing a graph to transform it into a beautiful figure in R isn’t alchemy. Nonetheless, it took me a lot of time (and frustration) to figure out how to make … Read more The Box Plot Guide I Wish I Had When I Started Learning R

Three Methods for Comparing Text Data to Instantly Boost the Impact of Your Analysis

We’ll use gensim, nltk, and spaCy to create and visualize tokenized vectors, cosine similarities, and entity-target relationships for Indeed Data Analyst job postings. (The data and full Jupyter notebook walk-through can be found here.) If you’re looking for a job as a data analyst or scientist, as well as trying to learn NLP — get … Read more Three Methods for Comparing Text Data to Instantly Boost the Impact of Your Analysis

Scalable Machine Learning with Dask on Google Cloud

Dask has been reviewed by many and compared to various other tools, including Spark, Ray and Vaex. Developed in coordination with other community projects like Numpy, Pandas, and Scikit-Learn, it is definitely a great tool for scaling machine learning. Hence, the purpose of this article is not to compare the pros and cons of Dask … Read more Scalable Machine Learning with Dask on Google Cloud

Modeling with Bayesian Networks

From an introduction to modeling for medical diagnosis Photo by National Cancer Institute on Unsplash I am feeling sick. Fever. Cough. Stuffy nose. And it’s wintertime. Do I have the flu? Likely. Plus I have muscle pain. More likely. Bayesian networks are great for these types of inferences. We have variables, some whose values have … Read more Modeling with Bayesian Networks