October 2021: “Top 40” New CRAN Packages

[This article was first published on R Views, and kindly contributed to R-bloggers]. One hundred forty-one new packages made it to CRAN in October. Here … Read more

Categories: R

Complete Guide to Perform Classification of Tweets with SpaCy

After preparing all the tokenized sentences, we can now train the models using our preprocessed data. The models I chose can be separated into two categories. The first one is statistical language models, including Naive Bayes, logistic regression, and support vector machines (SVM), and the second one is neural language models, including CNN and … Read more

Why I Chose the MacBook Air over the MacBook Pro as a Data Scientist

Note: There are other differences between the Air and Pro that I didn’t include in the table above. Some of them might influence your decision. I’ll discuss those differences later in the article. MacBook Air summary: The cheapest laptop in Apple’s lineup (although starting at $999, it’s still pretty expensive). It’s also the smallest and … Read more

Enhance Your Plotly Express Scatter Plot With Marginal Plots

Display Extra Information on your Plotly Express Scatter Plots. Figure: Plotly Express scatter plot of well log data illustrating marginal plots on both axes (image by the author). Scatter plots are a commonly used data visualisation tool within data science. They allow us to plot two numerical variables, as points, on a two-dimensional graph. From … Read more

Solutions against overfitting for machine learning on tabular data

In this article, I will present an overview of solutions against overfitting that apply to tabular data and classical machine learning. When comparing classical machine learning with deep learning, the latter is generally considered more complex. For the problem of overfitting, the contrary is the case: many tools and tricks are easily available for avoiding … Read more

Five Unexpected Behaviours of Python Could Be Surprised

Some cold knowledge about Python you need to know. Every programming language has some interesting facts or mysterious behaviours, and Python is no exception. In fact, as a dynamic programming language, Python has even more interesting behaviours. I would bet most developers may never have experienced one of these scenarios because most of … Read more

Exploring stacks and queues

Supercharge your programs with two highly useful tools. In our last post, we covered data structures, or the ways that programming languages store data in memory. We touched upon abstract data types, theoretical entities that are implemented via data structures. The concept of a “vehicle” can be viewed as … Read more
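As a minimal sketch of the two structures named in the title (not the article's own code): a stack returns the most recently added item first, while a queue returns the oldest item first. The function names `drain_stack` and `drain_queue` are illustrative assumptions.

```python
from collections import deque


def drain_stack(items):
    """Push every item onto a stack, then pop until empty (last in, first out)."""
    stack = []
    for item in items:
        stack.append(item)           # push onto the top
    out = []
    while stack:
        out.append(stack.pop())      # pop from the top
    return out


def drain_queue(items):
    """Enqueue every item, then dequeue until empty (first in, first out)."""
    queue = deque()
    for item in items:
        queue.append(item)           # enqueue at the back
    out = []
    while queue:
        out.append(queue.popleft())  # dequeue from the front
    return out
```

A plain list works well as a stack because `append` and `pop` operate on its end; `collections.deque` is used for the queue because removing from the front of a list is linear in its length.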

Data Visualization Before Machine Learning

Do you ever ask yourself why your machine learning model isn’t used? Why do so few people really believe in the power of machine learning rather than these old dashboards? When I was working in a football club, I made a data visualization showing player performances during the season. It was a really simple tile … Read more

Four Deep Learning Papers to Read in December 2021

From Sensory Substitution to Decision Transformers, Persistent Evolution Strategies and Sharpness-Aware Minimization. Welcome to the December edition of the ‘Machine-Learning-Collage’ series, where I provide an overview of the different Deep Learning research streams. So what is an ML collage? Simply put, I draft one-slide visual summaries of one of my favourite recent papers. Every single … Read more

How to Benefit from the Semi-Supervised Learning with Label Spreading Algorithm

If you are already familiar with the Label Propagation algorithm, you may want to know about the two ways that Label Spreading differs from it. If you are not familiar with Label Propagation, then feel free to skip to the next section. Symmetric normalized Laplacian vs. random walk normalized Laplacian: The Label Spreading algorithm uses … Read more
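The two normalizations named in the excerpt have standard textbook definitions: for a weight matrix W with degree matrix D, the symmetric normalized Laplacian is I - D^(-1/2) W D^(-1/2) and the random walk normalized Laplacian is I - D^(-1) W. The sketch below computes both for a tiny graph; the function names and the example graph are assumptions for illustration, and every node is assumed to have nonzero degree.

```python
import math


def degrees(W):
    """Row sums of the weight matrix, i.e. the diagonal of D."""
    return [sum(row) for row in W]


def sym_laplacian(W):
    """Symmetric normalized Laplacian: I - D^(-1/2) W D^(-1/2)."""
    d = degrees(W)
    n = len(W)
    return [[(1.0 if i == j else 0.0) - W[i][j] / math.sqrt(d[i] * d[j])
             for j in range(n)] for i in range(n)]


def rw_laplacian(W):
    """Random walk normalized Laplacian: I - D^(-1) W."""
    d = degrees(W)
    n = len(W)
    return [[(1.0 if i == j else 0.0) - W[i][j] / d[i]
             for j in range(n)] for i in range(n)]


# Hypothetical 3-node star graph: node 0 is connected to nodes 1 and 2.
W = [[0, 1, 1],
     [1, 0, 0],
     [1, 0, 0]]
L_sym = sym_laplacian(W)
L_rw = rw_laplacian(W)
```

On this graph `L_sym` is symmetric, while `L_rw` is not: each of its rows sums to zero instead.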

Vector Autoregressive Model (VAR) using R

    #========================================================#
    # Quantitative ALM, Financial Econometrics & Derivatives
    # ML/DL using R, Python, Tensorflow by Sang-Heon Lee
    #
    # https://kiandlee.blogspot.com
    #--------------------------------------------------------#
    # Vector Autoregressive Model
    #========================================================#

    graphics.off()  # clear all graphs
    rm(list = ls()) # remove all objects from your workspace

    library(urca) # ca.jo, denmark
    library(vars) # vec2var

    #========================================================
    # Data
    #========================================================

    # forecasting horizon
    nhor <- 12

    #--------------------------------------------
    # quarterly data related with money demand
    #--------------------------------------------
    # LRM : logarithm of real money M2 (LRM)
    # LRY : logarithm of real income (LRY)
    # LPY : logarithm of price deflator (LPY)
    # IBO : bond rate (IBO)
    # IDE : bank deposit rate (IDE)
    # the period 1974:Q1 - 1987:Q3
    #--------------------------------------------

    # selected variables
    data(denmark)
    df.lev <- denmark[,c("LRM","LRY","IBO","IDE")]
    m.lev  <- as.matrix(df.lev)
    nr_lev <- nrow(df.lev)

    # quarterly centered dummy variables
    dum_season <- data.frame(yyyymm = denmark$ENTRY)
    substr.q   <- as.numeric(substring(denmark$ENTRY, 6,7))
    dum_season$Q2 <- (substr.q==2)-1/4
    dum_season$Q3 <- (substr.q==3)-1/4
    dum_season$Q4 <- (substr.q==4)-1/4
    dum_season    <- dum_season[,-1]

    # Draw Graph
    str.main <- c(
        "LRM=ln(real money M2)", "LRY=ln(real income)",
        "IBO=bond rate", "IDE=bank deposit rate")

    x11(width=12, height = 6);
    par(mfrow=c(2,2), mar=c(5,3,3,3))
    for(i in 1:4) {
        matplot(m.lev[,i], axes=FALSE,
            type=c("l"), col = c("blue"),
            main = str.main[i])

        axis(2) # show y axis

        # show x axis and replace it with
        # a user-defined string vector
        axis(1, at=seq_along(1:nrow(df.lev)), … Read more

Categories: R

Applying of Reinforcement Learning for Self-Driving Cars

A widespread approach to applying AI in self-driving cars is supervised learning, above all for solving perception requirements. But a self-driving car is very similar to a robot, and to an agent in a Reinforcement Learning (RL) setting. Can we replace a supervised learning approach with a reinforcement learning approach? The disadvantage of … Read more

Introduction to Applied Linear Algebra: Vectors

Goal: This article gives an introduction to vectors, vector operations and their applications in the field of data science. Why you should learn it: It is the basis for almost all machine learning techniques that learn from data, whether for prediction, classification or clustering. Table of Contents: What … Read more

The Poisson Hidden Markov Model for Time Series Regression

We will first elaborate the ‘visible’ Poisson process, and then show how the Markov process ‘mixes’ into the Poisson process. Consider the following model equation that incorporates an additive error component: y_t expressed as the sum of a mean and an error term (Image by Author) In the above model, the observed value y_t is … Read more
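The model equation appears only as an image in the source. Going by its caption (the observed value as the sum of a mean and an error term), it can be written as below; the symbols \mu_t for the mean and \varepsilon_t for the error are notational assumptions and may differ from the author's.

```latex
y_t = \mu_t + \varepsilon_t
```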

4 Python Pandas Functions That Serve Better With Dictionaries

Pandas is arguably the most popular data analysis and manipulation library in the data science ecosystem. First and foremost, it is easy to learn and offers an intuitive syntax. With that being said, we will focus on a different great feature of Pandas: Flexibility. The capabilities of Pandas functions can be extended by using the … Read more

Preparing For Data Science Interview? Here is a Complete Guide To Help You Perform Well

It doesn’t matter if you are new to data science or have prior experience. Job interviews can bring anxiety to anyone. Each and every job interview is a different experience. While it is not possible to anticipate your interview questions or to guess the expectations of an interviewer, there are definitely some things that will … Read more

Top 5 techniques for Explainable AI

Not 1, not 2, but 5 techniques for Explainable AI. Imagine you are a medical professional and you are using AI for stroke prediction. The AI has predicted a stroke for one of your patients. When you tell your patient this, they are going to panic. There … Read more

A Critical Look at How Data Science is “Taught”

A weakness in non-institutional data science education. Although we may not think about it much, data science (and computer science broadly) is particularly interesting from an educational/pedagogical standpoint. I may suffer from sampling bias, but I don’t think it’s unreasonable to say that data science education is more decentralized, well-documented, and accessible than that … Read more

Buy ‘Til You Die: Predict Customer Lifetime Value in Python

End-to-end example: the Beta Geometric/Negative Binomial Distribution Model (BG/NBD), using Lifetimes in Python. Today’s article focuses on non-contractual business settings. We want to predict the transaction frequency of customers and their churn risks. A customer who used to purchase once every 20 days … Read more

Integrated Gradients from Scratch

Understanding integrated gradients: To understand integrated gradients, let’s start from how we would explain a simple linear model (image by author). The effect of x1 on y is the gradient multiplied by the value of x1, so a * x1. When dealing with deep learning models, looking simply at the gradient like in a … Read more
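To make the linear case concrete, here is a minimal from-scratch sketch, not the article's implementation. It approximates integrated gradients with a midpoint Riemann sum along the straight path from a baseline to the input, using finite-difference gradients. The model coefficients (a = 2.0, b = -3.0), the input, and the zero baseline are made-up values for illustration.

```python
def numerical_gradient(f, x, eps=1e-6):
    """Central finite-difference gradient of f at point x."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


def integrated_gradients(f, x, baseline, steps=50):
    """Average the gradient along the path baseline -> x, then scale by (x - baseline)."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval of [0, 1]
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        grad = numerical_gradient(f, point)
        total = [t + g for t, g in zip(total, grad)]
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]


def linear_model(x, a=2.0, b=-3.0):
    """Hypothetical linear model y = a * x1 + b * x2."""
    return a * x[0] + b * x[1]


attributions = integrated_gradients(linear_model, [1.5, 4.0], [0.0, 0.0])
```

For a linear model with a zero baseline the gradient is constant along the path, so the attribution for x1 reduces to a * x1, matching the excerpt. The attributions also sum to the difference between the model output at the input and at the baseline.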

Comparing the Five Most Popular EDA Tools

Exploratory Data Analysis (EDA) is an integral part of any data science project. In simpler terms, it could be referred to as the “detective work” necessary to understand a dataset. These initial investigations lead to the discovery of non-obvious trends and anomalies, leading to enhanced understanding of the data … Read more

A non trivial elevator control system in a train station by reinforcement learning

Why elevators in train stations are different and how RL can optimise the overall service quality. (Image: elevator on the platform of the railway station in Ningbo, China.) Today’s urban life is unimaginable without elevators, and elevator control algorithms have been well studied using different techniques, including reinforcement learning [1]. … Read more

Classifying Handwritten Digits Using A Multilayer Perceptron Classifier (MLP)

1.1 What is a Multilayer Perceptron (MLP)? An MLP is a supervised machine learning (ML) algorithm that belongs to the class of feedforward artificial neural networks [1]. The algorithm is essentially trained on the data in order to learn a function. Given a set of features and a target variable (e.g. labels), it learns a … Read more

SEAL link prediction explained

DGCNN model: In this part we will try to understand what the DGCNN model is doing. We will go through one message-passing step, analysing in detail what is going on at every stage. For simplicity we will assume: batch_size = 1; normalization and bias = False (in the GCNConv layer); number of GCNConv layers = 1. In … Read more

Custom Formats in gt Tables

[This article was first published on rstats-tips.net, and kindly contributed to R-bloggers]. The “grammar of tables” is used to build tables with the R-package gt. … Read more

Categories: R

Predicting viewership for #TidyTuesday Doctor Who episodes

[This article was first published on rstats | Julia Silge, and kindly contributed to R-bloggers]. This is the latest in my series of screencasts demonstrating … Read more

Categories: R

Durability Testing of Stents Using Sensitivity-Based Methods in R

[This article was first published on [R]eliability, and kindly contributed to R-bloggers]. The current industry protocol for durability testing of vascular stents and frames involves … Read more

Categories: R

AWS App2Container now supports Jenkins for setting up a CI/CD pipeline

AWS App2Container (A2C) is a command-line tool for modernizing .NET and Java applications into containerized applications. A2C analyzes and builds an inventory of all applications running in virtual machines, on-premises or in the cloud. You simply select the application you want to containerize, and A2C packages the application artifact and identified dependencies into container images, configures … Read more

Categories: AWS

Deploy an object detector model at the edge on AWS Panorama

This article was co-authored by Janos Tolgyesi and Luca Bianchi. In recent years, Computer Vision has become one of the most exciting fields of application for Deep Learning: trained convolutional neural network (CNN) models can reach human-like accuracy levels when detecting objects within an image or a video stream. These incredible advancements opened a broad field … Read more

A Brief Overview of Methods to Explain AI (XAI)

ALE is also a global, model-agnostic interpretation method. It is an alternative to PDP, which is subject to bias when variables are highly correlated. For example, the variable RM, which indicates the number of rooms, is highly correlated with the area of the house. So RM=7.5 would be an unrealistic individual for a very small … Read more

Adiabatic Quantum Computation 1: Foundations and the Adiabatic Theorem

The lesser-known type of quantum computer that is easier to build, easier to understand, and (maybe) equally powerful. I’ve just completed my thesis for Honours at the Australian National University, which proposed how diamonds could be used as adiabatic quantum computers. During this project, however, I realised that there are few people with … Read more

AWS Lambda now supports event filtering for Amazon SQS, Amazon DynamoDB, and Amazon Kinesis as event sources

AWS Lambda now provides content filtering options for SQS, DynamoDB and Kinesis as event sources. With event pattern content filtering, customers can write complex rules so that their Lambda function is only triggered by SQS, DynamoDB, or Kinesis under filtering criteria you specify. This helps reduce traffic to customers’ Lambda functions, simplifies code, and reduces … Read more

Categories: AWS

AWS price reduction for data transfers out to the internet

Effective December 1, 2021, AWS is making two pricing changes for data transfer out to the internet. Each month, the first terabyte of data transfer out of Amazon CloudFront, the first 10 million HTTP/S requests, and the first 2 million CloudFront Functions invocations will be free. Free data transfer out of CloudFront is no longer … Read more

Categories: AWS

An introduction to non-Probability Sampling Methods

Quota Sampling can be confused with Stratified Sampling since they both divide the population into strata, based on some characteristics, like gender, age, education. Moreover, the groups need to be internally homogeneous and heterogeneous among themselves. Even if they appear similar at first sight, Quota Sampling differs in many respects from the … Read more
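One commonly cited difference can be sketched in code: stratified sampling draws units at random within each stratum, whereas quota sampling fills each group's quota with whichever units come along first, without random selection. This is an illustration under that assumption, not the article's own example; the population, the `gender` key, and the function names are made up.

```python
import random
from collections import defaultdict


def stratified_sample(population, key, per_stratum, rng):
    """Probability sampling: pick units at random inside every stratum."""
    strata = defaultdict(list)
    for unit in population:
        strata[key(unit)].append(unit)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, per_stratum))
    return sample


def quota_sample(population, key, per_stratum):
    """Non-probability sampling: take units in arrival order until each quota is full."""
    counts = defaultdict(int)
    sample = []
    for unit in population:
        group = key(unit)
        if counts[group] < per_stratum:
            sample.append(unit)
            counts[group] += 1
    return sample


# Hypothetical population of 20 people, alternating between two groups.
population = [{"id": i, "gender": "F" if i % 2 else "M"} for i in range(20)]
```

Both functions return three units per group when called with `per_stratum=3`, but `quota_sample` always returns the first people encountered, which is where selection bias can creep in.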

A Practical Guide to ARIMA Models using PyCaret — Part 3

2️⃣️ Understanding the Difference Term using PyCaret 👉 Step 1: Setup PyCaret Time Series Experiment In order to understand this concept better, we will use a random walk dataset from pycaret playground. Details can be found in the Jupyter notebook for this article (available at the end of the article). #### Get data from data … Read more

Machine Learning in Medicine — Journal Club

A Critical Appraisal of the Use of Machine Learning Techniques in Clinical Literature. A call for establishing documentation standards for reporting machine learning models in scientific journals. Introduction: The use of machine learning techniques in biomedical research has exploded over the past few years, as exemplified by the dramatic … Read more

4 Changes the AI Industry Needs Right Now

ARTIFICIAL INTELLIGENCE | OPINION. I’d love to see these happen. AI is suffering the consequences of being too successful. A few months ago I wrote an article entitled “5 Reasons Why I Left the AI Industry” in which I criticized AI’s flaws upfront, probably the reason the post … Read more

How to Create Publication-Ready Plots with LaTeX

Kickstart your plotting journey with LaTeX using 5 fundamental plot variations. If you’ve ever written an article for a scientific journal, chances are that you’ve used a LaTeX template for preparing your manuscript. It is, after all, the industry standard in typesetting documents. However, how many of you have … Read more

Bayesian Networks: Analysing Hotel Customer Data

Implementing probabilistic modelling with bnlearn. Bayesian networks are quite an intuitive tool when it comes to examining the dependencies between different variables. Specifically, a DAG (or directed acyclic graph) is what allows us to represent the conditional probabilities between a given set of variables. Using the bnlearn library in … Read more

Customer Science in Analytics

A perspective on how “Small Scale Businesses” can use data to improve customer retention. Customer Science has been a buzzword in the analytics and data science world for businesses. No matter what business you are in, knowing your customer (her motivation, her potential, and the risks associated with her) is … Read more

Bayesian Inference and Markov Chain Monte Carlo Sampling in Python

An introduction to using Bayesian Inference and MCMC sampling methods to predict the distribution of unknown parameters through an in-depth coin-flip example implemented in Python. This article extrapolates a basic coin-flip example into a larger context in which we can examine the use and power of Bayesian Inference and Markov Chain … Read more
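A coin-flip posterior can be sampled with a few lines of standard-library Python. The sketch below uses a random-walk Metropolis sampler with a uniform prior on the coin's bias; this is one common MCMC method and not necessarily the one the article implements. The data (7 heads, 3 tails), the step size, and the function names are assumptions for illustration.

```python
import math
import random


def log_posterior(theta, heads, tails):
    """Unnormalized log posterior of the coin bias under a uniform prior on (0, 1)."""
    if not 0.0 < theta < 1.0:
        return -math.inf
    return heads * math.log(theta) + tails * math.log(1.0 - theta)


def metropolis(heads, tails, n_samples=20000, step=0.2, seed=0):
    """Random-walk Metropolis: propose theta + Gaussian noise, accept with prob min(1, ratio)."""
    rng = random.Random(seed)
    theta = 0.5
    current = log_posterior(theta, heads, tails)
    samples = []
    for _ in range(n_samples):
        proposal = theta + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, heads, tails)
        accept_prob = math.exp(min(0.0, candidate - current))
        if rng.random() < accept_prob:
            theta, current = proposal, candidate
        samples.append(theta)  # a rejected proposal repeats the current value
    return samples


samples = metropolis(heads=7, tails=3)
kept = samples[1000:]  # discard an initial burn-in period
posterior_mean = sum(kept) / len(kept)
```

With a uniform prior the exact posterior is Beta(heads + 1, tails + 1), whose mean is (heads + 1) / (heads + tails + 2), so the sampled mean can be checked against that closed form.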