Missing Migrants, tracking human deaths along migratory routes

[This article was first published on long time ago…, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here) Want to share your content on R-bloggers? click here if you have a blog, or here if you don’t.  Hi there, Several years ago I received the book “Como si … Read more

Spark MLlib on AWS Glue

Machine Learning. Distributed ML on AWS that’s ready to go. AWS pushes SageMaker as its machine learning platform. However, Spark’s MLlib is a comprehensive library that runs distributed ML natively on AWS Glue and provides a viable alternative to AWS’s primary ML platform. One of the big benefits of SageMaker is that it easily … Read more

How to Make Your Data Catalog Successful

Learnings from dozens of companies on how to make your data catalog successful. (Photo by Joshua Sortino on Unsplash.) There are only 2 goals that matter when it comes to measuring the success of a data catalog: 1) adoption, and 2) customer satisfaction. If you nail these two, you are successful. I’m the co-creator of … Read more

Cyclical learning rate with R and Keras

[This article was first published on casualR, and kindly contributed to R-bloggers]. In this blog post I will share a way to perform cyclical learning … Read more

Multi-task learning with Multi-gate Mixture-of-experts

Google’s neural network model for content recommendation. (Photo by Possessed Photography on Unsplash.) Multi-task learning is a machine learning method in which a model learns to solve multiple tasks simultaneously. The assumption is that by learning to complete multiple correlated tasks with the same model, the performance of each task will be higher than … Read more

Amazon Connect Chat now supports Apple Business Chat (Generally Available)

With the Apple Business Chat integration in Amazon Connect, your customers can interact with you using the Apple Messages application on their iPhone, iPad, or Mac. Your customers can now have an experience that is as familiar and convenient as chatting with a friend, while using rich customer service features like interactive messages to do things … Read more

The case for using spatial SQL

(Source: GIPHY.) Great question! Spatial SQL uses all the same elements and structure of normal SQL but allows you to work with another data type: a GEOMETRY or GEOGRAPHY. A GEOMETRY is when your data lives in a projected coordinate system or a flat representation of the earth. A GEOGRAPHY is where your data … Read more

Interactive data analysis with dropdown menu Ipywidgets and Plotly in Jupyter Notebook.

An example of how to set up interactive dropdown menu widgets and use Plotly to display the outcome of database analysis in Jupyter Notebook using IPython and Pandas. (Image by the author: Lassen Volcanic National Park.) The Challenge: Recently, while working on the project for one of my graduate classes, I was faced with … Read more

R user or R Developer? Your opinion matters.

[This article was first published on Mirai Solutions, and kindly contributed to R-bloggers]. What makes one an R Developer and how does it differ from … Read more

Why Balancing Classes is Over-Hyped

Citations [1] Bartosz Krawczyk. “Learning from imbalanced data: open challenges and future directions.” Prog Artif Intell (2016) 5:221–232 [2] Marco Altini, “Dealing with imbalanced data: Undersampling, oversampling and proper cross-validation.” Blog. August, 2015. https://www.marcoaltini.com/blog/dealing-with-imbalanced-data-undersampling-oversampling-and-proper-cross-validation [3] Yanping Yang, Guangzhi Ma. “Ensemble-based active learning for class imbalance problem.” Journal of Biomedical Science and Engineering, Vol. 3 №10, … Read more

A Better Alternative to Python Dictionaries

Counting example: Let’s say that we want to count the number of occurrences of each letter in a word. We will create a dictionary with the letters as keys and their numbers of occurrences as values. We will use the longest word in most English dictionaries (according to Google): word = 'pneumonoultramicroscopicsilicovolcanoconiosis' We … Read more
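The counting task described in this excerpt can be sketched as follows. This is a minimal illustration, not the article's own code: the plain-dictionary loop follows the description above, while `collections.Counter` is shown only as one standard-library option. The excerpt does not say which alternative the article actually recommends.

```python
from collections import Counter

word = "pneumonoultramicroscopicsilicovolcanoconiosis"

# Plain-dictionary approach: create the key the first time a letter is seen.
counts = {}
for letter in word:
    if letter not in counts:
        counts[letter] = 0
    counts[letter] += 1

# Standard-library option (an assumption, not necessarily the article's choice):
# Counter handles missing keys automatically.
counter_counts = Counter(word)

# Both approaches produce the same letter counts.
print(counts == dict(counter_counts))
```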

Making a flawless ML env. with Tensorflow 2 and CUDA 10.1 on Ubuntu 20.04 with dual boot 2021

Before we start, ensure steps until now are cleanly done. Installation is not that scary if done in the right sequence! So let’s begin… Installing CUDA 10.1 and cuDNN 7: These are stable versions and the commands are tried and tested, so just run them one by one and see the magic happening! (Check this … Read more

Statistical Prophecy and the Art of Forecasting

As they say, Banking is the business of managing risk, and you know you are running the show when your department is called “Bank in a Bank”. The Asset Liability Management (ALM) department of any bank, in simple words, handles two key functions: managing the supply of money (deposits) and catering to the demand of … Read more

Building a Biomedical Knowledge Graph

Konrad, like so many TypeDB community members, comes from a diverse engineering background. Knowledge graphs have been part of his scope since working on an enterprise knowledge graph for GSK. He’s been a part of the TypeDB community for roughly 3 years. While most of his career has been spent in the biomedical industry, he’s … Read more

Mrs. T’s Pierogies: Improved forecasting and DR with SAP on Google Cloud

Managing Director for SAP, Google Cloud

Pierogies might just be the ultimate comfort food. But when Mrs. T’s Pierogies — the leading manufacturer of frozen pierogies in the US — learned it needed to transition its existing on-premises SAP ECC to S/4HANA, the company sought a little comfort for itself.  Founded in 1952, Mrs. T’s Pierogies now produces more than 650 … Read more

Take Your SQL From Good to Great: Part 4

Define what’s returned. In general, window functions can be grouped into 3 types. Navigation functions return the value given specific location criteria (e.g. first_value, lag, lead). Numbering functions assign a number (e.g. rank, row_number) to each row based on its position in the specified window. Analytic functions perform a calculation on a set of … Read more
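To make the three groups of window functions concrete, here is a small sketch that runs one function of each kind through Python's built-in `sqlite3` module. The `sales` table and its values are hypothetical, and the example assumes the bundled SQLite is version 3.25 or newer, which is when window functions were added.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (day INTEGER, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [(1, 10), (2, 30), (3, 20)])

rows = conn.execute(
    """
    SELECT day,
           amount,
           LAG(amount)  OVER (ORDER BY day)         AS previous_amount,  -- navigation
           ROW_NUMBER() OVER (ORDER BY amount DESC) AS amount_rank,      -- numbering
           SUM(amount)  OVER (ORDER BY day)         AS running_total     -- calculation over rows
    FROM sales
    ORDER BY day
    """
).fetchall()

for row in rows:
    print(row)
```

Each output row keeps the original `day` and `amount` columns and adds the three window results, so no rows are collapsed the way they would be with GROUP BY.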

Weekly review of Reinforcement Learning papers #12

I present 4 publications from my research area. Let’s discuss them! Image by the author Paolo, G., Coninx, A., Doncieux, S., & Laflaquière, A. (2021). Sparse Reward Exploration via Novelty Search and Emitters. arXiv preprint arXiv:2102.03140. The major trade-off in reinforcement learning is the exploration versus exploitation trade-off. Exploration is … Read more

The Physics of Energy-Based Models

Using physics to understand energy-based models. Authors: Patrick Huembeli (EPFL), Juan Miguel Arrazola (Xanadu), Nathan Killoran (Xanadu), Masoud Mohseni (Google Quantum AI), Peter Wittek. The interactive version of this post can be found here. Since Medium does not support JavaScript and equations written in LaTeX, we recommend checking out our interactive post as well. … Read more

IBM Data Science Capstone Project — Battle of the Neighborhoods

Introduction / Business Problem. Context: A client has approached the consultation firm for advice on the business strategies and execution roadmap for setting up restaurants in Kyoto. The initial business question is “Should the client set up a restaurant chain in Kyoto, and where?” (Photo by Sorasak on Unsplash.) Rather than diving into the problem … Read more

R is Slow — and It’s Your Fault!

Let’s start with how the R programming language works. It is what is referred to as an interpreted language. This means you don’t have to compile anything before running code; the computer just interprets and runs it, giving you results. This helps speed up how quickly you can write and test your code, but has … Read more

The 5 Certificates to Prove Your Python Knowledge Level

Sometimes, having a certificate can be the validation you need. (Photo by Anton Maksimov juvnsky on Unsplash.) Python is one of the most popular and commonly used programming languages for many applications. Python is a general-purpose programming language; that is, you can use it to write code for a wide variety of application fields. You can use … Read more

Use of Machine Learning in Economic Research: What the Literature Tells Us

Full list of references (in order of appearance): [1] Michael D. Cohen and Robert Axelrod, 1984: Coping with Complexity: The Adaptive Value of Changing Utility, The American Economic Review, Vol. 74, No. 1 (Mar., 1984), pp. 30-42. [2] W. Brian Arthur, 1991: Designing Economic Agents that Act like Human Agents: A Behavioral Approach to … Read more

(Self-)Supervised Pre-training? Self-training? Which one to use?

What is the current state of the art in self-supervised pre-training? Do we really need pre-training? How about self-training? Recently, pre-training has been a hot topic in Computer Vision (and also NLP), especially since BERT, one of the breakthroughs in NLP, proposed a method to train an NLP model by using a “self-supervised” signal. … Read more

RStudio Professional Drivers 1.8.0

[This article was first published on RStudio Blog, and kindly contributed to R-bloggers]. Announcing full support for the Snowflake driver In the previous release we … Read more

Downloading Sentinel-2 archives from Google Cloud with sen2r

sen2r can be used as usual, remembering to set Google Cloud SDK as input source (Copernicus Hub remains the default choice). Using sen2r from the GUI The function sen2r() opens the sen2r GUI. In the first sheet, the section “SAFE options” was modified to include the selector “Input servers”, which allows keeping the default “ESA … Read more

The Ultimate Guide to SQL CTEs

(Photo by Dimitra Peppa on Unsplash.) Master common table expressions in 5 minutes. Common table expressions (CTEs) are a SQL functionality that allows you to perform complex, multi-step transformations in a single easy-to-read query. Because of their power, readability, and flexibility, they are a useful tool for beginners and experts alike. In the simplest terms, … Read more
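As an illustration of a multi-step transformation written as a single query, the sketch below chains two CTEs and runs them through Python's built-in `sqlite3` module. The `orders` table, its values, and the threshold of 90 are hypothetical and chosen only for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ana", 40), ("ana", 70), ("ben", 20), ("cy", 90)],
)

query = """
WITH customer_totals AS (      -- step 1: total spend per customer
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
),
big_spenders AS (              -- step 2: keep customers with total >= 90
    SELECT customer, total
    FROM customer_totals
    WHERE total >= 90
)
SELECT customer, total
FROM big_spenders
ORDER BY total DESC, customer
"""

rows = conn.execute(query).fetchall()
print(rows)
```

Each named step reads like a small table, which is what makes the final SELECT easy to follow compared with nested subqueries.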

Step-by-Step Deployment of a Free PostgreSQL Database and Data Ingestion

Heroku is a platform as a service (PaaS) that enables developers to build and run applications entirely in the cloud. Heroku offers a ready-to-use environment that makes it very simple to deploy your code as quickly as possible with little development experience. This is an excellent choice for beginners and small to medium-sized companies, unlike … Read more

Building an Image Color Analyzer using Python

In this step, as you can understand from the title, we will be writing functions. I will define three functions that will be helpful for us. Functions are also an excellent method to simplify your programs. Here are the functions with their definitions.

rgb_to_hex

def rgb_to_hex(rgb_color):
    hex_color = "#"
    for i in rgb_color:
        i = int(i)
        hex_color += ("{:02x}".format(i))
    return … Read more

About Sort in Spark 3.x

Sorting data is a very important transformation needed in many applications, ETL processes, or various data analyses. Spark offers a couple of functions to sort data based on the particular use-case the user has. In this article, we will describe these functions and take a closer look at how sort works under the hood and … Read more

Comparing Facebook’s M2M to mT5 in low resources translation (English-Yoruba).

Dataset preprocessing is minimal here: source and target text are lowercased, and trailing spaces are removed. Validation consists of 5% of the dataset (500 sentences). Simpletransformers makes our lives much simpler when fine-tuning T5 / mT5 models since it fully supports both. We only need to set the required hyperparameters, name of the model, … Read more
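The preprocessing described here could look roughly like the sketch below. The function names, the placeholder sentence pairs, and the shuffled split are assumptions made for illustration. The excerpt only states that text is lowercased, trailing spaces are removed, and 5% (500 sentences) is held out for validation.

```python
import random


def preprocess(text):
    """Lowercase the text and strip trailing whitespace."""
    return text.lower().rstrip()


def split_train_validation(pairs, validation_fraction=0.05, seed=0):
    """Shuffle the pairs (an assumed choice) and hold out a fraction for validation."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    n_validation = round(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


# Hypothetical placeholder corpus of (source, target) sentence pairs.
raw_pairs = [(f"Source Sentence {i}  ", f"Target Sentence {i} ") for i in range(10_000)]

clean_pairs = [(preprocess(src), preprocess(tgt)) for src, tgt in raw_pairs]
train_pairs, validation_pairs = split_train_validation(clean_pairs)

print(len(train_pairs), len(validation_pairs))
```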

Automating Exploratory Data Analysis Using QuickDA

Using QuickDA for preprocessing and manipulating data. (Photo by UX Indonesia on Unsplash.) Exploratory data analysis consists of different parts, such as visualizing the data patterns, analyzing the statistical properties, and preprocessing data. This process takes around 30% of the total project time, but this problem can be addressed by automating exploratory data analysis. Automating exploratory … Read more

Application of the Von Mises’ axiom of randomness on the forecasts concerning the dynamics of a…

Use of the Von Mises’ axiom of randomness as an analysis method concerning the forecasts on the evolutions of a non-stationary system. ABSTRACT: In this article, we will describe the dynamics of a non-stationary system using a numerical sequence, in which the value of terms varies within the number of degrees of freedom of the … Read more

Monte Carlo Simulation and Variants with Python

Rejection sampling is usually used to generate independent samples from an unnormalized target distribution. The idea behind this Monte Carlo sampling variant is that if we want to generate random samples from an unnormalized target distribution P(x), then we can use some proposal distribution Q(x) and a normalization constant c such that cQ(x) is an upper bound … Read more
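A minimal sketch of rejection sampling under stated assumptions: the target P(x) = x(1 - x) on [0, 1], the Uniform(0, 1) proposal Q, and the constant c = 0.25 are all choices made for this illustration and do not come from the article. With these choices cQ(x) = 0.25 bounds P(x) from above, because x(1 - x) peaks at 0.25.

```python
import random


def p_unnormalized(x):
    """Unnormalized target density on [0, 1] (illustrative choice)."""
    return x * (1.0 - x)


def q_density(x):
    """Density of the Uniform(0, 1) proposal."""
    return 1.0


C = 0.25  # chosen so that C * q_density(x) >= p_unnormalized(x) on [0, 1]


def rejection_sample(n_samples, rng):
    """Draw n_samples from the normalized target by accept/reject."""
    samples = []
    while len(samples) < n_samples:
        x = rng.random()   # candidate drawn from the proposal Q
        u = rng.random()   # uniform height under the envelope
        if u * C * q_density(x) <= p_unnormalized(x):
            samples.append(x)  # accept: the point lies under the target curve
    return samples


rng = random.Random(0)
samples = rejection_sample(5000, rng)
sample_mean = sum(samples) / len(samples)
print(round(sample_mean, 2))
```

The normalized version of this target is symmetric around 0.5, so the sample mean should land close to 0.5. A tighter envelope (smaller c) raises the acceptance rate.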

Scorecard Development for Finance Industry Using PyCaret — Part 1

Details around developing a classification model with a light-coding workflow. (Photo by Ameen Fahmy on Unsplash.) In this article I will try to describe end-to-end scorecard development for the banking industry, leveraging the machine learning library PyCaret. My first encounter with scorecard development happened almost twelve years ago when I developed a propensity scorecard where the objective … Read more

Using NEOS Optimization Solver in R code

#=================================================================#
# Finance and Insurance Engineering using R
# by Sang-Heon Lee & Hosam Ki
#
# https://kiandlee.blogspot.com
#-----------------------------------------------------------------#
# Portfolio Optimization using ROI NEOS
#=================================================================#

graphics.off()  # clear all graphs
rm(list = ls()) # remove all objects from your workspace

library(quadprogXT)      # solveQPXT
library(ROI)             # ROI_solve
library(ROI.plugin.neos) # NEOS

#-----------------------------------------------------
# Michaud dataset
#-----------------------------------------------------
    setwd("D:/SHLEE/blog/rneos/michuad")

    # dataset in Michaud and Michaud (2007) appendix
    df.data <- read.csv('mu_sd_corr_michaud.csv')
    head(df.data)

    # mean, std, correlation
    mu   <- as.vector(df.data[,1])
    sd   <- as.vector(df.data[,2])
    corr <- as.matrix(df.data[,-c(1,2)])

    # number of assets
    nvar <- length(mu)
    var.name <- colnames(df.data[,-c(1,2)])

    # convert correlation to covariance matrix
    cov <- diag(sd)%*%corr%*%diag(sd)

#-----------------------------------------------------
# Portfolio optimization using solveQPXT
#-----------------------------------------------------

    n.er <- 100  # number of EF points
    rset <- seq(min(mu),max(mu),length=n.er+2)
    rset <- rset[2:(n.er+1)]  # parentheses needed: 2:n.er+1 parses as (2:n.er)+1

    # given returns and unknown std and weight
    port1.ret <- rset
    port1.std <- rset*0
    port1.wgt <- matrix(0,n.er,nvar)

    # ith portfolio problem setting
    i = 10;
    Dmat <- 2*cov
    dvec <- rep(0,nvar) #c(0,0)
    Amat <- t(rbind(t(rep(1,nvar)),t(mu),diag(nvar)))
    bvec <- c(1,rset[i],rep(0,nvar))

    # mean-variance optimization
    m <- solveQPXT(Dmat,dvec,Amat,bvec,meq=2,factorized=FALSE)

… Read more

Matplotlib Animations in Jupyter Notebook

In order to create an interactive plot in Jupyter Notebook, you first need to enable interactive plotting as follows:

# Enable interactive plot
%matplotlib notebook

After that, we import the required libraries, in particular the FuncAnimation class, which can be used to create an animation for you:

import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

Next, we need to … Read more

An Approach for Choosing Number of Clusters for K-Means

The suggested approach considers the common trade-off between the inner distances and the number of clusters, and automatically chooses the number of clusters. (Photo by Franki Chamaki on Unsplash.) When we use clustering algorithms, choosing the number of clusters is always a challenging task. While there are some existing approaches that can help with this task, … Read more

Equality of Variances in R-Homogeneity test-Quick Guide

[This article was first published on Methods – finnstats, and kindly contributed to R-bloggers]. Equality of Variances in R, in this article, we are describing … Read more

A unified view of Graph Neural Networks

Graph attention, graph convolution, network propagation are all special cases of message passing in graph neural networks. Illustration of three different GNNs. Image from [1]. Message passing networks (MPN), graph attention networks (GAT), graph convolution networks (GCN), and even network propagation (NP) are closely related methods that fall into the category of graph neural networks … Read more

k-Nearest Neighbors (kNN) — How To Make Quality Predictions With Supervised Learning?

As you can see from the chart above, k-Nearest Neighbors belongs to the supervised branch of Machine Learning algorithms, which means that it requires labeled data for training. However, suppose you only want to find similar data points (i.e., find neighbors) instead of making predictions. In that case, it is possible to use kNN in … Read more
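The two uses mentioned in this excerpt, predicting a label and simply finding neighbors, can be sketched in plain Python as below. The helper names, the Euclidean distance, and the toy points are assumptions for illustration. The article may well use a library implementation instead.

```python
import math
from collections import Counter


def nearest_neighbors(points, query, k):
    """Return the indices of the k points closest to query (Euclidean distance)."""
    order = sorted(range(len(points)), key=lambda i: math.dist(points[i], query))
    return order[:k]


def knn_predict(points, labels, query, k):
    """Supervised use: majority vote over the labels of the k nearest neighbors."""
    neighbor_ids = nearest_neighbors(points, query, k)
    votes = Counter(labels[i] for i in neighbor_ids)
    return votes.most_common(1)[0][0]


# Hypothetical labeled training data: two well-separated groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["a", "a", "a", "b", "b", "b"]

# Neighbor search only (no labels needed).
print(nearest_neighbors(points, (1.1, 1.0), k=3))

# Prediction (labels required).
print(knn_predict(points, labels, (5.0, 5.0), k=3))
```

The neighbor search never touches `labels`, which is why the same distance machinery can be reused when no labeled data is available.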

Boosting performance by combining trees with GLM: A benchmarking analysis

(Photo by Simon Berger on Unsplash.) How much of an improvement is gained by combining trees with GLM, and how does it compare to additive models? A common challenge in statistical modeling is ensuring the modeling method is appropriate to the structure of the data. Linear models like logistic regression assume the existence of a … Read more

C++ Basics: Understanding Lambda

A convenient way to define a functor that can help us simplify the code. (Photo by Tudor Baciu on Unsplash.) One of the new features introduced in Modern C++ starting from C++11 is the lambda expression. It is a convenient way to define an anonymous function object or functor. It is convenient because we can … Read more