AWS Lambda integration with Snowflake

Resource policy: After replacing the relevant fields in the following JSON, add the same resource policy for the API. Finally, in the Lambda console, you should observe the API-triggered Lambda function. API Integration: We create an API integration in Snowflake. This integration will create a user and allow that user to assume the role we … Read more

Adversarial Machine Learning: Attacks and Possible Defense Strategies

Information Theory. An overview of one of the emerging research fields in Machine Learning and Artificial Intelligence. Research on Machine Learning (ML) models has evolved in recent years, leading to the definition of very precise models. In fact, the primary goal of ML researchers has always been to develop ever more … Read more

Categorizing user-uploaded documents

How insights from data were used to help build the taxonomy and our approach to assign categories to the user-uploaded documents. Scribd offers a variety of publisher and user-uploaded content to our users and while the publisher content is rich in metadata, user-uploaded content typically is not. Documents uploaded by the users have varied subjects … Read more

Supercharge your Vim Skills

8 Vim tips to edit your files faster. Vim is a text editor that, in the hands of a skilled user, can enable blazing-fast edits closer to the speed of thought, much faster than what’s usually achievable with a traditional text editor. For everything we do on the computer, there … Read more

The Easiest Headless Raspberry Pi Setup

Let’s get started. I have a Raspberry Pi 3, but any Raspberry Pi will work with this setup. All we’ll need to get set up is the following:
- Raspberry Pi
- 4GB or greater microSD card
- Windows, Mac, or Linux computer
- Adapter(s) to plug your microSD card into your computer
- iPhone or Android device
Adapter hell. … Read more

Understanding LIME

First things first, we need to install LIME using pip. You can find the source code for LIME in [2].
    pip install lime
We will use the iris dataset provided by Scikit-learn [3] as an example to demonstrate the package’s usage. Next, we need to import the different packages which we … Read more

Album covers by GANs

A step-by-step code and intuition guide to generating album covers. [Figure: a random sample of generated album covers from WGAN] Yeah, GANs can be pretty cool. If you somehow managed to stumble upon this little article, it’s probably safe to say that you’re somewhat interested in generative adversarial networks, or GANs. I definitely was. In seeing … Read more

A Better Way for Data Preprocessing: Pandas Pipe

Efficient, organized, and elegant. Real-life data is usually messy. It requires a lot of preprocessing to be ready for use. Pandas, one of the most widely used data analysis and manipulation libraries, offers several functions to preprocess raw data. In this article, we will focus on one particular function … Read more
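To make the idea in the title concrete, here is a minimal sketch of how `DataFrame.pipe` can chain preprocessing steps. The helper functions `drop_missing` and `to_dtype` and the toy data are hypothetical, made up for illustration; they are not taken from the article.

```python
import pandas as pd

def drop_missing(df):
    """Hypothetical step: drop rows that contain missing values."""
    return df.dropna()

def to_dtype(df, column, dtype):
    """Hypothetical step: cast one column to the given dtype."""
    return df.astype({column: dtype})

# Toy data, invented for this sketch.
raw = pd.DataFrame({"id": [1, 2, 3], "price": ["10.5", None, "7"]})

# Each pipe call passes the DataFrame as the first argument of the function,
# followed by any extra arguments.
clean = raw.pipe(drop_missing).pipe(to_dtype, "price", "float64")
```

Chaining this way keeps each step a small, testable function and reads top to bottom in the order the steps are applied.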

Thinking Like a Chef Will Make You a Better Data Scientist

The prominent chefs all have their own restaurant and/or unique style of cooking, and they still practice all their fundamentals every day. There isn’t a single chef breaking the rules who hadn’t mastered the rules in the first place. Essentially, they know how and why to break rules in a way that’s meaningful. That simply … Read more

RDCOMClient : A Simple Libor IRS Pricing with OIS Discounting

    #=========================================================================#
    # Financial Econometrics & Derivatives, ML/DL using R, Python, Tensorflow
    # by Sang-Heon Lee
    #
    #-------------------------------------------------------------------------#
    # OIS swap pricing by using a VBA macro in R through RDCOMClient
    #=========================================================================#
    library(RDCOMClient)

    graphics.off()  # clear all graphs
    rm(list = ls()) # remove all objects from your workspace

    #===============================================================================
    # functions using RDCOMClient
    #===============================================================================
    f_read_vector <- function(xlWbk1, sheet1, range1) {
        sheet <- xlWbk1$Worksheets(sheet1)
        range <- sheet$Range(range1)
        data  <- do.call("cbind", range[["Value"]])
        data  <- matrix(unlist(data), dim(data)[1], dim(data)[2])
        return(data)
    }

    f_write_vector <- function(xlWbk1, sheet1, range1, data1) {
        sheet <- xlWbk1$Worksheets(sheet1)
        range <- sheet$Range(range1)
        range[["Value"]] <- asCOMArray(data1)
    }

    #===========================================================
    # MAIN
    #===========================================================

    # set working directory
    setwd("D:/SHLEE/blog/excel_com")

    # Create Excel Application
    xlApp <- COMCreate("Excel.Application")

    # Open the Macro Excel book
    fn <- "sample_ois.xlsm"
    xlWbk <- xlApp$Workbooks()$Open(paste0(getwd(), "/", fn))

    # use TRUE for Excel Spreadsheet to be visible
    xlApp[['Visible']] <- TRUE # FALSE

    #===========================================================
    # Communicate between R and Excel
    #===========================================================
    # Arguments for Excel Spreadsheet and VBA macro
    sheet      <- "Sheet1"
    range_in   <- "D4:E11"
    range_out  <- "H4:I11"
    macro_name <- "macro1"

    #--------------------------------------------------
    # Pass Input Market Swap Rates to Excel … Read more

Categories: R

What is Deep Analytics?

And why we need to rethink business intelligence. As data analysts, we waste too much time on making dashboards for other people and not enough time on answering deep questions about critical business issues. This is a waste of resources for the individual, and a … Read more

Working with tree-based hierarchies using data.tree

    options(tidyverse.quiet = TRUE)
    library(tidyverse)
    geography <- tribble(
      ~id, ~parent_id, ~area, ~some_additional_data,
      1, NA, "Europe", "That's a continent",
      2, 1, "Germany", "That's a country",
      3, 1, "France", "Oh, yeah, another country",
      4, 1, "Denmark", "Oh dear, … Read more

Categories: R

parallel grid search cross-validation using `crossvalidation`

    options(repos = c(techtonique = '', CRAN = ''))
    install.packages("crossvalidation")
    library(crossvalidation)
    library(randomForest)
    library(microbenchmark)
    set.seed(123)
    n <- 1000 ; p <- 10
    X <- matrix(rnorm(n * p), n, p)
    y <- rnorm(n)
    tuning_grid <- base::expand.grid(mtry = c(2, 3, 4), ntree = c(100, 200, 300))
    n_params <- nrow(tuning_grid)
    print(tuning_grid)
    n_cores <- 4
Sequential
    f1 <- function() base::lapply(1:n_params, … Read more

Categories: R

Monitor models for training-serving skew with Vertex AI

Let’s look at how this practice helped Google Play improve app install rate: By comparing the statistics of serving logs and training data on the same day, Google Play discovered a few features that were always missing from the logs, but always present in training. The results of an online A/B experiment showed that removing … Read more

Self-Supervised Learning in Vision Transformers

Anyone who has ever approached the world of machine learning has certainly heard of supervised learning and unsupervised learning. These are in fact two important possible approaches to Machine Learning that have been widely used for years. Only recently, however, has there been an explosion of a new term, Self-Supervised Learning! But let’s get there … Read more

What are the Most Popular Skills for Data Science Jobs? Ask a Graph Database!

Finding your Next Job by Building a Job Graph with TigerGraph, Indeed Job Data, and Kaggle API. Graphs are everywhere and can help with so much, including finding a job. Platforms like LinkedIn are powered by graph databases to help recommend jobs to you. In this blog, we’ll create an Indeed Graph that can … Read more

Can GitHub’s Copilot replace developers?

In simple words, Copilot really understands what you want to code in the next line. In my case, it even understands bad comments perfectly. Sometimes, it makes a few silly mistakes like declaring the same variable repeatedly; these kinds of bugs were already expected, which is why GitHub initially gave developers access to give their … Read more

AWS Control Tower announces improvements to guardrail naming and descriptions

AWS Control Tower guardrail naming and descriptions have been revised to better reflect the guardrail policy intention. The revised names and descriptions will help users more intuitively understand how guardrails enhance control of their accounts. For example, names of detective guardrails were modified from “Disallow” to “Detect” since the detective guardrail itself does not enforce … Read more

Categories: AWS

Introduction to Time Series Forecasting — Part 2 (ARIMA Models)

Most time series forecasting methods assume that the data is ‘stationary,’ but in reality it often needs certain transformations for further processing. In the first article, we looked at Simple Moving Average and Exponential Smoothing methods. In this article we will look at more complex methods like ARIMA and … Read more
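One common transformation of this kind is differencing, which is what the "I" (integrated) in ARIMA refers to. The excerpt does not say which transformations the article covers, so the following is only a minimal sketch, with a hypothetical helper `difference` and made-up data.

```python
def difference(series, lag=1):
    """Return the lag-differenced series: series[i] - series[i - lag]."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]

# A made-up series with a linear trend (not stationary in the mean).
trend = [3, 5, 7, 9, 11]

# First-order differencing removes the linear trend and leaves a constant series.
diffed = difference(trend)
```

The differenced series is one element shorter than the input for `lag=1`, which is worth remembering when aligning forecasts back to the original time index.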

How Big Is Cost Overrun for the Olympics?

All Games, without exception, have had cost overruns. For no other type of mega-project is this the case. With Alexander Budzier and Daniel Lunn. Percentage cost overrun for the Olympic Games 1960–2016 is shown in real terms in the table below. Data on cost overrun were available for 19 … Read more

Amazon RDS for SQL Server Now Supports Two New Parameter Changes For Full-Text Search

Amazon RDS for SQL Server now supports parameter changes for full-text search. Full-text search in SQL Server lets users and applications run full-text queries against character-based data in SQL Server tables. Customers can now customize the values of two parameters for full-text search in Amazon RDS: ‘max full-text crawl range’: Improves full-text crawl performance, and … Read more

Categories: AWS

Introducing improved maintenance policy for Cloud Memorystore

Maintenance is a critical component of every database user experience as it ensures that your database is staying up to date with security patches, receiving feature updates, and improving performance. However, maintenance downtime can be impactful, especially when it occurs at inopportune times.  We are happy to announce that Cloud Memorystore now enables you to … Read more

Compliance Engineering – From manual attestation to continuous compliance

Risk Management and Compliance is as important in the cloud as it is in conventional on-premises environments. To help organizations in regulated industries meet their compliance requirements, Google Cloud offers automated capabilities that ensure the effectiveness of productionalization processes. Continuous compliance in the banking industry: Banks have a formidable responsibility in managing the world’s wealth, … Read more

5 Things I (didn’t) learn at University

Opinion: University badly failed to prepare me for my IT and Data Science career. How much is a Bachelor’s Degree in IT and Data Science worth? In my current career, I am dealing with several trending topics regarding digitalization such as the renewal of IT through the cloud, … Read more

How to Calculate Mean Absolute Error in R

[This article was first published on Methods – finnstats, and kindly contributed to R-bloggers]. Mean Absolute Error in R, when we do modeling always need … Read more

Categories: R

Predicting Electric Vehicle & Commercial Charger Demand in Washington State

Which Washington counties will have the most EVs and need the most commercial chargers? If you’ve stepped outside in the past couple of months, chances are you’ve felt like a melting scoop of ice cream more than in any other summer. Well, it is no coincidence that the global land-only surface temperature for June 2021 was the … Read more

How 400k+ Tweets Show That Simone Biles Wins

Here are the top 10 retweeted tweets. [Figure: Top retweeted tweets referencing ‘Simone Biles’ | Skanda Vivek] All of the top 10 retweeted tweets are in support of Simone Biles! And here are the top 10 liked tweets. [Figure: Top 10 liked tweets referencing ‘Simone Biles’ | Skanda Vivek] Same in this case: all of the … Read more

Practical Guide to Ensemble Learning

The intuition behind ensemble learning is often described with a phenomenon called the Wisdom of the Crowd, which means that aggregated decisions made by a group of individuals are often better than the individual decisions. There are multiple methods for creating aggregated models (or ensembles), which we can categorize as heterogeneous and homogeneous ensembles. In heterogeneous … Read more
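The "aggregated decisions" idea can be illustrated with majority voting, one simple way to combine classifiers. The excerpt does not say which aggregation methods the article uses, so this is only a sketch; the three toy "models" below are hypothetical threshold rules, not trained classifiers.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the most frequent label; on a tie, the label seen first wins."""
    return Counter(predictions).most_common(1)[0][0]

def ensemble_predict(models, sample):
    """Ask every model for a prediction and aggregate by majority vote."""
    return majority_vote([model(sample) for model in models])

# Hypothetical toy classifiers: each applies a different threshold.
models = [lambda x: x > 0, lambda x: x > 1, lambda x: x > -1]

vote_high = ensemble_predict(models, 0.5)   # votes: True, False, True
vote_low = ensemble_predict(models, -0.5)   # votes: False, False, True
```

With an odd number of binary voters there is never a tie, which is one reason small ensembles often use three or five members.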

Automatic Parallel Parking: Path Planning, Path Tracking & Control

Path Tracking
The kinematic model of the car is:
$$\dot{x} = v\cos(\phi), \qquad \dot{y} = v\sin(\phi), \qquad \dot{v} = a, \qquad \dot{\phi} = \frac{v\tan(\delta)}{L}$$
The state vector is $z = [x, y, v, \phi]$, where x is the x-position, y the y-position, v the velocity, and φ the yaw angle.
The input vector is $u = [a, \delta]$, where a is the acceleration and δ the steering angle.
Control
The MPC controller controls vehicle speed and steering based … Read more
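The kinematic model above can be simulated with a simple explicit Euler step. This is a minimal sketch, not the article's implementation: the function name `step`, the time step `dt`, and the value 2.5 for L are assumptions made for illustration, and an MPC controller may discretize the model differently.

```python
import math

def step(state, control, wheelbase, dt):
    """Advance the kinematic model by one explicit Euler step.

    state   = (x, y, v, phi): x-position, y-position, velocity, yaw angle
    control = (a, delta):     acceleration, steering angle
    wheelbase is the constant L in the model; dt is an assumed time step.
    """
    x, y, v, phi = state
    a, delta = control
    x_next = x + v * math.cos(phi) * dt
    y_next = y + v * math.sin(phi) * dt
    v_next = v + a * dt
    phi_next = phi + v * math.tan(delta) / wheelbase * dt
    return (x_next, y_next, v_next, phi_next)

# Driving straight ahead at constant speed with zero input: only x changes.
state = step((0.0, 0.0, 1.0, 0.0), (0.0, 0.0), wheelbase=2.5, dt=0.1)
```

Calling `step` repeatedly with a sequence of inputs yields a predicted trajectory, which is the kind of rollout a model-based controller evaluates.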

How to Run Animations in Altair and Streamlit

Data Visualisation. A ready-to-run tutorial, which describes how to build an animated line chart using Altair and Streamlit. Altair is a very popular Python library for data visualisation. Through Altair, you can build very complex charts with few lines of code, since the library follows the guidelines provided by the Vega-lite … Read more

Top Surprising Data Science Trends

- Introduction
- Arts and Entertainment
- Utility Script
- Earth and Nature
- Summary
- References
This article will outline the most popular data science trends that are designated as tags on Kaggle [2]. From those popular tags, I have picked three that I think are the most surprising. Understanding trends in data science can be helpful in a variety … Read more

Who are you Data Engineer?

In this post, I will explain the data roles that exist today and, in particular, who a data engineer is. What are the role’s definition, responsibilities, and challenges? For the past few years, I have been working as a big-data engineer, and although it … Read more

Explaining a BigQuery ML model

How to obtain and interpret explanations of predictions. BigQuery ML is an easy-to-use way to invoke machine learning models on structured data using just SQL. Although it started with only linear regression, more sophisticated models like Deep Neural Networks and AutoML Tables have been added by connecting BigQuery ML with TensorFlow and Vertex AI as … Read more

Caching the results of functions of your R package

[This article was first published on Posts on R-hub blog, and kindly contributed to R-bloggers]. One principle of programming that’s often encountered is “DRY”, “Don’t … Read more

Categories: R

Is it worth the weight?

Intro Oh man, I did it again. Grab a coffee, this is going to be a long one. Weights got me confused. The justification for using weights seems simple enough; if you’re working with a sample in which one (or more) strata are over(under)-represented, you should compute weighted univariate statistics. I’ve discussed this already here. … Read more

Categories: R

Building with Looker made easier with the Extension Framework

Our goal is to continue to improve our platform functionalities, and find new ways to empower Looker developers to build data experiences much faster and at a lower upfront cost. We’ve heard the developer community feedback and we’re excited to have announced the general availability of the Looker Extension Framework. The Extension Framework is a fully … Read more

The Amazon DynamoDB Accelerator (DAX) SDK for Java 2.x is now available

The Amazon DynamoDB Accelerator (DAX) SDK for Java 2.x is now available and is compatible with the AWS SDK for Java 2.x. You can build Java applications with accelerated access to DynamoDB and benefit from non-blocking I/O and other features of the latest AWS SDK for Java. DAX provides a fully managed, highly available, in-memory … Read more

Categories: AWS

Pre-Pruning or Post-Pruning

In a previous article, we talked about post pruning decision trees. In this article, we will focus on pre-pruning decision trees. Let’s briefly review our motivations for pruning decision trees, how and why post-pruning works, and its advantages and disadvantages. If you’d like some more details, check out this article. Decision Trees are grown using … Read more

A Detailed, Novice Introduction to Natural Language Processing (NLP)

There are a total of 5 execution steps when building a Natural Language Processor: Lexical Analysis: Processing of Natural Languages by the NLP algorithm starts with identifying and analyzing the structure of the input words. This part is called Lexical Analysis, and Lexicon stands for an anthology of the various words and phrases used in a language. … Read more

Automating EDA & Machine Learning

Using MLJAR-Supervised for Automating EDA, Machine Learning Models, and Creating Markdown Reports. Exploratory Data Analysis is an important step for understanding the data that we are working on. It helps us in identifying any hidden pattern in the data, the correlation between different columns of the data, and in analyzing the properties … Read more

Detecting Semantic Drift within Image Data

1. Metadata and Features
   ∘ Image features
   ∘ Image metadata
2. Semantic Drifts
   ∘ Custom Features: Distances from Cluster Centers
3. Conclusion
Even though we don’t have an actual model for prediction, let’s assume that our model input is expected to consist mainly of landscape images. Using a simulated production stage, we can test if it’s possible to detect … Read more