It’s Time to Use AI and Machine Learning Like Bar Charts

Organizations need to deploy AI and machine learning more widely, beyond just data science teams. Yes, democratizing ML will lead to imperfect models, and sometimes even to wrong decisions. But imperfect ML is no worse than imperfect business analysis in Excel. The size and scale of available data require more analysts empowered with an upgraded suite …

Word2Vec Explained

Table of Contents
Introduction
What is a Word Embedding?
Word2Vec Architecture
– CBOW (Continuous Bag of Words) Model
– Continuous Skip-Gram Model
Implementation
– Data
– Requirements
– Import Data
– Preprocess Data
– Embed
– PCA on Embeddings
Concluding Remarks
Resources

Word2Vec is a recent breakthrough in the world of NLP. Tomas Mikolov, a Czech computer scientist and currently a researcher at …
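
The excerpt does not show the article's implementation. As a minimal sketch of how the two architectures differ, here is how each one turns a sentence into training examples. The toy sentence and window size are assumptions. Real Word2Vec training feeds these examples to a shallow network, usually with extra tricks such as negative sampling, which this sketch omits.

```python
def skipgram_pairs(tokens, window=2):
    """Skip-gram: each center word is used to predict every context word within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_examples(tokens, window=2):
    """CBOW: the surrounding context words are used jointly to predict the center word."""
    examples = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        examples.append((context, center))
    return examples


sentence = "the quick brown fox jumps".split()
print(skipgram_pairs(sentence, window=1))
# [('the', 'quick'), ('quick', 'the'), ('quick', 'brown'), ('brown', 'quick'),
#  ('brown', 'fox'), ('fox', 'brown'), ('fox', 'jumps'), ('jumps', 'fox')]
print(cbow_examples(sentence, window=1)[2])
# (['quick', 'fox'], 'brown')
```

Skip-gram produces one example per (center, context) pair. CBOW produces one example per position, which is why it sees fewer but "wider" examples for the same text.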

Heteroscedasticity in Regression Model

Using Statsmodels to check for heteroscedasticity. Oftentimes, regression analysis is carried out on data whose variance differs markedly across values of the independent variable. One artifact of this type of data is heteroscedasticity, meaning the residuals have non-constant variance around the fitted values. When we observe heteroscedasticity, …
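
Statsmodels ships ready-made tests for this, for example `het_breuschpagan` in `statsmodels.stats.diagnostic`. To show the idea behind such a test without extra dependencies, here is a plain-Python sketch of the n·R² (Koenker) form of the Breusch-Pagan statistic. It regresses the squared OLS residuals on x and multiplies that auxiliary regression's R² by n. The simulated data and the single-regressor setup are assumptions for illustration.

```python
import random


def simple_ols(x, y):
    """Fit y = a + b*x by least squares; return (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def r_squared(x, y):
    """R^2 of the simple linear regression of y on x."""
    a, b = simple_ols(x, y)
    my = sum(y) / len(y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def breusch_pagan_lm(x, y):
    """Koenker form of the Breusch-Pagan statistic: n * R^2 of squared residuals regressed on x."""
    a, b = simple_ols(x, y)
    sq_resid = [(yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)]
    return len(x) * r_squared(x, sq_resid)


random.seed(0)
x = [i / 10 for i in range(1, 301)]
y_const = [2 + 3 * xi + random.gauss(0, 1) for xi in x]       # constant noise
y_fan = [2 + 3 * xi + random.gauss(0, 0.5 * xi) for xi in x]  # noise grows with x

CHI2_1DF_5PCT = 3.841  # 5% critical value of chi-square with 1 degree of freedom
for name, y in [("constant", y_const), ("fan-shaped", y_fan)]:
    lm = breusch_pagan_lm(x, y)
    print(f"{name}: LM = {lm:.1f}, heteroscedastic at 5%: {lm > CHI2_1DF_5PCT}")
```

Under constant variance the statistic is approximately chi-square with one degree of freedom, so large values point to heteroscedasticity. For real analyses, the Statsmodels function also handles multiple regressors and reports p-values.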

Getting predictions from an isotonic regression model

TL;DR: Pass the output of the isoreg function to as.stepfun to turn an isotonic regression model into a black-box object that takes in uncalibrated predictions and outputs calibrated ones. Isotonic regression is a method for obtaining a monotonic fit for 1-dimensional data. Let’s say we have data $(x_1, y_1), \dots, (x_n, y_n)$ such that $x_1 < x_2 < \dots < x_n$. (We assume no ties …
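
The R code is not part of this excerpt, so here is a Python analogue of the same idea. It is our own sketch, not the isoreg/as.stepfun API. The pool-adjacent-violators algorithm produces the monotonic (nondecreasing) least-squares fit, and a step function turns that fit into a calibrator for new inputs. It assumes sorted training x values with no ties, as in the excerpt. Using the fitted value at the largest training x not exceeding the query is also an assumption of this sketch.

```python
from bisect import bisect_right


def pava(y):
    """Pool Adjacent Violators: nondecreasing least-squares fit to y (already ordered by x)."""
    blocks = []  # each block is [sum_of_values, count]
    for v in y:
        blocks.append([v, 1])
        # Merge backwards while the previous block's mean exceeds the last block's mean
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted


def as_step_function(x, fitted):
    """Return a callable mapping a new x to the fitted value at the nearest training x at or below it."""
    def predict(x_new):
        i = bisect_right(x, x_new) - 1
        return fitted[max(i, 0)]  # points left of the data get the first fitted value
    return predict


x = [1, 2, 3, 4, 5, 6]
y = [1.0, 3.0, 2.0, 4.0, 3.5, 5.0]
fit = pava(y)
calibrate = as_step_function(x, fit)
print(fit)             # [1.0, 2.5, 2.5, 3.75, 3.75, 5.0]
print(calibrate(3.7))  # 2.5
```

The two violating pairs (3.0, 2.0) and (4.0, 3.5) are pooled into their averages, which is exactly what makes the fit monotonic.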

Building a Streamlit app to visualise Covid-19 data

There are many datasets in the OWID GitHub repository; however, they have aggregated many of the key data points into one combined structure. This makes life considerably easier, as the transformations are less intensive. Let’s have a look at the CSV.
import pandas as pd
df = pd.read_csv('owid-covid-data.csv')
print(df.shape)
# (101752, 60)
print('Total unique …

Shiny Video Game: How to Build An Award Winning App in R

Shiny Video Game Floats To The Top! Shark Attack is an environmental video game designed and built by Appsilon engineer Marcin Dubel. Marcin’s creative approach to raising environmental awareness and his ingenious methods for implementing his Shiny video game have earned him …

RDCOMClient: read and write Excel, and call VBA macros in R

#=========================================================================#
# Financial Econometrics & Derivatives, ML/DL using R, Python, Tensorflow
# by Sang-Heon Lee
#
# https://kiandlee.blogspot.com
#-------------------------------------------------------------------------#
# Read and Write Excel in R, also call VBA macro in R using RDCOMClient
#
# Install package (Not available on CRAN at 12 June 2019)
# install.packages("RDCOMClient", repos = "http://www.omegahat.net/R")
#=========================================================================#

library(RDCOMClient)

graphics.off()  # clear all graphs
rm(list = ls()) # remove all objects from your workspace

#===========================================================
# functions using RDCOMClient
#===========================================================

f_read_vector <- function(xlWbk1, sheet1, range1) {
    sheet <- xlWbk1$Worksheets(sheet1)
    range <- sheet$Range(range1)
    data  <- as.numeric(unlist(range[["Value"]]))
    return(data)
}

f_write_vector <- function(xlWbk1, sheet1, range1, data1) {
    sheet <- xlWbk1$Worksheets(sheet1)
    range <- sheet$Range(range1)
    range[["Value"]] <- asCOMArray(data1)
}

#===========================================================
# MAIN
#===========================================================

    # set working directory
    setwd("D:/SHLEE/blog/excel_com")

    # Create Excel Application
    xlApp <- COMCreate("Excel.Application")

    # Open the Macro Excel book
    fn <- "sample_excel.xlsm"
    xlWbk <- xlApp$Workbooks()$Open(paste0(getwd(), "/", fn))

    # use TRUE for Excel Spreadsheet to be visible
    xlApp[["Visible"]] <- FALSE

#===========================================================
# Communicate between R and Excel
#===========================================================

    # Arguments for Excel Spreadsheet and VBA macro
    sheet      <- "Sheet1"
    range_in   <- "C3:C12"
    range_out  <- "D3:D12"
    macro_name <- "macro1"
    …

Open MLOps: Open Source Production Machine Learning

We are releasing the beta of Open MLOps: an open-source platform for building machine learning solutions. All machine learning teams face the same tasks: manage a preprocessing pipeline, train and test models, and deploy models as APIs. And nearly every team builds its own hodgepodge collection of internal tools and scripts. The …

Tutorial: MongoDB User Authentication with Google Sign-In

A step-by-step guide to simplifying your application’s access control and personalizing the user experience. MongoDB has quickly become my nonrelational database platform of choice for its high performance, broad developer support, and generous free tier. As is the case with many database engines, user management and access control can get quite …

Developing and Explaining Cross-Entropy from Scratch

Cross-entropy is an important concept. It is commonly used in machine learning as a cost function; often our objective is to minimize the cross-entropy. But why are we minimizing the cross-entropy, and what does cross-entropy really mean? Let’s answer these questions. First, we need an adequate understanding of the concepts of information and entropy. …
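
A small numeric sketch of the relationship this kind of derivation usually builds toward: cross-entropy H(p, q) = H(p) + KL(p‖q). Because H(p) is fixed by the data, minimizing cross-entropy is the same as minimizing the KL divergence between the true and predicted distributions. The distributions below are made up. Log base 2 (bits) is chosen for readability; ML libraries usually use the natural log.

```python
import math


def entropy(p):
    """Shannon entropy in bits: the average information -log2 p(x) under p."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)


def cross_entropy(p, q):
    """Average bits needed to encode events drawn from p using a code optimized for q."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)


def kl_divergence(p, q):
    """Extra bits paid for using q instead of p."""
    return cross_entropy(p, q) - entropy(p)


p = [0.5, 0.25, 0.25]   # "true" distribution
q = [0.25, 0.25, 0.5]   # model's predicted distribution
print(entropy(p), cross_entropy(p, q), kl_divergence(p, q))  # 1.5 1.75 0.25

# With a one-hot label, cross-entropy reduces to -log of the probability given to the true class
label = [0, 1, 0]
pred = [0.2, 0.7, 0.1]
print(cross_entropy(label, pred), -math.log2(0.7))
```

The one-hot case is why classification losses often look like "negative log-likelihood of the correct class".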

Clustering: How to Find Hyperparameters using Inertia

Clustering is very powerful because it does not need labels, and getting labeled data is often expensive and time-consuming. Clustering is often used to find patterns in data, and the patterns found are then often used to improve a certain product. One famous example is customer clustering. In customer clustering, groups of similar users …
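
Inertia is the sum of squared distances from each point to its closest cluster centre, the quantity scikit-learn's `KMeans` exposes as `inertia_`. A common way to use it for choosing the number of clusters k is the "elbow" heuristic: compute inertia for several k and look for where the decrease flattens. Below is a dependency-free sketch using a plain Lloyd's k-means and synthetic data of three blobs. Both are assumptions; the article itself may use scikit-learn.

```python
import random


def sq_dist(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def kmeans_inertia(points, k, seed, n_iter=100):
    """One run of Lloyd's k-means on 2-D points; returns the final inertia."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Move each centroid to its cluster mean (keep it in place if the cluster is empty)
        new = [
            (sum(x for x, _ in cl) / len(cl), sum(y for _, y in cl) / len(cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if new == centroids:
            break
        centroids = new
    # Inertia: sum of squared distances from each point to its closest centroid
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)


def best_inertia(points, k, n_init=20):
    """Keep the lowest inertia over several random initialisations."""
    return min(kmeans_inertia(points, k, seed) for seed in range(n_init))


# Synthetic data: three well-separated blobs of 50 points each
rng = random.Random(42)
centers = [(0, 0), (10, 0), (5, 8)]
points = [(cx + rng.gauss(0, 1), cy + rng.gauss(0, 1)) for cx, cy in centers for _ in range(50)]

inertias = {k: best_inertia(points, k) for k in range(1, 7)}
for k, value in inertias.items():
    print(k, round(value, 1))
```

With three blobs, the drop from k=2 to k=3 should be large and the drops after k=3 small. That bend is the elbow. Inertia always decreases as k grows, so it is read for the bend rather than minimized outright.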

How to Run Shiny on Virtual Machines the Hard Way

[This article was first published on R – Hosting Data Apps, and kindly contributed to R-bloggers]. We are all used to the conveniences that our …

Advancing your financial services strategy with Azure sustainability

Many CEOs and senior business leaders have used the COVID-19 pandemic and the economic crisis as an opportunity to focus on redesigning their business. Like others, they’ve felt compelled to re-examine their business and operational models, driven by the internal necessity for digital transformation as well as by external consumer and regulatory pressures to advance sustainability efforts. Across …

Automating Submission Forms with Python

1. Set up your answers as a .csv file. For simplicity, the form I created on Google Forms is made of three simple questions: What is your name? What is your age? What is your favorite color? I know, I know… very original, but this is just for demonstration purposes, so bear with me. The …
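
A minimal sketch of step 1 using Python's built-in csv module. The column names, the two sample rows, and the temporary file location are made up for illustration. The actual submission step (for example, driving a browser) is not shown here.

```python
import csv
import os
import tempfile

QUESTIONS = ["What is your name?", "What is your age?", "What is your favorite color?"]

# Hypothetical sample answers: one row per form submission
answers = [
    {"name": "Ana", "age": "29", "favorite_color": "green"},
    {"name": "Ben", "age": "34", "favorite_color": "blue"},
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "answers.csv")
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["name", "age", "favorite_color"])
        writer.writeheader()
        writer.writerows(answers)

    # Read the file back: each row becomes the list of values to enter, in question order
    with open(path, newline="") as f:
        submissions = [[row["name"], row["age"], row["favorite_color"]] for row in csv.DictReader(f)]

for values in submissions:
    for question, value in zip(QUESTIONS, values):
        print(f"{question} -> {value}")
```

Keeping one row per submission means the automation loop only has to iterate over rows and fill the fields in a fixed order.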

Continuous Testing for Machine Learning Systems

Validate the correctness and performance of machine learning systems throughout the ML product lifecycle. Testing in the software industry is a well-researched and established area. The good practices learned from countless failed projects help us release frequently and have fewer opportunities …
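
Continuous testing of an ML system usually checks the model's behavior as well as its code. Below is a minimal sketch with the standard library's unittest. It assumes a hypothetical keyword-based `predict_spam` model, a tiny held-out set, and an accuracy threshold of 0.75, all chosen only for illustration. It combines a performance check with an invariance check.

```python
import unittest


def predict_spam(text):
    """Hypothetical toy model under test: flags messages containing any trigger word."""
    triggers = {"free", "winner", "prize"}
    return any(word in triggers for word in text.lower().split())


def accuracy(model, examples):
    return sum(model(x) == y for x, y in examples) / len(examples)


class TestSpamModel(unittest.TestCase):
    # Performance check: accuracy on a held-out set must not drop below a chosen threshold
    def test_accuracy_threshold(self):
        held_out = [("You are a winner", True), ("Lunch at noon?", False),
                    ("Claim your free prize", True), ("Meeting notes attached", False)]
        self.assertGreaterEqual(accuracy(predict_spam, held_out), 0.75)

    # Invariance check: changing letter case should not change the prediction
    def test_case_invariance(self):
        self.assertEqual(predict_spam("FREE tickets"), predict_spam("free tickets"))


suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestSpamModel)
result = unittest.TextTestRunner(verbosity=2).run(suite)
```

Run in CI on every change to code, data, or model, checks like these catch silent regressions that ordinary unit tests of the code would miss.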

RStudio Connect 1.9.0 – Content Curation Tools

[This article was first published on RStudio Blog, and kindly contributed to R-bloggers]. As publishers add more content to RStudio Connect, content organization, distribution, and …

Prediction for the 100m final at the Tokyo Olympics

[This article was first published on R | mages’ blog, and kindly contributed to R-bloggers]. On Sunday, the men’s 100m sprint final at the Tokyo Olympics will …

Use racing methods to tune xgboost models and predict home runs

[This article was first published on rstats | Julia Silge, and kindly contributed to R-bloggers]. This is the latest in my series of screencasts demonstrating …

likelihood inference with no MLE

[This article was first published on R – Xi’an’s Og, and kindly contributed to R-bloggers]. “In a regular full discrete exponential family, the MLE for …

Weaving Individualized AI into Everyday Life

Leveraging the achievements of deep learning for customized use. Artificial intelligence is transforming the world through language translation, face recognition, object detection, and many other areas. These AI systems often fall into broad categories that have worldwide market demand and therefore attract intensive research. What many people might not …

Amazon DynamoDB Accelerator (DAX) is now available in the China (Beijing) Region, operated by Sinnet

Amazon DynamoDB Accelerator (DAX) is now available in the China (Beijing) Region, operated by Sinnet. You can create DAX clusters in this AWS Region for your DynamoDB applications that require microsecond response times. DAX provides a fully managed, highly available, in-memory cache for DynamoDB that can accelerate reads from DynamoDB tables by up to 10 …

Improving the World for Birds

Video showing a stoat taking a chicken egg. Video with permission of Dave Minty, Huia trapping group. Data exploration: After three years of neighbourhood trapping, we now have over 1,000 catches at Huia, so we can pose some exploratory questions to the data. First, what are we catching? The answer is mainly rats and mice …

Virtual Environments — Setup and Importance in Python

There are a couple of different options for installing packages within your environment. Pip approach: First, the relevant virtual environment needs to be activated. Then the package can be installed with pip, and it should be importable in your shell at that point (a refresh may be needed). An example of the …
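
As a sketch of what "being inside" a virtual environment means, the standard library's venv module can create an environment programmatically. The script then asks that environment's interpreter whether `sys.prefix` differs from `sys.base_prefix`, which is the usual sign of a venv. The environment name and temporary location are assumptions. pip is deliberately not bootstrapped so the example stays fast and offline.

```python
import os
import subprocess
import sys
import tempfile
import venv

with tempfile.TemporaryDirectory() as tmp:
    env_dir = os.path.join(tmp, "demo-env")
    # with_pip=False keeps this sketch fast and offline; with_pip=True would bootstrap pip into the env
    venv.create(env_dir, with_pip=False, symlinks=(sys.platform != "win32"))
    if sys.platform == "win32":
        python = os.path.join(env_dir, "Scripts", "python.exe")
    else:
        python = os.path.join(env_dir, "bin", "python")
    # Inside a virtual environment, sys.prefix points at the env and sys.base_prefix at the base install
    result = subprocess.run(
        [python, "-c", "import sys; print(sys.prefix != sys.base_prefix)"],
        capture_output=True, text=True, check=True,
    )
    in_venv = result.stdout.strip() == "True"

print("interpreter runs inside the venv:", in_venv)
# With pip available, the equivalent shell workflow is roughly:
#   python -m venv demo-env
#   source demo-env/bin/activate      (Windows: demo-env\Scripts\activate)
#   pip install <package>
```

Activation only puts the environment's interpreter and scripts first on PATH. Calling the environment's own `python` directly, as above, has the same effect for a single command.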

The new Google Cloud region in Melbourne is now open
GM and VP of Product, IaaS; Vice President, Australia & New Zealand

With this new region, Google Cloud customers operating in Australia and New Zealand will benefit from low latency and high performance for their cloud-based workloads and data. Designed for high availability, the region opens with three zones to protect against service disruptions, and offers a portfolio of key products, including Compute Engine, Google Kubernetes Engine, …

Hasty Bug-fix Release of RSwitch v2

[This article was first published on R – rud.is, and kindly contributed to R-bloggers]. Version 2 of RSwitch, the macOS menubar utility that enables …

5 AWS Services Every Data Scientist Should Use

Amazon Web Services (AWS) provides a dizzying array of cloud services, from the well-known Elastic Compute Cloud (EC2) and Simple Storage Service (S3) to platform as a service (PaaS) offerings covering almost every aspect of modern computing. Specifically, AWS provides a mature big data architecture with services covering the entire data processing …

5 Ultimate Python Libraries for Image Processing

SimpleCV is a Python framework that uses computer vision libraries like OpenCV. It is simple and easy to use and can be really helpful for quick prototyping. It can be particularly useful for those who don’t know image manipulation concepts like eigenvalues, colour spaces, and bit depth. Installation …

Beginner’s guide for feature selection

Filter methods based on Pearson’s correlation: I use this function to get the correlated features. The result of running it on the cleaned dataset is the following:
correlated features: 1
correlated features: {'sqft_above'}
This means the ‘sqft_above’ feature is correlated with other features and should be dropped. Univariate selection (ANOVA): I used this line of code to perform ANOVA. I …
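
The function referred to above is not included in the excerpt. The hypothetical sketch below shows a common pattern for this kind of Pearson-correlation filter. For every pair of columns, if the absolute correlation exceeds a threshold (0.8 here, an assumption), the later column is marked for dropping. The three columns and their values are invented toy data, not the article's dataset.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def correlated_features(columns, threshold=0.8):
    """Return names of later columns whose |r| with an earlier column exceeds the threshold."""
    names = list(columns)
    to_drop = set()
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if abs(pearson(columns[first], columns[second])) > threshold:
                to_drop.add(second)
    return to_drop


# Invented toy data: sqft_above tracks sqft_living closely
data = {
    "sqft_living": [1180, 2570, 770, 1960, 1680, 5420],
    "sqft_above": [1180, 2170, 770, 1050, 1680, 3890],
    "bedrooms": [3, 3, 2, 4, 3, 4],
}
dropped = correlated_features(data, threshold=0.8)
print("correlated features:", len(dropped))  # correlated features: 1
print("correlated features:", dropped)       # correlated features: {'sqft_above'}
```

Dropping the later column of each highly correlated pair is arbitrary. In practice you might keep whichever feature is more interpretable or more strongly related to the target.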

London RUG on Creating an Open and Inviting Group

[This article was first published on R Consortium, and kindly contributed to R-bloggers]. R Consortium talks to Laura Swales of the London R User Group …

Getting familiar with Rmarkdown HTML

Although writing plain-text documents in Markdown is key to creating well-formatted documents for internet users, adopting this practice at scale in a corporate workspace may be cumbersome for several reasons. Most corporate workspaces are heavily dependent on slide decks where multiple users can edit and polish content without …

Data Mesh topologies

Design considerations for building a data mesh architecture. All organizations I work with understand the importance of data and are either interested in or planning their next-generation data platform. Their aim is to move away from tightly coupled data interfaces and varying data flows towards an architecture that allows ecosystem connectivity: …

Are we thinking about AI wrong?

Divya Siddarth on a better paradigm for AI, and how Taiwan is leading the way on tech governance. Editor’s note: This episode is part of our podcast series on emerging problems in data science and machine learning, hosted by Jeremie Harris. Apart from hosting the podcast, Jeremie helps …

Supervised vs Unsupervised Learning

Supervised learning: In supervised learning, the dataset of interest contains the explanatory variables (also known as the inputs or features) as well as the target responses (also known as the output labels). Such algorithms attempt to learn a function that approximates the relationship between the feature values and the labels in a way that it’d …
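
A minimal illustration of "learning a function from features to labels". It uses a nearest-centroid classifier, chosen only because it fits in a few lines; the article may discuss different algorithms. The training data are made up. Training summarizes the labeled examples, and prediction maps an unseen feature vector to a label.

```python
import math


def fit_nearest_centroid(X, y):
    """Learn one centroid per label from labeled training pairs (X[i], y[i])."""
    sums, counts = {}, {}
    for features, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for d, value in enumerate(features):
            acc[d] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, features):
    """Map unseen features to the label of the closest learned centroid."""
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))


X = [[1.0, 1.2], [0.8, 1.0], [4.0, 4.2], [4.3, 3.9]]  # features
y = ["small", "small", "large", "large"]              # target labels
model = fit_nearest_centroid(X, y)
print(predict(model, [0.9, 1.1]), predict(model, [3.8, 4.0]))  # small large
```

The labels are what make this supervised. An unsupervised method would receive only `X` and would have to discover the two groups on its own.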

Understanding the n_jobs Parameter to Speed Up scikit-learn Classification

Ready-to-run code that demonstrates how using the n_jobs parameter can reduce training time. In this tutorial, I illustrate the importance of the n_jobs parameter provided by some classes of the scikit-learn library. According to the official scikit-learn documentation, the n_jobs parameter is described as follows: The …
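
scikit-learn hands n_jobs to joblib for parallelism. The sketch below mimics the commonly documented convention: None means one job unless a joblib context says otherwise, -1 means all CPUs, and -2 means all but one. It then fans independent fits out to a worker pool. `fit_one` is a hypothetical stand-in for fitting one sub-model. Threads are used only so the example runs anywhere; they will not speed up CPU-bound pure-Python work the way processes (joblib's default loky backend) can.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def effective_workers(n_jobs, n_cpus=None):
    """Map an n_jobs value to a worker count following joblib's documented convention."""
    n_cpus = n_cpus or os.cpu_count() or 1
    if n_jobs is None:
        return 1                            # unset: run sequentially
    if n_jobs < 0:
        return max(n_cpus + 1 + n_jobs, 1)  # -1 -> all CPUs, -2 -> all but one (clamped to >= 1 here)
    if n_jobs == 0:
        raise ValueError("n_jobs=0 is treated as invalid in this sketch")
    return n_jobs


def fit_one(seed):
    """Hypothetical stand-in for fitting one independent sub-model, e.g. one tree of an ensemble."""
    return sum((seed * k) % 7 for k in range(10_000))


def fit_ensemble(n_estimators, n_jobs=None):
    # Independent fits are distributed over a pool of effective_workers(n_jobs) threads
    with ThreadPoolExecutor(max_workers=effective_workers(n_jobs)) as pool:
        return list(pool.map(fit_one, range(n_estimators)))


print(effective_workers(-1, n_cpus=8), effective_workers(-2, n_cpus=8), effective_workers(None, n_cpus=8))  # 8 7 1
print(fit_ensemble(20, n_jobs=1) == fit_ensemble(20, n_jobs=-1))  # True: same results, different parallelism
```

The speedup comes only from work that is independent, such as separate trees or cross-validation folds. This is why n_jobs appears on ensemble and search classes rather than on every estimator.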

How to Measure Heteroscedasticity in Regression?

[This article was first published on Methods – finnstats, and kindly contributed to R-bloggers]. Heteroscedasticity in Regression, one of the easiest ways to measure heteroscedasticity …

Five Rules for Designing Tables

Simple design elements to make your spreadsheets pop. 5 rules for designing tables, by Alana Pirrone. Getting people excited about a table or an Excel spreadsheet isn’t an easy task. Usually, they are filled with rows and columns of numbers upon numbers, and people’s eyes seem to glaze over when presented with one. There are …

Taking the TensorBoard Embedding Projector to the Next Level

TensorBoard Projector lets you graphically represent low-dimensional embeddings. Here I show you how, instead of displaying a point, you can render the image to which the embedding refers. The TensorBoard embedding projector is a very powerful tool in data analysis, specifically for interpreting and visualizing low-dimensional embeddings. In order …

3 Tips on Pandas Groupby (vs SQL)

How to produce the same output with Pandas groupby and aggregation as with SQL. “Genesis does what Nintendon’t” was one of the most classic yet powerful advertising slogans of the early ’90s video game market. Likewise, “SQL does what Pandas don’t” has been bugging me whenever I face any data wrangling …
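
The excerpt stops before any code, so here is a small self-contained sketch of our own, not the author's examples. It runs a SQL GROUP BY ... HAVING with the standard library's sqlite3 and reproduces the same result by hand. The pandas line in the comment is a rough equivalent using named aggregation; the table and numbers are made up.

```python
import sqlite3
from collections import defaultdict

rows = [("sales", 50.0), ("sales", 70.0), ("eng", 90.0), ("eng", 110.0), ("eng", 100.0), ("hr", 60.0)]

# SQL side: GROUP BY with COUNT/AVG, then HAVING to filter groups after aggregation
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE staff (dept TEXT, salary REAL)")
con.executemany("INSERT INTO staff VALUES (?, ?)", rows)
sql_result = con.execute(
    "SELECT dept, COUNT(*) AS n, AVG(salary) AS avg_salary "
    "FROM staff GROUP BY dept HAVING COUNT(*) > 1 ORDER BY dept"
).fetchall()
con.close()

# Same logic by hand; in pandas this would roughly be
# df.groupby("dept").agg(n=("salary", "size"), avg_salary=("salary", "mean")).query("n > 1").reset_index()
groups = defaultdict(list)
for dept, salary in rows:
    groups[dept].append(salary)
py_result = sorted(
    (dept, len(vals), sum(vals) / len(vals)) for dept, vals in groups.items() if len(vals) > 1
)

print(sql_result)  # [('eng', 3, 100.0), ('sales', 2, 60.0)]
print(py_result == sql_result)  # True
```

The key mapping is that SQL's HAVING corresponds to filtering after aggregation, whereas WHERE corresponds to filtering the rows before grouping.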

Let users choose which plot to show

If you have built your homepage using blogdown, it’s actually quite simple to integrate JavaScript snippets into it. While this is mentioned in the book “blogdown: Creating Websites with R Markdown”, it still took me a little while to understand how it works. As an example, let’s make different versions of a simple plot and …

How to stop worrying and start using R packages for econometrics effectively?

[This article was first published on Pachá, and kindly contributed to R-bloggers]. Dedicated to everyone who writes stupid comments about people with disabilities, …

Best practices for dependency management
Senior Developer Advocate

This article describes a set of best practices for managing your application’s dependencies, including vulnerability monitoring, artifact verification, and steps to reduce your dependency footprint and make it reproducible. The details of each practice may vary depending on your language ecosystem and the tooling you use, but general principles …
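
Artifact verification often comes down to comparing a cryptographic digest of the downloaded file with one recorded when the dependency was pinned. This is similar in spirit to pip's hash-checking mode (`--require-hashes`). Below is a minimal sketch with hashlib. The artifact bytes are a stand-in; in practice the pinned hash would come from a lockfile.

```python
import hashlib
import hmac


def sha256_of(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def verify_artifact(data: bytes, pinned_sha256: str) -> bool:
    """Accept an artifact only if its SHA-256 digest matches the hash pinned in a lockfile."""
    return hmac.compare_digest(sha256_of(data), pinned_sha256.lower())


artifact = b"example package contents"   # stand-in for downloaded bytes
pinned = sha256_of(artifact)             # in practice, recorded earlier in a lockfile
print(verify_artifact(artifact, pinned))         # True
print(verify_artifact(artifact + b"!", pinned))  # False: tampered or different build
```

Pinning exact versions together with their hashes also gives the reproducibility the article mentions: the same lockfile yields byte-identical dependencies or fails loudly.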

Spotify Case Study: Is there a secret to producing hit songs?

Here we have 954 songs; let’s see what they have in common by plotting the mean of each attribute:
import plotly.express as px
# First we list the attributes we want to see reflected in the plot
labels = ["valence", "danceability", "energy", "acousticness", "instrumentalness", "liveness", "speechiness"]
# Then we plot those attributes by their mean
fig = px.line_polar(popular_songs, theta=labels, r=popular_songs[labels].mean(), line_close=True)
fig.show()
…

Backpropagation in Neural Networks

Forward pass. [Image by author] The superscripts in parentheses denote the layer. Layer 1 (= input layer): [equation image by author]. Layer 2 (= output layer): [equation image by author]. Loss: [equation image by author]. Backward pass. Layer 1 (= input layer): [equation image by author]. Layer 2 (= output layer): [equation image by author]. Calculating the derivatives of layer …
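
The article's equations are images that did not survive, so its exact model is unknown. As a hedged sketch, assume a network whose layer 1 maps a 2-dimensional input to two sigmoid hidden units and whose layer 2 produces one sigmoid output. Assume also a squared-error loss L = 0.5 * (a2 - y)^2. The code applies the chain rule from the output back to layer 1 and checks one gradient against a central finite difference. The weights and the training example are arbitrary.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(params, x):
    """Layer 1 maps the input to hidden activations a1; layer 2 maps a1 to the scalar output a2."""
    W1, b1, W2, b2 = params
    z1 = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W1, b1)]
    a1 = [sigmoid(z) for z in z1]
    z2 = sum(w * a for w, a in zip(W2, a1)) + b2
    return a1, sigmoid(z2)


def loss(params, x, y):
    _, a2 = forward(params, x)
    return 0.5 * (a2 - y) ** 2


def backward(params, x, y):
    """Chain rule from the loss back to each weight, output layer first."""
    _, _, W2, _ = params
    a1, a2 = forward(params, x)
    delta2 = (a2 - y) * a2 * (1 - a2)                            # dL/dz2, since sigmoid' = s * (1 - s)
    dW2, db2 = [delta2 * a for a in a1], delta2
    delta1 = [w * delta2 * a * (1 - a) for w, a in zip(W2, a1)]  # dL/dz1
    dW1, db1 = [[d * xi for xi in x] for d in delta1], delta1
    return dW1, db1, dW2, db2


params = ([[0.1, -0.2], [0.4, 0.3]], [0.0, 0.1], [0.5, -0.6], 0.05)
x, y = [1.0, 2.0], 1.0
dW1, db1, dW2, db2 = backward(params, x, y)


def with_w1(delta):
    """Copy of params with W1[0][1] shifted by delta (for a finite-difference check)."""
    W1 = [row[:] for row in params[0]]
    W1[0][1] += delta
    return (W1,) + params[1:]


eps = 1e-5
numeric = (loss(with_w1(eps), x, y) - loss(with_w1(-eps), x, y)) / (2 * eps)
print(dW1[0][1], numeric)  # analytic and numeric gradients should agree closely
```

A gradient check like this is a cheap way to catch sign or index mistakes in hand-derived backward passes before training with them.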

Make measurable: what Galileo didn’t say about the subjectivity of algorithms

Choices about how to measure the world mean that not all data and algorithms are as objective as they seem. Galileo’s aphorism “Measure what can be measured, and make measurable what cannot be” echoes through almost every algorithmic system today. But reverberating in these echoes is the reality that …