AWS Launch Wizard now supports additional deployment capabilities to meet DevOps and organizational requirements

When you deploy SAP applications with AWS Launch Wizard, it now saves the CloudFormation templates and application configuration code in your S3 buckets and creates an AWS Service Catalog product when deployment is complete. This lets you reuse the Launch Wizard-generated infrastructure as code to repeat deployments with identical configuration, or customize them to meet …

Git Cheat Sheet for Data Scientists

Getting started: Git is a free and open-source version control system. Most programmers and data scientists interact with Git daily. So what is version control? Version control is how we, as programmers, track changes to our code and collaborate with other …

Data Applications for Analytics

A new generation of tools bridging the gap between technical and non-technical users. Co-authored with Sarah Krasnik. In 2021, the Modern Data Stack is the talk of the town. As Tristan Handy predicted last year, a “Cambrian Explosion” of data tools is taking place. As companies and open …

Top AutoML open source tools to automate your deep learning applications

These 2 computer vision tools will blow your mind. Article co-authored with @bonnefoypy and @emeric.chaize, CEOs at Olexya. Building the best model is a key step after exploratory data analysis and feature selection in any data science project. In deep learning, this process consists of building layers by …

How to organize and run remote data science internships

We are a team of scientists working on computational methods for biological data analysis. One of our focus areas is the application of machine learning methods to genomics data. This year, we welcomed several interns for machine learning-related projects. Due to the pandemic, we had to run these internships …

How Julia Perfected Multiple Dispatch

Before we look at what is so great about the way the Julia language has adopted multiple dispatch across the board, let us first establish what exactly multiple dispatch is. Multiple dispatch is a generic programming concept that was originally introduced to solve a problem in the functional programming paradigm. This …
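To make the idea concrete, here is a minimal Python sketch of multiple dispatch. Python is used only for illustration, and this is not Julia's mechanism. Python's built-in functools.singledispatch picks an implementation from the type of the first argument only. The hypothetical registry below instead keys on the types of both arguments. The names register, collide, Asteroid, and Ship are illustrative assumptions. The sketch also matches exact types only, whereas Julia chooses the most specific applicable method.

```python
from typing import Callable, Dict, Tuple

# Hypothetical registry: maps a tuple of argument types to an implementation
_registry: Dict[Tuple[type, ...], Callable] = {}


def register(*types: type) -> Callable[[Callable], Callable]:
    """Decorator that registers a function for an exact combination of argument types."""
    def decorator(func: Callable) -> Callable:
        _registry[types] = func
        return func
    return decorator


def collide(a, b):
    """Dispatch on the runtime types of BOTH arguments, not just the first."""
    impl = _registry.get((type(a), type(b)))
    if impl is None:
        raise TypeError(f"no method for ({type(a).__name__}, {type(b).__name__})")
    return impl(a, b)


class Asteroid: ...
class Ship: ...


@register(Asteroid, Ship)
def _(a, b):
    return "asteroid hits ship"


@register(Ship, Asteroid)
def _(a, b):
    return "ship hits asteroid"


@register(Asteroid, Asteroid)
def _(a, b):
    return "asteroids bounce"


print(collide(Asteroid(), Ship()))  # asteroid hits ship
print(collide(Ship(), Asteroid()))  # ship hits asteroid
```

Swapping the argument order selects a different method. Single dispatch on the first argument alone cannot express this without extra type checks inside each method.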

Singular Value Decomposition

Matrix Multiplication: To start, let’s consider the following vector, x, as the sum of two basis vectors i and j. We can easily visualize this vector using matplotlib and Python… Vectors are relatively straightforward: they have a direction and a magnitude. In this example, we plot …
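The short NumPy sketch below covers the ideas this excerpt builds toward. It builds a vector x from the basis vectors i and j, transforms it with a matrix multiplication, and then factors the matrix with the singular value decomposition. The decomposition splits the matrix into an orthogonal factor (a rotation or reflection), a scaling, and another orthogonal factor. The coefficients of x and the entries of A are hypothetical, not taken from the article.

```python
import numpy as np

# Basis vectors i and j in R^2
i = np.array([1.0, 0.0])
j = np.array([0.0, 1.0])

# Hypothetical vector x written as a linear combination of the basis vectors
x = 2 * i + 3 * j

# Hypothetical 2x2 matrix; multiplying by it transforms x
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
print(A @ x)  # [9. 8.]

# SVD: A = U @ diag(S) @ Vt, where U and Vt are orthogonal
# and S holds the non-negative singular values in descending order
U, S, Vt = np.linalg.svd(A)
A_rebuilt = U @ np.diag(S) @ Vt
print(np.allclose(A, A_rebuilt))  # True
```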

How to Make Production Ready Shiny Applications

What defines a production-ready Shiny application? And more importantly, how do we get there? Shiny is hands down one of the best dev tools available for quickly producing proofs of concept (PoCs). But herein lies a trap. People tend to mislabel Shiny as a poor production tool because they fall for the PoC …

Dimensionality Reduction Explained

Introduction: Dimensionality reduction is a popular machine learning method commonly used by data scientists. This article will focus on a very popular unsupervised learning approach to dimensionality reduction, principal component analysis (PCA). The scope of this article is to provide an intuitive understanding of what dimensionality reduction is, how PCA works and how to …
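One common way to compute PCA is the singular value decomposition of the mean-centered data matrix. The sketch below assumes NumPy and synthetic data. The pca function is hypothetical and not necessarily how the article implements it. The rows of Vt are the principal directions, and the squared singular values give the variance each direction explains.

```python
import numpy as np


def pca(X: np.ndarray, n_components: int):
    """Project the rows of X onto the top principal components (PCA via SVD)."""
    # PCA is defined on mean-centered data
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing singular value
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Variance along each direction is S^2 / (n - 1)
    explained_variance = S**2 / (X.shape[0] - 1)
    ratio = explained_variance[:n_components] / explained_variance.sum()
    return X_centered @ components.T, components, ratio


rng = np.random.default_rng(0)
# Synthetic 2-D data that varies mostly along one direction, (1, 2)
t = rng.normal(size=200)
X = np.column_stack([t, 2 * t + 0.1 * rng.normal(size=200)])

Z, components, ratio = pca(X, n_components=1)
print(Z.shape)  # (200, 1)
print(ratio)    # close to 1: one component captures almost all the variance
```

Reducing two correlated features to one component loses very little information here, which is the intuition behind dimensionality reduction.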

Feature engineering A-Z

Feature extraction: Let’s say we have consumption statistics of some kind, each with a timestamp. [Figure: Data with a timestamp] In this example, the “Date” column could easily be used to extract additional features and generate powerful insights, such as variations in consumption on weekdays versus weekends or at …
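Here is a minimal pandas sketch of this kind of feature extraction. It assumes a hypothetical DataFrame with a "Date" timestamp column and a "consumption" column; the column name and values are made up. The .dt accessor exposes calendar components, which can then feed a weekday versus weekend comparison.

```python
import pandas as pd

# Hypothetical consumption data with a timestamp column named "Date"
df = pd.DataFrame({
    "Date": pd.to_datetime(["2021-08-13 08:00", "2021-08-14 19:30", "2021-08-16 12:15"]),
    "consumption": [12.5, 30.1, 18.4],
})

# Derive calendar features from the timestamp via the .dt accessor
df["day_of_week"] = df["Date"].dt.dayofweek  # Monday=0 ... Sunday=6
df["is_weekend"] = df["day_of_week"] >= 5    # Saturday or Sunday
df["hour"] = df["Date"].dt.hour
df["month"] = df["Date"].dt.month

# Example insight: average consumption on weekdays vs. weekends
print(df.groupby("is_weekend")["consumption"].mean())
```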

Intuitive Kaggle Task Exploration and Model Baselining

I have chosen PyTorch Lightning for this task as it helps decouple our data science code from deep learning engineering, letting us focus on: a) loading and processing data with LightningDataModules; b) selecting which model/architecture to use in our LightningModules; c) evaluating performance with TorchMetrics. Although the Plant Pathology 2021 - FGVC8 challenge organizers …

How to confuse your shareholders by bad data visualization

Like many people during the COVID-19 crisis, I turned to the stock market as a new hobby. Like the ignorant investor that I am, I thought it wise to hop on the cloud computing bandwagon. Hence, I bought, among others, a small position in Rackspace Technologies. A long way down: Now, my Rackspace shares have …

{gitlabr} v2.0 is on CRAN!

[This article was first published on Rtask, and kindly contributed to R-bloggers.] You can read the original post in its original format on the Rtask website …

How to Scale Your Analytics Org by Ditching Git

Why analytics work should often prioritize discoverability and reproducibility, not version control and code review. A critical aspect of scaling organizations is process. Process allows you to normalize and incorporate best practices to ensure things work smoothly and scalably even when no one is minding the controls. But process in analytics organizations …

Is Data Science Practically Useful?

Although I have been in the profession for years, it took me some time to get out of the if-then mindset of my day-to-day work and begin to analyze how data science could be useful to me (not just to the companies I have helped apply it). It all started with a pause in my day to take …

The Most Exciting Aspect Of Machine Learning

Computer Vision techniques are behind most AI applications we use daily, from the facial recognition capabilities in your smartphone to the upcoming cashier-less retail stores, and let’s not forget everyone’s favourite car brand’s autonomous vehicle functionalities. It’s almost crazy to think that solving Computer Vision was once a university student’s summer project back in the ’60s, …

How To Change The Size Of Figures In Matplotlib

Modifying rcParams: If you want to modify the size of a figure without using the figure environment, you can also update matplotlib.rcParams, which is an instance of RcParams for handling default Matplotlib values.

    import matplotlib.pyplot as plt

    # Default figure size as (width, height) in inches for figures created from now on
    plt.rcParams['figure.figsize'] = (3, 2)
    x = y = range(1, 10)
    plt.plot(x, y)
    plt.show()

Note that the above will have an effect on …
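For comparison, here is a sketch of per-figure alternatives to changing the rcParams default, assuming a recent Matplotlib. It shows three options: passing figsize when the figure is created, calling Figure.set_size_inches on an existing figure, and using plt.rc_context to apply an rcParams override only inside a with block. The Agg backend is selected only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

x = y = range(1, 10)

# Option 1: set the size of one figure when creating it (width, height in inches)
fig1, ax1 = plt.subplots(figsize=(3, 2))
ax1.plot(x, y)

# Option 2: resize a figure that already exists
fig2, ax2 = plt.subplots()
ax2.plot(x, y)
fig2.set_size_inches(6, 4)

# Option 3: override the rcParams default only inside this block;
# figures created after the block use the previous default again
with plt.rc_context({"figure.figsize": (5, 3)}):
    fig3, ax3 = plt.subplots()
    ax3.plot(x, y)

print(fig1.get_size_inches(), fig2.get_size_inches(), fig3.get_size_inches())
```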

GSoC 2021 with ML4SCI | The NMR Project

Organization description: Unlike most other organizations participating in Google Summer of Code, ML4SCI is, I feel, unique in both its methods and its objectives. While most organizations look for developers to build up their code repositories, fix bugs, and add new features, the primary objective of ML4SCI is to solve open-ended research questions in …

Equivocal Zones in a Classification Model

Example of an analysis with Equivocal Zones in a Naïve Bayes model. By Ginna Gomez with Greg Page. Equivocal Zones: Overview. Typically, when we measure the performance of classification models, we do this by using all of the records in some particular dataset. We might say, for instance, that a model achieved “71 percent accuracy against …
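Here is a minimal NumPy sketch of the idea, assuming a binary classifier (for example, a Naïve Bayes model) that outputs the probability of class 1. Records whose probability falls inside a hypothetical equivocal zone [low, high] are left unclassified, and accuracy is computed only on the rest. The function name, the 0.4/0.6 defaults, and the toy data are illustrative assumptions.

```python
import numpy as np


def accuracy_with_equivocal_zone(probs, labels, low=0.4, high=0.6):
    """Accuracy on records whose class-1 probability lies outside [low, high],
    plus the share of records that were scored (coverage)."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    # Records inside the zone are treated as "no call" and excluded from scoring
    confident = (probs < low) | (probs > high)
    if not confident.any():
        return float("nan"), 0.0
    preds = (probs[confident] > 0.5).astype(int)
    accuracy = float((preds == labels[confident]).mean())
    coverage = float(confident.mean())
    return accuracy, coverage


# Toy predicted probabilities and true binary labels
probs = [0.95, 0.55, 0.10, 0.48, 0.80, 0.30]
labels = [1, 0, 0, 1, 0, 0]
print(accuracy_with_equivocal_zone(probs, labels))
```

With these toy values, scoring every record at a 0.5 threshold gives 3 of 6 correct. Excluding the two records inside [0.4, 0.6] gives 3 of 4 correct, on two thirds of the data. The trade-off is coverage: the model abstains on the records it is least sure about.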

rOpenSci News Digest, August 2021

Dear rOpenSci friends, it’s time for our monthly news roundup! You can read this post on our blog. Now let’s dive into the activity at and around rOpenSci! rOpenSci HQ. Recordings of useR! 2021. Recordings of useR! 2021 are now available on the conference website. You can watch contributions by rOpenSci staff: And …

Subgroup analysis using a Bayesian hierarchical model

[This article was first published on ouR data generation, and kindly contributed to R-bloggers.] I’m part of a team that recently submitted the results of …

Detecting knee- / elbow points in a graph

The “Kneedle” algorithm was published by Satopää, Albrecht, Irwin, and Raghavan (2011, [2]) and uses the concept of curvature as a mathematical measure of how much a function differs from a straight line. Satopää et al. (2011, [2]) conclude that “as a result, the maximum curvature captures the leveling off effects operators use to identify knees” …
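Here is a simplified NumPy sketch of the Kneedle idea, assuming a concave, increasing curve. It normalizes both axes to [0, 1], subtracts the diagonal y = x, and takes the point with the largest difference. The published algorithm also smooths the data, transforms convex or decreasing curves, and uses a sensitivity parameter to decide when a local maximum counts as a knee. None of that is included here, and find_knee is a hypothetical name.

```python
import numpy as np


def find_knee(x, y):
    """Simplified Kneedle-style knee detection for a concave, increasing curve."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # 1. Normalize both axes to [0, 1]
    x_n = (x - x.min()) / (x.max() - x.min())
    y_n = (y - y.min()) / (y.max() - y.min())
    # 2. Difference between the normalized curve and the straight line y = x
    diff = y_n - x_n
    # 3. The knee is where the curve departs most from that line
    return x[np.argmax(diff)]


x = np.arange(1, 11)
y = np.log(x)  # concave, increasing curve that levels off
print(find_knee(x, y))  # 4.0
```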

A guide to Knowledge Graphs

Introduction
– What is a Knowledge Graph (KG)?
– Why KG?
– How to use KG?
KG in practice
– Open source KGs
– Creating custom KG
– KG Ontology
– Hosting KG (database)
– Query facts from KG

In this section, we will introduce KG by asking some simple but intuitive questions about KG. In fact, we will cover the what, …

Spatial Transformer Tutorial, Part 1 — Forward and Reverse Mapping

A Self-Contained Introduction. Convolutional Neural Networks (CNNs) possess the inbuilt property of translation invariance. This enables them to correctly classify an image at test time, even when its constituent components are located at positions not seen during training. However, CNNs lack the inbuilt property of scale and rotation invariance: two of the most frequently encountered …
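Here is a minimal NumPy sketch of reverse mapping. It assumes a grayscale image and a 3x3 affine matrix A, in homogeneous coordinates, that maps input pixel positions to output positions. Each output pixel is filled by applying the inverse of A and sampling the input at the nearest pixel. This avoids the holes that forward mapping can leave in the output. Spatial transformer networks use differentiable bilinear sampling instead; the warp_reverse name and the nearest-neighbour choice are assumptions made for brevity.

```python
import numpy as np


def warp_reverse(image: np.ndarray, A: np.ndarray) -> np.ndarray:
    """Reverse (inverse) mapping: for every OUTPUT pixel, find its source
    location in the input via the inverse transform, then sample it."""
    h, w = image.shape
    A_inv = np.linalg.inv(A)
    out = np.zeros_like(image)
    for r in range(h):
        for c in range(w):
            # Homogeneous output coordinate (x = column, y = row)
            src = A_inv @ np.array([c, r, 1.0])
            sx, sy = int(round(src[0])), int(round(src[1]))
            # Nearest-neighbour sampling; sources outside the input stay 0
            if 0 <= sx < w and 0 <= sy < h:
                out[r, c] = image[sy, sx]
    return out


image = np.arange(9).reshape(3, 3)
# Forward transform: shift every pixel one column to the right
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
out = warp_reverse(image, A)
print(out)
```

The first column of the output has no source pixel inside the input, so it stays 0. Every other output pixel receives exactly one value.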

Big Trouble in Little Data

Small data are everywhere and are very useful in the early days of learning different data science techniques. In some settings like research, small data are all that are available to answer very targeted research questions. In fact, small data can even be present in very big data sets when we are interested in better …

3 challenges of data adoption

Countless companies are embarking on a data journey. And increasingly, they correctly start by designing a data strategy. But even with a great plan and positive intentions, success is not certain. One reason data initiatives fail is a lack of data adoption across the organisation. Adoption needs to spread …

Amazon Aurora supports PostgreSQL 13 in GovCloud Regions

Amazon Aurora PostgreSQL-Compatible Edition now supports PostgreSQL major version 13 in GovCloud regions. PostgreSQL 13 includes improved functionality and performance from enhancements such as de-duplication of B-tree index entries, improved performance for queries that use partitioned tables, incremental sorting to accelerate data sorts, parallel processing of indexes with the VACUUM command, more ways to monitor activity …

Google Cloud VMware Engine, PowerCLI and BigQuery Analytics
Strategic Cloud Engineer, Google Cloud Professional Services

Google Cloud Billing allows Billing Account Administrators to configure the export of Google Cloud billing data to a BigQuery dataset for analysis and intercompany billback scenarios. Developers may choose to extract Google Cloud VMware Engine (GCVE) configuration and utilization data and apply internal cost and pricing data to create custom reports to support GCVE resource …

Build a Movie Recommendation Engine backend API in 5 minutes (Part 2)

The first step is to download the data from https://grouplens.org/datasets/movielens/. I used the following dataset from MovieLens: “education & development”. [User Ratings Data Source: MovieLens] After you’ve downloaded and unzipped the “ml-latest-small” folder, let’s load the relevant files into a Jupyter notebook:

    import pandas as pd

    # Load the movie metadata and the user ratings from the unzipped folder
    df_movies = pd.read_csv('~/Downloads/ml-latest-small/movies.csv')
    df_ratings = pd.read_csv('~/Downloads/ml-latest-small/ratings.csv')

    # Inner join on movieId: keep only rows whose movieId appears in both files
    df_merged = pd.merge(df_movies, df_ratings, on='movieId', how='inner')
    df_merged.head()

…

Working with Multi-Index Pandas DataFrames

Learn how to work with multi-index dataframes with ease. Most learners of Pandas are familiar with what a dataframe looks like, as well as how to extract rows and columns using the loc[] and iloc[] indexers. However, things can get really hairy when multi-index dataframes are involved. A …
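Here is a small pandas sketch of selection on a two-level index. It assumes made-up sales data indexed by hypothetical "region" and "year" levels. It shows that .loc accepts either an outer label or a full tuple, that .xs selects on an inner level, and that .iloc stays purely positional.

```python
import pandas as pd

# Hypothetical sales data indexed by (region, year)
index = pd.MultiIndex.from_product(
    [["East", "West"], [2020, 2021]], names=["region", "year"]
)
df = pd.DataFrame({"sales": [100, 120, 90, 95]}, index=index)

# .loc with an outer-level label returns every row for that region
east = df.loc["East"]                 # remaining index: "year"
# .loc with a full tuple selects a single row; here, one cell
west_2021 = df.loc[("West", 2021), "sales"]
# .xs selects on an inner level without specifying the outer one
all_2020 = df.xs(2020, level="year")  # remaining index: "region"
# .iloc ignores labels and works purely by position
first_row = df.iloc[0]

print(east, west_2021, all_2020, first_row, sep="\n\n")
```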

120 years of Olympic Games

As the Tokyo Olympic Games have just drawn to an end, I would like to review 120 years of history of the modern Olympic Games. The modern Olympic Games are the leading international sporting events, featuring summer and winter sports competitions. Their creation was inspired by the ancient Olympic Games, held in Olympia, Greece from …

A Latin American R community for HR

[This article was first published on R Consortium, and kindly contributed to R-bloggers.] By Sergio Garcia Mora. R4HR, formerly known as the Club de R …

Creating Data Science Python Package Using Jupyter Notebook

“Gaussian distributions are important in statistics and are often used in the social sciences to represent real-valued random variables whose distributions are unknown.” (Wikipedia) [Figure 1 | Wikipedia] Mean: The mean of a list of numbers is the sum of all the numbers divided by the number of values. [Mean | Wikipedia] Standard Deviation: This …
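Here is a plain-Python sketch of the quantities this excerpt starts to define: the mean, the standard deviation, and the Gaussian density. The function names are hypothetical and not the API of the package the article builds. The excerpt does not show whether the article uses the sample (n - 1) or population (n) standard deviation, so the sketch offers both.

```python
import math


def mean(values):
    """Sum of the values divided by how many values there are."""
    return sum(values) / len(values)


def stdev(values, sample=True):
    """Standard deviation; sample=True divides by n - 1 (Bessel's correction)."""
    mu = mean(values)
    n = len(values) - 1 if sample else len(values)
    return math.sqrt(sum((v - mu) ** 2 for v in values) / n)


def gaussian_pdf(x, mu, sigma):
    """Probability density of the normal distribution N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(mean(data))                 # 5.0
print(stdev(data, sample=False))  # 2.0
```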

Confused by Multi-Index in Pandas? 9 Essential Operations to Know

1. What’s a MultiIndex? We have mentioned that a single-level index uses a series of labels to uniquely identify each row or column. Unlike the single-level index, the multi-index uses a series of tuples, each uniquely identifying a row or column. For simplicity of terminology, we’ll just focus on the rows’ index, …
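Here is a minimal pandas sketch of the tuple-based row labels described here, using hypothetical level names "letter" and "number". Each row label in a MultiIndex is a tuple, and the index can report whether those tuples are unique.

```python
import pandas as pd

# Each row label is a tuple; together the tuples identify the rows
tuples = [("A", 1), ("A", 2), ("B", 1), ("B", 2)]
index = pd.MultiIndex.from_tuples(tuples, names=["letter", "number"])
df = pd.DataFrame({"value": [10, 20, 30, 40]}, index=index)

print(df.index.nlevels)                              # 2
print(df.index[2])                                   # ('B', 1)
print(df.index.is_unique)                            # True
print(df.index.get_level_values("letter").tolist())  # ['A', 'A', 'B', 'B']
```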

ShinyProxy 1-click App

[This article was first published on R – Hosting Data Apps, and kindly contributed to R-bloggers.] ShinyProxy is one of the most popular options to …

Re-Imagine The Business Of Fashion-part 1

How Can Fashion Brands And Retailers Re-Imagine Fashion Through Advanced Analytics? My interest in Data Analytics began in 2020 when I stumbled on an article about different case studies of analytics in the Fashion Industry. Honestly, it was a long read, about 60 pages in all, but I enjoyed …