Amazon S3 on Outposts is now generally available, expanding object storage to on-premises environments

In addition to helping you meet data residency requirements, you can use S3 on Outposts to satisfy demanding performance needs by keeping data close to on-premises applications. S3 on Outposts provides a new Amazon S3 storage class, named ‘S3 Outposts’, which uses the same S3 APIs, and is designed to durably and redundantly store data … Read more

Categories: AWS

Introducing Hiveplotlib

Better Network Visualization in Python with Hive Plots. Introducing hiveplotlib, a new, open-source Python package for generating Hive Plots. Originally developed by Martin Krzywinski, Hive Plots generate well-defined figures that allow for interpretable, visual explorations of network data. The hiveplotlib repository is visible to … Read more

How GANs Can Improve Healthcare Analytics

Over the past decade, the adoption of electronic health record (EHR) systems in hospitals has become widespread. This transformation is due to the Health Information Technology for Economic and Clinical Health (HITECH) Act of 2009, which allocated $30 billion in incentives for hospitals and physician practices to adopt EHR systems. This digital explosion in big … Read more

Training the same CNN to do 18 different things and visualizing what it learned.

Convolutional Neural Network (CNN) architectures can be pretty general purpose for vision tasks. In this article, I’ll relay my experience in using the same network architecture for 18 different classification tasks. The classification tasks include facial features such as length of the chin (3 gradations), type of hair (111 types), and hair color … Read more

Optimizing Hyperparameters the right Way

Efficiently exploring the parameter search space through Bayesian Optimization with skopt in Python. TL;DR: my hyperparameters are always better than yours. In this post, we will build a machine learning pipeline using multiple optimizers and use the power of Bayesian Optimization to … Read more

Machine Learning Models For Improved Startup Valuation. | by flo.tausend

Determining the valuation of an early-stage startup is in most cases very challenging due to limited historical data, little to no existing revenue, market uncertainty, and more. Traditional valuation techniques, such as Discounted Cash Flow (DCF) or Multiples (CCA), therefore often lead to inappropriate results. On the other hand, alternative valuation methods remain subject to … Read more

Cats vs Dogs —Your second end-to-end CNN Classifier in 5 minutes

Result: ~74% accuracy. This is very similar to our previous classification model, with some changes based on the nature of the problem. Let’s briefly discuss them. Data Preprocessing: You have images, but models work on matrices. So this is a step that will be needed in almost all CNN-based models. Most of … Read more

Learning from Audio: Fourier Transformations

Breaking down a fundamental equation in signal processing. In the related article, Wave Forms, we looked at what waves are, how to visualize them, and how to deal with null data. In this article, I aim to develop an intuition for what the Fourier Transformation is and why it is useful when studying audio, show mathematical proofs … Read more

Where Are All The 10s – Diving Math

Welcome to another case study here at Swimming + Data Science. We’re returning to the world of diving, a place we’ve visited before, in an attempt to answer the question “where are all the 10s?” Today’s data science component will consist of matching calculated values to sets of allowed solutions, plus some tidyverse-style data wrangling … Read more

Categories: R

Getting familiar with torch tensors

[This article was first published on RStudio AI Blog, and kindly contributed to R-bloggers]. Two days ago, I introduced torch, an R package that provides … Read more

Categories: R

S4 vs vctrs library – A Double Dispatch Comparison Remake

[This article was first published on krzjoa, and kindly contributed to R-bloggers]. Remake: About two weeks ago, I published on my blog a comparison between … Read more

Categories: R

What’s The Best Day to Get Married?

Since I’ll be working with dates, the lubridate package will be the workhorse for preparing my data. To make the universe of wedding dates tractable, I’ll be looking at all potential dates occurring on a Saturday in the past 10 years (since 1/1/2010) and through the next 5 years (through 12/31/2025). The seq.Date() … Read more

Categories: R

Defending the Data Science Masters

Considerations in Favor of Post-Graduate Degrees Specifically for Data Science. Introduction: (Any statements made in this article are my own and not representative of any other entity, person, or organization.) Data Science, as much as it can be defined as a field, is a growing field. With its unique mixture of computer science, statistics, scientific … Read more

Le Monde puzzle [#1157]

[This article was first published on R – Xi’an’s Og, and kindly contributed to R-bloggers]. The weekly puzzle from Le Monde is an empty (?) … Read more

Categories: R

Jupyter is Ready for Production; As Is

Kubeflow is an open-source project, dedicated to making deployments of ML projects simpler, portable and scalable. From the documentation: The Kubeflow project is dedicated to making deployments of machine learning (ML) workflows on Kubernetes simple, portable and scalable. Our goal is not to recreate other services, but to provide a straightforward way to deploy best-of-breed … Read more

Simulating Plague Infection in a Strategy Game

Approximating probabilities from a turn-based strategy game with the binomial distribution. Probabilities can be hard to calculate, especially in complex scenarios. One way to get around this is to turn probabilistic situations into code. By running this code many times, one can approximate the probability of a certain outcome without … Read more
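The idea in this excerpt can be illustrated with a minimal Python sketch. It is not the article's code: the function names and the example numbers (5 independent infection rolls, each with a 30% chance) are hypothetical, chosen only to show how repeated simulation approaches the exact binomial tail probability.

```python
import math
import random


def exact_at_least(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(
        math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1)
    )


def simulated_at_least(k, n, p, runs=100_000, seed=0):
    """Estimate P(X >= k) by simulating `runs` games of n independent rolls."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    hits = 0
    for _ in range(runs):
        # Each roll succeeds with probability p (hypothetical game rule).
        successes = sum(rng.random() < p for _ in range(n))
        if successes >= k:
            hits += 1
    return hits / runs


if __name__ == "__main__":
    # Hypothetical scenario: at least 2 infections out of 5 rolls at 30% each.
    print("exact:    ", exact_at_least(2, 5, 0.3))
    print("simulated:", simulated_at_least(2, 5, 0.3))
```

With more runs the simulated estimate should settle closer to the exact value. The same pattern carries over to scenarios where no closed-form answer is available.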

What is computation?

A program is defined as a sequence of definitions and commands: definitions are evaluated (expressions) and commands are executed (statements). Commands basically instruct the interpreter to do something, an example being the print statement. Objects: We write programs to manipulate data objects, which are the building blocks for programs. Each object has a specific type that … Read more

Scaling Microsoft Kaizala on Azure

This post was co-authored by Anubhav Mehendru, Group Engineering Manager, Kaizala. Mobile-only workers depend on Microsoft Kaizala—a simple and secure work management and mobile messaging app—to get the work done. Since COVID-19 has forced many of us to work from home across the world, Kaizala usage has surged close to 3x from pre-COVID-19. While this … Read more

How to Switch from Excel to R Shiny: First Steps

tl;dr If you’re still using Excel or Google Sheets for business, you might already know that Excel is obsolete for many business use-cases. But how do you switch from Excel to a better alternative like R Shiny? It’s easier to get started than you might think, especially if you’re already an Excel power user. This … Read more

Categories: R

General availability of Azure Maps support for Azure Active Directory, Power Apps integration, and more

This blog post was co-authored by Chad Raynor, Principal Program Manager, Azure Maps. New and recent updates for Microsoft Azure Maps include support for Azure Maps integration with Azure Active Directory (generally available), integration with the Microsoft Power Apps platform, Search and Routing services enhancements, new Weather services REST APIs (in preview), and expanded coverage … Read more

How to Compute Word Similarity — A Comparative Analysis

This Owl API uses various word2vec models and advanced text clustering techniques to achieve finer granularity than the industry standards. In fact, it uses the largest word2vec English model created by spaCy (i.e., en-core-web-lg) for the general context and uses one of the word2vec models created at Stanford University (i.e., glove-wiki-gigaword-300) for the … Read more

Efficient Euclidean distance computation in pandas

Let’s begin with a set of geospatial data points. We usually do not compute Euclidean distance directly from latitude and longitude. Instead, they are projected to a geographically appropriate coordinate system where x and y share the same unit. I will elaborate on this in a future post, but just note that one degree latitude … Read more

What is the Volatility Risk Premium?

Visualization and implementation in an investment portfolio. After the derivation of the Black-Scholes model, the discussion is open to its place in pricing vanilla options. The market dictates the interest rate and option price for a particular strike and expiration by the laws of supply and demand. The only parameter not … Read more

Predict Any Cryptocurrency Applying NLP using Global News

In today’s harsh global economic conditions, traditional indicators and techniques can perform poorly (to say the least). In this tutorial we’ll search for useful information in the news and transform it into a numerical format using NLP to train a machine learning model that will predict the rise or fall of any given cryptocurrency (using … Read more

Entropy Application in the Stock Market

Many definitions and formulations of entropy are available. In general, entropy is used to measure information, surprise, or uncertainty regarding an experiment’s possible outcomes. In particular, Shannon entropy is the one that is used most frequently in statistics and machine learning. For this reason, it’s the focus of our … Read more
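For readers unfamiliar with the quantity named in this excerpt, Shannon entropy of a discrete distribution is H = -Σ p_i log2(p_i), measured in bits. The short Python sketch below computes it from observed outcomes. The function name and sample inputs are illustrative assumptions, not taken from the article.

```python
import math
from collections import Counter


def shannon_entropy(outcomes):
    """Shannon entropy, in bits, of the empirical distribution of `outcomes`."""
    counts = Counter(outcomes)
    total = len(outcomes)
    # H = -sum(p * log2(p)) over the observed outcome frequencies.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    # Illustrative inputs: a fair coin is maximally uncertain (1 bit),
    # while a constant sequence carries no uncertainty (0 bits).
    print(shannon_entropy(["H", "T", "H", "T"]))
    print(shannon_entropy(["up", "up", "up", "up"]))
```

The more evenly the outcomes are spread, the higher the entropy. A sequence dominated by one outcome scores close to zero.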

SQL On-Demand: An easier way to Query Data

Query files like CSV, JSON, and Parquet without moving them from their location. Contents: About SQL On-Demand; Creating the Data Source; Gaining Access to the Azure Storage; Querying Files in Azure Storage through Azure Synapse Workspace; YouTube Video for Visual Description of Steps Below. In my last post, I mentioned … Read more

The Google Data Analyst Interview

Google Data Analyst Interview Questions. Introduction: Google LLC is an American tech giant that offers industry-based solutions. As a company that prides itself in “providing access to the world’s information in one click”, Google offers a long list of products and services, including a huge hardware portfolio, an internet search engine, web … Read more

5 Powerful Tricks to Visualize Your Data with Matplotlib

DESIGNING WITH MATPLOTLIB: How to use LaTeX fonts, create a zoom effect, an outbox legend, and continuous error, and adjust the box pad margin. Data visualization is used to show data in a more straightforward representation that is easier to understand. It can take the form of histograms, scatter plots, line plots, pie charts, etc. Many people are … Read more

Amazon Textract supports customer S3 buckets

Amazon Textract is a fully managed machine learning service that makes it easy to extract text and data from virtually any document. Amazon Textract offers you both synchronous and asynchronous APIs to choose based on the fit for each use case. With the asynchronous APIs, you can retrieve the extracted information using the GetDocumentTextDetection or … Read more

Categories: AWS

Airbnb Rental-Analysis of New York using Python

Airbnb is a San Francisco-based company with a presence in more than 81,000 cities worldwide, with more than 6 million listings of rentals available. From a data-creation standpoint, the company has generated enormous amounts of data from the cities in which it operates, such as reviews from users, location descriptions, and rental statistics. It has emerged … Read more

Evaluating American Funds Portfolio

[This article was first published on, and kindly contributed to R-bloggers]. Active funds have done poorly over the last ten years, and in most … Read more

Categories: R

Detecting microcontrollers with CNN

OBJECT DETECTION ON CLICK: A simple tutorial for detecting microcontrollers on data from a Kaggle competition. Standard libraries such as TensorFlow or PyTorch don’t provide any simple way to train your custom Object Detection models. Most of the time you need to install a big library such as Detectron 2 or Tensorflow Object … Read more

Best practices for Reinforcement Learning

Lifting the curses of time and cardinality. Machine learning is research intensive. It involves significantly higher degrees of uncertainty compared to classic programming. This has a significant impact on product management and product development. Developing an intelligent product with good performance is very difficult. In addition, the … Read more

Visualization of COVID-19 Cases in Arkansas

[This article was first published on R – Nathan Chaney, and kindly contributed to R-bloggers]. Throughout the COVID-19 pandemic, the main sources of information for … Read more

Categories: R

New Polished Feature – User Roles

[This article was first published on Posts on Tychobra, and kindly contributed to R-bloggers]. Polished is an R package that adds authentication and user administration … Read more

Categories: R

RStudio v1.4 Preview: Visual Markdown Editing

[This article was first published on RStudio Blog, and kindly contributed to R-bloggers]. Today we’re excited to announce availability of our first Preview Release for … Read more

Categories: R

The Invisible Traps of Data

It’s all about reading between the lines. It is well known that data scientists spend much more time on data preparation tasks (data collection, EDA, and feature engineering) than on Machine Learning modelling. Although many of us might have complained about this fact, I think that underestimating the importance … Read more

Google Cloud migration made easy

Should you migrate to Google Cloud? To determine whether your application can and should migrate to the cloud, begin by asking yourself the following questions: Are the components of my application stack virtualized or virtualizable? Can my application stack run in a cloud environment while still supporting any and all licensing, security, privacy, and compliance requirements? … Read more

‘Sherlock Holmes’ AI Diagnoses Disease Better Than Your Doctor, Study Finds

Peer-reviewed study says you’ll soon consult Dr. Bot for a second opinion. New research finds that causal machine learning models are not only more accurate than previous AI-based symptom checkers for patient diagnosis but, in many cases, can now exceed the diagnosis accuracy of human doctors. That’s mainly due to the methods … Read more