An overview of the global market for chatbot solutions in 2020

Drawing on my active watch of the conversational assistant market, I share some statistics and information on the almost 550 solutions I studied. To conclude, I give my view of the future. (Source: photo by Patrick Meyer.) The event still seems to feel everything in the minds of those who followed …

Amazon S3 on Outposts is now generally available, expanding object storage to on-premises environments

In addition to helping you meet data residency requirements, you can use S3 on Outposts to satisfy demanding performance needs by keeping data close to on-premises applications. S3 on Outposts provides a new Amazon S3 storage class, named ‘S3 Outposts’, which uses the same S3 APIs, and is designed to durably and redundantly store data …

Introducing Hiveplotlib

Better Network Visualization in Python with Hive Plots. Hiveplotlib is a new, open-source Python package for generating Hive Plots. Originally developed by Martin Krzywinski, Hive Plots generate well-defined figures that allow for interpretable, visual explorations of network data. The hiveplotlib repository is visible to …

How GANs Can Improve Healthcare Analytics

Over the past decade, the adoption of electronic health record (EHR) systems in hospitals has become widespread. This transformation is due to the Health Information Technology for Economic and Clinical Health (HITECH) Act of 2009, which allocated $30 billion in incentives for hospitals and physician practices to adopt EHR systems. This digital explosion in big …

Training the same CNN to do 18 different things and visualizing what it learned.

Toon Images. Convolutional Neural Network (CNN) architectures can be pretty general purpose for vision tasks. In this article, I’ll relay my experience in using the same network architecture for 18 different classification tasks. The classification tasks include facial features such as length of the chin (3 gradations), type of hair (111 types), and hair color …

Optimizing Hyperparameters the right Way

Efficiently exploring the parameter search space through Bayesian Optimization with skopt in Python. TL;DR: my hyperparameters are always better than yours. (Photo by Fineas Anton on Unsplash: explore vast canyons of the problem space efficiently.) In this post, we will build a machine learning pipeline using multiple optimizers and use the power of Bayesian Optimization to …

Machine Learning Models For Improved Startup Valuation. | by flo.tausend

Determining the valuation of an early-stage startup is in most cases very challenging due to limited historical data, little to no existing revenue, market uncertainty and more. Traditional valuation techniques, such as Discounted Cash Flow (DCF) or Multiples (CCA), therefore often lead to inappropriate results. On the other hand, alternative valuation methods remain subject to …

Cats vs Dogs —Your second end-to-end CNN Classifier in 5 minutes

Result: ~74% accuracy. This is very similar to our previous classification model, though there are some changes based on the nature of the problem. Let’s briefly discuss them. Data Preprocessing: You have images, but models work on matrices! Right! So, this is a step that will be needed in almost all CNN-based models. Most of …

Learning from Audio: Fourier Transformations

Breaking down a fundamental equation in signal processing. Related article: In Wave Forms, we looked at what waves are, how to visualize them, and how to deal with null data. In this article, I aim to develop an intuition for what the Fourier Transformation is, why it is useful when studying audio, show mathematical proofs …

Where Are All The 10s – Diving Math

Welcome to another case study here at Swimming + Data Science. We’re returning to the world of diving, a place we’ve visited before, in an attempt to answer the question “where are all the 10s?” Today’s data science component will consist of matching calculated values to sets of allowed solutions, plus some tidyverse style data wrangling …

S4 vs vctrs library – A Double Dispatch Comparison Remake

[This article was first published on krzjoa, and kindly contributed to R-bloggers]. Remake. About two weeks ago, I published on my blog a comparison between …

Defending the Data Science Masters

Considerations in Favor of Post-Graduate Degrees Specifically for Data Science. Introduction. (Any statements made in this article are my own and not representative of any other entity, person, or organization.) Data Science, as much as it can be defined as a field, is a growing field. With its unique mixture of computer science, statistics, scientific …

Jupyter is Ready for Production; As Is

Kubeflow is an open-source project, dedicated to making deployments of ML projects simpler, portable and scalable. From the documentation: The Kubeflow project is dedicated to making deployments of machine learning (ML) workflows on Kubernetes simple, portable and scalable. Our goal is not to recreate other services, but to provide a straightforward way to deploy best-of-breed …

Simulating Plague Infection in a Strategy Game

Approximating probabilities from a turn-based strategy game with the binomial distribution. (Photo by CDC on Unsplash.) Probabilities can be hard to calculate, especially in complex scenarios. One way to get around this is to turn probabilistic situations into code. By running this code many times, one can approximate the probability of a certain outcome without …
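
As a rough illustration of the idea in this excerpt (not the article's own code), the sketch below estimates a probability by repeated simulation and compares it with the exact binomial value. The scenario, the function names, and the numbers (5 units, a 10% infection chance per unit) are hypothetical and chosen only for the example.

```python
import random
from math import comb


def simulate_at_least_one(n_units, p_infect, n_runs, seed=0):
    """Estimate P(at least one of n_units gets infected) by simulation."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    hits = 0
    for _ in range(n_runs):
        # Each unit is infected independently with probability p_infect.
        infected = sum(rng.random() < p_infect for _ in range(n_units))
        if infected >= 1:
            hits += 1
    return hits / n_runs


def binomial_pmf(k, n, p):
    """Exact probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def exact_at_least_one(n_units, p_infect):
    """Exact P(at least one infection) = 1 - P(zero infections)."""
    return 1.0 - binomial_pmf(0, n_units, p_infect)


# Hypothetical scenario: 5 units, each with a 10% chance of infection.
estimate = simulate_at_least_one(n_units=5, p_infect=0.1, n_runs=20_000)
exact = exact_at_least_one(n_units=5, p_infect=0.1)
print(f"simulated: {estimate:.4f}, exact: {exact:.4f}")
```

With more runs the simulated estimate should move closer to the exact value; the simulation becomes the more useful tool once the rules are too complex for a closed-form answer.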

Softmax Activation Function — How It Actually Works

Categorical Data into Numerical Data. The truth labels are categorical data: any particular image can be categorized into one of these groups: dog, cat, horse or cheetah. The computer, however, does not understand this kind of data, so we need to convert it into numerical data. There are two ways to do so: Integer …
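
A minimal sketch of the two ideas this excerpt touches on, assuming plain Python: mapping the four class names to integers, and the softmax function named in the title. The score values are made up for illustration, and the names `CLASSES`, `LABEL_TO_INDEX`, and `softmax` are not taken from the article.

```python
import math

# The four categories named in the excerpt, mapped to integer labels.
CLASSES = ["dog", "cat", "horse", "cheetah"]
LABEL_TO_INDEX = {name: i for i, name in enumerate(CLASSES)}


def softmax(scores):
    """Turn raw scores into probabilities that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw model outputs, one score per class.
scores = [2.0, 1.0, 0.1, -1.0]
probs = softmax(scores)
predicted = CLASSES[probs.index(max(probs))]
print(LABEL_TO_INDEX, probs, predicted)
```

Subtracting the maximum score before exponentiating does not change the result, but it avoids overflow when scores are large.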

What is computation?

A program is defined as a sequence of definitions and commands: definitions are evaluated (expressions) and commands are executed (statements). Commands basically instruct the interpreter to do something, an example being the print statement. Objects: We write programs to manipulate data objects, which are the building blocks for programs. Each object has a specific type that …
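
To make the distinction concrete, here is a small sketch assuming the language under discussion is Python (the excerpt does not say so explicitly). The variable names are illustrative only.

```python
# An expression is evaluated to produce a value.
area = 3.14159 * 2 ** 2  # the right-hand side is an expression

# A command is executed for its effect. In Python 3, print is a function,
# so this line is a call used as a statement.
print(area)

# Every object has a type, which determines what a program can do with it.
kinds = [type(3).__name__, type(3.0).__name__, type("3").__name__, type(True).__name__]
print(kinds)
```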

Azure Cost Management + Billing updates – September 2020

Whether you’re a new student, thriving startup, or the largest enterprise, you have financial constraints, and you need to know what you’re spending, where, and how to plan for the future. Nobody wants a surprise when it comes to the bill, and this is where Azure Cost Management + Billing comes in. We’re always looking …

Scaling Microsoft Kaizala on Azure

This post was co-authored by Anubhav Mehendru, Group Engineering Manager, Kaizala. Mobile-only workers depend on Microsoft Kaizala, a simple and secure work management and mobile messaging app, to get the work done. Since COVID-19 has forced many of us to work from home across the world, Kaizala usage has surged close to 3x from pre-COVID-19. While this …

How to Switch from Excel to R Shiny: First Steps

tl;dr If you’re still using Excel or Google Sheets for business, you might already know that Excel is obsolete for many business use-cases. But how do you switch from Excel to a better alternative like R Shiny? It’s easier to get started than you might think, especially if you’re already an Excel power user. This …

General availability of Azure Maps support for Azure Active Directory, Power Apps integration, and more

This blog post was co-authored by Chad Raynor, Principal Program Manager, Azure Maps. New and recent updates for Microsoft Azure Maps include support for Azure Maps integration with Azure Active Directory (generally available), integration with the Microsoft Power Apps platform, Search and Routing services enhancements, new Weather services REST APIs (in preview), and expanded coverage …

The Graphs You Need to Understand the Covid-19 Pandemic

The right graphs can be tremendously revealing, if you know what to look for. As one of the contributors to the CDC’s Covid-19 “Ensemble” forecast model, I update a set of state and national graphs several times a week on my Covid-19 Spin Free Data Center. I include the charts that I personally find …

Ultimate Pandas Guide — Window Functions

(Photo by Laura Woodbury from Pexels.) Master your understanding of the groups in your data. Window functions are an efficient way to understand more about the relationship between each of the records within our data and the groups they belong to. They also happen to be a common data science interview question so they’re a …

How to Reduce Training Time for a Deep Learning Model using tf.data

Learn to create an input pipeline for images that uses CPU and GPU resources efficiently to process the image dataset and reduce the training time for a deep learning model. In this post, you will learn: How are the CPU and GPU resources used in a naive approach during model training? How to efficiently use the …

How to Compute Word Similarity — A Comparative Analysis

This Owl API uses various word2vec models and advanced text clustering techniques to achieve better granularity than the industry standards. In fact, it uses the largest word2vec English model created by spaCy (i.e., en-core-web-lg) for the general context and uses one of the word2vec models created at Stanford University (i.e., glove-wiki-gigaword-300) for the …

Efficient Euclidean distance computation in pandas

Let’s begin with a set of geospatial data points: We usually do not compute Euclidean distance directly from latitude and longitude. Instead, the points are projected to a geographically appropriate coordinate system where x and y share the same unit. I will elaborate on this in a future post but just note that one degree latitude …
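
A minimal sketch of the distance step, assuming the points have already been projected to a planar coordinate system with x and y in metres. It uses only the Python standard library rather than pandas, to stay self-contained, and the point names and coordinates are hypothetical.

```python
import math

# Hypothetical points, already projected so that x and y are both in metres.
points = {
    "a": (0.0, 0.0),
    "b": (3000.0, 4000.0),
    "c": (6000.0, 8000.0),
}


def euclidean(p, q):
    """Straight-line distance between two projected (x, y) points."""
    return math.dist(p, q)


def pairwise_distances(pts):
    """Distance for every unordered pair of named points."""
    names = sorted(pts)
    return {
        (m, n): euclidean(pts[m], pts[n])
        for i, m in enumerate(names)
        for n in names[i + 1:]
    }


distances = pairwise_distances(points)
print(distances)
```

Applying the same formula directly to latitude and longitude in degrees would mix units, which is why the projection step comes first.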

What is the Volatility Risk Premium?

Visualization and implementation in an investment portfolio. (Photo by Pixabay from Pexels.) After the derivation of the Black-Scholes model, the discussion is open to its place in pricing vanilla options. The market dictates the interest-rate and option price for a particular strike and expiration by the laws of supply and demand. The only parameter not …

Predict Any Cryptocurrency Applying NLP using Global News

In today’s harsh global economic conditions, traditional indicators and techniques can perform poorly (to say the least). In this tutorial we’ll search for useful information in the news and transform it to a numerical format using NLP to train a Machine Learning model which will predict the rise or fall of any given Cryptocurrency (using …

Entropy Application in the Stock Market

Many definitions and formulations of entropy are available. What is true in general is that entropy is used to measure information, surprise, or uncertainty regarding an experiment’s possible outcomes. In particular, Shannon entropy is the one that is used most frequently in statistics and machine learning. For this reason, it’s the focus of our …
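
For reference, Shannon entropy of a discrete distribution is H = -Σ p·log2(p), measured in bits. The sketch below is a plain-Python illustration of that formula, not the article's implementation; the function names and the up/down example data are assumptions made for the example.

```python
import math
from collections import Counter


def shannon_entropy(probs):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def entropy_of_outcomes(outcomes):
    """Entropy of the empirical distribution of a sequence of outcomes."""
    counts = Counter(outcomes)
    n = len(outcomes)
    return shannon_entropy(c / n for c in counts.values())


# Hypothetical daily price moves, labelled only by direction.
moves = ["up", "down", "up", "down"]
print(entropy_of_outcomes(moves))
print(shannon_entropy([0.25, 0.25, 0.25, 0.25]))
```

Entropy is highest when all outcomes are equally likely and drops to zero when one outcome is certain.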

SQL On-Demand: An easier way to Query Data

Query files like CSV, JSON, and Parquet without moving them from their location. (Image by Author: Azure Synapse Workspace.) Contents: About SQL On-Demand; Creating the Data Source; Gaining Access to the Azure Storage; Querying Files in Azure Storage through Azure Synapse Workspace; YouTube Video for Visual Description of Steps Below. In my last post, I mentioned …

The Google Data Analyst Interview

Google Data Analyst Interview Questions. (Image from Unsplash.) Introduction: Google LLC is an American tech giant that offers industry-based solutions. As a company that prides itself on “providing access to the world’s information in one click”, Google offers a long list of products and services, including a huge hardware portfolio, an internet search engine, web …

Pipeline, ColumnTransformer and FeatureUnion explained

Let’s assume we want to use the smoker, day and time columns to predict total_bill. We will drop the size column and partition the data first: Typically, the raw data is not in a state where we can straight away feed it into a machine learning model. Therefore, transforming the data to a state that is acceptable …

5 Powerful Tricks to Visualize Your Data with Matplotlib

DESIGNING WITH MATPLOTLIB. How to use a LaTeX font, create a zoom effect, an outbox legend, a continuous error, and adjust the box pad margin. Data visualization shows data in a more straightforward representation that is easier to understand. It can take the form of histograms, scatter plots, line plots, pie charts, etc. Many people are …

Amazon Textract supports customer S3 buckets

Amazon Textract is a fully managed machine learning service that makes it easy to extract text and data from virtually any document. Amazon Textract offers you both synchronous and asynchronous APIs to choose based on the fit for each use case. With the asynchronous APIs, you can retrieve the extracted information using the GetDocumentTextDetection or …

Airbnb Rental-Analysis of New York using Python

Airbnb is a San Francisco-based company with a presence in more than 81,000 cities worldwide, with more than 6 million listings of rentals available. From a data-creation standpoint, the company has generated enormous amounts of data from the cities in which it operates, such as reviews from users, location descriptions, and rental statistics. It has emerged …

Detecting microcontrollers with CNN

OBJECT DETECTION ON CLICK. A simple tutorial for detecting microcontrollers on data from a Kaggle competition. (Photo by Jonas Svidras from Pexels.) Standard libraries such as TensorFlow or PyTorch don’t provide any simple way to train your custom Object Detection models. Most of the time you need to install a big library such as Detectron 2 or Tensorflow Object …

Best practices for Reinforcement Learning

Lifting the curses of time and cardinality. Machine learning is research intensive. It involves significantly more uncertainty than classic programming. This has a significant impact on product management and product development. (Image via Shutterstock under license to Nicolas Maquaire.) Developing an intelligent product with good performance is very difficult. In addition, the …

Visualization of COVID-19 Cases in Arkansas

[This article was first published on R – Nathan Chaney, and kindly contributed to R-bloggers]. Throughout the COVID-19 pandemic, the main sources of information for …

IBM Data Science Professional Certificate on Coursera: job ready?

Last year I took and completed the 9 main courses which make up the IBM Data Science Professional Certificate offered by IBM on Coursera. I have been wanting to certify my Data Science skills during the last few years, and thus jumped at the opportunity to take this series of courses …

RStudio v1.4 Preview: Visual Markdown Editing

[This article was first published on RStudio Blog, and kindly contributed to R-bloggers]. Today we’re excited to announce availability of our first Preview Release for …

The Invisible Traps of Data

It’s all about reading between the lines. (Photo by Erico Marcelino on Unsplash.) It is well known that data scientists spend much more time on data preparation tasks (data collection, EDA and feature engineering) than on Machine Learning modelling. Although many of us might have complained about this fact, I think that underestimating the importance …

Rapid Analysis and Presentation of Quality Improvement Data with R

[This article was first published on HighlandR, and kindly contributed to R-bloggers]. I presented at RMedicine2020 and really enjoyed it – It’s been a few …

Google Cloud migration made easy

Should you migrate to Google Cloud? To determine whether your application can and should migrate to the cloud, begin by asking yourself the following questions: Are the components of my application stack virtualized or virtualizable? Can my application stack run in a cloud environment while still supporting any and all licensing, security, privacy, and compliance requirements? …

Invitation to All Aspiring Reinforcement Learning Practitioners

Basic Idea: How RL Works. [Image by Author] For now, I’ll explain how RL works at a high level. Not to worry: in future posts, we’ll get back to this and learn the details! The subscript t refers to the time step we are currently in. At the first time step (t=0), the agent receives …

‘Sherlock Holmes’ AI Diagnoses Disease Better Than Your Doctor, Study Finds

Peer-reviewed study says you’ll soon consult Dr. Bot for a second opinion. (Image credit: upklyak.) New research finds that causal machine learning models are not only more accurate than previous AI-based symptom checkers for patient diagnosis but, in many cases, can now exceed the diagnostic accuracy of human doctors. That’s mainly due to the methods …