How to Build a Matrix Module from Scratch

If you have been importing NumPy for matrix operations but don’t know how the module is built, this article will show you how to build your own matrix module. NumPy is a useful library that enables you to create a matrix and perform matrix operations with ease. If you want to know about tricks you … Read more How to Build a Matrix Module from Scratch

I can’t teach you Data Science in 10 days

A Case Study approach towards understanding Entities & Requirements in Data Science Space. “Around four and a half years back, I was struggling to understand the whole concept of Data Science. Coming from a non-Statistics background, I was skeptical, worried and more importantly I was obnoxious. I had doubts about if I will be able … Read more I can’t teach you Data Science in 10 days

Understanding Truthy and Falsy Values in Python

In Python, individual values can evaluate to either True or False. They don’t necessarily have to be part of an operator and operand expression to be a truth value, because they already have one that is inherent by the rules of the Python language. Here are some examples of truthy and falsy values: By default, … Read more Understanding Truthy and Falsy Values in Python
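The idea in the excerpt above can be sketched in a few lines. This is only an illustration: the two lists below are hand-picked examples, not the ones from the article, and `truthiness` is a hypothetical helper name. It relies on the built-in `bool()`, which returns the truth value Python assigns to any object.

```python
# Illustrative values only; these lists are not taken from the article.
# Empty containers, zero-valued numbers, None and False are falsy.
falsy_values = [None, False, 0, 0.0, "", [], (), {}, set()]

# Non-empty containers and non-zero numbers are truthy,
# even when their contents look "empty" (e.g. the string "0").
truthy_values = [True, 1, -1, "0", [0], (None,), {"k": None}]


def truthiness(value):
    """Return the truth value Python assigns to `value`."""
    return bool(value)


falsy_results = [truthiness(v) for v in falsy_values]
truthy_results = [truthiness(v) for v in truthy_values]
```

The same rule is what lets `if items:` stand in for `if len(items) > 0:` when `items` is a list.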

Can analysts and statisticians get along?

Inside the subtle war between the data science professions. In a previous article, I explained that typical training programs in statistics and analytics endow graduates with different skillsets. When you’re dealing with uncertainty, analysts help you ask better questions, while statisticians provide more rigorous answers. Seems like the makings of a collaboration dream, … Read more Can analysts and statisticians get along?

Building a Road Sign Classifier in Keras

There are so many different types of traffic signs out there, each with different colours, shapes and sizes. Sometimes two signs may have a similar colour, shape and size, but two totally different meanings. How on earth would we ever be able to program a computer to correctly classify a traffic sign … Read more Building a Road Sign Classifier in Keras

Deep Learning: Solving Problems With TensorFlow

Learn how to Solve Optimization Problems and Train your First Neural Network with the MNIST Dataset! The goal of this article is to define and solve practical use cases with TensorFlow. To do so, we will solve: an optimization problem; a linear regression problem, where we will adjust a regression line to a dataset … Read more Deep Learning: Solving Problems With TensorFlow

Amazon EC2 T3 instances now support launching as Dedicated Instances

T3 dedicated instances are available in Asia Pacific (Tokyo, Seoul, Singapore, Sydney, Mumbai), Europe (Frankfurt, Ireland, London), South America (Sao Paulo), Canada (Central), US East (N. Virginia, Ohio), and US West (Oregon, N. California) regions. For more information about using T3 dedicated instances, see our EC2 documentation page here.

How Football Helps to Understand Bad Algorithms

Two major problems often arise when implementing AI in business. Projects run into ‘Bad Data’ and ‘Bad Algorithms’. Last week’s blog described some of the issues that come about when dealing with bad data. This week, we’ll use football to illustrate why bad algorithms can cause trouble for machine … Read more How Football Helps to Understand Bad Algorithms

Making big moves in Big Data with Hadoop, Hive, Parquet, Hue and Docker

Jump and run in this brief introduction to Big Data. What data at most big companies in 2020 looks like. Seriously. The goal of this article is to introduce you to some key concepts in the buzzword realm of Big Data. After reading this article, potentially with some additional … Read more Making big moves in Big Data with Hadoop, Hive, Parquet, Hue and Docker

Face Recognition using TensorRT on Jetson Nano — Set up in less than 5min

MTCNN and Google FaceNet in FP16 precision on Jetson Nano. When it comes to face recognition, there are many options to choose from. While most of them are cloud-based, I decided to build a hardware-based face recognition system that does not need an internet connection, which makes it particularly attractive for robotics, embedded systems, … Read more Face Recognition using TensorRT on Jetson Nano — Set up in less than 5min

Do you speak Python?

The shortcuts that every Python coder should know. A systematic way of learning a new language is to know its words and then create sentences by using these words. Unlike human languages, the Python vocabulary is actually pretty small. This vocabulary consists of “reserved words” that have special meaning in Python code. … Read more Do you speak Python?

Inequality: How to draw a Lorenz curve with SQL, BigQuery, and Data Studio

The top 0.1% of all Wikipedia pages earn 25% of the pageviews. The bottom 99% only get 42% of all the views. And the bottom 80% — only get 4%. This is just one example — in this post we’ll review how to get these numbers for this and any other dataset. How can we … Read more Inequality: How to draw a Lorenz curve with SQL, BigQuery, and Data Studio
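The article computes these shares with SQL in BigQuery. As a rough sketch of the same arithmetic in plain Python (a swapped-in language, with made-up toy numbers and hypothetical function names), a Lorenz curve is just the cumulative share of the total plotted against the cumulative share of items, sorted from smallest to largest:

```python
def lorenz_points(values):
    """Return (share of items, cumulative share of total) pairs, smallest first."""
    ordered = sorted(values)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("values must have a positive sum")
    points = [(0.0, 0.0)]  # every Lorenz curve starts at the origin
    running = 0
    for i, value in enumerate(ordered, start=1):
        running += value
        points.append((i / len(ordered), running / total))
    return points


def bottom_share(values, fraction):
    """Share of the total held by the bottom `fraction` of items."""
    ordered = sorted(values)
    k = int(len(ordered) * fraction)
    return sum(ordered[:k]) / sum(ordered)


# Hypothetical toy data: pageviews for four pages (not the Wikipedia dataset).
pageviews = [4, 1, 2, 1]
points = lorenz_points(pageviews)
half = bottom_share(pageviews, 0.5)
```

With this toy data the bottom half of the pages holds a quarter of the views; statements such as "the bottom 80% only get 4%" are the same calculation on a much larger table.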

New GA Dataproc features extend data science and ML capabilities

The life of a data scientist can be challenging. If you’re in this role, your job may involve anything from understanding the day-to-day business behind the data to keeping up with the latest machine learning academic research. With all that a data scientist must do to be effective, you shouldn’t have to worry about migrating … Read more New GA Dataproc features extend data science and ML capabilities

A Relook on Random Forest and Feature Importance

A relook on Feature Importance and Random Forest. No matter who you are, whether a student who has just finished a first machine learning course, an experienced Data Scientist, or basically anyone working in a technical role nowadays, you must have heard of Random Forest. Random Forest is an ensemble-trees model mostly used for … Read more A Relook on Random Forest and Feature Importance

5 Tools for Reproducible Data Science

Watermark. Watermark is an IPython magic extension that prints information about software versions, hardware and dates and times in any IPython shell or Jupyter Notebook session. Watermark provides a very quick and simple way to keep track of tools, libraries, versions, authors and dates involved in a project. It is particularly useful for ad hoc … Read more 5 Tools for Reproducible Data Science

This is the Architecture Powering Machine Learning at LinkedIn

LinkedIn has implemented a very advanced architecture for developing machine learning solutions at scale. Building the infrastructure to manage the lifecycle of machine learning models remains a challenge for most organizations. While we have seen tremendous advancements in machine/deep learning frameworks, the architecture best practices for developing, deploying and managing models at scale still is … Read more This is the Architecture Powering Machine Learning at LinkedIn

Google just published 25 million free datasets

Here’s what you need to know about the largest data repository in the world. Note: Google’s new dataset search tool was publicly released on January 23rd, 2020. Google recently released datasetsearch, a free tool for searching 25 million publicly available datasets. The search tool includes filters to limit results based on their license (free or … Read more Google just published 25 million free datasets

The Basics: Principal Component Analysis

Data Science from the Ground Up. An unsupervised method for variable grouping and dimensionality reduction. Principal Component Analysis sits somewhere between unsupervised learning and data processing. On the one hand, it’s an unsupervised method, but one that groups features together rather than points as in a clustering algorithm. But principal component analysis ends up being … Read more The Basics: Principal Component Analysis

Google Trends Email Automation with Shiny

[This article was first published on business-science.io, and kindly contributed to R-bloggers]. Google Trends is a FREE tool to gain insights about Google Search Terms … Read more Google Trends Email Automation with Shiny

Super-charged similarity metric calculations

Featuring NumPy and TensorFlow. Similarity detection is a common method used to identify items that share traits but not necessarily the same features. Product recommendations and related articles are often driven by similarity metrics. Cosine similarity is the most popular and will be covered … Read more Super-charged similarity metric calculations
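For reference, cosine similarity is the dot product of two vectors divided by the product of their lengths. The article works with NumPy and TensorFlow; the sketch below deliberately uses only the standard library so that it stays self-contained, and the function name and the choice to reject zero vectors are assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # The angle is undefined for a zero vector; raising is one possible choice.
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


same_direction = cosine_similarity([1, 2, 3], [2, 4, 6])
orthogonal = cosine_similarity([1, 0], [0, 1])
opposite = cosine_similarity([1, 0], [-1, 0])
```

Vectors pointing the same way score close to 1, orthogonal vectors score 0, and opposite vectors score close to -1, regardless of their magnitudes.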

Things to know before you make your 1st ML model

This is the part that heavily decides the success of an ML model, so spending the maximum time here makes the most sense. A prerequisite of making useful models is the knowledge of the terminology used in Machine Learning. A label is the thing we’re predicting — the y variable in simple linear regression. A … Read more Things to know before you make your 1st ML model

How to Improve Sports Betting Odds — Step by Step Guide in Python

The statistical method does seem more sophisticated than traditional methods. But how do the performances compare? Let’s look at three other conventional methods: Method #1: Win-Loss % As we talked about in the earlier section of this article, this is a fundamental statistic that often appears on sports websites. For each particular team, the win-loss … Read more How to Improve Sports Betting Odds — Step by Step Guide in Python

How a passion for numbers turned this Mechanical Engineer into a Kaggle Grandmaster

It is rightly said that one should never seek praise, instead, let the effort speak for itself. One of the essential traits of successful people is to never brag about their success but instead keep learning along the way. In the Data Science world, a name that resonates when we speak of humility is that … Read more How a passion for numbers turned this Mechanical Engineer into a Kaggle Grandmaster

Quantitative, Qualitative or Maybe Both?

Supervised Learning. When analyzing data, many problems fall naturally into the supervised or unsupervised learning paradigms. In this blog we review supervised learning, where you’ll have an input variable such as x and an output variable such as Y and you use an algorithm to learn the mapping function from the input to the output … Read more Quantitative, Qualitative or Maybe Both?

Breaking Down the Data Scientist Interview Process

It’s a brand new decade and some of us might be looking for our next role, maybe as a Data Scientist? I’ve broken down the interview process based on my experiences taking part in and facilitating Data Science interviews. Data Science is an incredibly broad field: there are Data Scientists who function similarly to … Read more Breaking Down the Data Scientist Interview Process

ABS time series as tsibbles

[This article was first published on R on Rob J Hyndman, and kindly contributed to R-bloggers]. library(tidyverse) library(tsibble) library(readabs) library(raustats) Australian data analysts will know … Read more ABS time series as tsibbles

Web Scraping with rvest + Astro Throwback

In my first post of the year I will provide a gentle introduction to web scraping with the tidyverse package rvest. As the package name pun suggests, web scraping is the process of harvesting, or extracting, data from websites. The extraction process is greatly simplified by the fact that websites are predominantly built using HTML … Read more Web Scraping with rvest + Astro Throwback

Key Concepts of Modern Reinforcement Learning

As the Agent interacts with the Environment, it learns a policy. A policy is a “learned strategy” that governs the agent’s behaviour in selecting an action at a particular time t of the Environment. A policy can be seen as a mapping from states of an Environment to the actions taken in those states. The … Read more Key Concepts of Modern Reinforcement Learning
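The "mapping from states to actions" view can be made concrete with a lookup table. This is a minimal sketch of a deterministic, hand-written policy; the state and action names are hypothetical, and in practice the table (or a function approximating it) would be learned from interaction rather than typed in.

```python
# Hypothetical states and actions for a small robot; not from the article.
policy = {
    "low_battery": "recharge",
    "obstacle_ahead": "turn_left",
    "path_clear": "move_forward",
}


def select_action(policy, state):
    """Look up the action the policy prescribes for `state`."""
    return policy[state]


# Acting in the environment means consulting the policy at each time step t.
observed_states = ["path_clear", "obstacle_ahead", "low_battery"]
actions = [select_action(policy, s) for s in observed_states]
```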

Is there a Future for Data Science Teams?

1. Stream-Aligned Teams. Stream-aligned teams are cross-functional teams that add value as quickly and independently as possible. They have clear responsibilities. Their interactions with other parts of the organization are clearly defined. To fulfill their obligations, they rely on direct feedback from users and can implement new features accordingly. These teams are at the core … Read more Is there a Future for Data Science Teams?

Using Prophet To Forecast Weather Data

Prophet has quickly become a popular open-source library for working with time series data. In this particular example, Prophet is used to: (1) identify extreme weather events using anomaly detection; (2) forecast future weather data. Specifically, Prophet is used to forecast wind speed for Dublin Airport, Ireland. The dataset used consists of hourly wind … Read more Using Prophet To Forecast Weather Data

Testing, profiling, and optimizing NLP models with Pytest, Cython, and spaCy

Unit test your machine learning models, profile your code, and take full advantage of C’s natural language processing speed. I read a blog post that claimed to have profiled spaCy and NLTK for natural-language data preprocessing, and to have found NLTK far faster. What? NLTK is the Jeep Grand Cherokee … Read more Testing, profiling, and optimizing NLP models with Pytest, Cython, and spaCy

Why Politics and Machine Learning Are Not a Good Match

With the excitement surrounding opportunities that come with deploying Machine Learning (ML), it is easy to forget the downsides and risks. One of the major reasons that ML strategies lose money, miss the mark, or disappoint customers is ‘bad data’. Bad data can come in many forms and … Read more Why Politics and Machine Learning Are Not a Good Match

Deep Learning Containers Updates for SageMaker Debugger and Tensorflow Serving

The AWS Deep Learning Containers are available today with bug fixes to the SageMaker integration with TensorFlow Serving and the latest version of SageMaker Debugger. You can launch the new versions of Deep Learning Containers on Amazon SageMaker, Amazon Elastic Kubernetes Service (Amazon EKS), self-managed Kubernetes on Amazon EC2, and Amazon Elastic Container Service (Amazon … Read more Deep Learning Containers Updates for SageMaker Debugger and Tensorflow Serving

Making Your RGB Camera Smarter with Power of Deep Learning!

The main aim of this article is to design a deep learning-based smart security system that relies only on images from an RGB camera. This system can be applied to home/office security cameras. The proposed system will be able to do the following tasks: count the people in the room; face recognition; activity … Read more Making Your RGB Camera Smarter with Power of Deep Learning!

31 Best Practices for Product Analytics Data Governance

Managers & Members. Managers and members are the primary consumers of product analytics data. The needs of these roles revolve around reporting and insight generation. They will be: conducting analyses from available data; creating reports from available data; sharing and collaborating on reports or dashboards; creating collections of reports or organizing data. The key difference … Read more 31 Best Practices for Product Analytics Data Governance

From cutting-edge research to industrial applications with Giotto

The algorithm introduced by Royer et al. is called ATOL, for Automatic Topologically-Oriented Learning. It addresses a key problem in topological data analysis (TDA), namely the automatic extraction of features from so-called persistence diagrams. A persistence diagram is a representation of the global topology of a dataset in terms of its connectivity at … Read more From cutting-edge research to industrial applications with Giotto

Cheaper Cloud AI deployments with NVIDIA T4 GPU price cut

Locations and configurations. Google Cloud was the first major cloud provider to launch the T4 GPU and offer it globally (in eight regions). This worldwide footprint, combined with the performance of the T4 Tensor Cores, opens up more possibilities to our customers. Since our global rollout, T4 performance has improved. The T4 and V100 GPUs … Read more Cheaper Cloud AI deployments with NVIDIA T4 GPU price cut

Foods Around Me: Google Maps Data Scraping with Python & Google Colab

The first step: what data would you like to get? For me, I would like to get restaurants around me (chilling by the beach in Sanur, Bali) within a radius of 1 km. So, the parameters would be ‘restaurant’, ‘Sanur Beach’ (as coordinates), and ‘1 km’. Translated into Python, it would be: coordinates = ['-8.705833, 115.261377'] keywords … Read more Foods Around Me: Google Maps Data Scraping with Python & Google Colab

What can analysing more than 2 million street names reveal?

The “Road” suffix dominates the tally (with 775,537), followed by “Lane” and “Street” with 238,726 and 213,881 occurrences respectively. We can have a look at how these names are distributed geographically. Here, we aggregate all suffixes per city and take the most frequent street suffix in each city. The result is quite revealing. The … Read more What can analysing more than 2 million street names reveal?
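The per-city aggregation described above can be sketched with the standard library. Everything here is an assumption made for illustration: the function name, the tiny sample of (city, street name) pairs, and the rule that the suffix is simply the last word of the street name. The article's own pipeline and data are not shown in this excerpt.

```python
from collections import Counter, defaultdict


def top_suffix_per_city(streets):
    """Map each city to its most frequent street-name suffix (last word)."""
    counts = defaultdict(Counter)
    for city, street_name in streets:
        parts = street_name.split()
        if not parts:
            continue  # skip empty street names
        suffix = parts[-1].title()  # normalise "street" and "Street"
        counts[city][suffix] += 1
    return {city: counter.most_common(1)[0][0] for city, counter in counts.items()}


# Hypothetical sample data, not taken from the article's dataset.
streets = [
    ("Springfield", "Oak Road"),
    ("Springfield", "Elm Road"),
    ("Springfield", "Mill Lane"),
    ("Riverton", "High Street"),
    ("Riverton", "Church street"),
    ("Riverton", "Park Road"),
]
result = top_suffix_per_city(streets)
```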

How To Save and Load A Model In PyTorch With A Complete Example

Have you experienced a situation where you spend hours or days training your model and then it stops in the middle? Or you are not satisfied with your model performance and want to train the model again? There are multiple reasons why we might need a flexible way to save and load our model. Most … Read more How To Save and Load A Model In PyTorch With A Complete Example

What is Artificial Intelligence?

One of the key features that distinguishes us, humans, from everything else in the world is intelligence. This ability to understand, apply knowledge and improve skills has played a significant role in our evolution and establishing human civilisation. But many people (including Elon Musk) believe that the advancement in technology can create superintelligence that can … Read more What is Artificial Intelligence?

Big Data: An Anthropological Perspective

An anthropological argument for why big data and qualitative research should be combined. As a business anthropologist working as a mixed-methods researcher, I support the use of qualitative and quantitative methods, including big data. But I do not support the use of only big data. Research and the insights that are generated are … Read more Big Data: An Anthropological Perspective

This Google Scientist teaches AI to build better AI

Can you tell us about your professional background? I got my Masters and Ph.D. in electrical and computer engineering at Rice University. I was working on algorithmic, hardware/software code design for large-scale data analytics model and Machine Learning. Towards the end of my Ph.D., when Deep Learning took off, I switched my focus to Deep … Read more This Google Scientist teaches AI to build better AI

5 Hacky Data Visualization Techniques with Tableau

Tackle Tableau limitations in simple hacky ways. Tableau is a powerful and easy-to-use data visualization tool. Yet sometimes the limitation of features will hold us back when we want to create an advanced dashboard. My Tableau project. Line Separator, Pagination, Search Box, Line Plot with Markers, Text with a URL … Read more 5 Hacky Data Visualization Techniques with Tableau

Python Stock Analysis — Candlestick Chart with Python and Plotly

A candlestick chart is a very common and useful representation of stock prices. By looking into a candlestick chart, we can visually see the open, close, low and high price for any given stock. In this article, I would like to show you how to use Python, Pandas and Plotly to build your own candlestick … Read more Python Stock Analysis — Candlestick Chart with Python and Plotly

A/B testing — Is there a better way? An exploration of multi-armed bandits

Websites today are meticulously designed to maximize one or even several goals. Should the “Buy Now!” button be red or blue? What headline attracts the most clicks to that news article? Which version of an advertisement has the highest click-through rate? To determine the optimal answer to these questions, software developers employ A/B tests — … Read more A/B testing — Is there a better way? An exploration of multi-armed bandits

How to calculate landscape metrics for local landscapes?

In the last few weeks, I was asked a similar question several times: how to calculate landscape metrics for local landscapes? In other words, how to divide the categorical input map into a number of smaller areas, and next calculate selected landscape metrics for each of the areas. Those areas have many names, such as tiles, … Read more How to calculate landscape metrics for local landscapes?
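The two steps described above (split the categorical map into smaller areas, then compute a metric per area) can be sketched as follows. This is only an illustration in plain Python, not the article's own code: the map is a hypothetical nested list of category codes, the function names are made up, and the metric is the number of distinct categories in a tile, used here as a simple stand-in for whichever landscape metrics are selected.

```python
def split_into_tiles(grid, tile_size):
    """Split a 2D grid into square tiles keyed by (tile row, tile column)."""
    tiles = {}
    for top in range(0, len(grid), tile_size):
        for left in range(0, len(grid[0]), tile_size):
            tile = [row[left:left + tile_size] for row in grid[top:top + tile_size]]
            tiles[(top // tile_size, left // tile_size)] = tile
    return tiles


def class_richness(tile):
    """Number of distinct categories in a tile (a simple stand-in metric)."""
    return len({cell for row in tile for cell in row})


# Hypothetical 4x4 categorical map; each number is a land-cover class.
land_cover = [
    [1, 1, 2, 2],
    [1, 1, 2, 3],
    [4, 4, 5, 5],
    [4, 4, 5, 5],
]

richness = {
    key: class_richness(tile)
    for key, tile in split_into_tiles(land_cover, 2).items()
}
```

Tiles on the right or bottom edge come out smaller when the map size is not a multiple of `tile_size`; whether to keep or drop such partial tiles is a design choice this sketch leaves open.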