TextBlob Spelling Correction

What is TextBlob? TextBlob is a Python library for processing textual data. It provides a consistent API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, and more. Why TextBlob? NLU is a subset of NLP in which unstructured data or a sentence is converted … Read more TextBlob Spelling Correction

Surprising Cost Based Route Optimization Results

Real-world cost-based route optimizations. Picture by Free-Photos from Pixabay. Cost-based route optimizations are different from route optimizations that take into account distance or time only. A cost-based route optimization is useful in the real world because it meets the business objective of any company trying to streamline costs and improve efficiency. By … Read more Surprising Cost Based Route Optimization Results

How I got Certified as a TensorFlow Developer by Google

Resources and tips that will help you prepare for the certificate exam About a month ago, Deep Learning was a foreign concept to me — I barely had any theoretical background in it, and I had 0 practical experience coding neural networks. Now, a month later, I received the TensorFlow Developer Certificate, and I am … Read more How I got Certified as a TensorFlow Developer by Google

Crafting a Machine Learning Model to Predict Student Retention Using R

Modeling an ML Experiment to Predict Student Retention in R. Student retention is one of the most important indicators in higher education, so predictive analytics plays a crucial role there. First and foremost, let’s start by defining what student retention is, at least in the scope of this article. We’ll define it, as … Read more Crafting a Machine Learning Model to Predict Student Retention Using R

Error Bar plots from a Data Frame using Matplotlib in Python

Simple code for error bar charts in Python! Photo by Isaac Smith on Unsplash I recently had to compare the performance of a few approaches/algorithms for a report and I chose error bars to summarize the results. If you have a similar task at hand, save yourself some time with this article. What are error … Read more Error Bar plots from a Data Frame using Matplotlib in Python
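
The task described in this excerpt, summarizing the performance of a few algorithms with error bars, can be sketched roughly as follows. This is a minimal illustration and not the article's own code: the `results` DataFrame, its column names, and the accuracy values are all made up for the example, and the error bars are assumed to be the sample standard deviation per algorithm.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical results: three runs for each of two algorithms (illustrative values).
results = pd.DataFrame({
    "algorithm": ["A", "A", "A", "B", "B", "B"],
    "accuracy": [0.80, 0.82, 0.84, 0.70, 0.75, 0.80],
})

# Mean and sample standard deviation of the metric per algorithm.
summary = results.groupby("algorithm")["accuracy"].agg(["mean", "std"])

# One marker per algorithm, with the standard deviation as the error bar.
positions = list(range(len(summary)))
fig, ax = plt.subplots()
ax.errorbar(
    positions,
    summary["mean"].to_numpy(),
    yerr=summary["std"].to_numpy(),
    fmt="o",
    capsize=4,
)
ax.set_xticks(positions)
ax.set_xticklabels(summary.index)
ax.set_xlabel("algorithm")
ax.set_ylabel("accuracy")

# Render to an in-memory buffer; pass a file name to savefig to write a PNG instead.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
```

Other spreads, such as a standard error or a confidence interval, could be used for `yerr` in the same way.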

Get Your Own Data — Building a Scalable Web-Scraper with AWS

As you see from the diagram, I am using CloudWatch, Lambda, Batch, S3. I am also using SNS for notifications triggered by the “Batch-Jobs-Monitor.” Here is my reasoning for the services I chose: CloudWatch has “Rules” which behave as Cron Jobs and can pass JSON payloads to lambda functions. This enabled me to submit multiple … Read more Get Your Own Data — Building a Scalable Web-Scraper with AWS

Time for Sustainable Development

The coronavirus pandemic of 2019 and 2020 and the civil rights crisis of 2020, led by the Black Lives Matter movement, have highlighted some of the major limitations of our society today. While our economy is growing at an increasingly rapid pace, many areas of development remain under-considered. A global vision of the interconnection between … Read more Time for Sustainable Development

Database Migration using AWS Data Migration Service (DMS) — A few lessons learnt along the way

Heterogeneous Migration of Oracle Database to Amazon Aurora PostgreSQL. Image courtesy Pixabay. Recently, we migrated a fairly large on-premises Oracle Database to Amazon Aurora PostgreSQL. To start off, heterogeneous migrations between different database platforms are never easy. Coupling this with a move to the cloud definitely adds to the challenge. The intent of … Read more Database Migration using AWS Data Migration Service (DMS) — A few lessons learnt along the way

FNN-VAE for noisy time series forecasting

```
")
  training_loop_vae(ds_train)
  test_batch <- as_iterator(ds_test) %>% iter_next()
  encoded <- encoder(test_batch[[1]][1:1000])
  test_var <- tf$math$reduce_variance(encoded, axis = 0L)
  print(test_var %>% as.numeric() %>% round(5))
}
```
Experimental setup and data. The idea was to add white noise to a deterministic series. This time, the Roessler system was chosen, mainly for the prettiness of its attractor, apparent even in … Read more FNN-VAE for noisy time series forecasting

Fuzzy Name Matching with Machine Learning

Stacking Phonetic Algorithms, String Metrics and Character Embedding for Semantic Name Matching Photo by Thom Masat on Unsplash It is often the case when working with external data that a common identifier such as a numerical key does not exist. In place of a unique identifier, a person’s full name can be used as part … Read more Fuzzy Name Matching with Machine Learning

Hundreds of Companies Are Having a Party with my Cookie Data and I wasn’t Invited

As an added note, QuantCast does not require you to tell them who you are or give them any personal details to identify you when requesting access to your data. Given they are tracking an “mc” cookie placed in your browser, they already know who you are. That certainly sets the tone. In this post, … Read more Hundreds of Companies Are Having a Party with my Cookie Data and I wasn’t Invited

Object Distance Measurement By Stereo Vision

Stereo vision, triangulation, feature correspondence, disparity map. In modern industrial automation, computer vision is becoming one of the key technologies for improving production efficiency and inspecting product quality, for example in automatic detection of machine parts, intelligent robot control, and automatic monitoring of production lines. In the fields of defense and aerospace, computer vision … Read more Object Distance Measurement By Stereo Vision

What the Null Hypothesis Really Means— According to a Statistics Professor

A simple explanation for statistics’ most confusing concept. Dr. Robert Montgomery is a research assistant professor and biostatistician at the University of Kansas Medical Center. When teaching graduate-level statistics courses, he likes to ask students a simple question: “What does the null hypothesis mean?” This is a surprisingly challenging question with a very specific … Read more What the Null Hypothesis Really Means— According to a Statistics Professor

Google breaks AI performance records in MLPerf with world’s fastest training supercomputer

Table 1: All of these MLPerf submissions trained from scratch in 33 seconds or faster on Google’s new ML supercomputer. Training at scale with TensorFlow, JAX, Lingvo, and XLA. Training complex ML models using thousands of TPU chips required a combination of algorithmic techniques and optimizations in TensorFlow, JAX, Lingvo, and XLA. To provide some … Read more Google breaks AI performance records in MLPerf with world’s fastest training supercomputer

Multi-agent reinforcement learning and the future of AI

Editor’s note: The Towards Data Science podcast’s “Climbing the Data Science Ladder” series is hosted by Jeremie Harris. Jeremie helps run a data science mentorship startup called SharpestMinds. You can listen to the podcast below: Reinforcement learning has gotten a lot of attention recently, thanks in large part to systems like AlphaGo and AlphaZero, which … Read more Multi-agent reinforcement learning and the future of AI

A Cloud developer advocate’s top infrastructure sessions at Next OnAir

It’s week 3 of Google Cloud Next ’20: OnAir, and this week is all about infrastructure and operations. This is an exciting space where we have both mature services and rapid improvements. We have a bunch of great talks this week and I hope you will enjoy them and learn a lot! After checking out … Read more A Cloud developer advocate’s top infrastructure sessions at Next OnAir

In hybrid and multi-cloud environments, the network really matters

According to recent research, among organizations adopting public cloud, a full 70% say that they will use a combination of public cloud and on-premises data centers. At the same time, 21% of business users reported that poor network connectivity negatively impacts web or cloud-based application performance. How can you ensure that your hybrid or multi-cloud … Read more In hybrid and multi-cloud environments, the network really matters

Deep Learning on Dynamic Graphs

Photo by Florian Olivo on Unsplash. By Adrian Yijie Xu. Online learning methods are a dynamic family of algorithms powering many of the latest achievements in reinforcement learning over the past decade. Belonging to the sample-based learning class of reinforcement learning approaches, online learning methods allow for the determination of state … Read more Deep Learning on Dynamic Graphs

Genetic Algorithm Based Approach for Robotic Controllers

Implemented in Python. Photo by Eric Krull on Unsplash. Today, we will solve a real-world problem: designing a robotic controller. Many techniques can be used to solve this problem, including genetic algorithms (GA), particle swarm optimization, and neural networks (NN). What we need to do is to … Read more Genetic Algorithm Based Approach for Robotic Controllers

Online R trainings

Do you want to use the R programming language to transform data into strategic knowledge? Are you looking for the best introduction to working with R? Would you like to take part in training that really helps you despite working from home and limited travel options? Then register for our online R trainings. In our most … Read more Online R trainings

Azure Cost Management + Billing updates – July 2020

Whether you’re a new student, thriving startup, or the largest enterprise, you have financial constraints, and you need to know what you’re spending, where, and how to plan for the future. Nobody wants a surprise when it comes to the bill, and this is where Azure Cost Management + Billing comes in. We’re always looking … Read more Azure Cost Management + Billing updates – July 2020

Hybrid Rule-Based Machine Learning With scikit-learn

There are many ways in which we can integrate deterministic rules into our machine learning pipeline. Adding rules progressively as data pre-processing steps might seem intuitive, but this would not suit our goal. Preferably, we aim to leverage the concept of abstraction by adopting object-oriented programming (OOP) to generate a novel ML model class. This … Read more Hybrid Rule-Based Machine Learning With scikit-learn

Introducing Profiler: Select the best AI model for your target device — no deployment required

Profiler is a simulator for profiling the performance of Machine Learning (ML) model scripts. Profiler can be used during both the training and inference stages of the development pipeline. It is particularly useful for evaluating script performance and resource requirements for models and scripts being deployed to edge devices. Profiler is part of Auptimizer. You … Read more Introducing Profiler: Select the best AI model for your target device — no deployment required

N Is The Enemy

Big Population + Big Data = Critical Failure Photo by Joshua Coleman on Unsplash We’ve been sold a false promise. Somewhere down the line we tricked ourselves into thinking that truth was a side-effect of volume. “If we collect enough data,” we said, “our overwhelming statistical power will blow a hole in the unknown.” Instead, … Read more N Is The Enemy

A, B, Cs… of Deep Learning Hyperparameters

Deep learning is currently in the news because of its accuracy and the control we have over the models. Software libraries such as TensorFlow, Keras, Caffe, and many others have simplified programming for deep learning. Now we do not have to worry about backpropagation steps, weight updates, … Read more A, B, Cs… of Deep Learning Hyperparameters

The Scourge of Analytical Variability in AI Systems

In the ICT industry, engineers are increasingly moving towards building AI systems to add value to customers by solving existing problems and making processes more efficient. With the seemingly successful application of deep learning, experts are opining, with conviction, that the AI winter has finally come to an end. But, there are at least three … Read more The Scourge of Analytical Variability in AI Systems

Strategy for improved the characterisation of human metabolic phenotyping using COmbined Multiblock Principal components Analysis with Statistical Spectroscopy (COMPASS)

We have recently published a strategy for improving human metabolic phenotyping using Combined Multiblock Principal components Analysis with Statistical Spectroscopy (COMPASS). The COMPASS approach was developed within the R environment. The open access manuscript can be found here. In this blog post, we describe how to get started. Characterising and understanding how human phenotypes relate to population … Read more Strategy for improved the characterisation of human metabolic phenotyping using COmbined Multiblock Principal components Analysis with Statistical Spectroscopy (COMPASS)

10 excellent GitHub repositories for every Java developer

Source: GitHub. Software design patterns are reusable, general solutions to recurring problems in software design. They also give software engineers and architects a common vocabulary for discussing common issues. Design patterns can improve code quality and coding velocity by applying battle-tested, proven development paradigms. The … Read more 10 excellent GitHub repositories for every Java developer

A Practical Introduction to Early Stopping in Machine Learning

Next, let’s create X and y. Keras and TensorFlow 2.0 take NumPy arrays as inputs, so we have to convert the DataFrame back to a NumPy array.

    # Creating X and y
    X = df[['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']]
    # Convert DataFrame into np array
    X = np.asarray(X)
    y = df[['label_setosa', …

Read more A Practical Introduction to Early Stopping in Machine Learning
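
The conversion step in this excerpt can be sketched end to end as below. This is a minimal, self-contained illustration under stated assumptions rather than the article's code: the three-row DataFrame stands in for the real iris data, the `label` column is hypothetical, and the one-hot column names other than `label_setosa` (which the excerpt shows) are simply whatever `pd.get_dummies` produces for these made-up values.

```python
import numpy as np
import pandas as pd

# Tiny hand-made stand-in for the iris DataFrame (values are illustrative only).
df = pd.DataFrame({
    "sepal length (cm)": [5.1, 7.0, 6.3],
    "sepal width (cm)": [3.5, 3.2, 3.3],
    "petal length (cm)": [1.4, 4.7, 6.0],
    "petal width (cm)": [0.2, 1.4, 2.5],
    "label": ["setosa", "versicolor", "virginica"],  # assumed label column
})

# One-hot encode the label; this yields columns such as "label_setosa".
df = pd.get_dummies(df, columns=["label"], prefix="label")

feature_cols = [
    "sepal length (cm)",
    "sepal width (cm)",
    "petal length (cm)",
    "petal width (cm)",
]
label_cols = [c for c in df.columns if c.startswith("label_")]

# Convert the selected DataFrame columns into NumPy arrays for the model.
X = np.asarray(df[feature_cols], dtype="float32")
y = np.asarray(df[label_cols], dtype="float32")

print(X.shape, y.shape)
```

Casting to `float32` is a choice made for this sketch; it keeps the one-hot labels numeric regardless of how the installed pandas version represents dummy columns.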

Web Scraping: Scraping Table Data

In this post, we will learn how to scrape table data from the web using Python. Simplified. Photo by Carlos Muza on Unsplash. Web scraping is an important technique for data collection. In Python, BeautifulSoup, Selenium and XPath are among the most widely used tools for web scraping. … Read more Web Scraping: Scraping Table Data

Installing and Running Ubuntu on a 2015-ish MacBook Air

[This article was first published on Thinking inside the box , and kindly contributed to R-bloggers]. (You can report issue about the content on this page here) Want to share your content on R-bloggers? click here if you have a blog, or here if you don’t. So a few months ago kiddo one dropped an … Read more Installing and Running Ubuntu on a 2015-ish MacBook Air

Let the snail crawl: Animated density curves

[This article was first published on Rcrastinate, and kindly contributed to R-bloggers]. Previously, I’ve plotted a ridgeline based on a variable’s density through time. It … Read more Let the snail crawl: Animated density curves

Best Libraries for Geospatial Data Visualisation in Python

1. PyViz/HoloViz (GeoViews, Datashader, hvPlot). The HoloViz-maintained libraries cover all the data visualisations you might need, including dashboards and interactive visualisation. GeoViews, in particular, is a dedicated geospatial data visualisation library that provides an easy-to-use, convenient way to work with geospatial data. GeoViews is a Python library that makes it easy to explore and visualize geographical, meteorological, and oceanographic … Read more Best Libraries for Geospatial Data Visualisation in Python

Waffle Charts Using Python’s Matplotlib

How to draw a waffle chart in Python using the Matplotlib library. Source: Unsplash by Pez González. Waffle charts can be an interesting element in a dashboard. They are especially useful for displaying progress towards goals and seeing how each item contributes to the whole. But waffle charts are not very useful if you … Read more Waffle Charts Using Python’s Matplotlib

ttdo 0.0.6: Bugfix

[This article was first published on Thinking inside the box, and kindly contributed to R-bloggers]. A bugfix release of our (still small) ttdo … Read more ttdo 0.0.6: Bugfix

Amazon RDS Proxy is Generally Available

Applications communicate with databases by establishing connections, which consume memory and compute resources on the database server. Many applications, including those built on modern serverless architectures, can open a large number of database connections or frequently open and close connections. This can stress the database memory and compute, leading to slower performance and limited application … Read more Amazon RDS Proxy is Generally Available

Pausing entitlements now available in AWS Elemental MediaConnect

AWS Elemental MediaConnect is a reliable, secure, and flexible transport service for live video that enables broadcasters and content owners to build live video workflows and securely share live content with partners and customers. MediaConnect helps customers who run 24×7 TV channels or stream live events transport high-value live video streams into, through, and out … Read more Pausing entitlements now available in AWS Elemental MediaConnect

Using Just One Line of Code to Write to a Relational Database

PYTHON AND SQL It’ll make adding to a database that much easier. Art by Instagram @softie__art When writing data from a Pandas DataFrame to a SQL database, we will be using the DataFrame.to_sql method. While you could execute an INSERT INTO type of SQL query, the native Pandas method makes the process even easier. Here’s … Read more Using Just One Line of Code to Write to a Relational Database
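
The one-line write with `DataFrame.to_sql` mentioned in this excerpt can be sketched as follows. This is a minimal sketch, not the article's own example: the table name `scores`, the sample rows, and the in-memory SQLite database are assumptions chosen so the snippet runs without any database server.

```python
import sqlite3

import pandas as pd

# Hypothetical data to write; the column names and values are made up.
df = pd.DataFrame({"name": ["Ada", "Linus"], "score": [95, 88]})

# An in-memory SQLite database keeps the example self-contained.
conn = sqlite3.connect(":memory:")

# The one-line write: create (or replace) the table and insert every row.
df.to_sql("scores", conn, if_exists="replace", index=False)

# Read the rows back to confirm that the write worked.
result = pd.read_sql("SELECT name, score FROM scores ORDER BY score DESC", conn)
print(result)

conn.close()
```

For databases other than SQLite, `to_sql` expects a SQLAlchemy engine or connection rather than a raw driver connection. The `if_exists` argument controls what happens when the table already exists: `"fail"`, `"replace"`, or `"append"`.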

TensorFlow vs PyTorch — Convolutional Neural Networks (CNN)

Implementation of a CNN in both TensorFlow and PyTorch on a very famous dataset, and a comparison of the results. In my previous article, I gave an implementation of simple linear regression in both the TensorFlow and PyTorch frameworks and compared their results. In this article, we shall go through the application of a Convolutional Neural … Read more TensorFlow vs PyTorch — Convolutional Neural Networks (CNN)

Creating a Racing Bar Chart using the Covid-19 Dataset

My previous two articles covered data analytics using the Covid-19 dataset. The Covid-19 dataset is a good candidate for exploring and understanding data analytics and visualisation. In this article, I will show you how to create a dynamic chart in matplotlib. In particular, I will create a racing bar chart to dynamically … Read more Creating a Racing Bar Chart using the Covid-19 Dataset