TextBlob Spelling Correction

What is TextBlob? TextBlob is a Python library for processing textual data. It provides a consistent API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, and more. Why TextBlob? Natural language understanding (NLU) is a subset of NLP in which unstructured data or a sentence is converted … Read more

Surprising Cost Based Route Optimization Results

Real world cost based route optimizations. Picture by Free-Photos from Pixabay. Cost based route optimizations are different from route optimizations that take into account distance or time only. A cost based route optimization is useful in the real-world because it meets the business objective of any company trying to streamline costs and improve efficiency. By … Read more

Crafting a Machine Learning Model to Predict Student Retention Using R

Modeling an ML Experiment to Predict Student Retention in R. Student retention is one of the most important indicators in higher education, so predictive analytics plays a crucial role in tracking it. First and foremost, let’s start by defining what student retention is, at least in the scope of this article. We’ll define it, as … Read more

Get Your Own Data — Building a Scalable Web-Scraper with AWS

As you see from the diagram, I am using CloudWatch, Lambda, Batch, S3. I am also using SNS for notifications triggered by the “Batch-Jobs-Monitor.” Here is my reasoning for the services I chose: CloudWatch has “Rules” which behave as Cron Jobs and can pass JSON payloads to lambda functions. This enabled me to submit multiple … Read more

Time for Sustainable Development

The coronavirus pandemic of 2019 and 2020 and the civil rights crisis of 2020, led by the Black Lives Matter movement, have highlighted some of the major limitations of our society today. While our economy is growing at an increasingly rapid pace, many areas of development remain under-considered. A global vision of the interconnection between … Read more

Database Migration using AWS Data Migration Service (DMS) — A few lessons learnt along the way

Heterogeneous Migration of Oracle Database to Amazon Aurora PostgreSQL. Image courtesy of Pixabay. Recently, we migrated a fairly large on-premises Oracle Database to Amazon Aurora PostgreSQL. To start off, heterogeneous migrations between different database platforms are never easy. Coupling this with migrating the database to the cloud definitely adds to the challenge. The intent of … Read more

JSON and APIs with Python

For this tutorial, we will use the free API found at covid19api.com that provides data on the coronavirus. We will find the total number of confirmed cases in each country and then we will create a pandas dataframe that contains that information. So let’s begin! Inspecting the API If you go to the documentation page … Read more

FNN-VAE for noisy time series forecasting

```
")
training_loop_vae(ds_train)

test_batch <- as_iterator(ds_test) %>% iter_next()
encoded <- encoder(test_batch[[1]][1:1000])
test_var <- tf$math$reduce_variance(encoded, axis = 0L)
print(test_var %>% as.numeric() %>% round(5))
}
```

Experimental setup and data

The idea was to add white noise to a deterministic series. This time, the Roessler system was chosen, mainly for the prettiness of its attractor, apparent even in … Read more

Fuzzy Name Matching with Machine Learning

Stacking Phonetic Algorithms, String Metrics and Character Embedding for Semantic Name Matching Photo by Thom Masat on Unsplash It is often the case when working with external data that a common identifier such as a numerical key does not exist. In place of a unique identifier, a person’s full name can be used as part … Read more

Writing lambda Expressions in Python

Photo by Chris Ried on Unsplash Imagine we are coding and need to write a simple function. However, we are only going to be using this function once and thus it seems unnecessary to create an entire function with the def keyword for that one task. Well, that’s where lambda expressions come in. What are … Read more
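As a minimal sketch of the idea in the excerpt above (a one-off function that does not need a full def), the following uses only built-in Python. The lists and variable names are hypothetical and are not taken from the original article.

```python
# Hypothetical data for illustration only.
numbers = [1, 2, 3, 4]

# A lambda is an anonymous function limited to a single expression.
# Here it is used once, inline, instead of defining a named function.
squares = list(map(lambda x: x ** 2, numbers))

# Lambdas are often passed as the key argument when sorting.
pairs = [("b", 2), ("a", 3), ("c", 1)]
by_count = sorted(pairs, key=lambda pair: pair[1])

print(squares)
print(by_count)
```

If the same logic is needed in more than one place, a named function defined with def is usually easier to read and test.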

Object Distance Measurement By Stereo Vision

Stereovision, Triangulation, Feature Correspondence, Disparity Map. In the modern industrial automation production process, computer vision is becoming one of the key technologies to improve production efficiency and inspect product quality, such as automatic detection of machine parts, intelligent robot control, automatic monitoring of production lines, etc. In the fields of defense and aerospace, computer vision … Read more

What the Null Hypothesis Really Means— According to a Statistics Professor

A simple explanation for statistics’ most confusing concept. Dr. Robert Montgomery is a research assistant professor and biostatistician at the University of Kansas Medical Center. When teaching graduate level statistics courses, he likes to ask students a simple question: “what does the null hypothesis mean?” This is a surprisingly challenging question with a very specific … Read more

Google breaks AI performance records in MLPerf with world’s fastest training supercomputer
Google AI

Table 1: All of these MLPerf submissions trained from scratch in 33 seconds or faster on Google’s new ML supercomputer [2]. Training at scale with TensorFlow, JAX, Lingvo, and XLA. Training complex ML models using thousands of TPU chips required a combination of algorithmic techniques and optimizations in TensorFlow, JAX, Lingvo, and XLA. To provide some … Read more

Broadcasting PySpark Accumulators

And how to manage them Photo by Greg Rakozy on Unsplash In this post, I am going to discuss an interesting pattern with a broadcast that comes in handy. Before going into more details, let us refresh what Spark Accumulators are. A shared variable that can be accumulated, i.e., has a commutative and associative “add” … Read more
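The property mentioned in the excerpt above, a commutative and associative “add”, can be illustrated without Spark. This is a plain-Python sketch of the idea only, not the PySpark accumulator API; the partial values are hypothetical stand-ins for contributions from separate workers.

```python
from functools import reduce
from operator import add

# Hypothetical partial values, standing in for amounts that separate
# workers would each add to one shared accumulator.
partials = [3, 5, 7, 11]

# Because add is commutative and associative, the order and grouping
# in which the partial values are merged does not change the total.
forward = reduce(add, partials, 0)
backward = reduce(add, reversed(partials), 0)
regrouped = add(reduce(add, partials[:2], 0), reduce(add, partials[2:], 0))

print(forward, backward, regrouped)
```

This order independence is what lets partial results be combined safely no matter which task finishes first.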

Multi-agent reinforcement learning and the future of AI

Editor’s note: The Towards Data Science podcast’s “Climbing the Data Science Ladder” series is hosted by Jeremie Harris. Jeremie helps run a data science mentorship startup called SharpestMinds. You can listen to the podcast below: Reinforcement learning has gotten a lot of attention recently, thanks in large part to systems like AlphaGo and AlphaZero, which … Read more

A Cloud developer advocate’s top infrastructure sessions at Next OnAir
Developer Advocate

It’s week 3 of Google Cloud Next ’20: OnAir, and this week is all about infrastructure and operations. This is an exciting space where we have both mature services and rapid improvements. We have a bunch of great talks this week and I hope you will enjoy them and learn a lot! After checking out … Read more

In hybrid and multi-cloud environments, the network really matters
Product Marketing Manager, Google Cloud

According to recent research [1], among organizations adopting public cloud, a full 70% say that they will use a combination of public cloud and on-premises data centers. At the same time, 21% of business users reported that poor network connectivity negatively impacts web or cloud-based application performance [2]. How can you ensure that your hybrid or multi-cloud … Read more

Deep Learning on Dynamic Graphs

Photo by Florian Olivo on Unsplash. By Adrian Yijie Xu. Online learning methods are a dynamic family of algorithms powering many of the latest achievements in reinforcement learning over the past decade. Belonging to the sample-based learning class of reinforcement learning approaches, online learning methods allow for the determination of state … Read more

Online R trainings

Do you want to use the R programming language to transform data into strategic knowledge? Are you looking for the optimal introduction to working with R? Would you like to participate in a training that really helps you despite working from home and limited travel possibilities? Then register for our online R trainings. In our most … Read more

Hybrid Rule-Based Machine Learning With scikit-learn

There are many ways in which we can integrate deterministic rules into our machine learning pipeline. Adding rules progressively as data pre-processing steps might seem intuitive, but this would not suit our goal. Preferably, we aim to leverage the concept of abstraction by adopting object-oriented programming (OOP) to generate a novel ML model class. This … Read more

Introducing Profiler: Select the best AI model for your target device — no deployment required

Profiler is a simulator for profiling the performance of Machine Learning (ML) model scripts. Profiler can be used during both the training and inference stages of the development pipeline. It is particularly useful for evaluating script performance and resource requirements for models and scripts being deployed to edge devices. Profiler is part of Auptimizer. You … Read more

N Is The Enemy

Big Population + Big Data = Critical Failure Photo by Joshua Coleman on Unsplash We’ve been sold a false promise. Somewhere down the line we tricked ourselves into thinking that truth was a side-effect of volume. “If we collect enough data,” we said, “our overwhelming statistical power will blow a hole in the unknown.” Instead, … Read more

What Hackathons have taught me!!

I like to participate in Hackathons to test my skills and also to learn new skills. I find this way of learning quite effective. During my second year in college, I used to think that doing an online course on any of the platforms would be enough. But soon, I would forget what I learned … Read more

A, B, Cs… of Deep Learning Hyperparameters

Deep learning is currently in the news because of its accuracy and the control we have over the models. Software frameworks such as TensorFlow, Keras, Caffe, and many others have simplified the work of programming for deep learning. Now we do not have to worry about backpropagation steps, weight updates, … Read more

The Scourge of Analytical Variability in AI Systems

In the ICT industry, engineers are increasingly moving towards building AI systems to add value to customers by solving existing problems and making processes more efficient. With the seemingly successful application of deep learning, experts are opining, with conviction, that the AI winter has finally come to an end. But, there are at least three … Read more

Strategy for improved characterisation of human metabolic phenotyping using COmbined Multiblock Principal components Analysis with Statistical Spectroscopy (COMPASS)

We have recently published a strategy for improving human metabolic phenotyping using Combined Multiblock Principal components Analysis with Statistical Spectroscopy (COMPASS). The COMPASS approach was developed within the R environment. The open access manuscript can be found here. In this blog, we describe how to get started. Characterising and understanding how human phenotypes relate to population … Read more

10 excellent GitHub repositories for every Java developer

Source: GitHub. Software design patterns are reusable, general solutions that software engineers use to solve recurring problems in software design. They also give software engineers and architects a common vocabulary for discussing common issues. Design patterns can improve code quality and coding velocity by using battle-tested and proven development paradigms. The … Read more

Solving JigSaw using Neural Nets

Solving a 3×3 grid puzzle is extremely difficult. The following are the possible combinations of these puzzles. 2×2 puzzle = 4! = 24 combinations; 3×3 puzzle = 9! = 362880 combinations. To solve a 3×3 puzzle the network has to predict one correct combination out of 362880. This is one more reason why the 3×3 puzzle is … Read more
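The counts quoted above follow from the factorial of the number of pieces. The short check below is a sketch using Python's standard library; the helper name permutation_count is hypothetical and does not come from the original article.

```python
import math


def permutation_count(rows: int, cols: int) -> int:
    """Number of ways to arrange rows * cols distinct pieces on the grid."""
    return math.factorial(rows * cols)


print(permutation_count(2, 2))  # 4! = 24
print(permutation_count(3, 3))  # 9! = 362880
```

Growing the grid from 2×2 to 3×3 multiplies the number of arrangements by 15,120, which is why the larger puzzle is so much harder to predict directly.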

A Practical Introduction to Early Stopping in Machine Learning

Next, let’s create X and y. Keras and TensorFlow 2.0 only take in NumPy arrays as inputs, so we will have to convert the DataFrame back to a NumPy array.

    # Creating X and y
    X = df[['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']]
    # Convert DataFrame into np array
    X = np.asarray(X)
    y = df[['label_setosa', … Read more

Web Scraping: Scraping Table Data

In this post, we will learn how to scrape table data from the web using Python. Simplified. Photo by Carlos Muza on Unsplash Web Scraping is the most important concept of data collection. In Python, BeautifulSoup, Selenium and XPath are the most important tools that can be used to accomplish the task of web scraping. … Read more

Installing and Running Ubuntu on a 2015-ish MacBook Air

[This article was first published on Thinking inside the box, and kindly contributed to R-bloggers.] So a few months ago kiddo one dropped an … Read more

Let the snail crawl: Animated density curves

[This article was first published on Rcrastinate, and kindly contributed to R-bloggers.] Previously, I’ve plotted a ridgeline based on a variable’s density through time. It … Read more

covid19italy v0.3.0 is now on CRAN

[This article was first published on Rami Krispin, and kindly contributed to R-bloggers.] Version 0.3.0 of the covid19italy is now available on CRAN. The package … Read more

An Example With accumulate()

[This article was first published on R on Data & The World, and kindly contributed to R-bloggers.] As with most useful (collections of) libraries, the … Read more

Best Libraries for Geospatial Data Visualisation in Python

1. PyViz/HoloViz (GeoViews, Datashader, hvPlot). The HoloViz-maintained libraries cover all the data visualisations you might need, including dashboards and interactive visualisation. GeoViews in particular, as a dedicated geospatial data visualisation library, provides an easy-to-use and convenient way to visualise geospatial data. GeoViews is a Python library that makes it easy to explore and visualize geographical, meteorological, and oceanographic … Read more

GPT-3, OpenAI’s Revolution

OpenAI is one of the clear leaders in AI right now, and it certainly leads the way in natural language systems. GPT, its text generation algorithm, has shown an uncanny ability to generate human-like text, revolutionising the domain of text generation. Its latest version, GPT-3, is now able to go beyond that, being able to … Read more

Waffle Charts Using Python’s Matplotlib

How to draw a waffle chart in Python using the Matplotlib library. Source: Unsplash by Pez González. Waffle charts can be an interesting element in a dashboard. They are especially useful for displaying progress towards goals and for seeing how each item contributes to the whole. But waffle charts are not very useful if you … Read more

ttdo 0.0.6: Bugfix

[This article was first published on Thinking inside the box, and kindly contributed to R-bloggers.] A bugfix release of our (still small) ttdo … Read more

Amazon RDS Proxy is Generally Available

Applications communicate with databases by establishing connections, which consume memory and compute resources on the database server. Many applications, including those built on modern serverless architectures, can open a large number of database connections or frequently open and close connections. This can stress the database memory and compute, leading to slower performance and limited application … Read more

Pausing entitlements now available in AWS Elemental MediaConnect

AWS Elemental MediaConnect is a reliable, secure, and flexible transport service for live video that enables broadcasters and content owners to build live video workflows and securely share live content with partners and customers. MediaConnect helps customers who run 24×7 TV channels or stream live events transport high-value live video streams into, through, and out … Read more

Using Just One Line of Code to Write to a Relational Database

PYTHON AND SQL It’ll make adding to a database that much easier. Art by Instagram @softie__art When writing data from a Pandas DataFrame to a SQL database, we will be using the DataFrame.to_sql method. While you could execute an INSERT INTO type of SQL query, the native Pandas method makes the process even easier. Here’s … Read more

TensorFlow vs PyTorch — Convolutional Neural Networks (CNN)

Implementation of CNN in both TensorFlow and PyTorch to a very famous dataset and comparison of the results In my previous article, I had given the implementation of a Simple Linear Regression in both TensorFlow and PyTorch frameworks and compared their results. In this article, we shall go through the application of a Convolutional Neural … Read more