Premade AI in the Cloud with Python

Now that we’ve been through the boring stuff, we are ready to use the resources and see what they are capable of. Creating the resources has given us access to their respective REST APIs, and there is a lot we can do with what we just made. In short, to access these services we’ll be …

How to Use Different Data Models and Visual Representation of Databases

A beginner course on databases and SQL. As you get into databases and data science, the first thing you have to master is the relationships between entities in your database. This matters because the data you use has to be organized efficiently for whatever is later built on top of it. Let’s …
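To make "relationships between entities" concrete, here is a minimal sketch of a one-to-many relationship (one customer, many orders) using Python's built-in sqlite3 module. The customers/orders tables and their columns are hypothetical examples, not taken from the course. The sketch shows three things: a foreign key ties each order to exactly one customer, a join follows that link, and the database rejects rows that would break it.

```python
import sqlite3

# In-memory database; the customers/orders schema is a hypothetical example.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
CREATE TABLE customers (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),  -- many orders -> one customer
    total       REAL NOT NULL
);
""")

conn.executemany("INSERT INTO customers (id, name) VALUES (?, ?)",
                 [(1, "Ada"), (2, "Linus")])
conn.executemany("INSERT INTO orders (customer_id, total) VALUES (?, ?)",
                 [(1, 9.5), (1, 20.0), (2, 5.25)])

# Follow the relationship: number of orders and amount spent per customer.
rows = conn.execute("""
    SELECT c.name, COUNT(o.id), SUM(o.total)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id
    ORDER BY c.id
""").fetchall()
print(rows)  # [('Ada', 2, 29.5), ('Linus', 1, 5.25)]

# An order pointing at a customer that does not exist violates the relationship.
try:
    conn.execute("INSERT INTO orders (customer_id, total) VALUES (?, ?)", (99, 1.0))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print("orphan order rejected:", rejected)  # orphan order rejected: True
```

Without the PRAGMA line, SQLite would accept the orphan order, which is why it is set before any tables are created.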

Top 5 Natural Language Processing Python Libraries for Data Scientists

A complete, non-verbose overview of popular Python libraries for natural language processing. More than 70 percent of the data available on the internet is not in a structured format. Since data is an essential part of data science, researchers have worked hard to push our limits beyond structured …

Generating Titles for Kaggle Kernels with LSTM

A small deep learning project with PyTorch. When I first found out about sequence models, I was amazed by how easily we can apply them to a wide range of problems: text classification, text generation, music generation, machine translation, and others. In this article, I would like to focus on the step-by-step process of creating a …

How to Build Software like Tony Stark

That’s gonna be you by the end of this journey. This article will walk you through a process in which you can either get started with projects and break the cycle of endless learning, or change your ways and improve your workflow and efficiency while coding, so that you can stop feeling that this …

Copilot — driving assistance

Lane and obstacle detection for active assistance during driving. Fig 1: Collision and lane change autonomous warning. Neel turned back to point out the tablet to his daughter. His family of three were driving down to the coast for the long weekend. Their GPS had rerouted them to the state highway because of a congestion …

Basics of AI Product Management: Orchestrating the ML Workflow

The theory of ML is hard; the application is even harder! I’ve spent the last few years applying data science in different aspects of business. Some use cases are internal machine learning (ML) tools, analytics reports, data pipelines, prediction APIs, and more recently, end-to-end ML products. I’ve had my fair share of successful and unsuccessful …

Using Spark from R for performance with arbitrary code – Part 1 – Spark SQL translation, custom functions, and Arrow

Apache Spark is a popular open-source analytics engine for big data processing, and thanks to the sparklyr and SparkR packages, the power of Spark is also available to R users. This series of articles will attempt to provide practical insights into using the sparklyr interface to gain the benefits of Apache Spark while still retaining …

Easy Bar Charts from Simple to Sophisticated

Tell your story with data visualizations. Imagine the simplest code possible to generate data visualizations. Let’s start with a visualization we have all seen, and all need: the bar chart. (Figure: Beach bar chart, horizontal orientation.) Each bar in a bar chart represents a category, or level, of a variable with relatively few unique values, such …

What exactly is Kubernetes?

I had a very good experience deploying a web application on Azure Kubernetes, and I want to tell you all the secrets of this fun world. Since I discovered Docker in late 2015, I have been impressed by the fact that deployment could be handled like source code. Yes, I think this was the …

Importance Sampling Introduction

Estimate expectations from a different distribution. Importance sampling is an approximation method rather than a sampling method. It derives from a small mathematical transformation that formulates the problem in another way. In this post, we are going to: learn the idea of importance sampling; get a deeper understanding by implementing the process; compare results …
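The transformation behind importance sampling is the identity E_p[f(X)] = E_q[f(X) p(X) / q(X)]. It holds whenever q(x) > 0 wherever f(x) p(x) is non-zero. In practice, you draw from a proposal q and reweight each draw by p(x) / q(x). The minimal sketch below rests on assumptions chosen for illustration: a standard normal target p, the rare-event function f(x) = 1 if x > 3 else 0, and a proposal q = N(3, 1). The post's own example may differ.

```python
import math
import random

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of the normal distribution N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def importance_estimate(f, p_pdf, q_sample, q_pdf, n, rng):
    """Estimate E_p[f(X)] from n draws x ~ q, each weighted by p(x) / q(x)."""
    total = 0.0
    for _ in range(n):
        x = q_sample(rng)
        total += f(x) * p_pdf(x) / q_pdf(x)
    return total / n

def in_tail(x):
    return 1.0 if x > 3.0 else 0.0

def sample_q(rng):
    return rng.gauss(3.0, 1.0)  # proposal q = N(3, 1), centred on the rare region

def q_pdf(x):
    return normal_pdf(x, mu=3.0)

rng = random.Random(0)
n = 100_000

# Plain Monte Carlo from p = N(0, 1): only about 0.13% of draws land in the tail.
naive = sum(in_tail(rng.gauss(0.0, 1.0)) for _ in range(n)) / n

# Importance sampling: about half of the draws from q land in the tail.
weighted = importance_estimate(in_tail, normal_pdf, sample_q, q_pdf, n, rng)

exact = 0.5 * math.erfc(3.0 / math.sqrt(2.0))  # P(X > 3) for X ~ N(0, 1)
print(f"exact={exact:.6f}  naive={naive:.6f}  importance={weighted:.6f}")
```

Because q puts much of its mass where f is non-zero, the weighted estimate typically lands much closer to the exact value than the naive one for the same number of draws.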

How To Deploy A Neural Network From Beirut

Beirut is Lebanon’s gorgeous capital and comes with the typical problems of a bustling city. On top of that, it suffers from frequent power cuts and one of the slowest internet connections in the world. It is also where I spent my summer vacation and an ideal testing ground for the purpose of this article: …

Implementing Prophet Time Series Forecasting Model

A step-by-step approach to predicting the Bitcoin price, for dummies. Understanding time series data is critical to any kind of business. If you are working with numbers and analytics, more often than not you will need to answer questions like how many customers will continue buying in …

Dealing with Multiclass Data

Forest cover type prediction. Have you ever thought about what to do when you encounter a classification problem with more than three classes? How did you deal with multiclass data, and how did you evaluate your model? Was overfitting a challenge, and if so, how did you …
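The post's own evaluation approach is not shown in this excerpt. One common choice for multiclass problems is macro-averaged F1. It scores each class one-vs-rest and averages the results, so rare classes are not drowned out the way they can be with plain accuracy. Below is a minimal pure-Python sketch with made-up labels standing in for three cover types.

```python
from collections import Counter

def per_class_f1(y_true, y_pred):
    """One-vs-rest F1 score for every label that appears in y_true or y_pred."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # counted against the predicted class
            fn[t] += 1  # counted against the true class
    scores = {}
    for c in sorted(set(y_true) | set(y_pred)):
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        total = precision + recall
        scores[c] = 2 * precision * recall / total if total else 0.0
    return scores

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so small classes count as much as large ones."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)

# Toy labels standing in for three cover types (hypothetical data).
y_true = [1, 1, 2, 2, 3, 3, 3]
y_pred = [1, 2, 2, 2, 3, 3, 1]
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(per_class_f1(y_true, y_pred))  # approximately {1: 0.5, 2: 0.8, 3: 0.8}
print(round(macro_f1(y_true, y_pred), 3), round(accuracy, 3))  # 0.7 0.714
```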

Great Developers Never Stop Learning

Proofs of concept (POCs). As an architect, I need to justify technical project decisions, so I resort to developing POCs. They help me experience the challenges or benefits of the technology in question in order to provide forward-looking research, and they help me get better at estimating (and not trivialise how long ‘easy’ tasks …

Why the current AI gold rush must not fail

How our investment in the field has made it too important to fail. Everyone is talking about the impending dangers of artificial intelligence. From machines taking over our jobs to Stephen Hawking’s fear of the existential threat they pose to mankind, there are a lot of people talking about what will happen if the current race …

Enhancing Static Plots with Animations

Using gganimate to spice up ggplot2 visualisations. This post aims to introduce you to animating ggplot2 visualisations in R using the gganimate package by Thomas Lin Pedersen. The post will visualise the theoretical winnings I would’ve had, had I followed the simple model to predict (or “tip”, as it’s known in Australia) winners in the …

Lesser known dplyr functions

The dplyr package is an essential tool for manipulating data in R. The “Introduction to dplyr” vignette gives a good overview of the common dplyr functions (list taken from the vignette itself):
filter() to select cases based on their values.
arrange() to reorder the cases.
select() and rename() to select variables based on their names.
mutate() and transmute() to add new variables that …

Amazon SageMaker Notebooks now export Jupyter logs to Amazon CloudWatch

With this launch, you no longer need to log into your notebook terminal to access logs and can instead view and analyze the logs directly from CloudWatch. You can use the built-in functionality of CloudWatch to detect anomalies and also set alarms to be automatically notified based on specific conditions. Also, you have the benefit …

Seeking postdoc (or contractor) for next generation Stan language research and development

The Stan group at Columbia is looking to hire a postdoc* to work on the next generation compiler for the Stan open-source probabilistic programming language. Ideally, a candidate will bring language development experience and also have research interests in a related field such as programming languages, applied statistics, numerical analysis, or statistical computation. The language …

Container monitoring for Amazon ECS, EKS, and Kubernetes is now available in Amazon CloudWatch

CloudWatch Container Insights helps you troubleshoot infrastructure and performance issues in your container environment to increase development velocity. It’s easy to get started: start collecting detailed performance metrics, logs, and metadata from your containers and clusters in just a few clicks by following these steps in the CloudWatch Container Insights documentation.

Cracking an 82-year-old stock trading board game using Monte Carlo simulation

Board games are fun. Stock trading is fun. Putting them together, we get the 1937 classic from Copp-Clark Publishing, Stock Ticker. The core gameplay is simple: buy and sell stocks from the broker in a fluctuating market and try to finish with more total assets than everyone around the table. If you’ve never heard of …

Jacobian regularization

A generalization of L1 and L2 regularization. L1 and L2 regularization, also known as Lasso and Ridge, are well-known regularization techniques used with a variety of algorithms. The idea behind these methods is to impose smoothness on the prediction function and avoid overfitting. Consider this example of polynomial regression: in this example we fit polynomials …
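Here is a sketch of the L2 (Ridge) case in the polynomial-regression setting. It assumes a degree-9 polynomial, an unpenalised intercept, and synthetic noisy samples of sin(2πx); none of these choices come from the article. Ridge adds λ Σ w_k² to the squared error, which keeps the normal equations solvable in closed form. Lasso's λ Σ |w_k| has no closed form, so its value is only reported here, not fitted.

```python
import math
import random

def solve(a, b):
    """Solve a @ w = b with Gauss-Jordan elimination and partial pivoting (small dense systems)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for k in range(col, n + 1):
                    m[r][k] -= factor * m[col][k]
    return [m[i][n] / m[i][i] for i in range(n)]

def ridge_poly_fit(xs, ys, degree, lam):
    """Minimise sum_i (y_i - sum_k w_k x_i^k)^2 + lam * sum_{k>=1} w_k^2 (intercept w_0 unpenalised)."""
    phi = [[x ** k for k in range(degree + 1)] for x in xs]
    d = degree + 1
    # Normal equations: (Phi^T Phi + lam * D) w = Phi^T y, with D = diag(0, 1, ..., 1).
    a = [[sum(p[i] * p[j] for p in phi) for j in range(d)] for i in range(d)]
    for i in range(1, d):
        a[i][i] += lam
    b = [sum(p[i] * y for p, y in zip(phi, ys)) for i in range(d)]
    return solve(a, b)

def l2_penalty(w):
    return sum(c * c for c in w[1:])

def l1_penalty(w):
    return sum(abs(c) for c in w[1:])

# Synthetic data (assumption): 11 noisy samples of sin(2*pi*x) on [0, 1].
rng = random.Random(1)
xs = [i / 10 for i in range(11)]
ys = [math.sin(2 * math.pi * x) + rng.gauss(0.0, 0.1) for x in xs]

penalties = {}
for lam in (1e-3, 1e-1, 10.0):
    w = ridge_poly_fit(xs, ys, degree=9, lam=lam)
    penalties[lam] = l2_penalty(w)
    print(f"lambda={lam:g}  sum w_k^2={penalties[lam]:.4g}  sum |w_k|={l1_penalty(w):.4g}")
```

A larger λ trades a little training error for smaller coefficients. The printed sum of squared coefficients therefore shrinks as λ grows, which is the smoothing effect the paragraph describes.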

How to do Deep Learning for Java?

Some time ago I came across this life-cycle management tool (or cloud service) called Valohai, and I was quite impressed by its user interface and the simplicity of its design and layout. I had a good chat about the service at that time with one of the members of Valohai and was given a demo. Prior to that …

Why R?

[This article was first published on R – Win-Vector Blog, and kindly contributed to R-bloggers]. I was working with our copy editor on Appendix A …

How to work with BIG Geospatial Data?

I’ve been working as a freelance webGIS developer for over three years now, and before that I did a bachelor’s degree in GeoInformatics, so I have had to work with geospatial data a lot. It is not uncommon for geospatial data to get large, especially when you are dealing with raster data. A few gigabytes …

Installing Apache PySpark on Windows 10

Apache Spark installation instructions for a product recommender data science project. Over the last few months, I was working on a data science project that handles a huge dataset, and it became necessary to use the distributed environment provided by Apache PySpark. I struggled a lot while installing PySpark on Windows 10, so I decided to …

AWS DataSync is now available in the Middle East (Bahrain) Region

DataSync is an online data transfer service that provides a simple way to automate and accelerate copying data over the Internet or AWS Direct Connect between Network File System (NFS) or Server Message Block (SMB) file servers, Amazon Simple Storage Service (Amazon S3) buckets, and Amazon Elastic File System (Amazon EFS) file systems. You …

Bigram Analysis of Democratic Debates

This tutorial mainly focuses on ggplot and bigrams, but it also glosses over clustering for a heatmap. This project started a while back, when I tweeted the plots at the beginning of this month. Life happens, I suppose. Bought a new bike, had a birthday, yadda yadda. Better late than never? I want to preface this with …
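The tutorial itself works in R with ggplot. As a language-neutral illustration of what counting bigrams means, here is a minimal Python sketch. The snippet and stop-word list are made up. Dropping any pair that contains a stop word is one common convention, assumed here rather than taken from the post. Note that this simple tokenizer also pairs words across sentence boundaries.

```python
import re
from collections import Counter

def bigrams(text):
    """Lower-case word tokens paired with their right-hand neighbour."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return list(zip(tokens, tokens[1:]))

def top_bigrams(text, k=5, stopwords=frozenset()):
    """Most frequent bigrams, dropping any pair that contains a stop word."""
    kept = [(a, b) for a, b in bigrams(text) if a not in stopwords and b not in stopwords]
    return Counter(kept).most_common(k)

# Made-up snippet, not taken from the debate transcripts.
snippet = "Health care is a right. We need health care for all. Health care now!"
stop = {"is", "a", "we", "for", "all", "now"}
print(bigrams(snippet)[:3])  # [('health', 'care'), ('care', 'is'), ('is', 'a')]
print(top_bigrams(snippet, k=1, stopwords=stop))  # [(('health', 'care'), 3)]
```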

Amazon EKS Available in Bahrain Region

Amazon EKS is a highly available, scalable, and secure Kubernetes service. Amazon EKS runs the Kubernetes management infrastructure (control plane) for you and is certified Kubernetes conformant, so you can use existing tooling and plugins from the Kubernetes community and AWS partners.

Flower Species Classifier

Build an image classifier to recognize 102 different species of flowers. • Artificial Intelligence • Deep Learning • Convolutional Neural Networks • Python • PyTorch • NumPy • Matplotlib • Jupyter Notebooks. In this article, I give an overview of the project I developed that led me to be awarded a scholarship to the Deep Learning …

It is Time for CRAN to Ban Package Ads

[This article was first published on R – Win-Vector Blog, and kindly contributed to R-bloggers]. NPM (a popular JavaScript package repository) just banned package advertisements. …

NLP Text Preprocessing: A Practical Guide and Template

Text preprocessing is traditionally an important step for natural language processing (NLP) tasks. It transforms text into a more digestible form so that machine learning algorithms can perform better. To illustrate the importance of text preprocessing, let’s consider a sentiment analysis task for customer reviews. Suppose a customer gave the feedback that “their customer support service …
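Here is a minimal sketch of a typical preprocessing pipeline: lower-casing, URL and punctuation removal, whitespace tokenization, and stop-word filtering. The stop-word list and the example review are illustrative assumptions, not the guide's own template. In sentiment tasks it is worth keeping negations such as "not", which many stock stop-word lists remove.

```python
import re
import string

# Tiny illustrative stop-word list; "not" is deliberately kept because it flips sentiment.
STOPWORDS = frozenset({"a", "an", "and", "is", "of", "the", "to", "was"})

# Map every ASCII punctuation character to a space.
PUNCT_TO_SPACE = str.maketrans({ch: " " for ch in string.punctuation})

def preprocess(text, stopwords=STOPWORDS):
    """Lower-case, drop URLs and punctuation, tokenize on whitespace, remove stop words."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)
    text = text.translate(PUNCT_TO_SPACE)
    return [tok for tok in text.split() if tok not in stopwords]

review = "The product was NOT worth it!!! More at https://example.com"
print(preprocess(review))  # ['product', 'not', 'worth', 'it', 'more', 'at']
```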

Break up with Excel: Intro and Advanced R Data Science Courses at MSACL.org Salzburg Austria, September 21–23, 2019

[This article was first published on The Lab-R-torian, and kindly contributed to R-bloggers]. MSACL Conference. There are two RStats Data Science courses happening in Salzburg …

New release of Cloud Storage Connector for Hadoop: Improving performance, throughput and more (Software Engineer; Cloud Data Engineer)

Cloud Storage Connector is an open-source Apache 2.0 implementation of an HCFS interface for Cloud Storage. Architecturally, it is composed of four major components: … In the following sections, we highlight a few of the major features in this new release of Cloud Storage Connector. For a full list of settings and how to use …

How to quickly solve machine learning forecasting problems using Pandas and BigQuery (ML Solutions Engineer; Machine Learning Solutions Engineer)

We pass the name of the table that contains our data, the name of the value we are interested in, the window size (the input sequence length), the horizon (how far ahead in time we skip between our features and our labels), and the labels_size (the output sequence length). The labels size is equal …
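The article builds these windows with BigQuery; the same windowing logic can be sketched in plain Python. The function name make_windows is hypothetical. How the horizon is counted is also an assumption: here it is the number of steps skipped between the last feature and the first label.

```python
def make_windows(series, window_size, horizon, labels_size):
    """Slide over a 1-D series and return (features, labels) pairs.

    Assumption: `horizon` counts the time steps skipped between the last
    feature and the first label (horizon=0 means the labels follow immediately).
    """
    span = window_size + horizon + labels_size
    examples = []
    for start in range(len(series) - span + 1):
        features = series[start:start + window_size]
        label_start = start + window_size + horizon
        labels = series[label_start:label_start + labels_size]
        examples.append((features, labels))
    return examples

daily_values = list(range(10))  # stand-in for the value column of the table
pairs = make_windows(daily_values, window_size=3, horizon=1, labels_size=2)
print(len(pairs))   # 5
print(pairs[0])     # ([0, 1, 2], [4, 5])
print(pairs[-1])    # ([4, 5, 6], [8, 9])
```

A series shorter than window_size + horizon + labels_size yields no examples at all rather than partial windows.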

Kubernetes security audit: What GKE and Anthos users need to know (Product Manager, Container security)

Performing this security audit was a big effort on the part of the CNCF, which has a mandate to improve the security of its projects via its Best Practices Badge Program. To take Kubernetes through this first security audit, the Kubernetes Steering Committee formed a working group, developed an RFP, worked with vendors, reviewed and then …

Expanding your patent set with ML and BigQuery (Data Scientist, Global Patents; Head of Data Science, Global Patents at Google)

2. Organize the seed set
With the input set determined and the embedding representations retrieved, you have a few options for determining similarity to the seed set of patents. Let’s go through each of the options in more detail.
1. Calculating an overall embedding point (centroid, medoid, etc.) for the entire input set and performing similarity to …
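The first option compares candidates against a single summary point of the seed set, which can be sketched with a centroid and cosine similarity. The embeddings and patent IDs below are toy values, and cosine similarity is an assumed choice of measure. A medoid would instead pick the seed embedding closest to all the others.

```python
import math

def centroid(vectors):
    """Component-wise mean of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def rank_by_centroid(seed_embeddings, candidates):
    """Sort (id, embedding) candidates by cosine similarity to the seed-set centroid."""
    c = centroid(seed_embeddings)
    return sorted(candidates, key=lambda item: cosine(c, item[1]), reverse=True)

# Toy 3-d embeddings; real patent embeddings are much longer.
seeds = [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]]
candidates = [("A", [1.0, 0.1, 0.0]), ("B", [0.0, 0.0, 1.0]), ("C", [0.5, 0.5, 0.0])]
print([pid for pid, _ in rank_by_centroid(seeds, candidates)])  # ['A', 'C', 'B']
```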

From scratch to search: setup Elasticsearch under 4 minutes, load a CSV with Python and read…

{"_index" : "test-csv", "_type" : "_doc", "_id" : "1", "_version" : 1, "result" : "created", "_shards" : {"total" : 3, "successful" : 3, "failed" : 0}, "_seq_no" : 0, "_primary_term" : 1}
Document indexing… checked! We have indexed our first document to our test-csv index, and all shards responded correctly. We have indexed a very simple JSON document with only one field, but you can …
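The excerpt indexes a single document. For a whole CSV, one option is the Elasticsearch bulk API, whose request body is newline-delimited JSON: an action line followed by the document source for each row. The sketch below only builds that body from CSV text with the standard library and does not contact a cluster. The index name test-csv matches the excerpt; the sample CSV and the use of row numbers as _id values are assumptions. The resulting body can be POSTed to the _bulk endpoint with Content-Type: application/x-ndjson.

```python
import csv
import io
import json

def csv_to_bulk_body(csv_text, index):
    """Build an Elasticsearch _bulk request body: one action line plus one source line per CSV row."""
    lines = []
    for doc_id, row in enumerate(csv.DictReader(io.StringIO(csv_text)), start=1):
        lines.append(json.dumps({"index": {"_index": index, "_id": str(doc_id)}}))
        lines.append(json.dumps(row))  # every CSV value arrives as a string
    if not lines:
        return ""
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline

csv_text = "name,city\nAda,London\nLinus,Helsinki\n"
print(csv_to_bulk_body(csv_text, "test-csv"), end="")
# {"index": {"_index": "test-csv", "_id": "1"}}
# {"name": "Ada", "city": "London"}
# {"index": {"_index": "test-csv", "_id": "2"}}
# {"name": "Linus", "city": "Helsinki"}
```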

SVD: Where Model Tuning Goes Wrong

1. Dataset Prerequisites
from surprise import Dataset
# Load the built-in MovieLens 100k dataset
data = Dataset.load_builtin('ml-100k')
Surprise is a scikit package for building and analysing recommender systems, maintained by Nicolas Hug. According to its documentation page, one objective of the package is to “alleviate the pain of dataset handling”. One way it does so is through built-in datasets. MovieLens 100k is one …