Suicide in the 21st Century (Part 2)

(Photo by Paola Chaaya on Unsplash) Welcome back! If you didn’t catch part 1, you can find it below. As mentioned, part 2 will incorporate machine learning, or, more specifically, machine learning with K-Means in Python. Before we get into that, here’s a quick recap of part 1 if you missed it. Recap: In part 1, … Read more Suicide in the 21st Century (Part 2)

Is It Time for a Data Scientist Code of Ethics?

As news broke of a new app called DeepNude, which allowed anyone to alter a photo of a woman to make her appear nude, I found myself deeply disturbed by the speed at which deepfakes are evolving. Such a tangible and accessible tool highlights the darker side of AI, computer vision, and other machine learning … Read more Is It Time for a Data Scientist Code of Ethics?

First Democratic Primary Debate: Preliminary Sentiment Analysis

Julián Castro and Tulsi Gabbard’s performances on the first night seem to have been received well. There is a definite uptick in engagement from users that is fairly positive. After the second debate this interest can be seen to die down as the conversation moved on but it will be interesting to see how their … Read more First Democratic Primary Debate: Preliminary Sentiment Analysis

Data visualization for EDA (exploratory data analysis)

(By: Alberto Lucas López, National Geographic, Source) Please take 1 minute to look at the picture above. The canvas painting, “Frames of Mind”, presents the themes Pablo Picasso touched on through his tens of thousands of works. Twelve major topics are dispersed across the whole picture, separated by using different shapes (the shapes highly related … Read more Data visualization for EDA (exploratory data analysis)

Positive or Negative? Spam or Not-spam? A simple Text classification problem using Python

What is sentiment analysis? Sentiment analysis is the process of computationally identifying and categorizing opinions expressed in a piece of text, especially in order to determine whether the writer’s attitude towards a particular topic, product, etc. is positive, negative, or neutral. In simpler words, let’s say that it is when you have a text of review … Read more Positive or Negative? Spam or Not-spam? A simple Text classification problem using Python

Python, Performance, and GPUs

A status update for using GPU accelerators from Python This blogpost was delivered in talk form at the recent PASC 2019 conference. Slides for that talk are here. Executive Summary We’re improving the state of scalable GPU computing in Python. This post lays out the current status, and describes future work. It also summarizes and links … Read more Python, Performance, and GPUs

Lagged Variable Regressions and Truth

Dynamic regression models offer vast representative power but also bias risk Variables related to each other over adjacent time steps, originally in the context of dynamic Bayesian networks (Wikimedia user Guillaume.lozenguez, CC BY-SA 4.0) Turn a nonlinear structural time-series model into a regression on lagged variables using rational transfer functions and common filters, See bias in an … Read more Lagged Variable Regressions and Truth

The Political Twittersphere of the UK

An analysis of how the constituent parties and members of the UK government differ in their approach to social media Politics is once again dominating media coverage in the UK, with the Conservative leadership race to determine the next Prime Minister generating a great deal of attention. A growing trend in the political world of 2019, … Read more The Political Twittersphere of the UK

Understanding Learning Rate

Originally published at OpenGenus IQ. When building a deep learning project, the most common problem we all face is choosing the correct hyper-parameters (such as the learning rate). This is critical, as the hyper-parameters determine the performance of the machine learning model. In Machine Learning (ML hereafter), a hyper-parameter is a configuration variable that’s external to … Read more Understanding Learning Rate
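
To make the role of the learning rate concrete, here is a minimal sketch (not taken from the original post) of fixed-step gradient descent on a hypothetical one-dimensional loss f(w) = (w - 3)^2. The function name `gradient_descent`, the loss, and the three rates are illustrative assumptions only.

```python
def gradient_descent(learning_rate, steps=25, start=0.0):
    """Minimise the toy loss f(w) = (w - 3)**2 with fixed-step gradient descent."""
    w = start
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)      # derivative of (w - 3)**2 with respect to w
        w -= learning_rate * grad   # the learning rate scales every update
    return w

# Hypothetical rates: one too small, one reasonable, one too large.
for lr in (0.01, 0.1, 1.1):
    print(f"learning_rate={lr}: w after 25 steps = {gradient_descent(lr):.4f}")
```

On this toy loss, the smallest rate is still far from the minimum at w = 3 after 25 steps, the middle rate gets close to it, and the largest rate overshoots further on every step and diverges.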

Quick hit: ‘dig’-ging Into r-project.org DNS Records with {processx}

The r-project.org domain had some temporary technical difficulties this week (2019-29) that made reaching R-related resources problematic for a bunch of folks for a period of time. Incidents like this underscore the need for regional and network diversity when it comes to ensuring the availability of DNS services. That is, it does no good if … Read more Quick hit: ‘dig’-ging Into r-project.org DNS Records with {processx}

A Unified Model for Text Mining

Automatic information retrieval across all domains made possible through AI Natural Language Processing is an emerging field in the sphere of Artificial Intelligence and research has demonstrated that computers can complete tasks thought to be solely possible by humans. Major evolution in subtasks of linguistic processors and machine learning algorithms provide all the pieces needed for … Read more A Unified Model for Text Mining

Building a Search Engine with BERT and TensorFlow

In this experiment, we will use a pre-trained BERT model checkpoint to build a general-purpose text feature extractor. T-SNE decomposition of BERT text representations (Reuters-21578 benchmark, 6 classes) These things are sometimes referred to as Natural Language Understanding (NLU) modules, because the features they extract are relevant for a wide array of downstream NLP tasks. One … Read more Building a Search Engine with BERT and TensorFlow

Revisited: Forecasting Last Christmas Search Volume

It is June and nearly half of the year is over, marking the middle between Christmas 2018 and 2019. Last autumn, I published a blog post about predicting Wham’s “Last Christmas” search volume using Google Trends data with different types of neural network architectures. Of course, now I want to know how good … Read more Revisited: Forecasting Last Christmas Search Volume

How Do Psychometric Test Results Vary Across Age, Race and Gender?

How is religion related to stress scores? One of the tests available through the Open Source Psychometrics Project is called the DASS inventory, which stands for Depression Anxiety Stress Scales. After filtering out all invalid responses (those who responded incorrectly to validity questions), I plotted the results for the remaining 34,583 test results and conducted a … Read more How Do Psychometric Test Results Vary Across Age, Race and Gender?

Building an ML application using MLlib in Pyspark

Introduction Apache Spark is one of the in-demand big data tools being used by many companies around the world. Its in-memory computation and parallel processing are the main reasons for the popularity of this tool. Spark Stack MLlib is a scalable machine learning library which is present alongside other services like … Read more Building an ML application using MLlib in Pyspark

A Beginner’s Guide to Rasa NLU for Intent Classification and Named-entity Recognition

Image taken from https://givemechallenge.com/wp-content/uploads/2017/01/rasa-NLU.png The purpose of this article is to explore the new way to use Rasa NLU for intent classification and named-entity recognition. Since version 1.0.0, both Rasa NLU and Rasa Core have been merged into a single framework. As a result, there are some minor changes to the training process and the … Read more A Beginner’s Guide to Rasa NLU for Intent Classification and Named-entity Recognition

Calculating And Visualising Correlation Coefficients With Inspectdf

Calculating and visualising correlation coefficients with inspectdf (and why correlation matrices make life hard) In a previous post, we explored categorical data using the inspectdf package. In this post, we tackle a different exploratory problem of calculating and visualising correlation coefficients. To install inspectdf from CRAN, you’ll first need to run: install.packages("inspectdf") We’ll begin the tutorial by loading the … Read more Calculating And Visualising Correlation Coefficients With Inspectdf

Vignette: Write & Read Multiple Excel files with purrr

Introduction This post will show you how to write and read a list of data tables to and from Excel with purrr, the functional programming package 📦 from tidyverse. In this example I will also use the packages readxl and writexl for reading and writing in Excel files, and cover methods for both XLSX and … Read more Vignette: Write & Read Multiple Excel files with purrr

Extending PyTorch with Custom Activation Functions

A Tutorial for PyTorch and Deep Learning Beginners Introduction Today deep learning is going viral and is applied to a variety of machine learning problems such as image recognition, speech recognition, machine translation, and others. There is a wide range of highly customizable neural network architectures, which can suit almost any problem when given enough … Read more Extending PyTorch with Custom Activation Functions

Reverse Image Search using Auto-Encoders

Have you ever wondered how Google Reverse Image search works? How do they scan all the images and return appropriate results very fast? In this blog, we will make our own lightweight reverse search engine. We will be using AutoEncoders for this purpose. What is an AutoEncoder? AutoEncoders are a special type of feed-forward neural … Read more Reverse Image Search using Auto-Encoders

AI-powered monopolies and the new world order

How AI’s reliance on data will empower tech giants and reshape sociopolitical order Photo by Pankaj Patel on Unsplash Artificial intelligence and new technologies will undoubtedly bring about tremendous changes, both positive and negative. They will have far-reaching impacts upon our daily lives, our work, security, and values. AI is arguably the most dangerous challenge humanity … Read more AI-powered monopolies and the new world order

Using Image Segmentation to Photoshop Images

In this new episode of doing fun things with Colab and Python, we will use Deep Learning to crop out objects from one image and paste them on to another. The deep learning part is Image Segmentation aka identifying objects in images which we can subsequently mask and ultimately crop out. We use the awesome … Read more Using Image Segmentation to Photoshop Images

News and Media Bias Detection using Machine Learning — A Potential Way to Find ‘Fake News’

Media. It’s been a dividing issue in America, with media outlets and newspapers reporting differently on key topics due to implicit biases within journalism. I’ve heard about implicit biases, but what are they? Essentially, articles written by large newspaper organizations often reflect the authors’ inherent point of view, especially because news organizations tend to … Read more News and Media Bias Detection using Machine Learning — A Potential Way to Find ‘Fake News’

Log Transformation Base For Data Linearization Does Not Matter

What does it look like? We can also visualize this with some Python code!

    import numpy as np
    import matplotlib.pyplot as plt

    # Set up variables, x is 1-10 and y is e^x
    x = list(np.linspace(1,10,100))
    y = [np.exp(i) for i in x]

    # Plot the original variables – this is barebones plotting code, you
    # can find the more …

Read more Log Transformation Base For Data Linearization Does Not Matter
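
The claim in the title follows from the change-of-base identity log_b(y) = ln(y) / ln(b): switching base only rescales the transformed values by a constant, so the fit stays equally linear. The sketch below is an illustration under that assumption and is not the original post's code; the helper `fit_line` is a hypothetical name, and it uses only the standard library rather than NumPy.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares slope, intercept and R^2 for y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared

# Same setup as the excerpt: 100 points with x from 1 to 10 and y = e^x.
xs = [1 + 9 * i / 99 for i in range(100)]
ys = [math.exp(x) for x in xs]

results = {}
for base in (2, math.e, 10):
    log_ys = [math.log(y, base) for y in ys]   # log-transform y in the chosen base
    results[base] = fit_line(xs, log_ys)
    slope, _, r2 = results[base]
    print(f"base={base:.3f}: slope={slope:.4f}, "
          f"1/ln(base)={1 / math.log(base):.4f}, R^2={r2:.6f}")
```

For each base the fitted slope should match 1/ln(base) and R^2 should be essentially 1, which is the sense in which the choice of base does not affect linearization.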

A quick run-through of Holt-Winters, SARIMA and FB Prophet

The link above will take you to the notebook where the following code is sourced. The purpose of this post is to show a quick run-through of Holt-Winters, SARIMA and FB Prophet. I am skipping anything about parameter tuning, as that could be multiple posts on its own. First, let’s get our imports for … Read more A quick run-through of Holt-Winters, SARIMA and FB Prophet

shinyApp(), runApp(), shinyAppDir(), and a fourth option

This title might sound a little bit weird, so let’s begin with a little bit of context. It all started with this issue on the {golem} package, which reflects a discussion we previously had inside the team. Also, two weeks ago, I received a tweet on the very same subject, which can be summarised as … Read more shinyApp(), runApp(), shinyAppDir(), and a fourth option

Predictive Analytics in Government Decisions

Government Decisions In many instances, our current laws and regulations require the government to make decisions that directly affect citizens, albeit in varying degrees. Some of these decisions pertain to the eligibility for benefits, others are whether individuals will receive punishments, or whether individuals’ civil liberties will be limited. In representative democracies these decisions are … Read more Predictive Analytics in Government Decisions

Multilevel Modelling of U.S. Home Loan Data

The housing market has undergone quite a change in the past decade, with more stringent lending criteria for housing having been enforced. A key objective of financial institutions is to minimise the risk of mortgage lending by ensuring that the debtor is ultimately able to repay the loan. In this example, multilevel modelling techniques are … Read more Multilevel Modelling of U.S. Home Loan Data

AI, Machine Learning and Data Science Roundup: June 2019

A monthly roundup of news about Artificial Intelligence, Machine Learning and Data Science. This is an eclectic collection of interesting blog posts, software announcements and data applications from Microsoft and elsewhere that I’ve noted over the past month or so. Open Source AI, ML & Data Science News TensorFlow 2.0 beta is now available, featuring … Read more AI, Machine Learning and Data Science Roundup: June 2019

Deep Learning and Medical Image Analysis for Malaria Detection with fastai

Learn to classify blood smear images using a high-level deep learning environment Jimmy Chan/Pexels free images Since the Lister Hill National Center for Biomedical Communications (LHNCBC), part of the National Library of Medicine (NLM), made available an annotated dataset of healthy and infected blood smear malaria images, various postings and papers have been published showing how … Read more Deep Learning and Medical Image Analysis for Malaria Detection with fastai

Visualisation of Information from Raw Twitter Data — Part 2

Want to find out user activity, see if certain users are bots, make a time series of the tweet publications and much more? Read on then! The previous post covered how to download data from Twitter regarding a certain topic, getting this data ready in a Jupyter Notebook, discovering insights from this data, and exploring some … Read more Visualisation of Information from Raw Twitter Data — Part 2

Feature Importance with Neural Network

One of the biggest challenges in Machine Learning is letting models speak for themselves. Not only is it important to develop a strong solution with great predictive power, but in a lot of business applications it is also interesting to know how the model produces its results: which variables are engaged the most, the presence of correlations, … Read more Feature Importance with Neural Network

Too important to leave to the data scientists by @ellis2013nz

I usually write blog posts that include big chunks of R code and deal with analysis of specific datasets that I hope are of interest to people with specialist statistical and data science skills (or hoping to develop those skills). But I happen to think that broader data literacy is even more important than my … Read more Too important to leave to the data scientists by @ellis2013nz

Make Data Acquisition Easy with AWS & Lambda (Python) in 12 Steps

Goodbye to complex ETL pipelines, SQL databases and other complications This article will serve as a brief introduction to AWS Lambda and building a fully serverless data pipeline. This article is written for people with at least basic (and I mean basic) understanding of Python, but you can be brand new to AWS. We will … Read more Make Data Acquisition Easy with AWS & Lambda (Python) in 12 Steps

Understanding Cancer using Machine Learning

Use of Machine Learning (ML) in Medicine is becoming more and more important. One application example can be Cancer Detection and Analysis. (Source: https://news.developer.nvidia.com/wp-content/uploads/2016/06/DL-Breast-Cancer-Detection-Image.png) Introduction As demonstrated by many researchers [1, 2], the use of Machine Learning (ML) in Medicine is nowadays becoming more and more important. Researchers are now using ML in applications such … Read more Understanding Cancer using Machine Learning

Machine Learning Cheat Sheet — Data Processing

Data Pre-processing Skewed Data Outliers affect the distribution. If a value is significantly below the expected range, it will drag the distribution to the left, making the graph left-skewed or negative. Alternatively, if a value is significantly above the expected range, it will drag the distribution to the right, making the graph right-skewed or positive. … Read more Machine Learning Cheat Sheet — Data Processing
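
The effect described above can be checked numerically. The following is a minimal sketch with made-up data, not code from the original cheat sheet; the helper `skewness` is a hypothetical name, and it computes the population skewness (third standardised moment) using only the standard library.

```python
def skewness(values):
    """Population skewness: the third standardised moment of the data."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    third_moment = sum((v - mean) ** 3 for v in values) / n
    return third_moment / variance ** 1.5

symmetric = [48, 49, 50, 51, 52]       # hypothetical, evenly spread values
with_low_outlier = symmetric + [5]     # one value far below the expected range
with_high_outlier = symmetric + [95]   # one value far above the expected range

print("symmetric:        ", skewness(symmetric))
print("with low outlier: ", skewness(with_low_outlier))
print("with high outlier:", skewness(with_high_outlier))
```

The symmetric sample has zero skewness, the low outlier makes the skewness negative (left-skewed), and the high outlier makes it positive (right-skewed).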

How Artificial Intelligence revolutionizes Quality Assurance

Or how to continuously release a high-quality product that delights your customers. Photo by Markus Spiske. In today’s fast-evolving markets, a tech company that can deliver quality products to market faster than its competitors has a significant competitive advantage. At the same time, the complexity — and therefore proneness to error — of technical products has massively increased. Thus, Quality … Read more How Artificial Intelligence revolutionizes Quality Assurance

Of Sixes and Fours – Analyzing the IPL using the tidyverse

We are back with another post on the Indian Premier League. This is the fourth post in the series. We will assume that you have already read the previous article analyzing strike rates here. One change since the last article is that Cricsheet now has updated data available – so we have the details of … Read more Of Sixes and Fours – Analyzing the IPL using the tidyverse

Blackman-Tukey Spectral Estimator in R

There are two definitions of the power spectral density (PSD). Both definitions are mathematically nearly identical and define a function that describes the distribution of power over the frequency components in our data set. The periodogram PSD estimator is based on the first definition of the PSD (see periodogram post). The Blackman-Tukey spectral estimator (BTSE) … Read more Blackman-Tukey Spectral Estimator in R

Temperatures over the last 100+ Years

This visualization explores how temperatures have varied in the United States over the last 100+ years. Though the United States is undoubtedly getting warmer, there remains significant variation state-by-state and year-by-year. In the map visualization above, each datum is shown as the difference between the mean temperature in a state in a given year and … Read more Temperatures over the last 100+ Years

Can Machine Learning be used to Forecast Poverty

A Tutorial on Time-Series Forecasting using Machine Learning in Python Photo by Roman Nguyen on Unsplash In 2014 the UN called for a data revolution to put the best available tools and methods to work in service of achieving the Sustainable Development Goals (SDGs). Here, we attempt to use machine learning to forecast one of the indicators … Read more Can Machine Learning be used to Forecast Poverty

Signature Fraud Detection- An Advanced Analytics Approach

Automatic Signature Verification In my previous article, I discussed advanced analytics applications in the area of fraud in a generic fashion. In this article, I will delve into the details of a specific area of fraud: signature forgery. No wonder that institutions and businesses recognize signatures as the primary way of authenticating transactions. People sign checks, authorize … Read more Signature Fraud Detection- An Advanced Analytics Approach

Leveraging Power of Reinforcement learning in Digital Marketing

AI in Digital Marketing In my previous article, I discussed an advanced analytics solution to increase campaign ROI or Return on Marketing Investment (ROMI) through propensity modeling techniques. While supervised learning is the most widely used method (at least so far) in the predictive analytics industry, it has a few limitations. Firstly, as supervised learning uses static lists … Read more Leveraging Power of Reinforcement learning in Digital Marketing

MRNet: Deep-learning-assisted diagnosis for knee MRI scans

And a Kaggle-like competition hosted by Stanford ML Group Last week I visited Estepona, a town in southern Spain, for a week-long coding retreat. I worked on reproducing the MRNet paper using PyTorch from scratch, as part of participating in the MRNet Competition. I have open-sourced the code so you can use it as a starting … Read more MRNet: Deep-learning-assisted diagnosis for knee MRI scans