An Examination of Fatal Force by Police in the US

Background: Fatal shootings by police are heavily covered in the news. For several years before the current interest in shootings, they had been rising slightly. More importantly, the change seems small compared to the increasingly disproportionate attention it receives. Helping propel this issue to the national … Read more

Formatting and visualizing time series data

Upfront, I want to say what I am not covering in this section: renaming columns, subsetting data, changing data types (e.g. string to int), and treating missing values. To keep this writing focused on time series formatting I will not cover them here, but if interested you could check out my previous article … Read more

Semantic hand segmentation using PyTorch

Semantic segmentation is the task of predicting the class of each pixel in an image. This problem is more difficult than object detection, where you have to predict a box around the object. It is slightly easier than instance segmentation, where you have to not only predict the class of each pixel but also differentiate … Read more

Python Alone Won’t Get You a Data Science Job

It is true that data science jobs have mushroomed. But at the same time, landing a decent position in this field remains notoriously challenging, especially for novices. This is because of the subtle difference between data science in theory and real-life data science, which is tied to the problems businesses deal with on a day-to-day basis. … Read more

Using Data Science Skills Now: Cleaning up text data.

Be a workplace Hero! Automate the annoying task of cleaning up the spelling and formatting of categorical text columns with fuzzy matching. About ten years ago, a family friend had an interesting problem. He was a medical examiner and was responsible for cleaning up spreadsheets of data for reporting. He … Read more

Learning Data Science Online Without an Instructor: Tips and Tricks

For learning data science, one option is to order and go through an on-demand Data Science course without an instructor. These courses are far easier on your budget than a conventional instructor-led coding boot camp, let alone an accredited computer science degree. Like other technical career preparation courses, you’ll get plenty of opportunities to practice. … Read more

Three Pitfalls for Data Scientists

How misuse of Big Data, Machine Learning and Python can make your work inefficient. Making mistakes is part of the learning process, and probably there is no way to avoid it. The important thing is to make sure we don’t make the same mistake twice. This is not possible if we don’t even know we … Read more

How to use Indexing for SQL Query Optimization

Let’s create an index on the ‘product’ table and include ‘category’ in the index.

Syntax:
    CREATE INDEX [index_name]
    ON product ([column_name]);

Query:
    CREATE INDEX product_category_index
    ON product (category);

When you execute this query, it will take much longer than a normal query. The database scans 12 million rows and builds a ‘category’ index from scratch. Let’s say this takes 4 … Read more
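As a rough illustration of the effect of such an index, here is a minimal sketch using Python's built-in sqlite3 module. The in-memory database, the `id` and `name` columns, the 10,000 generated rows, and the `query_plan` helper are all assumptions made for this example; the original article's database engine and table layout are not shown in the excerpt.

```python
import sqlite3

# Assumption: a small in-memory stand-in for the article's 'product' table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, category TEXT)"
)
rows = [(i, f"item-{i}", f"cat-{i % 50}") for i in range(10_000)]
conn.executemany("INSERT INTO product VALUES (?, ?, ?)", rows)


def query_plan(connection):
    """Return SQLite's plan description for a filter on 'category'."""
    plan = connection.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM product WHERE category = ?",
        ("cat-7",),
    ).fetchall()
    # The human-readable detail is the fourth column of each plan row.
    return " ".join(row[3] for row in plan)


plan_before = query_plan(conn)  # full table scan, no index available yet

# Same statement as in the text above.
conn.execute("CREATE INDEX product_category_index ON product (category)")

plan_after = query_plan(conn)  # the planner can now search via the index

print("before:", plan_before)
print("after: ", plan_after)
```

Comparing the two printed plans shows whether the planner switched from scanning the table to searching through `product_category_index`. The exact wording of the plan text varies between SQLite versions.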

Beyond accuracy: other classification metrics you should know in Machine Learning

Precision, Recall, and F1-score in Python. “How accurate is your model?” This is probably the most commonly asked question when one wants to know how a model performs or rather how accurate a classifier can actually predict an anticipated event. While using accuracy to measure a classifier performance … Read more
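To make the three metrics concrete, the following sketch computes them from raw confusion counts in plain Python. The label lists are invented for illustration and are not from the article, which may well use a library such as scikit-learn instead.

```python
# Assumption: made-up binary labels, 1 = positive class, 0 = negative class.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

accuracy = (tp + tn) / len(pairs)
precision = tp / (tp + fp)  # of everything predicted positive, how much was right
recall = tp / (tp + fn)     # of everything actually positive, how much was found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
```

On these labels the classifier reaches 0.70 accuracy, 0.75 precision, and 0.60 recall, which shows how a single accuracy figure can hide the fact that two of the five positive cases were missed.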

Why Every Kid Should Learn How To Code

In fact, I believe everyone should learn how to code. But instead of trying to convince you to learn it, I will focus on how children, yours or someone else’s, can benefit tremendously from learning a programming language at a young age. I will also argue why it is the obvious thing to do, … Read more

Amazon AppFlow now provides Amazon Honeycode connectivity to several cloud applications

Amazon AppFlow is a fully managed integration service that has pre-built connectivity with 15 source SaaS applications, such as Salesforce, Marketo, Zendesk, Slack, and Google Analytics, and AWS services, such as Amazon Simple Storage Service (Amazon S3) in just a few clicks. Amazon Honeycode is a fully managed service that allows customers to quickly build … Read more

Serverless comes to machine learning with container image support in AWS Lambda.

AWS Lambda was released back in 2014, becoming a game-changing technology. By adopting Lambda, many developers have found a new way to build micro-services that could be easily achieved. It comes with many additional advantages such as event-based programming, cloud-native deployment, and the development of the now well-known infrastructure-as-code paradigm. A paradigm-shifting technology like AWS … Read more

Stochastic gradient descent implementation for SoftSVM

We will use a 4-dimensional dataset with 1,372 data points for this classification. This dataset is the banknote authentication dataset from the UCI repository. All source code used in this tutorial is available in the repository below. After downloading the data_banknote_authentication.txt file, we’ll import it into MATLAB via Home > Import Data. We need to … Read more
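For readers who want a feel for the technique outside MATLAB, here is a minimal Python sketch of one common textbook variant of stochastic gradient descent for soft-SVM (hinge loss with L2 regularization, no bias term, averaged iterates). It is not the tutorial's code: the function name `sgd_soft_svm`, the parameter values, and the synthetic two-cluster data used in place of the banknote dataset are all assumptions.

```python
import random


def sgd_soft_svm(points, labels, lam=0.01, steps=2000, seed=0):
    """Averaged SGD for a homogeneous soft-SVM (hypothetical helper)."""
    rng = random.Random(seed)
    dim = len(points[0])
    theta = [0.0] * dim   # running sum of the hinge-loss subgradient steps
    w_sum = [0.0] * dim   # accumulates iterates so we can average them
    for t in range(1, steps + 1):
        w = [th / (lam * t) for th in theta]
        i = rng.randrange(len(points))  # pick one training example at random
        margin = labels[i] * sum(wj * xj for wj, xj in zip(w, points[i]))
        if margin < 1:  # example violates the margin, so it contributes
            theta = [th + labels[i] * xj for th, xj in zip(theta, points[i])]
        w_sum = [s + wj for s, wj in zip(w_sum, w)]
    return [s / steps for s in w_sum]


# Assumption: synthetic, well-separated 2-D clusters with labels +1 / -1.
data_rng = random.Random(1)
points, labels = [], []
for _ in range(100):
    points.append((data_rng.gauss(2.0, 1.0), data_rng.gauss(2.0, 1.0)))
    labels.append(1)
    points.append((data_rng.gauss(-2.0, 1.0), data_rng.gauss(-2.0, 1.0)))
    labels.append(-1)

w = sgd_soft_svm(points, labels)
correct = sum(
    1 for x, y in zip(points, labels)
    if y * sum(wj * xj for wj, xj in zip(w, x)) > 0
)
accuracy = correct / len(points)
print("weights:", w, "training accuracy:", accuracy)
```

Because this variant has no bias term, the synthetic clusters are placed symmetrically around the origin. Real data such as the banknote set would typically need a constant feature appended or a separate bias.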

Visualizing the Pandemic in the United States

Using the Tidyverse to clean and prep data, build graphs, choropleths, and analyze our current reality. Every time I read the news these days there seems to be another story about how cases of COVID-19 are surging throughout the United States. There is a web of opinions regarding the pandemic. Opinion topics include the severity … Read more

Introducing Amazon QuickSight Q: ask questions about your data and get answers in seconds

Previously, when business users couldn’t find an answer to their question from their data dashboards, they had to submit ad-hoc requests to their BI teams, which could take several weeks to complete. With Amazon QuickSight Q, business users can now get answers to their questions instantly and reduce the burden on their thinly-staffed BI teams. … Read more

Better service orchestration with Workflows

Going from a single monolithic application to a set of small, independent microservices has clear benefits. Microservices enable reusability and make it easier to change and scale apps on demand. At the same time, they introduce new challenges. No longer is there a single monolith with all the business logic neatly contained and services communicating with … Read more

AI Trends and Applications for Finance & Technology

AI is revolutionizing how financial institutions and technology companies are using their data to automate repetitive tasks and gain valuable insights. Some popular examples of AI applications in Fintech are fraud detection, risk assessment or virtual financial assistants. With regard to the Data Science Salon for Finance & Technology from December 8–10, we had the … Read more

Learn and Teach R

If you haven't explored the RStudio website in a while, your next … Read more

RStudio v1.4 Preview: The Little Things

This post is part of a series on new features in RStudio … Read more

Forecasting Tax Revenue with Error Correction Models

There are several ways to forecast tax revenue. The IMF Financial Programming Manual reviews 3 of them: (i) the effective tax rate approach; (ii) the elasticity approach; and (iii) the regression approach. Approach (iii) typically results in the most accurate short-term forecasts. The simple regression approach regresses tax revenue on its own lags and GDP … Read more

around the table

The Riddler has a variant on the classical (discrete) random … Read more

Handling Environmental Big Data: Introduction to NetCDF and Cartopy

Loading, Analyzing, and Visualizing Environmental Big Data. Environmental data are getting increasingly abundant with the proliferation of remote sensing apparatus, IoT sensors, and weather stations. As a result, an appropriate data format is required to efficiently encode the spatio-temporal properties of climate information so as to promote transferability and generalizability. … Read more

Applying the MLOps Lifecycle

Understand MLOps needs and how they arise through the MLOps Lifecycle. Apply this to better scope and tackle MLOps projects. MLOps can be difficult for teams to get a grasp of. It is a new field and most teams tasked with MLOps projects are currently coming at it from a different background. It is tempting … Read more

Advent of 2020, Day 1 – What is Azure Databricks

Azure Databricks is a data analytics platform (PaaS), specially optimised for … Read more

What Can I Do With R? 6 Essential R Packages for Programmers

R is a programming language created by Ross Ihaka and Robert Gentleman in 1993. It was designed for analytics, statistics, and data visualizations. Nowadays, R can handle anything from basic programming to machine learning and deep learning. Today we will explore how to approach learning and practicing R for programmers. As mentioned before, R can … Read more

JavaScript for R — ebook

The R programming language has seen the integration of many languages; C, C++, Python, to name a few, can be seamlessly embedded into R so one can conveniently call code written in other languages from the R console. Little known to many, R works just as well with JavaScript—this book delves into the various ways … Read more

R, Python & Julia in Data Science: A comparison

As digitalization progresses and data science interfaces continue to grow, new opportunities are constantly emerging to reach personal analysis goals. Despite the "modernity" of the industry, there is now a wealth of software for every need: From the design of the analysis infrastructure to the complete, decentralized evaluation through e.g. cloud computing (the outsourced … Read more

Student-Crime and Disruption-Related Incident Prediction in the US Public Schools

I’d like to thank my team for their great efforts in this project: Yaqiong (Juno) Cao, Zhaoji Li, Malika Mohan, and Sleiman Serhan. This 6-week project was a part of the Data Science course at Duke Fuqua School of Business, MQM Business Analytics Program. Our findings will encompass … Read more

TandA for NLP: Transfer and Adaptation, or Adaptation and Transfer?

A robust way to train your Transformer model on your own domain and deal with the lack of labeled specific data. NLP has skyrocketed during the last two years thanks to great leaps forward. Many state-of-the-art models based on large deep neural networks and Transformer models have been published, … Read more

How to Deploy a Machine Learning Model in a Laravel Application

The topic of Machine Learning Model Deployment is not new nowadays, but many AI practitioners, especially beginners, find it difficult to deploy their models into production. In this article, we are going to learn how to call our model API on the Algorithmia platform from a Laravel application and make predictions. Before moving on I … Read more

Logistic Regression as the Smallest Possible Neural Network

We already covered Neural Networks and Logistic Regression in this blog. If you want to gain an even deeper understanding of the fascinating connection between those two popular machine learning techniques read on! Let us recap what an artificial neuron looks like: Mathematically it is some kind of non-linear activation function of the scalar product … Read more
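The connection can be sketched in a few lines of Python: a single artificial neuron with a sigmoid activation applied to the scalar product of inputs and weights is exactly the logistic regression model, and training it with gradient descent on the log loss fits that regression. The OR-style toy data, the learning rate, and the number of iterations below are assumptions chosen for illustration and are not taken from the post.

```python
import math


def neuron(inputs, weights, bias):
    """One neuron: sigmoid activation of the scalar product plus bias."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Assumption: a tiny, linearly separable toy problem (logical OR).
samples = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
targets = [0, 1, 1, 1]

weights = [0.0, 0.0]
bias = 0.0
learning_rate = 0.5

# Full-batch gradient descent on the log loss (cross-entropy).
for _ in range(5000):
    grad_w = [0.0, 0.0]
    grad_b = 0.0
    for x, y in zip(samples, targets):
        error = neuron(x, weights, bias) - y  # derivative of log loss w.r.t. z
        grad_w = [g + error * xi for g, xi in zip(grad_w, x)]
        grad_b += error
    weights = [
        w - learning_rate * g / len(samples) for w, g in zip(weights, grad_w)
    ]
    bias -= learning_rate * grad_b / len(samples)

# Threshold the neuron's output at 0.5 to obtain class predictions.
predictions = [1 if neuron(x, weights, bias) >= 0.5 else 0 for x in samples]
print("weights:", weights, "bias:", bias, "predictions:", predictions)
```

The learned weights and bias play the role of the logistic regression coefficients and intercept, which is the sense in which logistic regression can be read as the smallest possible neural network.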

Using multiple languages in Azure Data Studio Notebooks

Using multiple languages is a huge advantage when people choose notebooks … Read more

Topic Modeling and Sentiment Analysis on Amazon Alexa Reviews

First, we import the necessary libraries:

    import pandas as pd
    import numpy as np
    import pickle
    import seaborn as sns
    import matplotlib.pyplot as plt
    from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score, roc_curve, auc, log_loss
    import gensim
    from gensim import corpora
    from gensim.models import LdaModel, LdaMulticore
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

After preprocessing the data, we pickled … Read more

Extreme Event Time Series Preprocessing

The simplest methods, which include every possible interpolation technique, are not well suited to this kind of data. They can detect the underlying trend of the processes but are not able to correctly reproduce the seasonality level. For this reason, we try to fill our series with two very effective algorithms which are very common in … Read more