Let’s break this function down component by component. The first component, face classification, simply penalizes the model for predicting a face at a location where no face exists in the image. “Face box regression” is a fancy term for the distance between the bounding box coordinates of the predicted face and the … Read more Is Facial Recognition Technology Racist? State of the Art algorithms explained
We cannot use the comparison operators =, <, >, <> to test for NULL values. Instead, we have to use the IS NULL and IS NOT NULL predicates. IS NULL: returns rows that contain NULL values. Syntax: expression IS NULL. SELECT ID, Student, Email1, Email2 FROM tblSouthPark WHERE Email1 IS NULL AND Email2 IS NULL ORDER BY ID; The above query yields all records where both Email1 … Read more A Complete Beginner’s Guide to Deal with NULL Values in SQL
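The difference between `=` and `IS NULL` is easy to check with Python's built-in `sqlite3` module. This is a minimal sketch: the table and column names come from the excerpt, while the sample rows and addresses are made up for illustration.

```python
import sqlite3

# Hypothetical sample rows; only the table and column names come from the text.
rows = [
    (1, "Stan", "stan@example.com", None),
    (2, "Kyle", None, None),
    (3, "Eric", None, "eric@example.com"),
    (4, "Kenny", None, None),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tblSouthPark (ID INTEGER, Student TEXT, Email1 TEXT, Email2 TEXT)"
)
conn.executemany("INSERT INTO tblSouthPark VALUES (?, ?, ?, ?)", rows)

# "= NULL" never matches: the comparison evaluates to NULL, which is not true.
equals_null = conn.execute(
    "SELECT ID FROM tblSouthPark WHERE Email1 = NULL ORDER BY ID"
).fetchall()

# IS NULL is the predicate that actually finds the missing values.
both_missing = conn.execute(
    "SELECT ID, Student FROM tblSouthPark "
    "WHERE Email1 IS NULL AND Email2 IS NULL ORDER BY ID"
).fetchall()

print(equals_null)   # []
print(both_missing)  # [(2, 'Kyle'), (4, 'Kenny')]
```

The first query returns nothing even though three rows have a NULL Email1, which is why the IS NULL predicate is needed.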
Recent attempts to predict criminality from facial features recall a long tradition of unethical and racist pseudoscience Source: Wikimedia Commons A recent paper about to be published by Harrisburg University caused quite a stir earlier this month. Titled “A Deep Neural Network Model to Predict Criminality Using Image Processing,” the paper promised: With 80 percent … Read more AI pseudoscience and scientific racism
Note from the editors: Towards Data Science is a Medium publication primarily based on the study of data science and machine learning. We are not health professionals or epidemiologists, and the opinions of this article should not be interpreted as professional advice. To learn more about the coronavirus pandemic, you can click here. In this … Read more Coronavirus: Which country got it right?
Not so recently, a brilliant and ‘original’ idea suddenly struck me: what if I could predict stock prices using machine learning? After all, a time series can be easily modeled with an LSTM. I could see myself getting rich overnight! If this is so easy, why hasn’t anyone done it yet? Very excited at … Read more How (NOT) To Predict Stock Prices With LSTMs
Beyond the already complex challenge of implementing AI, some companies have started analyzing the possible benefits of building AI Decentralized Autonomous Organizations (AI DAOs). During my latest mission, I had to help create new business models, identify the right AI approach, and create a roadmap for the creation of several AI DAOs proof of … Read more Why Building an AI Decentralized Autonomous Organization (AI DAO)
Data visualization tools such as Tableau are loved and used because of how simple they make it to show correlations in large datasets. The exact reason they are used is also their biggest flaw. It’s too easy to simply click buttons until you find something that looks acceptable. Let’s look at some examples. I’ve recreated … Read more My Tableau dashboards sucked – until I started drawing them
y = α + βx gives the predicted values, and we calculate the values of α and β from the above formula, where β is the slope and α is the y-intercept. The goal of simple linear regression is to create a linear model that minimizes the sum of squares of the residuals (errors). An interesting fact about … Read more ANOVA for Regression
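The formula the excerpt refers to is not shown here, so the sketch below assumes the standard closed-form least-squares estimates: β is the covariance of x and y divided by the variance of x, and α is the mean of y minus β times the mean of x. The function name and sample data are hypothetical.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (alpha, beta) minimizing the sum of squared residuals of y = alpha + beta * x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    beta = sxy / sxx                 # slope
    alpha = y_mean - beta * x_mean   # y-intercept
    return alpha, beta


# Hypothetical data lying exactly on y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

alpha, beta = fit_simple_linear_regression(xs, ys)
residual_ss = sum((y - (alpha + beta * x)) ** 2 for x, y in zip(xs, ys))
print(alpha, beta, residual_ss)
```

Because the sample points lie exactly on a line, the residual sum of squares is zero up to floating-point error; with noisy data it would be the quantity the fit minimizes.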
import streamlit as st
import pandas as pd
import plotly.express as px
import pydeck as pdk
import numpy as np

# Load and cache the data
@st.cache(persist=True)
def getmedata():
    url = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv'
    df = pd.read_csv(url, delimiter=',', header='infer')
    df.rename(index=lambda x: df.at[x, 'Country/Region'], inplace=True)
    dft = df.loc[df['Province/State'].isnull()]
    dft = dft.transpose()
    dft = dft.drop(['Province/State', 'Country/Region', 'Lat', 'Long'])
    dft.index = pd.to_datetime(dft.index)
    return (dft, df)

df1 = getmedata()
st.title('Building a Data Dashboard with Streamlit')
st.subheader('while exploring COVID-19 data')

##### In Scope Countries
countrylist … Read more Learn How to Create Web Data Apps in Python
Insights from Open Data and Machine Learning I know what you are thinking: Wars are rare and complicated events, one can’t expect to take into account their entire complexity. And you are right, they spring from an intricate array of political, economic, and historical reasons without forgetting the thick coat of randomness, thus they should … Read more Predicting Future Wars
This article is Part 1 of the series “Winning in Analytics!”. Let’s look at key enablers, to scale your AI initiatives with success. Photo by Tim Mossholder on Unsplash Dear AI Enthusiasts, we love to realize the full potential of our data! We would love to see our analytics proof of concepts achieve reality! But … Read more Industrialize Analytics — How do we get there?
Theory, assumptions, problems, and solutions for practitioners Photo by Pixabay from Pexels The equation offered by Black and Scholes (1973) is the standard theoretical pricing model for European options. The keyword being theoretical as the Black-Scholes model makes some key assumptions that are immediately violated in practice. Key model assumptions: No transaction costs No arbitrage … Read more Black-Scholes Option Pricing is Wrong
This post is for all those data science aficionados out there who recently jumped onto the machine learning bandwagon. Whether you studied data science in college or are self-taught, most aspiring data scientists get a reality check when trying their hand at a machine learning project in a practical setting. I struggled with the … Read more How to Avoid Potential Machine Learning Pitfalls
Tree-based Feature Importance. Machine learning models such as random forests are typically treated as black boxes. Why? A forest consists of a large number of deep trees, where each tree is trained on bagged data using a random selection of features. Gaining a full understanding by examining each tree would be close to impossible. For … Read more Machine Learning Explainability Introduction via eli5
Going from floats to integers. Neural networks are very resource-intensive algorithms. They not only incur significant computational costs, they also consume a lot of memory. Even though commercially available computational resources increase day by day, optimizing the training and inference of deep neural networks is extremely important. If we run our … Read more How to accelerate and compress neural networks with quantization
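The excerpt stops before describing a concrete scheme, so the following is only a sketch of one common way to go "from floats to integers": affine (asymmetric) quantization to unsigned 8-bit values with a scale and a zero point. The function names and the sample weights are hypothetical, and real frameworks differ in the details.

```python
def quantize(values, num_bits=8):
    """Map floats to integers in [0, 2**num_bits - 1] using a scale and zero point."""
    qmin, qmax = 0, 2 ** num_bits - 1
    lo, hi = min(values), max(values)
    # Width of one integer step in float units; fall back to 1.0 for constant input.
    scale = (hi - lo) / (qmax - qmin) or 1.0
    # Integer that represents (approximately) the float value `lo` shifted to qmin.
    zero_point = round(qmin - lo / scale)
    quantized = [
        max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values
    ]
    return quantized, scale, zero_point


def dequantize(quantized, scale, zero_point):
    """Recover approximate floats from the integer representation."""
    return [(q - zero_point) * scale for q in quantized]


# Hypothetical weights of a small layer.
weights = [-1.0, -0.5, 0.0, 0.25, 1.0]
q_weights, scale, zero_point = quantize(weights)
restored = dequantize(q_weights, scale, zero_point)

print(q_weights)
print(restored)
```

Each weight is now stored as a single byte instead of a 32-bit or 64-bit float, at the cost of a rounding error on the order of one quantization step (`scale`).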
Model Evaluation Now that all the theory is out of the way, let’s see how these components come together to produce high-quality recommendations on a well-known real-world data set. We’ll train an implicit feedback FM model using the author’s new RankFM package which implements the techniques described above and compare its performance to the popular … Read more Factorization Machines for Item Recommendation with Implicit Feedback Data
How does it feel to be in one of these roles? Find out here. Photo by Christina @ wocintechchat.com . Introduction Data Analyst Data Scientist Summary References After working as both a professional data analyst and data scientist, I thought it would be insightful to highlight the experience of each position along with some key … Read more Would You Rather be a Data Analyst or Data Scientist?
To my excitement, Spotify already has a developer API that we can use to get data from Spotify or trigger certain actions for Spotify users. All we need to do is register on the site, create an app, and get the API token. Then, we can use the spotipy package in Python to retrieve … Read more What Covid-related topics are being discussed in Spotify Podcasts?
Using the Apriori algorithm to offer product recommendation, product placement, pricing and bundling strategies Imagine if we could understand what our customers’ next purchase could be! Imagine if we could find patterns in purchase behaviour and use it to our advantage! The key to the future is in history! Market Basket Analysis helps retailers identify … Read more Product Placement, Pricing and Promotion Strategies with Association Rule Learning
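Association rules of the kind Apriori produces are usually ranked by support, confidence, and lift. The sketch below is not the Apriori algorithm itself (it skips candidate generation and pruning); it only computes those three metrics for a single rule, using made-up baskets and hypothetical helper names.

```python
# Hypothetical transactions; each basket is the set of items in one purchase.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimated probability of the consequent given the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """Confidence relative to how often the consequent occurs anyway."""
    return confidence(antecedent, consequent, transactions) / support(
        consequent, transactions
    )


rule_support = support({"bread", "milk"}, baskets)
rule_confidence = confidence({"bread"}, {"milk"}, baskets)
rule_lift = lift({"bread"}, {"milk"}, baskets)
print(rule_support, rule_confidence, rule_lift)
```

A lift below 1, as in this toy data, suggests the two items are bought together less often than chance would predict, so the rule would be a weak basis for bundling.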
Creating an array of colors. Firstly, I picked the corresponding RGB values for a 120-crayon Crayola box and copied them into a list.
# Assumes each line of colors.txt holds one color as "R,G,B"
colorsFile = open("colors.txt", "r")
colors = []
for line in colorsFile.readlines():
    colorset = line.strip()
    rgbFormat = [int(x) for x in colorset.split(",")]
    colors.append(rgbFormat)
colorsFile.close()
Secondly, I started by picking an image and resizing it to a smaller size. … Read more Coloring an Image using Crayola Colors (Python)
The neonatal intensive care unit (NICU) is an environment in which life-changing decisions are made. Neonatologists use information from a variety of sources to build up a picture of a newborn’s condition to ensure they are receiving the right medical care. These highly trained specialists use their judgement in tandem with a constant stream of … Read more Machine Learning for Neonatal Intensive Care
In this post, I will show you how to build a simple face detector using Python. Building a program that detects faces is a very nice project to get started with computer vision. In a previous post, I showed how to recognize text in an image; it is a great way to practice Python in … Read more Simple Face Detection in Python
What does that all mean? Practically, if the only thing that you are looking to do is collect a large number of tweets, Twint is probably a better tool, whereas Tweepy is better suited for collecting a richer set of metadata and offers flexibility, and potentially scalability, for those using the official API. That’s … Read more What Python package is best for getting data from Twitter? Comparing Tweepy and Twint.
Image by emmaws4s from Pixabay Note — There is also a YouTube video explaining this paper The paper begins by making a case that often wide & deep models require a huge number of multiplications and that results in high memory and computing demands. Because of this even if the network is a top-performing model … Read more [Knowledge Distillation] FitNets : Hints For Thin Deep Nets
Image by Qimono from Pixabay (CC0) Back in 1958, Han Peter Luhn, a researcher at IBM, initiated the concept of Business Intelligence (BI), using the definition from Webster’s Dictionary: to apprehend the interrelationships of presented facts in such a way as to guide action towards a desired goal. Given its definition, Business Intelligence is indeed … Read more 6 Key Areas of Business Intelligence in the New Era
An opinionated setup guide for your next Python project. Python is one of the fastest growing programming languages. Its tooling is evolving fast to catch up. I have been writing Python for over 10 years now and sometimes it’s hard to keep up with all the new tooling out there. Recently, I had an opportunity … Read more State-of-the-art python project setup
In this article, we’ll explore the differences between data scientists as Decision Support, Advisor and Integrated Partner One of the best and worst parts of being a data scientist is the ambiguity that the role can often entail. Since data science is a relatively new function, the mandate and objectives aren’t always clear. This often … Read more The Three Stages of a Data Scientist
WOMEN IN TECHNOLOGY SERIES An Interview with the Head of Operations & Partnerships at Starbutter AI Over the past few years, tech companies and researchers all over the world have been competing to advance the frontiers of artificial intelligence. With the broadening and fast-paced developments in the space of technology, it is clear that utilizing … Read more Jean Alfonso-Decena: Leading Innovation in Conversational AI and Disrupting the Philippine FinTech…
Understand the basics with a concrete example! Photo by Matthew Fournier on Unsplash When your Python code grows in size, most probably it becomes unorganised over time. Keeping your code in the same file as it grows makes your code difficult to maintain. At this point, Python modules and packages help you to organize and … Read more Modules and Packages in Python: Fundamentals for Data Scientists
We’re going to bundle this up in a tiny Ruby project.
Create our directory
$ mkdir kiba-etl && cd kiba-etl/
Add the source CSV
Create a CSV file with touch phone.csv and paste in the following.
id,number
1,123.456.7891
2,222
3,303-030-3030
4,444-444-4444
5,900-000-00001
6,#1000000000
7,#9898989898
8,800-000-00000
9,999.999.9999
10,22.214.171.124.126.96.36.199.1.1
11,(112)233-4455
12,(121)212-0000
In a real situation, you might use a service like Twilio to detect if they’re real phone numbers. … Read more Build The World’s Simplest ETL (Extract, Transform, Load) Pipeline in Ruby With Kiba
Using YFinance and Plotly libraries for Stock Data Analysis. Photo by Alec Favale on Unsplash. In this article, I will explain how you can use YFinance, a Python library that aims to solve the problem of downloading stock data by offering a reliable, threaded, and Pythonic way to download historical market data from Yahoo! … Read more Downloading Stock Data and Representing it Visually
On a normal day, this line drew my attention when someone asked why ++ is not an operator in Python. “If you want to know the original reason, you’ll have to either wade through old Python mailing lists or ask somebody who was there (eg. Guido)” ~ Stack Overflow. And this enforces … Read more Why doesn’t Python support i++ increment syntax
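For readers coming from C-like languages, this small sketch shows what Python offers instead of `i++`. It uses only built-ins; the variable names are arbitrary.

```python
i = 5

# The idiomatic increment is augmented assignment.
i += 1

# "++i" is accepted by the parser, but it is just two unary plus operators
# applied to i, so nothing is incremented.
unchanged = ++i

# The postfix form "i++" is not valid Python at all; compiling it fails.
try:
    compile("i++", "<example>", "eval")
    postfix_is_valid = True
except SyntaxError:
    postfix_is_valid = False

print(i, unchanged, postfix_is_valid)  # 6 6 False
```

The silent no-op of `++i` is the more dangerous case, since it runs without any error.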
Factors play a crucial role in data analysis. Learn how to create, subset, and compare them. A factor refers to a statistical data type used to store categorical variables. Categorical variables belong to a limited number of categories. Continuous variables, on the other hand, can correspond to an infinite number of values. It is important … Read more Introduction to Factors in R
photo by author I recently had the opportunity to deliver a hands-on workshop on training a Keras deep learning model. This workshop was a follow-on for a session I had done for a local meetup that reviewed the content in my upcoming book for Manning Publications, Deep Learning with Structured Data. After the introductory session … Read more Cocalc vs. Colab — Which Is Better for a Hands-On Workshop?
Recently, I’ve been working on a project that involves some time-series modelling for quantitative investment strategies. Specifically, a key component of the strategy involves developing a differentiated investment approach for different regimes. Such an identification is useful because when the underlying dynamics of the financial market shift, the strategies that work well in one regime … Read more Introduction to Trend Filtering with Applications in Python
The story behind Netflix’s famous Recommendation System Image by Thibault Penin on Unsplash What is Netflix and what do they do? Netflix is a media service provider that is based out of America. It provides movie streaming through a subscription model. It includes television shows and in-house produced content along with movies. Initially, Netflix used … Read more Netflix Recommender System — A Big Data Case Study
Python Programming Tips. The easiest way to serialise/deserialise between Python objects and JSON — Attr and Cattr. In one of my previous articles, I introduced what is probably the best practice for Object-Oriented Programming (OOP) in Python: using the library “Attrs”. Probably the Best Practice of Object-Oriented Python — Attr Makes Python Object-Oriented Programming … Read more Single Line of Code to Interchange Between Python Objects and JSON
Before we deploy an API, we need to have an API with us, right? In one of my last posts, I had written a simple tutorial to understand FastAPI and API basics. Do read the post if you want to understand FastAPI basics. So, here I will try to create an Image detection API. As … Read more Deployment could be easy — A Data Scientist’s Guide to deploy an Image detection FastAPI API using…
Go to this link to download: Node.js. I selected the “Recommended For Most Users” option and then used all the default settings in the Node.js setup. Checkpoint: Once it has finished installing, type into your command line: node -v && npm -v And it should look like this (your versions may be more recent than … Read more Setup Vue.js Hello World In Visual Studio Code
With the recent outbreak of COVID-19, also known as the coronavirus, it does seem like history is repeating itself and we are going back in time to the 1900s and the Spanish influenza. The coronavirus is a deadly virus that has claimed hundreds of thousands of lives in countries around the world. Older adults and … Read more Pneumonia Detection using Deep Learning
It is so hard to find an intuitive understanding of how your dataset functions! Yet, coherently interpreting how your system works is crucial to finding a way to model or analyse any feature. Stick around until the end to find out why initial insight is key to good analysis, what you have to do to … Read more Insight is king — How to Get it and avoid pitfalls
In the midst of all our plant medicinals studying, one very stressful pressure always comes up: It’s tough to get clinical trials of any promising candidate compound initiated anywhere in the world. But the class of compounds we study aren’t typical pharma drugs formulated straight from a laboratory drawing board; these are plant medicinals. So … Read more The forgotten legacy of Traditional Medicine in the age of coronavirus
A Magical Algorithm for Convolution and Signal Processing The Fourier Transform and its cousins (the Fourier Series, the Discrete Fourier Transform, and the Spherical Harmonics) are powerful tools that we use in computing and to understand the world around us. The Discrete Fourier Transform (DFT) is used in the convolution operation underlying computer vision and … Read more Build Intuition for the Fourier Transform
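The link between the DFT and convolution can be checked numerically: multiplying two DFTs element-wise and transforming back gives the circular convolution of the original signals. The sketch below uses a naive O(n²) DFT written with the standard library rather than a fast FFT, and the signals and function names are made up for illustration.

```python
import cmath


def dft(x, inverse=False):
    """Naive discrete Fourier transform (or its inverse) of a sequence."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = sum(
            x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n)
        )
        # The inverse transform carries the 1/n normalization.
        out.append(total / n if inverse else total)
    return out


def circular_convolution_direct(a, b):
    """Circular convolution computed straight from its definition."""
    n = len(a)
    return [sum(a[j] * b[(i - j) % n] for j in range(n)) for i in range(n)]


def circular_convolution_dft(a, b):
    """Same result via the convolution theorem: multiply in the frequency domain."""
    product = [p * q for p, q in zip(dft(a), dft(b))]
    return [value.real for value in dft(product, inverse=True)]


signal = [1, 2, 3, 4]
kernel = [1, 0, 0, 1]

direct = circular_convolution_direct(signal, kernel)
via_dft = circular_convolution_dft(signal, kernel)
print(direct)   # [3, 5, 7, 5]
print(via_dft)  # the same values, up to floating-point error
```

With a fast Fourier transform in place of the naive `dft`, the frequency-domain route costs O(n log n) instead of O(n²), which is why it is attractive for large convolutions.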
Before we begin our melt tutorial, let’s recreate the wide dataframe above.
library(data.table)
df_wide <- data.table(
  student = c("Andy", "Bernie", "Cindey", "Deb"),
  school = c("Z", "Y", "Z", "Y"),
  english = c(10, 100, 1000, 10000),  # eng grades
  math = c(20, 200, 2000, 20000),     # math grades
  physics = c(30, 300, 3000, 30000)   # physics grades
)
df_wide
   student school english math physics
1: Andy … Read more Reshape R dataframes wide-to-long with melt — tutorial and visualization
Adam Tabriz, MD’s article combines the worlds of artificial intelligence and healthcare by portraying how AI will impact the day-to-day roles and services delivered by various healthcare practices. The non-technical approach of the article makes this a great read for all readers. Adam starts with a deep dive into the term … Read more Interesting AI/ML Articles You Should Read This Week (June 28)
1 — Getting the Data A finicky part of any visualization can be handling the input data, and this was especially true in this case because the data was owned and updated by another party (Johns Hopkins). This meant that when they changed their organizational style, I had to adjust too. I chose to rely … Read more Visualizing Covid-19 Over Time Using React
Using GLM, Decision Tree and Random Forest to predict Churn and compare the models with their accuracy and AUC values Photo by Scott Graham on Unsplash What is Churn ? Churn rate, when applied to a customer base, refers to the proportion of contractual customers or subscribers who leave a supplier during a given time … Read more Hands on Churn Prediction with R and comparison of Different Models for Churn Prediction
So you’ve got a hot dataset you want to take a look at. Nice. How you visualise it is going to depend on what kind of data it is. Is it one, two, three, or more-dimensional? Is it discrete or continuous? Do you know? Often I find myself thinking I know what the nature of … Read more On Data Exploration and Visualisation
import pandas as pd

mexico_confirmed = pd.read_csv('https://raw.githubusercontent.com/DiegoHurtad0/Covid-19-Dataset-Mexico/master/time_series_covid19_confirmed_Mexico.csv')
mexico_deaths = pd.read_csv('https://raw.githubusercontent.com/DiegoHurtad0/Covid-19-Dataset-Mexico/master/time_series_covid19_deaths_Mexico.csv')
mexico_suspects = pd.read_csv('https://raw.githubusercontent.com/DiegoHurtad0/Covid-19-Dataset-Mexico/master/time_series_covid19_suspects_Mexico.csv')
mexico_negative = pd.read_csv('https://raw.githubusercontent.com/DiegoHurtad0/Covid-19-Dataset-Mexico/master/time_series_covid19_negative.csv')
dataset_mexico = pd.read_csv('https://raw.githubusercontent.com/DiegoHurtad0/Covid-19-Dataset-Mexico/master/datasetCovid19Mexico.csv')
Time-series data containing the counts of infected cases, deaths, and recoveries across countries is also given. The time-series data has an individual file for each case type and needs to be processed before visualization. The country coordinates are also provided for time series visualization … Read more An overview of Covid-19 in Mexico
An explanation of SVMs for linear and non-linear datasets. The following explanation assumes that you have a basic understanding of supervised machine learning as well as linear discriminant functions. However, if, like me, you have been gifted with the memory of a goldfish, let us remind ourselves of the following: Supervised Machine Learning entails the creation of … Read more SVM’s — Jack of all trades?