How You Should Read Research Papers According To Andrew Ng (Stanford Deep Learning Lectures)

It is advisable to ensure you go through at least 10–20% of the content of each paper you have added to the list; this will ensure that you have been exposed to enough of the introductory content within an identified resource and are able to gauge its relevancy accurately. For the more relevant papers/resources identified, …

Train a TensorFlow Model in Amazon SageMaker

Figure: Example traffic signs from the dataset.
Amazon SageMaker is a cloud machine-learning platform that enables developers to create, train, and deploy machine-learning models in the cloud. I previously used TensorFlow 2 to classify traffic signs with my onboard CPU. Today, I am going to do it in Amazon SageMaker. SageMaker has several advantages: it offers …

Cuda on WSL2 for Deep Learning — First Impressions and Benchmarks

Not going to lie, Microsoft has been doing some good things in the software development community. I love coding in Visual Studio Code and ONNX has been great if you want to optimize your deep learning models for production. WSL2 allowing you to have access to an entire Linux Kernel is exactly what I’ve been …

Pandas essentials for data science

pandas is a powerful, flexible and accessible data mining library in Python. It was originally developed at a financial management company. Anyone familiar with the finance sector knows a lot of its data science is actually time series analysis. In fact, the name Pandas came from panel data, which is a special type of time …

Topic Modeling with NLP on Amazon Reviews: An Application of Latent Dirichlet Allocation (LDA)

There are many ways to build topic models. By filtering the text for certain criteria, we can obtain different topic groups. In this study, we will build topic models using: all the text data; only nouns from the text; only nouns and adjectives.
Topic Modeling: Attempt 1 (with all the review data)
As …

Meme Vision: the science of classifying memes

As a person of culture and science, I decided to build a model to identify memes. This problem is far simpler than the ImageNet competition and so a simpler solution is appropriate. I will demonstrate this by comparing the “Meme Vision” framework to ResNet-50 (the winner of ImageNet 2015).
Method: Meme Vision framework
In a …

How did I learn Data Science in 2 months?

Here’s the complete road-map to learn Data Science from “Zero to Hero” with source links provided. (Photo by Campaign Creators on Unsplash)
Introduction; Soft skills required for Data Science; Python basics; Python libraries: Numpy, Pandas and Matplotlib; SQL; Tableau/Power BI; Data analysis using Python; Statistics and Probability; Machine learning Algorithms; Deep learning (Neural …

Your Data Science Journey Kickstarts Here

You should start learning before you do anything else. Don’t listen to critics who say online courses/certifications won’t get you the job. As a beginner, how else will you acquire the relevant knowledge and skills? We start with learning but we won’t stop there. So by all means, keep learning. Keep Learning! (Photo by Tim …

AI Feynman 2.0: Learning Regression Equations From Data

A NEW AI LIBRARY FROM MAX TEGMARK’S LAB AT MIT
Let’s kick the tires on a brand new library
Image by Gerd Altmann from Pixabay (CC0)

Table of Contents
1. Introduction
2. Code
3. Their Example
4. Our Own Easy Example
5. Symbolic Regression on Noisy Data

1. A New Symbolic Regression Library
I recently saw a post on …

An Intuitive Explanation of the Bayesian Information Criterion

Going back to our example, you could imagine a model that has as many clusters as there are data points. See, no outliers! But that wouldn’t be a very useful model. All models are wrong, but some are useful. We have to balance the maximum likelihood of our model, L, against the number of model …
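
To make the trade-off concrete: the criterion is commonly written as BIC = k·ln(n) − 2·ln(L), where L is the maximized likelihood, k the number of model parameters, and n the number of data points. The sketch below is only an illustration; the function name `bic` and the numbers are hypothetical and do not come from the article.

```python
import math


def bic(log_likelihood: float, n_params: int, n_points: int) -> float:
    """Bayesian Information Criterion; lower values indicate a better trade-off."""
    return n_params * math.log(n_points) - 2.0 * log_likelihood


# Hypothetical numbers: the richer model fits slightly better (higher
# log-likelihood) but is penalized for its extra parameters.
simple_model = bic(log_likelihood=-120.0, n_params=3, n_points=100)
complex_model = bic(log_likelihood=-118.0, n_params=10, n_points=100)

print(f"simple:  {simple_model:.2f}")
print(f"complex: {complex_model:.2f}")
print("prefer simple model:", simple_model < complex_model)
```

A model with one cluster per data point would maximize L but also maximize k, so its penalty term grows with every point it memorizes.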

Data science at NASA

Editor’s note: The Towards Data Science podcast’s “Climbing the Data Science Ladder” series is hosted by Jeremie Harris. Jeremie helps run a data science mentorship startup called SharpestMinds. You can listen to the podcast below: Machine learning isn’t rocket science, unless you’re doing it at NASA. And if you happen to be doing data science …

Gradient Descent animation: 1. Simple linear Regression

This is the first part of a series of articles on how to create animated plots visualizing gradient descent. The Gradient Descent method is one of the most widely used parameter optimization algorithms in machine learning today. Python’s celluloid-module enables us to create vivid animations of model parameters and costs during gradient descent. In this …
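
As a rough illustration of what such an animation would visualize, here is a minimal sketch of batch gradient descent for simple linear regression that records the parameters and cost at every step. The function name, learning rate, and toy data are assumptions made for this example, and the plotting with celluloid is deliberately left out.

```python
def gradient_descent(xs, ys, lr=0.05, epochs=2000):
    """Fit y = w*x + b by batch gradient descent on half the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    history = []  # (w, b, cost) per epoch; these are the frames one could animate
    for _ in range(epochs):
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        cost = sum(e * e for e in errors) / (2 * n)
        history.append((w, b, cost))
        # Gradients of the cost with respect to w and b
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, history


# Toy data generated from y = 2x + 1 (hypothetical, noise-free)
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b, history = gradient_descent(xs, ys)
print(f"w = {w:.4f}, b = {b:.4f}, final cost = {history[-1][2]:.2e}")
```

Each entry of `history` corresponds to one frame: the fitted line for the current `w` and `b`, plus the cost at that step.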

Is your website leaking sensitive information?

There has been a lack of attention to XSLeaks (cross-site leaks), which result in the leaking of user information. Most developers are familiar with the security vulnerabilities XSS (cross-site scripting), CSRF (cross-site request forgery), and SQL injection, but there has been a lack of attention to XSLeaks (cross-site leaks), which can result …

The Coronavirus vs Voice Technology in Asia

How COVID-19 has accelerated a “voice technology moment” that could change communication forever Tokyo. Photo by author. The coronavirus has led to all kinds of innovation: Korea invented drive-through testing, Lithuania invented 3D-printed hands-free door handles, and at this point, nearly everyone has shifted their meetings and social events to a VoIP solution like Zoom. …

Getting Started with GANs Using PyTorch

We will see the ability of GAN to generate new images which makes GANs look a little bit “magic”, at first sight. A generative adversarial network (GAN) is a class of machine learning frameworks conceived in 2014 by Ian Goodfellow and his colleagues. Two neural networks (Generator and Discriminator) compete with each other like in …

Activations, Convolutions, and Pooling — Part 4

Pooling Mechanisms. Deep Learning at FAU. Image under CC BY 4.0 from the Deep Learning Lecture. These are the lecture notes for FAU’s YouTube Lecture “Deep Learning”. This is a full transcript of the lecture video & matching slides. We hope you enjoy this as much as the videos. Of course, this transcript was created …

10 Reasons Why You Need Reliable Data Quality

Good, Bad Or Ugly. In this data-driven age, organizations are looking to leverage data to enhance business efficiency and effectiveness. Decisions at all levels are made using BI and Advanced Analytics tools; the better the data, the better the outputs those tools provide, and that leads to the creation of opportunities, generating more revenue …

Rethinking Continuous Integration for Data Science

Software Engineering for Data Science A widely used practice in software engineering deserves its own flavor in our field Photo by Yancy Min on Unsplash As Data Science and Machine learning get wider industry adoption, practitioners realize that deploying data products comes with a high (and often unexpected) maintenance cost. As Sculley and co-authors argue …

Roadmap to Machine Learning: Key Concepts Explained

What if our memory were a storage device? How much easier the learning process would be. But the reality is that to become an excellent professional in something, you need to go through a thorny path. You learn, you forget, you make mistakes, you learn again, absorb new things, and thus you form a picture of …

10 Minutes to Building a Fully-Connected Binary Image Classifier in TensorFlow

Photo by Waranont (Joe) on Unsplash How to build a binary image classifier using fully-connected layers in TensorFlow/Keras This is a short introduction to computer vision — namely, how to build a binary image classifier using only fully-connected layers in TensorFlow/Keras, geared mainly towards new users. This easy-to-follow tutorial is broken down into 3 sections: …

K Nearest Neighbors by hand: A Computer Science exercise for the Data Scientist

Opening the “black box” and understanding the algorithm within Data scientists sometimes talk about a “black box” approach to data science. That is, when you understand the use cases for different machine learning algorithms and how to plug in the data without understanding how the algorithm works beneath the surface. But the algorithms are just …
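
To give a flavor of what "by hand" can mean here, the following is a minimal sketch of a k-nearest-neighbors classifier using only the Python standard library. It is not the article's implementation; the function name `knn_predict` and the toy points are made up for illustration.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    # The most common label among the k neighbors wins the vote
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical toy data: two well-separated groups
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(points, labels, query=(2, 2)))
print(knn_predict(points, labels, query=(7, 9)))
```

Sorting all distances costs O(n log n) per query, which is fine for an exercise; production libraries typically use spatial indexes instead.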

Measuring Financial Risk: A Step-by-Step Guide

To calculate our own VaR and ES, we’ll use data for the Wilshire 5000, a stock market index widely considered to be the broadest measure of U.S. stock prices. We can use quantmod to import our data from FRED, the Federal Reserve Economic Database. We’ll also use ggplot2 to visualize our data. Let’s load our …
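
The article works in R with quantmod and ggplot2; the sketch below is a language-neutral illustration in Python of one common way to estimate VaR and ES, namely historical simulation. The function name, the quantile convention, and the return series are all assumptions for this example and are not Wilshire 5000 data.

```python
import math


def historical_var_es(returns, alpha=0.95):
    """Estimate VaR and ES at confidence level alpha by historical simulation.

    Losses are the negated returns. VaR is the empirical alpha-quantile of the
    losses; ES is the average of the losses at or beyond VaR. Other quantile
    conventions exist and give slightly different numbers.
    """
    losses = sorted(-r for r in returns)  # ascending: the worst losses come last
    index = min(len(losses) - 1, max(0, math.ceil(alpha * len(losses)) - 1))
    var = losses[index]
    tail = losses[index:]
    es = sum(tail) / len(tail)
    return var, es


# Hypothetical daily returns, not real market data
returns = [-0.05, -0.03, -0.02, -0.01, 0.0, 0.0, 0.01, 0.01, 0.02, 0.03]

var_90, es_90 = historical_var_es(returns, alpha=0.90)
print(f"90% VaR: {var_90:.3f}, 90% ES: {es_90:.3f}")
```

Because ES averages the tail beyond VaR, it is never smaller than VaR under this convention.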

How to create LaTeX tables directly from Python code

Copying tables of results from the console into a LaTeX report can be tedious and error-prone, so why not automate it? Making tables should be simple and elegant (Photo by Roman Bozhko on Unsplash). Creating tables of results plays a major part in communicating the outcomes of experiments in data science. Various solutions …
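
As a sketch of the idea, a few lines of plain Python are enough to turn a list of result rows into a `tabular` environment. This is not one of the solutions the article goes on to discuss; the helper name `to_latex_table` and the sample results are hypothetical, and the helper does not escape LaTeX special characters such as `&` or `%`.

```python
def to_latex_table(headers, rows, float_format="{:.2f}"):
    """Render headers and rows as a simple LaTeX tabular environment."""

    def fmt(cell):
        # Format floats consistently; leave everything else as plain text
        return float_format.format(cell) if isinstance(cell, float) else str(cell)

    column_spec = "l" + "r" * (len(headers) - 1)  # first column left, rest right
    lines = [
        "\\begin{tabular}{" + column_spec + "}",
        "\\hline",
        " & ".join(headers) + " \\\\",
        "\\hline",
    ]
    for row in rows:
        lines.append(" & ".join(fmt(cell) for cell in row) + " \\\\")
    lines += ["\\hline", "\\end{tabular}"]
    return "\n".join(lines)


# Hypothetical experiment results
table = to_latex_table(
    headers=["Model", "Accuracy", "F1"],
    rows=[["kNN", 0.9123, 0.8871], ["Random forest", 0.9456, 0.9302]],
)
print(table)
```

Writing the string to a `.tex` file and pulling it into the report with `\input` removes the copy-and-paste step entirely.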

Serverless BERT with HuggingFace and AWS Lambda

A typical transformers model consists of a pytorch_model.bin, config.json, special_tokens_map.json, tokenizer_config.json, and vocab.txt. The pytorch_model.bin has already been extracted and uploaded to S3. We are going to add config.json, special_tokens_map.json, tokenizer_config.json, and vocab.txt directly into our Lambda function because they are only a few KB in size. Therefore we create a model directory in our …

Machine Learning Basics: Multiple Linear Regression

Learn to implement Multiple Linear Regression with Python. In the previous story, I gave a brief overview of Linear Regression and showed how to perform Simple Linear Regression. In Simple Linear Regression, we had one dependent variable (y) and one independent variable (x). What if the marks of the student depended on two or …

Deploying Python script to Docker container and connect to external SQL Server (in 10 minutes)

Finally we want to build and run the image.

    # Build the image
    docker build -t my-app .
    # Run it
    docker run my-app
    # Find container name
    docker ps --last 1
    # Check logs
    docker logs <container name>

If you want to explore the container and run the script manually then modify last line of the Dockerfile, build and run again:

    #CMD ["python","-i","main.py"]
    CMD tail -f …

How Can AI Boost Call Center Morale?

What if we used these technologies to actually make call center agents’ lives better? I don’t mean coaching them to do a better job. “Feedback overload” is already a recognized problem in call centers. I mean helping them cope with the fact that their job is emotionally draining. Remember how frustrating it was the last …

Build your own deep learning classification model in Keras

Step #6: Create our model
In this task we will build a classification convolutional neural network from scratch and train it to recognize the 20 target classes in the Pascal VOC dataset. Our model architecture will be based on the popular VGG-16 architecture. This is a CNN with a total of 13 convolutional layers (cfr. …

Anything2Vec: Mapping Reddit into Vector Spaces

Generalizing Word2Vec away from word embeddings “Subreddit Embedding” and the 100 closest subreddits to /r/nba A common problem in ML, natural language processing (NLP), and AI at large surrounds representing objects in a way computers can process. And since computers understand numbers — which we have a common language for comparing, combining and manipulating — …

[Paper Summary] Distilling the Knowledge in a Neural Network

Photo by Aw Creative on Unsplash. The authors start the paper with a very interesting analogy to explain the notion that the requirements for training and inference could be very different. The analogy given is that of a larva and its adult form, and the fact that the nourishment requirements of the two forms …

The Correct Way to Measure Inference Time of Deep Neural Networks

Network latency is one of the more crucial aspects of deploying a deep network into a production environment. Most real-world applications require blazingly fast inference time, varying anywhere from a few milliseconds to one second. But the task of correctly and meaningfully measuring the inference time, or latency, of a neural network requires profound …
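
A minimal, framework-agnostic sketch of the usual measurement pattern is shown below: run a few warm-up calls, then time many repetitions with a high-resolution clock and report the mean and spread. The helper `measure_latency` and the stand-in `fake_model` are hypothetical. On a GPU, work is typically launched asynchronously, so a real measurement would also need to synchronize with the device before reading the clock; that part is not shown here.

```python
import statistics
import time


def measure_latency(fn, warmup=10, repeats=100):
    """Return (mean_ms, std_ms) of the wall-clock time of fn over `repeats` calls."""
    # Warm-up calls absorb one-off costs (caches, lazy initialization)
    for _ in range(warmup):
        fn()
    timings_ms = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(timings_ms), statistics.stdev(timings_ms)


def fake_model():
    """Hypothetical stand-in for a network's forward pass."""
    return sum(i * i for i in range(10_000))


mean_ms, std_ms = measure_latency(fake_model)
print(f"latency: {mean_ms:.3f} ms ± {std_ms:.3f} ms over 100 runs")
```

Reporting the spread alongside the mean makes it easier to spot measurements dominated by a few slow outliers.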

How to scrape ANY website with python and beautiful soup

Now you don’t need to know how HTML/CSS works (although, it can be really helpful if you do). The only thing that’s important to know is that you can think of every HTML tag as an object. These HTML tags have attributes that you can query, and each one is different. Each line of code …

Measuring Agreement with Cohen’s Kappa Statistic

This lesser-known metric can help you better evaluate how models perform on imbalanced data A lot of the most intriguing — to me — use cases for classifications are to identify outliers. The outlier may be a spam message in your inbox, a diagnosis of an extremely rare disease, or an equity portfolio with extraordinary …
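
To illustrate why the metric helps on imbalanced data, here is a small sketch of Cohen's kappa, computed as (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The function name and the labels are made up for this example; the function assumes p_e < 1.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa between two equally long label sequences (assumes p_e < 1)."""
    n = len(labels_a)
    # Observed agreement: fraction of positions where both labelings match
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the marginal label frequencies, summed over labels
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    all_labels = count_a.keys() | count_b.keys()
    expected = sum(count_a[label] * count_b[label] for label in all_labels) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical imbalanced case: one positive in ten, and a model that always says 0
y_true = [0] * 9 + [1]
y_pred = [0] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
kappa = cohens_kappa(y_true, y_pred)
print(f"accuracy = {accuracy:.2f}, kappa = {kappa:.2f}")
```

The always-negative model looks strong on accuracy, yet its kappa shows no agreement beyond chance.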

Is Facial Recognition Technology Racist? State of the Art algorithms explained

Let’s break this function down one by one. The first component, face classification, simply penalizes the model for saying that there is a face at a location, while no face exists in the image. “Face box regression” is a fancy term for the distance between the bounding box coordinates of the predicted face and the …

A Complete Beginner’s Guide to Deal with NULL Values in SQL

We cannot use the comparison operators =, <, >, <> to test for NULL values. Instead, we have to use the IS NULL and IS NOT NULL predicates.

IS NULL: Return rows that contain NULL values
Syntax: expression IS NULL

    SELECT ID, Student, Email1, Email2
    FROM tblSouthPark
    WHERE Email1 IS NULL AND Email2 IS NULL
    ORDER BY ID

The above query yields all records where both Email1 …
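
The behavior is easy to check with an in-memory SQLite database from Python. The sketch below reuses the table and column names from the query above, but the rows are invented for illustration, and other database engines may differ in details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tblSouthPark (ID INTEGER, Student TEXT, Email1 TEXT, Email2 TEXT)"
)
# Hypothetical rows; None is stored as NULL
conn.executemany(
    "INSERT INTO tblSouthPark VALUES (?, ?, ?, ?)",
    [
        (1, "Stan", "stan@example.com", None),
        (2, "Kyle", None, None),
        (3, "Eric", None, "eric@example.com"),
    ],
)

# A comparison with NULL evaluates to NULL (unknown), so no row passes the filter
equals_null = conn.execute(
    "SELECT ID FROM tblSouthPark WHERE Email1 = NULL"
).fetchall()

# IS NULL is the predicate that actually tests for missing values
is_null = conn.execute(
    "SELECT ID FROM tblSouthPark "
    "WHERE Email1 IS NULL AND Email2 IS NULL ORDER BY ID"
).fetchall()

print("Email1 = NULL  ->", equals_null)
print("IS NULL filter ->", is_null)
conn.close()
```

The first query returns nothing even though two rows have a NULL Email1, which is exactly why the IS NULL predicate is needed.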

AI pseudoscience and scientific racism

Recent attempts to predict criminality from facial features recall a long tradition of unethical and racist pseudoscience Source: Wikimedia Commons A recent paper about to be published by Harrisburg University caused quite a stir earlier this month. Titled “A Deep Neural Network Model to Predict Criminality Using Image Processing,” the paper promised: With 80 percent …

Coronavirus: Which country got it right?

Note from the editors: Towards Data Science is a Medium publication primarily based on the study of data science and machine learning. We are not health professionals or epidemiologists, and the opinions of this article should not be interpreted as professional advice. To learn more about the coronavirus pandemic, you can click here. In this …

How (NOT) To Predict Stock Prices With LSTMs

Not so recently, a brilliant and ‘original’ idea suddenly struck me: what if I could predict stock prices using Machine Learning? After all, a time series can be easily modeled with an LSTM. I could see myself getting rich overnight! If this is so easy, why hasn’t anyone done it yet? Very excited at …

Why Building an AI Decentralized Autonomous Organization (AI DAO)

Beyond the already complex challenge of implementing AI, some companies have started analyzing the possible benefits of building AI Decentralized Autonomous Organizations (AI DAOs). During my latest mission, I had to help create new business models, identify the right AI approach, and create a roadmap for the creation of several AI DAOs proof of …

My Tableau dashboards sucked – until I started drawing them

Data visualization tools such as Tableau are loved and used because of how simple they make it to show correlations in large datasets. The exact reason they are used is also their biggest flaw. It’s too easy to simply click buttons until you find something which looks acceptable. Let’s look at some examples. I’ve recreated …

Learn How to Create Web Data Apps in Python

    import streamlit as st
    import pandas as pd
    import plotly.express as px
    import pydeck as pdk
    import numpy as np

    # Load and cache the data
    @st.cache(persist=True)
    def getmedata():
        url = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv'
        df = pd.read_csv(url, delimiter=',', header='infer')
        df.rename(index=lambda x: df.at[x, 'Country/Region'], inplace=True)
        dft = df.loc[df['Province/State'].isnull()]
        dft = dft.transpose()
        dft = dft.drop(['Province/State', 'Country/Region', 'Lat', 'Long'])
        dft.index = pd.to_datetime(dft.index)
        return (dft, df)

    df1 = getmedata()[0]
    st.title('Building a Data Dashboard with Streamlit')
    st.subheader('while exploring COVID-19 data')

    # In-scope countries
    countrylist …

Predicting Future Wars

Insights from Open Data and Machine Learning I know what you are thinking: Wars are rare and complicated events, one can’t expect to take into account their entire complexity. And you are right, they spring from an intricate array of political, economic, and historical reasons without forgetting the thick coat of randomness, thus they should …

Industrialize Analytics — How do we get there?

This article is Part 1 of the series “Winning in Analytics!”. Let’s look at key enablers, to scale your AI initiatives with success. Photo by Tim Mossholder on Unsplash Dear AI Enthusiasts, we love to realize the full potential of our data! We would love to see our analytics proof of concepts achieve reality! But …

Black-Scholes Option Pricing is Wrong

Theory, assumptions, problems, and solutions for practitioners (Photo by Pixabay from Pexels). The equation offered by Black and Scholes (1973) is the standard theoretical pricing model for European options. The keyword is “theoretical”, as the Black-Scholes model makes some key assumptions that are immediately violated in practice. Key model assumptions: No transaction costs; No arbitrage …
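
For orientation, the textbook Black-Scholes price of a European call on a non-dividend-paying asset is C = S·N(d1) − K·e^(−rT)·N(d2), with d1 = [ln(S/K) + (r + σ²/2)·T] / (σ·√T) and d2 = d1 − σ·√T. The sketch below simply evaluates that formula; the function names and the input values are illustrative assumptions, not taken from the article.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_call(spot, strike, rate, vol, maturity):
    """Black-Scholes price of a European call (no dividends).

    spot: current asset price, strike: exercise price, rate: continuously
    compounded risk-free rate, vol: annualized volatility, maturity: years.
    """
    sqrt_t = math.sqrt(maturity)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol ** 2) * maturity) / (vol * sqrt_t)
    d2 = d1 - vol * sqrt_t
    return spot * norm_cdf(d1) - strike * math.exp(-rate * maturity) * norm_cdf(d2)


# Hypothetical at-the-money option: one year to expiry, 5% rate, 20% volatility
price = black_scholes_call(spot=100.0, strike=100.0, rate=0.05, vol=0.20, maturity=1.0)
print(f"call price: {price:.4f}")
```

Note that volatility and the risk-free rate enter as constants, which is one place where the model's assumptions meet practice.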

How to Avoid Potential Machine Learning Pitfalls

This post is for all those data science aficionados out there who recently jumped on the machine learning bandwagon. Whether you studied data science in college or are self-taught, most aspiring data scientists get a reality check when trying their hand at a machine learning project in a practical setting. I struggled with the …