Photo by Aw Creative on Unsplash The authors start the paper with an analogy to explain that the requirements for training and inference can be very different. The analogy is that of a larva and its adult form, and the fact that the nourishment requirements of the two forms … Read more [Paper Summary] Distilling the Knowledge in a Neural Network
Network latency is one of the more crucial aspects of deploying a deep network into a production environment. Most real-world applications require blazingly fast inference time, varying anywhere from a few milliseconds to one second. But the task of correctly and meaningfully measuring the inference time, or latency, of a neural network requires profound … Read more The Correct Way to Measure Inference Time of Deep Neural Networks
Now you don’t need to know how HTML/CSS works (although, it can be really helpful if you do). The only thing that’s important to know is that you can think of every HTML tag as an object. These HTML tags have attributes that you can query, and each one is different. Each line of code … Read more How to scrape ANY website with python and beautiful soup
This lesser-known metric can help you better evaluate how models perform on imbalanced data. To me, many of the most intriguing use cases for classification involve identifying outliers. The outlier may be a spam message in your inbox, a diagnosis of an extremely rare disease, or an equity portfolio with extraordinary … Read more Measuring Agreement with Cohen’s Kappa Statistic
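As a rough illustration of the metric named in this entry, the sketch below computes Cohen's kappa with the standard definition, (observed agreement - chance agreement) / (1 - chance agreement). The function name `cohen_kappa` and the sample labels are hypothetical and do not come from the linked article.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of positions where both raters agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: sum over classes of the product of marginal frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical imbalanced data: 95 negatives, 5 positives,
# and a model that always predicts the majority class.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
kappa = cohen_kappa(y_true, y_pred)
```

On this made-up data the accuracy is 0.95 while kappa is 0, which is the kind of gap that makes the statistic useful on imbalanced classes.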
Let’s break this function down one by one. The first component, face classification, simply penalizes the model for saying that there is a face at a location, while no face exists in the image. “Face box regression” is a fancy term for the distance between the bounding box coordinates of the predicted face and the … Read more Is Facial Recognition Technology Racist? State of the Art algorithms explained
We cannot use the comparison operators =, <, >, <> to test for NULL values. Instead, we have to use the IS NULL and IS NOT NULL predicates. IS NULL: returns rows that contain NULL values. Syntax: expression IS NULL. SELECT ID, Student, Email1, Email2 FROM tblSouthPark WHERE Email1 IS NULL AND Email2 IS NULL ORDER BY ID The above query yields all records where both Email1 … Read more A Complete Beginner’s Guide to Deal with NULL Values in SQL
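The difference between `= NULL` and `IS NULL` can be checked with a small in-memory database. This is a minimal sketch using Python's built-in `sqlite3` module. The table and column names follow the query in the excerpt, but the rows are invented for illustration, and the article's own database engine may differ from SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tblSouthPark (ID INTEGER, Student TEXT, Email1 TEXT, Email2 TEXT)"
)
# Hypothetical rows; None is stored as SQL NULL.
conn.executemany(
    "INSERT INTO tblSouthPark VALUES (?, ?, ?, ?)",
    [
        (1, "Stan", "stan@example.com", None),
        (2, "Kyle", None, None),
        (3, "Eric", None, "eric@example.com"),
    ],
)

# A comparison with NULL evaluates to unknown, so no row passes the filter.
equals_null = conn.execute(
    "SELECT ID FROM tblSouthPark WHERE Email1 = NULL"
).fetchall()

# The IS NULL predicate is the correct test for missing values.
is_null = conn.execute(
    "SELECT ID, Student, Email1, Email2 FROM tblSouthPark "
    "WHERE Email1 IS NULL AND Email2 IS NULL ORDER BY ID"
).fetchall()

conn.close()
```

Here `equals_null` comes back empty even though two rows have a NULL `Email1`, while `is_null` holds only the row where both email columns are NULL.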
Recent attempts to predict criminality from facial features recall a long tradition of unethical and racist pseudoscience Source: Wikimedia Commons A recent paper about to be published by Harrisburg University caused quite a stir earlier this month. Titled “A Deep Neural Network Model to Predict Criminality Using Image Processing,” the paper promised: With 80 percent … Read more AI pseudoscience and scientific racism
Note from the editors: Towards Data Science is a Medium publication primarily based on the study of data science and machine learning. We are not health professionals or epidemiologists, and the opinions of this article should not be interpreted as professional advice. To learn more about the coronavirus pandemic, you can click here. In this … Read more Coronavirus: Which country got it right?
Not so recently, a brilliant and ‘original’ idea suddenly struck me: what if I could predict stock prices using Machine Learning? After all, a time series can be easily modeled with an LSTM. I could see myself getting rich overnight! If this is so easy, why hasn’t anyone done it yet? Very excited at … Read more How (NOT) To Predict Stock Prices With LSTMs
Beyond the already complex challenge of implementing AI, some companies have started analyzing the possible benefits of building AI Decentralized Autonomous Organizations (AI DAOs). During my latest mission, I had to help create new business models, identify the right AI approach, and create a roadmap for the creation of several AI DAOs proof of … Read more Why Building an AI Decentralized Autonomous Organization (AI DAO)
[This article was first published on RStudio Blog, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here) Want to share your content on R-bloggers? click here if you have a blog, or here if you don’t. Photo by Brian McGowan on Unsplash This is a guest post from … Read more Future-Proofing Your Data Science Team
[This article was first published on Posts on Tychobra, and kindly contributed to R-bloggers]. Polished.tech is our new software service that makes it easier than … Read more Introducing Polished.tech
[This article was first published on rstats | Julia Silge, and kindly contributed to R-bloggers]. Lately I’ve been publishing screencasts demonstrating how to use the tidymodels framework, … Read more The Bechdel test and the X-Mansion with tidymodels and #TidyTuesday
Data visualization tools such as Tableau are loved and used because of how simple they make it to show correlations in large datasets. The exact reason they are used is also their biggest flaw. It’s too easy to simply click buttons until you find something which looks acceptable. Let’s look at some examples. I’ve recreated … Read more My Tableau dashboards sucked – until I started drawing them
[This article was first published on http://r-addict.com, and kindly contributed to R-bloggers]. July 2nd (8:00pm UTC+2) is a date for the last Webinar at Why … Read more Why R? Webinar – JD Long – Helping drive data science adoption in organizations
y=α+βx would give the predicted values, and we calculate the values of α and β from the above formula, where β is the slope and α is the y-intercept. The goal of simple linear regression is to create a linear model that minimizes the sum of squares of the residuals (errors). An interesting fact about … Read more ANOVA for Regression
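The formula the excerpt refers to is not reproduced on this page, so the sketch below assumes the standard closed-form least-squares estimates: β = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and α = ȳ - βx̄. The function names and the sample data are hypothetical.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (alpha, beta) minimizing the sum of squared residuals of y = alpha + beta * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    beta = sxy / sxx                 # slope
    alpha = y_mean - beta * x_mean   # y-intercept
    return alpha, beta


def sum_squared_residuals(xs, ys, alpha, beta):
    """Sum of squared differences between observed and predicted values."""
    return sum((y - (alpha + beta * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data lying exactly on y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
alpha, beta = fit_simple_linear_regression(xs, ys)
sse = sum_squared_residuals(xs, ys, alpha, beta)
```

Because the sample points sit exactly on a line, the fitted residual sum of squares is zero; with noisy data it would be the minimum achievable by any straight line.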
[This article was first published on R – Xi’an’s Og, and kindly contributed to R-bloggers]. Jackie Wong, Jon Forster (Warwick) and Peter Smith have just … Read more one bridge further
import streamlit as st
import pandas as pd
import plotly.express as px
import pydeck as pdk
import numpy as np
#Load and Cache the data
@st.cache(persist=True)
def getmedata():
    url = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv'
    df = pd.read_csv(url, delimiter=',', header='infer')
    df.rename(index=lambda x: df.at[x, 'Country/Region'], inplace=True)
    dft = df.loc[df['Province/State'].isnull()]
    dft = dft.transpose()
    dft = dft.drop(['Province/State', 'Country/Region', 'Lat', 'Long'])
    dft.index = pd.to_datetime(dft.index)
    return(dft, df)
df1 = getmedata()
st.title('Building a Data Dashboard with Streamlit')
st.subheader('while exploring COVID-19 data')
#####In Scope Countries
countrylist … Read more Learn How to Create Web Data Apps in Python
Amazon DocumentDB (with MongoDB compatibility) is a fast, scalable, highly available, and fully managed document database service that supports MongoDB workloads. As a document database, Amazon DocumentDB makes it easy to store, query, and index JSON data.
Insights from Open Data and Machine Learning I know what you are thinking: wars are rare and complicated events, and one can’t expect to take into account their entire complexity. And you are right: they spring from an intricate array of political, economic, and historical reasons, not to mention a thick coat of randomness, thus they should … Read more Predicting Future Wars
This article is Part 1 of the series “Winning in Analytics!”. Let’s look at key enablers, to scale your AI initiatives with success. Photo by Tim Mossholder on Unsplash Dear AI Enthusiasts, we love to realize the full potential of our data! We would love to see our analytics proof of concepts achieve reality! But … Read more Industrialize Analytics — How do we get there?
[This article was first published on http://r-addict.com, and kindly contributed to R-bloggers]. On June 25th we had the pleasure of hosting the Why R? Webinar with … Read more Neural Networks using Tensorflow via Keras in R – Video
Theory, assumptions, problems, and solutions for practitioners Photo by Pixabay from Pexels The equation offered by Black and Scholes (1973) is the standard theoretical pricing model for European options. The key word is theoretical, as the Black-Scholes model makes some key assumptions that are immediately violated in practice. Key model assumptions: no transaction costs; no arbitrage … Read more Black-Scholes Option Pricing is Wrong
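For reference, the textbook Black-Scholes price of a European call on a non-dividend-paying asset is C = S·N(d1) - K·e^(-rT)·N(d2), with d1 = [ln(S/K) + (r + σ²/2)T] / (σ√T) and d2 = d1 - σ√T. The sketch below implements that formula and derives the put price from put-call parity. The function names and the input values are hypothetical, and nothing here reflects the adjustments the linked article may propose.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_call(spot, strike, rate, volatility, maturity):
    """Textbook Black-Scholes price of a European call (no dividends)."""
    if min(spot, strike, volatility, maturity) <= 0:
        raise ValueError("spot, strike, volatility and maturity must be positive")
    sqrt_t = math.sqrt(maturity)
    d1 = (math.log(spot / strike) + (rate + 0.5 * volatility ** 2) * maturity) / (
        volatility * sqrt_t
    )
    d2 = d1 - volatility * sqrt_t
    return spot * normal_cdf(d1) - strike * math.exp(-rate * maturity) * normal_cdf(d2)


def black_scholes_put(spot, strike, rate, volatility, maturity):
    """European put price obtained from the call price via put-call parity."""
    call = black_scholes_call(spot, strike, rate, volatility, maturity)
    return call - spot + strike * math.exp(-rate * maturity)


# Hypothetical inputs: at-the-money option, 5% rate, 20% volatility, one year.
call_price = black_scholes_call(100.0, 100.0, 0.05, 0.2, 1.0)
put_price = black_scholes_put(100.0, 100.0, 0.05, 0.2, 1.0)
```

The formula takes a single constant volatility and a single constant rate as inputs, which is one place where the idealized assumptions enter the calculation.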
This post is for all those data science aficionados out there who recently jumped on to the machine learning bandwagon. Whether you studied data science in college or are self-taught, most aspiring data scientists get a reality check when trying their hand at a machine learning project in a practical setting. I struggled with the … Read more How to Avoid Potential Machine Learning Pitfalls
Cloud Storage provides worldwide, highly durable object storage that scales to exabytes of data. This means customers of all sizes and industries can use it to store and protect any amount of data for a range of use cases, such as websites, mobile applications, backup and restore, archive, and big data analytics. The objects are … Read more A guide to setting up monitoring for object creation in Cloud Storage
Tree-based Feature Importance Machine learning models such as random forests are typically treated as black boxes. Why? A forest consists of a large number of deep trees, where each tree is trained on bagged data using a random selection of features. Gaining a full understanding by examining each tree would be close to impossible. For … Read more Machine Learning Explainability Introduction via eli5
[This article was first published on R – Hi! I am Nagdev, and kindly contributed to R-bloggers]. On a late evening, I was scrolling through … Read more How to become a data scientist in 30 days?
Going from floats to integers Neural networks are very resource-intensive algorithms. They not only incur significant computational costs, they also consume a lot of memory. Even though commercially available computational resources increase day by day, optimizing the training and inference of deep neural networks is extremely important. If we run our … Read more How to accelerate and compress neural networks with quantization
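To make "going from floats to integers" concrete, here is a minimal sketch of symmetric 8-bit quantization: each value is divided by a shared scale, rounded to an integer in [-127, 127], and multiplied back by the scale on the way out. This is one generic scheme chosen for illustration; the linked article may describe a different one, and the function names and sample weights are hypothetical.

```python
def quantize_symmetric_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    max_abs = max((abs(v) for v in values), default=0.0)
    # Fall back to a scale of 1.0 when every value is zero (or the list is empty).
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the integers."""
    return [q * scale for q in quantized]


# Hypothetical weights.
weights = [-1.0, -0.3, 0.0, 0.25, 1.0]
q_weights, scale = quantize_symmetric_int8(weights)
restored = dequantize(q_weights, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

The round trip loses at most half a quantization step per value, while each weight now fits in one byte instead of four.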
Model Evaluation Now that all the theory is out of the way, let’s see how these components come together to produce high-quality recommendations on a well-known real-world data set. We’ll train an implicit feedback FM model using the author’s new RankFM package which implements the techniques described above and compare its performance to the popular … Read more Factorization Machines for Item Recommendation with Implicit Feedback Data
The virtual event Azure Synapse Analytics: How It Works is now available on demand. In demos and technical discussions, Microsoft customers explain how they’re using the newest Azure Synapse Analytics capabilities to deliver insights faster, bring together an entire analytics ecosystem in a central location, reduce costs, and transform decision-making. This post outlines five key … Read more Five reasons to view this Azure Synapse Analytics virtual event
How does it feel to be in one of these roles? Find out here. Photo by Christina @ wocintechchat.com . Introduction Data Analyst Data Scientist Summary References After working as both a professional data analyst and data scientist, I thought it would be insightful to highlight the experience of each position along with some key … Read more Would You Rather be a Data Analyst or Data Scientist?
To my excitement, Spotify already has a developer API that we can use to get data from Spotify or trigger certain actions for Spotify users. All we need to do is register on the site, create an app, and get the API token. Then, we can use the spotipy package in Python to retrieve … Read more What Covid-related topics are being discussed in Spotify Podcasts?
Using the Apriori algorithm to offer product recommendation, product placement, pricing and bundling strategies Imagine if we could understand what our customers’ next purchase could be! Imagine if we could find patterns in purchase behaviour and use it to our advantage! The key to the future is in history! Market Basket Analysis helps retailers identify … Read more Product Placement, Pricing and Promotion Strategies with Association Rule Learning
Creating an array of colors. Firstly, I picked the corresponding RGB values for a 120-crayon Crayola box and copied them into a list.
colorsFile = open("colors.txt", "r")
colors = []
for line in colorsFile.readlines():
    colorset = line.strip().split(" ")
    # Assumption: the last space-separated token holds the comma-separated RGB values
    rgbFormat = [int(x) for x in colorset[-1].split(",")]
    colors.append(rgbFormat)
Secondly, I started by picking an image and resizing it to a smaller size. … Read more Coloring an Image using Crayola Colors (Python)
The neonatal intensive care unit (NICU) is an environment in which life-changing decisions are made. Neonatologists use information from a variety of sources to build up a picture of a newborn’s condition to ensure they are receiving the right medical care. These highly trained specialists use their judgement in tandem with a constant stream of … Read more Machine Learning for Neonatal Intensive Care
In this post, I will show you how to build a simple face detector using Python. Building a program that detects faces is a very nice project to get started with computer vision. In a previous post, I showed how to recognize text in an image; it is a great way to practice Python in … Read more Simple Face Detection in Python
What does that all mean? Practically, if the only thing that you are looking to do is collect a large number of tweets, Twint is probably a better tool, whereas Tweepy is better suited for collecting a richer set of metadata and offers flexibility, and potentially scalability, for those using the official API. That’s … Read more What Python package is best for getting data from Twitter? Comparing Tweepy and Twint.
Image by emmaws4s from Pixabay Note: there is also a YouTube video explaining this paper. The paper begins by making the case that wide and deep models often require a huge number of multiplications, which results in high memory and computing demands. Because of this, even if the network is a top-performing model … Read more [Knowledge Distillation] FitNets : Hints For Thin Deep Nets
Image by Qimono from Pixabay (CC0) Back in 1958, Hans Peter Luhn, a researcher at IBM, introduced the concept of Business Intelligence (BI), using the definition from Webster’s Dictionary: to apprehend the interrelationships of presented facts in such a way as to guide action towards a desired goal. Given its definition, Business Intelligence is indeed … Read more 6 Key Areas of Business Intelligence in the New Era
An opinionated setup guide for your next Python project Python is one of the fastest growing programming languages. Its tooling is evolving fast to catch up. I have been writing Python for over 10 years now and sometimes it’s hard to keep up with all the new tooling out there. Recently, I had an opportunity … Read more State-of-the-art python project setup
[This article was first published on business-science.io, and kindly contributed to R-bloggers]. I’m beyond excited to introduce modeltime, a new time series forecasting package designed … Read more Introducing Modeltime: Tidy Time Series Forecasting using Tidymodels
In this article, we’ll explore the differences between data scientists as Decision Support, Advisor and Integrated Partner One of the best and worst parts of being a data scientist is the ambiguity that the role can often entail. Since data science is a relatively new function, the mandate and objectives aren’t always clear. This often … Read more The Three Stages of a Data Scientist
This week’s Express deals with an erratic driver: In Riddler City, the city streets follow a grid layout, running north-south and east-west. You’re driving north when you decide to play a little game. Every time you reach an intersection, you randomly turn left or right, each with a 50 percent chance. After driving through 10 … Read more The Riddler – June 26th
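The excerpt cuts off before the puzzle's question, so the sketch below only models the setup it does describe: a driver who starts heading north and turns left or right with equal probability at every intersection. It computes the exact probability of each compass heading after a given number of turns. The function name and the heading encoding are hypothetical.

```python
from fractions import Fraction

# Headings in clockwise order; a right turn is +1 and a left turn is -1 (mod 4).
HEADINGS = ("north", "east", "south", "west")


def heading_distribution(num_intersections):
    """Exact probability of each heading after the given number of random turns."""
    distribution = {0: Fraction(1)}  # start out driving north
    for _ in range(num_intersections):
        next_distribution = {}
        for heading, probability in distribution.items():
            for turn in (-1, 1):
                new_heading = (heading + turn) % 4
                next_distribution[new_heading] = (
                    next_distribution.get(new_heading, Fraction(0)) + probability / 2
                )
        distribution = next_distribution
    return {HEADINGS[h]: p for h, p in distribution.items()}


after_ten = heading_distribution(10)
```

Using `Fraction` keeps the probabilities exact, so the result can be compared directly against a hand calculation without floating-point tolerance.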
WOMEN IN TECHNOLOGY SERIES An Interview with the Head of Operations & Partnerships at Starbutter AI Over the past few years, tech companies and researchers all over the world have been competing to advance the frontiers of artificial intelligence. With the broadening and fast-paced developments in the space of technology, it is clear that utilizing … Read more Jean Alfonso-Decena: Leading Innovation in Conversational AI and Disrupting the Philippine FinTech…
Understand the basics with a concrete example! Photo by Matthew Fournier on Unsplash When your Python code grows in size, most probably it becomes unorganised over time. Keeping your code in the same file as it grows makes your code difficult to maintain. At this point, Python modules and packages help you to organize and … Read more Modules and Packages in Python: Fundamentals for Data Scientists
We’re going to bundle this up in a tiny Ruby project. Create our directory: $ mkdir kiba-etl && cd kiba-etl/ Add the source CSV: create a CSV file with touch phone.csv and paste in the following.
id,number
1,123.456.7891
2,222
3,303-030-3030
4,444-444-4444
5,900-000-00001
6,#1000000000
7,#9898989898
8,800-000-00000
9,999.999.9999
10,184.108.40.206.220.127.116.11.1.1
11,(112)233-4455
12,(121)212-0000
In a real situation, you might use a service like Twilio to detect if they’re real phone numbers. … Read more Build The World’s Simplest ETL (Extract, Transform, Load) Pipeline in Ruby With Kiba
Using YFinance and Plotly libraries for Stock Data Analysis Photo by Alec Favale on Unsplash In this article, I will explain how you can use YFinance, a Python library that aims to solve the problem of downloading stock data by offering a reliable, threaded, and Pythonic way to download historical market data from Yahoo! … Read more Downloading Stock Data and Representing it Visually
On an ordinary day, this line caught my attention when someone asked why ++ is not an operator in Python: If you want to know the original reason, you’ll have to either wade through old Python mailing lists or ask somebody who was there (eg. Guido) ~ By stackoverflow And this enforces … Read more Why doesn’t Python support i++ increment syntax
Factors play a crucial role in data analysis. Learn how to create, subset, and compare them. A factor refers to a statistical data type used to store categorical variables. Categorical variables belong to a limited number of categories. Continuous variables, on the other hand, can correspond to an infinite number of values. It is important … Read more Introduction to Factors in R
photo by author I recently had the opportunity to deliver a hands-on workshop on training a Keras deep learning model. This workshop was a follow-on for a session I had done for a local meetup that reviewed the content in my upcoming book for Manning Publications, Deep Learning with Structured Data. After the introductory session … Read more Cocalc vs. Colab — Which Is Better for a Hands-On Workshop?