Data Collection: There are three datasets we’re using to run this experiment: (1) a dataset we’ll collect ourselves that includes over 3,400 song lyrics from 1970 to 2018; (2) a list of prohibited/restricted words from www.freewebheaders.com that we’ll use to assess the perceived level of profanity in lyrics; and (3) a training dataset from Kaggle (originally used for the … Read more: 49 Years of Lyrics: Why so Angry?
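One simple way to turn a prohibited-word list into a profanity measure is to score each lyric by the share of its tokens that appear on the list. The sketch below assumes exactly that scoring. The tokenizer, the function name `profanity_ratio`, and the two-word placeholder list are all hypothetical and are not the article's actual method.

```python
import re

# Hypothetical placeholder list; the real list would come from www.freewebheaders.com.
PROHIBITED = {"damn", "hell"}

def profanity_ratio(lyrics, prohibited):
    """Return the fraction of word tokens in `lyrics` that appear in `prohibited`."""
    # Lowercase, then treat runs of letters/apostrophes as tokens.
    tokens = re.findall(r"[a-z']+", lyrics.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in prohibited)
    return hits / len(tokens)

print(profanity_ratio("What the hell, damn it all", PROHIBITED))
```

A ratio, rather than a raw count, keeps long and short songs comparable across decades.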
For anyone who has been paying attention, it will not have gone unnoticed that the past year has seen a dramatic expansion in the use of face recognition technology, including at schools, border crossings, and interactions with the police. Most recently, Delta announced that some passengers in Atlanta will be able to check in and … Read more: On the Perils of Automated Face Recognition
Explore further the world of data science, machine learning, and artificial intelligence. We are on a mission to get the best content on data science, machine learning, and artificial intelligence out to everyone. One of the challenges with any content platform on the internet is having a dedicated and curated list of resources … Read more: Our Collections
Article jointly written by Arthur Pesah and Antoine Wehenkel. Motivation: There are usually two ways of coming up with a new scientific theory: (1) starting from first principles, deducing the consequent laws, and coming up with experimental predictions in order to verify the theory; (2) starting from experiments and inferring the simplest laws that explain your data. … Read more: Improve your scientific models with meta-learning and likelihood-free inference
How to conceptualize and implement effective data science projects. Results, not hype. Motivation: The more I delve into data science, the more convinced I am that companies and data science practitioners must have a clear view of how to cut through the machine learning and AI hype to implement an effective data science strategy that drives business … Read more: 6 uncommon principles for effective data sciences
Using Keras and TensorFlow.js to classify seven types of skin lesions. Alex Yu, Dec 16. After researching convolutional neural networks, I became interested in developing an end-to-end machine learning solution. I decided to use the HAM10000 dataset to build a web app to classify skin lesions. In this article, I’ll provide some background information … Read more: Building a Skin Lesion Classification Web App
Learn how to deal with anxiety. When you start researching how to become a data scientist, you will discover an unfortunate fact about the profession: becoming a data scientist requires knowledge of a broad and deep set of tools, technologies, and skills. All of which makes the prospect of becoming a data scientist VERY … Read more: How to Learn Data Science: Staying Motivated.
Advanced NLP techniques for deep learning. With the problem of image classification more or less solved by deep learning, text classification is the next developing theme in deep learning. For those who don’t know, text classification is a common task in natural language processing that transforms a sequence of text of indefinite length … Read more: What Kagglers are using for Text Classification
Dec 16, 2018. Source: Dark Reading. Background: Our project was inspired by Jamie Ryan Kiros, who created a model trained on 14 million romance passages to generate a short romantic story for a single image input. Similarly, the ultimate goal of our project was to output a short story for children. “neural-storyteller is a recurrent neural … Read more: Is a Picture Worth A Thousand Words?
Opening up a Colab Notebook: When using Colab for the first time, you can launch a new notebook here: Once you have created a notebook, it will be saved in your Google Drive (Colab Notebooks folder). You can access it by visiting your Google Drive page, then either double-clicking the file name or right-clicking it and then … Read more: Getting Started with TensorFlow in Google Colaboratory
Flask API, Document Classification, Spam Filter. So far, we have developed many machine learning models, generated numeric predictions on the testing data, and tested the results, all offline. In reality, generating predictions is only part of a machine learning project, although in my opinion it is the most important part. Considering a system … Read more: Develop a NLP Model in Python & Deploy It with Flask, Step by Step
The origin of logic theory starts at the concept of an argument. Most logic textbooks open with a central definition of an argument, one that likely sounds much like the following: An argument contains one or more special statements, called premises, offered as a reason to believe that a further statement, called the conclusion, … Read more: Logic Theory —Basic Notation
Yes, SQL still exists. During years of working with telecom data, my folder of code snippets collected a lot of reusable examples. And I don’t mean “SELECT * FROM Table1”. I am talking about finding and handling or removing duplicate values, selecting the top N values from each group of data within the same table, shuffling … Read more: Advanced Queries With SQL That Will Save Your Time
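Two of the tasks named above, removing duplicates and selecting the top N rows per group, can be sketched with Python's built-in `sqlite3` module. The table, its columns, and the rows are hypothetical and only loosely telecom-flavoured. The article's own queries may differ. The top-N query uses the standard `ROW_NUMBER() OVER (PARTITION BY …)` window function, which needs SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table of call minutes per subscriber and region (note the duplicate 'e' row).
conn.executescript("""
CREATE TABLE calls (subscriber TEXT, region TEXT, minutes INTEGER);
INSERT INTO calls VALUES
  ('a', 'north', 120), ('b', 'north', 300), ('c', 'north', 90),
  ('d', 'south', 50),  ('e', 'south', 400), ('e', 'south', 400);
""")

# Remove exact duplicate rows, keeping the copy with the lowest rowid.
conn.execute("""
DELETE FROM calls
WHERE rowid NOT IN (
  SELECT MIN(rowid) FROM calls GROUP BY subscriber, region, minutes
)
""")

# Top-N subscribers by minutes within each region.
TOP_N_SQL = """
SELECT region, subscriber, minutes FROM (
  SELECT region, subscriber, minutes,
         ROW_NUMBER() OVER (PARTITION BY region ORDER BY minutes DESC) AS rn
  FROM calls
)
WHERE rn <= ?
ORDER BY region, minutes DESC
"""
rows = conn.execute(TOP_N_SQL, (2,)).fetchall()
print(rows)
```

`ROW_NUMBER` breaks ties arbitrarily. `RANK()` or `DENSE_RANK()` would keep tied rows together if that matters for your data.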
Dec 16, 2018. Art of Generative Adversarial Networks. Code link for all the work mentioned in the post: We had the pleasure of working on a generative adversarial network project as our final project for Business Data Science in our curriculum. Though we could have chosen any other subject for our final project, we went … Read more: Art of Generative Adversarial Networks (GAN)
Progressively growing GANs enables them to get bigger and more stable. The people in the high-resolution images above may look real, but they are not; they were synthesized by a ProGAN trained on millions of celebrity images. “ProGAN” is the colloquial term for a type of generative adversarial network that was pioneered at NVIDIA. It … Read more: ProGAN: How NVIDIA Generated Images of Unprecedented Quality
How to take a map visualization to the next level. First, I will cover two reasons why visualizing data using maps is often compelling to an audience. Then, I will cover three tips that will help you make the transition from good to exceptional when building map visualizations. Why Use a Map for a Data Visualization? … Read more: Ways to Improve a Map Visualization
Inorganic knowledge traditions with model-based reinforcement learning. This essay explores the concept of inorganic knowledge traditions capable of sequential improvement using model-based reinforcement learning. Many behavioral economists presently believe that there are two primary methods used by humans for strategic decision making. One is fast, intuitive, and unconscious: what has been called System 1 thinking. … Read more: Robots that Reason
The realty profession is moving into the 21st century, and, as you can imagine, home listings are flooding the internet. If you have ever looked at buying a home, renting an apartment, or just wanted to see what the most expensive home in town is (we have all been there), then chances are you … Read more: Simple House Price Predictor using ML through TensorFlow in Python
3. Model Building in R. I have used a dataset that contains the details of 2,201 flights. The variables are described below. 3.1) Datasets: schedtime: the scheduled time of departure (using the 24-hour clock); carrier: the two-letter code indicating which airline operated the flight; deptime: the actual departure time; dest: the three-letter code … Read more: Regression Analysis: Linear Regression
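For a single predictor, linear regression has a closed-form least-squares solution: the slope is the covariance of x and y divided by the variance of x, and the intercept makes the line pass through the means. The excerpt does not show which variables the article regresses, and the article itself works in R. The Python sketch below therefore uses hypothetical toy numbers: departure times converted to minutes after midnight. The helper name `fit_simple_ols` is also ours.

```python
def fit_simple_ols(x, y):
    """Least-squares fit of y = intercept + slope * x for one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    # The fitted line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical toy data: scheduled vs. actual departure, in minutes after midnight.
scheduled = [600, 720, 900, 1080]
actual = [610, 735, 920, 1105]
intercept, slope = fit_simple_ols(scheduled, actual)
print(f"actual approx {intercept:.2f} + {slope:.4f} * scheduled")
```

With several predictors, the same idea generalizes to the normal equations, which is what R's `lm()` solves (via a QR decomposition).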
As newcomers to machine learning, most people get excited when their training error starts decreasing. They try harder, it decreases even further, and their excitement knows no bounds. They show their results to Master Oogway (the wise elderly tortoise in Kung Fu Panda), and he calmly says: well, not a good model you … Read more: What’s the fuss about Regularization?
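Regularization counters the "training error keeps falling" trap by adding a penalty on weight size to the loss. A minimal sketch, assuming L2 (ridge) regularization with one feature and no intercept: minimizing sum((y - w*x)**2) + lam * w**2 gives w = sum(x*y) / (sum(x**2) + lam). Larger `lam` shrinks the weight toward zero. The excerpt does not say which penalty the article uses, so treat ridge as one common example. The function name is hypothetical.

```python
def ridge_weight(x, y, lam):
    """Closed-form L2-regularized slope for y ~ w * x (no intercept).

    Setting the derivative of sum((y - w*x)**2) + lam * w**2 to zero gives
    w = sum(x*y) / (sum(x**2) + lam).
    """
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + lam)

x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]  # exactly y = 2x
for lam in (0.0, 1.0, 10.0):
    # The weight shrinks from 2.0 toward 0 as the penalty grows.
    print(lam, ridge_weight(x, y, lam))
```

The penalty deliberately accepts a slightly worse training fit in exchange for a model that is less likely to overfit.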
Project by David Chen, Ashwin Gupta, Shruthi Krish, Raghav Prakash, and Wei Wang. Twitter is a social media platform that millions of users use to share updates about their lives. Often, these tweets are about local events happening around the user. Though news agencies report on local events, the time it takes an agency to learn … Read more: Finding Local Events Using Twitter Data
Exploring an important healthcare performance metric. Project Overview: Predictive analytics is an increasingly important tool in the healthcare field, since modern machine learning (ML) methods can use large amounts of available data to predict individual outcomes for patients. For example, ML predictions can help healthcare providers determine likelihoods of disease, … Read more: Predicting hospital length-of-stay at time of admission
Using TensorFlow Probability for Hamiltonian sampling. One criticism I received of the previous work on project estimation is that the log-normal distribution has short tails. And this is true, despite all the benefits of the log-normal distribution. The reason is very simple: when fitting the data to the distribution shape … Read more: Using Markov Chain Monte Carlo method for project estimation
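The article samples with Hamiltonian Monte Carlo in TensorFlow Probability. As a dependency-free illustration of the general MCMC idea, the sketch below swaps in plain random-walk Metropolis, a simpler sampler than Hamiltonian MC. The target is a log-normal with hypothetical parameters mu = 0 and sigma = 0.5. The key property is that only an unnormalized log-density is needed. That makes it easy to plug in a heavier-tailed distribution instead, which is the motivation the excerpt describes.

```python
import math
import random

def lognormal_logpdf(x, mu=0.0, sigma=0.5):
    """Unnormalized log-density of a log-normal distribution (zero density for x <= 0)."""
    if x <= 0:
        return -math.inf
    log_x = math.log(x)
    return -log_x - (log_x - mu) ** 2 / (2 * sigma ** 2)

def metropolis(logpdf, x0, n_samples, step, burn_in=1000, seed=0):
    """Random-walk Metropolis sampler with a symmetric Gaussian proposal."""
    rng = random.Random(seed)
    x, logp = x0, logpdf(x0)
    samples = []
    for i in range(burn_in + n_samples):
        proposal = x + rng.gauss(0.0, step)
        logp_new = logpdf(proposal)
        # Accept with probability min(1, p(proposal) / p(x)); 1 - random() lies in (0, 1].
        if math.log(1.0 - rng.random()) < logp_new - logp:
            x, logp = proposal, logp_new
        if i >= burn_in:
            samples.append(x)
    return samples

samples = metropolis(lognormal_logpdf, x0=1.0, n_samples=20000, step=0.5)
mean_log = sum(math.log(s) for s in samples) / len(samples)
print(round(mean_log, 3))  # should land near mu = 0.0
```

Hamiltonian MC replaces the blind random-walk proposal with gradient-guided trajectories, which typically explores high-dimensional posteriors far more efficiently.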
This article focuses on how to use a popular open-source database, InfluxDB, along with Spark Structured Streaming to process, store, and visualize data in real time. Here, we will go into detail on how to set up a single-node instance of InfluxDB, how to extend Spark’s ForeachWriter to use it to … Read more: Processing Time Series Data in Real-Time with InfluxDB and Structured Streaming
The law has language at its heart, so it’s not surprising that software that operates on natural language has played a role in some areas of the legal profession for a long time. But the last few years have seen an increased interest in applying modern techniques to a wider range of problems, so this … Read more: Law and Word Order: NLP in Legal Tech
Keras is a high-level API for building and running neural networks. It runs on top of TensorFlow, which was developed by Google. In this particular example, a neural network will be built in Keras to solve a regression problem, i.e. one where our dependent variable (y) is in interval format and we are trying to … Read more: Keras with R: Predicting car sales
I tried my hand at using the R package randomForest to create two regression models for tree height and basal area based on some lidar and field-collected data in the Finger Lakes National Forest, NY. Disclaimer: this project was my first real taste of R. Earlier in the semester I had done some simple learning into … Read more: Modeling tree height and basal area in the Finger Lakes National Forest, NY
Generate characters from Alice in Wonderland. Introduction: Text generation is a popular problem in data science and machine learning, and it is a suitable task for recurrent neural networks. This report uses TensorFlow to build an RNN text generator and builds a high-level API in Python 3. The report is inspired by @karpathy (min-char-rnn) and … Read more: Text Generation Using RNNs
Introduction to Plotly: Plotly is a company that makes visualization tools, including a Python API library. (Plotly also makes Dash, a framework for building interactive web-based applications with Python code.) For this article, we’ll stick to working with the plotly Python library in a Jupyter Notebook and touching up images in the online plotly editor. When … Read more: Introduction to Interactive Time Series Visualizations with Plotly in Python
The first and most important step of our journey: As I have said before, we are going to simply ask questions that will guide us to build an image classifier. For the sake of brevity, we will call an image classifier an IC. Now, we are ready to start our journey. So let us ask the first question: … Read more: The Ultimate NanoBook to understand Deep Learning based Image Classifier
SRGAN results from Ledig et al. Generative adversarial networks (GANs) have found many applications in deep learning. One interesting problem that GANs can solve better than earlier approaches is super-resolution. Super-resolution is the task of upscaling images from low-resolution sizes such as 90 × 90 into high-resolution sizes such as 360 × 360. In this … Read more: Applying GANs to Super Resolution
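Going from 90 × 90 to 360 × 360 is a 4× upscale per side. Super-resolution models are usually compared against simple interpolation baselines. The sketch below shows the crudest one, nearest-neighbour upscaling, on a grid stored as a list of rows. It only copies existing pixels and invents no new detail, which is exactly the gap a GAN-based model tries to fill. The function name is hypothetical.

```python
def upscale_nearest(image, factor):
    """Upscale a 2D grid (list of rows) by an integer factor using nearest-neighbour copying."""
    return [
        # Each output pixel copies the source pixel at column c // factor.
        [row[c // factor] for c in range(len(row) * factor)]
        for row in image
        for _ in range(factor)  # repeat every source row `factor` times
    ]

low_res = [[0] * 90 for _ in range(90)]
high_res = upscale_nearest(low_res, 4)
print(len(high_res), len(high_res[0]))
```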
Artificial intelligence has been witnessing monumental growth in bridging the gap between the capabilities of humans and machines. Researchers and enthusiasts alike work on numerous aspects of the field to make amazing things happen. One of many such areas is the domain of computer vision. The agenda for this field is to enable machines … Read more: A Comprehensive Guide to Convolutional Neural Networks — the ELI5 way
Skewed datasets are not uncommon, and they are tough to handle. Standard classification models and techniques often fail miserably when presented with such a problem. Although your model could reach even 99% accuracy in such cases, if you measure it against a sensible metric such as the ROC AUC score, … Read more: Dealing With Class Imbalanced Datasets For Classification.
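The 99%-accuracy trap is easy to reproduce. With 990 negatives and 10 positives, a degenerate model that always predicts the majority class scores 99% accuracy. Because it gives every example the same score, its ROC AUC is 0.5, no better than chance. The sketch computes AUC directly as the probability that a randomly chosen positive outranks a randomly chosen negative, with ties counting as half. The data and helper names are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def roc_auc(y_true, scores):
    """ROC AUC as P(score of random positive > score of random negative); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# 990 negatives and 10 positives: a 99:1 imbalance.
y_true = [0] * 990 + [1] * 10
# A degenerate "model" that always predicts the majority class with the same score.
y_pred = [0] * 1000
scores = [0.0] * 1000
print(accuracy(y_true, y_pred))  # 0.99
print(roc_auc(y_true, scores))   # 0.5
```

The pairwise loop is quadratic and only suited to small examples. Rank-based formulas compute the same quantity in O(n log n).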
A brief introduction to critical steps in demand forecasting. Collecting: The key here is the format of data storage. Intuitively, we think of time-series data in the format below, also known as the wide format. However, the wide format is bad for SQL-based storage: when we add new dates, we need to add another … Read more: Time-series Forecasting Flow
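The alternative to the wide format is the long format: one row per (item, date) pair. In long format, a new date adds rows rather than columns, which suits SQL tables. A minimal stdlib sketch of the wide-to-long conversion follows. The SKU/demand table and the function name are hypothetical. In pandas, `DataFrame.melt` performs the same reshaping.

```python
# Hypothetical wide-format demand table: one column per date.
wide = [
    {"sku": "A", "2018-12-01": 5, "2018-12-02": 7},
    {"sku": "B", "2018-12-01": 3, "2018-12-02": 0},
]

def wide_to_long(rows, id_col):
    """Melt wide rows into one {id, date, demand} record per (id, date) cell."""
    long_rows = []
    for row in rows:
        for col, value in row.items():
            if col != id_col:
                long_rows.append({id_col: row[id_col], "date": col, "demand": value})
    return long_rows

for record in wide_to_long(wide, "sku"):
    print(record)
```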
Image classification with 15k classes! Project by Catherine McNabb, Anuraag Mohile, Avani Sharma, Evan David, and Anisha Garg. Dealing with a large number of classes, many of which have very few images, is what makes this task really challenging! The problem comes from a famous Kaggle competition, the Google Landmark Recognition Challenge. The training set contains over 1.2 … Read more: Google Landmark Recognition using Transfer Learning
Dec 14, 2018. In this article, you will see how the theories presented in the previous two articles can be implemented in easy-to-understand Java code. The full neural network implementation can be downloaded, inspected in detail, built upon, and experimented with. This is the third part in a series of articles: I assume you … Read more: Part 3: Implementation in Java
Computer-generated imagery in movies has gotten so good these days that much of the time you don’t even realize it’s there. You probably never noticed how Michael Cera’s physique had been altered, or how Lost in Translation used motion capture technology from the future. That’s all from the blog team for this week. Have … Read more: Because it’s Friday: CGI you never knew was CGI
Anime Obsession gone too far!! Otakus: Henry Chang, Joey Chen, Guanhua Zhang, Preetika Srivastava, and Cherry Agarwal. The vast amount of data hosted on the internet today has led to information overload, and thus there is a constant need to improve the user experience. A recommendation engine is a system that helps support … Read more: Anime Recommendation engine: From Matrix Factorization to Learning-to-rank
Find specific elements in the page. The created BeautifulSoup object can now be used to find elements in the HTML. When we inspected the website, we saw that every list item in the content section has a class that starts with tocsection-, and we can use BeautifulSoup’s find_all method to find all list items with that … Read more: Introduction to Web Scraping with BeautifulSoup
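The matching rule described above ("a list item whose class starts with tocsection-") is illustrated below without a third-party install. The sketch swaps in the standard library's `html.parser.HTMLParser` instead of BeautifulSoup, so it is not the article's code. The sample markup is hypothetical. One detail worth noting: an element's `class` attribute can hold several space-separated classes, so the sketch tests each class token rather than the raw attribute string.

```python
from html.parser import HTMLParser

class TocSectionFinder(HTMLParser):
    """Collect class tokens starting with 'tocsection-' from <li> start tags."""

    def __init__(self):
        super().__init__()
        self.sections = []

    def handle_starttag(self, tag, attrs):
        if tag != "li":
            return
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        self.sections.extend(c for c in classes if c.startswith("tocsection-"))

# Hypothetical markup with nested list items and one non-matching item.
html = (
    '<ul><li class="toclevel-1 tocsection-1"><a href="#History">History</a>'
    '<ul><li class="toclevel-2 tocsection-2"><a href="#Early">Early</a></li></ul></li>'
    '<li class="toclevel-1 tocsection-3"><a href="#Use">Use</a></li>'
    '<li class="other">x</li></ul>'
)
finder = TocSectionFinder()
finder.feed(html)
print(finder.sections)
```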
Why cleaning data is the most important step. Original Project Mission: Find interesting insights to see where the remodeling market is headed. Project Mission (Twist): How to handle well-manicured Excel data in Python (spoiler: neat is a deceptive word). Timeline: One week (I tell you, it’s not enough!). Project Findings for the Original Goal: … Read more: Home Remodeling Analysis Turned Excel Data Handling in Python
Every author dreams of writing full time, but the sad truth is that the majority of authors don’t make nearly enough to support themselves, let alone a family. If you hit the NY Times Best Seller list, your chances of making writing your career will be much higher, especially if it stays on the list … Read more: Finding Trends in NY Times Best Sellers
A recurring subject in NLP is understanding large corpora of text through topic extraction. Whether you analyze users’ online reviews, product descriptions, or text entered in search bars, understanding key topics will always come in handy. Before going into the LDA method, let me remind you that not reinventing the … Read more: The complete guide for topics extraction with LDA (Latent Dirichlet Allocation) in Python
The first thing I like to do when doing EDA on a dataset with a reasonable number of numeric columns is to check the relationship between my target variable and these numeric features. One quick way to do this is to use a seaborn heatmap. This seaborn heatmap takes the correlation matrix calculated on … Read more: Exploratory Data Analysis, Feature Engineering and Modelling using Supermarket Sales Data. Part 1.
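A heatmap of the correlation matrix is one view. The target's row of that matrix, sorted by absolute value, is often the quickest check on its own. The stdlib sketch below computes Pearson correlations between each numeric column and the target. The column names and values are hypothetical, not the supermarket dataset. In pandas, `DataFrame.corr()` produces the full matrix that a seaborn heatmap would display.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)  # assumes neither column is constant

def correlations_with_target(columns, target):
    """Correlate every other column with `target`, strongest (by |r|) first."""
    y = columns[target]
    results = [(name, pearson(xs, y)) for name, xs in columns.items() if name != target]
    return sorted(results, key=lambda item: abs(item[1]), reverse=True)

# Hypothetical numeric columns.
columns = {
    "sales": [1.0, 2.0, 3.0, 4.0],
    "basket_size": [2.0, 4.0, 6.0, 8.0],  # rises with sales
    "discount": [4.0, 3.0, 2.0, 1.0],     # falls as sales rise
    "weekday": [1.0, 3.0, 3.0, 1.0],      # unrelated to sales
}
for name, r in correlations_with_target(columns, "sales"):
    print(f"{name}: {r:+.3f}")
```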
Tf-idf vectors with word-embeddings are analyzed for clustering effectiveness. The text corpus examples considered here indicate that custom word-embeddings can help improve the clusterability of the corpus. That is welcome news after our ho-hum results for text classification when using word-embeddings. In the context of classification, we concluded that keeping it simple with naive Bayes and tf-idf … Read more: Want to Cluster Text? Try Custom Word-Embeddings!
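Clustering text usually starts from document vectors and a similarity measure between them. The sketch below builds plain tf-idf vectors (term frequency times log(N / document frequency), one common idf variant; libraries often use smoothed versions) and compares documents with cosine similarity. It covers only the tf-idf baseline. The article's word-embedding variant is not reproduced, and the three documents are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: (c / len(tokens)) * math.log(n_docs / df[t]) for t, c in tf.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

docs = [
    "stocks fell as markets slid",
    "markets rallied and stocks rose",
    "the team won the football match",
]
vecs = tfidf_vectors(docs)
# The two finance headlines share terms; the sports one shares none with them.
print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

Because tf-idf only matches identical words, "fell" and "slid" count as unrelated. That limitation is what embedding-based vectors aim to address.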
3rd Wave Data Visualization, by Elijah Meeks (12 min read). Imagine what it was like to do data visualization 30 years ago. It’s 1988 and you’re using Excel 2.0 for simple charts like pie charts and line charts, or maybe something like SPSS for more complicated exploration and Arc/Info for geospatial data visualization.
Executing standard SQL queries on your Amazon S3 bucket files. Dec 14, 2018. “What’s Amazon Athena?” I hear you ask. Good question. It’s one of the services Amazon Web Services offers for building architecture in the cloud. More specifically, Athena allows us to query data we hold in another service called Amazon Simple Storage Service (S3) using standard SQL … Read more: CSV Analysis with Amazon Athena
“The one-way analysis of variance (ANOVA) is used to determine whether there are any statistically significant differences between the means of two or more independent (unrelated) groups (although you tend to only see it used when there are a minimum of three, rather than two groups)”. Having entered the world of digital analytics from a … Read more: Using Analysis of Variance with Experimentation Data
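The one-way ANOVA test statistic compares spread between group means with spread within groups: F = (SS_between / (k - 1)) / (SS_within / (n - k)) for k groups and n total observations. A large F suggests the means differ. The sketch computes only F. Turning it into a p-value needs the F distribution with (k - 1, n - k) degrees of freedom; for example, `scipy.stats.f_oneway` returns both. The group values are hypothetical.

```python
def one_way_anova_f(*groups):
    """Return the one-way ANOVA F statistic for two or more independent groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of observations around their own group mean.
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical metric values for three experiment variants.
print(one_way_anova_f([1, 2, 3], [4, 5, 6], [7, 8, 9]))
```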
Dec 14, 2018. It looks like Christmas is a little early this year 😉 Here’s a little something from me to all of you out there: a map to navigate ML services on AWS. With all the new stuff launched at re:Invent, I’m quite sure it will come in handy! This is very much work in … Read more: A map for Machine Learning on AWS
Time series forecasting using Auto-ARIMA in Python. AI and the future: Currently, there is a lot of development in artificial intelligence research aimed at getting an accurate glimpse of the future. When a mathematical model predicts future values using only the series’ own past, time-indexed observations as input, this is called time series forecasting. There are many machine learning and … Read more: Get a glimpse of future using time series forecasting using Auto-ARIMA and Artificial Intelligence
Joseph Catanzarite. The Naïve Bayes classifier is perhaps the simplest machine learning classifier to build, train, and predict with. This post will show how and why it works. Part 1 reveals that the much-celebrated Bayes Rule is just a simple statement about joint and conditional probabilities. But its blandness belies astonishing power, as we’ll see … Read more: The Naive Bayes Classifier
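Bayes Rule follows from writing a joint probability in two ways: P(c, x) = P(c | x) P(x) = P(x | c) P(c), so P(c | x) is proportional to P(x | c) P(c). Naive Bayes adds the "naive" assumption that features are conditionally independent given the class, so P(x | c) becomes a product of per-feature terms.

The sketch below is one common variant, not necessarily the post's: a multinomial naive Bayes over words with add-one (Laplace) smoothing. It sums logs instead of multiplying probabilities to avoid underflow. The toy spam/ham documents are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels, alpha=1.0):
    """Fit multinomial naive Bayes with add-alpha smoothing; returns log priors and log likelihoods."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = doc.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    log_priors = {c: math.log(n / len(labels)) for c, n in class_counts.items()}
    log_likelihoods = {}
    for c in class_counts:
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        log_likelihoods[c] = {w: math.log((word_counts[c][w] + alpha) / total) for w in vocab}
    return log_priors, log_likelihoods

def predict_nb(model, doc):
    """Return the class maximizing log P(c) + sum of log P(w | c) over known words."""
    log_priors, log_likelihoods = model
    tokens = doc.lower().split()
    scores = {
        c: log_priors[c] + sum(log_likelihoods[c][w] for w in tokens if w in log_likelihoods[c])
        for c in log_priors
    }
    return max(scores, key=scores.get)

docs = ["win cash now", "cheap cash prize", "meeting at noon", "lunch meeting today"]
labels = ["spam", "spam", "ham", "ham"]
model = train_nb(docs, labels)
print(predict_nb(model, "cash prize now"))
print(predict_nb(model, "meeting today"))
```

Words never seen in training are simply skipped at prediction time. Smoothing keeps a seen-in-one-class word from driving the other class's probability to zero.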