Where do you fall on the data science distribution?

Ever feel like you destroyed a job interview and then ended up not getting the job? Or how about completely bombing a technical screen and still passing on to the next round? You’re not alone. Hiring standards are confusing at best, but that still raises the question: how do you know how well you’re performing in …

America’s Clustered Consensus

Whatever happened to “majority rule”? A continual source of democratic frustration today is that public opinion does not seem to directly translate into public policy. For example, a large majority of Americans want to see campaign finance reform, background checks for gun ownership, and reductions in fossil fuel consumption. Yet, while overwhelming public support has …

What I Discovered About Opportunity Zones From Analyzing Half a Million Data Points

There has been a lot of buzz about Opportunity Zones recently, and understandably so; the program is the newest federal effort to create long-term investments in low-income urban and rural census tracts. Once designated as a qualified Opportunity Zone, these places are able to receive investments through Opportunity Funds, which are created specifically to invest …

Data Science of Evictions

Dramatic visualization of 2016 eviction filings by state at the National Building Museum. From April 2018 through May 2019, a gallery wing of the D.C. National Building Museum was transformed into a labyrinth of forbidding plywood structures, towering piles of shrinkwrapped home furnishings, and striking visual displays. The exhibition showcased the statistics and stories revealed …

H2O Driverless AI: Data Science without Coding

AI that does AI: develop your first model today. Anyone can be a Data Scientist. No coding required. In today’s world, being a Data Scientist is not limited to those with technical knowledge. While it is recommended and sometimes important to know a little bit of code, you …

Why I Donate All of My Book’s Proceeds to Girls Who Code

Grace Hopper, Ph.D. (Vassar Archives). Doing a small part to help close a gender gap. Few, if any, of my classmates shared my fascination with the Mark I Computer that was on display in our university’s Science Center. It is hard to blame them. Towering at 8 feet and filled with rotary switches, crystal diodes, …

Fake Face Generator Using DCGAN Model

The implementation part is broken down into a series of tasks, from loading data to defining and training adversarial networks. At the end of this section, you’ll be able to visualize the results of your trained Generator to see how it performs; your generated samples should look like fairly realistic faces with small amounts of …

Deploy A Text Generating API With Hugging Face’s DistilGPT-2

For the better part of a year, OpenAI’s GPT-2 has been one of the hottest topics in machine learning, and for good reason. The text-generating model, which initially was dubbed “too dangerous” to be released in full, is capable of producing uncanny outputs. If you haven’t seen any examples, I recommend looking at …

From Dev to Prod – All you need to know to get your Flask application running on AWS

Getting the right configurations, making sure it is secured, ensuring resource access through endpoints and having a pretty rendering, … all made easy thanks to AWS! As a machine-learning engineer, I never really faced the issue of putting my algorithms out there myself. Well, that was until recently, when I decided to start …

Building a machine learning classifier model for diabetes

Based on medical diagnostic measurements. Python code is available (https://github.com/JNYH/diabetes_classifier). The Pima Indians of Arizona and Mexico have the highest reported prevalence of diabetes of any population in the world. A small study has been conducted to analyse their medical records to assess whether it is possible to predict the onset of diabetes based on …

AiPM

So what? The cost scales exponentially and unpredictably. The example we shared is just to manage one model, for one business line, and for one model cycle (a different issue may happen in the future). Now, imagine scaling this process to hundreds of models for multiple business units and functions. The bottom line: companies cannot …

How Spotify Recommends Your New Favorite Artist

A story of data, taste, and a very effective recommender system. Just a few short days ago, I was discussing the impact of recommender systems with some students on a course I’m teaching. Netflix, Amazon, Facebook, and many other online services use our data to suggest other products we might like. Is this helpful, or …

What do campaign contributions tell us about the federal election?

With Canada’s 43rd Federal Election not too far in the rearview mirror, we at ThinkData Works were curious as to what we can learn about our most recent election by stepping back from the punditry and analyzing some data. After all, using government data is a great way to understand how our government works. There …

Quickly Build and Deploy an Application with Streamlit

With the launch of Streamlit, developing a dashboard for your machine learning solution has been made incredibly easy. Streamlit is an open source app framework specifically designed for ML engineers working with Python. It allows you to create a stunning-looking application with only a few lines of code. I want to take this opportunity …

The hardest question you’ve been asked in a data science interview

What’s the most difficult question you ever encountered in a data science interview? I’ll share mine: “How many years of experience do you have in language X?” This is really hard to answer: Do I count the years I used it in academia? Do I count the years I used it in my hobby projects? …

Take your Machine Learning Models to Production with these 5 simple steps

I have created this impressive ML model; it gives 90% accuracy, but it takes around 10 seconds to fetch a prediction. Is that acceptable? For some use-cases maybe, but really no. In the past, there have been many Kaggle competitions whose winners ended up creating monster ensembles to take the top spots on the leaderboard. …

Full Stack Development Tutorial: Serverless REST API running on AWS Lambda

Serverless computing is a cloud-computing execution model in which the cloud provider runs the server, and dynamically manages the allocation of machine resources. Pricing is based on the actual amount of resources consumed by an application, rather than on pre-purchased units of capacity. Source: Wikipedia. (This is the second …

How I Use AI Across One of My Favorite Hobbies — Photography

Neural Networks for labeling, compression, effects and more! You can read the article and follow along with the code in the repo: Poseyy/AI-Photography (github.com). An obvious application of AI to photography is …

Online Marketing Measurement: Which Half?

Ads are a constant presence on today’s internet. They power Google and Facebook and follow us everywhere. As with all marketing spend, they’re an investment. As with most investments, it’s crucial to measure their return. What makes online marketing different is the unprecedented possibility of building accurate measurement tools. In this post I’ll describe a …

How to Identify Hotel Deals — Using Machine Learning

Web Scraping: I used BeautifulSoup and Selenium in parallel to scrape 3 months of hotel listing information from Hotel.com. The information I scraped included the check-in and check-out dates, number of adults and children, distance to city and convention centers, hotel addresses, hotel reviews and ratings, TripAdvisor’s ratings and reviews, hotel amenities, and …

STL decomposition : How to do it from Scratch?

Figure out what STL decomposition is and how it works. This article will help you understand what STL decomposition is and how to do it from scratch. At the end, I will also use the statsmodels library to get the results in seconds. So, STL stands for Seasonal and Trend decomposition using Loess. This is a …

Managerial Analytics and Data Science

Previously, we learned about two general areas of machine learning: Supervised and Unsupervised learning. Here, we’ll investigate two special fields of machine learning: time series prediction and natural language processing. Time Series Forecasting: Time series forecasting refers to any type of supervised Machine Learning where time is an important feature. A good time series forecast …

Enter Analytics: From Boot Camp to working in Data Science

We covered a lot in a short amount of time… almost too many topics, actually. Just when you start getting comfortable and ready to do more advanced things, they change the topic. It is really up to you to decide what direction you want to take things outside of the classroom. For example, I am …

Teaching A Computer To Land On The Moon

I spent a fair amount of time last year catching up on what’s happening in machine learning. The tools available now are really impressive: you can implement a complex neural net in just a few lines of code with the libraries that are available. I’ve always been fascinated by the idea of machines …

Few tips you can use while collecting data

1. Organize your scripts. I keep each script that executes a specific task separate to keep my Jupyter Notebook clean and neat… It is important to stay organized while collecting data; that makes it easy to find out where your mistakes are. Writing everything as one block of code wouldn’t help. I propose two habits to develop: keep the code commented …

Beginner’s Guide to Encoding Data

As you can see, the Book_Table column has been encoded into numerical values of 0/1. The output of le.fit_transform(df["Book_Table"]) is an array of integer codes, one per row. Binary columns (such as Book_Table from df) are mostly encoded using Label Encoder. For multiclass columns it will give different values (0 to n_classes-1) for different classes, e.g. 0, 1, 2, 3, …, n-1; which are …
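
To make the 0 to n_classes-1 mapping concrete, here is a minimal pure-Python sketch of the idea. It assumes classes are numbered in sorted order (which is how scikit-learn's LabelEncoder assigns codes); the function name `label_encode` and the sample values are hypothetical and do not come from the original article.

```python
def label_encode(values):
    """Map each distinct value to an integer code in 0..n_classes-1.

    Classes are numbered in sorted order (an assumption of this sketch).
    Returns the encoded values and the ordered list of classes.
    """
    classes = sorted(set(values))
    code_of = {cls: code for code, cls in enumerate(classes)}
    return [code_of[v] for v in values], classes


# Hypothetical binary column, standing in for something like Book_Table
book_table = ["Yes", "No", "No", "Yes"]
codes, classes = label_encode(book_table)
print(classes)  # ['No', 'Yes']
print(codes)    # [1, 0, 0, 1]
```

The same function handles multiclass columns: three distinct values get the codes 0, 1, and 2. Note that these integers imply an ordering that the categories may not actually have.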

How to graph a Bar Chart Race and realize I don’t need one?

With hindsight bias, I can totally read the chart as saying that campaign funding doesn’t predict the performance of the candidate. But other than that, a cool animated bar graph doesn’t tell you that much going forward. After all, if we want to read a truly scientific and analytical piece, making some visualizations is far from enough. …

Building a Pseudorandom Number Generator

This giant formula can be read like this: The probability that a probabilistic polynomial-time algorithm (the class BPP) could distinguish a sequence from a real random source from one produced by a PRNG tends to zero faster than the inverse of any polynomial as the length of the seed increases. Therefore, a PRNG is an algorithm that …

Preventing Data Leakage in Your Machine Learning Model

It goes without saying that knowledge about the dataset you are working with is necessary in order to be able to perform effective analyses and develop sound models. However, what is not often said, with regard to data leakage, is that you should refrain from studying the distributions or basic statistics of your dataset until after …

A Hybrid Neural Machine Translation Model (Luong & Manning):

Luong & Manning published a recent paper entitled “Achieving Open Vocabulary Neural Machine Translation with Hybrid Word-Character Models,” the contents of which I summarize below. For a quick summary of the current state of the art in neural machine translation (NMT), you can take a look at my other post here. Currently, general word-based NMTs generate …

Exoplanets III: Habitability and Conclusion

“Two possibilities exist: either we are alone in the Universe or we are not. Both are equally terrifying.” (Arthur C. Clarke) Although the search for other planets is partly motivated by our efforts to understand their formation and to improve the understanding of our own solar system, the ultimate goal is to find extraterrestrial life. …

Exoplanets II: Interpretation of Data

Now that we’ve seen and understood the historical background, the scientific value of the research, and the implications of the discoveries, we will look at the actual data found and compiled by space missions, so that we can relate them to actual physics. By plotting graphs of the data, we can visually see the correlations …

Exoplanets I: Methods and Discoveries

Mankind has long speculated about planetary systems other than our own. Philosophers hypothesized centuries ago that our solar system was not unique; that there were in fact countless more that existed in the seemingly limitless ocean of stars. The possibility of life existing on a planet orbiting another star was not just a plausible …

How AI Will Redefine Economics

RESOLVING THE PROBLEM OF CAUSATION WITH BIG DATA: For decades, economists have based their analyses of the economy on data sets only as large as their research assistants could handle, severely limiting the scope and precision of their work. AI and machine learning will enable economists to dramatically enlarge these data sets and …

Credit Card Fraud Detection using Self Organizing FeatureMaps

What are self-organizing feature maps? A self-organizing map (SOM) is a type of artificial neural network (ANN) that is trained using unsupervised learning to produce a low-dimensional (typically two-dimensional), discretized representation of the input space of the training samples, called a map, and is therefore a method to do dimensionality reduction. Self-organizing …

Multi-Label Text Classification with XLNet

Click here for the Colab notebook accompanying this article. First, let’s install the necessary library, which is just transformers. Next, we import the necessary libraries. Check if the GPU is available. Mount your Google Drive to your Colab notebook. For our example, we will create a Data folder in our Google Drive and put the datasets …

Nuances in the usage of Word Embeddings: Semantic and Syntactic Relationships

Note: Super short post ahead. Just food for thought I guess? 🙂 In the past weeks, I’ve been writing about Word Embeddings. How I created word embeddings from scratch for a colloquial language such as Singlish, and how I augmented it to handle misspellings or out-of-vocabulary words with translation vectors. In the latter article, I …

The ‘Ingredients’ of Machine Learning Algorithms

The components that most machine learning algorithms have in common. What’s a cost function, optimization, a model, or an algorithm? The esoteric nuances of machine learning algorithms and terminology can easily overwhelm the machine learning novice. As I was reading the Deep Learning book by Yoshua Bengio, Aaron Courville, …

Paper review: DenseNet -Densely Connected Convolutional Networks

CVPR 2017, Best Paper Award winner. Dense connections. “Simple models and a lot of data trump more elaborate models based on less data.” (Peter Norvig) ‘Densely Connected Convolutional Networks’ received the Best Paper Award at the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017. The paper can be read here. The …

How to visualize data on top of a map in python using the geoviews library

For the purposes of this tutorial, we are going to make a plot to visualize the passenger volume for the busiest airports in my country, Greece, and the neighboring country, Turkey, for comparison. First, we need to import the libraries and the methods we are about to use. import pandas as pd; import numpy as …

A brief intro to the Central Limit Theorem

According to Wikipedia: “In probability theory, the central limit theorem (CLT) establishes that, in some situations, when independent random variables are added, their properly normalized sum tends toward a normal distribution (informally a ‘bell curve’) even if the original variables themselves are not normally distributed.” Translation: If you take enough samples from a population, the …
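
The theorem can be checked with a small simulation. The sketch below is only an illustration under stated assumptions: it draws from a uniform distribution on [0, 1) (clearly not normal), and the sample size of 30, the 2,000 trials, and the name `sample_means` are arbitrary choices, not taken from the original post.

```python
import random
import statistics


def sample_means(n, trials, seed=0):
    """Return `trials` means, each computed from `n` uniform(0, 1) draws."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        statistics.fmean(rng.random() for _ in range(n))
        for _ in range(trials)
    ]


n = 30
means = sample_means(n=n, trials=2000)

# Theory: the means centre on 0.5 with standard deviation
# sqrt(1/12) / sqrt(n), which is about 0.053 for n = 30.
expected_sd = (1 / 12) ** 0.5 / n ** 0.5
observed_mean = statistics.fmean(means)
observed_sd = statistics.stdev(means)

# For a normal distribution, roughly 68% of values lie within one
# standard deviation of the mean.
within_one_sd = sum(
    abs(m - observed_mean) <= observed_sd for m in means
) / len(means)

print(observed_mean, observed_sd, expected_sd, within_one_sd)
```

Even though each individual draw is uniform, the means cluster around 0.5 in a bell shape, and their spread shrinks as the sample size grows.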

Using TF-IDF to form descriptive chapter summaries via keyword extraction.

TF-IDF is a natural language processing technique useful for the extraction of important keywords within a set of documents or chapters. The acronym stands for “term frequency-inverse document frequency” and describes how the algorithm works. The dataset: As our dataset, we shall take the text of Mary Shelley’s Frankenstein (provided by Project …
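
As a rough illustration of how the acronym translates into a computation, here is a minimal pure-Python sketch. It assumes one common weighting, tf × log(N / df), with whitespace tokenization; the original article may use a different variant or a library implementation. The function names and the three short "chapters" are hypothetical stand-ins, not text from Frankenstein.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: score} dict per document, using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


def top_keywords(documents, k=3):
    """Pick the k highest-scoring terms per document (ties broken A to Z)."""
    return [
        sorted(s, key=lambda term: (-s[term], term))[:k]
        for s in tf_idf(documents)
    ]


# Hypothetical mini "chapters" for illustration only
chapters = [
    "the creature fled across the ice",
    "the letter reached the sister",
    "the creature spoke of the cottage",
]
print(top_keywords(chapters, k=2))
```

A word such as "the", which appears in every chapter, scores zero because log(N / df) is log(1), while words unique to one chapter score highest. That is why the top-ranked terms tend to work as chapter keywords.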

The Easy Way to Extend Pandas API

In this article, you’ll learn how to tailor the pandas API to your business, research, or personal workflow by using pandas_flavor. Pandas-flavor is a library that introduces an API for extending pandas. This API handles the boilerplate code for registering custom accessors onto pandas objects. There are plenty of examples of extensions in the wild including: …

Cleaning Web-Scraped Data with Pandas (Part II)

As I mentioned in my previous post, cleaning data is a prerequisite to machine learning. Measuring the sanity of your data can also give you a good indication of how precise or accurate your model would be. When it comes to web-scraped data, you would often lose a lot of information in the process of …

Utilize Your Self-Imposed Deadlines | Punch Today in The Face

The art of creating self-imposed deadlines is crucial not only to go above and beyond meeting requirements, but also to make our progress toward small and big goals a lot smoother. Love them or hate them, deadlines are incredibly motivational! However, here are two things we should try to avoid when dealing …