Where do you fall on the data science distribution?

Ever feel like you destroyed a job interview and then ended up not getting the job? Or how about completely bombing a technical screen and still passing on to the next round? You’re not alone. Hiring standards are confusing at best, but that still raises the question: how do you know how well you’re performing in …

America’s Clustered Consensus

Whatever happened to “majority rule”? A continual source of democratic frustration today is that public opinion does not seem to directly translate into public policy. For example, a large majority of Americans want to see campaign finance reform, background checks for gun ownership, and reductions in fossil fuel consumption. Yet, while overwhelming public support has …

What I Discovered About Opportunity Zones From Analyzing Half a Million Data Points

There has been a lot of buzz about Opportunity Zones recently, and understandably so: the program is the newest federal effort to create long-term investments in low-income urban and rural census tracts. Once designated as qualified Opportunity Zones, these places are able to receive investments through Opportunity Funds, which are created specifically to invest …

Data Science of Evictions

Dramatic visualization of 2016 eviction filings by state at the National Building Museum. From April 2018 through May 2019, a gallery wing of the D.C. National Building Museum was transformed into a labyrinth of forbidding plywood structures, towering piles of shrinkwrapped home furnishings, and striking visual displays. The exhibition showcased the statistics and stories revealed …

H2O Driverless AI: Data Science without Coding

AI that does AI: develop your first model today. Anyone can be a Data Scientist. No coding required. In today’s world, being a Data Scientist is not limited to those with technical knowledge. While it is recommended and sometimes important to know a little bit of code, you …

An API for @racently

[This article was first published on R | datawookie, and kindly contributed to R-bloggers]. @racently is a side project that I have been nursing along …

Why I Donate All of My Book’s Proceeds to Girls Who Code

Grace Hopper, Ph.D. (Vassar Archives). Doing a small part to help close a gender gap. Few, if any, of my classmates shared my fascination with the Mark I Computer that was on display in our university’s Science Center. It is hard to blame them. Towering at 8 feet and filled with rotary switches, crystal diodes, …

Fake Face Generator Using DCGAN Model

The implementation part is broken down into a series of tasks, from loading data to defining and training adversarial networks. At the end of this section, you’ll be able to visualize the results of your trained Generator to see how it performs; your generated samples should look like fairly realistic faces with small amounts of …

Deploy A Text Generating API With Hugging Face’s DistilGPT-2

For the better part of a year, OpenAI’s GPT-2 has been one of the hottest topics in machine learning, and for good reason. The text-generating model, which was initially dubbed “too dangerous” to be released in full, is capable of producing uncanny outputs. If you haven’t seen any examples, I recommend looking at …

From Dev to Prod – All you need to know to get your Flask application running on AWS

Getting the right configurations, making sure it is secured, ensuring resource access through endpoints and having a pretty rendering, … all of them made easy thanks to AWS! As a machine-learning engineer, I never really faced the issue of putting my algorithms out there myself. Well, that was until recently, when I decided to start …

Building a machine learning classifier model for diabetes

Based on medical diagnostic measurements (Python code is available at https://github.com/JNYH/diabetes_classifier). The Pima Indians of Arizona and Mexico have the highest reported prevalence of diabetes of any population in the world. A small study has been conducted to analyse their medical records to assess if it is possible to predict the onset of diabetes based on …

AiPM

So what? The cost scales exponentially and unpredictably. The example we shared is just to manage one model, for one business line, and for one model cycle (a different issue may happen in the future). Now, imagine scaling this process to hundreds of models for multiple business units and functions. The bottom line: companies cannot …

How Spotify Recommends Your New Favorite Artist

A story of data, taste, and a very effective recommender system. Just a few short days ago, I was discussing the impact of recommender systems with some students on a course I’m teaching. Netflix, Amazon, Facebook, and many other online services use our data to suggest other products we might like. Is this helpful, or …

Using R and H2O Isolation Forest For Data Quality

[This article was first published on R-Analytics, and kindly contributed to R-bloggers]. suppressWarnings( suppressMessages( library( h2o ) ) ) suppressWarnings( suppressMessages( library( dygraphs ) ) …

What do campaign contributions tell us about the federal election?

With Canada’s 43rd Federal Election not too far in the rearview mirror, we at ThinkData Works were curious as to what we can learn about our most recent election by stepping back from the punditry and analyzing some data. After all, using government data is a great way to understand how our government works. There …

Quickly Build and Deploy an Application with Streamlit

With the launch of Streamlit, developing a dashboard for your machine learning solution has become incredibly easy. Streamlit is an open-source app framework specifically designed for ML engineers working with Python. It allows you to create a stunning-looking application with only a few lines of code. I want to take this opportunity …

The hardest question you’ve been asked in a data science interview

What’s the most difficult question you ever encountered in a data science interview? I’ll share mine: “How many years of experience do you have in language X?” This is really hard to answer: Do I count the years I used it in academia? Do I count the years I used it in my hobby projects? …

Take your Machine Learning Models to Production with these 5 simple steps

I have created this impressive ML model, it gives 90% accuracy, but it takes around 10 seconds to fetch a prediction. Is that acceptable? For some use-cases maybe, but really no. In the past, there have been many Kaggle competitions whose winners ended up creating monster ensembles to take the top spots on the leaderboard. …

Announcing the general availability of the new Azure HPC Cache service

If data-access challenges have been keeping you from running high-performance computing (HPC) jobs in Azure, we’ve got great news to report! The now-available Microsoft Azure HPC Cache service lets you run your most demanding workloads in Azure without the time and cost of rewriting applications and while storing data where you want to, in Azure or …

Full Stack Development Tutorial: Serverless REST API running on AWS Lambda

“Serverless computing is a cloud-computing execution model in which the cloud provider runs the server, and dynamically manages the allocation of machine resources. Pricing is based on the actual amount of resources consumed by an application, rather than on pre-purchased units of capacity.” (Wikipedia) (This is the second …

How I Use AI Across One of My Favorite Hobbies — Photography

Neural Networks for labeling, compression, effects and more! You can read the article and follow along with the code in the repo: Poseyy/AI-Photography (github.com). An obvious application of AI to photography is …

Online Marketing Measurement: Which Half?

Ads are a constant presence on today’s internet. They power Google and Facebook and follow us everywhere. As with all marketing spend, they’re an investment. As with most investments, it’s crucial to measure their return. What makes online marketing different is the unprecedented possibility of building accurate measurement tools. In this post I’ll describe a …

How to Identify Hotel Deals — Using Machine Learning

Web Scraping. I used BeautifulSoup and Selenium in parallel to scrape 3 months of hotel listing information from Hotel.com. The information I scraped included the check-in and check-out dates, number of adults and children, distance to city and convention centers, hotel addresses, hotel reviews and ratings, TripAdvisor’s ratings and reviews, hotel amenities, and …

STL decomposition : How to do it from Scratch?

Figure out what STL decomposition is and how it works. This article will help you understand what STL decomposition is and how to do it from scratch. At the end, I will also use the statsmodels library to get the results in seconds. So, STL stands for Seasonal and Trend decomposition using Loess. This is a …

Managerial Analytics and Data Science

Previously, we learned about two general areas of machine learning: Supervised and Unsupervised learning. Here, we’ll investigate two special fields of machine learning: time series prediction and natural language processing. Time Series Forecasting. Time series forecasting refers to any type of supervised Machine Learning where time is an important feature. A good time series forecast …

Enter Analytics: From Boot Camp to working in Data Science

We covered a lot in a short amount of time… almost too many topics, actually. Just when you start getting comfortable and ready to do more advanced things, they change the topic. It is really up to you to decide what direction you want to take things outside of the classroom. For example, I am …

Teaching A Computer To Land On The Moon

I spent a fair amount of time last year catching up on what’s happening in machine learning. The tools available now are really impressive: you can implement a complex neural net in just a few lines of code with the libraries that are available. I’ve always been fascinated by the idea of machines …

Few tips you can use while collecting data

1. Organize your scripts. I keep each script that executes a specific task separate to keep my Jupyter Notebook clean and neat… It is important to stay organized while collecting data; it helps you find your mistakes easily. Writing one big block of code wouldn’t help. I propose two habits to develop: keep the code commented …

Beginner’s Guide to Encoding Data

As you can see, the Book_Table column has been encoded into numerical values of 0/1. The output of le.fit_transform(df["Book_Table"]) is a Dataframe/Series depending on the number of columns encoded. Mostly binary columns (such as Book_Table from df) are encoded using Label Encoder. For multiclass columns it gives a different value (0 to n_classes-1) for each class, e.g. 0, 1, 2, 3, …, n-1; which are …
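
The mapping described in this excerpt can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the article's code: `label_encode` is a hypothetical helper written for this example, and the sample column values are made up. It assigns codes in sorted order of the classes, which is also how scikit-learn's `LabelEncoder` orders them.

```python
# Plain-Python sketch of label encoding. label_encode is a hypothetical
# helper for illustration only; it is not part of any library.
def label_encode(values):
    """Map each distinct class to an integer from 0 to n_classes - 1."""
    classes = sorted(set(values))  # sorted, so the codes are deterministic
    mapping = {cls: code for code, cls in enumerate(classes)}
    return [mapping[v] for v in values], mapping


# Binary column, like Book_Table (sample values are made up).
book_table = ["Yes", "No", "No", "Yes"]
binary_codes, binary_mapping = label_encode(book_table)

# Multiclass column: four distinct classes receive the codes 0, 1, 2, 3.
sizes = ["small", "large", "medium", "small", "xlarge"]
size_codes, size_mapping = label_encode(sizes)

print(binary_codes, binary_mapping)
print(size_codes, size_mapping)
```

Note that the integer codes imply an ordering between classes that may not exist in the data, which is one reason this scheme is mostly applied to binary columns.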

How to graph a Bar Chart Race and realize I don’t need one?

With hindsight bias, I can certainly read the chart: campaign funding doesn’t predict the performance of the candidate. But other than that, a cool animated bar graph doesn’t tell you that much going forward. After all, if we want to read a truly scientific and analytical piece, producing a visualization is far from enough. …

Building a Pseudorandom Number Generator

This giant formula can be read like this: the probability that a probabilistic polynomial-time algorithm (the class BPP) could distinguish a sequence produced by a real random source from one produced by a PRNG tends to zero faster than the inverse of any polynomial as the length of the seed increases. Therefore, a PRNG is an algorithm that …
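
To make the idea of expanding a seed into a long sequence concrete, here is a minimal sketch of a linear congruential generator. It is an assumption chosen for illustration, not the generator built in the article, and it does not satisfy the indistinguishability definition above, since its outputs are easy to predict. The class name is hypothetical; the default constants are the commonly cited Numerical Recipes parameters.

```python
class LinearCongruentialGenerator:
    """Toy PRNG: x_{n+1} = (a * x_n + c) mod m. Not cryptographically secure."""

    def __init__(self, seed, a=1664525, c=1013904223, m=2**32):
        self.a, self.c, self.m = a, c, m
        self.state = seed % m  # the seed fully determines the sequence

    def next_int(self):
        """Advance the state and return an integer in [0, m)."""
        self.state = (self.a * self.state + self.c) % self.m
        return self.state

    def next_float(self):
        """Return a float in [0, 1) derived from the next integer."""
        return self.next_int() / self.m


# Two generators with the same seed produce identical sequences.
gen_a = LinearCongruentialGenerator(seed=42)
gen_b = LinearCongruentialGenerator(seed=42)
seq_a = [gen_a.next_int() for _ in range(5)]
seq_b = [gen_b.next_int() for _ in range(5)]
print(seq_a == seq_b)
```

Because the whole internal state is returned on every call, an observer who sees a single output can compute all later ones, which is exactly the kind of distinguisher the formal definition rules out.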

Preventing Data Leakage in Your Machine Learning Model

It goes without saying that knowledge about the dataset you are working with is necessary in order to perform effective analyses and develop sound models. However, what is not often said, with regard to data leakage, is that you should refrain from studying the distributions or basic statistics of your dataset until after …

A comparison of methods for predicting clothing classes using the Fashion MNIST dataset in RStudio and Python (Part 1)

[This article was first published on R Views, and kindly contributed to R-bloggers]. Florianne Verkroost is a PhD candidate at Nuffield College at the University …

Statistical uncertainty with R and pdqr

General description. Statistical estimation usually has the following setup. There is a sample (an observed, usually randomly chosen, set of values of measurable quantities) from some general population (the whole set of values of the same measurable quantities). We need to draw conclusions about the general population based on the sample. This is done by computing summary …

Cleaning the Table

[This article was first published on R on kieranhealy.org, and kindly contributed to R-bloggers]. While I’m talking about getting data into R this weekend, here’s …

A Hybrid Neural Machine Translation Model (Luong & Manning):

Luong & Manning recently published a paper entitled “Achieving Open Vocabulary Neural Machine Translation with Hybrid Word-Character Models,” the contents of which I summarize below. For a quick summary of the current state of the art in neural machine translation (NMT), you can take a look at my other post. Currently, general word-based NMT systems generate …

Exoplanets III: Habitability and Conclusion

“Two possibilities exist: either we are alone in the Universe or we are not. Both are equally terrifying.” (Arthur C. Clarke) Although the search for other planets is partly motivated by our efforts to understand their formation and to improve the understanding of our own solar system, the ultimate goal is to find extraterrestrial life. …

Exoplanets II: Interpretation of Data

Now that we’ve seen and understood the historical background, the scientific value of the research, and the implications of the discoveries, we will look at the actual data found and compiled by space missions, so that we can relate them to actual physics. By plotting graphs of the data, we can visually see the correlations …

Exoplanets I: Methods and Discoveries

Mankind has long speculated about planetary systems other than our own. Philosophers hypothesized centuries ago that our solar system was not unique; that there were in fact countless more in the seemingly limitless ocean of stars. The possibility of life existing on a planet orbiting another star was not just a plausible …

How AI Will Redefine Economics

RESOLVING THE PROBLEM OF CAUSATION WITH BIG DATA. For decades, economists have based their analyses of the economy on data sets only as large as their research assistants could handle, severely limiting the scope and precision of their work. AI and machine learning will enable economists to dramatically enlarge these data sets and …

Credit Card Fraud Detection using Self Organizing FeatureMaps

What are self-organizing feature maps? A self-organizing map (SOM) is a type of artificial neural network (ANN) that is trained using unsupervised learning to produce a low-dimensional (typically two-dimensional), discretized representation of the input space of the training samples, called a map, and is therefore a method for dimensionality reduction. Self-organizing …

Multi-Label Text Classification with XLNet

A Colab notebook accompanies this article. First, let’s install the necessary library, which is just transformers. Next, we import the necessary libraries. Check if the GPU is available. Mount your Google Drive to your Colab notebook. For our example, we will create a Data folder in our Google Drive and put the datasets …

Nuances in the usage of Word Embeddings: Semantic and Syntactic Relationships

Note: Super short post ahead. Just food for thought I guess? 🙂 In the past weeks, I’ve been writing about word embeddings: how I created word embeddings from scratch for a colloquial language such as Singlish, and how I augmented them to handle misspellings or out-of-vocabulary words with translation vectors. In the latter article, I …