Ever feel like you destroyed a job interview and then ended up not getting the job? Or how about completely bombing a technical screen and still passing on to the next round? You’re not alone. Hiring standards are confusing at best, but that still raises the question: how do you know how well you’re performing in … Read moreWhere do you fall on the data science distribution?
Whatever happened to “majority rule”? A continual source of democratic frustration today is that public opinion does not seem to directly translate into public policy. For example, a large majority of Americans want to see campaign finance reform, background checks for gun ownership, and reductions in fossil fuel consumption. Yet, while overwhelming public support has … Read moreAmerica’s Clustered Consensus
There has been a lot of buzz about Opportunity Zones recently and understandably so; it is the newest federal effort to create long-term investments in low-income urban and rural census tract areas. Once designated as a qualified Opportunity Zone, these places are able to receive investments through Opportunity Funds, which are created specifically to invest … Read moreWhat I Discovered About Opportunity Zones From Analyzing Half a Million Data Points
Dramatic visualization of 2016 eviction filings by state at the National Building Museum. From April 2018 through May 2019, a gallery wing of the D.C. National Building Museum was transformed into a labyrinth of forbidding plywood structures, towering piles of shrinkwrapped home furnishings, and striking visual displays. The exhibition showcased the statistics and stories revealed … Read moreData Science of Evictions
AI that does AI — Develop your first model today. Anyone can be a Data Scientist. No Coding Required. Photo by Frank Albrecht on Unsplash. In today’s world, being a Data Scientist is not limited to those with technical knowledge. While it is recommended and sometimes important to know a little bit of code, you … Read moreH2O Driverless AI: Data Science without Coding
Grace Hopper, Ph.D. (Vassar Archives) Doing a small part to help close a gender gap Few, if any, of my classmates shared my fascination with the Mark I Computer that was on display in our university’s Science Center. It is hard to blame them. Towering at 8 feet and filled with rotary switches, crystal diodes, … Read moreWhy I Donate All of My Book’s Proceeds to Girls Who Code
The implementation part is broken down into a series of tasks, from loading data to defining and training adversarial networks. At the end of this section, you’ll be able to visualize the results of your trained Generator to see how it performs; your generated samples should look like fairly realistic faces with small amounts of … Read moreFake Face Generator Using DCGAN Model
For the better part of a year, OpenAI’s GPT-2 has been one of the hottest topics in machine learning — and for good reason. The text generating model, which initially was dubbed “too dangerous” to be released in full, is capable of producing uncanny outputs. If you haven’t seen any examples, I recommend looking at … Read moreDeploy A Text Generating API With Hugging Face’s DistilGPT-2
The second is the same information, laid out the way parse trees are visualised in spaCy. And the third is a slightly different illustration of the same tree, which reads easily top to bottom. Here are four things to note about parse trees: 1. Each word in the sentence is a node (=point) in the graph. … Read moreGetting to grips with parse trees
Getting the right configurations, making sure it is secured, ensuring resource access through endpoints and having a pretty rendering, … all of them made easy thanks to AWS! As a machine-learning engineer, I never really faced the issue of putting my algorithms out there myself. Well, that was until recently, when I decided to start … Read moreFrom Dev to Prod – All you need to know to get your Flask application running on AWS
Based on medical diagnostic measurements. Python code is available: https://github.com/JNYH/diabetes_classifier The Pima Indians of Arizona and Mexico have the highest reported prevalence of diabetes of any population in the world. A small study has been conducted to analyse their medical records to assess if it is possible to predict the onset of diabetes based on … Read moreBuilding a machine learning classifier model for diabetes
So what? The cost scales exponentially and unpredictably. The example we shared is just to manage one model, for one business line, and for one model cycle (a different issue may happen in the future). Now, imagine scaling this process to hundreds of models for multiple business units and functions. The bottom line: companies cannot … Read moreAiPM
A story of data, taste, and a very effective recommender system. Just a few short days ago, I was discussing the impact of recommender systems with some students on a course I’m teaching. Netflix, Amazon, Facebook, and many other online services use our data to suggest other products we might like. Is this helpful, or … Read moreHow Spotify Recommends Your New Favorite Artist
With Canada’s 43rd Federal Election not too far in the rearview mirror, we at ThinkData Works were curious as to what we can learn about our most recent election by stepping back from the punditry and analyzing some data. After all, using government data is a great way to understand how our government works. There … Read moreWhat do campaign contributions tell us about the federal election?
Connecting the story behind each stone (the understanding you gained from them) to generate a grand story or theory is called fitting a model in technical terms. So, for instance, you coming up with a theory or a reason behind why your air conditioner is not cooling the room as usual, is actually you fitting … Read moreWhat is Learning in Machine Learning?
With the launch of Streamlit, developing a dashboard for your machine learning solution has been made incredibly easy. Streamlit is an open source app framework specifically designed for ML engineers working with Python. It allows you to create a stunning looking application with only a few lines of code. I want to take this opportunity … Read moreQuickly Build and Deploy an Application with Streamlit
What’s the most difficult question you ever encountered in a data science interview? I’ll share mine: “How many years of experience do you have in language X?” This is really hard to answer: Do I count the years I used it in academia? Do I count the years I used it in my hobby projects? … Read moreThe hardest question you’ve been asked in a data science interview
I have created this impressive ML model, it gives 90% accuracy, but it takes around 10 seconds to fetch a prediction. Is that acceptable? For some use-cases maybe, but really no. In the past, there have been many Kaggle competitions whose winners ended up creating monster ensembles to take the top spots on the leaderboard. … Read moreTake your Machine Learning Models to Production with these 5 simple steps
Serverless computing is a cloud-computing execution model in which the cloud provider runs the server, and dynamically manages the allocation of machine resources. Pricing is based on the actual amount of resources consumed by an application, rather than on pre-purchased units of capacity. — Wikipedia Photo by Anthony Cantin on Unsplash (This is the second … Read moreFull Stack Development Tutorial: Serverless REST API running on AWS Lambda
Neural Networks for labeling, compression, effects and more! You can read the article and follow along with the code in the repo: Poseyy/AI-Photography (github.com). An obvious application of AI to photography is … Read moreHow I Use AI Across One of My Favorite Hobbies — Photography
Ads are a constant presence on today’s internet. They power Google and Facebook and follow us everywhere. As with all marketing spend, they’re an investment. As with most investments, it’s crucial to measure their return. What makes online marketing different is the unprecedented possibility of building accurate measurement tools. In this post I’ll describe a … Read moreOnline Marketing Measurement: Which Half?
Web Scraping: I used BeautifulSoup and Selenium in parallel to scrape 3 months of hotel listing information from Hotel.com. The information I scraped included the check-in and check-out dates, number of adults and children, distance to city and convention centers, hotel addresses, hotel reviews and ratings, TripAdvisor’s ratings and reviews, hotel amenities, and … Read moreHow to Identify Hotel Deals — Using Machine Learning
Figure out what STL decomposition is and how it works. This article will help you understand what STL decomposition is and how to do it from scratch. At the end, I will also use the statsmodels library to get the results in seconds. So, STL stands for Seasonal and Trend decomposition using Loess. This is a … Read moreSTL decomposition : How to do it from Scratch?
Previously, we learned about two general areas of machine learning: Supervised and Unsupervised learning. Here, we’ll investigate two special fields of machine learning: time series prediction and natural language processing. Time Series Forecasting Time series forecasting refers to any type of supervised Machine Learning where time is an important feature. A good time series forecast … Read moreManagerial Analytics and Data Science
We covered a lot in a short amount of time… almost too many topics, actually. Just when you start getting comfortable and ready to do more advanced things, they change the topic. It is really up to you to decide what direction you want to take things outside of the classroom. For example, I am … Read moreEnter Analytics: From Boot Camp to working in Data Science
I spent a fair amount of time last year catching up on what’s happening in machine learning. The tools available now are really impressive — you can implement a complex neural net in just a few lines of code now with the libraries that are available. I’ve always been fascinated by the idea of machines … Read moreTeaching A Computer To Land On The Moon
1- Organize your scripts: I keep each script that executes a specific task separate, to keep my Jupyter Notebook clean and neat… It is important to stay organized while collecting data. That makes it easy to find out where your mistakes are. Writing one big block of code wouldn’t help. I propose two habits to develop: keep the code commented … Read moreFew tips you can use while collecting data
As you can see, the Book_Table column has been encoded into numerical values of 0/1. The output of le.fit_transform(df["Book_Table"]) is a one-dimensional NumPy array of integer codes, since Label Encoder handles one column at a time. Binary columns (such as Book_Table from df) are mostly encoded using Label Encoder. For multiclass columns it gives a different value (0 to n_classes-1) for each class, e.g. 0, 1, 2, 3, …, n-1; which are … Read moreBeginner’s Guide to Encoding Data
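To make the 0 to n_classes-1 mapping concrete, here is a minimal pure-Python sketch of what label encoding does. It is an illustration, not the article's code or scikit-learn's implementation: the helper name label_encode and the sample column values are assumptions, and classes are numbered in sorted order.

```python
def label_encode(values):
    """Map each distinct value to an integer code in 0..n_classes-1.

    Hypothetical helper for illustration: classes are numbered in
    sorted order, so the mapping is deterministic.
    """
    classes = sorted(set(values))
    index = {label: code for code, label in enumerate(classes)}
    return [index[value] for value in values], classes


# Hypothetical binary column, similar in spirit to Book_Table
book_table = ["Yes", "No", "No", "Yes"]
codes, classes = label_encode(book_table)
print(codes)    # [1, 0, 0, 1]
print(classes)  # ['No', 'Yes']

# Hypothetical multiclass column: three classes get the codes 0, 1, 2
cuisine_codes, cuisine_classes = label_encode(["Thai", "Indian", "Thai", "Chinese"])
print(cuisine_codes)  # [2, 1, 2, 0]
```

Note that the integer codes imply an ordering (0 < 1 < 2) that the categories themselves may not have.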
I can totally read the chart with hindsight bias: campaign funding money doesn’t predict the performance of the candidate. But other than that, a cool animated bar graph doesn’t tell you that much going forward. After all, if we want to read a truly scientific and analytical piece, making some visualizations is far from enough. … Read moreHow to graph a Bar Chart Race and realize I don’t need one?
This giant formula can be read like this: the probability that an algorithm in the class of probabilistic polynomial time problems (BPP) could distinguish a sequence produced by a PRNG from one produced by a real random source tends to zero faster than the reciprocal of any polynomial as the length of the seed increases. Therefore, a PRNG is an algorithm that … Read moreBuilding a Pseudorandom Number Generator
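One common way to write this kind of statement is sketched below. It is not necessarily the article's own formula, and the notation is an assumption: G is the generator, A is any probabilistic polynomial-time distinguisher, k is the seed length, and \ell(k) is the output length.

```latex
\left|
  \Pr_{s \leftarrow \{0,1\}^{k}}\bigl[A(G(s)) = 1\bigr]
  -
  \Pr_{r \leftarrow \{0,1\}^{\ell(k)}}\bigl[A(r) = 1\bigr]
\right|
< \frac{1}{p(k)}
\quad \text{for every polynomial } p \text{ and all sufficiently large } k
```

The left-hand side is the distinguisher's advantage: how much more often A outputs 1 on generator output than on truly random bits.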
It goes without saying that knowledge about the dataset you are working with is necessary in order to be able to perform effective analyses and develop sound models. However, what is not often said, with regards to data leakage, is that you should refrain from studying the distributions or basic statistics of your dataset until after … Read morePreventing Data Leakage in Your Machine Learning Model
Photo by Shiv Prasad on Unsplash. Every day I hear a lot of stories from my friends and colleagues at the office about how hard it is to find a flat or apartment for rent in Hyderabad. Mostly, either the flats are not open to bachelors or the rents are too high in a given … Read moreHyderabad Housing Prices.
Luong & Manning published a recent paper entitled “Achieving Open Vocabulary Neural Machine Translation with Hybrid Word-Character Models,” the contents of which I summarize below. For a quick summary of the current state of the art in Neural Machine Translation (NMT), you can take a quick look at my other post here. Currently the general word-based NMTs generate … Read moreA Hybrid Neural Machine Translation Model (Luong & Manning):
Two possibilities exist: either we are alone in the Universe or we are not. Both are equally terrifying. -Arthur C. Clarke Although the search for other planets is partly motivated by our efforts to understand their formation and to improve the understanding of our own solar system, the ultimate goal is to find extraterrestrial life. … Read moreExoplanets III: Habitability and Conclusion
Now that we’ve seen and understood the historical background, the scientific value of the research, and the implications of the discoveries, we will look at the actual data found and compiled by space missions, so that we can relate them to actual physics. By plotting graphs of the data, we can visually see the correlations … Read moreExoplanets II: Interpretation of Data
Mankind has long speculated about planetary systems other than our own. Philosophers hypothesized centuries ago that our solar system was not unique; that there were in fact countless more that existed in the seemingly limitless ocean of stars. The possibility of life existing on a planet orbiting another star was not just a plausible … Read moreExoplanets I: Methods and Discoveries
RESOLVING THE PROBLEM OF CAUSATION WITH BIG DATA For decades, economists have made their analyses of the economy based on data sets only as large as their research assistants could handle, hence severely limiting the scope and precision of their work. AI and machine learning will enable economists to dramatically enlarge these data sets and … Read moreHow AI Will Redefine Economics
What are self-organizing feature maps? A self-organizing map (SOM) is a type of artificial neural network (ANN) that is trained using unsupervised learning to produce a low-dimensional (typically two-dimensional), discretized representation of the input space of the training samples, called a map, and is therefore a method to do dimensionality reduction. Self-organizing … Read moreCredit Card Fraud Detection using Self Organizing FeatureMaps
Click here for the Colab notebook accompanying this article. First, let’s install the necessary library, which is just transformers. Next, we import the necessary libraries. Check if the GPU is available. Mount your Google Drive to your Colab notebook. For our example, we will create a Data folder in our Google Drive and put the datasets … Read moreMulti-Label Text Classification with XLNet
Note: Super short post ahead. Just food for thought I guess? 🙂 In the past weeks, I’ve been writing about Word Embeddings. How I created word embeddings from scratch for a colloquial language such as Singlish, and how I augmented it to handle misspellings or out-of-vocabulary words with translation vectors. In the latter article, I … Read moreNuances in the usage of Word Embeddings: Semantic and Syntactic Relationships
This is definitely number 1 on my list. There is nothing more frustrating than opening the project’s code, compiling it, or running a test locally and having to wait more than a few seconds for it to come back. Developers work in haste, so any ineffective tools that delay them, even in the slightest, are … Read moreShiny Happy … Developers
The components that most machine learning algorithms have in common. Photo by Dan Gold on Unsplash What’s a cost function, optimization, a model, or an algorithm? The esoteric nuances of machine learning algorithms and terminology can easily overwhelm the machine learning novice. As I was reading the Deep Learning book by Yoshua Bengio, Aaron Courville, … Read moreThe ‘Ingredients’ of Machine Learning Algorithms
CVPR 2017, Best Paper Award winner. Dense connections. “Simple models and a lot of data trump more elaborate models based on less data.” — Peter Norvig ‘Densely Connected Convolutional Networks’ received the Best Paper Award at the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017. The paper can be read here. The … Read morePaper review: DenseNet -Densely Connected Convolutional Networks
In this post, we will explain the bias-variance tradeoff, a fundamental concept in Machine Learning, and show what it means in practice. We will show that the mean squared error of an unseen (test) point is a result of two competing forces (bias/variance) and the inherent noise in the problem itself. We often see in … Read moreThe Bias-Variance Tradeoff
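The decomposition described here is usually written as follows. This is a sketch in standard textbook notation, which may differ from the post's own symbols: it assumes the data follow y = f(x) + \varepsilon with zero-mean noise of variance \sigma^2, and that \hat{f} is the model fitted on a random training set.

```latex
\mathbb{E}\Bigl[\bigl(y - \hat{f}(x)\bigr)^{2}\Bigr]
= \underbrace{\bigl(\mathbb{E}[\hat{f}(x)] - f(x)\bigr)^{2}}_{\text{bias}^{2}}
+ \underbrace{\mathbb{E}\Bigl[\bigl(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\bigr)^{2}\Bigr]}_{\text{variance}}
+ \underbrace{\sigma^{2}}_{\text{noise}}
```

The expectation is taken over both the training sets and the noise at the test point x. The last term does not depend on the model, so no choice of model can remove it.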
For the purposes of this tutorial, we are going to make a plot to visualize the passenger volume for the busiest airports in my country, Greece, and the neighboring country, Turkey, for comparison. First, we need to import the libraries and the methods we are about to use. import pandas as pd import numpy as … Read moreHow to visualize data on top of a map in python using the geoviews library
According to Wikipedia, in probability theory, the central limit theorem (CLT) establishes that, in some situations, when independent random variables are added, their properly normalized sum tends toward a normal distribution (informally a “bell curve”) even if the original variables themselves are not normally distributed. Translation: If you take enough samples from a population, the … Read moreA brief intro to the Central Limit Theorem
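A small simulation can make the theorem tangible. The sketch below uses only the Python standard library and is an illustration rather than the post's own code. The exponential population, the sample size of 50, and the helper name sample_means are all assumptions chosen for the example.

```python
import random
import statistics


def sample_means(n_samples, sample_size, seed=0):
    """Draw n_samples samples from a skewed (exponential) population
    and return the mean of each sample."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(sample_size)])
        for _ in range(n_samples)
    ]


# The exponential(1) population has mean 1 and standard deviation 1,
# and it is far from bell-shaped.
means = sample_means(n_samples=2000, sample_size=50)

center = statistics.fmean(means)   # should land close to the population mean, 1
spread = statistics.stdev(means)   # should land close to 1 / sqrt(50), about 0.141

print(f"mean of sample means: {center:.3f}")
print(f"std of sample means:  {spread:.3f}")
```

Plotting a histogram of means would show the bell shape emerging even though the underlying population is strongly skewed.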
Source: https://pixabay.com/photos/library-books-education-literature-869061/ TF-IDF is a natural language processing technique useful for the extraction of important keywords within a set of documents or chapters. The acronym stands for “term frequency-inverse document frequency” and describes how the algorithm works. The dataset: As our dataset, we shall take the script of Mary Shelley’s Frankenstein (provided by Project … Read moreUsing TF-IDF to form descriptive chapter summaries via keyword extraction.
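As a rough illustration of how the score is built, here is a minimal pure-Python sketch using one common unsmoothed formulation: term frequency times log(N / document frequency). The toy "chapters" and the helper names tf_idf and top_keywords are assumptions for the example, not the article's data or code, and library implementations typically add smoothing and normalization.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: score} dict per document, using
    tf = count / document length and idf = log(N / document frequency)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Number of documents in which each term appears at least once
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


def top_keywords(doc_scores, k=3):
    """Highest-scoring terms first; ties are broken alphabetically."""
    ranked = sorted(doc_scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


# Hypothetical toy "chapters"
chapters = [
    "the monster fled across the ice",
    "the creator pursued the monster",
    "the ship sailed north",
]
scores = tf_idf(chapters)
print(scores[0]["the"])             # 0.0, because "the" appears in every chapter
print(top_keywords(scores[0], 2))   # ['across', 'fled']
```

Words that appear in every document get an inverse document frequency of zero, which is why common words drop out of the keyword list without needing a stop-word list.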
In this article, you’ll learn how to tailor the pandas API to your business, research, or personal workflow by using pandas_flavor. Pandas-flavor is a library that introduces an API for extending Pandas. This API handles the boilerplate code for registering custom accessors onto Pandas objects. There are plenty of examples of extensions in the wild including: … Read moreThe Easy Way to Extend Pandas API
As I mentioned in my previous post, cleaning data is a prerequisite to machine learning. Measuring the sanity of your data can also give you a good indication of how precise or accurate your model would be. When it comes to web-scraped data, you would often lose a lot of information in the process of … Read moreCleaning Web-Scraped Data with Pandas (Part II)
The art of creating self-imposed deadlines is crucial not only for going above and beyond meeting requirements, but also for making our progress toward small and big goals a lot smoother. Love them or hate them, deadlines are incredibly motivational! However, here are two things we should try to avoid when dealing … Read moreUtilize Your Self-Imposed Deadlines | Punch Today in The Face