5 Reasons Why I Love Google Colaboratory So Much

Colaboratory comes with pre-loaded libraries. This means that you can start importing libraries into your notebook right away. The pre-installed packages include: Data Processing Packages: Pandas, NumPy, SciPy, Statsmodels; Machine Learning Packages: TensorFlow, Sklearn; Visualization Packages: matplotlib, plotly, seaborn. Although it covers most of the commonly used machine learning libraries for you, you can install any library … Read more 5 Reasons Why I Love Google Colaboratory So Much

Essentials of Hypothesis Testing and the Mistakes to Avoid

Hypothesis testing is the bedrock of the scientific method and by implication, scientific progress. It allows you to investigate a thing you’re interested in and tells you how surprised you should be about the results. It’s the detective that tells you whether you should continue investigating your theory or divert efforts elsewhere. Does that diet … Read more Essentials of Hypothesis Testing and the Mistakes to Avoid

REDDIT: A one word reason why I support OpenAI’s GPT-2 decision

TLDR: They seeded their webscrape via REDDIT, the mother lode of all ideas tinderboxy and weaponizable. So, it will at the very least be a PR disaster if they release the bigger model. The smaller 117M is nasty as is sans the subtleties! Table of Contents Intro: One paper that I recommend my interns to read … Read more REDDIT: A one word reason why I support OpenAI’s GPT-2 decision

Reinforcement Learning from Scratch: Simple Application and Evaluating Parameters in Detail

Introducing RL Algorithms We have introduced episodes and how to choose actions, but we have yet to demonstrate how an algorithm uses this to learn the best actions. Therefore, we will formally define our first RL algorithm, Temporal Difference 0. Temporal Difference Zero. Temporal Difference λ is a family of algorithms depending on the choice of … Read more Reinforcement Learning from Scratch: Simple Application and Evaluating Parameters in Detail
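To make the idea concrete, here is a minimal sketch of the tabular TD(0) value update, V(s) ← V(s) + α [r + γ V(s') − V(s)]. The five-state chain, the random policy, and the names `td0_evaluate`, `alpha` and `gamma` are assumptions made up for illustration; they are not taken from the article, whose environment and parameters may well differ.

```python
import random


def td0_evaluate(episodes, alpha=0.1, gamma=0.9, seed=0):
    """Estimate state values with tabular TD(0) on a toy chain (hypothetical)."""
    rng = random.Random(seed)
    n_states = 5                      # states 0..4, state 4 is terminal
    terminal = n_states - 1
    values = [0.0] * n_states         # V(s), initialised to zero
    for _ in range(episodes):
        state = 0
        while state != terminal:
            # Assumed random policy: move right with probability 0.5, else stay.
            next_state = state + 1 if rng.random() < 0.5 else state
            # Assumed reward: 1 on reaching the terminal state, 0 otherwise.
            reward = 1.0 if next_state == terminal else 0.0
            # TD(0) update: move V(state) toward the one-step bootstrapped target.
            td_target = reward + gamma * values[next_state]
            values[state] += alpha * (td_target - values[state])
            state = next_state
    return values


values = td0_evaluate(2000)
print([round(v, 2) for v in values])
```

States closer to the terminal state should end up with higher estimates, and the terminal state itself is never updated, so its value stays at zero.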

My journey to Performance Analysis 2/2 (HAR files)

This article mostly explains the second type of performance analysis I have performed recently. It is a follow-up to this article. It mainly covers the analysis of HAR files and the types of KPIs you can retrieve from them. What are HAR files? Following the W3C github post, HAR stands for HTTP … Read more My journey to Performance Analysis 2/2 (HAR files)
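As a rough illustration of what pulling KPIs out of a HAR file can look like, here is a small sketch using only the standard library. A HAR file is JSON with a top-level `log` object whose `entries` each carry a `time` in milliseconds, a `request` and a `response`. The function name `summarize_har`, the sample data, and the particular KPIs chosen are assumptions for this example, not necessarily the ones the article computes.

```python
import json


def summarize_har(har_text):
    """Compute a few simple KPIs from the text of a HAR file."""
    har = json.loads(har_text)
    entries = har["log"]["entries"]
    if not entries:
        return {"requests": 0, "total_time_ms": 0, "slowest_url": None, "errors": 0}
    slowest = max(entries, key=lambda entry: entry["time"])
    return {
        "requests": len(entries),
        # Sum of per-request elapsed times (not wall-clock page load time).
        "total_time_ms": sum(entry["time"] for entry in entries),
        "slowest_url": slowest["request"]["url"],
        # Count responses with a 4xx or 5xx status code.
        "errors": sum(1 for entry in entries if entry["response"]["status"] >= 400),
    }


# Hypothetical, hand-written HAR content used only to exercise the function.
sample_har = json.dumps({
    "log": {
        "entries": [
            {"time": 120, "request": {"url": "https://example.com/"},
             "response": {"status": 200}},
            {"time": 340, "request": {"url": "https://example.com/app.js"},
             "response": {"status": 200}},
            {"time": 45, "request": {"url": "https://example.com/missing.png"},
             "response": {"status": 404}},
        ]
    }
})

summary = summarize_har(sample_har)
print(summary)
```

For a real file, the same function can be fed the contents read from disk instead of `sample_har`.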

Building a WiFi spots Map of networks around you with WiGLE and R

It’s always fun to explore the world around us, and it’s even more fun when you get to explore the world with the vision of a data scientist. In this analysis, we are going to identify the open WiFi networks around us and map them on an interactive map. Toolkit wiglr R Package for interfacing with WiGLE … Read more Building a WiFi spots Map of networks around you with WiGLE and R

Jupytext 1.0 highlights

In version 1.0 the jupytext command was extended with new modes: --sync to synchronize the multiple representations of a notebook; --set-formats (and optionally, --sync) to set or change the pairing of a notebook or a text file; --pipe to pipe the text representation of a notebook into another program. Perhaps you would like to reformat … Read more Jupytext 1.0 highlights

People don’t trust AI. We need to change that.

This past week, I had the pleasure of attending and speaking at THINK, IBM’s annual customer and developer conference that brought over 25,000 attendees to San Francisco. Of course, it’s next to impossible to sum up an event of that scale in a simple blog post — but I’d like to share a few key ideas that … Read more People don’t trust AI. We need to change that.

Pandas Index Explained

The following Notebook is very easy to follow and also has small tips and tricks to make daily work a little better.

    import pandas as pd

    # Load the UCI Adult dataset with explicit column names
    adult = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data",
                        names=['age', 'workclass', 'fnlwgt', 'education', 'education_num',
                               'marital_status', 'occupation', 'relationship', 'race', 'sex',
                               'capital_gain', 'capital_loss', 'hours_per_week',
                               'native_country', 'label'],
                        index_col=False)
    print("Shape of data{}".format(adult.shape))
    adult.head()

The dataset has 32561 rows and 15 features; the leftmost series 0, 1, 2, 3 … is the index. Let’s … Read more Pandas Index Explained

Thinking of Self-Studying Machine Learning? Remind yourself of these 6 things

We were hosting a Meetup on robotics in Australia and it was question time. Someone asked a question. “How do I get into artificial intelligence and machine learning from a different background?” Nick turned and called my name. “Where’s Dan Bourke?” I was backstage and talking to Alex. I walked over. “Here he is,” Nick … Read more Thinking of Self-Studying Machine Learning? Remind yourself of these 6 things

Deeper Dive into Finding Similar Faces with Spotify’s Annoy, Tensorflow, and Pytorch

In my previous post I teased that I had jumped down a rabbit hole to try and improve my Fate Grand Order facial similarity pipeline where I was making use of Tensorflow object detectors, Pytorch feature extractors, and Spotify’s approximate nearest neighbor library (annoy). The general idea that I was running with was that I … Read more Deeper Dive into Finding Similar Faces with Spotify’s Annoy, Tensorflow, and Pytorch

AI — The End of the WoRLd?

Understanding Reinforcement Learning Intuitively Let’s say you picked up your first ever arcade game with no idea how to play it. No one really reads instructions so your learning process is basically button-mashing and seeing what happens to your character on the screen. You figure out what to do without any prior knowledge or intended … Read more AI — The End of the WoRLd?

A quick response to Genevera Allen about Machine learning ‘causing science crisis’

What’s next? Be careful about what you read. I’m not saying I have the truth in my hands; I’m just giving my opinion as someone who has been applying machine learning in production as a data scientist and is working on making it more reproducible, transparent and powerful. Thanks to Genevera for bringing this important … Read more A quick response to Genevera Allen about Machine learning ‘causing science crisis’

How to Perform Exploratory Data Analysis with Seaborn

Data Preparation Data preparation is the first step of any data analysis to ensure data is cleaned and transformed in a form that can be analyzed. We will be performing EDA on the Ames Housing dataset. This dataset is popular among those beginning to learn data science and machine learning as it contains data about … Read more How to Perform Exploratory Data Analysis with Seaborn

ML Algorithms: One SD (σ)- Bayesian Algorithms

The obvious questions to ask when facing a wide variety of machine learning algorithms are “which algorithm is better for a specific task, and which one should I use?” The answers to these questions vary depending on several factors, including: (1) The size, quality, and nature of data; (2) The available computational time; (3) The urgency of … Read more ML Algorithms: One SD (σ)- Bayesian Algorithms

A graduate student’s perspective on statistics

When I’m meeting someone new and the inevitable question is asked, “So, what do you do?”, I respond that I’m a Ph.D. student in statistics, to which the response is, most of the time, some variation of this: “Statistics?! I had one required statistics class in undergrad, and it was so confusing.” The mere … Read more A graduate student’s perspective on statistics

Low-Cost Cell Biology Experiments for Data Scientists

Part 3: Analyzing image data Deep learning algorithms are exciting in part because of their potential to automate biomedical research tasks [6,7]. For example, deep learning algorithms can be used to automate the time-consuming process of manually counting mitotic structures in breast histopathology images [8,9]. Differences in the rates of cellular division and differences in … Read more Low-Cost Cell Biology Experiments for Data Scientists

Analyzing my weight loss journey with machine learning

How I built a logistic regression classifier from scratch with Python to predict whether I will lose weight or not Please feel free to visit the Github page for this project to access the data, visualizations, and Jupyter notebook that I used to analyze my data. Background I started my weight loss journey at the start … Read more Analyzing my weight loss journey with machine learning

Role of Data Science in Artificial Intelligence

Steve Urkel and Urkelbot, whose intelligence doubled every 2 minutes. Image Credit: ABC’s Family Matters The age of the spreadsheet is over. A Google search, a passport scan, your online shopping history, a tweet. All of these contain data that can be collected, analyzed, and monetized. Supercomputers and algorithms allow us to make sense of an increasingly … Read more Role of Data Science in Artificial Intelligence

5 Lines of Code to Convince You to Learn R

A brief treatise for those on the fence All of the code supporting this article can be forklifted from this MatrixDS project. Some good advice for data scientists (or really anyone): If you write the same code more than once, create a function. If you give the same advice more than once, write a blog post. … Read more 5 Lines of Code to Convince You to Learn R

What my first Silver Medal taught me about Text Classification and Kaggle in general?

Sailing through the world of Kaggle Kaggle is an excellent place for learning. And I learned a lot of things from the recently concluded competition on Quora Insincere Questions Classification, in which I got a rank of 182/4037. In this post, I will try to provide a summary of the things I tried. I will also … Read more What my first Silver Medal taught me about Text Classification and Kaggle in general?

10 Lessons Learned from Scraping Websites

Valuable insights which I gained from retrieving data from many websites over the last years which I want to share with you “Data is the new oil. It’s valuable, but if unrefined it cannot really be used. It has to be changed into gas, plastic, chemicals, etc. to create a valuable entity that drives profitable activity; … Read more 10 Lessons Learned from Scraping Websites

Sentiment Analysis using LSTM Step-by-Step

1) Load in and visualize the data We are using the IMDB movie reviews dataset. If it is stored on your machine as a text file, we can simply load it:

    # read data from text files
    with open('data/reviews.txt', 'r') as f:
        reviews = f.read()
    with open('data/labels.txt', 'r') as f:
        labels = f.read()

    print(reviews[:50])
    print()
    print(labels[:26])

Output: bromwell high … Read more Sentiment Analysis using LSTM Step-by-Step

Why Strategy and Analytics (Together) are the Future of AI

…And how AI will thus kill data visualization tools In my last company, I led the development of a major data visualization addition to our core product. It allowed our product to be used for the first time by the C-Suite of our client base and that alone raised our price point considerably. But one … Read more Why Strategy and Analytics (Together) are the Future of AI

Ego Network Analysis for the Detection of Fake News

Using a combination of network analysis and natural language processing to determine the sources of “fake news” on Twitter Twitter network of verified users with over 1 million followers. Circles (nodes) represent users and the lines connecting the circles represent one user “following” another. Colors represent classes determined through modularity clustering. While “Fake News” has existed … Read more Ego Network Analysis for the Detection of Fake News

Customer Segmentation Analysis with Python

In this article I’ll explore a data set on mall customers to try to see if there are any discernible segments and patterns. Customer segmentation is useful in understanding what demographic and psychographic sub-populations there are within your customers in a business case. By understanding this, you can better understand how to market and serve … Read more Customer Segmentation Analysis with Python

Jump Out of the Jupyter Notebook with nbconvert

Easily Convert Notebooks to Python Scripts and Sharable Files If you’re a data scientist, nbconvert is a great tool to add to your tool belt. With nbconvert you can easily turn your Jupyter Notebook into a Python script from the command line. It also allows you to turn your Jupyter notebook into share-friendly formats like .html and .pdf … Read more Jump Out of the Jupyter Notebook with nbconvert

OpenAI’s GPT-2: the model, the hype, and the controversy

Last Thursday, OpenAI released a very large language model called GPT-2. This model can generate realistic text in a variety of styles, from news articles to fan fiction, based off some seed text. Controversially, they decided not to release the data or the parameters of their biggest model, citing concerns about potential abuse. It’s been … Read more OpenAI’s GPT-2: the model, the hype, and the controversy

Time Series Forecasting with Prophet

Learn how to use Facebook’s Prophet to predict air quality Photo by Frédéric Paulussen on Unsplash Producing high quality forecasts is hard for many machine learning engineers. It requires a substantial amount of experience and very specific skills. Also, other forecasting tools were too inflexible to incorporate useful assumptions. For those reasons, Facebook open sourced Prophet, … Read more Time Series Forecasting with Prophet

How to PyTorch in Production

Photo by Sharon McCutcheon on Unsplash ML is fun, ML is popular, ML is everywhere. Most companies use either TensorFlow or PyTorch. There are some old-timers who prefer caffe, for instance. Mostly it’s all about the Google vs. Facebook battle. Most of my experience is with PyTorch, even though most of the tutorials and online … Read more How to PyTorch in Production

Setting up Email Updates for Your Scraper using Python and a Gmail Account

Photo by Jamie Street on Unsplash Very often when building web scrapers to collect data, you’ll run into one of these situations: you want to send the program’s results to someone else; you’re running the script on a remote server and you want automatic, real-time reports on results (e.g. updates on price information from an online … Read more Setting up Email Updates for Your Scraper using Python and a Gmail Account
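For readers who want a feel for the mechanics, here is a minimal sketch of composing and sending a report email with Python's standard library. The helper names `build_report_email` and `send_via_gmail`, the addresses, and the message text are all hypothetical, and the article itself may take a different route. The sketch assumes Gmail's SMTP endpoint over SSL (`smtp.gmail.com`, port 465) and an app password for the account.

```python
import smtplib
import ssl
from email.message import EmailMessage


def build_report_email(sender, recipient, subject, body):
    """Compose a plain-text email holding the scraper's report."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_via_gmail(msg, username, app_password):
    """Send the message through Gmail over SSL (assumes an app password)."""
    context = ssl.create_default_context()
    with smtplib.SMTP_SSL("smtp.gmail.com", 465, context=context) as server:
        server.login(username, app_password)
        server.send_message(msg)


# Placeholder addresses and text, for illustration only.
message = build_report_email(
    "scraper@example.com",
    "me@example.com",
    "Scraper update",
    "Found 3 new prices.",
)
print(message["Subject"])

# Sending needs real credentials, so the call is left commented out:
# send_via_gmail(message, "scraper@example.com", "your-app-password")
```

Keeping message construction separate from sending makes it easy to check the report's content without touching the network.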

Building a model? Here’s the first question you should ask

Whether your model is meant to be explanatory or predictive has profound implications for its design Someone somewhere right now is building a model. Many, many people in fact. Whether for a business, an academic study or even personal interest, people have been using mathematics more and more to model real world phenomena in order to … Read more Building a model? Here’s the first question you should ask

Guide to choosing Hyperparameters for your Neural Networks

https://www.wired.com/2016/12/2016-year-deep-learning-took-internet/ In recent times, Deep Learning has had a significant impact in the fields of computer vision, natural language processing, and speech recognition. The large amounts of data being generated day after day can be used to train Deep Neural Networks, which are preferred over traditional Machine Learning algorithms for higher performance … Read more Guide to choosing Hyperparameters for your Neural Networks

Thou Shalt Not Fear Automatons

TL;DR The imminent danger with Artificial Intelligence has nothing to do with machines becoming too intelligent. It has to do with machines inheriting the stupidity of people. Background Whom should we believe about the magnitude of change that comes with Artificial Intelligence? Andrew Ng when he says AI is the next electricity, or the Francois … Read more Thou Shalt Not Fear Automatons

On how I acknowledge human based bias and how to handle it

Since 2016 I have been working on the Antispam team of a dating and social platform, where my goal is to build solutions to detect spammers and prevent their proliferation. At the beginning of my career at the company, I had no knowledge at all about our users (as expected); I did not fully … Read more On how I acknowledge human based bias and how to handle it

Resisting Adversarial Attacks Using Gaussian Mixture Variational Autoencoders

Now, let’s take a look at how we can resist adversarial samples via thresholding. Suppose the input image (from class 1) has been adversarially perturbed to fool the encoder into believing that it belongs to class 0. This implies that the latent encoding of the input image must lie within the cluster for 0’s. Although … Read more Resisting Adversarial Attacks Using Gaussian Mixture Variational Autoencoders

Convolutional Neural Networks

Researchers came up with the concept of CNN or Convolutional Neural Network while working on image processing algorithms. Traditional fully connected networks were kind of a black box — that took in all of the inputs and passed through each value to a dense network that followed into a one hot output. That seemed to work with … Read more Convolutional Neural Networks

Comparing common analysis strategies for repeated measures data

Dealing with dependencies in data. What is this all about? My hope with this post is to provide a conceptual overview of how to deal with a specific type of dataset commonly encountered in the social sciences (and very common in my own disciplines of experimental psychology and cognitive neuroscience). My goal is not to provide mathematical … Read more Comparing common analysis strategies for repeated measures data

What to do when your data fails OLS Regression assumptions

Regression analysis falls in the realm of inferential statistics. Consider the following equation: y ≈ β0 + β1x + e The approximate equals sign indicates that there is an approximate linear relationship between x and y. The error term e indicates this model isn’t going to fully reflect reality via a simple linear relation. The … Read more What to do when your data fails OLS Regression assumptions
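A small worked example may help here. The sketch below fits y ≈ β0 + β1x by ordinary least squares using the closed-form estimates β1 = Sxy / Sxx and β0 = ȳ − β1·x̄, then computes the residuals that stand in for e. The function name `fit_simple_ols` and the five data points are invented for illustration and do not come from the article.

```python
def fit_simple_ols(xs, ys):
    """Fit y = beta0 + beta1 * x by ordinary least squares (one predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and the x/y cross-deviations.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta1 = sxy / sxx
    beta0 = mean_y - beta1 * mean_x
    # Residuals: what the straight line leaves unexplained.
    residuals = [y - (beta0 + beta1 * x) for x, y in zip(xs, ys)]
    return beta0, beta1, residuals


# Made-up data that is roughly, but not exactly, linear.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

beta0, beta1, residuals = fit_simple_ols(xs, ys)
print(round(beta0, 3), round(beta1, 3))
print([round(r, 3) for r in residuals])
```

With an intercept in the model, the residuals sum to zero by construction, so it is their pattern against x, rather than their total, that is worth inspecting when checking the assumptions.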

Progressively-Growing GANs

The Progressively-Growing GAN architecture released by NVIDIA and published at ICLR 2018 has become the primary display of impressive GAN image synthesis. Classically, GANs have struggled to go beyond low- and mid-resolution images such as 32² (CIFAR-10) and 128² (ImageNet), but this GAN model was able to generate high-resolution facial images at 1024². 1024 x … Read more Progressively-Growing GANs

Using Machine Learning to Identify the Minerals in Meteorites

How Meteorites are Studied The scientists scan meteorites using an electron microprobe (EMP). An EMP shoots a beam of electrons at the meteorite. When the beam of electrons collides with the atoms in the meteorite, the atoms emit x-rays. Each element has distinct, characteristic frequencies. A graph of characteristic frequencies of different elements. (https://commons.wikimedia.org/wiki/File:XRFScan.jpg) Before … Read more Using Machine Learning to Identify the Minerals in Meteorites

Word2vec from Scratch with NumPy

How to implement a Word2vec model with Python and NumPy Introduction Recently, I have been working on several projects related to NLP at work. Some of them had something to do with training the company’s in-house word embedding. At work, the tasks were mostly done with the help of a Python library: gensim. However, I decided … Read more Word2vec from Scratch with NumPy

Review: MultiChannel — Segment Colon Histology Images (Biomedical Image Segmentation)

Foreground Segmentation using FCN + Edge Detection Using HED + Object Detection Using Faster R-CNN. Gland Haematoxylin and Eosin (H&E) stained slides and ground truth labels. In this story, MultiChannel is briefly reviewed. It is a deep multichannel neural network used for gland … Read more Review: MultiChannel — Segment Colon Histology Images (Biomedical Image Segmentation)

RTX 2060 Vs GTX 1080Ti in Deep Learning GPU Benchmarks: Cheapest RTX vs. Most Expensive GTX card.

Less than a year ago, with its GP102 chip + 3584 CUDA Cores + 11GB of VRAM, the GTX 1080Ti was the apex GPU of the last-gen Nvidia Pascal range (bar the Titan editions). The demand was so high that retail prices often exceeded $900, way above the official $699 MSRP. In Fall 2018, Nvidia launched its … Read more RTX 2060 Vs GTX 1080Ti in Deep Learning GPU Benchmarks: Cheapest RTX vs. Most Expensive GTX card.

Understanding Semantic Segmentation with UNET

A Salt Identification Case Study Table of Contents: 1. Introduction; 2. Prerequisites; 3. What is Semantic Segmentation?; 4. Applications; 5. Business Problem; 6. Understanding the data; 7. Understanding Convolution, Max Pooling and Transposed Convolution; 8. UNET Architecture and Training; 9. Inference; 10. Conclusion; 11. References. 1. Introduction Computer vision is an interdisciplinary scientific field that deals with how computers can be made to gain high-level understanding from … Read more Understanding Semantic Segmentation with UNET

Profiling my Favorite Songs on Spotify through clustering

Music is an integral part of most of our lives. It is the common language that helps us express ourselves when no words could describe how we feel. Music also helps us set the mood. It affects our soul and our emotions, making us feel happy, sad or energetic. We will probably be playing songs … Read more Profiling my Favorite Songs on Spotify through clustering

Docker for python development?

Part 1 covered what Docker is. In this article, we’ll be talking about how to start using Docker for Python development. A standard Python installation involves setting up environment variables, and if you’re dealing with different versions of Python, there are tons of environment variables to deal with, be it on Windows or Linux. And … Read more Docker for python development?