Jupytext 1.0 highlights

In version 1.0 the jupytext command was extended with new modes: --sync, to synchronize the multiple representations of a notebook; --set-formats (and optionally, --sync), to set or change the pairing of a notebook or a text file; and --pipe, to pipe the text representation of a notebook into another program. Perhaps you would like to reformat … Read more

People don’t trust AI. We need to change that.

This past week, I had the pleasure of attending and speaking at THINK, IBM’s annual customer and developer conference that brought over 25,000 attendees to San Francisco. Of course, it’s next to impossible to sum up an event of that scale in a simple blog post — but I’d like to share a few key ideas that … Read more

Pandas Index Explained

The following notebook is very easy to follow and also has small tips and tricks to make daily work a little better.
    import pandas as pd
    adult = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data",
                        names=['age', 'workclass', 'fnlwgt', 'education', 'education_num',
                               'marital_status', 'occupation', 'relationship', 'race', 'sex',
                               'capital_gain', 'capital_loss', 'hours_per_week',
                               'native_country', 'label'],
                        index_col=False)
    print("Shape of data: {}".format(adult.shape))
    adult.head()
The dataset has 32561 rows and 15 features; the leftmost series 0, 1, 2, 3, … is the index. Let’s … Read more
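To make the idea of the index concrete without downloading anything, here is a minimal sketch on a tiny in-memory frame. The three rows are illustrative stand-ins, not the real dataset, and the variable names are my own; only standard pandas calls (`DataFrame`, `loc`, `iloc`, `set_index`) are used.

```python
import pandas as pd

# Tiny illustrative frame (stand-in values, NOT the real adult dataset).
df = pd.DataFrame(
    {
        "age": [39, 50, 38],
        "workclass": ["State-gov", "Self-emp-not-inc", "Private"],
        "label": ["<=50K", "<=50K", "<=50K"],
    }
)

# With no index given, pandas assigns a RangeIndex: 0, 1, 2, ...
print(df.index)

# Label-based access (loc) and position-based access (iloc)
# coincide while the index is the default 0..n-1 range.
first_by_label = df.loc[0, "age"]
first_by_position = df.iloc[0, 0]

# Promote a column to the index; rows are then addressed by its values.
by_workclass = df.set_index("workclass")
private_age = by_workclass.loc["Private", "age"]
print(private_age)
```

Once a column is promoted with `set_index`, `loc` looks rows up by that column's values, while `iloc` keeps working by position.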

Moore’s law is dead

We are accustomed to thinking that computer speed doubles every 18 months, as predicted by Moore’s law. Indeed, for the past 50 years that was the case. However, Moore’s law is coming to an end due to technical obstacles. What alternatives do we have? [Image: Intel 4004] Gordon Moore predicted in 1965 that the number of … Read more
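To get a feel for what "doubling every 18 months" implies, here is a small arithmetic sketch. The function name `growth_factor` and the chosen time spans are my own illustration; the 18-month period is simply the figure quoted above, not a measured value.

```python
def growth_factor(years: float, doubling_period_years: float = 1.5) -> float:
    """Multiplicative growth after `years` if a quantity doubles
    every `doubling_period_years` (1.5 years = 18 months)."""
    return 2.0 ** (years / doubling_period_years)


# Print the implied speed-up for a few time spans.
for years in (1.5, 3, 15, 50):
    print(f"{years:>4} years -> x{growth_factor(years):,.0f}")
```

Fifteen years of doubling every 18 months is ten doublings, or a factor of 1024, which is why even a modest slowdown in the doubling period compounds into a very large gap.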

Thinking of Self-Studying Machine Learning? Remind yourself of these 6 things

We were hosting a Meetup on robotics in Australia and it was question time. Someone asked a question. “How do I get into artificial intelligence and machine learning from a different background?” Nick turned and called my name. “Where’s Dan Bourke?” I was backstage and talking to Alex. I walked over. “Here he is,” Nick … Read more

Deeper Dive into Finding Similar Faces with Spotify’s Annoy, Tensorflow, and Pytorch

In my previous post I teased that I had jumped down a rabbit hole to try and improve my Fate Grand Order facial similarity pipeline where I was making use of Tensorflow object detectors, Pytorch feature extractors, and Spotify’s approximate nearest neighbor library (annoy). The general idea that I was running with was that I … Read more

AI — The End of the WoRLd?

Understanding Reinforcement Learning Intuitively. Let’s say you picked up your first ever arcade game with no idea how to play it. No one really reads instructions, so your learning process is basically button-mashing and seeing what happens to your character on the screen. You figure out what to do without any prior knowledge or intended … Read more

A quick response to Genevera Allen about Machine learning ‘causing science crisis’

What’s next? Be careful about what you are reading. I’m not saying I have the truth in my hands; I’m just giving my opinion as someone who’s been applying machine learning in production as a data scientist and is working on making it more reproducible, transparent and powerful. Thanks to Genevera for bringing this important … Read more

ML Algorithms: One SD (σ)- Bayesian Algorithms

The obvious question to ask when facing a wide variety of machine learning algorithms is “which algorithm is better for a specific task, and which one should I use?” The answer varies depending on several factors, including: (1) the size, quality, and nature of the data; (2) the available computational time; (3) the urgency of … Read more

A graduate student’s perspective on statistics

When I’m meeting someone new, and the inevitable question is asked, “So, what do you do?”, I respond that I’m a Ph.D. student in statistics, to which the response is, most of the time, some variation of this: “Statistics?! I had one required statistics class in undergrad, and it was so confusing.” The mere … Read more

Low-Cost Cell Biology Experiments for Data Scientists

Part 3: Analyzing image data. Deep learning algorithms are exciting in part because of their potential to automate biomedical research tasks [6,7]. For example, deep learning algorithms can be used to automate the time-consuming process of manually counting mitotic structures in breast histopathology images [8,9]. Differences in the rates of cellular division and differences in … Read more

Role of Data Science in Artificial Intelligence

[Image: Steve Urkel and Urkelbot, whose intelligence doubled every 2 minutes. Image credit: ABC’s Family Matters] The age of the spreadsheet is over. A Google search, a passport scan, your online shopping history, a tweet. All of these contain data that can be collected, analyzed, and monetized. Supercomputers and algorithms allow us to make sense of an increasingly … Read more

5 Lines of Code to Convince You to Learn R

A brief treatise for those on the fence. All of the code supporting this article can be forklifted from this MatrixDS project. Some good advice for data scientists (or really anyone): if you write the same code more than once, create a function; if you give the same advice more than once, write a blog post. … Read more

10 Lessons Learned from Scraping Websites

Valuable insights that I gained from retrieving data from many websites over the past years and want to share with you. “Data is the new oil. It’s valuable, but if unrefined it cannot really be used. It has to be changed into gas, plastic, chemicals, etc. to create a valuable entity that drives profitable activity; … Read more

What my first Silver Medal taught me about Text Classification and Kaggle in general?

Sailing through the world of Kaggle. Kaggle is an excellent place for learning, and I learned a lot from the recently concluded Quora Insincere Questions Classification competition, in which I ranked 182/4037. In this post, I will try to provide a summary of the things I tried. I will also … Read more

What should I Read Next?

Series, New Authors, and My Ratings. I noticed as I ran the algorithm that books that were part of a series dominated the results even as I played around with the inputs. It’s logical: when someone reads and enjoys book 1 in a series, they’re likely to read book 2. If I’ve read book 1 too, … Read more

Sentiment Analysis using LSTM Step-by-Step

1) Load in and visualize the data. We are using the IMDB movie reviews dataset. If it is stored on your machine in a text file, then we just load it in:
    # read data from text files
    with open('data/reviews.txt', 'r') as f:
        reviews = f.read()
    with open('data/labels.txt', 'r') as f:
        labels = f.read()
    # peek at the start of each file
    print(reviews[:50])
    print()
    print(labels[:26])
Output: bromwell high … Read more

Ego Network Analysis for the Detection of Fake News

Using a combination of network analysis and natural language processing to determine the sources of “fake news” on Twitter. [Image: Twitter network of verified users with over 1 million followers. Circles (nodes) represent users, and the lines connecting the circles represent one user “following” another. Colors represent classes determined through modularity clustering.] While “Fake News” has existed … Read more

Customer Segmentation Analysis with Python

In this article I’ll explore a data set on mall customers to try to see if there are any discernible segments and patterns. Customer segmentation is useful in understanding what demographic and psychographic sub-populations there are within your customers in a business case. By understanding this, you can better understand how to market and serve … Read more

Jump Out of the Jupyter Notebook with nbconvert

Easily Convert Notebooks to Python Scripts and Sharable Files. If you’re a data scientist, nbconvert is a great tool to add to your tool belt. With nbconvert you can easily turn your Jupyter Notebook into a Python script from the command line. It also allows you to turn your Jupyter notebook into share-friendly formats like .html and .pdf … Read more

OpenAI’s GPT-2: the model, the hype, and the controversy

Last Thursday, OpenAI released a very large language model called GPT-2. This model can generate realistic text in a variety of styles, from news articles to fan fiction, based off some seed text. Controversially, they decided not to release the data or the parameters of their biggest model, citing concerns about potential abuse. It’s been … Read more

Time Series Forecasting with Prophet

Learn how to use Facebook’s Prophet to predict air quality. [Photo by Frédéric Paulussen on Unsplash] Producing high-quality forecasts is hard for many machine learning engineers. It requires a substantial amount of experience and very specific skills. Also, other forecasting tools were too inflexible to incorporate useful assumptions. For those reasons, Facebook open sourced Prophet, … Read more

How to PyTorch in Production

[Photo by Sharon McCutcheon on Unsplash] ML is fun, ML is popular, ML is everywhere. Most companies use either TensorFlow or PyTorch. There are some old-timers who prefer Caffe, for instance. Mostly it’s all about the Google vs. Facebook battle. Most of my experience is with PyTorch, even though most of the tutorials and online … Read more

Setting up Email Updates for Your Scraper using Python and a Gmail Account

[Photo by Jamie Street on Unsplash] Very often when building web scrapers to collect data, you’ll run into one of these situations: (1) you want to send the program’s results to someone else; (2) you’re running the script on a remote server and you want automatic, real-time reports on results (e.g. updates on price information from an online … Read more
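The excerpt stops before showing how the updates are sent, so the following is only a minimal sketch using Python's standard library, not necessarily the author's approach. The helper names `build_report` and `send_via_gmail`, the addresses, and the message text are all hypothetical; the sketch also assumes the Gmail account allows SMTP login, for example through an app password.

```python
import smtplib
from email.message import EmailMessage


def build_report(sender: str, recipient: str, subject: str, body: str) -> EmailMessage:
    """Assemble a plain-text email carrying the scraper's results."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_via_gmail(msg: EmailMessage, username: str, app_password: str) -> None:
    """Send the message through Gmail's SMTP server over SSL (port 465).

    Assumes the account permits SMTP login with these credentials.
    """
    with smtplib.SMTP_SSL("smtp.gmail.com", 465) as server:
        server.login(username, app_password)
        server.send_message(msg)


# Building the message needs no network access; sending is a separate step.
report = build_report(
    "scraper@example.com",   # hypothetical sender
    "me@example.com",        # hypothetical recipient
    "Scraper update",
    "3 new prices found.",
)
# send_via_gmail(report, "scraper@example.com", "<app password>")  # not run here
```

Keeping message construction separate from sending makes the report easy to inspect or test before any credentials are involved.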

Building a model? Here’s the first question you should ask

Whether your model is meant to be explanatory or predictive has profound implications for its design. Someone somewhere right now is building a model. Many, many people in fact. Whether for a business, an academic study or even personal interest, people have been using mathematics more and more to model real world phenomena in order to … Read more

Guide to choosing Hyperparameters for your Neural Networks

https://www.wired.com/2016/12/2016-year-deep-learning-took-internet/
In recent times, Deep Learning has had a significant impact in the fields of computer vision, natural language processing, and speech recognition. Because large amounts of data are generated day after day, that data can be used to train Deep Neural Networks, and Deep Learning is preferred over traditional Machine Learning algorithms for higher performance … Read more

Thou Shalt Not Fear Automatons

TL;DR: The imminent danger with Artificial Intelligence has nothing to do with machines becoming too intelligent. It has to do with machines inheriting the stupidity of people. Background: Whom should we believe about the magnitude of change that comes with Artificial Intelligence? Andrew Ng when he says AI is the next electricity, or the Francois … Read more

Resisting Adversarial Attacks Using Gaussian Mixture Variational Autoencoders

Now, let’s take a look at how we can resist adversarial samples via thresholding. Suppose the input image (from class 1) has been adversarially perturbed to fool the encoder into believing that it belongs to class 0. This implies that the latent encoding of the input image must lie within the cluster for 0’s. Although … Read more

Convolutional Neural Networks

Researchers came up with the concept of the CNN, or Convolutional Neural Network, while working on image processing algorithms. Traditional fully connected networks were something of a black box: they took in all of the inputs and passed each value through a dense network that led to a one-hot output. That seemed to work with … Read more

Comparing common analysis strategies for repeated measures data

Dealing with dependencies in data. What is this all about? My hope with this post is to provide a conceptual overview of how to deal with a specific type of dataset commonly encountered in the social sciences (and very common in my own disciplines of experimental psychology and cognitive neuroscience). My goal is not to provide mathematical … Read more

What to do when your data fails OLS Regression assumptions

Regression analysis falls in the realm of inferential statistics. Consider the following equation: y ≈ β₀ + β₁x + e. The approximate equals sign indicates that there is an approximate linear relationship between x and y. The error term e indicates that this model isn’t going to fully reflect reality via a simple linear relation. The … Read more
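As a rough illustration of the equation above, the sketch below fits the intercept and slope by ordinary least squares using the closed-form formulas, then computes the residuals that play the role of e. The data points and the function name `fit_simple_ols` are made up for this example.

```python
from statistics import mean


def fit_simple_ols(x, y):
    """Return (beta0, beta1) minimizing the sum of squared residuals
    for the model y ≈ beta0 + beta1 * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta1 = sxy / sxx                # slope
    beta0 = y_bar - beta1 * x_bar    # intercept
    return beta0, beta1


# Made-up data: roughly y = 1 + 2x with small deviations.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.1, 2.9, 5.2, 6.8, 9.0]

beta0, beta1 = fit_simple_ols(x, y)

# Residuals are the part of y the straight line does not explain.
residuals = [yi - (beta0 + beta1 * xi) for xi, yi in zip(x, y)]
print(f"beta0={beta0:.2f}, beta1={beta1:.2f}")
```

With an intercept in the model, the fitted residuals sum to zero by construction, so checking assumptions means looking at their pattern (for example against x), not their total.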

Progressively-Growing GANs

The Progressively-Growing GAN architecture, released by NVIDIA and published at ICLR 2018, has become the primary display of impressive GAN image synthesis. Classically, GANs have struggled to output low- and mid-resolution images such as 32² (CIFAR-10) and 128² (ImageNet), but this GAN model was able to generate high-resolution facial images at 1024². 1024 x … Read more

Using Machine Learning to Identify the Minerals in Meteorites

How Meteorites are Studied. Scientists scan meteorites using an electron microprobe (EMP). An EMP shoots a beam of electrons at the meteorite. When the beam of electrons collides with the atoms in the meteorite, the atoms emit x-rays. Each element has distinct, characteristic frequencies. [Figure: A graph of characteristic frequencies of different elements. (https://commons.wikimedia.org/wiki/File:XRFScan.jpg)] Before … Read more

Word2vec from Scratch with NumPy

How to implement a Word2vec model with Python and NumPy. Introduction: Recently, I have been working on several NLP-related projects at work. Some of them had something to do with training the company’s in-house word embedding. At work, the tasks were mostly done with the help of a Python library: gensim. However, I decided … Read more

Review: MultiChannel — Segment Colon Histology Images (Biomedical Image Segmentation)

Foreground Segmentation using FCN + Edge Detection Using HED + Object Detection Using Faster R-CNN. [Image: Gland Haematoxylin and Eosin (H&E) stained slides and ground truth labels] In this story, MultiChannel is briefly reviewed. It is a deep multichannel neural network used for gland … Read more

RTX 2060 Vs GTX 1080Ti in Deep Learning GPU Benchmarks: Cheapest RTX vs. Most Expensive GTX card.

Less than a year ago, with its GP102 chip + 3584 CUDA Cores + 11GB of VRAM, the GTX 1080Ti was the apex GPU of the last-gen Nvidia Pascal range (bar the Titan editions). The demand was so high that retail prices often exceeded $900, way above the official $699 MSRP. In Fall 2018, Nvidia launched its … Read more

Understanding Semantic Segmentation with UNET

A Salt Identification Case Study.
Table of Contents:
1. Introduction
2. Prerequisites
3. What is Semantic Segmentation?
4. Applications
5. Business Problem
6. Understanding the data
7. Understanding Convolution, Max Pooling and Transposed Convolution
8. UNET Architecture and Training
9. Inference
10. Conclusion
11. References
1. Introduction. Computer vision is an interdisciplinary scientific field that deals with how computers can be made to gain high-level understanding from … Read more

Docker for python development?

Part 1 covered what Docker is. In this article, we’ll be talking about how to start using Docker for Python development. A standard Python installation involves setting up environment variables, and if you’re dealing with different versions of Python, there are tons of environment variables to deal with, be it on Windows or Linux. And … Read more

The Advent of Architectural AI

Artificial Intelligence. Artificial Intelligence is fundamentally a statistical approach to architecture. The premise of AI, which blends statistical principles with computation, is a new approach that can improve on the drawbacks of parametric architecture. “Learning”, as understood by machines, corresponds to the ability of a computer, when faced with a complicated issue, first to grasp … Read more

The complete beginner’s guide to data cleaning and preprocessing

How to successfully prepare your data for a machine learning model in minutes. Data preprocessing is the first (and arguably most important) step toward building a working machine learning model. It’s critical! If your data hasn’t been cleaned and preprocessed, your model will not work. It’s that simple. Data preprocessing is generally thought of as the … Read more