Epoch 1/25
1000/1000 [==============================] – 913s 913ms/step – loss: 0.3476 – acc: 0.8502 – val_loss: 2.2280 – val_acc: 0.5000
Epoch 2/25
1000/1000 [==============================] – 907s 907ms/step – loss: 0.1354 – acc: 0.9564 – val_loss: 0.5738 – val_acc: 0.8629
Epoch 3/25
1000/1000 [==============================] – 904s 904ms/step – loss: 0.0675 – acc: 0.9825 – val_loss: 0.6880 – val_acc: 0.8710
Epoch 4/25
1000/1000 [==============================] – … Read more My first contribution to Data Science -A Convolutional Neural Network that recognizes images of…
Wikipedia records a plethora of QAs and academia has defined multiple taxonomies over the years: Courtesy: Kennet Henningsson Likewise, there are at least a couple of ISO Standards (that I know of) which try to classify the QAs in different categories: ① ISO 25010 and ② ISO 9126. All of the QAs are crucial but … Read more Architecting For The -ilities
Simple evolution guidelines for data science products Evolving data science products for new teams can be a daunting task. There are conflicting requirements embedded in the nature of data science products. First constraint: product teams want to move proof of concepts to market as fast as possible. Second constraint: data science (DS) teams need a … Read more Avoiding the “Automatic Hand-off” Syndrome in Data Science Products
Credit Looking at today’s Internet, it is easy to wonder: whatever happened to the dream that it would be good for democracy? Well, looking past the scandals of big social media and scary plays of autocracy’s hackers, I think there’s still room for hope. The web remains full of small experiments in self-governance. It’s still … Read more Bringing big data to the science of community: Minecraft Edition
Dream about a job connected to AI? This guide is your must-read. Artificial Intelligence. Well, it looks like this cutting-edge technology is now the most popular and at the same time the most decisive one for humanity. We are ceaselessly amazed at the AI capabilities and the effective way they can be used in almost … Read more How to Get Started as a Developer in AI
How did I get started working with data Even before I started working at ICORating, in my previous work as a blockchain developer, I had to make a service to build graphs based on data in the blockchain and based on data from exchanges. It all sounds pretty simple, but the problem starts exactly at … Read more Working as a Data Scientist in Blockchain Startup
Suppose we are given a 6-day operational log file from a particular piece of equipment. The equipment is used for 10 hours a day. Our objective is to find if there are days with issues. The only problem is that the log status is written in a foreign language (let’s say it happens that the … Read more Overall Equipment Effectiveness and Topic Modeling
Pitchfork has been one of the most prolific music publications of the last twenty years. Only two others have more reviews in the dataset, and since 2018, Pitchfork has published more reviews on a monthly basis than anyone else, aside from AllMusic. The trends here are interesting in and of themselves — what causes this … Read more The Devil’s Music
Photo by Mika Baumeister on Unsplash Let’s kick off by asking ourselves a question: What is Data? → In simpler terms, Data is a collection of objects and their attributes. Other names for a data object are record, point, vector, pattern, event, case, sample, observation, or entity. Now, we must ask: what are Attributes? → … Read more Journey into Data Mining
An event-driven approach In traditional neural networks using the sigmoid activation function, all neurons are more or less activated. There is no clear case of an inactive neuron here. That can be problematic if you want to compute extremely large networks, because in each round you’d have to update all the neurons. Intuitively, it would … Read more How to efficiently propagate activations in a massive neural network
In order to monitor a feature, we need to compute a single metric that will compare the training and inference distributions. There are many different similarity metrics to choose from but few that can be applied to both categorical and numerical variables. Computing similarity metrics The Wasserstein distance is a similarity measure that can … Read more Life of a model after deployment
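As a rough illustration of comparing a feature's training and inference distributions with a single number, here is a minimal sketch of the one-dimensional Wasserstein distance. It is not taken from the article. It assumes a numerical feature and two samples of equal size, in which case the distance is the mean absolute difference between the sorted samples. The function name wasserstein_1d and the sample values are hypothetical.

```python
def wasserstein_1d(train_values, inference_values):
    """Empirical 1-D Wasserstein distance for two equal-size numeric samples.

    Assumption for this sketch: both samples have the same length, so the
    distance is the mean absolute gap between the sorted values.
    """
    if len(train_values) != len(inference_values):
        raise ValueError("this sketch assumes samples of equal size")
    if not train_values:
        raise ValueError("samples must be non-empty")
    a = sorted(train_values)
    b = sorted(inference_values)
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# Hypothetical feature values seen at training time and at inference time.
train = [1.0, 2.0, 3.0, 4.0]
same = [4.0, 3.0, 2.0, 1.0]       # same values, different order
shifted = [3.0, 4.0, 5.0, 6.0]    # every value moved up by 2

print(wasserstein_1d(train, same))     # 0.0
print(wasserstein_1d(train, shifted))  # 2.0
```

A distance of zero means the two samples have the same distribution, and a uniform shift of 2 gives a distance of 2, so the value can be read in the units of the feature itself. Categorical variables and samples of unequal size would need a different treatment than this sketch provides.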
You don’t. Photo by Jamie Street Collaborations are at the heart of what we do as researchers, and success in one’s field often has as much to do with forging effective collaborations as with coming up with innovative ideas. Yet collaborations are such fragile things, it’s sometimes a wonder that they can be sustained at all in … Read more How do I get Someone to Work on my Research?
Let’s make logreg great again! Nowadays there are a lot of pre-trained nets for NLP which are SOTA and beat all benchmarks: BERT, XLNet, RoBERTa, ERNIE… They are successfully applied to various datasets even when there is little data available. At the end of July (23.07.2019–28.07.2019) there was a small online hackathon on Analytics Vidhya … Read more Approaches to sentimental analysis on a small imbalanced dataset without Deep Learning
Recently I’ve been working with manufacturing customers (both OEM and CM) who want to jump on the bandwagon of machine learning. One common use case is to better detect products (or Device Under Test/DUT) that are defective in their production line. Using machine learning’s terminology, this falls under the problem of binary classification as a … Read more One Class Learning in Manufacturing: Autoencoder and Golden Units Baselining
The Beginner’s Guide to Unsupervised Learning Artificial Intelligence (AI) and Machine Learning (ML) have revolutionized every aspect of our life and disrupted how we do business, unlike any other technology in the history of mankind. Such disruption brings many challenges for professionals and businesses. In this article, I will provide an introduction to one … Read more K-Means Clustering for Unsupervised Machine Learning
… in which I discuss a workflow where you can start writing your contents on a jupyter notebook, create a reveal.js slide deck, and host it on github for presentations. This is for a very simple presentation that you can fully control yourself. A first simple slide deck Part I: Basic slide deck Part II: Basic … Read more How to create data-driven presentations with jupyter notebooks, reveal.js,
Stat 110: The quintessential Probability and Statistics course you gotta take. All the lectures and notes are available on Youtube and his site for free. If not for the content then for Prof. Joseph Blitzstein’s sense of humor. The above picture is a testament to that. I took this course to enhance my understanding of … Read more How did I learn Data Science?
An introduction to Stochastic processes and how they are applied every day in Data Science and Machine Learning. “The only simple truth is that there is nothing simple in this complex universe. Everything relates. Everything connects” — Johnny Rich, The Human Script One of the main applications of Machine Learning is modelling stochastic processes. Some … Read more Stochastic Processes Analysis
The Limericking Project Identifying key summary sentences from text Welcome to part 3 in my ongoing Limericking series, where I explore the great potential of Natural Language Processing to parse news text and write poetry. As I explained in part 1 of the series, I am an avid fan of the Twitter account Limericking which … Read more Limericking part 3: text summarization
CC by 2.0 Different players have different strengths and weaknesses — is there a way to visualize them? Back in the early 2000’s, the New York Giants had an exciting running back duo. Tiki Barber (“Lightning”) went for about 1000 yards rushing and 550 yards receiving a year. That’s impressive on its own, but even … Read more Visualizing Different NFL Player Styles
One of my main interests during my modeling process was to determine which words carried greater weight when predicting the subreddit of origin, rather than simply the count of certain predictive words. So while I did utilize a Count Vectorization model, I found that the weighted Vectorization produced with Scikit-Learn’s TF-IDF functionality, TfidfVectorizer, … Read more Natural Language Processing and Sports Subreddits
The vision of RAPIDS cuGraph is to make graph analysis ubiquitous to the point that users just think in terms of analysis and not technologies or frameworks. This is a goal that many of us on the cuGraph team have been working on for almost twenty years. Many of the early attempts focused on solving … Read more RAPIDS cuGraph — The vision and journey to version 1.0 and beyond
First we are going to check if Python is installed and install another library that will help us deal with spreadsheets. A library is a collection of code that has implemented (usually) hard things to do in a simpler way. We need to first open up the Terminal which will let us interact with our … Read more Intro to Reading and Writing Spreadsheets with Python
Learn the nuts and bolts of a neural network’s most important ingredient This article is inspired by my frustration over the inability to find a simple and concise explanation of backpropagation which includes the necessary mathematics and covers the essentials. So I decided to write it here. Enjoy! Backpropagation algorithm is probably the most fundamental … Read more Understanding backpropagation algorithm
“TELL ME AND I FORGET. TEACH ME AND I REMEMBER. INVOLVE ME AND I LEARN.” –BENJAMIN FRANKLIN Life is a journey through learning experiences. We are continuously learning new tasks and acquiring new knowledge, and we have a magical and poorly understood ability to leverage previous experiences to optimize how we build new knowledge. … Read more What are Progressive Neural Networks?
Discover the treasures of Python! Python is a beautiful language. Simple to use yet powerfully expressive. But are you using everything that it has to offer? Every experienced developer knows that knowing the hidden treasures of their programming language of choice helps them get around many common bugs and … Read more The Treasures of Python’s built in Libraries
Let’s get away from the neural network hype and go back to the basics a bit, to the times when it actually made sense why things work. Machine learning is a really hot topic, everybody wants to do it or use it somehow in a product, or to reduce business operating costs. Machine learning seems … Read more You Must Know Least Squares
The fundamental basis behind this commonly used algorithm Linear regression, while a useful tool, has significant limits. As its name implies, it can’t easily match any data set that is non-linear. It can only be used to make predictions that fit within the range of the training data set. And, most importantly for this article, … Read more Understanding Multiple Regression
How to teach products to make decisions What’s so transformative about AI, anyway? Artificial Intelligence (AI) is regularly breaking new ground, from DeepMind’s AlphaGo Zero teaching itself to play Go and beating human champions to text-generating algorithms so powerful that their creators at OpenAI decided not to release them publicly for fear of malicious use. … Read more 4 Product-Driven Steps to an AI Roadmap
Photo by Kane Reinholdtsen Public speaking used to be a big sore spot for me. I was able to survive it, but just barely. I truly hated it to my core and it caused me a great deal of grief. And don’t get me started on impromptu speaking — whenever something like that would pop … Read more Why Data Scientists Should join Toastmasters
What if they find out you’re clueless? Impostor syndrome is the elephant in the data science lab. Everyone has it, no one thinks other people have it, and no one talks about it. I’m amazed that more people don’t discuss it openly. I work at a data science mentorship startup where I probably spend 20% … Read more How to manage impostor syndrome in data science
Machines as Creative Partners https://www.nytimes.com/2016/05/07/arts/design/harold-cohen-a-pioneer-of-computer-generated-art-dies-at-87.html Humans are natural makers; we enjoy the freedom of making things. However, in the context of automation, machines challenge humans’ role in fabrication. Automation wastes humans’ unique skills and disconnects people from real-world materials. To address this issue, researchers proposed the hybrid workflow, which starts from studying … Read more Hybrid Intelligence
[image of a ballpit with some slides and other play-ground like things; bright colors and whimsy] Ballpits are probably a public health crisis, but I think they get across the magical feeling of “playtime” quite well. Let’s talk about play! I’ve held some interesting jobs since graduating from college. While working as a cognitive science … Read more How to Meaningfully Play With Data
This is a Lasso; it is used to pick and capture animals. As a non-native English speaker, my first exposure to this word was in supervised learning. In this LASSO data science tutorial, we discuss the strengths of the Lasso logistic regression by stepping through how to apply this useful statistical method for classification problems … Read more Variable selection using LASSO
Now that you hopefully have a general grasp of Git as well as an account on GitHub, let’s combine the two and see what these two technological components can really offer us. I’ll be demonstrating one primary use case with Git and GitHub that most users will execute in their time with the version control … Read more Version Control Systems— Git & GitHub
Julia to the rescue! Photo by Debby Hudson on Unsplash Nowadays, most data scientists use either Python or R as their main programming language. That was also my case until I met Julia earlier this year. Julia promises performance comparable to statically typed compiled languages (like C) while keeping the rapid development features of interpreted … Read more Freeing the data scientist mind from the curse of vectoRization
The Power of Selenium, Excel and Pandas Josh Calabrese via unsplash Serving in Tennis often determines the outcome of the match. A break of serve or a hold of serve at a key moment in a set can ultimately decide whether silverware beckons. Through the combination of measured web scraping with appropriately placed explicit waits, … Read more The King of Serving: Tennis Web Scraping with Selenium
A serverless approach for Data Scientists Photo by Daniel Eledut on Unsplash Amazon Lambda is probably the most famous serverless service available today, offering low cost and practically no cloud infrastructure governance needed. It offers a relatively simple and straightforward platform for implementing functions in different languages like Python, Node.js, Java, C# and many more. … Read more Introduction to Amazon Lambda, Layers and boto3 using Python3
Doing cool things with data! Introduction Artificial Intelligence is transforming several sectors of the economy such as automotive, marketing and healthcare. Retail could be next. The essential retail experience of shopping in store has remained unchanged for decades. AI could radically transform this experience by making it cost-effective to deliver a completely personalized, immersive and … Read more Visual Product Search for Smart Retail Checkout
This article shows you how to improve your data science skill set. In particular, this can apply to data scientists looking to switch roles or others breaking into the field. Think of this as a flexible framework to guide you where to go next in your data science career. This method is what I generally … Read more How to Develop as a Data Scientist
How can I save my White House joy, from 2020’s deadly toy? A trade war is good politics. I wrote in January, May, and June about a global battle being waged between Freedom and Autocracy. The President, despite being attacked for his own supposed autocratic instincts, has successfully framed himself at the center of this … Read more Believe Me When I Say To You, I Hope You Love a Trade War Too!
In 1997, a computer named “Deep Blue” defeated reigning world chess champion Garry Kasparov — a defining moment in the history of AI theory. But the great minds behind the chess computer problem had started publishing in the subject nearly 6 decades earlier. Known as the father of modern computer science, Alan Turing is credited … Read more How a Computerized Chess Opponent “Thinks” — The Minimax Algorithm
In the last weeks we went together on a journey about Recommendation Systems. We saw a gentle introduction to the topic and also an introduction to the most important similarity measures around it (remember that the whole repository about recommendation systems and other projects is always available on my GitHub profile). And yes, I know, … Read more 6 amateur mistakes I’ve made working with train-test splits
Photo by Bernd Klutsch on Unsplash One of the first resources I like to look through as I’m learning a new library, a new function within a library, a new programming language, etc. is the documentation for that specific library / function / language. In many cases, the documentation has been meticulously curated to provide … Read more 5 Resources Every Data Scientist (and Programmer) Should Use
Artificial Intelligence Turn your online presence from a negative to a positive environment to protect us from AI and ourselves. Online presence photo from Pexels Artificial Intelligence is generating a lot of press in recent times and rightly so. Data is now the most valuable resource in the world and Artificial Intelligence (AI) can utilise … Read more Why we all have to start being nicer to each other online.
When AB testing doesn’t cut it Today I am going to talk about experimentation in data science, why it is so important and some of the different techniques that we might consider using when AB testing is not appropriate. Experiments are designed to identify causal relationships between variables and this is a really important concept … Read more Experimentation in Data Science
An end to end guide on how to create, train and test a Machine Learning model in your browser using Tensorflow.js. Thanks to recent advancements in Artificial Intelligence it is now becoming relatively easy to build and train Machine Learning models. However, these models can benefit society only by being shared and made ready … Read more Online Machine Learning with Tensorflow.js
What the bot developers don’t want you to know It is trivial to make a working bot, less so to have a profitable one. — Reddit User When choosing a trading bot to invest our hard-earned money into, we have one of three options: Use an Open Source script. These would be from Github repositories … Read more Which Trading Bots are even Profitable?
Dick Brandon hit it bang on the nail when he observed, “Documentation is like sex; when it’s good, it’s very, very good, and when it’s bad, it’s better than nothing.” Documentation is the castor oil of programming. Managers think it is good for programmers and programmers love to hate it! But that said, great developers, … Read more 5 Bad Habits of Absolutely Ineffective Programmers.
Including more features in the model makes the model more complex, and the model may overfit the data. Some features can be noise and potentially damage the model. By removing those unimportant features, the model may generalize better. The Sklearn website lists different feature selection methods. This article is mainly based on the … Read more Feature selection using Python for classification problem