Data Science for Startups: Containers

Source: Building reproducible setups for machine learning One of the skills that is becoming more in demand for data scientists is the ability to reproduce analyses. Having code and scripts that only work on your machine is no longer sustainable. You need to be able to share your work and have other teams be able … Read more

Earthquake Analysis (2/4): Categorical Variables Exploratory Analysis

Categories: Basic Statistics. Tags: Data Visualisation, Exploratory Analysis, R Programming. This is the second part of our post series about the exploratory analysis of a publicly available dataset reporting earthquakes and similar events within a specific time window of 30 days. In the following, we are going to analyze the categorical variables of our dataset. … Read more

Process Mining (Part 3/3): More analysis and visualizations

The interruption index measures how much a resource has to toggle between cases before completing a case, instead of completing all the activities for one case before proceeding to the next. The toggling between incomplete cases could have many causes. However, if the reason is due to disruptions in the workflow, … Read more

Practical Introduction to Market Basket Analysis – Association Rules

Introduction Ever wondered why items are displayed in a particular way in retail/online stores? Why certain items are suggested to you based on what you have added to the cart? Blame it on market basket analysis or association rule mining. Resources Below are the links to all the resources related to this post: What? Market basket analysis … Read more

The Million-Dollar Neural Network, Part I: Understanding the Biological Basis

Learn How to Build a Neural Network & Enter to Win the $1.65M CMS AI Health Outcomes Challenge In This 3-Part Series What if I told you that you could learn to use machine learning — more specifically, neural networks — to tackle some of the biggest problems in healthcare? Some of you might be interested. Others, not so much. … Read more

Computational Analysis of Big Pharma

Each year, amidst long-awaited commencement celebrations, the words of the Hippocratic Oath, an ancient promise that medical professionals have made for centuries, echo through the halls of medical schools across the country. These hallowed, esteemed institutions send thousands of bright graduates across the United States to alleviate suffering, change lives, and in some cases, hopefully, save them. … Read more

10 Machine Learning Methods that Every Data Scientist Should Know

Regression Regression methods fall within the category of supervised ML. They help to predict or explain a particular numerical value based on a set of prior data, for example predicting the price of a property based on previous pricing data for similar properties. The simplest method is linear regression where we use the mathematical equation … Read more
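As a rough illustration of the property-price example, here is a minimal sketch of simple linear regression fitted by ordinary least squares. The excerpt is cut off before its equation, so the form y = intercept + slope * x is an assumption, and the function name `fit_line` and the floor-area data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and co-deviation of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical prior data: floor area (square metres) and sale price (thousands).
areas = [50, 70, 90, 110]
prices = [150, 210, 270, 330]

intercept, slope = fit_line(areas, prices)

# Predict the price of a hypothetical 100 square metre property.
predicted = intercept + slope * 100
print(intercept, slope, predicted)
```

With a single explanatory variable the fit has a closed form, so no library is needed; real pricing data would usually call for several variables and a proper statistics package.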

Web Scraping for Beginners: BeautifulSoup, Scrapy, Selenium & Twitter API

Introduction I was learning about web scraping recently and thought of sharing my experience in scraping using BeautifulSoup, Scrapy, and Selenium, and also using the Twitter API and pandas-datareader. Web scraping is a fun and very useful tool. Python has made web scraping much easier. With fewer than 100 lines of code you can extract the data. Web scraping is … Read more

NLP Jam: The Grateful Dead and Phish

For this blog post, I will share one of the projects that I completed as part of my data science boot camp. Introduction My data science boot camp class was assigned the task of building an NLP model that would take in a reddit post, and classify it as belonging to one subreddit or another, based … Read more

Demystifying Quantum Machine Learning

Quantum Mechanics History Quantum Mechanics is a collection of scientific laws that describe the behaviour of subatomic particles, an endlessly fascinating subject that has been notoriously difficult to master and somewhat controversial, even for physicists. It all started in the 17th century when scientists were trying to figure out the properties of light. At first, … Read more

Visualizing beyond 3 Dimensions

Peeking into unseeably complex data Vision is arguably one of our greatest strengths as humans. Even the most tremendously complex ideas tend to become easy (or at least, much easier) to understand as soon as you find a way to visualize them — our occipital lobes have done us well. That’s why when you’re working with datasets that … Read more

The Hitchhiker’s Guide to AI Ethics

A 3-part series exploring ethics issues in Artificial Intelligence But what is RIGHT? And is that enough? (Image: Machine Learning, XKCD) Don’t Panic “The Hitchhiker’s Guide to AI Ethics is a must read for anyone interested in the ethics of AI. The book is written in the style and spirit that has inspired many sci … Read more

So, what is Artificial Intelligence? Firstly, it’s not as hard as it sounds

In this article I will demystify the term Artificial Intelligence and reveal where and how it is used. Lastly, using basic programming techniques, I will provide a simple proof of concept that AI can be applied to uncomplicated business processes. Do not fear, you do not have to be a tech guru to understand … Read more

Bayesian models in R

Greater Ani (Crotophaga major) is a cuckoo species whose females occasionally lay eggs in conspecific nests, a form of parasitism recently explored [source]. If there was something that always frustrated me, it was not fully understanding Bayesian inference. Sometime last year, I came across an article about a TensorFlow-supported R package for Bayesian analysis, called greta. … Read more

May Edition: Careers in Data Science

It’s already been almost nine long years since the famous declaration by the Harvard Business Review on Data Scientist being the “Sexiest Job of the 21st Century”. Since then, the data science field as a whole has matured in rapid ways. Notable among these developments is in the careers, from the rise of data science … Read more

Congress shut down, so she became a data scientist at Netflix

Learn how Becky Tucker overcame a setback in her academic career, what she does at Netflix and the importance of listening with empathy Netflix has revolutionized the way we watch and has turned subscription-based streaming content into the norm. With over 139 million global subscribers and a US market share of 51%, Netflix is a dominant … Read more

Deep Learning Book Series 3.1 to 3.3 Probability Mass and Density Functions

This content is part of a series about Chapter 3 on probability from the Deep Learning Book by Goodfellow, I., Bengio, Y., and Courville, A. (2016). It aims to provide intuitions/drawings/Python code on mathematical theories and is constructed as my understanding of these concepts. GitHub: the corresponding Python notebook can be found here. I’m happy … Read more

Python for Finance: Robo Advisor Edition

Extending Stock Portfolio Analyses and Dash by Plotly to track Robo Advisor-like Portfolios. Photo by Aditya Vyas on Unsplash. Part 3 of Leveraging Python for Stock Portfolio Analyses. Introduction. This post is the third installment in my series on leveraging Python for finance, specifically stock portfolio analyses. In part 1, I reviewed a Jupyter notebook … Read more

Reflecting on 6 Months of Leveraging Tech & Data in Theater

METHOD: STRUCTURING OUR WORKSHOPS We were concerned that our audience would be overwhelmed by our content. Therefore, we designed our workshops around lectures (75% of the event) before concluding our events with hands-on exercises. Our first events along with their taglines are listed below: Websites & SEO How search engines find your site and how you can … Read more

An intuitive understanding of the LAMB optimizer

The latest technique for distributed training of large deep learning models In software engineering, decreasing cycle time has a super-linear effect on progress. In modern deep learning, cycle time is often on the order of hours or days. The easiest way to speed up training, data parallelism, is to distribute copies of the model across GPUs … Read more

A gentle guide into Decision Trees with Python

The decision tree algorithm is a supervised learning model used to predict a dependent variable from a set of training variables. Decision tree algorithms can be used for both classification and regression. In this particular project, I am going to illustrate it in the classification of a discrete random variable. Some questions a decision tree can answer: Should … Read more
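To make the classification idea concrete, the sketch below finds the single best split on one numeric variable by minimising weighted Gini impurity, which is the step a decision tree repeats at every node. It is not the article's code: Gini impurity is one common splitting criterion chosen here as an assumption, and the names `gini` and `best_split` and the pass/fail data are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best rule 'value <= threshold'.

    If no split improves on the unsplit node, threshold is None.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold fits between two equal values
        threshold = (lo + hi) / 2
        left = [lab for v, lab in pairs if v <= threshold]
        right = [lab for v, lab in pairs if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical training data: hours studied and whether the exam was passed.
hours = [1, 2, 3, 6, 7, 8]
passed = ["no", "no", "no", "yes", "yes", "yes"]

threshold, impurity = best_split(hours, passed)
print(threshold, impurity)
```

A full tree would apply `best_split` recursively to the left and right subsets until the nodes are pure or a depth limit is reached.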

K-Means Clustering in SAS

What is Clustering? “Clustering is the process of dividing the datasets into groups, consisting of similar data-points”. Clustering is a type of unsupervised machine learning, which is used when you have unlabeled data. Let’s understand this with a real scenario: a group of diners sitting in a restaurant. Let’s say two tables in the restaurant called T1 … Read more
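The grouping idea can be sketched in a few lines. The article works in SAS; the example below uses plain Python instead, purely as an illustration. The `kmeans` function, the fixed starting centroids, and the diner coordinates are all hypothetical.

```python
def kmeans(points, centroids, iterations=10):
    """Basic k-means: assign each point to its nearest centroid, then
    move each centroid to the mean of its assigned points.

    Assumes iterations >= 1 and that every point has the same dimension.
    """
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Squared Euclidean distance from p to every centroid.
            distances = [
                sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids
            ]
            clusters[distances.index(min(distances))].append(p)

        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                mean = tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
                new_centroids.append(mean)
            else:
                new_centroids.append(old)  # keep a centroid that lost all points

        if new_centroids == centroids:
            break  # assignments are stable, so the algorithm has converged
        centroids = new_centroids
    return centroids, clusters


# Hypothetical positions of six diners seated around two tables.
diners = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
start = [(0.0, 0.0), (10.0, 10.0)]

centroids, clusters = kmeans(diners, start)
print(centroids)
print(clusters)
```

The starting centroids are fixed here so the run is repeatable; practical implementations usually pick them at random or with a seeding scheme and run several restarts.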

Detailed Guide to the Bar Chart in R with ggplot

When it comes to data visualization, flashy graphs can be fun. Believe me, I’m as big a fan of flashy graphs as anybody. But if you’re trying to convey information, especially to a broad audience, flashy isn’t always the way to go. Whether it’s the line graph, scatter plot, or bar chart (the subject of … Read more

Employee Turnover: a Risk Segmenting Investigation

In this post, I conduct a simple risk analysis of employee turnover using the Human Resources Analytics data set from Kaggle. I describe this analysis as an example of simple risk segmenting because I would like to have a general idea of which combination of employee characteristics can provide evidence towards higher employee turnover. To … Read more

Analyzing the Titanic with a Business Analyst mindset using R (ggplot2)

Lately, I have been fascinated with the R programming language and the fantastic data visualization package (ggplot2) created by Hadley Wickham. I am a business analyst by profession with lots of experience in the e-payments industry, but I have found a passion for data analysis, visualizing and communicating data to stakeholders. Motivation for this post Well, why Titanic…one … Read more

Set Analysis: A face off between Venn diagrams and UpSet plots

It’s time for me to come clean about something; I think Venn diagrams are fun! Yes, that’s right, I like them. They’re pretty, they’re often funny, and they convey the straightforward overlap between one or two sets somewhat easily. Because I like making nerd comedy graphs, I considered sharing with y’all how to create … Read more

Ultimate AI Strategy Guide

Strategy and AI are terms that are hard to pin down — they mean different things to different people. Combine AI and Strategy — 人工知能 战略 and now you have a harder problem to tackle! The goal of this post is to bring in the best advice out there about AI strategy and add a practitioner’s point of view … Read more

Abstractive Summarization of Dialogues

“There are 2.5 quintillion bytes of data created each day.” That was 2017, and now we seem to have even lost count. Putting things concisely has become critical. No doubt, automation has reached the art of summarization. Recently, there has been a buzz about summarizing news articles using deep … Read more

Visualizing an NFL Big Board with Pandas and Plotly

Ah, spring. A time when a young person’s fancy turns away from being disappointed by their NBA team and looks forward to being disappointed by their MLB team. Specifically for me, however, spring brings another big sports phenomenon: the NFL Draft. Since I root for the Buffalo Bills, “winning” the Draft is as close as I can … Read more

Three key fails in Machine Learning

The mythology that has been associated with Machine Learning can lead to poor judgment about when and how to apply it I kind of had a backwards introduction to analytics. My first heavy involvement in any sort of dedicated analytics project was a Machine Learning project. I mean if you are going to do it, why … Read more

State of the Art Audio Data Augmentation with Google Brain’s SpecAugment and PyTorch

Implementing SpecAugment with PyTorch & TorchAudio. Zach C, Apr 30. Google Brain recently published SpecAugment: A New Data Augmentation Method for Automatic Speech Recognition, which achieved state of the art results on various speech recognition tasks. Unfortunately, Google Brain did not release code and it seems like they wrote their version in TensorFlow. For practitioners … Read more

Foiled again! A brief discussion on folium

When you’re thinking about visualizations, there’s lots and lots of good choices out there: Bar charts for when you’re bar-hoppin’, scatter plots for when you’re scatter-brained, histograms for those days when you just wanna throw a histy-fit, and of course, pie charts for dessert. Today though, I’m gonna say a few words about a helpful … Read more

Risk and return for B3

One of the subjects that I teach in my undergraduate finance class is the relationship between risk and expected returns. In short, the riskier the investment, the more return the investor should expect. It is not a difficult argument to make. All you need is to remember that people are not … Read more

Rowing: the Course des Impressionnistes

On 1 May 2019, I took part in the Course des Impressionnistes under the colours of our club [CERAMM]( We entered the men's coxless quadruple sculls. # Our crew It is made up of 4 men, arranged in the boat as follows: seat no. | first name | particular roles: 1 | Fabrice | stroke; 2 | Fabien … Read more
