Data Science for Startups: Containers

(Source: https://commons.wikimedia.org/wiki/File:CMA_CGM_Benjamin_Franklin.jpeg) Building reproducible setups for machine learning. One of the skills that is becoming more in demand for data scientists is the ability to reproduce analyses. Having code and scripts that only work on your machine is no longer sustainable. You need to be able to share your work and have other teams be able …

Earthquake Analysis (2/4): Categorical Variables Exploratory Analysis

Categories: Basic Statistics. Tags: Data Visualisation, Exploratory Analysis, R Programming. This is the second part of our post series about the exploratory analysis of a publicly available dataset reporting earthquakes and similar events within a specific time window of 30 days. In the following, we are going to analyze the categorical variables of our dataset. …

Process Mining (Part 3/3): More analysis and visualizations

The interruption index measures how much a resource has to toggle between cases before completing a case, instead of completing all the activities for one case before proceeding to the next. The toggling between incomplete cases could be due to many reasons. However, if the reason is due to disruptions in the workflow, …

Practical Introduction to Market Basket Analysis – Association Rules

Introduction: Ever wondered why items are displayed in a particular way in retail/online stores? Why certain items are suggested to you based on what you have added to the cart? Blame it on market basket analysis or association rule mining. Resources: Below are the links to all the resources related to this post. What? Market basket analysis …

The Million-Dollar Neural Network, Part I: Understanding the Biological Basis

Learn How to Build a Neural Network & Enter to Win the $1.65M CMS AI Health Outcomes Challenge in This 3-Part Series. What if I told you that you could learn to use machine learning (more specifically, neural networks) to tackle some of the biggest problems in healthcare? Some of you might be interested. Others, not so much. …

Computational Analysis of Big Pharma

Each year, amidst long-awaited commencement celebrations, the words of the Hippocratic Oath, an ancient promise that medical professionals have made for centuries, echo through the halls of medical schools across the country. These hallowed, esteemed institutions send thousands of bright graduates across the United States to alleviate suffering, change lives, and in some cases, hopefully, save them. …

10 Machine Learning Methods that Every Data Scientist Should Know

Regression: Regression methods fall within the category of supervised ML. They help to predict or explain a particular numerical value based on a set of prior data, for example predicting the price of a property based on previous pricing data for similar properties. The simplest method is linear regression where we use the mathematical equation …

Python Pro Tip: Want to use R/Java/C or Any Language in Python?

Python Shorts: Python provides a basic and simple way to handle such requirements where we have to switch to and fro between multiple languages. Python is great. Really great. But the field is becoming, or will become, language agnostic with time. And a lot of great work is being done in many other languages. While I still …

Web Scraping For Beginners: Beautifulsoup, Scrapy, Selenium & Twitter API

Introduction: I was learning about web scraping recently and thought of sharing my experience in scraping using beautifulsoup, scrapy, and selenium, and also using the Twitter API and pandas datareader. Web scraping is a fun and very useful tool. The Python language has made web scraping much easier. With less than 100 lines of code you can extract the data. Web scraping is …

Demystifying Quantum Machine Learning

Quantum Mechanics History: Quantum Mechanics is a collection of scientific laws that describe the behaviour of subatomic particles, an endlessly fascinating subject that has been notoriously difficult to master and somewhat controversial, even for physicists. It all started in the 17th century when scientists were trying to figure out the properties of light. At first, …

Visualizing beyond 3 Dimensions

Peeking into unseeably complex data. Vision is arguably one of our greatest strengths as humans. Even the most tremendously complex ideas tend to become easy (or at least, much easier) to understand as soon as you find a way to visualize them; our occipital lobes have done us well. That’s why when you’re working with datasets that …

The Hitchhiker’s Guide to AI Ethics

A 3-part series exploring ethics issues in Artificial Intelligence. But what is RIGHT? And is that enough? (Image: Machine Learning, XKCD) Don’t Panic. “The Hitchhiker’s Guide to AI Ethics is a must read for anyone interested in the ethics of AI. The book is written in the style and spirit that has inspired many sci …

So, what is Artificial Intelligence? Firstly, it’s not as hard as it sounds

In this article I will demystify the term Artificial Intelligence and reveal where and how it is used. Lastly, using basic programming techniques, I will provide a simple proof of concept that AI can be applicable in uncomplicated business processes. Do not fear, you do not have to be a tech guru to understand …

Bayesian models in R

Greater Ani (Crotophaga major) is a cuckoo species whose females occasionally lay eggs in conspecific nests, a form of parasitism recently explored [source]. If there was one thing that always frustrated me, it was not fully understanding Bayesian inference. Sometime last year, I came across an article about a TensorFlow-supported R package for Bayesian analysis, called greta. …

May Edition: Careers in Data Science

It’s already been almost nine long years since the famous declaration by the Harvard Business Review on Data Scientist being the “Sexiest Job of the 21st Century”. Since then, the data science field as a whole has matured rapidly. Notable among these developments are those in careers, from the rise of data science …

Congress shut down, so she became a data scientist at Netflix

Learn how Becky Tucker overcame a setback in her academic career, what she does at Netflix, and the importance of listening with empathy. Netflix has revolutionized the way we watch and has turned subscription-based streaming content into the norm. With over 139 million global subscribers and a US market share of 51%, Netflix is a dominant …

Getting started with Visualizations in Python

Line Plots: A line chart or line graph is a type of chart which displays information as a series of data points called ‘markers’ connected by straight line segments. It is much like a scatter plot with lines between the points, and it is used to display data points with respect to an ongoing or moving factor. …

Deep Learning Book Series 3.1 to 3.3 Probability Mass and Density Functions

This content is part of a series about Chapter 3 on probability from the Deep Learning Book by Goodfellow, I., Bengio, Y., and Courville, A. (2016). It aims to provide intuitions, drawings, and Python code on mathematical theories and is constructed as my understanding of these concepts. GitHub: the corresponding Python notebook can be found here. I’m happy …

Python for Finance: Robo Advisor Edition

Extending Stock Portfolio Analyses and Dash by Plotly to track Robo Advisor-like Portfolios. Photo by Aditya Vyas on Unsplash. Part 3 of Leveraging Python for Stock Portfolio Analyses. Introduction. This post is the third installment in my series on leveraging Python for finance, specifically stock portfolio analyses. In part 1, I reviewed a Jupyter notebook …

Reflecting on 6 Months of Leveraging Tech & Data in Theater

METHOD: STRUCTURING OUR WORKSHOPS. We were concerned that our audience would be overwhelmed by our content. Therefore, we designed our workshops around lectures (75% of the event) before concluding our events with hands-on exercises. Our first events along with their taglines are listed below: Websites & SEO: How search engines find your site and how you can …

An intuitive understanding of the LAMB optimizer

The latest technique for distributed training of large deep learning models. In software engineering, decreasing cycle time has a super-linear effect on progress. In modern deep learning, cycle time is often on the order of hours or days. The easiest way to speed up training, data parallelism, is to distribute copies of the model across GPUs …

A gentle guide into Decision Trees with Python

The decision tree algorithm is a supervised learning model used to predict a dependent variable from a series of training variables. Decision tree algorithms can be used for classification and regression purposes. In this particular project, I am going to illustrate it in the classification of a discrete random variable. Some questions a decision tree can answer: Should …

K-Means Clustering in SAS

What is Clustering? “Clustering is the process of dividing the datasets into groups, consisting of similar data-points”. Clustering is a type of unsupervised machine learning, which is used when you have unlabeled data. Let’s understand this with a real scenario: a group of diners sitting in a restaurant. Let’s say two tables in the restaurant called T1 …

Detailed Guide to the Bar Chart in R with ggplot

When it comes to data visualization, flashy graphs can be fun. Believe me, I’m as big a fan of flashy graphs as anybody. But if you’re trying to convey information, especially to a broad audience, flashy isn’t always the way to go. Whether it’s the line graph, scatter plot, or bar chart (the subject of …

Employee Turnover: a Risk Segmenting Investigation

In this post, I conduct a simple risk analysis of employee turnover using the Human Resources Analytics data set from Kaggle. I describe this analysis as an example of simple risk segmenting because I would like to have a general idea of which combination of employee characteristics can provide evidence towards higher employee turnover. To …

Analyzing the Titanic with a Business Analyst mindset using R (ggplot2)

Lately, I have been fascinated with the R programming software and the fantastic data visualization package (ggplot2) created by Hadley Wickham. I am a business analyst by profession with lots of experience in the e-payments industry, but I have found a passion for data analysis, visualizing and communicating data to stakeholders. Motivation for this post: Well, why Titanic…one …

Set Analysis: A face off between Venn diagrams and UpSet plots

It’s time for me to come clean about something: I think Venn diagrams are fun! Yes, that’s right, I like them. They’re pretty, they’re often funny, and they convey the straightforward overlap between one or two sets somewhat easily. Because I like making nerd comedy graphs, I considered sharing with y’all how to create …

Abstractive Summarization of Dialogues

“There are 2.5 quintillion bytes of data created each day.” That was 2017, and now we seem to have even lost count. Putting things concisely has become critical. No doubt, automation has reached the art of summarization. Recently, there has been a buzz about summarizing news articles using deep …

Visualizing an NFL Big Board with Pandas and Plotly

Ah, spring. A time when a young person’s fancy turns away from being disappointed by their NBA team and looks forward to being disappointed by their MLB team. Specifically for me, however, spring brings another big sports phenomenon: the NFL Draft. Since I root for the Buffalo Bills, “winning” the Draft is as close as I can …

State of the Art Audio Data Augmentation with Google Brain’s SpecAugment and Pytorch

Implementing SpecAugment with Pytorch & TorchAudio. Zach C, Apr 30. Google Brain recently published SpecAugment: A New Data Augmentation Method for Automatic Speech Recognition, which achieved state of the art results on various speech recognition tasks. Unfortunately, Google Brain did not release code and it seems like they wrote their version in TensorFlow. For practitioners …

Foiled again! A brief discussion on folium

When you’re thinking about visualizations, there’s lots and lots of good choices out there: Bar charts for when you’re bar-hoppin’, scatter plots for when you’re scatter-brained, histograms for those days when you just wanna throw a histy-fit, and of course, pie charts for dessert. Today though, I’m gonna say a few words about a helpful …

Rowing: the Course des Impressionnistes

On May 1, 2019, I took part in the Course des Impressionnistes in the colors of our club [CERAMM](https://www.ceramm.fr/). We were entered in the men's coxless quadruple scull.

# Our crew

It is made up of 4 men, arranged in the boat as follows:

seat no. | first name | special roles
:---:|:---:|:---
1 | Fabrice | stroke
2 | Fabien …