Steps to basic modern NN model from scratch

After defining the matrix multiplication strategy, it’s time to define the ReLU function and the forward pass for the neural network. Readers should go through Part 1 of the series for background on the data used below. The neural network is defined as below: output …
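As a rough illustration of the ReLU function and forward pass mentioned above, here is a minimal NumPy sketch of a one-hidden-layer network. The layer sizes, the random inputs and the names `relu`, `forward`, `w1`, `b1`, `w2` and `b2` are assumptions made for illustration; they are not the series' own code or data.

```python
import numpy as np

def relu(x):
    # ReLU keeps positive values and clamps negatives to zero.
    return np.maximum(x, 0.0)

def forward(x, w1, b1, w2, b2):
    # Hidden layer: affine transform followed by ReLU.
    hidden = relu(x @ w1 + b1)
    # Output layer: a plain affine transform, no activation.
    return hidden @ w2 + b2

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))    # 4 made-up samples with 3 features each
w1 = rng.normal(size=(3, 5))   # hypothetical hidden width of 5
b1 = np.zeros(5)
w2 = rng.normal(size=(5, 1))   # single output per sample
b2 = np.zeros(1)

out = forward(x, w1, b1, w2, b2)
print(out.shape)
```

The output has one row per input sample, so the shape depends only on the batch size and the width of the last layer.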

future 1.15.0 – Lazy Futures are Now Launched if Queried

[This article was first published on JottR on R, and kindly contributed to R-bloggers]. No dogs were harmed while making this release. future 1.15.0 is …

A Quick Short Look Into Bootstrapping

Big Questions: After an A/B test, to what extent can we trust that our small sample represents the entire population of our customers? If we repeatedly draw samples of the same size, how would our estimates vary? If we obtain different estimates after repeated sampling, can we gauge the distribution of the population? If we don’t know …
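One way to explore the "how would our estimates vary?" question is to resample the observed data with replacement. The sketch below uses only the standard library; the `conversions` data, the number of resamples and the helper name `bootstrap_means` are made up for illustration and are not taken from the article.

```python
import random

def bootstrap_means(sample, n_resamples=2000, seed=0):
    # Draw n_resamples resamples (with replacement, same size as the
    # original sample) and record the mean of each one.
    rng = random.Random(seed)
    n = len(sample)
    return [sum(rng.choices(sample, k=n)) / n for _ in range(n_resamples)]

# Hypothetical A/B test outcomes: 1 = converted, 0 = did not convert.
conversions = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0]

means = sorted(bootstrap_means(conversions))
# Percentile interval: the middle 95% of the bootstrap means.
low = means[int(0.025 * len(means))]
high = means[int(0.975 * len(means))]
print(f"observed rate: {sum(conversions) / len(conversions):.2f}")
print(f"approximate 95% interval: [{low:.2f}, {high:.2f}]")
```

The spread of `means` gives a rough picture of how much the estimate could move if the experiment were repeated with a sample of the same size.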

Reading in Data

[This article was first published on R on kieranhealy.org, and kindly contributed to R-bloggers]. Here’s a common situation: you have a folder full of similarly-formatted …

Central Limit & Large Numbers

If you’re into math equations, let us now turn to formal representations of the theorems in order to understand their claims and the relationship between the two a bit more precisely. Let X₁, X₂, …, Xₙ be independent and identically distributed random variables with expected value μ and finite variance σ². Then √n (X̄ₙ − μ) / σ converges towards the Standard Normal Distribution in …
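The claim can be checked numerically. The sketch below assumes an Exponential(1) distribution (so μ = 1 and σ = 1) and standardises the sample mean as √n (X̄ − μ) / σ; the distribution, sample size and number of repetitions are arbitrary choices made for illustration.

```python
import math
import random
import statistics

MU = 1.0     # mean of Exponential(1)
SIGMA = 1.0  # standard deviation of Exponential(1)

def standardized_mean(n, rng):
    # Draw n i.i.d. values and standardise their sample mean.
    draws = [rng.expovariate(1.0) for _ in range(n)]
    return math.sqrt(n) * (statistics.fmean(draws) - MU) / SIGMA

rng = random.Random(42)
z = [standardized_mean(200, rng) for _ in range(2000)]

# If the theorem holds, these should be close to 0 and 1 respectively.
print(round(statistics.fmean(z), 2), round(statistics.pstdev(z), 2))
```

Even though the exponential distribution is strongly skewed, the standardised means should cluster around 0 with a spread close to 1.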

How to understand Numpy documentation

When we start to learn Data Science, Machine Learning, Deep Learning or any other exciting field that uses Python as its programming language, most of us will probably be using numpy as well. In this post, I will cover numpy basics and how to read the documentation properly, based on my experience of using …

Web Scrape Twitter by Python Selenium (Part 1)

Beginning of tutorial. PS: If you are a beginner, I suggest working in Jupyter Notebook first, because you will face more errors than ever before. With Jupyter Notebook you can run the script step by step, so you know where the problem is. Access to Twitter front page: the first step is to …

Predicting Heart Disease Mortality

Building a machine learning model that can identify high-risk states in 2019. According to the Center for Disease Control, “About 610,000 people die of heart disease in the United States every year–that’s 1 in every 4 deaths.” It is unlikely anyone reading this hasn’t been affected by this disease in some way. I, myself, lost …

Reduce Memory Usage and Make Your Python Code Faster Using Generators

A hands-on guide to creating iterators in a very Pythonic manner. When I started learning about Python generators, I had no idea how important they would turn out to be. They have helped me immensely while writing custom functions throughout my machine learning journey. Generator functions allow you to …
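To make the memory argument concrete, here is a small sketch comparing a generator function with the equivalent list. The function name `squares` and the sizes used are illustrative choices, not taken from the article.

```python
import sys

def squares(n):
    # A generator function: values are produced one at a time, on demand,
    # instead of being stored in memory all at once.
    for i in range(n):
        yield i * i

lazy = squares(100_000)                     # nothing computed yet
eager = [i * i for i in range(100_000)]     # all values held in memory

# The generator object stays small no matter how many values it will yield.
print(sys.getsizeof(lazy) < sys.getsizeof(eager))

# Generators plug into anything that consumes an iterable.
print(sum(squares(5)))
```

Note that `sys.getsizeof` reports only the size of the container object itself, so the real gap is even larger once the list's elements are counted.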

5 Minute Guide to Detecting Holidays in Python

With Pandas, it’s fairly straightforward to construct a list of dates, say for the whole year of 2019. We can then construct a DataFrame object from those dates, putting them into a Dates column. Now here comes a slight problem. The dates look to be stored in a string format, just like …
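The article's own snippets are not included in this excerpt, so the following is only one plausible way to build the dates and the DataFrame described above, using `pd.date_range`. The variable names are assumptions.

```python
import pandas as pd

# One entry per calendar day of 2019.
dates = pd.date_range(start="2019-01-01", end="2019-12-31", freq="D")

# Put the dates into a single column named "Dates".
df = pd.DataFrame({"Dates": dates})

print(len(df))
print(df["Dates"].dtype)
```

In this sketch the column is a datetime type rather than strings, so it may differ from how the article constructs its dates.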

Using Spark from R for performance with arbitrary code – Part 4 – Using the lower-level invoke API to manipulate Spark’s Java objects from R

[This article was first published on Jozef’s Rblog, and kindly contributed to R-bloggers]. In the previous parts of this series, we have shown how to …

How To Use Deep Learning Even with Small Data

And why it is so important. You’ve heard the news: deep learning is the hottest thing since sliced bread. It promises to solve your most complicated problems for the small price of an enormous amount of data. The only problem is that you are not working at Google or Facebook, and data are scarce. So …

A Quick Primer on Databricks Koalas

Interact with Spark DataFrames using Pandas vocabulary. In a project of mine, I used Spark extensively to work with some large data files. Though it is best known for its benefits in large distributed systems, it works equally well locally for projects working with large …

How to code effectively without dying in the attempt

1. Find a comfortable working space. Most programming and coding jobs are flexible enough to allow working from home, a shared space, a library or even a coffee shop, without having to be at an office 8 hours per day, 5 days per week. However, the working environment will always have a highly significant …

Design of Experiments for Your Change Management

A step-by-step guide to Design of Experiments. Data science professionals, have you ever faced any of the following challenges? Story 1: Machine learning does not mean experimental design. You are asked to design an experiment because of your statistical expertise, but realize that your machine learning tools do not help you design an experiment. Story 2: …

Let’s calculate Z-scores for Airbnb prices in New York

Z-score, also called standard score. According to Wikipedia: in statistics, the standard score is the signed fractional number of standard deviations by which the value of an observation or data point is above the mean value of what is being observed or measured. Translation: a measure of how far a value is from its population …
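As a small worked example of the definition above, the sketch below computes z = (x − mean) / standard deviation for a handful of prices. The prices are made up for illustration and are not Airbnb data; the population standard deviation (`statistics.pstdev`) is an assumption about which variant is intended.

```python
import statistics

def z_scores(values):
    # Signed number of standard deviations each value lies from the mean.
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # population standard deviation
    return [(v - mu) / sigma for v in values]

prices = [100, 150, 200, 250, 300]  # hypothetical nightly prices
scores = z_scores(prices)
print([round(s, 2) for s in scores])  # → [-1.41, -0.71, 0.0, 0.71, 1.41]
```

A price equal to the mean gets a score of 0, and the sign shows whether a value sits above or below the mean.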

Why Companies Are Using Data Science and Analytics to Inform Benefits Packages

Employee benefits packages can help candidates choose to take job offers or look elsewhere. They can also factor into how long a worker stays at a company and how happy they are while there. If they realize that other companies offer better benefits and they’re frustrated with their job already, they may decide it’s not …

Integrating Python & Tableau

Bring your analyses to life with engaging data visualizations. When performing in-depth analyses on large and unstructured datasets, the power of Python and relevant machine learning libraries cannot be overstated. Matplotlib serves as a great tool to help us visualize results, but its stylization options are not always optimal for use in presentations and dashboards. …

Why your AI might be racist and what to do about it

Individually reasonable correlations can cause an AI to gain a racial bias. Even well-designed AI systems can still end up with a bias. This bias can cause the AI to exhibit racism, sexism, or other types of discrimination. Entirely by accident. This is usually considered a political problem, and ignored by scientists. The result is …

An Alternative To Batch Normalization

The development of Batch Normalization (BN) as a normalization technique was a turning point in the development of deep learning models: it enabled various networks to train and converge. Despite its great success, BN exhibits drawbacks that are caused by its distinct behavior of normalizing along the batch dimension. One of the major disadvantages of BN …

Managing virtual environment with pyenv

Most Python developers and data scientists have already heard of virtual environments. However, managing tens of environments created for different projects can be daunting. pyenv will help you to streamline the creation, management and activation of virtual environments. In the old days, before virtualenv became popular, I would keep a single global workspace for all …

Instrumental variable regression and machine learning

Intro. Just as the question “what’s the difference between machine learning and statistics” has spilled a lot of ink (since at least Breiman (2001)), the same question with statistics replaced by econometrics has led to a lot of discussion as well. I like this presentation by Hal Varian from almost 6 years ago. …

Learning Linux – the wrong way – day 2

[This article was first published on HighlandR, and kindly contributed to R-bloggers]. Unborking the borked laptop – Recap. I’m trying to learn some Linux. Ostensibly …

AzureR updates: AzureStor, AzureVM, AzureGraph, AzureContainers

Some major updates to AzureR packages this week! As well as last week’s AzureRMR update, there are changes to AzureStor, AzureVM, AzureGraph and AzureContainers. All of these are live on CRAN. AzureStor 3.0.0: There are substantial enhancements to multiple-file transfers (up and down). You can supply a vector of pathnames to storage_upload/download as the source …

Amazon EC2 now supports Microsoft SQL Server 2019

Amazon EC2 now supports Microsoft SQL Server 2019, the latest release of Microsoft SQL Server. When you run SQL Server 2019 on Amazon EC2, you benefit from the scale, performance, and elasticity of the AWS Cloud, while leveraging the latest features available in Microsoft SQL Server 2019 such as enhanced PolyBase and intelligent query processing. …

Amazon CloudWatch launches cross-account cross-region dashboards

Amazon CloudWatch now includes cross-account cross-region dashboards, which enable you to create high-level operational dashboards, and with one click, drill down into more specific dashboards in different AWS accounts without having to log in and out of different accounts or switch AWS Regions. It is intended for centralized operations teams, DevOps engineers, and service …

Introduction to Spark NLP: Foundations and Basic Components

As a native extension of the Spark ML API, the library offers the capability to train, customize and save models so they can run on a cluster, run on other machines, or be saved for later. It is also easy to extend and customize models and pipelines, as we’ll cover in detail during this article series. Spark NLP …

Best Practices for NLP Classification in TensorFlow 2.0

Use Data Pipelines, Transfer Learning and BERT to achieve 85% accuracy in Sentiment Analysis. When I first started working with Deep Learning, I went through Coursera and fast.ai courses, but afterwards I wondered where to go from here. I started asking questions like “How do I develop a data …

Using K-Means Clustering Algorithm to Redefine NBA Positions and Explore Roster Construction

Conventional positions within the NBA do not accurately reflect the playing style or functional role a player provides to their team. The overall style of play has changed drastically, and the various eras within the NBA indicate that. Similarly, a player’s style of play is also reflective of this change. Currently the league is fast-paced …

The City of the Homeless: Humanitarian Crisis on the Streets of Los Angeles

The dominant narrative around who is living on the street, and why it is so difficult to help them, is that people experiencing homelessness are all drug addicts and/or severely mentally ill. This damaging narrative dehumanizes people experiencing homelessness in a cynical attempt to justify inaction. But it is also factually incorrect: according …

Cloud Risk Assessment through Data- log analysis in AWS

https://aws.amazon.com/getting-started/projects/analyze-big-data/ These are the high-level steps: (Note: An AWS account setup is a prerequisite. If you try this out, ensure that clusters and buckets are deleted after use to avoid additional charges). Sample data is loaded; in real-life projects the relevant dataset would replace this. Launch a Hadoop cluster using Amazon EMR [Elastic Map Reduce], …

Machine Learning and Data Analysis — Inha University (Part-2)

Welcome to the second part of the Machine Learning and Data Analysis series, based on a graduate course offered by Inha University, Rep. of Korea. In this part, we will discuss data structures in Python. However, if you are viewing this for the first time, we encourage you to read the first part first, where …

Automatic Speech Recognition as a Microservice on AWS

Let’s quickly get back to our lab work and implement this highly complex piece of work in a few easy steps. At this point, you should have your EC2 instance up and be SSHed into it. Please refer to the GitHub repository for any missing resources/links. In your home directory [/home/ec2-user], maintain the following directory structure: D -> …

How to Write Python Command-Line Interfaces like a Pro

As data scientists, we face many repetitive and similar tasks. These include creating weekly reports, executing extract, transform, load (ETL) jobs, or training models using different parameter sets. Often, we end up having a bunch of Python scripts, where we change parameters in code every time we run …
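One common way to avoid editing parameters in the code itself is to expose them as command-line options. The sketch below uses the standard library's `argparse`; the article may well use a different library, and the option names `--epochs` and `--learning-rate` are hypothetical.

```python
import argparse

def build_parser():
    # Each parameter that used to be edited by hand becomes an option
    # with a sensible default.
    parser = argparse.ArgumentParser(description="Train a model (illustrative only).")
    parser.add_argument("--epochs", type=int, default=10,
                        help="number of training epochs")
    parser.add_argument("--learning-rate", type=float, default=0.01,
                        help="optimizer step size")
    return parser

# An explicit argument list is passed here so the example is reproducible;
# a real script would call parse_args() with no arguments to read sys.argv.
args = build_parser().parse_args(["--epochs", "5"])
print(f"training for {args.epochs} epochs at lr={args.learning_rate}")
```

Options that are not supplied fall back to their defaults, so the same script can be rerun with different parameter sets without touching the source.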

Let’s build an Intelligent chatbot

Modern chatbots do not rely solely on text, and will often show useful cards, images, links, and forms, providing an app-like experience. Depending on the way bots are programmed, we can categorize them into two variants of chatbots: Rule-Based (dumb bots) & Self Learning (smart bots). Rule-Based Chatbots: This variety of bot answers questions based on …

A small simple random sample will often be better than a huge not-so-random one by @ellis2013nz

An interesting big data thought experiment. The other day on Twitter I saw someone referencing a paper or a seminar or something that was reported to examine the following situation: if you have an urn with a million balls in it of two colours (say red and white) and you want to estimate the proportion …

Sharing the DevOps journey at Microsoft

Today, more and more organizations are focused on delivering new digital solutions to customers and finding that the need for increased agility, improved processes, and collaboration between development and operation teams is becoming business-critical. For over a decade, DevOps has been the answer to these challenges. Understanding the need for DevOps is one thing, but …

Transfer Learning in NLP

Neural Transfer Learning for Natural Language Processing by Sebastian Ruder. Pan, S. J. and Yang, Q. (2010). A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10):1345–1359. Xia, R., Zong, C., Hu, X., and Cambria, E. (2015). Feature Ensemble plus Sample Selection: A Comprehensive Approach to Domain Adaptation for Sentiment Classification. Proceedings …

How Data Creates a Collective Storytelling Voice on a Global Issue

20.6k upvotes. 1.9k comments. Cross-posted on 10 other subreddits. Not super impressive. But is there more to these figures which supposedly reflect the success (or popularity) of the bar chart I posted on the subreddit /r/dataisbeautiful? Recently I posted a race bar chart showing the evolution of the top 10 countries of origin of international …

Onboarding a New Data Scientist

Efficiently onboard your new data champions. Your new employee will be ready to save the world in no time! Onboarding is Hard. Onboarding is so important but very difficult. Most traditional jobs have clear expectations, documentation, and processes for onboarding. Data Science roles are completely different! This, of course, is a result of being a …

Kalman Filter(2) — Grid World Localisation

Apply the basics to 2-dimensional space. In the last post, we applied Bayes’ rule and total probability to localise a moving car in a 1-dimensional world. Let’s reinforce our understanding and apply them to a 2-dimensional world. Consider a 2-dimensional world where the robot can move only left, right, up, or down. It cannot …

Beat The Heat with Machine Learning Cheat Sheet

Make the next-to-last mistake. Supervised Learning: Supervised learning algorithms involve direct supervision of the operation. We teach or train the machine using data, which means that the data is labelled with the right answer. We use an algorithm to analyse the training data and learn the function that maps inputs to their outputs. The function can …

Custom Transformers in Python — Part II

Data Cleaning is the most important part of any Machine Learning project. The fact that your data may be in multiple formats and spread across different systems makes it imperative that the data is properly massaged before it’s fed to an ML Model. Data preparation is one of the most tedious and time-consuming steps in …

Elizabeth Warren is Leading the 2020 Presidential Race: An analysis in Python

In this post we will use the Python Google Trends API, pytrends, to analyze which of the leading Democratic candidates are being searched most. To install pytrends, open up a command line and type: pip install pytrends. Next, open up an IDE (I use Spyder) and import pytrends: from pytrends.request import TrendReq. Next …