Data Stories — Plane Flights

Unfortunately for me, it turns out that they’re right. Of a possible 365 days that I could’ve spent in Portland (where spending more than 6 hours of waking time in a location counted in that location’s favor), I spent 270 days in town (a solid 74%). Thinking about it another way, in an average … Read more

A.I. Adoption in the Financial Sector

How A.I. and machine learning technologies are being accepted and embraced by capital markets firms. (Image source: Pexels.com) This piece, based on a survey carried out by Waters Technology, explores growing interest from firms on both sides of the industry in developing and deploying AI technology across their back offices, some use cases for … Read more

Making Deep Learning models ready for the worst-case scenario and cross-platform ready with…

With 2020 just arrived, the community of deep learning experts and enthusiasts is looking forward to a significant year of innovation in the field. With a growing number of deep learning models being built every day around the world, humankind’s dependence on the cloud and the network (especially TCP) is expanding day by day. … Read more

The math behind GANs (Generative Adversarial Networks)

A detailed understanding of the math behind the original GANs, including their limitations. The Generative Adversarial Network (GAN) comprises two models: a generative model G and a discriminative model D. The generative model can be thought of as a counterfeiter trying to produce fake currency and use it without being caught, whereas the discriminative … Read more
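For reference, the game between G and D in the original GAN formulation is usually written as the minimax objective below, with p_data the data distribution, p_z the noise prior fed to G, and p_g the distribution of generated samples. This is the standard notation and may differ from the article's; the second line is the optimal discriminator for a fixed G, and the third is what the objective reduces to at that optimum, which is where the usual link between GAN training and the Jensen–Shannon divergence comes from.

```latex
\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]

D^{*}_{G}(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}

V(D^{*}_{G}, G) = -\log 4 + 2\,\mathrm{JSD}\left(p_{\text{data}} \,\|\, p_g\right)
```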

Mining Twitter on IBM Cloud Platform

When working on Watson, you may need to install tweepy each time you reconnect. Retrieving tweets requires only a minimal set of imports and that installation step. To connect to the Twitter API, you’ll need a Twitter developer account and your account credentials. One way to access your keys and tokens is through Apps and App … Read more

How least squares regression estimates are actually calculated

Understanding how model outputs are calculated makes them easier to interpret. (Image: FWStudio from Pexels) Ordinary least squares (OLS) is a method used in linear regression to obtain parameter estimates. It fits a line so that the sum of the squared vertical distances from each point to the regression line (the residuals) is minimized. Let’s visualize this in … Read more
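To make the "minimize the sum of squared residuals" idea concrete, here is a minimal sketch for the one-predictor case in plain Python. It uses the textbook closed-form solution (slope = covariance of x and y divided by variance of x); the data points are made up for illustration, and the article's own walkthrough may use a different route such as the matrix form.

```python
def ols_fit(xs, ys):
    """Closed-form OLS estimates (intercept b0, slope b1) for y = b0 + b1 * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("x values must not all be equal")
    b1 = s_xy / s_xx            # slope: covariance of x and y over variance of x
    b0 = y_bar - b1 * x_bar     # the fitted line passes through (x_bar, y_bar)
    return b0, b1


def sum_squared_residuals(xs, ys, b0, b1):
    """The quantity OLS minimizes: sum of squared vertical distances to the line."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


# Made-up example data
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = ols_fit(xs, ys)
```

For these points the estimates come out at b0 ≈ 0.05 and b1 ≈ 1.99, and nudging either coefficient away from those values increases `sum_squared_residuals`.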

Important Basics About Cryptocurrency

Pump and dump: This is a form of market manipulation that has been going on with traditional currencies for generations and has, unsurprisingly, made its way into the cryptocurrency markets, where it is frowned upon but not against the law. A pump and dump works when an individual or group of individuals buys up as much … Read more

How well does a value-based regression model perform in the 2016 Presidential Election?

Using multiple regression to investigate which values predict voting for Donald Trump in the 2016 election. (Figure: a D3.js map comparing the actual 2016 election results to the model predictions. Visit www.data607projects.com/final-project to interact with the full visualization.) For some, it may be hard to believe, but the 2020 presidential election is roughly 300 days away. In … Read more

Generalization, Regularization, Overfitting, Bias and Variance in Machine Learning

How do you know if a machine learning model is actually learning something useful? It’s not as obvious as it may seem, and it’s definitely worth taking a closer look. To begin with, this post is about the kind of machine learning explained in, for example, the classic book The Elements of Statistical Learning. … Read more

PCA Is Not Feature Selection

What it actually does, and when you can and can’t use it. Almost no data scientist would ever ask for less data, but the curse of dimensionality means that something must be done to manage the many variables in a data set. Principal Component Analysis (PCA) is a useful tool for doing just that, but … Read more
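To illustrate the title's point, here is a minimal sketch of PCA on two-dimensional data in plain Python. We chose a closed-form 2x2 eigendecomposition so no libraries are needed, and the data points are made up. The first principal component comes out as a weighted combination of both original columns, so PCA builds new features rather than selecting a subset of the existing ones.

```python
import math


def pca_2d(points):
    """PCA for 2-D data: returns ((lambda1, lambda2), first principal axis)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Entries of the sample covariance matrix [[a, b], [b, c]]
    a = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    c = sum((y - my) ** 2 for _, y in points) / (n - 1)
    b = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    # Closed-form eigenvalues of a symmetric 2x2 matrix (lam1 >= lam2)
    mid = (a + c) / 2
    rad = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    lam1, lam2 = mid + rad, mid - rad
    # Eigenvector for the largest eigenvalue: the first principal component
    if b != 0:
        vx, vy = lam1 - c, b
    else:
        vx, vy = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    return (lam1, lam2), (vx / norm, vy / norm)


# Made-up, strongly correlated data
(lam1, lam2), pc1 = pca_2d([(1.0, 1.2), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.1)])
```

Here both loadings in `pc1` are roughly 0.7, so each point's score on the first component, `pc1[0] * x + pc1[1] * y` after centering, mixes both variables. Dropping the second component reduces dimensionality, but it does not tell you which original column to drop.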

Closet Data Scientists — Who Are They?

Picking Up Skills in an Era of Compressing Technology Waves. The field of Artificial Intelligence (AI) is progressing so fast that the list of data science skills is growing faster than the threat Trump poses to the global economy. Google’s Featured Snippet for the search “Data Scientist skill set” yields Statistics, R/Python, ETL, … Read more

2019: The Year of BERT

This sort of information is often better presented interactively, so here is a GIF. You can also check out the Jupyter notebook to play with the plot yourself, and the raw data is available too. (GIF: mousing over a mass of BERT papers.) Now that’s a lot of BERT papers. Some notes on this plot: It’s interesting to … Read more

Predicting the next decade in the stock market

Making accurate predictions from the vast amount of data produced by the stock markets and the economy itself is difficult. In this post, we will examine the performance of five different machine learning models and predict the future ten-year returns for the S&P 500 using state-of-the-art libraries such as caret, xgboostExplainer and … Read more

von Bertalanffy Growth Plots I

[This article was first published on fishR Blog, and kindly contributed to R-bloggers.]
Introduction
library(FSAdata) # for data
library(FSA)     # for vbFuns(), vbStarts(), confint.bootCase()
library(car) … Read more

An introduction to working with R and Python

This article is intended as an introduction to working with R from within Python. (Image by Mitchell Luo) When I was a university student, the statistics courses (Survival Analysis, Multivariate Analysis, etc.) were taught in R. Nevertheless, when I decided to learn data science, I chose Python because it seemed “spooky” to me. By working … Read more

JavaScript performance test – for vs forEach vs (map, reduce, filter, find).

These results are from small examples and may vary depending on the operation performed, the execution environment, and the VM.
// calculate the sum of upVotes
const posts = [
  {id: 1, upVotes: 2}, {id: 2, upVotes: 18}, {id: 3, upVotes: 1},
  {id: 4, upVotes: 30}, {id: 5, upVotes: 50}
];
let sum = 0;
console.time('reduce');
sum … Read more

Why Going from Implementing Q-learning to Deep Q-learning Can Be Difficult

3 Questions I Was Afraid to Ask (and My TensorFlow 2.0 Template). (Photo by JESHOOTS.COM on Unsplash) For many people, myself included, Q-learning serves as an introduction to the world of reinforcement learning. It gets us neatly accustomed to the core ideas of states, actions and rewards in a way that is intuitive and not … Read more
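Before moving to deep Q-learning, it can help to see the tabular update that the deep version approximates with a network: Q(s, a) ← Q(s, a) + α [r + γ max Q(s', ·) − Q(s, a)]. This is a minimal sketch on a hypothetical five-state corridor; the environment, the reward scheme, and all hyperparameters are made up for illustration and are not the article's TensorFlow 2.0 template.

```python
import random

N_STATES = 5        # states 0..4; state 4 is the terminal goal
ACTIONS = (0, 1)    # 0 = move left, 1 = move right


def step(state, action):
    """Hypothetical toy corridor: reward 1.0 only on reaching the goal state."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def q_update(q, s, a, r, s_next, done, alpha, gamma):
    """One tabular Q-learning update toward the TD target."""
    target = r if done else r + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0, max_steps=100):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            if rng.random() < epsilon or q[s][0] == q[s][1]:
                a = rng.choice(ACTIONS)              # explore, or break ties randomly
            else:
                a = 0 if q[s][0] > q[s][1] else 1    # exploit the current estimates
            s_next, r, done = step(s, a)
            q_update(q, s, a, r, s_next, done, alpha, gamma)
            s = s_next
            if done:
                break
    return q


q = train()
policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
```

After training, the greedy policy moves right in every non-terminal state. Deep Q-learning replaces the table `q` with a neural network, which is where extra machinery like replay buffers and target networks typically comes in.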

A Great Way to DeDupe Image Datasets

Command-line tools in production. In terms of manual labor, preparing data for an ML pipeline usually takes up the majority of the time. Furthermore, building or extending a database usually costs enormous amounts of time, subtasks, and attention to detail. The latter led me to a great command-line tool for cleaning out duplicates … Read more
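The tool's name is cut off in the excerpt, so rather than guess it, here is a minimal Python sketch of the simplest form of deduplication: grouping files by a SHA-256 hash of their bytes. The function name and the set of image extensions are our own assumptions. This approach only catches byte-identical copies; resized or re-encoded near-duplicates need perceptual hashing or similar, which dedicated tools may provide.

```python
import hashlib
import tempfile
from collections import defaultdict
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif", ".bmp"}  # assumed extensions


def find_exact_duplicates(folder):
    """Group image files under `folder` whose bytes are identical, keyed by SHA-256."""
    groups = defaultdict(list)
    for path in Path(folder).rglob("*"):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            groups[digest].append(path)
    # Only digests shared by two or more files are duplicates
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]


# Tiny demo with fake "images" written to a temporary folder
with tempfile.TemporaryDirectory() as tmp:
    for name, data in [("a.png", b"same bytes"), ("b.png", b"same bytes"), ("c.png", b"different")]:
        (Path(tmp) / name).write_bytes(data)
    dups = [[p.name for p in group] for group in find_exact_duplicates(tmp)]
```

In each returned group you would typically keep one file and delete or move the rest.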

The fulfilling Journey of Auria Kathi — The AI Poet Artist living in the clouds

On 1 January 2019, we (Fabin Rasheed and I) introduced to the world a side project we had been working on for months: an artificial poet-artist who doesn’t physically exist in this world but writes a poem, draws abstract art based on the poem, and finally colors the art based on emotion. We called … Read more

New Year’s Resolutions for Data Scientists

No matter what you want to get better at this year, here are some essential resolutions to lay the groundwork for your success: I will prioritize sleeping at least 8 hours a night, because sleep is essential to learning. I will dedicate time every day to moving my body and eating nourishing food, because my … Read more

How to Build an Object Detection Model using Watson AutoAI

Build an Object Detection Model for Brain Tumors without Coding. (Image by Robina Weermeijer, Unsplash) Watson Studio: AutoAI. AutoAI is a software platform produced by IBM and available within Watson Studio that lets you develop, train, and deploy models with one click through Watson Machine Learning. In particular, it helps simplify the AI lifecycle … Read more

What Makes AI Intelligent?

A Guide to Artificial Neural Networks, Deep Learning, AI and How They Are Enhancing Solar Asset Management. (Image source: Pexels) Artificial intelligence is at the peak of its hype curve, and its applications in the solar energy sector are surging in popularity. Once upon a time confined solely to the domains of science fiction, … Read more

Working with Hive using AWS S3 and Python

The main objective of this article is to provide a guide to connecting to Hive from Python and executing queries. I’m using the “PyHive” library for that. I’m creating my connection class as “HiveConnection”, and Hive queries will be passed into its functions. AWS S3 will be used as the file storage for Hive tables. import pandas … Read more

Reasoning With Probability — Is My Model Good Enough?

# Check the model accuracy score again
accuracy = my_tree.score(X_100_test, y_100_test)
print(f'Accuracy is {accuracy * 100:.2f}%')
# Output: Accuracy is 86.00%
We got a slightly lower number, but it is still comparable. That said, while the numbers are close, trusting a model evaluated on 100 examples is harder than trusting one evaluated on 14,000 examples. Intuitively, our confidence in the test … Read more
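One common way to put numbers on this intuition is a binomial confidence interval for accuracy. This is not necessarily the approach the article goes on to take. The sketch below uses the Wilson score interval, and the 86% figure is reused for both sample sizes purely for illustration, since the excerpt does not give the accuracy measured on 14,000 examples.

```python
import math


def wilson_interval(accuracy, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion such as test accuracy."""
    if n <= 0 or not 0.0 <= accuracy <= 1.0:
        raise ValueError("need n > 0 and 0 <= accuracy <= 1")
    denom = 1 + z ** 2 / n
    centre = (accuracy + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(accuracy * (1 - accuracy) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half


wide = wilson_interval(0.86, 100)        # accuracy measured on 100 test examples
narrow = wilson_interval(0.86, 14_000)   # same accuracy, hypothetically on 14,000
```

With these inputs, the interval is roughly 0.78 to 0.91 for 100 examples but only about 0.854 to 0.866 for 14,000. The same point estimate carries far more uncertainty when the test set is small.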

RStudio Blogs 2019

[This article was first published on R Views, and kindly contributed to R-bloggers.] If you are lucky enough to have some extra time for discretionary … Read more

Modeling salary and gender in the tech industry

[This article was first published on Rstats on Julia Silge, and kindly contributed to R-bloggers.] One of the biggest projects I have worked on over … Read more

Cluster analysis: theory and implementation of unsupervised algorithms

Industry applications. Why is clustering so popular in statistics and machine learning? Because cluster analysis is a powerful data mining tool in a wide range of business applications. Here are just a few of them: Exploratory data analysis (EDA): Clustering is one of the most basic data analysis techniques employed … Read more
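As a concrete example of an unsupervised algorithm, here is a minimal sketch of k-means using Lloyd's algorithm (alternate "assign to nearest centroid" and "move centroid to the mean") in plain Python. The function name, the seeded random initialisation from data points, and the empty-cluster handling are our own choices, not necessarily those of the article.

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm for k-means on 2-D points; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)   # initialise from k distinct data points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of its assigned points
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centroids.append(centroids[j])   # keep an empty cluster's centroid
        if new_centroids == centroids:               # converged: nothing moved
            break
        centroids = new_centroids
    return centroids, labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, k=2)
```

The two well-separated groups end up with different labels. On real data, k-means is usually restarted from several initialisations, because it only finds a local optimum.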

An Implementation of Distributed ACID Transactions

By Daniel Goméz Ferro and Monte Zweben. Splice Machine is a Hybrid Transactional/Analytical Processing (HTAP) database designed to modernize legacy applications. By combining aspects of a traditional RDBMS, such as ANSI SQL support and ACID (Atomicity, Consistency, Isolation, Durability) transactions, with the scalability, efficiency, and availability of in-memory analytics and machine learning, … Read more

Real Estate Investing in Texas

Can you predict housing prices in Texas using time series analysis? (Photo by M. B. M. on Unsplash) Introduction: For this project, I performed a time series analysis using Zillow’s historical median housing prices for the United States. The aim was to provide investors with the best zip codes to buy and develop homes in … Read more

Write Clean and SOLID Scala Spark Jobs

Creating data pipelines by writing Spark jobs is easier nowadays thanks to new tools and data platforms that let multiple data roles (analysts, engineers, scientists, etc.) focus on understanding data and writing logic to get insights. Nevertheless, new tools like notebooks, which allow easy scripting, are sometimes not used well and … Read more

Introduction to Data Science in R, Free for 3 days

[This article was first published on R – Win-Vector Blog, and kindly contributed to R-bloggers.] To celebrate the new year and the recent release of … Read more

A quick and dirty guide to visualization in Plotly for Python

There are endless options for creating plots in Python, but one of the libraries I have started to gravitate towards is Plotly. It is a solid option for creating beautiful, presentable plots easily within your Jupyter workflow. It has a vast selection of customizable plots that you can use for your visualizations (the entire … Read more

Data Science Methodology 101

How can a data scientist organize their work? Every data scientist needs a methodology for solving data science problems. For example, suppose you are a data scientist and your first assignment is to increase sales for a company that wants to know which products to sell in which periods. You will need … Read more

Animated Information Graphics

with Python and Plotly. Animated information graphics of various datasets are a popular topic on YouTube; for example, the channel Data Is Beautiful has almost a million subscribers. I will show in … Read more

How to Create a Simple Cancer Survival Prediction Model with EDA

A Thorough Walkthrough of Exploratory Data Analysis Techniques with Haberman’s Cancer Survival Dataset, Using Python to Create a Simple Predictive Cancer Survival Model. (Illustration by John Flores) The impetus for this blog and the resultant cancer survival prediction model is to provide a glimpse into the potential of the healthcare industry. Healthcare continues to learn … Read more

Everything you need to know about “Activation Functions” in Deep learning models

This article is your one-stop solution to every question you might have about the activation functions used in deep learning models. These are basically my notes on activation functions, with everything I know about the topic gathered in one place. So, without going into any … Read more
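For quick reference, here is a minimal plain-Python sketch of a few widely used activation functions. Which functions the article actually covers is not visible in the excerpt, so this selection is ours. The sigmoid and softmax are written in their numerically stable forms.

```python
import math


def sigmoid(x):
    """Logistic sigmoid, written to avoid overflow for large negative x."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def tanh(x):
    """Hyperbolic tangent: zero-centred, squashes inputs into (-1, 1)."""
    return math.tanh(x)


def relu(x):
    """Rectified linear unit: passes positives through, zeroes out negatives."""
    return max(0.0, x)


def leaky_relu(x, slope=0.01):
    """Like ReLU, but keeps a small slope (and gradient) for negative inputs."""
    return x if x > 0 else slope * x


def softmax(xs):
    """Turn a list of scores into probabilities; subtracting the max keeps exp() stable."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]
```

In a framework these would be applied element-wise to tensors. The scalar versions just make the formulas easy to read.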

A Beginner’s Guide to Preprocessing Text Data Using NLP Tools

Code used to preprocess tweets obtained from Twitter. (Source: https://www.blumeglobal.com/learning/natural-language-processing/) Below I’ve outlined the code I used to preprocess text for my natural language processing projects. This code has mostly been used on tweets obtained from Twitter for classifiers. My hope is that this guide assists aspiring data scientists and machine learning engineers by familiarizing them with … Read more
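The author's actual code is not in the excerpt. As a stand-in, here is a minimal regex-based sketch of typical tweet-cleaning steps: lowercasing, stripping URLs and @mentions, dropping punctuation and digits, and removing stop words. The stop-word list is a tiny hypothetical one; libraries such as NLTK ship much fuller lists.

```python
import re

# Tiny illustrative stop-word list (hypothetical; real projects use a fuller list)
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "on", "for", "from", "this", "it"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_ALPHA_RE = re.compile(r"[^a-z\s]")


def preprocess_tweet(text):
    """Lowercase, strip URLs, @mentions and non-letters, then drop stop words."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    # Remove anything that is not a lowercase ASCII letter or whitespace:
    # digits, punctuation, '#' (the hashtag word itself is kept), emoji, accented letters
    text = NON_ALPHA_RE.sub(" ", text)
    return [tok for tok in text.split() if tok not in STOP_WORDS]


tokens = preprocess_tweet("Loving the new #NLP course from @someone! https://t.co/xyz")
```

Here `tokens` is `["loving", "new", "nlp", "course"]`. Note that the letters-only rule also splits contractions ("don't" becomes "don" and "t"), which may or may not suit a given classifier.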

How Machine Learning Enhances Business Automation

From Predictive Analytics to Predictive HR Support. (Image source: Pexels.com) The machines are here. They’re learning. And they’re coming for your business, with the power to build or destroy your ability to compete in the near future. As Margaret Laffan, Machine Learning Business Development Director for SAP, says in Forbes, “Those companies not considering … Read more

Redesigning a Bad Graph — Spaghetti to Micromaps

Authors: Chaithanya Pramodh Kasula and Aishwarya Varala. Introduction: The primary focus of this report is redesigning a bad graph depicting age-adjusted opioid overdose death rates per 100,000 in different US states. It also discusses the detailed approaches and techniques used to obtain meaningful insights from … Read more

Costa Rican Household Poverty Level Prediction in R

A comprehensive data analysis and prediction in R using machine learning. Authors: Chaithanya Pramodh Kasula and Aishwarya Varala. (Figure: a map of Costa Rica) Introduction: This report details the process of answering several research questions about the poverty levels of Costa Rican households. It comprises data sources, exploratory data analysis through visualization, … Read more

The Biggest AI Risk of the Next Decade is not a Robot Uprising

Grand generalisations about the future impacts of artificial general intelligence overshadow the more pressing issues we face today. Pictures of the Terminator and HAL are just played out at this point. (“Finally, robotic beings rule the world”: Flight of the Conchords, “Robots”.) The past decade has seen many interesting and impressive developments in tech … Read more

Multi-Armed Bandits and Reinforcement Learning

A Gentle Introduction to the Classic Problem with Python Examples. (Photo by Carl Raw on Unsplash) Multi-armed bandit problems are some of the simplest reinforcement learning (RL) problems to solve. We have an agent that we allow to choose actions, and each action returns a reward according to a given, underlying probability … Read more
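Here is a minimal sketch of the setup described above. It assumes Bernoulli arms (each arm pays 1 with a fixed hidden probability) and an epsilon-greedy agent that keeps a running average reward per arm. The arm probabilities, epsilon, step count, and seed are made up for illustration, and the article's own examples may differ.

```python
import random


def run_epsilon_greedy(probs, steps=5000, epsilon=0.1, seed=42):
    """Epsilon-greedy agent on Bernoulli arms; returns (value estimates, pull counts)."""
    rng = random.Random(seed)
    k = len(probs)
    q = [0.0] * k   # estimated mean reward of each arm
    n = [0] * k     # number of times each arm was pulled
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(k)                    # explore: pick a random arm
        else:
            arm = max(range(k), key=lambda a: q[a])   # exploit: best estimate so far
        reward = 1.0 if rng.random() < probs[arm] else 0.0   # hidden reward distribution
        n[arm] += 1
        q[arm] += (reward - q[arm]) / n[arm]          # incremental sample average
    return q, n


q, n = run_epsilon_greedy([0.2, 0.5, 0.8])
```

The agent ends up pulling the 0.8 arm far more often than the others, and its estimate for that arm settles close to 0.8. The epsilon fraction of random pulls is the price paid for continuing to check the other arms.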

The Competition Mindset: where Kaggle and Industry diverge

There are three major reasons why Kaggle is not industry data science:
The Objective: Kaggle and industry data science have radically different objectives and aims.
The Techniques: Kaggle competitions prioritize techniques not readily used in industry.
The Data: Kaggle provides you with the data; the real world doesn’t.
1. The Objective of Kaggle
You might have … Read more

Goal Setting for Data Scientists

A framework for setting and achieving career goals. Setting goals, particularly around the end of December, can be a great way to reflect on what you would like to get out of your career in the new year. In this article, we’ll discuss a framework that data scientists can use to achieve their … Read more

mapply and Map in R

An older post on this blog covered several alternative base apply functions. This post shows how to apply a function across multiple vectors or lists with Map and mapply in R. These functions are generalizations of sapply and lapply that make it easier to loop over multiple vectors or lists simultaneously. … Read more
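For readers coming from Python, here is a minimal sketch of the same "apply a function across several inputs in parallel" idea using the built-in `map` and `itertools.starmap`. This is a Python analogue, not R code. One real difference is worth knowing: R's mapply recycles shorter arguments to the length of the longest, while Python's `map` simply stops at the shortest input.

```python
from itertools import starmap

# Python analogue of R's mapply(function(x, y) x + y, 1:3, 4:6):
# the function receives the i-th element of every input at once.
sums = list(map(lambda x, y: x + y, [1, 2, 3], [4, 5, 6]))

# starmap does the same when the arguments are already grouped into tuples.
powers = list(starmap(pow, [(2, 3), (3, 2), (10, 0)]))

# Unlike R, which recycles shorter arguments, map stops at the shortest input.
truncated = list(map(lambda x, y: x * y, [1, 2, 3, 4], [10, 20]))
```

Here `sums` is `[5, 7, 9]`, `powers` is `[8, 9, 1]`, and `truncated` is `[10, 40]`.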
