A.I Adoption in the Financial Sector

How AI and machine learning technologies are being accepted and embraced by capital market firms. This piece, based on a survey carried out by Waters Technology, explores growing interest from firms on both sides of the industry in developing and deploying AI technology across their back offices, some use cases for …

Making Deep Learning models ready for the worst-case scenario and cross-platform ready with…

With 2020 just arrived, the community of deep learning experts and enthusiasts is looking forward to a significant year of innovation in the field. With a growing number of deep learning models being built every day around the world, humankind’s dependence on the cloud and on networks (especially TCP) is increasing day by day. …

The math behind GANs (Generative Adversarial Networks)

A detailed understanding of the math behind the original GANs, including their limitations. The Generative Adversarial Network (GAN) comprises two models: a generative model G and a discriminative model D. The generative model can be thought of as a counterfeiter trying to produce fake currency and use it without being caught, whereas the discriminative …
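For reference, the counterfeiter-versus-detective game is usually written as the minimax objective below, in the notation of the original GAN paper (Goodfellow et al., 2014): p_data is the data distribution, p_z is the noise prior fed to G, and p_g is the distribution of generated samples. This is a standard restatement rather than an excerpt from the article; the second line is the optimal discriminator for a fixed G.

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right]

D^{*}_{G}(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}
```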

Mining Twitter on IBM Cloud Platform

When working on Watson, you may need to reinstall tweepy each time you reconnect. To retrieve tweets, you need only a minimal set of imports and the tweepy installation. To connect to the Twitter API, you’ll need a Twitter developer account and your account credentials. One way to access your keys and tokens is through Apps and App …

How least squares regression estimates are actually calculated

Understanding how model outputs are calculated helps you interpret them. Ordinary least squares (OLS) is the method linear regression uses to obtain parameter estimates. It fits a line so that the sum of the squared vertical distances from each point to the regression line (the residuals) is minimized. Let’s visualize this in …
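A minimal sketch of how the OLS estimates can be computed, assuming a single predictor plus an intercept and a small hypothetical dataset (not the article's data). It solves the normal equations (XᵀX)β = Xᵀy with NumPy rather than inverting XᵀX explicitly, which is the numerically safer habit.

```python
import numpy as np

def ols_fit(x, y):
    """Return (intercept, slope) that minimize the sum of squared vertical residuals."""
    X = np.column_stack([np.ones(len(x)), x])   # design matrix: intercept column + x
    # Solve the normal equations (X^T X) beta = X^T y instead of inverting X^T X
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    return beta[0], beta[1]

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])        # hypothetical data
b0, b1 = ols_fit(x, y)
residuals = y - (b0 + b1 * x)
print(f"intercept={b0:.2f}, slope={b1:.2f}, SSR={np.sum(residuals ** 2):.3f}")
```

With one predictor this reduces to slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², which here is 19.9 / 10 = 1.99, and intercept = ȳ − slope · x̄ = 6.02 − 5.97 = 0.05.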

Important Basics About Cryptocurrency

Pump and dump: This is a form of market manipulation that has been practised with traditional currencies for generations and has, unsurprisingly, made its way into cryptocurrency markets, where it is frowned upon but not against the law. A pump and dump works when an individual or group of individuals buys up as much …

How well does a value-based regression model perform in the 2016 Presidential Election?

Using multiple regression to investigate values predictive of Donald Trump voters in the 2016 election. A D3.js map compares the actual 2016 election results to the model predictions; visit www.data607projects.com/final-project to interact with the full visualization. For some, it may be hard to believe, but the 2020 presidential election is roughly 300 days away. In …

Generalization, Regularization, Overfitting, Bias and Variance in Machine Learning

How do you know if a machine learning model is actually learning something useful? It’s not as simple as it may seem, and it’s definitely worth taking a closer look. To begin with, this post is about the kind of machine learning explained in, for example, the classic book The Elements of Statistical Learning. …

Closet Data Scientists — Who Are They?

Picking Up Skills in an Era of Compressing Technology Waves. The field of artificial intelligence (AI) is progressing so fast that the list of data science skills is growing faster than the threat Trump is placing on the global economy. Google’s Featured Snippet for the search “Data Scientist skill set” lists Statistics, R/Python, ETL, …

Finding your dream home within hours — the data driven way

Finding your dream home is one of the more exciting pursuits in life, but it is also an incredibly daunting task. Many of us spend a huge amount of time, thought, energy and, of course, money to find a place we can call home. I know this because it’s all I ever hear …

Predicting the next decade in the stock market

Making accurate predictions from the vast amount of data produced by the stock markets and the economy itself is difficult. In this post, we examine the performance of five different machine learning models and predict future ten-year returns for the S&P 500 using state-of-the-art libraries such as caret, xgboostExplainer and …

von Bertalanffy Growth Plots I

[This article was first published on fishR Blog, and kindly contributed to R-bloggers.]
Introduction
library(FSAdata)  # for data
library(FSA)      # for vbFuns(), vbStarts(), confint.bootCase()
library(car) …
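For orientation, the von Bertalanffy growth function in its commonly used "typical" parameterization is shown below: L∞ is the asymptotic mean length, K the growth coefficient, and t₀ the theoretical age at which mean length would be zero. This is the standard form from the fisheries literature, not code from the post, which may also work with other parameterizations.

```latex
E[L \mid t] = L_{\infty}\left(1 - e^{-K\,(t - t_0)}\right)
```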

An introduction to working with R and Python

This article is intended as an introduction to working with R from within Python. When I was a university student, my statistics courses (Survival Analysis, Multivariate Analysis, etc.) were taught in R. Nevertheless, when I set out to learn data science, I chose Python because it seemed “spooky” to me. By working …

Javascript performance test – for vs for each vs (map, reduce, filter, find).

These results come from small examples and may vary with the operation performed, the execution environment, and the choice of VM.
// calculate the sum of upVotes
const posts = [
  {id: 1, upVotes: 2}, {id: 2, upVotes: 18}, {id: 3, upVotes: 1},
  {id: 4, upVotes: 30}, {id: 5, upVotes: 50}
];
let sum = 0;
console.time('reduce');
sum …

Why Going from Implementing Q-learning to Deep Q-learning Can Be Difficult

3 Questions I Was Afraid to Ask (and My TensorFlow 2.0 Template). For many people, myself included, Q-learning serves as an introduction to the world of reinforcement learning. It neatly accustoms us to the core ideas of states, actions and rewards in a way that is intuitive and not …
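To make states, actions and rewards concrete, here is a minimal tabular Q-learning sketch. The five-state chain environment, the `step` helper and all hyperparameters are hypothetical, chosen only to illustrate the update Q(s, a) ← Q(s, a) + α[r + γ·maxₐ′ Q(s′, a′) − Q(s, a)]; it is not the author's TensorFlow 2.0 template.

```python
import random

N_STATES = 5          # states 0..4; reaching state 4 ends the episode with reward 1
ACTIONS = (-1, +1)    # move left or move right

def step(state, action):
    """Hypothetical toy chain environment: returns (next_state, reward, done)."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # epsilon-greedy action selection, breaking ties at random
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                best = max(q[(state, a)] for a in ACTIONS)
                action = rng.choice([a for a in ACTIONS if q[(state, a)] == best])
            nxt, reward, done = step(state, action)
            # Q-learning target: r + gamma * max_a' Q(s', a'); no bootstrapping past a terminal state
            target = reward if done else reward + gamma * max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = nxt
    return q

q = q_learning()
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)}
print(policy)   # greedy action per non-terminal state
```

Deep Q-learning replaces the `q` dictionary with a neural network that approximates Q(s, a), which is where most of the extra difficulty comes from.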

A Great Way to DeDupe Image Datasets

Command-line tools in production. In terms of manual labor, preparing data for an ML pipeline more often than not takes the majority of the time. Furthermore, building or extending a database usually costs an astronomical amount of time, subtasks, and attention to detail. The latter led me to a great command-line tool for cleaning out duplicates …
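The excerpt does not name the tool, so as a stand-in, here is a minimal Python sketch of the simplest form of image deduplication: grouping files whose bytes are identical by hashing their contents. The function name and file patterns are hypothetical. Note that this only catches exact copies; resized or re-encoded near-duplicates need perceptual hashing or a dedicated tool.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def find_exact_duplicates(root, patterns=("*.jpg", "*.jpeg", "*.png")):
    """Group image files under `root` whose contents are byte-identical (SHA-256)."""
    groups = defaultdict(list)
    for pattern in patterns:
        for path in Path(root).rglob(pattern):      # recursive search
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            groups[digest].append(path)
    # Keep only hashes shared by more than one file
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]
```

A typical use is to keep the first path in each group and delete or move the rest after a manual check.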

The fulfilling Journey of Auria Kathi — The AI Poet Artist living in the clouds

On 1 January 2019, we (Fabin Rasheed and I) introduced to the world a side project we had been working on for months: an artificial poet-artist who doesn’t physically exist in this world but writes a poem, draws abstract art based on the poem, and finally colors the art based on emotion. We called …

New Year’s Resolutions for Data Scientists

No matter what you want to get better at this year, here are some essential resolutions to lay the groundwork for your success: I will prioritize sleeping at least 8 hours a night, because sleep is essential to learning. I will dedicate time every day to moving my body and eating nourishing food, because my …

How to Build an Object Detection Model using Watson AutoAI

Build an Object Detection Model for Brain Tumors without Coding. AutoAI is a software platform produced by IBM and available within Watson Studio that allows models to be developed, trained, and deployed with one click through Watson Machine Learning. In particular, it helps simplify the AI lifecycle …

Working with Hive using AWS S3 and Python

The main objective of this article is to provide a guide to connecting to Hive from Python and executing queries. I’m using the PyHive library for that. I’m creating my connection class, “HiveConnection”, and Hive queries will be passed into its functions. AWS S3 will be used as the file storage for Hive tables. import pandas …

Reasoning With Probability — Is My Model Good Enough?

# Check the model accuracy score again
accuracy = my_tree.score(X_100_test, y_100_test)
print(f'Accuracy is {accuracy * 100:.2f}%')
>>> Accuracy is 86.00%
We got a slightly lower number, but it is still comparable. That said, while the numbers are close, trusting a model evaluated on 100 examples is harder than trusting one evaluated on 14,000 examples. Intuitively, our confidence in the test …
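One way to quantify that intuition, sketched here under the assumption that test examples are independent, is a normal-approximation (Wald) confidence interval for the accuracy. This may not be the approach the article goes on to use, and since the excerpt does not give the 14,000-example accuracy, 0.86 is reused for both sample sizes purely for comparison.

```python
import math

def accuracy_ci(acc, n, z=1.96):
    """Approximate 95% Wald (normal-approximation) interval for a test-set accuracy."""
    se = math.sqrt(acc * (1 - acc) / n)   # binomial standard error of a proportion
    return max(0.0, acc - z * se), min(1.0, acc + z * se)

for n in (100, 14_000):
    lo, hi = accuracy_ci(0.86, n)
    print(f"n={n:>6}: [{lo:.3f}, {hi:.3f}]  half-width {(hi - lo) / 2:.4f}")
```

The half-width is roughly ±0.068 at n = 100 versus about ±0.006 at n = 14,000: the interval shrinks by a factor of √(14000/100) ≈ 11.8. For small n or accuracies near 0 or 1, a Wilson interval is usually preferred over the Wald form.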

RStudio Blogs 2019

[This article was first published on R Views, and kindly contributed to R-bloggers.] If you are lucky enough to have some extra time for discretionary …

Training on batch: how to split data effectively?

As for the spectrogram, you can think of it as a description of how much of each frequency (“tune”) is present in the audio track over time. For instance, when a bass guitar is playing, the spectrogram shows high intensity concentrated at the lower end of the spectrum. Conversely, with a soprano singer, we …
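A minimal NumPy sketch of that idea, assuming a 16 kHz sample rate and using two pure sine tones as hypothetical stand-ins for a bass guitar (80 Hz) and a soprano (1 kHz). The spectrogram is built from Hann-windowed short-time FFTs, and the frequency bin with the most average energy shows where each sound sits in the spectrum.

```python
import numpy as np

def simple_spectrogram(signal, frame_len=1024, hop=512):
    """Magnitude spectrogram from a Hann-windowed short-time FFT; shape (frames, freq_bins)."""
    window = np.hanning(frame_len)
    starts = range(0, len(signal) - frame_len + 1, hop)
    frames = np.array([signal[s:s + frame_len] * window for s in starts])
    return np.abs(np.fft.rfft(frames, axis=1))

def dominant_frequency(signal, sr, frame_len=1024):
    """Frequency (Hz) of the bin with the highest average magnitude across frames."""
    spec = simple_spectrogram(signal, frame_len)
    freqs = np.fft.rfftfreq(frame_len, d=1 / sr)
    return freqs[spec.mean(axis=0).argmax()]

sr = 16_000                                   # assumed sample rate (Hz)
t = np.arange(sr) / sr                        # one second of samples
bass_like = np.sin(2 * np.pi * 80 * t)        # toy low tone standing in for a bass guitar
soprano_like = np.sin(2 * np.pi * 1000 * t)   # toy high tone standing in for a soprano
print(dominant_frequency(bass_like, sr), dominant_frequency(soprano_like, sr))
```

With 1024-sample frames, each bin spans 16000 / 1024 = 15.625 Hz, so the reported peaks are accurate only to about one bin width.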

Cluster analysis: theory and implementation of unsupervised algorithms

Industry applications. Why is clustering so popular in statistics and machine learning? Because cluster analysis is a powerful data mining tool across a wide range of business applications. Here are just a few of them: Exploratory data analysis (EDA): clustering is one of the most basic data analysis techniques employed …

An Implementation of Distributed ACID Transactions

By Daniel Gómez Ferro and Monte Zweben. Splice Machine is a Hybrid Transactional/Analytical Processing (HTAP) database designed to modernize legacy applications. By combining aspects of a traditional RDBMS, such as ANSI SQL support and ACID (Atomicity, Consistency, Isolation, Durability) transactions, with the scalability, efficiency, and availability of in-memory analytics and machine learning, …

The Importance of Ethics in Artificial Intelligence

(Or any other form of technology, for that matter.) “Just because we can doesn’t mean we should” is worth keeping in mind when it comes to innovating with technology. The arrival of the Internet has 10x’d the speed of innovation and lets us create pretty much anything we can think of. Artificial …

Let me recall you this: accuracy isn’t everything in Machine Learning.

Why is recall so important when evaluating your machine learning model? Every day, data science professionals can’t stop asking one thing: is that model really working? Data is like a living creature: it changes and gets messy almost every day. In the end, all we want is to find a way to handle it and …
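A small illustration of why accuracy alone can mislead, using a hypothetical imbalanced test set (95 negatives, 5 positives) and two hypothetical models. Recall is TP / (TP + FN), the share of actual positives the model catches.

```python
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, positive=1):
    """TP / (TP + FN): the share of actual positives the model catches."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if (tp + fn) else 0.0

# Hypothetical imbalanced test set: 95 negatives followed by 5 positives
y_true = [0] * 95 + [1] * 5
always_negative = [0] * 100                    # never predicts the positive class
flagger = [0] * 90 + [1] * 5 + [1] * 4 + [0]   # 5 false alarms, catches 4 of the 5 positives

for name, pred in [("always_negative", always_negative), ("flagger", flagger)]:
    print(f"{name}: accuracy={accuracy(y_true, pred):.2f}, recall={recall(y_true, pred):.2f}")
```

The do-nothing model scores the higher accuracy (0.95 versus 0.94) while finding none of the positives (recall 0.00 versus 0.80), which is exactly the failure that checking recall exposes.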

Write Clean and SOLID Scala Spark Jobs

Creating data pipelines by writing Spark jobs is easier nowadays thanks to the growth of new tools and data platforms that allow multiple data roles (analysts, engineers, scientists, etc.) to focus on understanding data and writing logic to get insights. Nevertheless, new tools like notebooks, which allow easy scripting, are sometimes not used well and …

Introduction to Data Science in R, Free for 3 days

[This article was first published on R – Win-Vector Blog, and kindly contributed to R-bloggers.] To celebrate the new year and the recent release of …

A quick and dirty guide to visualization in Plotly for Python

There are endless options for creating plots in Python, but one of the libraries I have started to gravitate towards is Plotly. It is a solid option for easily creating beautiful, presentable plots within your Jupyter workflow. It has a vast selection of customizable plots that you can use for your visualizations (the entire …

Animated Information Graphics

with Python and Plotly. Animated information graphics of various datasets are a popular topic on YouTube; for example, the channel Data Is Beautiful has almost a million subscribers. I will show in …

How to Create a Simple Cancer Survival Prediction Model with EDA

A Thorough Walkthrough of Exploratory Data Analysis Techniques with Haberman’s Cancer Survival Dataset, Using Python to Create a Simple Predictive Cancer Survival Model. The impetus for this blog and the resulting cancer survival prediction model is to provide a glimpse into the potential of the healthcare industry. Healthcare continues to learn …

Everything you need to know about “Activation Functions” in Deep learning models

This article is your one-stop answer to every question about the activation functions used in deep learning models that might come to mind. These are basically my notes on activation functions: everything I know about the topic, gathered in one place. So, without going into any …

A Beginner’s Guide to Preprocessing Text Data Using NLP Tools

Code used to preprocess tweets obtained from Twitter. Below I’ve outlined the code I used to preprocess text for my natural language processing projects. This code has mostly been used on tweets obtained from Twitter for classifiers. My hope is that this guide helps aspiring data scientists and machine learning engineers by familiarizing them with …
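The excerpt stops before the author's code, so here is a minimal, hypothetical sketch of common tweet-cleaning steps using only Python's `re` module: lowercasing, stripping URLs and @mentions, keeping the word behind a hashtag, dropping non-letters, and removing stop words. The tiny stop-word list is an illustrative assumption; real projects usually rely on NLTK's or spaCy's lists.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
# Tiny illustrative stop-word list (assumption); swap in a library list in practice
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "and", "of", "in", "on", "for", "by", "this"}

def preprocess_tweet(text):
    """Lowercase, strip URLs/mentions, keep hashtag words, drop non-letters and stop words."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = text.replace("#", " ")            # keep the hashtag's word, drop the symbol
    text = re.sub(r"[^a-z\s]", " ", text)    # remove digits, punctuation and emoji
    return [tok for tok in text.split() if tok not in STOP_WORDS]

print(preprocess_tweet("Loving the new #NLP course by @prof_x! Details: https://example.com/nlp 🚀"))
```

URLs are removed before mentions so that an `@` inside a link cannot leave URL fragments behind.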

How Machine Learning Enhances Business Automation

From Predictive Analytics to Predictive HR Support. The machines are here. They’re learning. And they’re coming for your business, with the power to build or destroy your ability to compete in the near future. As Margaret Laffan, Machine Learning Business Development Director for SAP, says in Forbes, “Those companies not considering …

Redesigning a Bad Graph — Spaghetti to Micromaps

Authors: Chaithanya Pramodh Kasula and Aishwarya Varala. Introduction: The primary focus of this report is redesigning a bad graph depicting age-adjusted opioid overdose death rates per 100,000 in different US states. It also discusses in detail the approaches and techniques used to obtain meaningful insights from …

Costa Rican Household Poverty Level Prediction in R

A comprehensive data analysis and prediction in R using machine learning. Authors: Chaithanya Pramodh Kasula and Aishwarya Varala. Introduction: This report details the process of answering several research questions related to the poverty levels of Costa Rican households. It comprises data sources, exploratory data analysis through visualization, …

The Biggest AI Risk of the Next Decade is not a Robot Uprising

Grand generalisations about the future impacts of artificial general intelligence overshadow the more pressing issues we face today. “Finally, robotic beings rule the world” (Flight of the Conchords, Robots): pictures of the Terminator and HAL are just played out at this point. The past decade has seen many interesting and impressive developments in tech …

Multi-Armed Bandits and Reinforcement Learning

A Gentle Introduction to the Classic Problem with Python Examples. Multi-armed bandit problems are some of the simplest reinforcement learning (RL) problems to solve. We have an agent that we allow to choose actions, and each action returns a reward drawn from a given, underlying probability …
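The setup above can be illustrated with epsilon-greedy, one common bandit strategy (not necessarily the one the article develops). The arm probabilities are hypothetical Bernoulli parameters; with probability epsilon the agent explores a random arm, and otherwise it exploits the arm with the highest running-mean reward estimate.

```python
import random

def epsilon_greedy_bandit(probs, steps=5000, epsilon=0.1, seed=42):
    """Run epsilon-greedy on Bernoulli arms; return (value estimates, pull counts)."""
    rng = random.Random(seed)
    k = len(probs)
    counts = [0] * k
    values = [0.0] * k                                   # running mean reward per arm
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(k)                       # explore: random arm
        else:
            arm = max(range(k), key=lambda a: values[a]) # exploit: best current estimate
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean update
    return values, counts

values, counts = epsilon_greedy_bandit([0.2, 0.5, 0.75])
print("estimates:", [round(v, 2) for v in values], "pulls:", counts)
```

The incremental update avoids storing every reward; a larger epsilon finds the best arm faster but wastes more pulls on worse arms once it is known.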

The Competition Mindset: where Kaggle and Industry diverge

There are three major reasons why Kaggle is not industry data science:
The Objective: Kaggle and industry data science have radically different objectives and aims.
The Techniques: Kaggle competitions prioritize techniques not readily used in industry.
The Data: Kaggle provides you the data; the real world doesn’t.
1. The Objective of Kaggle
You might have …

mapply and Map in R

An older post on this blog covered several alternative base apply functions. This post shows how to apply a function across multiple vectors or lists with Map and mapply in R. These functions generalize lapply and sapply (Map behaves like a multi-input lapply, mapply like a multi-input sapply), making it easier to loop over multiple vectors or lists simultaneously. …