Machine Learning and Data Analysis — Inha University (Part-4)

In this part of the series on Machine Learning and Data Analysis offered by Inha University, Rep. of Korea, I’ll try to explain built-in and user-defined functions and modules in Python. From my point of view, it’ll help new Python learners understand them clearly. If you’d like to start from the … Read more Machine Learning and Data Analysis — Inha University (Part-4)

Leading a Data Science Team when you are not a Data Scientist

This was me when I took a position managing a data science team. My background is in social science and policy. I had never coded before outside of some Stata for my master’s degree (and the use of “coding” in Stata is arguable) and most of my math and statistics were based on social science research. … Read more Leading a Data Science Team when you are not a Data Scientist

Detecting pedestrians and bikers on a drone with Jetson Xavier

Drones are one of the coolest technologies every maker and enthusiast wants to lay their hands on. At the same time as drones are becoming common, AI is rapidly advancing and we are now in a state where object detection and semantic segmentation are possible right onboard the drone. In this blog post, I will … Read more Detecting pedestrians and bikers on a drone with Jetson Xavier

Walkthrough: Mapping Basics with bokeh and GeoPandas in Python

Goal: Create a map of the contiguous US that shows the state population. Within each state, show where lead was found in 2018. Creating the Contiguous USA: In order to create a map, you will need a shapefile (.shp). In this case, I downloaded a shapefile from the US Census Bureau here. The file tells … Read more Walkthrough: Mapping Basics with bokeh and GeoPandas in Python

A.I Adoption for Capital Market Technologists

How A.I and Machine learning are being accepted and embraced by capital market firms. This piece, based on a survey carried out by Waters Technology, explores increased interest from firms on both sides of the industry around the development and deployment of AI technology across their back offices, some use-cases for the … Read more A.I Adoption for Capital Market Technologists

Global Warming and Malaria in Developing Regions: An Analysis in Python

The data set only includes 4 points per region and there are 127 regions. There isn’t much data per region so any analysis should be taken with a grain of salt. Knowing that developing regions are more vulnerable to the risks that climate change poses, it would be useful to narrow our scope. Time Magazine … Read more Global Warming and Malaria in Developing Regions: An Analysis in Python

Using Airflow and Spark To Crunch US Immigration Data

For example, prior to any triggering of Spark jobs, the data sets needed to be downloaded from S3 and unzipped. Using Airflow documents this dependency, and if an upstream task fails (syncing of files from S3) then the dependent downstream task won’t be invoked. This improved the stability of the pipeline and prevented runaway code from … Read more Using Airflow and Spark To Crunch US Immigration Data

IaaS vs PaaS: Infrastructure as a Service VS Platform as a Service

To begin with, many businesses are going online. They rely heavily on the cloud to serve their clients, which demands collecting, storing, and processing vast amounts of data before it can be presented to the end user as information. This is where cloud-based web applications come into play. In this article, we’re … Read more IaaS vs PaaS: Infrastructure as a Service VS Platform as a Service

Feature Engineering Techniques

Feature Engineering is one of the most important steps to complete before starting a Machine Learning analysis. Creating the best possible Machine Learning/Deep Learning model can certainly help to achieve good results, but choosing the right features in the right format to feed into a model can boost performance by far, leading to the following … Read more Feature Engineering Techniques

Key points when using Reddit as a source of data

Main citations Adams, N., Artigiani, E.E. and Wish, E.D. (2019) ‘Choosing Your Platform for Social Media Drug Research and Improving Your Keyword Filter List’, Journal of Drug Issues, 49(3), pp. 477–492. doi: 10.1177/0022042619833911. Jamnik, M.R. and Lane, D.J. (2017) ‘The Use of Reddit as an Inexpensive Source for High-Quality Data’, Practical Assessment, Research & Evaluation, … Read more Key points when using Reddit as a source of data

Email Automation with Python

Automate emails with attachments in Python https://www.helpsystems.com/resources/guides/automated-operations-5-benefits-your-organization When I first started using Python I saw it as an upgrade to Excel. A tool I could use to improve my work in data analysis. The better I got at Python the more streamlined my analysis became, and I started to realize Python was more than a … Read more Email Automation with Python

Democracy’s Unsettling Future in the Age of AI

Unprecedented challenges and new opportunities — information bubbles, the tyranny of the minority, and the future of liberty and equality. The dark forces of global politics — illiberalism, populist nationalism and protectionism — have recently reasserted themselves and threaten to subvert liberal democracy and the rules-based world order. These fundamental sociopolitical currents pose challenges to the … Read more Democracy’s Unsettling Future in the Age of AI

Building Actuarial Functions in Python

These annuity formulas can help us to find any unknown in an annuity problem. In the calculator file I mainly use these annuity functions to help solve for unknown payment amounts. While I could have also utilized these formulas for present value and future value calculations, I decided to take a more intuitive programming approach … Read more Building Actuarial Functions in Python
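As a rough illustration of solving for an unknown payment, here is a minimal sketch. It assumes level payments made at the end of each period and uses the standard present-value formula for such an annuity; the function names `annuity_pv_factor` and `payment_from_pv` and the loan figures are hypothetical and are not taken from the original calculator file.

```python
def annuity_pv_factor(rate, periods):
    """Present value of 1 paid at the end of each period (assumed convention)."""
    if rate == 0:
        # With no interest, the present value is just the number of payments.
        return float(periods)
    return (1 - (1 + rate) ** -periods) / rate


def payment_from_pv(present_value, rate, periods):
    """Level end-of-period payment that repays `present_value`."""
    return present_value / annuity_pv_factor(rate, periods)


# Hypothetical example: 10,000 borrowed at 5% per period over 10 periods.
payment = payment_from_pv(10_000, 0.05, 10)
print(round(payment, 2))  # about 1295.05
```

Writing the factor as its own function means the same helper can be rearranged to solve for the present value when the payment is known.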

Amazon Elastic Container Service publishes multiple GitHub Actions

Development teams collaborate on GitHub to share their code and commit changes quickly, but actually getting the code to run in the cloud is seen as a multi-step error-prone task. For containerized applications, a developer needs to build an image, publish it to a registry, create a ‘manifest’ type file describing the application for the … Read more Amazon Elastic Container Service publishes multiple GitHub Actions

Doing Yelp Better —

Using NLP to provide quantitative metrics for restaurants. Welcome to Yelp Report Cards. In this project, it is my aim to produce a business value-add to Yelp’s current business owner services. Using NLTK’s VADER Sentiment Analysis and combining it with Gensim’s LDA library, I created a quantitative report card for restaurants to determine areas in … Read more Doing Yelp Better —

Cloud Run, a managed Knative service, is GA

We want to empower developers no matter where their businesses are in their cloud journey, whether that’s on-prem, operating in a managed Kubernetes environment, or running on a fully managed serverless computing platform. Today, we’re announcing that Cloud Run is generally available, helping developers focus on writing high-value code, regardless of where their organizations are … Read more Cloud Run, a managed Knative service, is GA

Shifting gears: How the cloud drives digital transformation in the automotive industry

It’s undeniable that technology has improved many facets of modern life. Transportation and mobility, however, continue to be an area where more can be done. For example, look no further than your daily commute. Research by INRIX shows that time spent in traffic has more than doubled in many major cities around the world since … Read more Shifting gears: How the cloud drives digital transformation in the automotive industry

How to Use Airflow without Headaches

Data pipelines and/or batch jobs that process and move data on a scheduled basis are well known to all us data folks. The de facto standard tool to orchestrate all that is Apache Airflow. It is a platform to programmatically author, schedule, and monitor workflows. A workflow is a sequence … Read more How to Use Airflow without Headaches

Difference Between NFD, NFC, NFKD, and NFKC Explained with Python Code

The difference between Unicode normalization forms. Recently I have been working on an NLP task in Japanese; one problem is converting special characters to a normalized form. So I did a little research and wrote this post for anyone who has the same need. Japanese contains different forms … Read more Difference Between NFD, NFC, NFKD, and NFKC Explained with Python Code
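To make the four forms concrete, here is a minimal sketch using Python's standard `unicodedata` module. The two sample characters (a half-width katakana and a precomposed hiragana) and the helper name `normalize_all` are chosen here for illustration and are not necessarily the examples used in the original post.

```python
import unicodedata

# Illustrative samples (assumed, not from the post):
# half-width katakana KA (U+FF76) and precomposed hiragana GA (U+304C).
samples = ["\uff76", "\u304c"]


def normalize_all(text):
    """Return the text under each of the four Unicode normalization forms."""
    return {form: unicodedata.normalize(form, text)
            for form in ("NFC", "NFD", "NFKC", "NFKD")}


for ch in samples:
    for form, result in normalize_all(ch).items():
        # Show the resulting code points so invisible differences are visible.
        codepoints = " ".join(f"U+{ord(c):04X}" for c in result)
        print(f"U+{ord(ch):04X} {form}: {codepoints}")
```

The canonical forms (NFC, NFD) leave the half-width katakana alone, while the compatibility forms (NFKC, NFKD) map it to the full-width U+30AB. For the hiragana, the decomposed forms (NFD, NFKD) split it into a base character plus a combining voiced sound mark, and the composed forms put it back together.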

My Favorite Machine-Learning Models ALL Data-Scientists Should Know

These are my subjective favorite models, and the ones that I find myself turning to most often. There is a tool for every job just as there is a model for every job, just as there are features for every job. With time comes the wonderful Data Science Superpower that is knowing the best way … Read more My Favorite Machine-Learning Models ALL Data-Scientists Should Know

A 7-step guide for effective survey recruitment on Reddit

A helpful how-to guide on how to go about conducting online surveys. Making surveys can be deceptively easy. Platforms like Google Forms, SurveyMonkey and university-bought services like Qualtrics make surveys cheap, quick and easy to write. However, doing research online comes with its own potential methodological problems. And doing research on specific platforms comes … Read more A 7-step guide for effective survey recruitment on Reddit

What Online Poker Players Can Teach Us About AI

The Human Side of Things. Poker is considered a good challenge for AI, as it is seen as a combination of mathematical/strategic play, and human intuition, especially about the strategies of others. I would consider the game a cross between the two extremes of technical vs. human skill: chess … Read more What Online Poker Players Can Teach Us About AI

Forrester names Microsoft a leader in Wave report for Industrial IoT Software Platforms

As a company, we work every day to empower every person on the planet to achieve more. As part of that, we’re committed to investing in IoT and intelligent edge, two technology trends accelerating ubiquitous computing and bringing unparalleled opportunity for transformation across industries. We’ve been working hard to make our Azure IoT platform more … Read more Forrester names Microsoft a leader in Wave report for Industrial IoT Software Platforms

How to build globally distributed applications with Azure Cosmos DB and Pulumi

This post was co-authored by Mikhail Shilkov, Software Engineer, Pulumi. Pulumi is reinventing how people build modern cloud applications, with a unique platform that combines deep systems and infrastructure innovation with elegant programming models and developer tools. We live in amazing times when people and businesses on different continents can interact at the speed of … Read more How to build globally distributed applications with Azure Cosmos DB and Pulumi

Democratizing Smart City solutions with Azure IoT Central

One of the most dynamic landscapes embracing Internet of Things (IoT) is the modern city. As urbanization grows, city leaders are under increasing pressure to make cities safer, accessible, sustainable, and prosperous. Underlying all these important goals is the bedrock that makes a city run: infrastructure. Whether it be water, electricity, streets, traffic lights, cities … Read more Democratizing Smart City solutions with Azure IoT Central

Predict figure skating world championship ranking from season performances

Finally, the latent scores for the 2 factors across all skaters can be retrieved at the end in a pandas DataFrame, with the previously-stored skater names added back in as row index, and the factors (0 and 1) as columns:

    pd.DataFrame(skater_scores, index=skater_names)
    #                             0         1
    # Alexander, MAJOROV  10.226857  3.122723
    # Javier, FERNANDEZ   16.009246  4.594796
    # Misha, GE           11.919684 …

Read more Predict figure skating world championship ranking from season performances

Building Analytics and Data Lake Capabilities with Limited Data

“Nothing ever exists entirely alone. Everything is in relation to everything else.” (Buddha) A very common problem many organizations, especially startups, encounter is having little or no data. Unfortunately, operating blind and without data insights is no longer an option. Operating with data is simply table stakes; all companies leverage data to make strategic shifts … Read more Building Analytics and Data Lake Capabilities with Limited Data

Random Search vs Grid Search for hyperparameter optimization

I’ll follow my jupyter notebook to make things easier to show. Feel free to either run it or implement the code on your own. Keep in mind that some code snippets use code implemented in previous snippets, therefore the order of occurrence matters. All mentioned files in this post are available in my GitHub. Check … Read more Random Search vs Grid Search for hyperparameter optimization

Resampling Methods for Unbalanced Datasets — Fraudulent Transactions

Suppose you are tasked with developing a simple machine learning algorithm, whether supervised (with a clear target variable) or unsupervised (with no predefined outcome variable). But looking at your data, it seems that one class dominates the other. In such a case, your model will have a hard time learning from your data to predict future … Read more Resampling Methods for Unbalanced Datasets — Fraudulent Transactions
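One simple way to address a dominant class is random oversampling, sketched below with the standard library only. This is an illustrative assumption about the kind of resampling the post covers, not its actual code; the function name `random_oversample` and the toy data are hypothetical.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority rows until every class is as large as the biggest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # Rows of this class in the original data, used as the sampling pool.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels


# Toy data: 8 legitimate transactions (0) and 2 fraudulent ones (1).
rows = [[i] for i in range(10)]
labels = [0] * 8 + [1] * 2
new_rows, new_labels = random_oversample(rows, labels)
print(Counter(new_labels))
```

Resampling is normally applied to the training split only, so that the evaluation data keeps the original class balance.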

How to add machine learning-powered text summarization to any project

Have you ever wondered how media organizations are able to produce the raw volume of content they output? How is it that the Associated Press, in addition to all of their other coverage, is able to cover 4,400 quarterly earnings reports each year? How does The Washington Post run such hyperlocal coverage — like covering every … Read more How to add machine learning-powered text summarization to any project

How Machine Learning Improves Marketing Automation

From Product Pricing to Content Research. Even before the digital age, marketing professionals were among the most eager and early adopters of emerging technologies. AI and machine learning are already popular tools in digital marketing. This trend is likely to expand in the next several years as data multiplies, … Read more How Machine Learning Improves Marketing Automation

Automatic Machine Learning in Fraud Detection Using H2O AutoML

Machine Learning Automation in Finance Machine learning has many applications in finance such as security, process automation, loan/insurance underwriting, credit scoring, trading, etc. [1][2]. Financial fraud is one of the major concerns in financial security [1][2]. To fight the increasing risk of financial fraud, machine learning has been actively applied to fraud detection [3][4]. There … Read more Automatic Machine Learning in Fraud Detection Using H2O AutoML

AI, Social Data Science and the Climate Crisis

Bridging Social Science and Technology For our Planet There is still no Wikipedia explanation on Social Data Science, not that it would make it established as a field, but it is more of a side note in these beginnings. The last few days I have been considering how to put together a program that would … Read more AI, Social Data Science and the Climate Crisis

workloopR: Analysis of work loops and other data from muscle physiology experiments in R

Studies of muscle physiology often rely on closed-source, proprietary software not only for recording data but also for data wrangling and analyses. Although specialized software might be necessary to record data from highly specialized equipment, data wrangling and analyses should be free from this constraint. It’s becoming more common for researchers to provide code along with … Read more workloopR: Analysis of work loops and other data from muscle physiology experiments in R

Easily deploy SQL Server Always On solutions using the AWS Launch Wizard for SQL Server

AWS Launch Wizard for SQL Server reduces the time it takes to deploy SQL Server Always On solutions on Amazon EC2, accelerating your journey to the cloud. You simply input your SQL Server requirements including performance, number of nodes, and connectivity on the service console, and AWS Launch Wizard identifies the right AWS resources to … Read more Easily deploy SQL Server Always On solutions using the AWS Launch Wizard for SQL Server

Exploiting Multi-Categorical Features Using Deep Interest

How can your deep learning model get the most out of features with varying length? Originally published at Taboola Engineering Blog on September 4, 2019. At Taboola, our goal is to predict whether users will click on the ads we present to them. Our models use all kinds of features, yet the most important ones … Read more Exploiting Multi-Categorical Features Using Deep Interest

Review: Andrew Ng’s Machine Learning Course

Stanford’s Machine Learning course taught by Andrew Ng was released in 2011. Eight years after publication, Andrew Ng’s course is still ranked as one of the top machine learning courses. It has become a staple course of Coursera and, to be honest, of machine learning. As of this … Read more Review: Andrew Ng’s Machine Learning Course

Generating Modern Arts using Generative Adversarial Network(GAN) on Spell

After initializing both the generator and discriminator models, let’s write a helper function to save the image after some iterations.

    def save_images(cnt, noise):
        image_array = np.full((PREVIEW_MARGIN + (PREVIEW_ROWS * (IMAGE_SIZE + PREVIEW_MARGIN)),
                               PREVIEW_MARGIN + (PREVIEW_COLS * (IMAGE_SIZE + PREVIEW_MARGIN)), 3),
                              255, dtype=np.uint8)
        generated_images = generator.predict(noise)
        generated_images = 0.5 * generated_images + 0.5
        image_count = 0
        for row in range(PREVIEW_ROWS):
            for col in range(PREVIEW_COLS):
                r = …

Read more Generating Modern Arts using Generative Adversarial Network(GAN) on Spell

“Survey Says…” — Why Information Based on Surveys Is Not Always Trustworthy

There are good reasons why you need to think twice about survey results. As much as I love words, I love numbers. I’m always fascinated by the stories and pictures behind statistics. I am particularly interested in surveys — maybe because I am a big fan of Family Feud. A survey is … Read more “Survey Says…” — Why Information Based on Surveys Is Not Always Trustworthy

Vector Autoregressions & Vector Error Correction Multivariate Model

The vector autoregression (VAR) integrated model comprises multiple time series and is quite a useful tool for forecasting. It can be considered an extension of the autoregressive (the AR part of ARIMA) model. A VAR model involves multiple independent variables and therefore has more than one equation. Each equation uses as its explanatory variables lags of all the … Read more Vector Autoregressions & Vector Error Correction Multivariate Model
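As a sketch of what "more than one equation" means, the simplest case is a two-variable VAR with a single lag. The symbols below (series $y_1$ and $y_2$, intercepts $c_i$, coefficients $a_{ij}$, error terms $\varepsilon_{i,t}$) are notation assumed here for illustration and may differ from the original post:

```latex
\begin{aligned}
y_{1,t} &= c_1 + a_{11}\, y_{1,t-1} + a_{12}\, y_{2,t-1} + \varepsilon_{1,t} \\
y_{2,t} &= c_2 + a_{21}\, y_{1,t-1} + a_{22}\, y_{2,t-1} + \varepsilon_{2,t}
\end{aligned}
```

Each series gets its own equation, and the right-hand side of every equation contains the lagged values of both series.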