## Practical Statistics & Visualization With Python & Plotly

Photo credit: Pixabay. How to use Python and Plotly for statistical visualization, inference, and modeling. One day last week, I was googling “statistics with Python”, but the results were somewhat unfruitful. Most literature, tutorials and articles focus on statistics with R, because R is a language dedicated to statistics and has more statistical analysis features than Python. In … Read more

## Genetic algorithm vs. Backtracking: N-Queen Problem

A few months ago, I got familiar with genetic algorithms. I started to read about them and was pretty amazed. One of the most famous problems solved by genetic algorithms is the n-queen problem. I implemented my genetic solver, plus the famous old backtracking solver, using Python 3. I implemented a Chess … Read more

## Finding Bayesian Legos

Photo credit: Frédérique Voisin-Demery/Flickr (CC BY 2.0) Joe, a good family friend, dropped by earlier this week. As we do often, we discussed the weather (seems to be hotter than normal already here in the Pacific Northwest), the news (mostly about how we are both taking actions to avoid the news), and our kids. Both of … Read more

## Get started with Object Oriented Programming in Python: Classes and Instances

New to OOP? Learn how to write a class and create instances in Python There are a lot of articles popping up on object-oriented programming in Python at the moment. Many data scientists, myself included, find ourselves in roles that focus on writing functional code, often in small scripts or prototypes. I’ve been working as a … Read more
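As a minimal sketch of what "writing a class and creating instances" looks like, the block below defines a small class and builds two independent objects from it. The class name `Experiment` and its attributes are hypothetical and chosen only for illustration; they do not come from the article.

```python
class Experiment:
    """Hypothetical class: groups a name with a list of scores."""

    def __init__(self, name, scores):
        # __init__ runs once per instance and stores per-instance state.
        self.name = name
        self.scores = list(scores)

    def mean_score(self):
        # A method operates on the data of the instance it is called on.
        return sum(self.scores) / len(self.scores)


# Two instances of the same class, each holding its own data.
baseline = Experiment("baseline", [0.70, 0.72, 0.74])
tuned = Experiment("tuned", [0.80, 0.82])

print(baseline.name, baseline.mean_score())
print(tuned.name, tuned.mean_score())
```

The class is the blueprint, while `baseline` and `tuned` are separate instances: changing the scores of one leaves the other untouched.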

## Challenges in sentiment analysis: a case for word clouds (for now)

Exploring simple python code visualizations for marketing Machine understanding and capability get merged together in popular culture. When I think about artificial intelligence, I get into this tricky habit of mixing understanding with capability. I imagine that there are ways we can tell how much a machine knows by what it can produce. However, the … Read more

## Four ways to quantify synchrony between time series data

1. Pearson correlation: simple is best. The Pearson correlation measures how two continuous signals co-vary over time and indicates the linear relationship as a number from -1 (perfectly negatively correlated) through 0 (not correlated) to 1 (perfectly correlated). It is intuitive, easy to understand, and easy to interpret. Two things to be cautious of when using Pearson correlation are … Read more
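To make the definition concrete, here is a small sketch that computes the Pearson coefficient from scratch using only the standard library. The function name `pearson_r` is an assumption made for this illustration; in practice a library routine such as `scipy.stats.pearsonr` would normally be used instead.

```python
import math


def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Covariance and variances (unnormalised; the 1/n factors cancel).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant signal")
    return cov / math.sqrt(var_x * var_y)


print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))   # signals move together
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))   # signals move in opposite directions
```

The result summarises the whole series in a single number, so it says nothing about how the relationship changes from moment to moment.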

## Plotly Express Yourself

Prep! (and prefacing thoughts) For our test data I found this fun dataset on Kaggle on superheroes (hey, I just saw Avengers: Endgame!): Multiple Spider-Men and Captains America? Yes, the multiverse exists! 2. Code for getting and scrubbing the data, as well as the snippets below, can be found in this Jupyter notebook here. 3. If … Read more

## How To Make Pi

An Infinite Series Approach This infinite sum idea seems to be working so we’ll continue down that path. From trigonometry, we know tan(pi / 4) = 1. We can now use the inverse tangent function, arctan(x), to calculate arctan(1) = pi / 4. And luckily, we have a simple and easy formula for arctan(x). This method … Read more
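The teaser does not show which formula the article uses, so the sketch below assumes the standard Taylor series arctan(x) = x - x^3/3 + x^5/5 - ..., which at x = 1 gives pi / 4. The function names `arctan_series` and `estimate_pi` are hypothetical.

```python
import math


def arctan_series(x, terms):
    """Partial sum of the Taylor series for arctan(x), valid for |x| <= 1."""
    total = 0.0
    for k in range(terms):
        # k-th term: (-1)^k * x^(2k+1) / (2k+1)
        total += (-1) ** k * x ** (2 * k + 1) / (2 * k + 1)
    return total


def estimate_pi(terms):
    """Use arctan(1) = pi / 4 to approximate pi."""
    return 4 * arctan_series(1.0, terms)


for n in (10, 1_000, 100_000):
    approx = estimate_pi(n)
    print(n, approx, abs(approx - math.pi))
```

At x = 1 the series converges slowly: the error shrinks roughly in proportion to 1 / terms, so each extra correct digit costs about ten times more terms.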

## How To Create Simple Keyword-based Movie Recommender Models From Scratch

Introduction Have you ever tried to use a movie recommender? In theory, it is something useful that can help figure out what to watch next instead of browsing through Netflix for a few hours, but their results tend to be hit-or-miss. This is a problem that most people can relate to, so I decided to … Read more

## A primer on *args, **kwargs, decorators for Data Scientists

What are **kwargs? In simple terms, you can use **kwargs to pass an arbitrary number of keyword arguments to your function and access them through a dictionary. A simple example: Let’s say you want to create a print function that can take a name and age as input and print that. def myprint(name,age):print(f'{name} is {age} years … Read more
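The article's own example is cut off above, so the following is a separate minimal sketch of the idea. The function `describe` is hypothetical; it only shows that keyword arguments collected by `**kwargs` arrive as an ordinary dictionary.

```python
def describe(**kwargs):
    """Return one 'key = value' line per keyword argument received."""
    # Inside the function, kwargs is a plain dict of the keyword arguments.
    return [f"{key} = {value}" for key, value in kwargs.items()]


# Any number of keyword arguments can be passed without changing the signature.
for line in describe(name="Ann", age=30):
    print(line)

# An existing dictionary can be unpacked into keyword arguments with **.
person = {"name": "Bob", "age": 41, "city": "Oslo"}
for line in describe(**person):
    print(line)
```

The same `**` syntax works in both directions: it collects keyword arguments in a function definition and unpacks a dictionary in a function call.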

## Fun with analyzing @BillGates tweets Twitter API’s-Step by Step analysis

This is the 2nd post of the web scraping and APIs series. The first post is here. Please check it out. In this post, we will see how to extract Twitter data using Twitter APIs and then do some basic visualization using word clouds and pie charts, followed by sentiment analysis using TextBlob and VADER. … Read more

## Happiness & GDP per capita in Africa

1. Import Libraries First off, let’s import the required libraries: Pandas for data structuring, Matplotlib and Seaborn for graph plotting and statistics, and GeoPandas for geographical map plotting. 2. Import the Data Let’s import and clean it in order to remove any unwanted variables and to organize the ones we want. , where life ladder is … Read more

## Data Engineering — the Cousin of Data Science, is Troublesome

How to get your analysts to realize the importance of expanding their toolkit? I guess I’ve found the answer. We always deem data science as the “sexiest job of the 21st century”. When it comes to the transformation from a traditional company to an analytical company, either the company or the data scientists would expect to dive … Read more

## A Bird’s Eye View: How Machine Learning Can Help You Charge Your E-Scooters

Log-Scale Transformation For each feature, I plotted the distribution to explore the data for feature engineering opportunities. For features with a right-skewed distribution, where the mean is typically greater than the median, I applied these log transformations to normalize the distribution and reduce the variability of outlier observations. This approach was used to generate a … Read more
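The teaser does not say which log transformation was applied, so this sketch assumes `log1p` (log(1 + x)), a common choice because it is defined at zero. The sample values and the helper name `log_transform` are made up for illustration.

```python
import math
import statistics


def log_transform(values):
    """Apply log(1 + x) to each non-negative value."""
    return [math.log1p(v) for v in values]


# Hypothetical right-skewed feature: most values are small, one is very large.
raw = [1, 2, 2, 3, 3, 4, 50]
transformed = log_transform(raw)

# In the raw data the outlier pulls the mean well above the median.
print("raw:        ", statistics.mean(raw), statistics.median(raw))
# After the transform the mean and median sit much closer together.
print("transformed:", statistics.mean(transformed), statistics.median(transformed))
```

The transform compresses large values far more than small ones, which is why it pulls in the long right tail without reordering the observations.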

## Let’s Build a Streaming Data Pipeline

Apache Beam and Dataflow for real-time data pipelines. Today’s post is based on a project I recently did at work. I was really excited to implement it and to write it up as a blog post as it gave me a chance to do some data engineering and also do something that was quite valuable … Read more

## When Job Hunting Meets Data Science (Part 1)

Endless challenges. That’s how we grow. In our Data Science Immersive program, the last major project before the Capstone is to build predictive models for various aspects of job hunting, such as salary and job categories. The project resembles the real-world scenario: Your boss gives you a target and/or a problem statement and you find … Read more

## Insight to the Fourier Transform and The Simple Implementation of It

Source: https://pa1.narvii.com/6397/fbeec74f0468cf51eb46f4f869190563cf50829b_hq.gif. In this post, I will not go into detail about the derivation of the Fourier transform or Fourier series, etc. Instead, we will explore what this transformation outputs and how it works. The form of the Fourier transform we will discuss in this story is the Discrete Fourier Transform (DFT). … Read more
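As a rough sketch of what the DFT computes, the block below evaluates the textbook definition X[k] = sum over t of x[t] * exp(-2*pi*i*k*t / N) directly. It is not the article's code, and the function name `dft` is an assumption. The direct form costs O(N^2) operations, which is fine for tiny signals; real workloads would use an FFT routine.

```python
import cmath


def dft(signal):
    """Naive Discrete Fourier Transform of a sequence of numbers."""
    n = len(signal)
    spectrum = []
    for k in range(n):
        # Correlate the signal with a complex sinusoid of frequency k cycles per N samples.
        coeff = sum(
            signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)
        )
        spectrum.append(coeff)
    return spectrum


# Hypothetical test signal: one cosine cycle sampled at 8 points.
samples = [cmath.cos(2 * cmath.pi * t / 8).real for t in range(8)]
for k, coeff in enumerate(dft(samples)):
    print(k, round(abs(coeff), 6))
```

Each output bin measures how strongly the signal resembles a sinusoid at that frequency, so a pure cosine concentrates its energy in one bin and its mirror image.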

## A Simple Breast Cancer Classifier using ANN

I won’t repeat cliché statements like “… deep learning is the next big thing.” No. If you are here reading this article, you most certainly know what Deep Learning or a Neural Network is and how it is going to evolve. Let’s cut to the chase and build a classifier using a Neural Network that will … Read more

## Scraping the Top 5 Tech Company Job Boards

How to scrape Facebook’s job board, along with Apple, Amazon, Google and Netflix. Image: Gustave Caillebotte [Public domain]. In this project, I wanted to scrape the job search results from Apple, Amazon, Facebook, Google, and Netflix to help expedite my job search. It is tedious to go to each site to get all the job results … Read more

## How I Improved Accuracy Of My Machine Learning Project?

Follow these tips to get better results Working on a machine learning project can be a tedious task, in particular when you have gathered all of the available data and yet the model yields poor results. This article should provide you with the tips that you can follow to improve the accuracy of your machine learning … Read more

## Do Hit Songs Have Anything in Common?

If you log in to Spotify.me, you will get a personalized summary of how Spotify understands you through the music you listen to on Spotify. It is pretty cool! As someone who listens to music a lot and who likes to play around with data, this inspired me to see if I could analyze my … Read more

## A different way to deploy a Python model over Spark

Separate the prediction method from the rest of the Python class and then implement in Scala Instead of using the whole thing, just take the pieces you need. A while ago, I wrote a post about how to deploy a Python model over Spark. The approach was roughly as follows: Train the model in Python on a … Read more

## An Analysis of Airbnb in San Francisco

Airbnb launched in San Francisco in August of 2008. Since then, it has grown to more than 81,000 cities in 191 countries with a total of six million listings¹. In the process, it has generated an immense amount of context-specific data about the cities in which it operates. As a resident of San Francisco for … Read more

## Publish Data Science Articles to the Web using Jupyter, Github and Kyso

Combine these 3 tools to supercharge your DS workflow. Kyle, May 6. Data science is exploding: more and more organizations are using data to power, well, everything. But it can sometimes still be a little difficult to publish data-science-based reports. This might be because the charts are interactive, or because you want a reproducible document, … Read more

## Data Driven Growth with Python  — Part 1: Know Your Metrics

Learn what and how to track with Python. Introduction: This series of articles was designed to explain how to use Python in a simple way to fuel your company’s growth by applying the predictive approach to all your actions. It will be a combination of programming, data analysis and machine learning. I … Read more

## Towards Well Being, with Data Science (part 2)

Credit: EUFIC Please refer to (part 1) to see what we covered in the previous story. Now… where were we? Last time, we left off at visualizing the data with matplotlib. We also covered some introductory material, explored the data on a surface level, and started a deeper analysis with the help of visualization. We … Read more

## Comparing and Matching Column Values in Different Excel Files using Pandas

Pandas for column matching. Often, we may want to compare column values in different Excel files against one another to search for matches and/or similarity. With the Pandas library for Python, this becomes an easy task. To demonstrate how this is possible, this tutorial will focus on a simple genetic example. No genetic knowledge … Read more

## Speedup your CNN using Fast Dense Feature Extraction and PyTorch

What are patch-based methods, and what is the problem? Patch-based CNNs are usually applied to single patches of an image, where each patch is classified separately. This approach is often used when trying to execute the same CNN several times on neighboring, overlapping patches in an image. This includes tasks based feature extraction like camera … Read more

## If you like to travel, let Python help you scrape the best fares!

Well, every Selenium project starts with a webdriver. I’m using Chromedriver, but there are other alternatives. PhantomJS or Firefox are also popular. After downloading it, place it in a folder and that’s it. These first lines will open a blank Chrome tab. Please bear in mind I’m not breaking new ground here. There are way … Read more

## Make your own Super Pandas using Multiproc

Parallelization is awesome. We data scientists have laptops with quad-core and octa-core processors and turbo boost. We work with servers with even more cores and computing power. But do we really utilize the raw power we have at hand? Instead, we wait for time-consuming processes to finish. Sometimes for hours, when urgent deliverables are at hand. Can … Read more

## Python vs Excel — Compound Annual Growth Rate (CAGR)

One of my greatest frustrations with Microsoft Excel (or Google Sheets) is the lack of an inbuilt function to calculate the compound annual growth rate or CAGR (XIRR is the closest but it’s not the same). This means that in every case where I needed to conduct a quick Excel CAGR analysis, I would need … Read more
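For reference, CAGR follows the standard formula (end value / start value)^(1 / years) - 1. The sketch below is a minimal Python version of that formula, not the article's implementation; the function name `cagr` and the sample figures are assumptions.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate as a fraction (0.10 means 10% per year)."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Hypothetical example: revenue growing from 100 to 121 over two years.
rate = cagr(100, 121, 2)
print(f"{rate:.2%}")
```

Compounding the returned rate over the same number of years reproduces the end value, which is a quick way to sanity-check a result.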

## Zalando Dress Recommendation and Tagging

In Artificial Intelligence, Computer Vision techniques are widely applied. A nice field of application (one of my favourites) is the fashion industry. The availability of resources in terms of raw images allows interesting use cases to be developed. Zalando knows this (I suggest taking a look at their GitHub repository) and frequently develops amazing AI solutions, … Read more

## Advanced candlesticks for machine learning (ii): volume and dollar bars

In this article we will learn how to build volume and dollar bars and we will explore what advantages they offer with respect to traditional time-based candlesticks and tick bars. Finally, we will analyze two of their statistical properties, autocorrelation and normality of returns, in a large dataset of 16 cryptocurrency trading pairs. Introduction: In a previous post we … Read more
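The construction can be sketched as follows: instead of closing a bar after a fixed time interval, a volume bar closes once the accumulated traded volume reaches a threshold. This is a simplified illustration under assumed inputs (a list of `(price, volume)` ticks and a hypothetical `volume_bars` helper), not the article's code. A dollar bar would follow the same pattern but accumulate `price * volume`.

```python
def volume_bars(ticks, threshold):
    """Group (price, volume) ticks into OHLC bars of at least `threshold` volume."""
    bars = []
    prices = []        # prices seen in the bar currently being built
    cum_volume = 0.0   # volume accumulated in the current bar
    for price, volume in ticks:
        prices.append(price)
        cum_volume += volume
        if cum_volume >= threshold:
            bars.append({
                "open": prices[0],
                "high": max(prices),
                "low": min(prices),
                "close": prices[-1],
                "volume": cum_volume,
            })
            prices = []
            cum_volume = 0.0
    # Ticks left over after the last completed bar are dropped in this sketch.
    return bars


# Hypothetical tick data.
ticks = [(10, 5), (11, 5), (12, 3), (9, 8), (10, 2)]
for bar in volume_bars(ticks, threshold=10):
    print(bar)
```

Because bars close on activity rather than on the clock, busy periods produce many bars and quiet periods produce few.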

## Data Science for Startups: Containers

Source: https://commons.wikimedia.org/wiki/File:CMA_CGM_Benjamin_Franklin.jpeg Building reproducible setups for machine learning One of the skills that is becoming more in demand for data scientists is the ability to reproduce analyses. Having code and scripts that only work on your machine is no longer sustainable. You need to be able to share your work and have other teams be able … Read more

## Web Scraping For Beginners Beautifulsoup,Scrapy,Selenium & Twitter API

Introduction: I was learning about web scraping recently and thought of sharing my experience of scraping with Beautiful Soup, Scrapy and Selenium, and also of using Twitter APIs and pandas-datareader. Web scraping is a fun and very useful tool. Python has made web scraping much easier. With less than 100 lines of code you can extract the data. Web scraping is … Read more

## Python for Finance: Robo Advisor Edition

Extending Stock Portfolio Analyses and Dash by Plotly to track Robo Advisor-like Portfolios. Photo by Aditya Vyas on Unsplash. Part 3 of Leveraging Python for Stock Portfolio Analyses. Introduction. This post is the third installment in my series on leveraging Python for finance, specifically stock portfolio analyses. In part 1, I reviewed a Jupyter notebook … Read more

## K-Means Clustering in SAS

What is Clustering? “Clustering is the process of dividing the datasets into groups, consisting of similar data-points”. Clustering is a type of unsupervised machine learning, which is used when you have unlabeled data. Let’s understand this through a real scenario: a group of diners sitting in a restaurant. Let’s say two tables in the restaurant called T1 … Read more

## What’s new in TensorFlow 2.0?

The machine learning library TensorFlow has had a long history of releases starting from the initial open-source release from the Google Brain team back in November 2015. Initially developed internally under the name DistBelief, TensorFlow quickly rose to become the most widely used machine learning library today. And not without reason. Number of repository stars … Read more

## Detecting faces with Python and OpenCV Face Detection Neural Network

Image: Cool Kids of Death, Off Festival. Now, we all know that Artificial Intelligence is becoming more and more real and it’s filling the gaps between the capabilities of humans and machines day by day. It’s not just a fancy word anymore. It has had many advancements over the years in many fields and one such area … Read more

## Separating mixed signals with Independent Component Analysis

Image modified from GarageBand. The world around us is a dynamic mixture of signals from various sources. Just like the colors in the above picture blend into one another, giving rise to new shades and tones, everything we perceive is a fusion of simpler components. Most of the time we are not even aware that the … Read more

## Feature engineering

Feature engineering is the process of transforming raw, unprocessed data into a set of targeted features that best represent your underlying machine learning problem. Engineering thoughtful, optimized data is the vital first step. In general, you can think of data cleaning as a process of subtraction and feature engineering as a process of addition. This … Read more

## Analyzing CNET’s Headlines

Exploring the news published on CNET using Python and Pandas Photo by M. B. M. on Unsplash I wrote a crawler to scrape the news headlines from CNET’s sitemap and decided to perform some exploratory analysis on it. In this post, I will walk you through my findings, some anomalies and some interesting insights. You … Read more

## 3 Machine Learning Books that Helped me Level Up

Source: Pixabay There is a Japanese word, tsundoku (積ん読), which means buying and keeping a growing collection of books, even though you don’t really read them all. I think we Developers and Data Scientists are particularly prone to falling into this trap. Personally, I even hoard bookmarks: my phone’s Chrome browser has so many open … Read more

## Exploring the Tokyo Neighborhoods: Data-Science in Real Life

3. Visualization and Data Exploration: 3.1. Folium Library and Leaflet Map: Folium is a Python library that can create interactive Leaflet maps from coordinate data. Since I am interested in restaurants as popular spots, I first create a data frame containing the rows of the previous data frame whose ‘Venue_Category’ column contains the word ‘Restaurant’. I used the following snippet of … Read more

## Analyzing Employee Reviews: Google vs Amazon vs Apple vs Microsoft

Which company is it worth working for? Overview Whether it is for their ability to offer high salaries, extravagant perks, or their exciting mission statements, it is clear that top companies like Google and Microsoft have become talent magnets. To put it into perspective, Google alone receives more than two million job applications each year. Working … Read more

## Understand the problem statement to optimize your code

Python Shorts How Understanding the problem statement could help you to optimize your code Photo by Helloquence on Unsplash Whenever we talk about optimizing code we always discuss the computational complexity of the code. Is it O(n) or O(n-squared)? But, sometimes we need to look beyond the algorithm and look at how the algorithm is going to … Read more

## Speed Up Your Exploratory Data Analysis With Pandas-Profiling

Get an intuition of your data’s structure with just one line of code Source: https://unsplash.com/photos/gts_Eh4g1lk Introduction When importing a new data set for the very first time, the first thing to do is to get an understanding of the data. This includes steps like determining the range of specific predictors, identifying each predictor’s data type, as … Read more

## How to use Python features in your data analytics project

Python tutorial in Azure using OO, NumPy, pandas, SQL, PySpark. 1. Introduction: A lot of companies are moving to the cloud and considering which tooling to use for data analytics. On-premises, companies mostly use proprietary software for advanced analytics, BI and reporting. However, this tooling may not be the most logical choice in a cloud environment. … Read more

## Classifying Products as Banned Or Approved using Text Mining- Part II

In this part, we will explain how to optimize the existing Machine Learning model from Part I and how to deploy this ML model using Flask. Connecting the dots: moving from M to L in Machine Learning. In the previous article of this series, we discussed the business problem, shown how to train the model using … Read more

## Importance of Choosing the Correct Hyper-parameters while defining a model

Often considered the trickiest part of optimizing a Machine Learning algorithm, correct hyperparameter tuning can save a lot of time and help deploy the model faster. All of us Machine Learning aficionados must have participated in hackathons to test our skills in Machine Learning at some time or other. Well, some problem statement that we need to solve … Read more