Practical Statistics & Visualization With Python & Plotly

Photo credit: Pixabay. How to use Python and Plotly for statistical visualization, inference, and modeling. One day last week, I was googling “statistics with Python”, and the results were somewhat unfruitful. Most literature, tutorials and articles focus on statistics with R, because R is a language dedicated to statistics and has more statistical analysis features than Python. In … Read more

Finding Bayesian Legos

Photo credit: Frédérique Voisin-Demery/Flickr (CC BY 2.0) Joe, a good family friend, dropped by earlier this week. As we do often, we discussed the weather (seems to be hotter than normal already here in the Pacific Northwest), the news (mostly about how we are both taking actions to avoid the news), and our kids. Both of … Read more

Get started with Object Oriented Programming in Python: Classes and Instances

New to OOP? Learn how to write a class and create instances in Python. There are a lot of articles popping up on object-oriented programming in Python at the moment. Many data scientists, myself included, find ourselves in roles that focus on writing functional code, often in small scripts or prototypes. I’ve been working as a … Read more

Challenges in sentiment analysis: a case for word clouds (for now)

Exploring simple Python code visualizations for marketing. Machine understanding and capability get merged together in popular culture. When I think about artificial intelligence, I get into this tricky habit of mixing understanding with capability. I imagine that there are ways we can tell how much a machine knows by what it can produce. However, the … Read more

Four ways to quantify synchrony between time series data

1. Pearson correlation: simple is best. The Pearson correlation measures how two continuous signals co-vary over time and indicates the strength of their linear relationship as a number from -1 (perfectly negatively correlated) through 0 (not correlated) to 1 (perfectly positively correlated). It is intuitive, easy to understand, and easy to interpret. Two things to be cautious of when using Pearson correlation are … Read more
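To make the -1 to 1 scale concrete, here is a minimal sketch of the coefficient in plain Python. The function name `pearson_r` and the sample signals are made up for illustration and are not taken from the article, which may well use a library routine instead.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of products of deviations from the mean (unnormalized covariance).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant signal")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical signals: b rises with a, c falls as a rises.
signal_a = [1.0, 2.0, 3.0, 4.0, 5.0]
signal_b = [2.0, 4.0, 6.0, 8.0, 10.0]
signal_c = [5.0, 4.0, 3.0, 2.0, 1.0]

r_same = pearson_r(signal_a, signal_b)      # perfectly linear, same direction
r_opposite = pearson_r(signal_a, signal_c)  # perfectly linear, opposite direction
print(r_same, r_opposite)
```

Because the coefficient is normalized by both variances, rescaling either signal leaves it unchanged; it only captures how linear the relationship is.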

Plotly Express Yourself

Prep! (and prefacing thoughts) For our test data I found this fun dataset on Kaggle on superheroes (hey, I just saw Avengers: Endgame!): Multiple Spider-Men and Captains America? Yes, the multiverse exists! 2. Code for getting and scrubbing the data, as well as the snippets below, can be found in this Jupyter notebook here. 3. If … Read more

How To Make Pi

An Infinite Series Approach This infinite sum idea seems to be working so we’ll continue down that path. From trigonometry, we know tan(pi / 4) = 1. We can now use the inverse tangent function, arctan(x), to calculate arctan(1) = pi / 4. And luckily, we have a simple and easy formula for arctan(x). This method … Read more
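The teaser cuts off before the formula, so the sketch below assumes it is the Taylor (Gregory) series arctan(x) = x - x^3/3 + x^5/5 - ..., which at x = 1 gives the Leibniz series for pi / 4. The function name `arctan_series` and the number of terms are illustrative choices.

```python
import math

def arctan_series(x, n_terms):
    """Partial sum of arctan(x) = x - x^3/3 + x^5/5 - ... (converges for |x| <= 1)."""
    total = 0.0
    for k in range(n_terms):
        total += (-1) ** k * x ** (2 * k + 1) / (2 * k + 1)
    return total

# arctan(1) = pi / 4, so multiplying the partial sum by 4 approximates pi.
pi_estimate = 4 * arctan_series(1.0, 100_000)
print(pi_estimate, abs(pi_estimate - math.pi))
```

At x = 1 the series converges slowly: the error after n terms is bounded by 4 / (2n + 1), so each extra digit costs roughly ten times more terms.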

A primer on *args, **kwargs, decorators for Data Scientists

What are **kwargs? In simple terms, you can use **kwargs to pass an arbitrary number of keyword arguments to your function and access them through a dictionary. A simple example: Let’s say you want to create a print function that can take a name and age as input and print that. def myprint(name,age):print(f'{name} is {age} years … Read more
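Since the excerpt stops before the **kwargs version, here is a small sketch of the idea. The function `describe` is a hypothetical stand-in, not the article's own code: every keyword argument passed by the caller arrives in the dictionary `kwargs`.

```python
def describe(**kwargs):
    """Collect every keyword argument as a 'key: value' line."""
    # kwargs is an ordinary dict that preserves the order the caller used.
    return [f"{key}: {value}" for key, value in kwargs.items()]

# The caller decides which keywords to pass; the function needs no fixed signature.
for line in describe(name="Ann", age=30, city="Lisbon"):
    print(line)
```

The trade-off is that a misspelled keyword is silently accepted, so explicit parameters remain the better choice when the set of inputs is known in advance.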

Happiness & GDP per capita in Africa

1. Import Libraries First off, let’s import the required libraries: Pandas for data structuring, Matplotlib and Seaborn for graph plotting and statistics, and GeoPandas for geographical map plotting. 2. Import the Data Let’s import and clean it in order to remove any unwanted variables and to organize the ones we want. , where life ladder is … Read more

Data Engineering — the Cousin of Data Science, is Troublesome

How to get your analysts to realize the importance of expanding their toolkit? I guess I’ve found the answer. We always deem data science as the “sexiest job of the 21st century”. When it comes to the transformation from a traditional company to an analytical company, either the company or the data scientists would expect to dive … Read more

A Bird’s Eye View: How Machine Learning Can Help You Charge Your E-Scooters

Log-Scale Transformation For each feature, I plotted the distribution to explore the data for feature engineering opportunities. For features with a right-skewed distribution, where the mean is typically greater than the median, I applied these log transformations to normalize the distribution and reduce the variability of outlier observations. This approach was used to generate a … Read more
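As a rough illustration of the effect described above, the sketch below applies log(1 + x) to a made-up right-skewed feature and compares the gap between mean and median before and after. The data and the choice of `log1p` are assumptions; the article does not say which log transformation it used.

```python
import math
import statistics

# Hypothetical right-skewed feature: most values are small, a few are very large.
raw = [1, 2, 2, 3, 3, 4, 5, 8, 40, 250]

# log1p(x) = log(1 + x), which stays defined when x == 0.
transformed = [math.log1p(x) for x in raw]

# In a right-skewed distribution the mean sits well above the median.
raw_gap = statistics.mean(raw) - statistics.median(raw)
log_gap = statistics.mean(transformed) - statistics.median(transformed)

print(f"raw:   mean - median = {raw_gap:.2f}")
print(f"log1p: mean - median = {log_gap:.2f}")
```

The transformation compresses the large values far more than the small ones, which is why the outliers pull the mean much less afterwards.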

When Job Hunting Meets Data Science (Part 1)

Endless challenges. That’s how we grow. In our Data Science Immersive program, the last major project before the Capstone is to build predictive models for various aspects of job hunting, such as salary and job categories. The project resembles the real-world scenario: Your boss gives you a target and/or a problem statement and you find … Read more

Insight to the Fourier Transform and The Simple Implementation of It

In this post, I will not go into detail about the derivation of the Fourier transform or Fourier series, etc. Instead, we will explore what this transformation outputs and how it works. So, the formula of Fourier transform we will discuss in this story is called Discrete Fourier Transform (DFT). … Read more
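To give a feel for what the DFT outputs, here is a minimal sketch that evaluates the textbook definition X[k] = sum over t of x[t] * exp(-2*pi*i*k*t / N) directly. The function name `dft` and the test signal are illustrative, and this O(N^2) loop is not necessarily the implementation the article builds.

```python
import cmath
import math

def dft(samples):
    """Naive discrete Fourier transform, straight from the definition."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

# Hypothetical signal: exactly one cosine cycle across 8 samples.
n = 8
signal = [math.cos(2 * math.pi * t / n) for t in range(n)]

# The magnitude of each output bin shows how strongly that frequency is present.
magnitudes = [abs(c) for c in dft(signal)]
print([round(m, 6) for m in magnitudes])
```

For a real-valued input the energy of a single cosine shows up in two mirrored bins, k = 1 and k = N - 1, each with magnitude N / 2, while the remaining bins are zero up to floating-point error.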

Scraping the Top 5 Tech Company Job Boards

How to scrape Facebook’s job board, along with Apple, Amazon, Google and Netflix. Image: Gustave Caillebotte [Public domain]. In this project, I wanted to scrape the job search results from Apple, Amazon, Facebook, Google, and Netflix to help expedite my job search. It is a tedious thing to go to each site to get all the job results … Read more

An Analysis of Airbnb in San Francisco

Airbnb launched in San Francisco in August of 2008. Since then, it has grown to more than 81,000 cities in 191 countries with a total of six million listings¹. In the process, it has generated an immense amount of context-specific data about the cities in which it operates. As a resident of San Francisco for … Read more

Publish Data Science Articles to the Web using Jupyter, Github and Kyso

Combine these 3 tools to supercharge your DS workflow. Kyle, May 6. Data science is exploding: more and more organizations are using data to power, well, everything. But it can sometimes still be a little difficult to publish data-science based reports. This might be because the charts are interactive, or because you want a reproducible document, … Read more

Data Driven Growth with Python  — Part 1: Know Your Metrics

Learn what and how to track with Python. Introduction: This series of articles was designed to explain how to use Python in a simple way to fuel your company’s growth by applying a predictive approach to all your actions. It will be a combination of programming, data analysis and machine learning. I … Read more

Comparing and Matching Column Values in Different Excel Files using Pandas

Pandas for column matching. Often, we may want to compare column values in different Excel files against one another to search for matches and/or similarity. The Pandas library for Python makes this an easy task. To demonstrate how this is possible, this tutorial will focus on a simple genetic example. No genetic knowledge … Read more

Speedup your CNN using Fast Dense Feature Extraction and PyTorch

What are patch-based methods, and what is the problem? Patch-based CNNs are usually applied to single patches of an image, where each patch is classified separately. This approach is often used when trying to execute the same CNN several times on neighboring, overlapping patches in an image. This includes tasks based feature extraction like camera … Read more

If you like to travel, let Python help you scrape the best fares!

Well, every Selenium project starts with a webdriver. I’m using Chromedriver, but there are other alternatives. PhantomJS or Firefox are also popular. After downloading it, place it in a folder and that’s it. These first lines will open a blank Chrome tab. Please bear in mind I’m not breaking new ground here. There are way … Read more

Make your own Super Pandas using Multiproc

Parallelization is awesome. We data scientists have got laptops with quad-core, octa-core, turbo-boost. We work with servers with even more cores and computing power. But do we really utilize the raw power we have at hand? Instead, we wait for time-consuming processes to finish. Sometimes for hours, when urgent deliverables are at hand. Can … Read more

Zalando Dress Recommendation and Tagging

In Artificial Intelligence, Computer Vision techniques are massively applied. A nice field of application (one of my favourites) is the fashion industry. The availability of resources in terms of raw images allows the development of interesting use cases. Zalando knows this (I suggest taking a look at their GitHub repository) and frequently develops amazing AI solutions, … Read more

Advanced candlesticks for machine learning (ii): volume and dollar bars

In this article we will learn how to build volume and dollar bars and explore what advantages they offer compared with traditional time-based candlesticks and tick bars. Finally, we will analyze two of their statistical properties (autocorrelation and normality of returns) in a large dataset of 16 cryptocurrency trading pairs. Introduction: In a previous post we … Read more

Data Science for Startups: Containers

Building reproducible setups for machine learning. One of the skills that is becoming more in demand for data scientists is the ability to reproduce analyses. Having code and scripts that only work on your machine is no longer sustainable. You need to be able to share your work and have other teams be able … Read more

Web Scraping For Beginners: Beautifulsoup, Scrapy, Selenium & Twitter API

Introduction: I was learning about web scraping recently and thought of sharing my experience of scraping with BeautifulSoup, Scrapy and Selenium, and of using the Twitter APIs and pandas datareader. Web scraping is fun and a very useful tool. Python has made web scraping much easier: with less than 100 lines of code you can extract the data. Web scraping is … Read more

Python for Finance: Robo Advisor Edition

Extending Stock Portfolio Analyses and Dash by Plotly to track Robo Advisor-like Portfolios. Photo by Aditya Vyas on Unsplash. Part 3 of Leveraging Python for Stock Portfolio Analyses. Introduction. This post is the third installment in my series on leveraging Python for finance, specifically stock portfolio analyses. In part 1, I reviewed a Jupyter notebook … Read more

K-Means Clustering in SAS

What is Clustering? “Clustering is the process of dividing the datasets into groups, consisting of similar data-points”. Clustering is a type of unsupervised machine learning, which is used when you have unlabeled data. Let’s understand this through a real scenario: a group of diners sitting in a restaurant. Let’s say two tables in the restaurant called T1 … Read more

What’s new in TensorFlow 2.0?

The machine learning library TensorFlow has had a long history of releases starting from the initial open-source release from the Google Brain team back in November 2015. Initially developed internally under the name DistBelief, TensorFlow quickly rose to become the most widely used machine learning library today. And not without reason. Number of repository stars … Read more

Feature engineering

Feature engineering is the process of transforming raw, unprocessed data into a set of targeted features that best represent your underlying machine learning problem. Engineering thoughtful, optimized data is the vital first step. In general, you can think of data cleaning as a process of subtraction and feature engineering as a process of addition. This … Read more

Analyzing CNET’s Headlines

Exploring the news published on CNET using Python and Pandas Photo by M. B. M. on Unsplash I wrote a crawler to scrape the news headlines from CNET’s sitemap and decided to perform some exploratory analysis on it. In this post, I will walk you through my findings, some anomalies and some interesting insights. You … Read more

3 Machine Learning Books that Helped me Level Up

Source: Pixabay There is a Japanese word, tsundoku (積ん読), which means buying and keeping a growing collection of books, even though you don’t really read them all. I think we Developers and Data Scientists are particularly prone to falling into this trap. Personally, I even hoard bookmarks: my phone’s Chrome browser has so many open … Read more

Exploring the Tokyo Neighborhoods: Data-Science in Real Life

3. Visualization and Data Exploration: 3.1. Folium Library and Leaflet Map: Folium is a Python library that can create an interactive Leaflet map from coordinate data. Since I am interested in restaurants as popular spots, I first create a data-frame of the rows where the ‘Venue_Category’ column of the previous data-frame contains the word ‘Restaurant’. I used the following snippet of … Read more

Analyzing Employee Reviews: Google vs Amazon vs Apple vs Microsoft

Which company is it worth working for? Overview Whether it is for their ability to offer high salaries, extravagant perks, or their exciting mission statements, it is clear that top companies like Google and Microsoft have become talent magnets. To put it into perspective, Google alone receives more than two million job applications each year. Working … Read more

Understand the problem statement to optimize your code

Python Shorts How Understanding the problem statement could help you to optimize your code Photo by Helloquence on Unsplash Whenever we talk about optimizing code we always discuss the computational complexity of the code. Is it O(n) or O(n-squared)? But, sometimes we need to look beyond the algorithm and look at how the algorithm is going to … Read more

Speed Up Your Exploratory Data Analysis With Pandas-Profiling

Get an intuition of your data’s structure with just one line of code. Introduction: When importing a new data set for the very first time, the first thing to do is to get an understanding of the data. This includes steps like determining the range of specific predictors, identifying each predictor’s data type, as … Read more

How to use Python features in your data analytics project

Python tutorial in Azure using OO, NumPy, pandas, SQL, PySpark. 1. Introduction: A lot of companies are moving to the cloud and considering which tooling to use for data analytics. On-premises, companies mostly use proprietary software for advanced analytics, BI and reporting. However, this tooling may not be the most logical choice in a cloud environment. … Read more

Classifying Products as Banned Or Approved using Text Mining- Part II

In this part, we will explain how to optimize the existing machine learning model from Part I and how to deploy it using Flask. Connecting the dots: moving from M to L in Machine Learning. In the previous article of this series, we discussed the business problem, showed how to train the model using … Read more

Importance of Choosing the Correct Hyper-parameters while defining a model

Often considered the trickiest part of optimizing a machine learning algorithm, correct hyperparameter tuning can save a lot of time and help deploy the model faster. All of us machine learning aficionados must have participated in hackathons to test our skills at some time or other. Well, some problem statement that we need to solve … Read more