What can Data Analytics tell us about Flower Boy?

An analysis of Tyler the Creator’s discography. Flower Boy has been praised by critics as Tyler the Creator’s best and most mature album. It received a score of 8.5 from Pitchfork, which called the album “transformational, lovestruck and penetrating”, and was nominated for Best Rap Album at the 2018 Grammy Awards. While … Read more

Time Series Analysis, Visualization & Forecasting with LSTM

Statistical normality test, Dickey–Fuller test for stationarity, long short-term memory. The title says it all. Without further ado, let’s roll! The Data. The data consists of measurements of electric power consumption in one household, sampled once per minute over a period of almost 4 years, and can be downloaded from here. Different electrical quantities … Read more

10x Faster Parallel Python Without Python Multiprocessing

Faster Python without restructuring your code While Python’s multiprocessing library has been used successfully for a wide range of applications, in this blog post, we show that it falls short for several important classes of applications including numerical data processing, stateful computation, and computation with expensive initialization. There are two main reasons: Inefficient handling of numerical … Read more

Exploring Toronto Bike Share Ridership using Python

Analysis and Visualization Personally, I prefer to create a new Jupyter notebook for analysis only. In the new notebook, I first imported the libraries and the cleaned data, then created new pandas Categorical datatypes for the day of the week (Monday, Tuesday, etc.) and month names to ensure fixed sorting order (useful when visualizing the … Read more

Interview Coding Problems: 1.

1. Return whether any two numbers from a list add up to a given number. 2. Transform a list so that each element becomes the product of all the other numbers in the original list. 3. Serialise and deserialise a binary tree. A great way to improve your coding skills is by solving coding challenges. Solving … Read more
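
The first two problems above can be sketched in a few lines of Python. This is only one possible approach, not necessarily the one the article takes: the function names `has_pair_with_sum` and `products_of_others` are made up for illustration, and the second function assumes the usual variant of the problem in which division is avoided.

```python
def has_pair_with_sum(numbers, target):
    """Return True if any two elements of numbers add up to target."""
    seen = set()  # values encountered so far
    for value in numbers:
        # If the complement was seen earlier, the two values form a pair.
        if target - value in seen:
            return True
        seen.add(value)
    return False


def products_of_others(numbers):
    """Return a list whose i-th element is the product of all numbers except numbers[i]."""
    n = len(numbers)
    result = [1] * n

    # First pass: result[i] holds the product of everything to the left of i.
    prefix = 1
    for i in range(n):
        result[i] = prefix
        prefix *= numbers[i]

    # Second pass: multiply in the product of everything to the right of i.
    suffix = 1
    for i in range(n - 1, -1, -1):
        result[i] *= suffix
        suffix *= numbers[i]

    return result
```

Both functions run in a single pass or two over the list, so they avoid the nested loops of the brute-force solutions.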

Predicting Airbnb prices with deep learning, part 1: how to clean up Airbnb data

Project aims and background Airbnb is a home-sharing platform that allows home-owners and renters (‘hosts’) to put their properties (‘listings’) online, so that guests can pay to stay in them. Hosts are expected to set their own prices for their listings. Although Airbnb and other sites provide some general guidance, there are currently no free … Read more

Scalable Python Code with Pandas UDFs: A Data Science Application

Source: https://pxhere.com/en/photo/1417846 Making Python code run at massive scale in the cloud. PySpark is a really powerful tool, because it enables writing Python code that can scale from a single machine to a large cluster. While libraries such as MLlib provide good coverage of the standard tasks that a data scientist may want to perform in … Read more

Classification and Regression Analysis with Decision Trees

The Fundamentals of Decision Trees A decision tree is constructed by recursive partitioning — starting from the root node (known as the first parent), each node can be split into left and right child nodes. These nodes can then be further split and they themselves become parent nodes of their resulting children nodes. For example, looking at the … Read more

Practical Statistics & Visualization With Python & Plotly

Photo credit: Pixabay. How to use Python and Plotly for statistical visualization, inference, and modeling. One day last week, I was googling “statistics with Python”, and the results were somewhat unfruitful. Most literature, tutorials and articles focus on statistics with R, because R is a language dedicated to statistics and has more statistical analysis features than Python. In … Read more

Finding Bayesian Legos

Photo credit: Frédérique Voisin-Demery/Flickr (CC BY 2.0) Joe, a good family friend, dropped by earlier this week. As we do often, we discussed the weather (seems to be hotter than normal already here in the Pacific Northwest), the news (mostly about how we are both taking actions to avoid the news), and our kids. Both of … Read more

Get started with Object Oriented Programming in Python: Classes and Instances

New to OOP? Learn how to write a class and create instances in Python There are a lot of articles popping up on object-oriented programming in Python at the moment. Many data scientists, myself included, find ourselves in roles that focus on writing functional code, often in small scripts or prototypes. I’ve been working as a … Read more

Challenges in sentiment analysis: a case for word clouds (for now)

Exploring simple python code visualizations for marketing Machine understanding and capability get merged together in popular culture. When I think about artificial intelligence, I get into this tricky habit of mixing understanding with capability. I imagine that there are ways we can tell how much a machine knows by what it can produce. However, the … Read more

Four ways to quantify synchrony between time series data

1. Pearson correlation: simple is best. The Pearson correlation measures how two continuous signals co-vary over time and indicates the linear relationship as a number from -1 (negatively correlated) through 0 (not correlated) to 1 (perfectly correlated). It is intuitive, easy to understand, and easy to interpret. Two things to be cautious about when using Pearson correlation are … Read more
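
As a rough illustration of the measure described above, the coefficient can be computed directly from its textbook definition. This is a minimal sketch using only the standard library; the helper name `pearson_r` is hypothetical and is not taken from the article.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")

    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)

    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)

    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant signal")

    return cov / math.sqrt(var_x * var_y)
```

Two signals that rise together, such as `[1, 2, 3, 4]` and `[2, 4, 6, 8]`, give a value of 1, while reversing one of them gives -1.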

Plotly Express Yourself

Prep! (and prefacing thoughts) For our test data I found this fun dataset on Kaggle on superheroes (hey, I just saw Avengers:Endgame!): Multiple Spider-Men and Captains America? Yes, the multiverse exists! 2. Code for getting and scrubbing the data, as well as the snippets below can be found in this jupyter notebook here. 3. If … Read more

How To Make Pi

An Infinite Series Approach This infinite sum idea seems to be working so we’ll continue down that path. From trigonometry, we know tan(pi / 4) = 1. We can now use the inverse tangent function, arctan(x), to calculate arctan(1) = pi / 4. And luckily, we have a simple and easy formula for arctan(x). This method … Read more
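
The excerpt does not show which formula for arctan(x) it uses. Assuming it is the standard power series arctan(x) = x - x^3/3 + x^5/5 - ..., setting x = 1 gives the following sketch; the function name `pi_from_arctan_series` is made up for illustration.

```python
def pi_from_arctan_series(terms):
    """Approximate pi as 4 * arctan(1), summing the first `terms` series terms."""
    total = 0.0
    for k in range(terms):
        # k-th term of the arctan series at x = 1: (-1)^k / (2k + 1)
        total += (-1) ** k / (2 * k + 1)
    return 4 * total
```

The series converges slowly at x = 1: the error shrinks only in proportion to 1 / terms, so many terms are needed for even a few correct digits.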

A primer on *args, **kwargs, decorators for Data Scientists

What are **kwargs? In simple terms, you can use **kwargs to pass an arbitrary number of keyword arguments to your function and access them through a dictionary. A simple example: Let’s say you want to create a print function that can take a name and age as input and print that. def myprint(name,age):print(f'{name} is {age} years … Read more
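
To make the dictionary behaviour concrete, here is a small self-contained sketch. It is not the article's own `myprint` example, which is cut off above; `format_fields` is a hypothetical helper used only for illustration.

```python
def format_fields(**kwargs):
    """Turn arbitrary keyword arguments into a list of 'key=value' strings."""
    # Inside the function, kwargs is an ordinary dict mapping each
    # keyword name to the value that was passed for it.
    return [f"{key}={value}" for key, value in kwargs.items()]


print(format_fields(name="Ada", age=36))
```

The caller can pass any number of keyword arguments, including none, without the function signature having to list them in advance.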

Happiness & GDP per capita in Africa

1. Import Libraries First off, let’s import the required libraries: Pandas for data structuring, Matplotlib and Seaborn for graph plotting and statistics, and GeoPandas for geographical map plotting. 2. Import the Data Let’s import and clean it in order to remove any unwanted variables and to organize the ones we want. , where life ladder is … Read more

Data Engineering — the Cousin of Data Science, is Troublesome

How to get your analysts to realize the importance of expanding their toolkit? I guess I’ve found the answer. We always deem data science as the “sexiest job of the 21st century”. When it comes to the transformation from a traditional company to an analytical company, either the company or the data scientists would expect to dive … Read more

A Bird’s Eye View: How Machine Learning Can Help You Charge Your E-Scooters

Log-Scale Transformation For each feature, I plotted the distribution to explore the data for feature engineering opportunities. For features with a right-skewed distribution, where the mean is typically greater than the median, I applied these log transformations to normalize the distribution and reduce the variability of outlier observations. This approach was used to generate a … Read more
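
The effect described above can be seen on a tiny made-up sample. The excerpt does not say which logarithm was applied, so this sketch assumes log(1 + x), which also handles zeros; the data and the helper names `log_transform` and `mean_median_gap` are hypothetical.

```python
import math
import statistics


def log_transform(values):
    """Apply log(1 + x) to each non-negative value."""
    return [math.log1p(v) for v in values]


def mean_median_gap(values):
    """Mean minus median: clearly positive for a right-skewed sample."""
    return statistics.mean(values) - statistics.median(values)


# Made-up right-skewed sample with one large outlier.
sample = [1, 2, 2, 3, 4, 5, 100]
transformed = log_transform(sample)

print(mean_median_gap(sample), mean_median_gap(transformed))
```

After the transformation the outlier is pulled much closer to the rest of the sample, so the mean moves toward the median.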

When Job Hunting Meets Data Science (Part 1)

Endless challenges. That’s how we grow. In our Data Science Immersive program, the last major project before the Capstone is to build predictive models for various aspects of job hunting, such as salary and job categories. The project resembles the real-world scenario: Your boss gives you a target and/or a problem statement and you find … Read more

Insight to the Fourier Transform and The Simple Implementation of It

source: https://pa1.narvii.com/6397/fbeec74f0468cf51eb46f4f869190563cf50829b_hq.gif In this post, I will not go into detail about the derivation of the Fourier transform or Fourier series, etc. Instead, we will explore what this transformation outputs and how it works. So, the formula of the Fourier transform we will discuss in this story is called the Discrete Fourier Transform (DFT). … Read more
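
For readers who want something concrete before following the link, a direct evaluation of the DFT sum can be sketched as below. It assumes the common convention X[k] = sum over t of x[t] * exp(-2*pi*i*k*t/N), which may differ in sign or scaling from the article's formula; the function name `dft` is chosen for illustration.

```python
import cmath


def dft(signal):
    """Naive O(N^2) Discrete Fourier Transform of a sequence of numbers."""
    n = len(signal)
    spectrum = []
    for k in range(n):
        # Correlate the signal with a complex sinusoid of frequency k.
        coefficient = sum(
            signal[t] * cmath.exp(-2j * cmath.pi * k * t / n)
            for t in range(n)
        )
        spectrum.append(coefficient)
    return spectrum
```

A constant signal puts all of its energy in the zero-frequency bin, while a single impulse spreads equally across every bin. For long signals a fast Fourier transform is the practical choice; the double loop here is only meant to mirror the definition.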

Scraping the Top 5 Tech Company Job Boards

How to scrape Facebook’s job board, along with Apple, Amazon, Google and Netflix. Gustave Caillebotte [Public domain] In this project, I wanted to scrape the job search results from Apple, Amazon, Facebook, Google, and Netflix to help expedite my job search. It is tedious to go to each site to get all the job results … Read more

An Analysis of Airbnb in San Francisco

Airbnb launched in San Francisco in August of 2008. Since then, it has grown to more than 81,000 cities in 191 countries with a total of six million listings¹. In the process, it has generated an immense amount of context-specific data about the cities in which it operates. As a resident of San Francisco for … Read more

Publish Data Science Articles to the Web using Jupyter, Github and Kyso

Combine these 3 tools to supercharge your DS workflow. Kyle, May 6. Data science is exploding: more and more organizations are using data to power, well, everything. But it can sometimes still be a little difficult to publish data-science-based reports. This might be because the charts are interactive, or because you want a reproducible document, … Read more

Data Driven Growth with Python  — Part 1: Know Your Metrics

Learn what and how to track with Python. Introduction. This series of articles was designed to explain how to use Python in a simple way to fuel your company’s growth by applying the predictive approach to all your actions. It will be a combination of programming, data analysis and machine learning. I … Read more

Comparing and Matching Column Values in Different Excel Files using Pandas

Pandas for column matching. Often, we may want to compare column values in different Excel files against one another to search for matches and/or similarity. The Pandas library for Python makes this an easy task. To demonstrate how this is possible, this tutorial will focus on a simple genetic example. No genetic knowledge … Read more

Speedup your CNN using Fast Dense Feature Extraction and PyTorch

What are patch-based methods, and what is the problem? Patch-based CNNs are usually applied to single patches of an image, where each patch is classified separately. This approach is often used when trying to execute the same CNN several times on neighboring, overlapping patches in an image. This includes tasks based feature extraction like camera … Read more

If you like to travel, let Python help you scrape the best fares!

Well, every Selenium project starts with a webdriver. I’m using Chromedriver, but there are other alternatives. PhantomJS or Firefox are also popular. After downloading it, place it in a folder and that’s it. These first lines will open a blank Chrome tab. Please bear in mind I’m not breaking new ground here. There are way … Read more

Make your own Super Pandas using Multiproc

Parallelization is awesome. We data scientists have got laptops with quad-core, octa-core, turbo-boost. We work with servers with even more cores and computing power. But do we really utilize the raw power we have at hand? Instead, we wait for time-consuming processes to finish. Sometimes for hours, when urgent deliverables are at hand. Can … Read more

Zalando Dress Recommendation and Tagging

In Artificial Intelligence, Computer Vision techniques are widely applied. A nice field of application (one of my favourites) is the fashion industry. The availability of resources in terms of raw images makes it possible to develop interesting use cases. Zalando knows this (I suggest taking a look at their GitHub repository) and frequently develops amazing AI solutions, … Read more

Advanced candlesticks for machine learning (ii): volume and dollar bars

In this article we will learn how to build volume and dollar bars and explore the advantages they offer over traditional time-based candlesticks and tick bars. Finally, we will analyze two of their statistical properties, autocorrelation and normality of returns, in a large dataset of 16 cryptocurrency trading pairs. Introduction. In a previous post we … Read more

Data Science for Startups: Containers

Source: https://commons.wikimedia.org/wiki/File:CMA_CGM_Benjamin_Franklin.jpeg Building reproducible setups for machine learning One of the skills that is becoming more in demand for data scientists is the ability to reproduce analyses. Having code and scripts that only work on your machine is no longer sustainable. You need to be able to share your work and have other teams be able … Read more

Web Scraping For Beginners: Beautifulsoup, Scrapy, Selenium & Twitter API

Introduction. I was learning about web scraping recently and thought of sharing my experience of scraping with beautifulsoup, scrapy and selenium, and also of using the Twitter APIs and pandas datareader. Web scraping is a fun and very useful tool. The Python language has made web scraping much easier. With less than 100 lines of code you can extract the data. Web scraping is … Read more

Python for Finance: Robo Advisor Edition

Extending Stock Portfolio Analyses and Dash by Plotly to track Robo Advisor-like Portfolios. Photo by Aditya Vyas on Unsplash. Part 3 of Leveraging Python for Stock Portfolio Analyses. Introduction. This post is the third installment in my series on leveraging Python for finance, specifically stock portfolio analyses. In part 1, I reviewed a Jupyter notebook … Read more

K-Means Clustering in SAS

What is Clustering? “Clustering is the process of dividing the datasets into groups, consisting of similar data-points”. Clustering is a type of unsupervised machine learning, which is used when you have unlabeled data. Let’s understand this through a real-world scenario: a group of diners sitting in a restaurant. Let’s say two tables in the restaurant called T1 … Read more

What’s new in TensorFlow 2.0?

The machine learning library TensorFlow has had a long history of releases starting from the initial open-source release from the Google Brain team back in November 2015. Initially developed internally under the name DistBelief, TensorFlow quickly rose to become the most widely used machine learning library today. And not without reason. Number of repository stars … Read more