Keras data generators and how to use them

You have probably encountered a situation where you try to load a dataset but your machine does not have enough memory. As the field of machine learning progresses, this problem becomes more and more common. Today this is already one of the challenges in the field of vision, where large datasets of images and video …

How to use Selenium as life-saver when dealing with boring tasks?

Automate never-ending repetitive tasks the Selenium way. (Photo by elmnet) If you are a developer, you probably do not need an introduction to Selenium. Selenium is a powerful tool built to interact with the web server and process requests programmatically. It is used to automate a wide variety of tasks involving …

8 Useful Pandas Features for Data-Set Handling

This article presents 8 simple but useful Pandas operations that showcase how Python’s Pandas library can be used for data-set exploration. The data-set I will use for this tutorial is entitled ‘International football results from 1872 to 2019’ and can be sourced here, in case any of the code snippet examples presented …

FastText sentiment analysis for tweets: A straightforward guide.

FastText is an open-source NLP library developed by Facebook AI and initially released in 2016. Its goal is to provide efficient word embedding and text classification. According to its authors, it is often on par with deep learning classifiers in terms of accuracy, and many orders of magnitude faster for training and evaluation. [1] This …

Recommender System in Python — Part 2 (Content-Based System)

The process of getting recommendations is now as simple as a function call. The only parameter you need to pass in is the movie title, and it has to match the title in the dataset exactly; even a small spelling mistake will break everything. Feel free to play around with the function to …

Beginner’s Guide to Python Quirks and Jargon

But while these courses and tutorials can quickly get you up to speed with the basics of the language and the relevant data science libraries (pandas, numpy, matplotlib, and sklearn, to name a few), most barely scratch the surface of Python’s intricacies. Despite its simplicity, Python is a vast and rich language, and it …

How to get started with Data Science : A brief tutorial on using Anaconda, Python, Jupyter…

In this article, I want to share my experience of overcoming the initial hurdle and getting started with learning Data Science. Learning data science is a journey, and you will keep learning once you get started. In this article we will go through the following 5 starting steps for getting into the field of learning …

Python Tips and Tricks, You Haven’t Already Seen, Part 2

Note: This was originally posted at martinheinz.dev. A few weeks ago I posted an article (here) about some not so commonly known Python features, and quite a few people seemed to like it, so here comes another round of Python features that you hopefully haven’t seen yet. Using lots of hardcoded index values can quickly become …

Pedestrian detection using Non Maximum Suppression

A complete pipeline for detecting pedestrians on the road. Pedestrian detection is still an unsolved problem in computer science. Although object detection algorithms such as YOLO, SSD, R-CNN, Fast R-CNN and Faster R-CNN have been researched extensively and with great success, pedestrian detection in crowded scenes remains an open challenge. In recent years, …

A closer look into the Spanish railway passenger transportation pricing

As someone who lives and works in a Spanish city 400 km away from home, I have found that the most convenient way to travel back and forth is by train. As a frequent user I have grown baffled by the pricing patterns I see when buying tickets, sometimes moving along the same levels, …

A demonstration of carrying data analysis (Crimes in Denver EDA)

This is my second demonstration of carrying out data analysis using Python. My previous article, “A demonstration of carrying data analysis (New York City Airbnb Open Data)”, is about the New York City Airbnb Open Data. Please have a look and give me your comments and thoughts so I can keep improving. In this article, I will …

6 of the Best Niche Platforms to Learn SQL and Python

w3schools is a simple, no-frills tool for learning web development skills, including SQL and Python. Depending on your preferences, you will probably either love or hate w3schools’ approach to learning. w3schools claims to be the world’s largest web developer site, so their methods clearly work for many people. Essentially, the method of teaching here is …

Parquet conversion in AWS using Airflow (Part 2)

In this post, we will take a deep dive into custom Airflow operators and see how to easily handle Parquet conversion in Airflow. If you are on AWS, there are three main ways to convert data in Redshift/S3 into the Parquet file format: using PyArrow, which might take a bit of time …

4 Tips to Get the Best Out of PyCharm

The choice of editor usually does not matter much when you are simply experimenting with Machine Learning or coding for short projects that do not require complex folder structures or scripts organized in modules. The problems and preferences usually come up when projects become larger, with several scripts, modules, tests and programmers collaborating on the …

Recommender System in Python — Part 1 (Preparation and Analysis)

You’ve made it to the last part, good work! Let’s now dive further into visualizing the number of ratings. It will also involve some prep work, but nothing demanding. You will have to: create a DataFrame with the movieId column grouped, and count the instances; merge it with the original dataset; rename columns that were messed …

Give some semantic love to your keyword search!

(Image source: https://ebiquity.umbc.edu) At first, search engines (Google, Bing, Yahoo, etc.) were lexical: the search engine looked for literal matches of the query words without understanding the query’s meaning, and returned only links that contained the exact query. But, with the advent of machine learning and new techniques in the field of Natural …

3 Essential Python Skills for Data Scientists

Lambda functions are just so powerful. Yeah, you won’t use them when you have to clean multiple columns the same way, but that has not happened to me very often; more often than not, each attribute will require its own cleaning logic. Lambda functions allow you to create ‘anonymous’ functions. This …
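As a minimal sketch of the idea, assuming two hypothetical columns (`raw_prices` and `raw_names` are invented for illustration and do not come from the article) that each need their own cleaning rule, anonymous functions can be written inline wherever a small one-off function is needed:

```python
# Hypothetical raw values; each "column" needs its own cleaning logic.
raw_prices = ["$1,200", "$950", "$3,400"]
raw_names = ["  alice ", "BOB", " Carol"]

# A lambda is an anonymous function defined inline, here passed straight to map().
prices = list(map(lambda s: int(s.replace("$", "").replace(",", "")), raw_prices))
names = list(map(lambda s: s.strip().title(), raw_names))

print(prices)
print(names)
```

Plain lists are used here only to keep the sketch dependency-free; with pandas, the same lambdas would typically be passed to `Series.apply`.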

Working with VSCode and Jupyter Notebook Style

If you are getting started with machine learning algorithms, you will come across Jupyter Notebook. To maximize efficiency you can integrate its concept with VS Code. As this requires some understanding of how to set up a Python environment, this article provides an introduction. There are a few reasons why it makes sense to develop …

Making a Game for Kids to Learn English and Have Fun with Python

There are a few techniques to learn, and then you can create your own game. Game Initialization:
    import pygame
    # Game Init
    pygame.init()
    win = pygame.display.set_mode((640, 480))
    pygame.display.set_caption("KidsWord presented by cyda")
    run = True
    while run:
        pygame.time.delay(100)
        for event in pygame.event.get():
            if event.type == pygame.QUIT:
                run = False
        pygame.display.update()
    pygame.quit()
To start the game, we need a game window. There are two things to set. Window Size …

Working On a Databricks Cluster From A Remote Machine

Setting up: Configuring the databricks-connect client is pretty easy. You will need to accept the agreement, enter the URL (including the https://), enter the token, enter the cluster ID, and press Enter twice to accept the default values for the Org ID and Port questions. Do you accept the above agreement? [y/N] y Set new …

Python Tips and Trick, You Haven’t Already Seen

Note: This was originally posted at martinheinz.dev. There are plenty of articles written about lots of cool features in Python, such as variable unpacking, partial functions, and enumerating iterables, but there is much more to talk about when it comes to Python, so here I will try to show some of the features I know and …

A Framework to Distribute Data Projects Across Teams

The next step is to set up the project and the environment in PyCharm. There are two possible scenarios: starting a new project, or cloning a project from GitLab. Setting up a new project with an existing environment is very straightforward in PyCharm: once you open the initial window you will see the option: + Create a …

Getting Stuff Done at Hackathons for Rookies

I thoroughly enjoyed my first hackathon (you can read about my experience with scope in a previous post). The opportunity arose through BetaNYC to participate in the Mobility for All Abilities Hackathon, part of the larger National Day of Civic Hacking of 2019. I was on the Reliable Access to Subways team, partnered with TransitCenter …

My First Data Science Project — Family-Friendly Neighborhoods in London

It’s great news to see that there are more family-friendly neighborhoods in London than there are neighborhoods to avoid. In fact, there are 136 neighborhoods to choose from. Here is a simple breakdown: So for any families like my own who are looking for the best family-friendly neighborhoods in London, England, I suggest you start …

Where should you go for college?

What your expected salary will be after graduating based on college degree and college region. Teenagers reach a point in life where they need to pursue their goals. Some have ambitions that require a college education. Some are still unsure about their goals or ambitions, so they go to college to find them. …

A Non-Confusing Guide to Confusion Matrix

After reading all of that stuff about positives and negatives (a couple of times preferably), you now have a basic idea and intuition about the confusion matrix, and you see that it’s not that confusing after all; it just needs to “sink in” properly. But is that all there is to the confusion matrix? I hope you’re kidding. …

10,000 Ways That Won’t Work

Lesson 3 of “Practical Deep Learning for Coders” by fast.ai. “I have not failed. I’ve just found 10,000 ways that won’t work.” ~Thomas Edison. I’m a math adjunct working my way through Lesson 3 of “Practical Deep Learning for Coders” by fast.ai, and this week has been a major pride-swallower for me. At the end …

Visualizing Tesla Superchargers in France

Learn visualization using Python and Folium, from scratch. Data visualization is not merely a science; it is an art. Because of the way the human brain works, it is really easy to process information in visual form. Almost 25 years into digital mapping, and with many companies using machine learning to collect massive amounts of data, …

Get Involved With SciPy!

SciPy wants your ideas to help it become more user-friendly. You’ve heard of SciPy. You’ve probably used it. You might have looked through some of the technical documentation and user guides. You might even have an opinion of the documentation… But have you given any thought to actually getting involved and letting SciPy know how …

Image Scraping with Python

(Photo by Ross Findon on Unsplash) However, most modern web pages are quite interactive. The concept of “single-page application” means that the web page itself will change without the user having to reload or getting redirected from page to page all the time. Because this happens only after specific user interactions, there are few options …

Predicting Prices of Bitcoin with Machine Learning

By knowing the PACF and ACF, we now better understand our dataset and the parameters to potentially choose. Now, we can move on to modeling our data by using the SARIMA model. Optimizing Parameters: In order to get the best performance out of the model, we must find the optimum parameters. We do this by …

SudachiPy: A Japanese Morphological Analyzer in Python

Import: First, let’s import the tokenizer and dictionary modules.
    from sudachipy import tokenizer
    from sudachipy import dictionary
Tokenizer Object: Both of the modules are required to create a tokenizer object. Continue to add the following code:
    tokenizer_obj = dictionary.Dictionary().create()
Mode: Next up, we will need to define the mode for the tokenizer. Mode is used to …

Exploring your data with just 1 line of Python

This is just the beginning of the report. How would you like it if I told you I could produce the following statistics with just 3 lines of Python? Actually just 1 line if we don’t count our imports. Essentials: type, unique values, missing values; quantile statistics like minimum value, Q1, median, Q3, maximum, range, …

Faster ALS recommendations with feature extraction (and Muppets!)

In another recent post, I went over how I created a comic book recommendation system for non-comic readers, IntoComics. I went over some of the steps of how I did it, including the creation of an Alternating Least Squares (ALS) model which breaks users and items into their own matrices, which can be visualized …

Connecting Python to Oracle, SQL Server, MySQL, and PostgreSQL

To connect to the Oracle database you will, of course, need the database installed on your machine. My machine has version 12c, so there are no guarantees everything will work on older or newer versions. To test everything I’ve unlocked the famous HR schema and set the password to hr. Once you do so too, …

6 Basic Pandas Techniques You Need to Know

Here are 6 basic pandas techniques you need to know to deal with data in Python. Pandas is a Python library for data manipulation and data analysis in data science. It is one of the most commonly used Python libraries in data science. In this post, I will guide you through the six basic techniques you need …

Scikit-Learn Design Principles

Estimators: Estimators represent the core interface in Scikit-Learn. All learning algorithms, whether supervised or unsupervised, classification, regression, or clustering, implement the Estimator interface and expose a fit method. An Estimator’s fit method takes as input a (training) feature vector (“samples” or “predictors”) as well as (training) target labels (in the case of supervised learning), and …
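The fit convention can be illustrated without importing Scikit-Learn at all. The sketch below is a hypothetical toy estimator (`MeanRegressor` is an invented name, not part of the library) that follows the same pattern: `fit` learns state from training features and target labels, stores it on an attribute with a trailing underscore, and returns the estimator itself.

```python
class MeanRegressor:
    """Hypothetical toy estimator: always predicts the mean of the training targets."""

    def fit(self, X, y):
        # Learn state from the training data; the trailing underscore
        # marks an attribute that only exists after fitting.
        self.mean_ = sum(y) / len(y)
        return self  # returning self allows calls to be chained

    def predict(self, X):
        # One prediction per input sample, ignoring the feature values.
        return [self.mean_ for _ in X]


# Made-up training data: three samples with one feature each.
X_train = [[1.0], [2.0], [3.0]]
y_train = [2.0, 4.0, 6.0]

model = MeanRegressor().fit(X_train, y_train)
predictions = model.predict([[10.0], [20.0]])
print(predictions)
```

A real Scikit-Learn estimator would normally also inherit from the library's base classes and validate its inputs; both are left out of this sketch to keep it self-contained.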

Predicting Micronutrients using Neural Networks and Random Forest (Part 3)

UNICEF wants you to help them predict important nutrients in foods using the power of machine learning. (Photo by Zoltan Tasi on Unsplash) Greetings! Welcome to part 3 of the “Predicting Micronutrients using Neural Networks and Random Forest” blog series. In the previous blog post, we took a brief look at how …

Linear Algebra Essentials with Numpy (part 2)

Here comes a topic that I would say is slightly more complex to grasp than the others encountered so far. It isn’t as hard as it might seem at first, but you’ll need to solve a couple of examples to get the gist fully. For the following examples in the matrix multiplication section, two matrices …

Social Media Sentiment Analysis using Machine Learning : Part — II

Hello everyone, so let’s start right where we left off in Part I. In this post we will discuss how we can extract features from our textual dataset by using Bag-of-Words and TF-IDF. Then we will see how we can apply Machine Learning models using these features to predict whether a tweet falls into …
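To make the two feature-extraction ideas concrete, here is a minimal sketch in plain Python. The three toy "tweets" are invented for illustration, and the TF-IDF formula used is the plain textbook variant (term frequency times log(N / document frequency)), which is an assumption; libraries often apply smoothing and normalization on top of it.

```python
import math
from collections import Counter

# Hypothetical toy "tweets"; a real dataset would need proper tokenization and cleaning.
tweets = ["good day", "bad day", "good good mood"]
docs = [tweet.split() for tweet in tweets]
vocab = sorted({word for doc in docs for word in doc})
counts = [Counter(doc) for doc in docs]

# Bag-of-Words: one row per tweet, one column per vocabulary word, raw counts.
bow = [[c[word] for word in vocab] for c in counts]

# TF-IDF: term frequency scaled by log(N / document frequency),
# so words that appear in many tweets are weighted down.
n_docs = len(docs)
doc_freq = {word: sum(1 for c in counts if word in c) for word in vocab}
tfidf = [
    [(c[word] / len(doc)) * math.log(n_docs / doc_freq[word]) for word in vocab]
    for c, doc in zip(counts, docs)
]

print(vocab)
print(bow)
```

In the third tweet, "good" occurs twice but also appears in another tweet, while "mood" occurs once and is unique to that tweet, so under this formula "mood" ends up with the higher TF-IDF weight even though its raw count is lower.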

Oktoberfest : Quick analysis using Pandas, Matplotlib, and Plotly

Oktoberfest 2019 has started! Oktoberfest is the world’s largest beer festival and has been held annually in Munich since 1810. It lasts between 16 and 18 days, running from mid or late September to the first Sunday in October, with more than 6 million visitors every year. 🍺 🍺 Munchen.de is the official portal of the city …

NumPy and SciPy and Google Season of Docs, Oh My: Meet Christina Lee

Learn more about the technical writers paired with NumPy and SciPy during Google Season of Docs. From September through November, our little corner of the open-source world is going to involve technical documentation updates at NumPy and SciPy! You’re going behind the scenes to meet the people and learn about some of the work we’re …

SOLID Programming (Part 1): Single Responsibility Principle

SOLID principles are among the most valuable in Software Engineering. They allow you to write code that is clean, scalable and easy to extend. In this series of posts I will explain what each of the principles is and why it is important to apply. Some people believe that SOLID is only applicable to OOP, while …

Linear Algebra Essentials with Numpy (part 1)

Ah, math. You can’t avoid it forever. You can try, and then try harder, but sooner or later some basic intuition behind it will be needed, provided that you are serious in your endeavors to advance your career in data science. When it comes to linear algebra, I really like this quote: If Data Science …

Tags recommendation algorithm using Latent Dirichlet Allocation (LDA)

Our dataset comes from the Stack Exchange Data Explorer. To export posts from the website you will have to make an SQL query like this:
    SELECT * FROM posts WHERE Id < 50000
By default, there is a time limit on the execution time of each SQL query, which can make it difficult to recover all the data at …