The things you need to know to maximize your data science meetup experience! Photo by Austin Distel on Unsplash As an aspiring data scientist, I have often come across plenty of advice by seasoned data practitioners who write about their journeys and projects on platforms like Medium, Reddit, Analytics Vidhya and even on their own … Read more The Most Important Lessons my First Data Science Meetup taught me
Image Source: https://www.digitalocean.com/ Around the Universe Typing “what is machine learning?” into a Google search opens up a Pandora’s box of forums, academic research, and hearsay. The purpose of this article in the 101 for Dummies like Me series is to simplify the definition and understanding of machine learning. In addition to an … Read more Machine Learning 101 for Dummies like Me
How to choose the right data structure from Python list, NumPy array, and pandas DataFrame. There are multiple data structures for working with a sequence of data in Python, including lists, NumPy arrays, and pandas DataFrames. It is often hard for beginners to choose among them. In … Read more Python List, NumPy, and Pandas
Hanabi is a new game that combines cooperation between players with imperfect information. Earlier this year, researchers from DeepMind and Google published a paper proposing the game of Hanabi as the new frontier for artificial intelligence (AI) agents. The reason for the designation is that Hanabi combines many of the most difficult challenges for AI models … Read more Facebook Builds an AI Agent to Master the Hanabi Challenge: The New Frontier for AI Research
Using Go and PostgreSQL to analyze a year of streaming. By now, Spotify users everywhere have gotten their 2019 Wrapped results. I wasn’t too surprised by mine, partly because I knew I’d spent a lot of time exercising to music by a soon-to-be-returning band from the 90s, but also because I’d recently used Spotify’s privacy … Read more I wrapped my Spotify history the hard way.
From linear regression to unsupervised learning, this guide covers everything you need to know to get started in machine learning. Theory and practical exercises are covered for each topic! Linear regression — theory Linear regression — practice Logistic regression — theory Linear discriminant analysis (LDA) — theory Quadratic discriminant analysis (QDA)— theory Logistic regression, LDA … Read more The Complete Hands-On Machine Learning Crash Course
Again, the main objective of this dataset is to study which factors affected a person's chance of survival on board the Titanic. The first thing that comes to mind is to display how many passengers survived the Titanic disaster. Hence, visualizing the Survived column itself is a good start. In Plotly, we … Read more Python For Data Science — A Guide to Data Visualization with Plotly
Power BI is really growing on me. Using this software every day, I am amazed by the capabilities of this product and the way it improves month after month by adding new features or optimizing current capabilities. However, like any software solution, there is still room for improvement and a lack of some features that would … Read more Power BI: Implement AND/OR Selection
Chapter 6 of Data Science in Production. Demonstrated experience in PySpark is one of the most desirable competencies that employers are looking for when building data science teams, because it enables these teams to own live data products. While I’ve previously blogged about PySpark, Parallelization, and UDFs, I wanted to provide a proper … Read more PySpark for Data Science Workflows
A human-friendly longest contiguous & junk-free sequence comparator. SequenceMatcher is a class available in the Python module named “difflib”. It can be used for comparing pairs of input sequences. The objective of this article is to explain the SequenceMatcher algorithm through an illustrative example. Due to the limited docs available, I thought I would share the concept … Read more SequenceMatcher in Python
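As a rough illustration of the class described above, here is a minimal sketch using the standard-library `difflib.SequenceMatcher`. The two input strings are made up for this example and do not come from the article.

```python
from difflib import SequenceMatcher

# Two short, made-up sequences to compare.
a = "abcd"
b = "bcde"

# Passing None as the first argument means no custom junk function is used.
matcher = SequenceMatcher(None, a, b)

# Longest contiguous block shared by a[0:len(a)] and b[0:len(b)].
match = matcher.find_longest_match(0, len(a), 0, len(b))
longest = a[match.a:match.a + match.size]

# ratio() is 2 * M / T, where M is the number of matched elements and
# T is the total number of elements in both sequences.
similarity = matcher.ratio()

print(longest, similarity)
```

Here the only matching block is "bcd", so M = 3 and T = 8, which gives a ratio of 0.75.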
Should we use the mean or median? After seeing my beloved Hawthorn Hawks tweet out an article on their website regarding player ages for each team, it got me riled up that the media love to cite Champion Data’s “average age” as their measure. As can be seen with the age distribution of players as … Read more When the statistic used changes the AFL age narrative for 2020
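To make the mean-versus-median question concrete, here is a small sketch with Python's `statistics` module. The ages are hypothetical numbers chosen to show a skewed list; they are not Champion Data's figures or any real team's list.

```python
from statistics import mean, median

# Hypothetical player ages: mostly young, with two older veterans.
ages = [21, 22, 22, 23, 23, 24, 24, 25, 33, 35]

average_age = mean(ages)    # pulled upwards by the two veterans
median_age = median(ages)   # midpoint of the sorted ages

print(f"mean={average_age}, median={median_age}")
```

In a skewed list like this one, a couple of veterans are enough to push the mean above the age of most of the squad, while the median stays near the typical player.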
Enterprise Data Science is much more than Data Science Aerial Target Interdiction Archives: California Tactical Academy Who is your customer? Earlier this year I sat in on a presentation featuring an aircraft that will “possibly” replace the Boeing V-22 Osprey. The V-22 Osprey is a multi-mission military aircraft that has been in active military service … Read more The Single Strategy You Need to find your Development Target
Thinking further ahead, I found that the logistic regression model would be unable to utilize incremental learning later in the project, while the linear SVM (powered by sklearn’s SGDClassifier()) would. The accuracy trade-off was small: the logistic regression had an average F1-score of 93%, while the SVM had an average F1-score of 92%. Thus, … Read more Building and Deploying a Data Science Project in Two weeks
Finding Spearman’s rank correlation coefficient using Python for IB Diploma Mathematics. Photo by Mikael Kristenson on Unsplash. In this article, I will show the necessary steps using Python to find Spearman’s rank correlation coefficient. A monotonic function shows the relation between ordered sets. Spearman’s rank correlation is useful for finding this relation of ordinal … Read more Discover the strength of monotonic relation
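One way to sketch the calculation in plain Python is to rank both variables and then take the Pearson correlation of the ranks. The helper names below are hypothetical, and giving tied values the average of their ranks is an assumption of this sketch, not necessarily the article's approach.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# y = x**3 is monotonic increasing but not linear in x.
print(spearman([1, 2, 3, 4, 5], [1, 8, 27, 64, 125]))
```

Because the coefficient depends only on ranks, any strictly increasing relation gives 1 and any strictly decreasing relation gives -1, even when the relation is far from linear.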
The Advantages and Disadvantages of Deep Learning Implementation Faced by Today’s Businesses and Organizations. Image Source: Pixabay Deep learning has two key strengths that set it apart from other machine learning techniques. The first of these is feature learning. With other techniques, data scientists manually transform features to get the best results with a particular … Read more Eight Deep Learning Pros and Cons
How to deploy a trained ML model behind a Flask API on the internet. Coming from a software development background, I never thought twice about productionizing my models. But for anyone just getting into coding as a data scientist, it may not be obvious. For a model to be used in the real world, you’ll … Read more Productionize a Machine Learning model with Flask and Heroku
In brief, the program first extracts 68 landmark points from each face. Of those 68 points, points 37–42 belong to the left eye and points 43–48 belong to the right eye (see picture below). Visualisation of 68 landmark points. (Image from pyimagesearch) Because our goal is to apply eyeliner, we are only interested … Read more Artificial Eyeliner on LIVE Feed using Python, OpenCV and Dlib
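The point ranges quoted above are 1-based, while Python lists are 0-based, so the slices shift by one. The sketch below uses placeholder coordinates in place of a real detector's output; `eye_points` is a hypothetical helper name, not a function from dlib or OpenCV.

```python
def eye_points(landmarks):
    """Split a list of 68 (x, y) landmarks into left-eye and right-eye points.

    Points 37-42 (1-based) map to indices 36..41, and
    points 43-48 (1-based) map to indices 42..47.
    """
    if len(landmarks) != 68:
        raise ValueError("expected exactly 68 landmark points")
    left_eye = landmarks[36:42]
    right_eye = landmarks[42:48]
    return left_eye, right_eye


# Placeholder landmarks: point number n sits at (n, n) so the slices are easy to check.
landmarks = [(n, n) for n in range(1, 69)]
left_eye, right_eye = eye_points(landmarks)
print(left_eye)
print(right_eye)
```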
I love Physics. And if only I hadn’t developed the taste to live my professional life outside of university campus labs, I would have continued on the course. At some point in my life, however, I sensed I would enjoy the lifestyle that many software developers have. Naturally, Data Science seemed like the best of … Read more Can Data Science get inspiration from Physics?
Detecting regions of Neanderthal ancestry with Deep Learning. This is the seventh post of my column Deep Learning for Life Sciences where I give concrete examples of how Deep Learning can already be applied in Computational Biology, Genetics and Bioinformatics. In the previous posts, I demonstrated how to use Deep Learning for Ancient DNA, … Read more Deep Learning on Neanderthal Genes
A Framework & Package That Ease the Pain. Source: Thermo Fisher Scientific. Three weeks into my journey to become a data scientist and I’ve officially been baptized… by fire, that is! I chose to attend Flatiron’s Data Science 15-week bootcamp to transition out of finance. So far, the program has exceeded expectations (and my expectations were … Read more Missing Data?
R for Industrial Engineers. Exploring the “SixSigma” R package. Image by Lenny Kuhne available at Unsplash. Process capability analysis represents a significant component of the Measure phase from the DMAIC (Define, Measure, Analyze, Improve, Control) cycle during a Six Sigma project. This analysis measures how well a process's performance fits the customer’s requirements, which are translated … Read more Process Capability Analysis with R
Image adapted from shap.readthedocs.io For insight into Shapley values and the SHAP tool. Most other sources on these topics are explanations based on existing primary sources (e.g. academic papers and the SHAP documentation). This post is an attempt to gain some understanding through an empirical approach. To learn about an alternative approach to computing Shapley … Read more A new perspective on Shapley values: the Radical Shapley method
Understanding the effect of random states on model outcome. Tuning hyperparameters, performing the right kind of feature engineering, feature selection, etc. are all part of the data science flow for building your machine learning model. Hours are spent tweaking and modifying each part of the process to improve the outcome of our model. However, there … Read more Manipulating machine learning results with random state
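To show why a random state matters, the sketch below uses only the standard-library `random` module. `split_data` is a hypothetical helper written for this example; it is not sklearn's splitter, though the idea of seeding the shuffle is the same.

```python
import random


def split_data(items, test_fraction, seed):
    """Shuffle with a seeded generator, then cut off the first part as the test set."""
    rng = random.Random(seed)      # separate generator, fully determined by the seed
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


data = list(range(100))

train_a, test_a = split_data(data, 0.2, seed=0)
train_b, test_b = split_data(data, 0.2, seed=0)   # same seed: identical split
train_c, test_c = split_data(data, 0.2, seed=1)   # different seed: different split

print(test_a == test_b, test_a == test_c)
```

Since the model is evaluated on whichever rows land in the test set, changing only the seed can change the reported score even when nothing else in the pipeline changes.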
Now, let’s have a bird’s-eye view of the listings distribution across Kuala Lumpur. As you can see, most of the Airbnb listings are concentrated in the city center area of Kuala Lumpur. That’s why the prices of the listings are lower in the city area, as supply is much higher. Feel free to navigate the … Read more How to invest in the best Airbnb rental property
What if I ask you to think about data science work? Chances are that images of notebooks, data plots, algorithms and programming snippets pop into your mind. When talking about this sort of work very often we focus on tools and techniques useful for specific tasks ranging from things like classification or NLP, through cool … Read more From Business Needs to Data Science Tasks
Improving model sensitivity and accuracy by attaching attention gates on top of the standard U-Net Medical image segmentation has been actively studied to automate clinical analysis. Deep learning models generally require a large amount of data, but acquiring medical images is tedious and error-prone. Attention U-Net aims to automatically learn to focus on target structures … Read more Biomedical Image Segmentation: Attention U-Net
Michael Kratsios, CTO of the United States, and Regulating AI. An initiative by the Institute for Human-Centered Artificial Intelligence at Stanford University that I think is great and must be commended is the sharing of videos spanning the entirety of their fall conference on AI Ethics, Policy and Governance. The two long videos, each … Read more A Cleaner Regulatory Approach to AI?
Photo by Philipp Lublasser on Unsplash. With the 2020 Presidential election less than a year away, all the chatter on the news that’s not about impeachment has been about who’s going to be the Democratic Candidate to face President Trump. Information sources such as news outlets and polls exist, but to me personally, I … Read more Does YouTube’s Data Tell Us A Story About The 2020 Democratic Candidates?
With the map shapes prepped, I’m ready to put that pet data to use! Takeaway #3: Geopandas.plot() uses pretty much all the arguments you know and love from matplotlib.pyplot. First, I added the new intersection geometries determined in the last step to my GeoDataFrame. Note: Geopandas always plots the “geometry” column by default to … Read more Puppies & Python: Analyzing Geospatial Data
Businesses should be using data to optimize processes, increase efficiency, and accelerate innovation. Image Source: Unsplash. According to Gartner, in 2010 the number of large enterprises with a Chief Data Officer (CDO) was 15. By 2017, it was up to 4,000. In 2020, it’ll be over 10,000. Those numbers are remarkable, but does the sheer … Read more The Importance of Data-Intelligence for Businesses
Now, what if I wanted to compare James Harden’s forecast to those of Giannis Antetokounmpo, LeBron James, and Kawhi Leonard? To do this, I’ll take all of the pre-processing and put it in a single function, so I can pre-process the data without repeating lines of code. Below, I’ve supplied a gist that contains the required functions. … Read more The Fastest and Easiest Way to Forecast Data on Python
Transformation, oversight, governance, security and transparency are all paramount as the industry continues to evolve. Image Source: Pixabay. This piece will inform policies and actions for AI to be fully productive and beneficial, and enable your organization to build and deploy AI models that have integrity and transparency. At stake are business outcomes, and … Read more Five Steps to Ethical AI for Businesses
The convolution for RGB images is quite similar to the grey-scale case. Equation 4 can be adapted to RGB images by adding another loop to iterate over the RGB channels as follows: The additional loop over the variable 𝑐 allows the iteration over the RGB channels. As a result, the sum is done over three-dimensional … Read more Visualizing the Fundamentals of Convolutional Neural Networks
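The extra loop over the channel variable c can be sketched in plain Python as below. This is an illustration under stated assumptions, not the article's Equation 4: it assumes stride 1, no padding, and the CNN-style convention in which the kernel is not flipped (strictly a cross-correlation). The function name and the image[row][col][channel] layout are choices made for this sketch.

```python
def conv2d_rgb(image, kernel):
    """Slide a (kh x kw x channels) kernel over an (H x W x channels) image.

    Each output value is the sum over the kernel rows i, kernel columns j,
    and channels c of image[y + i][x + j][c] * kernel[i][j][c].
    """
    height, width = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    channels = len(kernel[0][0])
    output = []
    for y in range(height - kh + 1):
        row = []
        for x in range(width - kw + 1):
            total = 0.0
            for i in range(kh):
                for j in range(kw):
                    for c in range(channels):   # the additional loop over channels
                        total += image[y + i][x + j][c] * kernel[i][j][c]
            row.append(total)
        output.append(row)
    return output


# 3x3 image where every pixel is (R, G, B) = (1, 2, 3); 2x2 kernel of ones per channel.
image = [[(1, 2, 3) for _ in range(3)] for _ in range(3)]
kernel = [[(1, 1, 1) for _ in range(2)] for _ in range(2)]
feature_map = conv2d_rgb(image, kernel)
print(feature_map)
```

Each output value here sums four pixels whose channels add up to 6, giving 24, and the three channels collapse into a single two-dimensional 2x2 feature map.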
Exploratory Data Analysis of the bicyclist counts data set. The following table contains counts of bicyclists traveling over various NYC bridges. The counts were measured daily from 01 April 2017 to 31 October 2017. We’ll focus on the counts on the Brooklyn Bridge. Source: Bicycle Counts for East River Bridges (NYC OpenData) I have cut … Read more Fitting Linear Regression Models on Counts Based Data
Resolving a fundamental conflict between classical statistics and modern ML advice. Photo by Franki Chamaki on Unsplash. I recently came across a very interesting paper written at OpenAI on the topic of deep double descent. The paper touches on the very nature of training machine learning systems and model complexity. I hope to summarize the … Read more Deep Double Descent: when more data and bigger models are a bad thing
The year is coming to an end, and if you work in your company’s finance or accounting department, you must be busy preparing various annual financial statements. Especially for beginners who have just entered this field, how to design reports to clearly show the financial analysis and business operation status is … Read more 6 Data Analysis Methods to Help You Make Great Financial Statements
What I learned from analyzing 300K German online deals. Every year on Black Friday, people try to find the best deals hoping to save a ton of money. In the U.S. the Black Friday craziness has even led to 12 deaths and 117 injuries thus far. If you are like me and prefer to do … Read more Data Science and Black Friday: When, how and where to find the best deal?
The where() function will return elements from an array that satisfy a certain condition. Let’s explore it with an example. I’ll declare an array of grades of some sort (really arbitrary): You can now use where() to find, let’s say, all grades that are greater than 3: Note how it returns the index position. The … Read more Top 4 Numpy Functions You Don’t Know About (Probably)
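The excerpt's own code did not survive, so here is a minimal sketch of the behaviour it describes. The grade values are made up for this example and are not the article's array.

```python
import numpy as np

# Arbitrary, made-up grades for illustration.
grades = np.array([1, 3, 4, 2, 5, 5])

# With only a condition, np.where returns a tuple of index arrays,
# one array per dimension of the input.
positions = np.where(grades > 3)

# Indexing with that result recovers the matching values themselves.
high_grades = grades[positions]

print(positions[0], high_grades)
```

Note that the call returns positions, not values, which is why the extra indexing step is needed to get the grades.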
You should have a directory for every project, and a virtual environment for every directory. This structure does two important things: It keeps your stuff organized appropriately, which makes it easier to keep projects separate, manage dependencies, and keep out things that shouldn’t be there. (Who likes having to undo git commits?) It lets you … Read more Power up your Python Projects with Visual Studio Code
Data science is a very hands-on and practical field. Data science requires a solid foundation in mathematics and programming. As a data scientist, it is essential that you understand the theoretical and mathematical foundations of data science in order to be able to build reliable models with real-world applications. In data science and machine learning, … Read more Theoretical Foundations of Data Science— Should I Care or Simply Focus on Hands-on Skills?
Comparing models in a social media NLP challenge. Zen and the Art of Motorcycle Maintenance was one of my favorite books in college. Set amidst a father-son motorcycle journey across the United States, the book considers how to lead a meaningful life. Arguably, the key message expounded by the author, Robert Pirsig, is that we … Read more Zen and the Art of Model Optimization: Comparing models in a social media NLP challenge
Artificial Intelligence (AI) is a complex and evolving field. The first challenge an AI aspirant faces is understanding the landscape and how to navigate through it. Consider this: if you are travelling to a new city and you don’t have a map, you will have trouble navigating the city and you will … Read more How to Navigate Artificial Intelligence Landscape?
Exploring the Definition of Bias in AI. When ‘data’ is discussed in artificial intelligence, ‘bias’ is often mentioned alongside it. I will first have a general discussion of bias as it relates to subjectivity and objectivity. Afterwards, I will return to a summary of things to consider in relation … Read more Subjective and Objective in the Development of Artificial Intelligence
Mean Reversion with Bollinger Bands gone wrong. Photo by Rick Tap on Unsplash. A few years ago we did some work with a trading simulation. Our strategy was Mean Reversion with Bollinger Bands. Thankfully, it was only a simulation, as the losses from share variations and software bugs were horrific. In this article we offer … Read more Algorithmic trading
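For readers unfamiliar with the indicator, Bollinger Bands are commonly computed as a moving average plus and minus a multiple of the moving standard deviation. The sketch below uses the common defaults of a 20-period window and 2 standard deviations, and the population standard deviation; all of these are assumptions of this example, not details of the authors' simulation.

```python
from statistics import mean, pstdev


def bollinger_bands(prices, window=20, num_std=2.0):
    """Return a list of (lower, middle, upper) tuples, one per full window."""
    bands = []
    for end in range(window, len(prices) + 1):
        recent = prices[end - window:end]
        middle = mean(recent)                 # simple moving average
        spread = num_std * pstdev(recent)     # band half-width
        bands.append((middle - spread, middle, middle + spread))
    return bands


# Tiny made-up price series with a short window so the numbers are easy to follow.
prices = [1, 2, 3, 4, 5]
bands = bollinger_bands(prices, window=3, num_std=2.0)
print(bands[-1])
```

A mean-reversion strategy typically treats a price outside the bands as stretched and bets on a move back towards the middle band.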
And build a great mathematical foundation. Photo by Antoine Dautry on Unsplash. A while back I wrote an article on the top books to get started with data science: The Top 3 Books to Get Started with Data Science Right Now. One of the readers left a comment … Read more The Top 3 Books to Learn Math for Data Science Right Now
Why does management need to observe data-science-related KPIs? Binoculars Man, Pixabay. Observability for data-science (DS) is a new and emerging field, which is sometimes mentioned in tandem with MLOps or AIOps. New offerings are being developed by young startups to address the lack of monitoring and alerts for everything data-science. However, they are mostly addressing … Read more Data-Science Observability For Executives
Yes. It does. We get a 2x improvement in run time vs. just using the function as it is. So what exactly is happening here? Source: How increasing data size effects performances for Dask, Pandas and Swifter? Swifter chooses the best possible way to implement the apply for your function by either vectorizing your function … Read more Add this single word to make your Pandas Apply faster
In the current business climate, application modernization represents both a significant opportunity and a technological challenge. Image Source: Pixabay In today’s digital economy, organizations that run legacy applications run the risk of being disrupted and putting themselves in a profoundly uncompetitive position. Those with an agile environment, hosting core apps on the hybrid Cloud, are … Read more How and Why Businesses Should Modernize Applications
As usual, let’s first load the data. For this exercise, we will use the Boston housing data available in sklearn.datasets. Loading the Boston data and splitting it into training and testing dataframes. Now, to accomplish model averaging, we will use RMSE (root mean squared error) as the measure of model fitness. Then we will average them using the … Read more Model Averaging: A Robust Way to Deal with Model Uncertainty
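As a reminder of the fitness measure named above, here is a minimal plain-Python sketch of RMSE. The function name and the sample numbers are made up for illustration. The excerpt is cut off before it describes the averaging scheme, so this sketch covers only the error metric.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up targets and predictions: errors of 3 and 4 give sqrt((9 + 16) / 2).
error = rmse([0.0, 0.0], [3.0, 4.0])
print(error)
```

Lower RMSE means a better fit, and because the errors are squared before averaging, large misses are penalized more heavily than small ones.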
What you must put in place to achieve tangible success using machine learning and artificial intelligence technologies. Image Source: Pixabay. AI can provide a host of benefits for the enterprises and organizations of today, from understanding customer behavior to fraud detection, visualizing analytical sentiments, and predicting machine failure. Machine learning, if implemented efficiently, promises to … Read more 4 Ways to Successfully Scale AI and Machine Learning for Businesses