How I Got My Articles Featured on TDS

Here’s what worked for me, in case you’re interested. I’m about a year and a half into writing articles on Medium, and it has been a very fun journey. I get questions on LinkedIn about how I got published on Towards Data Science and also how … Read more

Cloud NAT explained!

How is Cloud NAT different from typical NAT proxies? Cloud NAT is a distributed, software-defined managed service, not based on proxy VMs or appliances. This proxyless architecture means higher scalability (no single choke point) and lower latency. Cloud NAT configures the Andromeda software that powers your Virtual Private Cloud (VPC) network so that it provides … Read more

Spotify API and Audio Features

One gal’s journey to make a playlist her mom can dance to. Links: Tableau Public Dashboard (only looks good on desktop, honestly), Jupyter Notebook. This is a follow-up to my previous post, Visualizing Spotify Data with Python and Tableau. Dashboard filtered by “Danceability”: just out of view in the bottom right chart is Fergalicious taking the … Read more

From Prototype to Production for Data Scientists

Bringing ideas and models to production for data scientists. You might be thinking “yeah, I know what a prototype is, what else is there to know?” You might be right, but do you know some of the different methodologies and terminology? These reduce friction when you’re trying to implement something new. Do you know the … Read more

AWS Data Exchange now supports automatic exports of third-party data updates

AWS Data Exchange subscribers can now use auto-export to automatically copy newly published revisions from their third-party data subscriptions to an Amazon S3 bucket of their choice in just a few clicks. With auto-export, subscribers no longer have to manually export new revisions or dedicate engineering resources to build ingestion pipelines that export new … Read more

People and planet AI: How to build a Time Series Model to classify fishing activities in the sea

One tricky part is that the MMSI data location signal (which includes a timestamp, latitude, longitude, distance from port, and more) is not emitted at regular intervals. AIS broadcast frequency changes with vessel speed (faster at higher speeds), and not all AIS messages that are broadcast are received – terrestrial receivers require line-of-sight, satellites must … Read more

Introduction to Artificial Intelligence, Machine Learning, and Deep Learning with Tensorflow

This is the first part of a long series I hope to do on advanced Machine Learning. TensorFlow recently came out with 2.x, and its integration with Keras makes it a really easy-to-use and functional framework to learn. At this point, TensorFlow and PyTorch are pretty comparable, so learning either will serve you really well … Read more

Building a Machine Learning Model for Lead Decisions using the Tree Ensemble Learner in KNIME

Practical guide for building a classification-based predictive model for Lead Decisions in the KNIME Platform. Having had the opportunity to use Machine Learning algorithms in the Lead Generation field, I was able to design and implement various lead scoring and lead decision predictive models. Previously, while covering the technical part of … Read more

The solution of the Heat equation

Two methods: separation of variables and Fourier transform. The heat equation is one of the most famous partial differential equations. It has great importance not only in physics but also in many other fields. Sometimes a seemingly unsolvable partial differential equation can be … Read more

Cities, Maps, Dashboards

Humans have been recording and analysing data for centuries. Writing, for instance, was developed in ancient Mesopotamia around 3100 BC because bureaucrats needed an efficient tool to record and track citizen information. Since the Babylonian Empire, governments have held censuses to gather huge datasets on their citizenry, livestock and resources for taxation purposes. Putting available resources … Read more

error: JAVA_HOME cannot be determined from the Registry

[This article was first published on Methods – finnstats, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here) Want to share your content on R-bloggers? click here if you have a blog, or here if you don’t. error: JAVA_HOME cannot be determined from the Registry, This error notice … Read more

How NOT to Analyze Time Series

One of the most common time series data mistakes I see junior data scientists and interview candidates make is to assume that the data has regular ticks and no gaps. This is a bad assumption. Take for example an interview exercise where I provide candidates with a dataset … Read more
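The regular-tick assumption mentioned in this excerpt can be checked before any analysis starts. The following is a minimal sketch, not the article's own code: the function name `find_gaps` and the sample timestamps are hypothetical, and it assumes the expected spacing between observations is known in advance.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, expected_step):
    """Return (previous, current) pairs spaced further apart than expected_step."""
    ordered = sorted(timestamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > expected_step]


# Hypothetical minute-level ticks; the observations for 09:03 and 09:04 are missing.
ticks = [datetime(2021, 9, 1, 9, minute) for minute in (0, 1, 2, 5, 6)]

gaps = find_gaps(ticks, timedelta(minutes=1))
for start, end in gaps:
    print(f"gap between {start:%H:%M} and {end:%H:%M}")
```

A check like this only reports where the series is irregular; how to fill or model the gaps depends on the data and is left open here.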

Competition to win free training closes today

To celebrate EARL taking place this September we launched a competition to win a free online training course and the competition closes today at midnight – Submit your competition entry here. You have the chance to win a 2-day training course for you and up to 9 other attendees from your company. The winner can … Read more

Advancing reliability through a resilient cloud supply chain

“My Advancing Reliability series has explored many initiatives we’ve put in place to improve the reliability of the Azure platform. Today, I’d like to explore our cloud supply chain, as a resilient hardware infrastructure is critical to providing reliable capacity to our customers. Our supply chain is hard at work to deliver that infrastructure … Read more

Streamline your DDoS management with new Azure Firewall Manager capabilities

This post was co-authored by Alethea Toh, Program Manager, Azure Networking. As customers continue to adopt a Zero Trust security approach in their digital transformation, they often prefer a way to manage their network security policies and resources in one central place. Today, we are announcing that Azure Firewall Manager now supports managing Azure DDoS … Read more

Solving Einstein’s Puzzle with Constraint Programming

The following puzzle is a well-known meme on social networks. It is said to have been invented by the young Einstein, and back in the day I was ambitious enough to solve it by hand (you should try too!). Yet it is even simpler to use Constraint Programming (CP). An excellent choice for doing that is MiniZinc, … Read more

Towards Removing Gender Bias in Writing

Learn how to understand the amount of gender bias in your daily consumption of literature and avoid it in your own writing. Why gender bias? I grew up in a culture in which gender bias was pervasive. As a writer, I found this concerning and tried to mitigate the … Read more

Speeding up data analysis with TimescaleDB and PostgreSQL

Common pain points while evaluating, cleaning, and transforming data, and how PostgreSQL and TimescaleDB helped me fix them. Contents: Common data analysis tools and “the problem”; Data analysis issue #1: storing and accessing data; Data analysis issue #2: maximizing analysis speed and computation efficiency (the bigger the dataset, the … Read more

An Intro to Machine Learning for Biomedical Scientists- part II: Hands-on Tutorial

An intro to ML for biomedical scientists: hands-on coding tutorial. This is part II of the post An Intro to Machine Learning for Biomedical Scientists, which introduces the concept of Machine Learning (ML) for biomedical scientists. In this post, we will get to a hands-on coding tutorial using biomedical data. You can … Read more

rOpenSci News Digest, September 2021

Dear rOpenSci friends, it’s time for our monthly news roundup! You can read this post on our blog. Now let’s dive into the activity at and around rOpenSci! rOpenSci HQ: A first package was submitted to rOpenSci Statistical Software Peer Review, two months after its opening: the tsbox package by Christoph Sax. We are … Read more

Optimal disclosure risk assessment

[This article was first published on YoungStatS, and kindly contributed to R-bloggers]. Protection against disclosure is a legal and ethical obligation for agencies releasing microdata … Read more

Using bootstrapped sampling to assess variability in score predictions

[This article was first published on R’tichoke, and kindly contributed to R-bloggers]. When building risk scorecards, apart from the variety of performance metrics, analysts also … Read more

Advances in Difference-in-Differences in Econometrics

[This article was first published on YoungStatS, and kindly contributed to R-bloggers]. The eighth “One World webinar” organized by YoungStatS … Read more

An RStudio Table Contest for 2021

[This article was first published on RStudio Blog, and kindly contributed to R-bloggers]. We love tables here at RStudio. They serve as a fantastic means … Read more

Advancements in Symbolic Data Analysis

[This article was first published on YoungStatS, and kindly contributed to R-bloggers]. The sixth “One World webinar” organized by YoungStatS … Read more

Integrating Scikit-learn Machine Learning models into the Microsoft .NET

Using the ONNX format for deploying a trained Scikit-learn Lead Scoring predictive model into the .NET ecosystem. While being part of a team working on designing and developing a lead scoring system prototype, I faced the challenge of integrating machine learning models into the target environment built around the … Read more

F1 Strategy Analysis

[This article was first published on Sport Data Science, and kindly contributed to R-bloggers]. I was recently browsing reddit and found this AMA from a … Read more

Basics of Markov Chain Monte Carlo Algorithms

The aim of this article is to give a conceptual understanding of Markov Chains and why we use them. Introduction: Markov Chain Monte Carlo is a group of algorithms used to map out the posterior distribution by drawing samples from it. The reason we use this method instead of the quadratic approximation method is … Read more
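As a rough illustration of "mapping out a distribution by sampling from it", here is a minimal random-walk Metropolis sketch, one of the simplest algorithms in the MCMC family. It is not taken from the article: the function name `metropolis`, the step size, and the standard normal target (standing in for a real posterior) are all assumptions made for the example.

```python
import math
import random


def metropolis(log_density, start, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis sampler for a one-dimensional target."""
    rng = random.Random(seed)
    current = start
    current_logp = log_density(current)
    samples = []
    for _ in range(n_samples):
        # Propose a nearby point by adding Gaussian noise.
        proposal = current + rng.gauss(0.0, step)
        proposal_logp = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(current)).
        accept_prob = math.exp(min(0.0, proposal_logp - current_logp))
        if rng.random() < accept_prob:
            current, current_logp = proposal, proposal_logp
        samples.append(current)
    return samples


def log_standard_normal(x):
    """Unnormalised log density of N(0, 1); a stand-in for a posterior."""
    return -0.5 * x * x


samples = metropolis(log_standard_normal, start=3.0, n_samples=20000)
kept = samples[2000:]  # discard early draws influenced by the starting point
mean = sum(kept) / len(kept)
variance = sum((x - mean) ** 2 for x in kept) / len(kept)
print(f"mean ~ {mean:.2f}, variance ~ {variance:.2f}")
```

Only the unnormalised log density is needed, which is why this family of methods suits posteriors whose normalising constant is hard to compute.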

Dimensionality Reduction cheatsheet

Dimensionality reduction algorithms represent techniques that reduce the number of features (not samples) in a dataset. In the example below the task is to reduce the number of input features (unroll swissroll from 3D to 2D) and save the largest ratio of information at the same time. This is the essence of the dimensionality reduction … Read more

Data Visualization in 2021 | An Overview of Dashboarding Technology in the Age of Big Data

Turning insights into impact, and impressing your boss along the way. TL;DR: If your work is taking place in a Notebook, begin with a scheduled hosted drag and drop Notebook Dashboard like Deepnote, and then if you need distributed computing levels of scale move to Python + Streamlit or … Read more

How to Cut Your AWS ECS Costs with Fargate Spot and Prefect

The entire setup demonstrated below is available in this GitHub Gist as a bash script that you can adjust and run as: bash prefect_ecs_agent_deploy_script.bash. Before you run it, you need to replace AWS_ACCOUNT_ID with your account ID, and set other variables, as described in the code comments. In the following sections, we’ll walk through all … Read more

3 Cool Features of Python Altair

It is more than a data visualization library. Data visualization is an integral part of data science. It expedites many tasks such as exploring data, delivering results, storytelling, and so on. Thankfully, there are great data visualization libraries for Python. Altair is a declarative statistical visualization library for Python. … Read more

Meeting Women Where They Are

Using NYC Subway Turnstile Data to Increase Representation in Tech. Based on a prompt from the data science bootcamp METIS, I was recently challenged to help a Women in Tech organization increase exposure and reach by placing their staff teams at NYC subway stations to collect emails prior to their upcoming gala. Which stations to … Read more

Tricky Way of Using Dimensionality Reduction For Outlier Detection in Python

Before we move on, let’s establish a baseline performance. First, we will fit a CatBoostClassifier for a benchmark. We have got a ROC AUC score of 0.784. Now, let’s fit an Isolation Forest estimator to the data after imputing missing data. Even though powerful, Isolation Forest has only a few parameters to tune. The most … Read more

Hyperparameter Tuning with Grid Search and Random Search

Hyperparameters are parameters that are defined before training to specify how we want model training to happen. We have full control over hyperparameter settings, and through them we control the learning process. For example, in a random forest model, n_estimators (the number of decision trees we want to have) is a hyperparameter. It can be … Read more
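The difference between the two search strategies named in the title can be sketched without any machine learning library. In the example below, `validation_score` is a hypothetical toy function standing in for a real cross-validated score, and the `max_depth` hyperparameter, the grid values and the sampling ranges are assumptions chosen for illustration only.

```python
import itertools
import random


def validation_score(n_estimators, max_depth):
    """Toy stand-in for a cross-validated score; it peaks at (200, 8)."""
    return 1.0 - ((n_estimators - 200) / 500) ** 2 - ((max_depth - 8) / 20) ** 2


def grid_search(grid):
    """Evaluate every combination of the listed values and keep the best."""
    names = list(grid)
    candidates = (
        dict(zip(names, values)) for values in itertools.product(*grid.values())
    )
    return max(candidates, key=lambda params: validation_score(**params))


def random_search(n_iter, seed=0):
    """Evaluate n_iter randomly drawn combinations and keep the best."""
    rng = random.Random(seed)
    candidates = [
        {"n_estimators": rng.randint(50, 400), "max_depth": rng.randint(2, 16)}
        for _ in range(n_iter)
    ]
    return max(candidates, key=lambda params: validation_score(**params))


grid = {"n_estimators": [50, 100, 200, 400], "max_depth": [4, 8, 16]}
best_grid = grid_search(grid)           # 4 * 3 = 12 evaluations
best_random = random_search(n_iter=12)  # same budget, random points
print("grid search:  ", best_grid)
print("random search:", best_random)
```

Grid search only ever tries the listed values, while random search spends the same budget on points drawn from the whole range, so it can land on values the grid never contains.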

Everything you need to know about “activation functions” for deep learning models

We know that deep learning models are a combination of different components such as activation functions, batch normalization, momentum, gradient descent, etc. So for this blog I picked one part of DL and give a detailed explanation of activation functions by answering the following questions: What is an activation function? Why do we need … Read more
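To make the first question in this excerpt concrete: an activation function is a function applied to a neuron's weighted input to produce its output. The sketch below shows three widely used ones in plain Python; the choice of these three is an assumption, since the excerpt does not say which functions the article covers.

```python
import math


def sigmoid(x):
    """Squash x into (0, 1), written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def tanh(x):
    """Squash x into (-1, 1)."""
    return math.tanh(x)


def relu(x):
    """Pass positive values through unchanged and clip negatives to zero."""
    return max(0.0, x)


for value in (-2.0, 0.0, 2.0):
    print(f"x={value:+.1f}  sigmoid={sigmoid(value):.3f}  "
          f"tanh={tanh(value):+.3f}  relu={relu(value):.1f}")
```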

Announcing general availability of Amazon RDS for MySQL and Amazon Aurora MySQL databases as new data sources for federated querying

With the Amazon Redshift federated query capability, many customers have been able to combine live data from operational databases with the data in their Amazon Redshift data warehouse and Amazon S3 data lake environment in order to get a unified analytics view across all the data in the enterprise. Now Amazon Redshift federated query support … Read more

Build and run a Discord bot on top of Google Cloud

Stuck at home these past (checks calendar) 732 months, I’ve been spending a lot more time on Discord (an online voice, video and text communications service) than I ever thought I would. Chatting with far-flung friends, playing games, exploring, finding community as I am able, and generally learning about a platform I had not used all that … Read more

Uncovering the Potential of Materials Data using Matminer and Pymatgen

Steps to decipher the chemistry of materials with Python-based materials libraries. Introduction: The usage of different types of devices is an integral part of our everyday life. And these devices are made of myriad combinations of elements forming materials for daily applications. … Read more

Implementing Generative Adversarial Networks (GANs) for Increasing a Convolutional Neural Network’s…

Two experiments were conducted to analyze how a model’s performance can be affected by using a GAN for image data augmentation. Experiment 1: Training the CNN using the two different sized datasets with no augmented data. [Figure: Accuracy and loss of the two different models with no augmented data] While better models can be created … Read more

Introduction to Machine Learning with TensorFlow

[This article was first published on Methods – finnstats, and kindly contributed to R-bloggers]. Introduction to machine learning with TensorFlow! What is TensorFlow? The Google … Read more

The Eclat algorithm

In this article, you will learn everything that you need to know about the Eclat algorithm. Eclat stands for Equivalence Class Clustering and Bottom-Up Lattice Traversal, and it is an algorithm for association rule mining (which also encompasses frequent itemset mining). Association rule mining and frequent itemset mining are easiest to understand in their applications … Read more
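A compact way to see what Eclat does is to look at its vertical data layout: each item maps to the set of transaction ids that contain it, and the support of a larger itemset is the size of the intersection of those sets. The sketch below follows that idea with a depth-first search; it is an illustrative implementation under that assumption, not the article's code, and the shopping baskets are made up.

```python
def eclat(transactions, min_support):
    """Return {itemset (tuple): support count} for all frequent itemsets."""
    # Vertical layout: item -> set of ids of the transactions containing it.
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)

    frequent = {}

    def extend(prefix, candidates):
        # candidates holds (item, tidset) pairs that are frequent with prefix.
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix + (item,)
            frequent[itemset] = len(tids)
            # Intersect with the remaining items to grow the itemset by one.
            longer = []
            for other, other_tids in candidates[i + 1:]:
                common = tids & other_tids
                if len(common) >= min_support:
                    longer.append((other, common))
            if longer:
                extend(itemset, longer)

    start = sorted(
        ((item, tids) for item, tids in tidsets.items() if len(tids) >= min_support),
        key=lambda pair: pair[0],
    )
    extend((), start)
    return frequent


# Hypothetical shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
result = eclat(baskets, min_support=2)
for itemset, support in sorted(result.items()):
    print(itemset, support)
```

Because support is computed by set intersection, the transactions are read only once, when the vertical layout is built.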

Decorators in R

Decorators have been made quite popular by Python, but did you know they also exist in R? Decorators are typically used to extend the behaviour of a function in an elegant and minimally invasive way. Graphically, we can think of a decorator as: [Figure: An analogy of a decorator] The original function … Read more
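For readers who have not met the Python form that this excerpt refers to, here is a minimal sketch of a decorator that extends a function without touching its body. It is written in Python rather than R, so it does not show the article's R approach; the names `log_calls` and `add` are hypothetical.

```python
import functools


def log_calls(func):
    """Wrap func so that every call and its result are printed."""

    @functools.wraps(func)  # keep the original function's name and docstring
    def wrapper(*args, **kwargs):
        print(f"calling {func.__name__}")
        result = func(*args, **kwargs)
        print(f"{func.__name__} returned {result!r}")
        return result

    return wrapper


@log_calls
def add(a, b):
    """Add two numbers."""
    return a + b


total = add(2, 3)
```

The original `add` is unchanged; the logging behaviour lives entirely in the wrapper, which is what makes the pattern minimally invasive.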
