How we Created an Open-Source COVID-19 Chatbot

As I just mentioned, we calculate the (cosine of the) angle between these dots (encodings) to compare how semantically similar the sentences are. Since each encoding already has length 1, we only need to calculate the inner product. The inner product calculates the cosine of the angle between the red and the blue dot, resulting … Read more How we Created an Open-Source COVID-19 Chatbot
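The comparison described in this excerpt can be sketched in a few lines of plain Python. This is a minimal illustration, not the chatbot's actual code: the two-dimensional vectors and the helper names `normalize` and `inner_product` are assumptions made for the example, and real sentence encodings have far more dimensions.

```python
import math

def normalize(vec):
    """Scale a vector to length 1 (hypothetical helper for this sketch)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def inner_product(a, b):
    """Sum of element-wise products of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

# Toy stand-ins for the "red" and "blue" encodings.
red = normalize([3.0, 4.0])
blue = normalize([4.0, 3.0])

# For unit-length vectors the inner product equals the cosine of the angle.
similarity = inner_product(red, blue)
print(round(similarity, 2))  # 0.96
```

Because both vectors are normalized first, no division by their lengths is needed at comparison time, which is the shortcut the excerpt points out.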

Solutions and guidance to help content producers and creators work remotely

The global health pandemic has impacted every organization on the planet, no matter the size, along with its employees and the customers it serves. Social distancing and shelter-in-place orders have disrupted virtually every industry and form of business. The Media & Entertainment (M&E) industry is no exception. Most physical productions have been shut down … Read more Solutions and guidance to help content producers and creators work remotely

A simple guide to knowing your neural network and activation.

Analyzing which function seems to fit. Photo by Steven Wright on Unsplash. To better understand the intents and purposes of activation functions, let us analyze an equivalent model within ourselves: neurons (the inspiration for neural networks in the first place). A biological neural network simply consists of a cell body, dendrites (inputs from … Read more A simple guide to knowing your neural network and activation.

tf.data: Creating data input pipelines

Are you not able to load your NumPy data into memory? Does your model have to wait for data to be loaded after each epoch? Is your Keras DataGenerator slow? The TensorFlow tf.data API allows building complex input pipelines. It easily handles a large amount of data and can read different formats of data while allowing complex data … Read more tf.data: Creating data input pipelines

Will we ever solve the Shortage of Data in Medical Applications?

Medical data is a fundamental requirement for new deep learning applications in medicine. Photo by Pixabay from Pexels. In the age of deep learning, data became an important resource to build powerful smart systems. In several fields, we already see that the amount of data that is required to build competitive systems is so large … Read more Will we ever solve the Shortage of Data in Medical Applications?

Document search with fragment embeddings

COVID-19 questions: a use case for improving sentence fragment search. Figure 1. Illustrates embeddings-driven fragment search used to answer specific questions (left panel) as well as broader questions (right panel). The highlighted text fragments in yellow are document matches to search input obtained using BERT embeddings. The right panel is a sample of animals with … Read more Document search with fragment embeddings

Memoization in Python

Memoization is a term introduced by Donald Michie in 1968, which comes from the Latin word memorandum (to be remembered). Memoization is a method used in computer science to speed up calculations by storing (remembering) past calculations. If repeated function calls are made with the same parameters, we can store the previous values instead of … Read more Memoization in Python
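As a minimal sketch of the idea in this excerpt, the decorator below stores results in a dictionary keyed by the call arguments. The names `memoize` and `fib` are illustrative choices for this example, not taken from the article.

```python
from functools import wraps

def memoize(func):
    """Cache results of func, keyed by its positional arguments."""
    cache = {}

    @wraps(func)
    def wrapper(*args):
        if args not in cache:
            # First call with these arguments: compute and remember.
            cache[args] = func(*args)
        return cache[args]

    return wrapper

@memoize
def fib(n):
    """Naive recursive Fibonacci, made fast by the cache."""
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

result = fib(50)
print(result)  # 12586269025
```

The standard library offers the same behavior through `functools.lru_cache`, which is usually preferable to a hand-written cache outside of teaching examples.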

Small motivation and easy start to learn SQL using Pandas and pandasql

From giphy.com: (above) the data scientist image in my head, (below) me trying to be one on my good days. Not because you must, but because it makes your life easier. When I started to learn data science on my own, there were so many things that I was asked to learn. I started with Python coding and … Read more Small motivation and easy start to learn SQL using Pandas and pandasql

What about Flattening the Infodemic Curve?

Leveraging ethical AI and human-centric product design to treat the chronic disease of the digital economy. Photo by Elijah O’Donnell on Unsplash. Intertwined with the rapidly unfolding Coronavirus epidemic is an insidious infodemic which may prove no less deadly. In February 2020, the director-general of the World Health Organization, Tedros Adhanom Ghebreyesus, made first coined … Read more What about Flattening the Infodemic Curve?

5 Soft Skills You Need As A Machine Learning Engineer (And Why)

Photo by Kevin Ku on Unsplash Time management is the process of delegating a defined amount of time to a specific task to achieve a defined measurement of success. A by-product of successful time management is efficient task completion and an increase in productivity. Although primarily an ML Engineer is expected to implement machine learning … Read more 5 Soft Skills You Need As A Machine Learning Engineer (And Why)

How To Model Time Series Data With Linear Regression

We all learnt linear regression in school, and the concept of linear regression seems quite simple. Given a scatter plot of the dependent variable y versus the independent variable x, we can find a line that fits the data well. But wait a moment, how can we measure whether a line fits the data well … Read more How To Model Time Series Data With Linear Regression
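The excerpt breaks off before saying how fit is measured, so the following is an assumption about one common answer rather than the article's own method: ordinary least squares, which picks the line with the smallest sum of squared residuals. The data points and function names are made up for the illustration.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def sum_squared_residuals(xs, ys, slope, intercept):
    """Measure of misfit: total squared vertical distance to the line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

# Hypothetical noisy observations around y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 2.9, 5.2, 6.8]

slope, intercept = fit_line(xs, ys)
best = sum_squared_residuals(xs, ys, slope, intercept)
other = sum_squared_residuals(xs, ys, 1.5, 2.0)  # an arbitrary competing line
print(slope, intercept, best < other)
```

Any other line, such as the arbitrary one above, yields a larger sum of squared residuals on the same data, which is what "fits the data well" means under this criterion.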

Find and plot your optimal path using Plotly and NetworkX in Python

Many libraries can be used to plot a path using Google Maps API, but this leads to reduced flexibility. Also, if you use a set of lines to draw a path then, for lack of a better word, it doesn’t look good. Let me give you an example (generated using Plotly). Also, on many occasions, … Read more Find and plot your optimal path using Plotly and NetworkX in Python

Build A Keyword Extraction API with Spacy, Flask, and FuzzyWuzzy

Often when dealing with long sequences of text you’ll want to break those sequences up and extract individual keywords to perform a search, or query a database. If the input text is natural language you most likely don’t want to query your database with every single word — instead, you probably want to choose a … Read more Build A Keyword Extraction API with Spacy, Flask, and FuzzyWuzzy

Scraping GDPR Fines

The website Privacy Affairs keeps a list of fines related to GDPR. I heard* that this might be an interesting dataset for TidyTuesdays. The dataset contains at this moment 250 fines given out for GDPR violations and is last updated (according to the website) on 31 March 2020. All data is from official government sources, such … Read more Scraping GDPR Fines

capitals: Knowledge Quiz Question about Capitals around the World

[This article was first published on R/exams, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here) Want to share your content on R-bloggers? click here if you have a blog, or here if you don’t. Exercise template for a multiple-choice knowledge quiz question with flexible shuffling of the … Read more capitals: Knowledge Quiz Question about Capitals around the World

Community detection of the countries of the world with Neo4j Graph Data Science

Network analysis containing feature reduction techniques, similarity network inference, and community detection with the Neo4j Graph Data Science library While I was waiting for my sourdough to rise, I thought, the best way to spend my time is to perform a network analysis with the Neo4j Graph data science library. Well, maybe also try to … Read more Community detection of the countries of the world with Neo4j Graph Data Science

Embracing Bayesian A/B Test Measurement (and ditching P values)

We’ll start with a traditional measurement approach. We fit a logistic regression model to our data to see if Group (our predictor) has a statistically significant relationship to Conversion (our outcome). Our null hypothesis is that there is no relationship between Group and Conversion, and we’ll use a standard P-value threshold of 0.05 to either … Read more Embracing Bayesian A/B Test Measurement (and ditching P values)

Dealing with Growing Impatience? Push your Real-Time ML Services to Production on AWS!

Building the right cloud infrastructure for your ML application is usually underrated. Here is a glimpse of how easy it can get thanks to widely democratized cloud provisioning. The number of times I have recently seen research groups, students or coworkers come up with a model achieving great performance has been astonishing. (And I’m not … Read more Dealing with Growing Impatience? Push your Real-Time ML Services to Production on AWS!

Identify your Data’s Distribution

Is your distribution’s assumption correct? Let’s find out. Photo by Luke Cheeser on Unsplash. Every day we come across a variety of data like sensor data, sales data, customer data, traffic data, etc. Further, depending on the use case, we do a variety of processing and try out several algorithms on it. Have you … Read more Identify your Data’s Distribution

5 AI Pitfalls for Business & How to Avoid Them

If things go wrong with an AI project, the best case is usually wasted investment or a missed opportunity, such as failing to keep up with competitors. In worse cases, AI may inflict damage to some aspect of your business. Casualties can include sales growth, customer satisfaction, brand or operational efficiency. Business members of the AI … Read more 5 AI Pitfalls for Business & How to Avoid Them

Last month today: March in Google Cloud

While many of us had plans for March, including simply carrying out our normal routines, life as we know it has been upended by the global coronavirus pandemic. In a time of social distancing, technology has played a greater role in bringing us together. Here’s a look at stories from March that explored how cloud technology is … Read more Last month today: March in Google Cloud

How to use AWS Lambda and CloudWatch for beginners

Let’s build a simple serverless workflow using AWS services! I found a cool website (https://covid19api.com/) where we can easily access COVID-19 data using a free API. This gave me the idea to create a simple function that grabs the data using AWS Lambda and saves it to S3. The script will be executed automatically every day using CloudWatch. … Read more How to use AWS Lambda and CloudWatch for beginners

How to Create a Data Science Portfolio — by a Data Scientist

Good day reader, I hope everyone is staying safe and washing their hands. It’s really during times like these that mental and physical health is extremely important to keep us going. As I mentioned in my previous article, data science does not crash with the economy. The data industry is still in demand, some would even argue it … Read more How to Create a Data Science Portfolio — by a Data Scientist

Amazon Managed Cassandra Service (preview) now helps you coordinate increments and decrements to column values by using counters

Counters make it easier to coordinate increments and decrements to column values in distributed systems, or in scenarios where increments happen rapidly. For example, you can use counters to track the number of entries in a log file or the number of times a post has been viewed on a social network. Using counters ensures … Read more Amazon Managed Cassandra Service (preview) now helps you coordinate increments and decrements to column values by using counters

Predictive Maintenance: Zero to Deployment in Manufacturing

[This article was first published on R – Hi! I am Nagdev, and kindly contributed to R-bloggers]. Predictive maintenance has been seen as a holy … Read more Predictive Maintenance: Zero to Deployment in Manufacturing

Building a Career in Data Science with Emily Robinson

Editor’s note: The Towards Data Science podcast’s “Climbing the Data Science Ladder” series is hosted by Jeremie Harris. Jeremie helps run a data science mentorship startup called SharpestMinds. You can listen to the podcast below: Data science exists at the intersection of a number of genuinely technical topics, from statistics to programming to machine learning. … Read more Building a Career in Data Science with Emily Robinson

Fight COVID-19 with machine learning

9 ways machine learning helps us fight the viral pandemic. Viral pandemics are a serious threat. COVID-19 is not the first, and it won’t be the last. But, like never before, we are collecting and sharing what we learn about the virus. Hundreds of research teams around the world are combining their efforts to collect … Read more Fight COVID-19 with machine learning

How to ULTRALEARN Data Science

Supercharge your data science learning journey I just finished Ultralearning by Scott Young, and I thought that the concepts in this book could help many people who are looking to learn data science. Scott used this approach to learn the entire MIT undergrad computer science coursework in a single year (it usually takes four) and … Read more How to ULTRALEARN Data Science

R Consortium Member Esri Empowers Informed Decision-Making Around COVID-19

[This article was first published on R Consortium, and kindly contributed to R-bloggers]. Esri, international supplier of geographic information system software, web GIS and geodatabase … Read more R Consortium Member Esri Empowers Informed Decision-Making Around COVID-19

F is for filter

[This article was first published on Deeply Trivial, and kindly contributed to R-bloggers]. For the letter F – filters! Filters are incredibly useful, especially when … Read more F is for filter

Italian covid-19 Analysis with python

Photo by Gerd Altmann from Pixabay. This tutorial analyses data about COVID-19 released by the Italian Protezione Civile and builds a predictor for the end of the epidemic. The general concepts behind this predictor are described in the following article: https://medium.com/@angelica.loduca/predicting-the-end-of-the-coronavirus-epidemics-in-italy-8da9811f7740. The code can be downloaded from my github repository: https://github.com/alod83/data-science/tree/master/DataAnalysis/covid-19. The main objective of … Read more Italian covid-19 Analysis with python

Infectious Disease Modelling, Part I: Understanding the models that are used to model Coronavirus

This series is not meant to quickly show you some plots with lots of colorful curves that are supposed to convince you that my model can perfectly predict coronavirus cases to a tee all over the world; rather, I’ll explain all the background necessary for you to understand these models, form your own opinion of … Read more Infectious Disease Modelling, Part I: Understanding the models that are used to model Coronavirus

Introducing incremental enrichment in Azure Cognitive Search

Incremental enrichment is a new feature of Azure Cognitive Search that brings a declarative approach to indexing your data. When incremental enrichment is turned on, document enrichment is performed at the least cost, even as your skills continue to evolve. Indexers in Azure Cognitive Search add documents to your search index from a data source. … Read more Introducing incremental enrichment in Azure Cognitive Search

COVID-19 in the US: Back-of-the-Envelope Calculation of Actual Infections and Future Deaths

One of the biggest problems of the COVID-19 pandemic is that there are no reliable numbers of infections. This fact renders many model projections next to useless. If you want to get to know a simple method to roughly estimate the real number of infections and expected deaths in the US, read on! As … Read more COVID-19 in the US: Back-of-the-Envelope Calculation of Actual Infections and Future Deaths

Tutorial: ggplot2 Heatmaps and Traffic Deaths in Thailand

Photo by Dan Freeman on Unsplash So in this tutorial, we’ll be making a heatmap of the most dangerous countries to drive in, as measured by the number of traffic deaths per 100,000 residents. We’ll use R and ggplot2 to visualize our results. Is It Really That Dangerous To Drive in Thailand? It’s often said … Read more Tutorial: ggplot2 Heatmaps and Traffic Deaths in Thailand

ImportError: No module named ‘XYZ’

The Inspection. The thing to check is which Python the Jupyter Notebook is using. So type the following command in the Jupyter notebook to print the module search paths: import sys; sys.path. Here is what I got: '/Users/yufeng/anaconda3/envs/py33/lib/python36.zip', '/Users/yufeng/anaconda3/envs/py33/lib/python3.6', '/Users/yufeng/anaconda3/envs/py33/lib/python3.6/lib-dynload', '/Users/yufeng/anaconda3/envs/py33/lib/python3.6/site-packages', '/Users/yufeng/anaconda3/envs/py33/lib/python3.6/site-packages/aeosa', '/Users/yufeng/anaconda3/envs/py33/lib/python3.6/site-packages/IPython/extensions', '/Users/yufeng/.ipython'. However, if I type the same command in the system’s Python, here is what I got: '/Users/yufeng/anaconda3/lib/python37.zip', '/Users/yufeng/anaconda3/lib/python3.7', … Read more ImportError: No module named ‘XYZ’
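The inspection described in this excerpt can be written as a small script that runs unchanged in a notebook cell or a terminal session. The helper name `describe_interpreter` is an assumption for this sketch. Printing `sys.executable` alongside `sys.path` is an addition here, since it shows directly which interpreter binary is running.

```python
import sys

def describe_interpreter():
    """Return the running interpreter's executable and module search path."""
    return {"executable": sys.executable, "search_path": list(sys.path)}

info = describe_interpreter()
print("Interpreter:", info["executable"])
for entry in info["search_path"]:
    # Each entry is a location Python searches when resolving an import.
    print("  ", entry)
```

Running it in both environments and comparing the output shows whether the notebook kernel and the system Python look for packages in different places.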

Algorithmic Complexity

Community finding algorithms in real networks. Zachary’s karate club network with 2 communities identified. Due to the size of real networks, it is sometimes infeasible to use brute-force algorithms to define communities. Algorithms used to handle these problems, in the best-case scenario, run in polynomial time. However, most of the time, it is necessary to … Read more Algorithmic Complexity

Reproducible Machine Learning

A step towards making ML research open and accessible Photo credit: geralt via Pixabay The NeurIPS (Neural Information Processing Systems) 2019 conference marked the third year of their annual reproducibility challenge and the first time with a reproducibility chair in their program committee. So, what is reproducibility in machine learning? Reproducibility is the ability to … Read more Reproducible Machine Learning

registration open for online NIMBLE short course, June 3-5, 2020

[This article was first published on R – NIMBLE, and kindly contributed to R-bloggers]. Registration is now open for a three-day online training workshop on … Read more registration open for online NIMBLE short course, June 3-5, 2020

Retries in API packages and reinventing the wheel

[This article was first published on Posts on R-hub blog, and kindly contributed to R-bloggers]. Web APIs can sometimes fail for no particular reason; therefore packages … Read more Retries in API packages and reinventing the wheel
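A generic sketch of the retry idea follows. The post itself concerns R packages, so this Python version shows only the general pattern, not the tooling the post discusses. The `retry` helper, its parameters, and the simulated `flaky_request` function are all hypothetical.

```python
import time

def retry(func, max_attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call func until it succeeds, waiting longer after each failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))

# Simulated API call that fails twice before succeeding.
calls = {"count": 0}

def flaky_request():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("transient failure")
    return "ok"

result = retry(flaky_request, max_attempts=5, base_delay=0.01)
print(result, calls["count"])  # ok 3
```

In real code, the `except` clause would normally be narrowed to errors that are actually transient, so that genuine bugs are not retried.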

10 Things We Learned in Creating the Blog Guide with bookdown

After soliciting, reviewing, and publishing over 100 blog posts and tech notes by rOpenSci community members, we have created the rOpenSci Blog Guide for Authors and Editors to address many frequently asked questions and frequently given suggestions. Technically, we structured the content as a bookdown gitbook. It was Stef’s first foray into the glorious process … Read more 10 Things We Learned in Creating the Blog Guide with bookdown