6 Data Science Certificates To Level Up Your Career

Because of the appeal of data science and the promise of high incomes, more and more people decide to join the field every day. Some come from a technical background, while others join out of curiosity; regardless of the reason you decide to join the field, your no. 1 goal … Read more 6 Data Science Certificates To Level Up Your Career

Scikeras Tutorial: A MIMO Wrapper for CapsNet Hyperparameter Tuning with Keras

Building on our discussion so far, the wrapper would need to override both BaseWrappers.feature_encoder() and BaseWrappers.target_encoder(). Depending on the type of transformation required, we could either write our own custom transformer or use one of the many transformers already offered in sklearn.preprocessing. For this tutorial, we will demonstrate both … Read more Scikeras Tutorial: A MIMO Wrapper for CapsNet Hyperparameter Tuning with Keras

2020 NFL Postseason Predictions from Machine Learning Model — Conference

Why does V 5.0 favor the Bills? Last week I described how the top contributors to the random forest algorithm were mostly power rankings [eatdrinkandsleepfootball], but also efficiency metrics [numberFire] and Elo scores [FiveThirtyEight]. For the Bills and the Chiefs, the power rankings and Elo scores are quite similar, but the efficiency metrics differ a bit, … Read more 2020 NFL Postseason Predictions from Machine Learning Model — Conference

Detection of DeepFakes and other facial image manipulations via AMTENnet

Accuracy and supremacy of AMTENnet AMTENnet was tested against the following state-of-the-art baseline models: In addition, other state-of-the-art modules were used to replace the AMTEN module, thus generating several hybrid models. These modules were SRM filter kernels by [Zhou et al., 2018], Constrained-Conv by [Bayar and Stamm, 2018] and hand-crafted feature extractor by [Mo et … Read more Detection of DeepFakes and other facial image manipulations via AMTENnet

Introducing OddFrames.jl: Data In One Dimension

With my approach to creating a great package to handle one-dimensional data, I wanted something that mixed a lot of concepts, but also really showed the power of the Dictionary datatype. There are many different advantages to using a basic datatype like a dictionary. One of the great things is that dispatch for the type … Read more Introducing OddFrames.jl: Data In One Dimension

How Precision and Recall Affect the Anti-COVID Measures

Now that we understand the confusion matrix, we can look at how to evaluate the results of the testing. For the sake of illustration, I’ll use the following scenario as an example. Instinctively, people think of accuracy as the number of correct results divided by the total number of results. So we have an accuracy of (90+1)/(90+1+1+8)=91%. Wow, that is impressive! … Read more How Precision and Recall Affect the Anti-COVID Measures
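The arithmetic above can be checked with a short sketch. The excerpt does not show which cell of the confusion matrix each number belongs to, so the assignment below (90 true negatives, 1 true positive, 1 false positive, 8 false negatives) is an assumption chosen only because it reproduces the 91% figure. The function name `confusion_metrics` is hypothetical.

```python
def confusion_metrics(tp, tn, fp, fn):
    """Return (accuracy, precision, recall) from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    # Guard against empty denominators when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Assumed cell assignment (not confirmed by the excerpt).
accuracy, precision, recall = confusion_metrics(tp=1, tn=90, fp=1, fn=8)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

Under this assumed assignment, accuracy is 91% while recall is only about 11%, which illustrates why accuracy alone can look impressive on imbalanced data.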

Machine Learning Model as a Serverless App using Google App Engine

Create a folder for the project and download the code files for this article from the repository here. Then navigate to this directory using the terminal (cd <path_to_dir>) and make sure that the virtual environment is active (conda activate <env_name>). Obviously, you can do the same … Read more Machine Learning Model as a Serverless App using Google App Engine

Detecting Malaria with Deep Learning for Beginners

A beginner’s guide to Image Classification and Convolutional Neural Networks (CNN). In this project, we will go through a dataset provided by the US National Institutes of Health containing 27,558 different cell images from 150 patients infected by Plasmodium falciparum, a parasite that causes malaria, mixed with … Read more Detecting Malaria with Deep Learning for Beginners

Reticulate webinar – R and Python – a happy union

On Wednesday (20th January 2021), my colleague Andreas and I kicked off the first webinar of 2021 for the NHS-R Community with our look at the benefits of using reticulate for joining up R and Python. What was the webinar about? The webinar was split into two sections: The first session involved me taking the functionality … Read more Reticulate webinar – R and Python – a happy union

Announcing new Amazon EC2 T4g instances powered by AWS Graviton2 processors along with a T4g free trial in Asia Pacific (Sydney, Singapore), Europe (London), North Americas (Canada Central, San Francisco), and South Americas (Sao Paulo) regions

Starting today, the latest generation of burstable, general-purpose Amazon EC2 T4g instances is now available in Asia Pacific (Singapore, Sydney), Europe (London), North Americas (Canada Central, San Francisco), and South Americas (Sao Paulo) regions. These instances are powered by Arm-based AWS Graviton2 processors and deliver up to 40% better price performance over T3 instances. … Read more Announcing new Amazon EC2 T4g instances powered by AWS Graviton2 processors along with a T4g free trial in Asia Pacific (Sydney, Singapore), Europe (London), North Americas (Canada Central, San Francisco), and South Americas (Sao Paulo) regions

Take the first step toward SRE with Cloud Operations Sandbox

At Google Cloud, we strive to bring Site Reliability Engineering (SRE) culture to our customers not only through training on organizational best practices, but also with the tools you need to run successful cloud services. Part and parcel of that is comprehensive observability tooling (logging, monitoring, tracing, profiling and debugging), which can help you troubleshoot production issues … Read more Take the first step toward SRE with Cloud Operations Sandbox

A Zero-Maths Introduction to Bayesian Statistics

Decoding the crusades of the statistics world: Bayesian vs Frequentism. This one doesn’t need much introduction. Thousands of articles and papers have been written, and a few wars have been fought, on Bayesian vs Frequentism. In my experience, most folks start with the usual linear regression and work their way up to build more complex models … Read more A Zero-Maths Introduction to Bayesian Statistics

These are the top 10 skills you need to master in 2021

A new report by Skillsoft reveals the most useful skills to have in 2021. Suppose you are an employee, a job-seeker, or a manager. In that case, you have many opportunities to refresh your skills or add a new skill to your list at the beginning of the year. … Read more These are the top 10 skills you need to master in 2021

Transformation of a simple movie dataset into a functional Recommender System

The recommender system presented in this article was realized in 4 major steps:
– Step 1: Calculation of the weighted average score of each movie in order to propose to the end-user a catalog of the 100 most popular movies of the Cinema
– Step 2: Setting up the recommendation of 5 “popular” movies using a machine … Read more Transformation of a simple movie dataset into a functional Recommender System
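Step 1's weighted score could be sketched as follows. The excerpt does not give the formula, so this sketch assumes the widely used IMDb-style weighted rating, WR = (v / (v + m)) · R + (m / (v + m)) · C, where R is a movie's average vote, v its vote count, m a minimum-votes threshold, and C the mean vote across all movies. The field names, the threshold value, and the sample data are all hypothetical.

```python
def weighted_rating(vote_average, vote_count, min_votes, mean_vote):
    """IMDb-style weighted rating (an assumed formula, not confirmed by the article)."""
    v, m = vote_count, min_votes
    return (v / (v + m)) * vote_average + (m / (v + m)) * mean_vote


# Hypothetical toy catalog.
movies = [
    {"title": "A", "vote_average": 9.0, "vote_count": 10},
    {"title": "B", "vote_average": 8.0, "vote_count": 1000},
    {"title": "C", "vote_average": 6.0, "vote_count": 500},
]

mean_vote = sum(movie["vote_average"] for movie in movies) / len(movies)
min_votes = 100  # hypothetical threshold

# Sort by weighted score, highest first, and keep at most 100 titles.
ranked = sorted(
    movies,
    key=lambda movie: weighted_rating(
        movie["vote_average"], movie["vote_count"], min_votes, mean_vote
    ),
    reverse=True,
)
top_titles = [movie["title"] for movie in ranked[:100]]
print(top_titles)
```

In this toy data, movie A has the highest raw average but only 10 votes, so the weighting pulls it toward the overall mean and it ranks below the heavily voted movie B.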

3 Tips to Succeed as a Data Scientist

Analytical Workflows In this section, I want to describe some general principles for analytical work and how we can design workflows that are reproducible and more efficient. The tools and workflow I use will differ slightly depending on the task but I will walk through an example of the type of problems I have worked … Read more 3 Tips to Succeed as a Data Scientist

LondonR February 2021

In 2020 LondonR moved online and whilst we have missed the opportunity to connect face-to-face over a few drinks, it has been great to keep in touch with the R stats community virtually. The move to online meetups has enabled us to make LondonR more accessible – something we will take forward with us when … Read more LondonR February 2021

Sentiment Analysis of Surnames

Sentiment analysis is typically applied to connected text such as product reviews. However, it can also be extended to names, potentially delivering rich insights into psychology and culture. Globally and historically, names hold important familial, cultural, and religious significance. The foundation for much of this significance is a concept called nominal realism, which holds that … Read more Sentiment Analysis of Surnames

Key customer benefits of the expanded SAP and Microsoft partnership

This past year has shown us how important it is to be ready for the unexpected. We spent a lot of time talking with customers, and it was rewarding to hear their stories of using technology to respond quickly to changing business needs. This was especially true for those who were well down the path of … Read more Key customer benefits of the expanded SAP and Microsoft partnership

Prediction 2021: The Year AI Became Normal

A clear pattern of growth has already emerged in AI: in 2018–19, the phase of experimentation became mature; in 2020, adoptions began in a serious way and suddenly, COVID-19 gave the business leaders an opportunity and impetus to push automation and AI. In 2021, the fallout from a second wave of COVID-19 in the UK … Read more Prediction 2021: The Year AI Became Normal

A Complete Project on Image Classification with Logistic Regression From Scratch in Python

Detailed layout of a logistic regression algorithm with a project Logistic regression is very popular in machine learning and statistics. It can work on both binary and multiclass classification very well. I wrote tutorials on both binary and multiclass classification with logistic regression before. This article will be focused on image classification with logistic regression. … Read more A Complete Project on Image Classification with Logistic Regression From Scratch in Python

Python Beginner Breakthroughs (Functions)

The heart and soul of Python coding… Learning to make clean, simple, and easy-to-read functions within Python is priceless. I think that one could argue that the function programming construct is probably one of the most important concepts in coding. The concept of a function is an … Read more Python Beginner Breakthroughs (Functions)

Amazon MSK now supports the ability to change the size or family of your Apache Kafka brokers

You can now scale your Amazon Managed Streaming for Apache Kafka (MSK) clusters on demand by changing the size or family of your brokers without reassigning Apache Kafka partitions. Changing the size or family of your brokers gives you the flexibility to adjust your MSK cluster’s compute capacity based on changes in your workloads, without … Read more Amazon MSK now supports the ability to change the size or family of your Apache Kafka brokers

How to Create a Beautify Combo Chart in Python Plotly

Nobody would deny that the line and bar combo chart is one of the most widely used combo charts. In Excel, there is a built-in Combo chart feature. It is also one of the most popular charts for analyzing financial data. In this tutorial, we are going to build a customized combo plot using plotly. … Read more How to Create a Beautify Combo Chart in Python Plotly

4 Machine Learning Concepts I Wish I Knew When I Built My First Model

Feature importance refers to a set of techniques for assigning scores to input variables based on how good they are at predicting the target variable. The higher the score, the more important the feature is in the model. For example, if I wanted to predict the price of a car using … Read more 4 Machine Learning Concepts I Wish I Knew When I Built My First Model

Advanced Options with Hyperopt for Tuning Hyperparameters in Neural Networks

If you’re anything like me, you spent the first several months looking at applications of machine learning and wondering how to get better performance out of the model. I would spend hours, if not days, making minor tweaks to the model, hoping for better performance. Surely, I thought, there … Read more Advanced Options with Hyperopt for Tuning Hyperparameters in Neural Networks

Financial Data from Yahoo Finance with Python

Retrieving company financials from Yahoo Finance. In this post, we are going to learn about a super easy-to-use Python package for retrieving financial data from Yahoo Finance. We will cover the main functionalities of the yfinance library. This will lead us to retrieve both company financial information (e.g. financial ratios) as well as … Read more Financial Data from Yahoo Finance with Python

Would Jack Realistically Have Died aboard the Titanic?

How machine learning answers the question: a walkthrough of Logistic Regression and Naive Bayes. The year was 1912, and the mighty Titanic set sail on her maiden voyage. Jack, a “20 year old” “third class” “male” passenger, won a hand of poker and his ticket to the land of the free. In the … Read more Would Jack Realistically Have Died aboard the Titanic?

A quick reflection on some ethical implications of creative AI

AI is increasingly being applied to more creative areas, raising concerns about the protection of intellectual property. Disclaimer: I am not a lawyer, and therefore, this article should not be used as legal advice, so take it as a personal opinion of an experienced observer of emerging technologies and … Read more A quick reflection on some ethical implications of creative AI

How Cloud SQL freed Arcules to keep building

Editor’s note: Arcules, a Canon Company, delivers the next generation of cloud-based video monitoring, access control, and video analytics, all in one unified, intuitive platform. Here, we look at how they turned to Google Cloud SQL’s fully managed services so they could focus more of their engineers’ time on improving their architecture. As the leading provider … Read more How Cloud SQL freed Arcules to keep building

Develop a Language Translator System in Python

Wonder how a language detection and translation system works? Use open-source Python libraries to develop one in a few lines of code. Text Language Identification refers to the process of predicting the language of a given text, whereas Text Translation refers to the process of translating a given text … Read more Develop a Language Translator System in Python

Running an R Script on a Schedule: Azure Functions (Serverless)

In this post I will show how I run an R script on a schedule by making use of a ‘serverless’ computing service on the Microsoft cloud called Azure Functions. In short, I will use a custom Docker container, install the required software, install the required R packages using {renv}, and deploy it in the Azure cloud. I program … Read more Running an R Script on a Schedule: Azure Functions (Serverless)

covidcast package for COVID-19-related data

(This is a PSA post, where I share a package that I think might be of interest to the community but that I haven’t looked into too deeply myself.) Today I learnt of the covidcast R package, which provides access to the COVIDcast Epidata API published by the Delphi group at Carnegie Mellon University. According to the … Read more covidcast package for COVID-19-related data

Exploratory Factor Analysis vs Principal Components: from concept to application

How to reduce parameters with Exploratory Factor Analysis. In data science, we often want to measure variables such as socioeconomic status (SES). Some variables have a lot of parameters (or items); for example, SES can be measured based on income, education, etc. Then, to proceed with the analysis, it … Read more Exploratory Factor Analysis vs Principal Components: from concept to application

Tableau’s relationships are pretty cool

Unlike joining tables into flat files, relationships preserve the native granularity of data. Joins are performed only as needed. Last summer, Tableau introduced a new way of combining data. It is called relationships. The old way of combining data using joins is still available, and I imagine that many of us might stick with the … Read more Tableau’s relationships are pretty cool

Deep learning with containers. Part 2

For siamese training with triplet loss, every training instance should be a group of three samples (anchor, negative, and positive) called a triplet. We first pass these samples through the neural network to generate three embedding vectors. After that, we calculate the Euclidean distance between anchor and negative vectors (D_neg) and anchor … Read more Deep learning with containers. Part 2
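The distance step described above can be sketched in plain Python. The excerpt is cut off before the loss itself, so the formula used here, max(D_pos − D_neg + margin, 0), is the standard triplet-loss formulation and an assumption about what the article goes on to use. The name D_pos, the margin value, and the toy embedding vectors are also hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss on one triplet (assumed formulation)."""
    d_pos = euclidean(anchor, positive)  # anchor-to-positive distance
    d_neg = euclidean(anchor, negative)  # anchor-to-negative distance (D_neg)
    # Zero loss once the negative is at least `margin` farther away than the positive.
    return max(d_pos - d_neg + margin, 0.0)


# Toy embeddings standing in for the network's outputs.
anchor = [0.0, 0.0]
positive = [0.0, 1.0]   # distance 1 from the anchor
negative = [3.0, 4.0]   # distance 5 from the anchor

loss_easy = triplet_loss(anchor, positive, negative)  # positive close, negative far
loss_hard = triplet_loss(anchor, negative, positive)  # roles swapped on purpose
print(loss_easy, loss_hard)
```

The first triplet already satisfies the margin and contributes no loss, while the swapped triplet is penalized, which is the behavior that pushes same-class embeddings together and different-class embeddings apart.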

The Machine Learning Lifecycle in 2021

In reality, machine learning projects are not straightforward; they are a cycle of iterating between improving the data, model, and evaluation that is never really finished. This cycle is crucial in developing an ML model because it focuses on using model results and evaluation to refine your dataset. A high-quality dataset is the most surefire way … Read more The Machine Learning Lifecycle in 2021

Bayesian optimization or how I carved boats from wood. Examples and code.

Examples and code As a kid, I spent my summers with my grandmother in a small village with no peers around and very little entertainment. While reading took up most of my time, I also used to carve small boats from wooden logs. My boats had three features to which I paid particular attention — … Read more Bayesian optimization or how I carved boats from wood. Examples and code.

Forecasting new COVID19 cases in Portugal using Gaussian Processes

Using Python and Bayesian Statistics to forecast 30 days of new cases I would prefer to do this analysis on a different subject and in a different context. Today, Portugal is the country with the biggest absolute number of new cases of COVID-19 per one million people [1]. I don’t want to make this a … Read more Forecasting new COVID19 cases in Portugal using Gaussian Processes

New features in Power BI for Data Analysts – Small multiples, Anomaly Detection and Zoom on visuals

[This article was first published on R – TomazTsql, and kindly contributed to R-bloggers]. Great new features have been bundled and are now available in Power … Read more New features in Power BI for Data Analysts – Small multiples, Anomaly Detection and Zoom on visuals

AzureCosmosR: interface to Azure Cosmos DB

by Hong Ooi Last week, I announced AzureCosmosR, an R interface to Azure Cosmos DB, a fully-managed NoSQL database service in Azure. This post gives a short rundown on the main features of AzureCosmosR. Explaining what Azure Cosmos DB is can be tricky, so here’s an excerpt from the official description: Azure Cosmos DB is … Read more AzureCosmosR: interface to Azure Cosmos DB

6 NLP Techniques Every Data Scientist Should Know

Towards more efficient natural language processing. Natural language processing is perhaps the most talked-about subfield of data science. It’s interesting, it’s promising, and it can transform the way we see technology today. Not just technology, but it can also transform the way we perceive human languages. Natural language … Read more 6 NLP Techniques Every Data Scientist Should Know

Crack Data Science Interviews: Essential Statistics Concepts

Interview questions about missing data look deceptively easy but are challenging. You have to tailor your answers according to the data type and the context. A lot of us, me included, fail to recognize the nature of missing data and tweak our responses accordingly. I’ve done deep research on this topic and come up with the … Read more Crack Data Science Interviews: Essential Statistics Concepts

Texts, Fonts, and Annotations with Python’s Matplotlib

Fonts It might look unimportant, and the default font of Matplotlib is not wrong by any means, so why would you need to change it? Well, one reason might be to conform with some other text of your report. It’s definitely not unusual to have a font family, size, and color pre-defined for a publication. … Read more Texts, Fonts, and Annotations with Python’s Matplotlib

Using Linear Programming to schedule Drivers.

Matrix containing the difference in start times for every combination of route and driver. To solve this problem we need to create a decision variable for every single combination of route and driver. Setting up the PuLP problem and creating binary decision variables. This sets up the decision variables to be binary, taking only a value of … Read more Using Linear Programming to schedule Drivers.
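The formulation described above, one binary decision variable per (route, driver) combination, can be illustrated without a solver. The sketch below deliberately swaps PuLP for brute-force enumeration using only the standard library, so it is practical only for tiny inputs. The routes, drivers, start-time differences, and both constraints (each route gets exactly one driver, each driver takes at most one route) are assumptions, since the excerpt is truncated before the model is specified.

```python
from itertools import product

routes = ["R1", "R2"]
drivers = ["D1", "D2", "D3"]

# Hypothetical difference in start times (minutes) for every route/driver pair.
start_diff = {
    ("R1", "D1"): 30, ("R1", "D2"): 10, ("R1", "D3"): 45,
    ("R2", "D1"): 5,  ("R2", "D2"): 20, ("R2", "D3"): 15,
}
pairs = list(start_diff)  # one binary decision variable per pair

best_cost, best_assignment = None, None
for values in product((0, 1), repeat=len(pairs)):
    x = dict(zip(pairs, values))  # x[(route, driver)] is 0 or 1
    # Assumed constraint: every route is covered by exactly one driver.
    if any(sum(x[(r, d)] for d in drivers) != 1 for r in routes):
        continue
    # Assumed constraint: every driver takes at most one route.
    if any(sum(x[(r, d)] for r in routes) > 1 for d in drivers):
        continue
    # Objective: total start-time difference of the chosen pairs.
    cost = sum(start_diff[p] * x[p] for p in pairs)
    if best_cost is None or cost < best_cost:
        best_cost, best_assignment = cost, [p for p in pairs if x[p]]

print(best_cost, best_assignment)
```

A linear programming library such as PuLP expresses the same variables, constraints, and objective declaratively and hands the search to a solver, which is what makes realistic problem sizes tractable.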