Continuous quality evaluation for ML projects using GitHub Actions.

I would use three different models (+ baseline) to emulate step-by-step “work” on the task: Mean model (baseline), Random predictions, Linear Regression, and Gradient Boosting over Decision Trees (LightGBM). In real-world problems, this is equivalent to a continuously improving model whose changes are pushed to the repository. One should also define metrics that estimate how good the …
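To make the idea of a model ladder scored by a shared metric concrete, here is a minimal sketch. It is an illustration rather than the article's code: the toy data, the choice of RMSE as the metric, and the helper names `rmse` and `fit_line` are all assumptions, and LightGBM is left out so that the snippet needs only the standard library.

```python
import math
import random

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def fit_line(xs, ys):
    """Ordinary least squares for a single feature; returns (slope, intercept)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical toy data: y = 3x + 2 plus Gaussian noise.
rng = random.Random(0)
xs = [i / 10 for i in range(200)]
ys = [3 * x + 2 + rng.gauss(0, 1) for x in xs]
train_x, test_x = xs[::2], xs[1::2]
train_y, test_y = ys[::2], ys[1::2]

# Mean model (baseline): always predict the training mean.
mean_pred = [sum(train_y) / len(train_y)] * len(test_y)

# Random predictions: uniform draws over the training target range.
random_pred = [rng.uniform(min(train_y), max(train_y)) for _ in test_y]

# Linear regression fitted on the training half.
slope, intercept = fit_line(train_x, train_y)
linear_pred = [slope * x + intercept for x in test_x]

scores = {
    "mean": rmse(test_y, mean_pred),
    "random": rmse(test_y, random_pred),
    "linear": rmse(test_y, linear_pred),
}
for name, score in scores.items():
    print(f"{name}: RMSE = {score:.3f}")
```

In a CI job, a script like this could write `scores` to a file so that each push can be compared against the previous run; how the results are stored and reported is left open here.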

How Deepfake Technology Can Become More Dangerous Than a Nuclear Weapon

“The powers that be no longer have to stifle information. They can now overload us with so much of it, there’s no way to know what’s factual or not. The ability to be an informed public is only going to worsen with advancing deep fake technology.” (J. Andrew Schrecker) All of us have heard Donald …

Training Object Detectors with No Real Data using Domain Randomization

Solving sim2real transfer for specialized object detectors with no budget. Deep learning has recently become the favored approach to object detection problems. However, as with many other uses of this technology, annotating training data is cumbersome and time-consuming, especially if you are a small company with a specific use case. In this article, I present some …

Who’s smarter? An IQ test for both AI systems and humans

[Image: girl at tablet. Photo by Hal Gatewood on Unsplash.] Cutting through the hype surrounding artificial intelligence, François Chollet, an AI researcher at Google, has proposed the Abstraction and Reasoning Corpus (ARC), an intelligence test that could shape the course of future AI research. To date, there has been no satisfactory definition of artificial intelligence nor …

Mastering the data science job hunt

Editor’s note: The Towards Data Science podcast’s “Climbing the Data Science Ladder” series is hosted by Jeremie Harris, Edouard Harris and Russell Pollari. Together, they run a data science mentorship startup called SharpestMinds. You can listen to the podcast below: Getting hired as a data scientist, machine learning engineer or data analyst is hard. And …

Using deemed SLIs to measure customer reliability

Do you own and operate a software service? If so, is your service a “platform”? In other words, does it run and manage applications for a wide range of users and/or companies? There are both simple and complex types of platforms, all of which serve customers. One example could be Google Cloud, which provides, among …

On Cochran Theorem (and Orthogonal Projections)

Cochran Theorem – from “The distribution of quadratic forms in a normal system, with applications to the analysis of covariance”, published in 1934 – is probably the most important one in a regression course. It is an application of a nice result on quadratic forms of Gaussian vectors. More precisely, we can prove that if …
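For reference, one common textbook formulation of the result in terms of orthogonal projections is sketched below. It is not necessarily the exact form the article goes on to state; the symbols $Y$, $\mu$, $\sigma^2$, $F$, $p$ and $\Pi_F$ are notation chosen here for illustration.

```latex
\text{Let } Y \sim \mathcal{N}(\mu, \sigma^2 I_n), \text{ and let } \Pi_F
\text{ be the orthogonal projection onto a subspace } F \subset \mathbb{R}^n
\text{ with } \dim F = p. \text{ Then}
\begin{align*}
  &\Pi_F Y \ \text{and}\ \Pi_{F^\perp} Y = (I_n - \Pi_F)\,Y \ \text{are independent,} \\
  &\frac{\lVert \Pi_F (Y - \mu) \rVert^2}{\sigma^2} \sim \chi^2_p,
  \qquad
  \frac{\lVert \Pi_{F^\perp} (Y - \mu) \rVert^2}{\sigma^2} \sim \chi^2_{n-p}.
\end{align*}
```

In a regression course, this is typically the step that justifies treating the fitted values and the residuals as independent, which in turn underlies the usual t and F tests.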

Using neural networks for a functional connectivity classification of fMRI data

We can see that the connections are not particularly strong for either group (the diagonal line can be ignored, as it shows each component’s correlation with itself and thus always equals 1). To better visualize the connections and the differences, we can project these back onto the brain. #Getting the center coordinates from the component decomposition …
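The remark about the diagonal can be checked with a small, self-contained sketch. The signals below are made up and the helper `pearson` is a hypothetical stand-in; the article itself presumably works with a neuroimaging library rather than hand-rolled code.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical time series for three components (values are made up).
signals = [
    [0.1, 0.4, 0.3, 0.8, 0.6],
    [0.2, 0.5, 0.1, 0.9, 0.7],
    [0.9, 0.2, 0.6, 0.1, 0.3],
]

n = len(signals)
corr = [[pearson(signals[i], signals[j]) for j in range(n)] for i in range(n)]

# Each component correlates perfectly with itself, so the diagonal carries no
# information; zero it out before looking for the strongest connections.
off_diagonal = [[0.0 if i == j else corr[i][j] for j in range(n)] for i in range(n)]
```

Zeroing the diagonal keeps the self-correlations from dominating the colour scale when the matrix is plotted.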

Appsilon Data Science is now an RStudio Full Service Certified Partner

In November 2019, Appsilon Data Science became an RStudio Full Service Certified Partner. We now officially provide support services for existing Shiny applications, Plumber APIs, and RStudio infrastructure. Further, we now provide Managed Services for the R environment. This means that we can handle ongoing support for your servers and infrastructure to keep applications running …

Ultimate Setup for Your Next Python Project

Starting any project from scratch can be a daunting task… But not if you have this ultimate Python project blueprint! [Original image by @sxoxm on Unsplash] Whether you are working on a machine learning/AI project, building web apps in Flask or just writing a quick Python script, it’s always useful to have some template for …

New Azure blueprint for CIS Benchmark

We’ve released our newest Azure blueprint, which maps to another key industry standard, the Center for Internet Security (CIS) Microsoft Azure Foundations Benchmark. This follows the recent announcement of our Azure blueprint for FedRAMP moderate and adds to the growing list of Azure blueprints for regulatory compliance, which now includes ISO 27001, NIST SP 800-53, PCI-DSS, …

The Norwegian National Strategy for Artificial Intelligence Has Launched!

A Summary and Review of the New Strategy for AI on the Day of Its Launch, 14th of January. This day is special to me, because I have been covering most of the AI strategies in Europe, and today my home country has released its own national strategy. The Norwegian national strategy was released on …

Run your data science team like an Admiral …

Running a data science team is hard! We need to take inspiration wherever we can, and the culture of Nelson’s navy is one place to start. [Image: portrait of Nelson by Lemuel Francis Abbott] “no captain can do very wrong if he places his ship alongside that of the enemy” Admiral Horatio Nelson, before the battle …

Kaggle 1st place winner cheated, $10,000 prize declared irrecoverable

How a team obtained private data, constructed a fake AI model, and got away with the money from a platform for adopting neglected pets. [Image caption: The cheaters stole from Petfinder.my, a platform for adopting homeless and neglected pets. Pixabay image.] Kaggle just announced that the 1st Place Team, Bestpetting[1], has been disqualified from the Petfinder.my competition …

Is No-SQL killing SQL?

Two reasons why SQL will never, ever die. Last week a friend forwarded me an email from a successful entrepreneur that pronounced “SQL is dead.” The entrepreneur claimed that wildly popular No-SQL databases like MongoDB and Redis would slowly strangle SQL-based databases out of existence, and therefore learning SQL as a data scientist was …

Smarter Pricing for Airbnb Using Machine Learning

Increasing host revenue with regression and time series analysis. [This project was done as part of an immersive data science program called Metis. You can find the files for this project at my GitHub and the slides here. The final project is accessible here (interactive web app).] I recently designed a new approach to automatic …

Getting started with Pandas time-series functionality

3 techniques to make your data analysis faster. Pandas has exceptional features for analyzing time-series data, including automatic datetime parsing, advanced filtering capabilities, and several datetime-specific plotting functions. I find myself using those features almost every day, but it took me a long time to discover them: many of Pandas’ datetime capabilities are not immediately …

Mapping World Languages’ Difficulty Relative to English

[This article was first published on Educators R Learners, and kindly contributed to R-bloggers.] I was reading r/MapPorn and saw the image below: As fate …

Should I buy a lottery ticket?

Lottery Ticket Analysis. I analyzed past lottery data, using statistics and probability, to decide whether to buy a lottery ticket. [Photo by dylan nolte on Unsplash] I often find myself deciding whether or not to buy a lottery ticket, especially for the Powerball New Year’s Eve draw. The reason why I feel hesitant in those moments …

Predicting Movie Profitability and Risk at the Pre-production Phase

Movie Data and Box Office Numbers. In order to build my prediction algorithm, I gathered movie data from a couple of online sources. I obtained the bulk of my data from the Internet Movie Database (IMDb), which provides a set of files for free download. However, the IMDb files do not contain data on estimated movie …

an elegant sampler

[This article was first published on R – Xi’an’s Og, and kindly contributed to R-bloggers.] Following an X validated question on how to simulate a …

AWS Now Offers NVIDIA Quadro Virtual Workstations for EC2 G4 Instances at No Additional Cost

G4 instances provide the latest generation NVIDIA T4 Tensor Core GPUs, AWS custom second generation Intel® Xeon® Scalable (Cascade Lake) processors, up to 50 Gbps of networking throughput, and up to 900 GB of local NVMe storage. They are optimized for machine learning application deployments such as image classification, object detection, recommendation engines, automated speech …

How Do Conversational Agents Answer Questions?

NLP, Knowledge Graphs, and the Three Pillars of Intelligence. [Image: Jibo, Echo/Alexa, Google Home] The Three Pillars of Intelligence. To Amazon, the reception for its voice agent, Alexa, was a big surprise. Apple’s Siri had put voice input onto smartphones. But here was a new class of device that you could shout at across the kitchen …

Oxford (Real) Farming Conference 2020

NLP: Sentiment Analysis, Word Embeddings and Topic Modelling of 3.8K tweets. Last week, from the 7th to the 9th of January, Oxford hosted the well-established, traditional and businessy Oxford Farming Conference (OFC) and its antidote, the Oxford Real Farming Conference (ORFC). Both aim to connect actors involved in the agricultural and food sector to tackle the challenges …

Google acquires AppSheet to help businesses create and extend applications – without coding

Today, Google is excited to announce that it has acquired AppSheet, a leading no-code application development platform used by a number of enterprises across a variety of industries. The demand for faster processes and automation in today’s competitive landscape requires more business applications to be built with greater speed and efficiency. However, many companies lack …

Exploring the Future of Cloud Computing in 2020 and Beyond

Cloud computing has become a fundamental requirement for most organizations, and its adoption is rising sharply. In fact, 81 percent of companies with 1,000 employees or more have a multi-platform strategy. That share is expected to rise to more than 90 percent by 2024. Between …

Site Planning for Market Coverage Optimization with Mobility Data

Commercial Activity Data: Points of Interest (POIs). This dataset provides information on POIs. Cells are enriched with the following data from this source: (1) the number of competitors within a 250-meter buffer from the cell’s centroid, and (2) the number of POIs within a 250-meter buffer from the cell’s centroid. Here we use POIs as a proxy for commercial activity. …
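The two buffer counts described above can be sketched in a few lines. This is an illustration under stated assumptions, not the article's pipeline: the coordinates and categories are made up, the helper names `haversine_m` and `count_within_buffer` are hypothetical, and a great-circle distance stands in for whatever projection or spatial index the authors actually use.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def count_within_buffer(centroid, pois, radius_m=250.0, category=None):
    """Count POIs within radius_m of the centroid, optionally filtered by category."""
    lat0, lon0 = centroid
    return sum(
        1
        for poi in pois
        if (category is None or poi["category"] == category)
        and haversine_m(lat0, lon0, poi["lat"], poi["lon"]) <= radius_m
    )

# Hypothetical cell centroid and POIs (coordinates are made up for illustration).
centroid = (40.4168, -3.7038)
pois = [
    {"category": "restaurant", "lat": 40.4170, "lon": -3.7040},  # roughly 28 m away
    {"category": "pharmacy", "lat": 40.4180, "lon": -3.7038},    # roughly 133 m away
    {"category": "restaurant", "lat": 40.4200, "lon": -3.7038},  # roughly 356 m away
]

n_pois = count_within_buffer(centroid, pois)
n_competitors = count_within_buffer(centroid, pois, category="restaurant")
print(n_pois, n_competitors)  # prints "2 1"
```

Here "competitor" is modelled as a POI with a matching category; for many cells, a spatial index would avoid the pairwise distance scan.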

The truth about the martingale betting system

I swear by the name of Science that the evidence I shall give shall be the truth, the whole truth, and nothing but the truth. About the simulation:

    from random import randint

    results = []

    def roll():
        # Draw one outcome between 1 and 36 and record it.
        result = randint(1, 36)
        results.append(result)

    for i in range(1000000):
        roll()

The script simulates 1000000 roulette outcomes within a second. At each simulation, a random whole …
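Building on the outcome generator in the excerpt, a martingale session can be sketched as follows. The rules here are assumptions rather than the article's: an even-money bet wins when the draw from 1..36 is at most 18, the stake doubles after every loss, and a session stops once the next stake cannot be covered. The function name `play_martingale` and its parameters are hypothetical.

```python
from random import Random

def play_martingale(rng, bankroll=1000, base_bet=1, max_rounds=1000):
    """Play one martingale session and return the final bankroll.

    Assumptions (not from the article): an even-money bet wins when the
    draw from 1..36 is at most 18, and the session stops when the next
    bet can no longer be covered or after max_rounds rounds.
    """
    bet = base_bet
    for _ in range(max_rounds):
        if bet > bankroll:
            break  # cannot cover the next bet: the session ends
        if rng.randint(1, 36) <= 18:
            bankroll += bet
            bet = base_bet  # win: collect the bet and reset to the base bet
        else:
            bankroll -= bet
            bet *= 2        # loss: double the bet to chase the loss
    return bankroll

rng = Random(42)
start = 1000
finals = [play_martingale(rng, bankroll=start) for _ in range(200)]
below_start = sum(1 for f in finals if f < start)
print(f"sessions ending below the starting bankroll: {below_start} of {len(finals)}")
print(f"average final bankroll: {sum(finals) / len(finals):.1f}")
```

Because a bet is only placed when the bankroll covers it, the bankroll can never go negative; what the strategy cannot prevent is a long losing streak wiping out many small wins at once.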

How MonetDB/X100 Exploits Modern CPU Performance

Modern CPUs have undergone significant development. But how does MonetDB exploit this development to maximize its performance? Computer processors have developed significantly in the last three decades. This development involves not only the increasing number of transistors they hold but also the evolution of their architecture. Hence, an application needs to adapt to how the …

Preventing the Death of the Dataframe

[Image source: Disney] A definition to save the dataframe from extinction. Dataframes emerged from a specific need, but because so many diverse systems now call themselves dataframes, the term is on the verge of meaning nothing. In an effort to preserve the dataframe, we formalized the definition based on the original data model in our recent …

Visualising spending behaviour through open banking and GIS

Financial habits have historically been something that people keep at the back of their minds, but with the rising amount of information and tools available, a new attitude to financial control is driving the popularity of transparent, digital banking. A new breed of financial institutions (such as Monzo, Starling, Revolut and N26) is leveraging digital products to …

Download Email Attachment from Microsoft Exchange Web Services Automatically

Automating the Dull Routine with Python. Learn to handle email attachments using the Python library exchangelib. [Photo by Webaroo.com.au on Unsplash] Do you need to download email attachments regularly? Do you want to automate this boring process? I know that feel, bro. When I first started my job, I was assigned a daily task: download …

How to use the power of “WHY” to achieve what you want

The data science path is not easy. If you’re a data scientist reading this now, you’ll know what I mean. It’s tough. It’s an ever-changing field. It’s dynamic. It’s moving fast. In other words, you need to learn fast and adapt quickly to keep up to date with the latest trends and technology being used in the industry. The …

LondonR & R-Ladies to host ‘watch party’ at RStudio Conference, Jan 29th

Mango Solutions, a full service certified partner of RStudio, are delighted to be supporting the rstudio::conf 2020 conference in San Francisco, on January 27th-30th. This annual tech conference promises to be ‘all things R’ and includes an impressive agenda covering tutorials, presentations and lightning talks from recognised R experts and developers. As long-time champions …

Get Better: early career researcher development

[This article was first published on Rstats – quantixed, and kindly contributed to R-bloggers.] How can we contribute to the development of early career researchers …

Learning from cryptocurrency mining attack scripts on Linux

Cryptocurrency mining attacks continue to represent a threat to many of our Azure Linux customers. In the past, we’ve talked about how some attackers use brute force techniques to guess account names and passwords and use those to gain access to machines. Today, we’re talking about an attack that a few of our customers have …

Exploring the tightened EU CO2 emission standards for cars in 2020 – Why now selling an electric car can be worth an extra 18000€ for producers.

In this blog post I want to explore the EU regulation for average CO2 emissions of manufacturers’ car fleets using the EU data set of newly registered cars. The data was already studied on an aggregate level in my earlier post. Here we explore, in particular, the monetary implications of the regulatory details for different …

Business Case Analysis with R (Guest Post)

Learning Machines proudly presents a fascinating guest post by decision and risk analyst Robert D. Brown III with a great application of R in the business and especially the startup arena! I encourage you to visit his blog too: Thales’ Press. Have fun! Introduction: While Excel remains the tool of choice among analysts for business case analysis, …

The Galactic Island Hypothesis

New research provides a quantitative solution to Fermi’s paradox. [Image: simulated settlement trajectories, showing how civilizations could spread through the Galaxy (source)] Compared to the age of the Milky Way Galaxy, our 200,000-year-old human species has been around for just the blink of an eye. The Milky Way is at least 10 billion years …

Hands-on Web Scraping: Building your Twitter dataset with python and scrapy

This assumes that you have some basic knowledge of Python and Scrapy. If you are interested only in generating your dataset, skip this section and go to the sample crawl section on the GitHub repo. Gathering tweet URLs by searching through hashtags: to search for tweets, we will be using the legacy Twitter website. Let’s …