Ultimate Setup for Your Next Python Project

Starting any project from scratch can be a daunting task… But not if you have this ultimate Python project blueprint! Original image by @sxoxm on Unsplash Whether you are working on some machine learning/AI project, building web apps in Flask or just writing some quick Python script, it’s always useful to have some template for …

New Azure blueprint for CIS Benchmark

We’ve released our newest Azure blueprint that maps to another key industry standard, the Center for Internet Security (CIS) Microsoft Azure Foundations Benchmark. This follows the recent announcement of our Azure blueprint for FedRAMP moderate and adds to the growing list of Azure blueprints for regulatory compliance, which now includes ISO 27001, NIST SP 800-53, PCI-DSS, …

The Norwegian National Strategy for Artificial Intelligence Has Launched!

A Summary and Review of the New Strategy for AI On The Day of Its Launch 14th of January This day is special to me, because I have been covering most of the AI strategies in Europe, and today my home country has released its own national strategy. The Norwegian national strategy was released on …

Run your data science team like an Admiral …

Running a data science team is hard! We need to take inspiration wherever we can, and the culture of Nelson’s navy is one place to start. Portrait of Nelson by Lemuel Francis Abbott “no captain can do very wrong if he places his ship alongside that of the enemy” Admiral Horatio Nelson, before the battle …

Kaggle 1st place winner cheated, $10,000 prize declared irrecoverable

How a team obtained private data, constructed a fake AI model, and got away with the money from a platform for adopting neglected pets The cheaters stole from Petfinder.my, a platform for adopting homeless and neglected pets. [pixabay image] Kaggle just announced that the 1st Place Team, Bestpetting[1], has been disqualified from the Petfinder.my competition …

Is No-SQL killing SQL?

Two reasons why SQL will never, ever die Last week a friend forwarded me an email from a successful entrepreneur that pronounced “SQL is dead.” The entrepreneur claimed that the wildly popular No-SQL databases like MongoDB and Redis would slowly strangle SQL-based databases out of existence, and therefore learning SQL as a data scientist was …

Smarter Pricing for Airbnb Using Machine Learning

Increasing host revenue with regression and time series analysis [This project was done as part of an immersive data science program called Metis. You can find the files for this project at my GitHub and the slides here. The final project is accessible here (interactive web app).] I recently designed a new approach to automatic …

Getting started with Pandas time-series functionality

3 techniques to make your data analysis faster Pandas has exceptional features for analyzing time-series data, including automatic datetime parsing, advanced filtering capabilities, and several datetime-specific plotting functions. I find myself using those features almost every day, but it took me a long time to discover them: many of Pandas’ datetime capabilities are not immediately …

Mapping World Languages’ Difficulty Relative to English

[This article was first published on Educators R Learners, and kindly contributed to R-bloggers]. I was reading r/MapPorn and saw the image below: As fate …

Skew who?

[This article was first published on R on OSM, and kindly contributed to R-bloggers]. In our last post on the SKEW index we looked at …

A Shiny app for simple linear regression by hand and in R

[This article was first published on R on Stats and R, and kindly contributed to R-bloggers]. Simple linear regression is a statistical method to summarize …

Should I buy a lottery ticket?

Lottery Ticket Analysis I analyzed past lottery data, using statistics and probability, to decide whether to buy a lottery ticket. Photo by dylan nolte on Unsplash I often find myself deciding whether or not to buy a lottery ticket, especially for the Powerball New Year’s Eve draw. The reason why I feel hesitant in those moments …

Predicting Movie Profitability and Risk at the Pre-production Phase

Movie Data and Box Office Numbers In order to build my prediction algorithm, I gathered movie data from a couple of online sources. I obtained the bulk of my data from the Internet Movie Database (IMDb), which provides a set of files for free download. However, the IMDb files do not contain data on estimated movie …

an elegant sampler

[This article was first published on R – Xi’an’s Og, and kindly contributed to R-bloggers]. Following an X validated question on how to simulate a …

AWS Now Offers NVIDIA Quadro Virtual Workstations for EC2 G4 Instances at No Additional Cost

G4 instances provide the latest generation NVIDIA T4 Tensor Core GPUs, AWS custom second generation Intel® Xeon® Scalable (Cascade Lake) processors, up to 50 Gbps of networking throughput, and up to 900 GB of local NVMe storage. They are optimized for machine learning application deployments such as image classification, object detection, recommendation engines, automated speech …

How Do Conversational Agents Answer Questions?

NLP, Knowledge Graphs, and the Three Pillars of Intelligence Jibo, Echo/Alexa, Google Home The Three Pillars of Intelligence To Amazon, the reception for its voice agent, Alexa, was a big surprise. Apple’s Siri had put voice input onto smartphones. But here was a new class of device that you could shout at across the kitchen …

Oxford (Real) Farming Conference 2020

NLP: Sentiment Analysis, Word Embeddings and Topic Modelling of 3.8K tweets Last week, from the 7th to the 9th of January, Oxford hosted the well-established, traditional and businessy Oxford Farming Conference (OFC) and its antidote, the Oxford Real Farming Conference (ORFC). Both aim to connect actors involved in the agricultural and food sector to tackle the challenges …

Google acquires AppSheet to help businesses create and extend applications—without coding

Today, Google is excited to announce that it has acquired AppSheet, a leading no-code application development platform used by a number of enterprises across a variety of industries. The demand for faster processes and automation in today’s competitive landscape requires more business applications to be built with greater speed and efficiency. However, many companies lack …

Exploring the Future of Cloud Computing in 2020 and Beyond

Cloud computing has become a fundamental requirement for most organizations, and its adoption keeps rising. In fact, 81 percent of companies with 1,000 employees or more have a multi-platform strategy. That number is expected to rise to more than 90 percent by 2024. Between …

Site Planning for Market Coverage Optimization with Mobility Data

Commercial Activity Data: Points of Interest (POIs) This dataset provides information on POIs. Cells are enriched with the following data from this source: Number of competitors within a 250-meter buffer from the cell’s centroid. Number of POIs within a 250-meter buffer from the cell’s centroid. Here we use POIs as a proxy for commercial activity. …
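The two enrichment counts described in this excerpt can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the article's implementation: the function name `count_within_buffer`, the sample coordinates, and the use of planar coordinates in meters with straight-line distance are all hypothetical.

```python
import math


def count_within_buffer(centroid, points, radius_m=250.0):
    """Count points whose straight-line distance to the centroid is at most radius_m."""
    cx, cy = centroid
    return sum(1 for (px, py) in points if math.hypot(px - cx, py - cy) <= radius_m)


# Hypothetical cell centroid and locations, in meters on a flat (projected) plane.
cell_centroid = (0.0, 0.0)
pois = [(100.0, 0.0), (0.0, 240.0), (200.0, 200.0), (-250.0, 0.0), (300.0, 10.0)]
competitors = [(50.0, 50.0), (400.0, 0.0)]

poi_count = count_within_buffer(cell_centroid, pois)
competitor_count = count_within_buffer(cell_centroid, competitors)
print(poi_count, competitor_count)
```

With real longitude/latitude data, the coordinates would first need to be projected to a metric coordinate system (or a geodesic distance used) before a 250-meter radius is meaningful.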

The truth about the martingale betting system

I swear by the name of Science that the evidence I shall give shall be the truth, the whole truth, and nothing but the truth.

About the simulation

    from random import randint

    results = []

    def roll():
        # Draw one outcome as a whole number from 1 to 36 and record it.
        result = randint(1, 36)
        results.append(result)

    for i in range(1000000):
        roll()

The script simulates 1000000 roulette outcomes within a second. At each simulation, a random whole …

How MonetDB/X100 Exploits Modern CPU Performance

Modern CPUs have undergone significant development. But how does MonetDB exploit this development to maximize its performance? Computer processors have developed significantly in the last three decades. This development involves not only the increasing number of transistors they hold but also the evolution of their architecture. Hence, an application needs to adapt to how the …

Preventing the Death of the Dataframe

Source: Disney A definition to save the dataframe from extinction Dataframes emerged from a specific need, but because so many diverse systems now call themselves dataframes, the term is on the verge of meaning nothing. In an effort to preserve the dataframe, we formalized the definition based on the original data model in our recent …

Visualising spending behaviour through open banking and GIS

Financial habits have historically been something that people keep at the back of their minds, but with the rising amount of information and tools available, a new attitude to financial control is making transparent, digital banking increasingly popular. A new breed of financial institutions (such as Monzo, Starling, Revolut and N26) are leveraging digital products to …

Download Email Attachment from Microsoft Exchange Web Services Automatically

Automating The Dull Routine With Python Learn to Handle Email Attachment Using Python Library Exchangelib Photo by Webaroo.com.au on Unsplash Do you need to download email attachments regularly? Do you want to automate this boring process? I know that feel bro. When I first came to my job, I was assigned a daily task: download …

How to use the power of “WHY” to achieve what you want

The data science path is not easy. If you’re a data scientist reading this now, you’ll know what I mean. It’s tough. It’s an ever-changing field. It’s dynamic. It’s moving fast. In other words, you need to learn fast and adapt quickly to keep up to date with the latest trends and technologies being used in the industry. The …

LondonR & R-Ladies to host ‘watch party’ at RStudio Conference, Jan 29th

Mango Solutions, a full service certified partner of RStudio, are delighted to be supporting the rstudio::conf 2020 conference in San Francisco, on January 27th-30th. This annual tech conference promises to be ‘all things R’ and includes an impressive agenda covering tutorials, presentations and lightning talks from recognised R experts and developers. As long-time champions …

Get Better: early career researcher development

[This article was first published on Rstats – quantixed, and kindly contributed to R-bloggers]. How can we contribute to the development of early career researchers …

Learning from cryptocurrency mining attack scripts on Linux

Cryptocurrency mining attacks continue to represent a threat to many of our Azure Linux customers. In the past, we’ve talked about how some attackers use brute force techniques to guess account names and passwords and use those to gain access to machines. Today, we’re talking about an attack that a few of our customers have …

Exploring the tightened EU CO2 emission standards for cars in 2020 – Why now selling an electric car can be worth an extra 18000€ for producers.

In this blog post I want to explore the EU regulation for average CO2 emissions of manufacturers’ car fleets using the EU data set of newly registered cars. The data was already studied on an aggregate level in my earlier post. Here we explore, in particular, the monetary implications of the regulatory details for different …

Business Case Analysis with R (Guest Post)

Learning Machines proudly presents a fascinating guest post by decision and risk analyst Robert D. Brown III with a great application of R in the business and especially startup-arena! I encourage you to visit his blog too: Thales’ Press. Have fun! Introduction While Excel remains the tool of choice among analysts for business case analysis, …

The Galactic Island Hypothesis

New research provides a quantitative solution to Fermi’s paradox Simulated settlement trajectories, showing how civilizations could spread through the Galaxy (source) Compared to the age of the Milky Way Galaxy, our 200,000-year-old human species has been around for only the blink of an eye. The Milky Way is at least 10 billion years …

Hands-on Web Scraping: Building your Twitter dataset with python and scrapy

This assumes that you have some basic knowledge of Python and Scrapy. If you are interested only in generating your dataset, skip this section and go to the sample crawl section on the GitHub repo. Gathering tweet URLs by searching through hashtags To search for tweets we will be using the legacy Twitter website. Let’s …

Guide to Dimensionality Reduction in single cell RNA-seq analysis

Image Source: Unsplash A major breakthrough in the omics area came in early 2000 with the single cell RNA sequencing (scRNA-seq) technology. The ability to isolate and sequence the genetic material of single cells allows researchers to identify which genes are active in each cell. This provides unprecedented opportunities over bulk RNA sequencing technologies, that …

Introduction to ggplot2 in R

Boxplots are another excellent tool for visualizing descriptive statistics. If you want to learn more about boxplots, check out this article from fellow Towards Data Science writer Michael Galarnyk. Below is a boxplot that shows the spread for all the rating sites.

    ggplot(data = reviews) +
      aes(x = Rating_Site, y = Rating, color = Rating_Site) +
      geom_boxplot() +
      labs(title = "Comparison of Movie Ratings")

…

A Tale of Two Cities — A mystery solved with Pandas

Could Perth really be wetter than Melbourne? Photo by Ricardo Resende on Unsplash Having recently moved from Melbourne to Perth, I found it natural to make comparisons between the two cities. Which one has better coffee? OK, that one is easy: Melbourne, hands down! Which one has more rain? Well, to answer that …

Basic Statistics You NEED to Know for Data Science

Numerical: data expressed with digits; it is measurable. It can be either discrete (countable values) or continuous (any value within a range).
Categorical: qualitative data classified into categories. It can be nominal (no order) or ordinal (ordered data).
Mean: the average of a dataset.
Median: the middle of an ordered dataset; less susceptible to outliers.
Mode: the …
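To make the definitions of mean, median and mode concrete, here is a minimal sketch using Python's standard `statistics` module. The sample values are made up for illustration; the second list adds a single extreme value to show why the median is described as less susceptible to outliers.

```python
import statistics

# Hypothetical sample, plus a copy with one extreme value appended.
ratings = [2, 3, 3, 4, 5]
ratings_with_outlier = ratings + [40]


def summarize(values):
    """Return the mean, median and mode of a list of numbers."""
    return {
        "mean": statistics.mean(values),      # arithmetic average
        "median": statistics.median(values),  # middle of the sorted values
        "mode": statistics.mode(values),      # most frequent value
    }


summary = summarize(ratings)
summary_with_outlier = summarize(ratings_with_outlier)
print(summary)
print(summary_with_outlier)
```

In this sample the single outlier pulls the mean far more than the median, and leaves the mode unchanged.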

Enterprise AI/Machine Learning: Lessons Learned

I recently had the privilege of participating on a panel with several AI/Machine Learning experts. There were many great questions, but most were related to how to most effectively establish AI/Machine Learning (AI/ML) in a large organization. This gave me an opportunity to reflect on my own experiences helping large enterprises accelerate their AI/Machine …

On the conjugate function

In the MAT7381 course (graduate course on regression models), we will talk about optimization, and a classical tool is the so-called conjugate. Given a function $f:\mathbb{R}^p\to\mathbb{R}$, its conjugate is the function $f^{\star}:\mathbb{R}^p\to\mathbb{R}$ such that $$f^{\star}(\boldsymbol{y})=\max_{\boldsymbol{x}}\lbrace\boldsymbol{x}^\top\boldsymbol{y}-f(\boldsymbol{x})\rbrace$$ so, long story short, $f^{\star}(\boldsymbol{y})$ is the maximum gap between the linear function $\boldsymbol{x}^\top\boldsymbol{y}$ and $f(\boldsymbol{x})$. Just to visualize, consider a simple …
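As a rough numerical illustration of the definition above (not the example from the original post, which is cut off here), the sketch below approximates the conjugate in one dimension by scanning a grid of x values. The test function f(x) = x²/2 is an assumption chosen because its conjugate is known in closed form, f*(y) = y²/2; the function names and grid bounds are hypothetical.

```python
def conjugate_on_grid(f, y, lo=-10.0, hi=10.0, steps=20001):
    """Approximate f*(y) = max_x (x*y - f(x)) by scanning a uniform grid of x values."""
    width = (hi - lo) / (steps - 1)
    return max((lo + i * width) * y - f(lo + i * width) for i in range(steps))


def half_square(x):
    """Test function f(x) = x^2 / 2, whose conjugate is y^2 / 2."""
    return 0.5 * x * x


y = 1.5
approx = conjugate_on_grid(half_square, y)
exact = 0.5 * y * y
print(approx, exact)
```

The grid search only works when the maximizer lies inside [lo, hi]; for f(x) = x²/2 the maximizer is x = y, so the default bounds are adequate for moderate values of y.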