I would use three different models (+ baseline) to emulate step-by-step “work” on the task: Mean model (baseline), Random predictions, Linear Regression, and Gradient Boosting over Decision Trees (LightGBM). In real-world problems, this is equivalent to a continuously improving model whose changes are pushed to the repository. One should also define metrics that estimate how good the … Read more Continuous quality evaluation for ML projects using GitHub Actions.
“The powers that be no longer have to stifle information. They can now overload us with so much of it, there’s no way to know what’s factual or not. The ability to be an informed public is only going to worsen with advancing deep fake technology.” J. Andrew Schrecker All of us have heard Donald … Read more How Deepfake Technology Can Become More Dangerous Than a Nuclear Weapon
Solving sim2real transfer for specialized object detectors with no budget Deep learning has recently become the favored approach to object detection problems. However, like with many other uses of this technology, annotating training data is cumbersome and time-consuming, especially if you are a small company with a specific use-case. In this article, I present some … Read more Training Object Detectors with No Real Data using Domain Randomization
Amazon Polly is a service that turns text into lifelike speech. Today, we are excited to announce the general availability of Amazon Polly standard voices in the Middle East (BAH) and Asia Pacific (HKG) regions. Customers in these regions can now synthesize 60+ standard voices available in 29 languages in the Polly portfolio.
Girl at tablet: Photo by Hal Gatewood on Unsplash Cutting through the hype surrounding artificial intelligence, François Chollet, an AI researcher at Google, has proposed the Abstraction and Reasoning Corpus (ARC), an intelligence test that could shape the course of future AI research. To date, there has been no satisfactory definition of artificial intelligence nor … Read more Who’s smarter? An IQ test for both AI systems and humans
Editor’s note: The Towards Data Science podcast’s “Climbing the Data Science Ladder” series is hosted by Jeremie Harris, Edouard Harris and Russell Pollari. Together, they run a data science mentorship startup called SharpestMinds. You can listen to the podcast below: Getting hired as a data scientist, machine learning engineer or data analyst is hard. And … Read more Mastering the data science job hunt
Do you own and operate a software service? If so, is your service a “platform”? In other words, does it run and manage applications of a wide range of users and/or companies? There are both simple and complex types of platforms, all of which serve customers. One example could be Google Cloud, which provides, among … Read more Using deemed SLIs to measure customer reliability
Cochran Theorem – from The distribution of quadratic forms in a normal system, with applications to the analysis of covariance published in 1934 – is probably the most important one in a regression course. It is an application of a nice result on quadratic forms of Gaussian vectors. More precisely, we can prove that if … Read more On Cochran Theorem (and Orthogonal Projections)
We can see that the connections are not particularly strong for either group (the diagonal line can be ignored as it shows correlation with itself and, thus, always equals 1). To better visualize the connections and the differences, we can project these back onto the brain. #Getting the center coordinates from the component decomposition … Read more Using neural networks for a functional connectivity classification of fMRI data
In November 2019, Appsilon Data Science became an RStudio Full Service Certified Partner. We now officially provide support services for existing Shiny applications, Plumber APIs, and RStudio infrastructure. Further, we now provide Managed Services for the R environment. This means that we can handle ongoing support for your servers and infrastructure to keep applications running … Read more Appsilon Data Science is now an RStudio Full Service Certified Partner
v1.0 Abstract For years I’ve been looking for a theory that will unify my understanding of intelligence and related concepts. Most of the ones I’ve encountered share some core ideas, so I’ve been trying to combine and distill them into a more formal and general framework. Here I want to propose PTI in an attempt to … Read more Physical Theory of Intelligence
Twitter Data Science Interview Questions Twitter Interviews Twitter is known for its news and debates and holds the title of the SMS of the world. Created in 2006 by Jack Dorsey, Noah Glass, Evan Williams, and Biz Stone, it has grown to have more than 321 million active users per month as well as 1.6 … Read more The Twitter Data Scientist Interview
Starting any project from scratch can be a daunting task… But not if you have this ultimate Python project blueprint! Original image by @sxoxm on Unsplash Whether you are working on some machine learning/AI project, building web apps in Flask or just writing some quick Python script, it’s always useful to have some template for … Read more Ultimate Setup for Your Next Python Project
[This article was first published on Thinking inside the box, and kindly contributed to R-bloggers]. New year, new RQuantLib! A new release 0.4.11 of … Read more RQuantLib 0.4.11: More polish
We’ve released our newest Azure blueprint that maps to another key industry-standard, the Center for Internet Security (CIS) Microsoft Azure Foundations Benchmark. This follows the recent announcement of our Azure blueprint for FedRAMP moderate and adds to the growing list of Azure blueprints for regulatory compliance, which now includes ISO 27001, NIST SP 800-53, PCI-DSS, … Read more New Azure blueprint for CIS Benchmark
A Summary and Review of the New Strategy for AI On The Day of Its Launch 14th of January This day is special to me, because I have been covering most of the AI strategies in Europe, and today my home country has released their own national strategy. The Norwegian national strategy was released on … Read more The Norwegian National Strategy for Artificial Intelligence Has Launched!
Running a data science team is hard! We need to take inspiration wherever we can, and the culture of Nelson’s navy is one place to start. Portrait of Nelson by Lemuel Francis Abbott “no captain can do very wrong if he places his ship alongside that of the enemy” Admiral Horatio Nelson, before the battle … Read more Run your data science team like an Admiral …
How a team obtained private data, constructed a fake AI model, and got away with the money from a platform for adopting neglected pets The cheaters stole from Petfinder.my, a platform for adopting homeless and neglected pets. [pixabay image] Kaggle just announced that the 1st Place Team, Bestpetting, has been disqualified from the Petfinder.my competition … Read more Kaggle 1st place winner cheated, $10,000 prize declared irrecoverable
Two reasons why SQL will never, ever die Last week a friend forwarded me an email from a successful entrepreneur that pronounced “SQL is dead.” The entrepreneur claimed that wildly popular No-SQL databases like MongoDB and Redis would slowly strangle SQL-based databases out of existence, and therefore learning SQL as a data scientist was … Read more Is No-SQL killing SQL?
Increasing host revenue with regression and time series analysis [This project was done as part of an immersive data science program called Metis. You can find the files for this project at my GitHub and the slides here. The final project is accessible here (interactive web app).] I recently designed a new approach to automatic … Read more Smarter Pricing for Airbnb Using Machine Learning
Conclusions Let’s say you vacation to a remote island for a week with no connection to the outside world. When you come back, Team A and Team B played the 100 games and your friend asks “Hey, what if I told you they only won 93 games?” “HA, I’d say the probability of that happening … Read more Wait, so what’s a P-Value?
3 techniques to make your data analysis faster Pandas has exceptional features for analyzing time-series data, including automatic datetime parsing, advanced filtering capabilities, and several datetime-specific plotting functions. I find myself using those features almost every day, but it took me a long time to discover them: many of Pandas’ datetime capabilities are not immediately … Read more Getting started with Pandas time-series functionality
[This article was first published on Educators R Learners, and kindly contributed to R-bloggers]. I was reading r/MapPorn and saw the image below: As fate … Read more Mapping World Languages’ Difficulty Relative to English
Lottery Ticket Analysis I analyzed past lottery data to decide whether to buy a lottery ticket using statistics and probability. Photo by dylan nolte on Unsplash I often find myself deciding whether or not to buy a lottery ticket, especially for the Powerball New Year’s Eve draw. The reason why I feel hesitant in those moments … Read more Should I buy a lottery ticket?
Movie Data and Box Office Numbers In order to build my prediction algorithm, I gathered movie data from a couple of online sources. I obtained the bulk of my data from the Internet Movie Database (IMDb) which provides a set of files for free download. However, the IMDb files do not contain data on estimated movie … Read more Predicting Movie Profitability and Risk at the Pre-production Phase
[This article was first published on R – Xi’an’s Og, and kindly contributed to R-bloggers]. Following an X validated question on how to simulate a … Read more an elegant sampler
The American education world, it seems, is just now catching up to what Data Scientists have known for years. You learn from your mistakes. You’re not alone if it sounds farfetched that our education system is just now figuring this out. It’s an age-old adage, but somehow it skipped applied education theory. Don’t believe … Read more Learning from the ROC
G4 instances provide the latest generation NVIDIA T4 Tensor Core GPUs, AWS custom second generation Intel® Xeon® Scalable (Cascade Lake) processors, up to 50 Gbps of networking throughput, and up to 900 GB of local NVMe storage. They are optimized for machine learning application deployments such as image classification, object detection, recommendation engines, automated speech … Read more AWS Now Offers NVIDIA Quadro Virtual Workstations for EC2 G4 Instances at No Additional Cost
[This article was first published on R – Win-Vector Blog, and kindly contributed to R-bloggers]. We’ve been experimenting with this for a while, and the … Read more sklearn Pipe Step Interface for vtreat
NLP, Knowledge Graphs, and the Three Pillars of Intelligence Jibo, Echo/Alexa, Google Home The Three Pillars of Intelligence To Amazon, the reception for its voice agent, Alexa, was a big surprise. Apple’s Siri had put voice input onto smartphones. But here was a new class of device that you could shout at across the kitchen … Read more How Do Conversational Agents Answer Questions?
NLP: Sentiment Analysis, Word Embeddings and Topic Modelling of 3.8K tweets Last week, from the 7th to 9th of January, Oxford hosted the well-established, traditional and businessy Oxford Farming Conference (OFC) and its antidote Oxford Real Farming Conference (ORFC). Both aim to connect actors involved in the agricultural and food sector to tackle the challenges … Read more Oxford (Real) Farming Conference 2020
Today, Google is excited to announce that it has acquired AppSheet, a leading no-code application development platform used by a number of enterprises across a variety of industries. The demand for faster processes and automation in today’s competitive landscape requires more business applications to be built with greater speed and efficiency. However, many companies lack … Read more Google acquires AppSheet to help businesses create and extend applications—without coding
Cloud computing has become a fundamental requirement for most organizations. With this in mind, cloud computing is massively on the rise in the current day and age. In fact, 81 percent of companies with 1,000 employees or more have a multi-platform strategy. The number is expected to rise to more than 90 percent by 2024. Between … Read more Exploring the Future of Cloud Computing in 2020 and Beyond
Commercial Activity Data: Points of Interest (POIs) This dataset provides information on POIs. Cells are enriched with the following data from this source: Number of competitors within a 250-meter buffer from the cell’s centroid. Number of POIs within a 250-meter buffer from the cell’s centroid. Here we use POIs as a proxy for commercial activity. … Read more Site Planning for Market Coverage Optimization with Mobility Data
I swear by the name of Science that the evidence I shall give shall be the truth, the whole truth, and nothing but the truth. About the simulation:

    from random import *

    def roll():
        result = randint(1, 36)
        results.append(result)

    results = []
    for i in range(1000000):
        roll()

The script simulates 1000000 roulette outcomes within a second. At each simulation, a random whole … Read more The truth about the martingale betting system
Modern CPUs have undergone significant development. But how does MonetDB exploit this development to maximize its performance? Computer processors have significantly developed in the last three decades. This development involves not only the increasing number of transistors they hold but also the evolution of the architecture. Hence, an application needs to adapt to how the … Read more How MonetDB/X100 Exploits Modern CPU Performance
Source: Disney A definition to save the dataframe from extinction Dataframes emerged from a specific need, but because so many diverse systems now call themselves dataframes, the term is on the verge of meaning nothing. In an effort to preserve the dataframe, we formalized the definition based on the original data model in our recent … Read more Preventing the Death of the Dataframe
Financial habits have historically been something that people keep at the back of their minds, but with the rising amount of information and tools available, a new attitude to financial control is driving the growing popularity of transparent, digital banking. A new breed of financial institutions (such as Monzo, Starling, Revolut and N26) are leveraging digital products to … Read more Visualising spending behaviour through open banking and GIS
Automating The Dull Routine With Python Learn to Handle Email Attachment Using Python Library Exchangelib Photo by Webaroo.com.au on Unsplash Do you need to download email attachments regularly? Do you want to automate this boring process? I know that feel bro. When I first came to my job, I was assigned a daily task: download … Read more Download Email Attachment from Microsoft Exchange Web Services Automatically
The data science path is not easy. If you’re a data scientist reading this now, you’ll know what I mean. It’s tough. It’s an ever-changing field. It’s dynamic. It’s moving fast. In other words, you need to learn fast and adapt quickly to keep up-to-date with the latest trends and technology being used in the industry. The … Read more How to use the power of “WHY” to achieve what you want
Mango Solutions, a full service certified partner of RStudio, are delighted to be supporting the rstudio::conf 2020 conference in San Francisco, on January 27th-30th. This annual tech conference promises to be ‘all things R’ and includes an impressive agenda covering tutorials, presentations and lightning talks from recognised R experts and developers. As long-time champions … Read more LondonR & R-Ladies to host ‘watch party’ at RStudio Conference, Jan 29th
[This article was first published on Rstats – quantixed, and kindly contributed to R-bloggers]. How can we contribute to the development of early career researchers … Read more Get Better: early career researcher development
Cryptocurrency mining attacks continue to represent a threat to many of our Azure Linux customers. In the past, we’ve talked about how some attackers use brute force techniques to guess account names and passwords and use those to gain access to machines. Today, we’re talking about an attack that a few of our customers have … Read more Learning from cryptocurrency mining attack scripts on Linux
In this blog post I want to explore the EU regulation for average CO2 emissions of manufacturers’ car fleets using the EU data set of newly registered cars. The data was already studied on an aggregate level in my earlier post. Here we explore, in particular, the monetary implications of the regulatory details for different … Read more Exploring the tightened EU CO2 emission standards for cars in 2020 – Why now selling an electric car can be worth an extra 18000€ for producers.
Learning Machines proudly presents a fascinating guest post by decision and risk analyst Robert D. Brown III with a great application of R in the business and especially startup-arena! I encourage you to visit his blog too: Thales’ Press. Have fun! Introduction While Excel remains the tool of choice among analysts for business case analysis, … Read more Business Case Analysis with R (Guest Post)
[This article was first published on MilanoR, and kindly contributed to R-bloggers]. The new year has started and we at MilanoR are deep into the … Read more eRum2020: call for submissions open!
When to bet big and when to bet small A few weeks ago, my friends and I were playing bar trivia at our favorite bar in town. For those who don’t know, bar trivia is a competition where teams try to answer progressively harder trivia questions for points. The team with the most points at … Read more How to wager at the final questions in bar trivia
New research provides a quantitative solution to Fermi’s paradox Simulated settlement trajectories, showing how civilizations could spread through the Galaxy (source) Compared to the age of the Milky Way Galaxy, our 200,000-year-old human species has been around for just the blink of an eye. The Milky Way is at least 10 billion years … Read more The Galactic Island Hypothesis
apply and lambda are some of the best things I have learned to use with pandas. I use apply and lambda anytime I get stuck while building complex logic for a new column or filter. Let’s see if we can use them in cuDF also. a. Creating a Column You can create a new … Read more Minimal Pandas Subset for Data Scientists on GPU
This assumes that you have some basic knowledge of Python and Scrapy. If you are interested only in generating your dataset, skip this section and go to the sample crawl section on the GitHub repo. Gathering tweet URLs by searching through hashtags For searching for tweets we will be using the legacy Twitter website. Let’s … Read more Hands-on Web Scraping: Building your Twitter dataset with python and scrapy