Scraping a table in a PDF, reliably and then test data quality

How to scrape a table within a PDF in Python, unit test the data for quality, and then upload it to S3. Suppose you need to ingest some data into your data warehouse and, after further discussions with your stakeholders, it turns out that the source of this data is a PDF document. …
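The article's own scraping and test code is not part of this excerpt. As a minimal sketch of the "unit test the data for quality" step, assume the PDF table has already been extracted into a list of row dictionaries whose values are strings. The function name `check_table_quality` and the column names `item` and `amount` are hypothetical.

```python
from typing import Dict, List

Row = Dict[str, str]


def check_table_quality(
    rows: List[Row], required: List[str], numeric: List[str], key: str
) -> List[str]:
    """Return human-readable problems found in an extracted table; [] means it passed."""
    if not rows:
        return ["table is empty"]
    problems: List[str] = []
    seen_keys = set()
    for i, row in enumerate(rows):
        # Required columns must be present and not blank.
        for col in required:
            if not row.get(col, "").strip():
                problems.append(f"row {i}: missing {col}")
        # Numeric columns must parse as numbers once thousands separators are dropped.
        for col in numeric:
            raw = row.get(col, "")
            try:
                float(raw.replace(",", "").strip())
            except ValueError:
                problems.append(f"row {i}: {col}={raw!r} is not numeric")
        # The key column must be unique; this can catch, e.g., a row repeated across a page break.
        value = row.get(key)
        if value in seen_keys:
            problems.append(f"row {i}: duplicate {key} {value!r}")
        seen_keys.add(value)
    return problems


rows = [
    {"item": "A", "amount": "1,200"},
    {"item": "B", "amount": "n/a"},
    {"item": "A", "amount": "3"},
]
print(check_table_quality(rows, ["item", "amount"], ["amount"], "item"))
# ["row 1: amount='n/a' is not numeric", "row 2: duplicate item 'A'"]
```

In a test suite, asserting that the function returns `[]` lets a bad extraction fail the run before anything is uploaded to S3.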

Amazon AppFlow now supports AWS CloudFormation

Amazon AppFlow now supports AWS CloudFormation for creating and configuring Amazon AppFlow resources, such as connector profiles and flows, along with the rest of your AWS infrastructure in a secure, efficient, and repeatable way. Amazon AppFlow is a fully managed integration service that enables customers to securely transfer data between AWS services and software-as-a-service …

Machine Learning on AWS SageMaker

Before we jump into this, let’s explain what we need to have in place. I’ll be quick, promise! Setup preparation: Amazon S3. Amazon S3 is a storage service that lets us store and protect our data in containers called buckets. We will need this service to go forward. Buckets: a bucket is a container for objects stored …

AWS Secrets Manager has been OSPAR assessed and approved

Security and compliance, including OSPAR, is a shared responsibility between AWS and you. For example, it is your responsibility to configure and manage secrets stored in Secrets Manager to meet ABS Guidelines. To learn more about the actions you may need to take to meet ABS Guidelines, read the AWS Cloud Compliance and OSPAR compliance …

Amazon Comprehend now helps you mask personally identifiable information from text documents

Amazon Comprehend is a natural language processing (NLP) service that uses machine learning to find insights and relationships in text. It provides pre-trained models for recognizing entities, key phrases, sentiments, and other common elements in a document. You can also build custom models with Amazon Comprehend to recognize custom entities and classify documents. Amazon Comprehend …
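The excerpt cuts off before describing how masking works. As a minimal client-side sketch, the function below masks spans given entities shaped like the `Entities` list returned by Comprehend's `DetectPiiEntities` operation (`Type`, `Score`, `BeginOffset`, `EndOffset`). It assumes `EndOffset` is exclusive, as in Python slicing. The sample entities are hand-written for illustration, not real service output, and `mask_pii` and `min_score` are hypothetical names.

```python
from typing import Dict, List


def mask_pii(text: str, entities: List[Dict], mask_char: str = "*", min_score: float = 0.5) -> str:
    """Replace each detected PII span with mask characters, keeping the text length unchanged."""
    chars = list(text)
    for entity in entities:
        # Skip low-confidence detections instead of masking them.
        if entity["Score"] < min_score:
            continue
        # Assumes EndOffset is exclusive, i.e. text[BeginOffset:EndOffset] is the entity.
        for i in range(entity["BeginOffset"], entity["EndOffset"]):
            chars[i] = mask_char
    return "".join(chars)


text = "Call Jane Doe at 555-0100"
entities = [  # hand-written example in the DetectPiiEntities response shape
    {"Type": "NAME", "Score": 0.99, "BeginOffset": 5, "EndOffset": 13},
    {"Type": "PHONE", "Score": 0.98, "BeginOffset": 17, "EndOffset": 25},
]
print(mask_pii(text, entities))  # Call ******** at ********
```

Keeping the length unchanged means the offsets of any other detected entities stay valid; a variant could instead replace each span with its `Type` label.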

Amazon Kinesis Data Analytics is now available in the Europe (Milan) AWS region

Amazon Kinesis Data Analytics is the easiest way to transform and analyze streaming data in real time with Apache Flink. Apache Flink is an open source framework and engine for processing data streams. Amazon Kinesis Data Analytics reduces the complexity of building and managing Apache Flink applications. Amazon Kinesis Data Analytics for Apache Flink integrates …

Amazon Route 53 Resolver Now Supports VPC DNS Query Logging in AWS GovCloud (US) Regions

Route 53 Resolver is the Amazon DNS server (also sometimes referred to as “AmazonProvidedDNS” or the “.2 resolver”) that is available by default in all Amazon VPCs. Route 53 Resolver responds to DNS queries from AWS resources within a VPC for public DNS records, Amazon VPC-specific DNS names, and Amazon Route 53 private hosted zones. …
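The “.2 resolver” nickname comes from where the resolver lives: at the base of the VPC's primary IPv4 CIDR range plus two (it is also reachable at 169.254.169.253). As a small standard-library illustration, the hypothetical helper `vpc_resolver_ip` computes that address from a CIDR block.

```python
import ipaddress


def vpc_resolver_ip(primary_cidr: str) -> str:
    """Return the Route 53 Resolver (AmazonProvidedDNS) address for a VPC's primary IPv4 CIDR."""
    # IPv4Network is strict by default, so a CIDR with host bits set raises ValueError.
    network = ipaddress.IPv4Network(primary_cidr)
    return str(network.network_address + 2)


print(vpc_resolver_ip("10.0.0.0/16"))    # 10.0.0.2
print(vpc_resolver_ip("172.31.0.0/16"))  # 172.31.0.2
```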

Amazon AppFlow now supports new data formats for ingesting files into Amazon S3

Amazon AppFlow, a fully managed integration service that enables customers to securely transfer data between AWS services and software-as-a-service (SaaS) applications, now offers customers the flexibility to choose JSON, comma-separated values (CSV), or Parquet as the file format when transferring data from a source application to Amazon S3. This feature is supported for all source …

Discord notification using CloudWatch Alarms, SNS and AWS Lambda

Select a metric. First of all, you need to choose a CloudWatch metric for the alarm to watch. For a Lambda function there are three types of metrics. Invocation metrics are binary indicators of the outcome of an invocation, for example Invocations, Errors, DeadLetterErrors, DestinationDeliveryFailures, and Throttles. Performance metrics give performance details about a single invocation, such as Duration, …
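To connect the pieces in the title, the SNS topic that receives the alarm can invoke a Lambda function, which forwards the alarm to a Discord webhook. The sketch below assumes the standard SNS-to-Lambda event shape (`Records[].Sns.Message`, where the message is the alarm's JSON with fields such as `AlarmName`, `NewStateValue`, and `NewStateReason`) and a Discord webhook body with a `content` field. The `DISCORD_WEBHOOK_URL` environment variable and the function names are hypothetical.

```python
import json
import os
import urllib.request
from typing import Any, Dict, List

DISCORD_CONTENT_LIMIT = 2000  # Discord rejects message content longer than this


def build_discord_payloads(event: Dict[str, Any]) -> List[Dict[str, str]]:
    """Turn each CloudWatch alarm delivered through SNS into a Discord webhook body."""
    payloads = []
    for record in event.get("Records", []):
        alarm = json.loads(record["Sns"]["Message"])  # SNS wraps the alarm JSON as a string
        content = (
            f"**{alarm['AlarmName']}** is now {alarm['NewStateValue']}\n"
            f"{alarm.get('NewStateReason', '')}"
        )
        payloads.append({"content": content[:DISCORD_CONTENT_LIMIT]})
    return payloads


def lambda_handler(event, context):
    """Lambda entry point: post every alarm in the event to the configured webhook."""
    url = os.environ["DISCORD_WEBHOOK_URL"]  # hypothetical configuration variable
    payloads = build_discord_payloads(event)
    for payload in payloads:
        request = urllib.request.Request(
            url,
            data=json.dumps(payload).encode("utf-8"),
            # An explicit User-Agent avoids relying on urllib's default one.
            headers={"Content-Type": "application/json", "User-Agent": "cloudwatch-discord-notifier"},
            method="POST",
        )
        with urllib.request.urlopen(request, timeout=10) as response:
            response.read()
    return {"sent": len(payloads)}
```

Keeping the formatting in `build_discord_payloads`, separate from the HTTP call, makes it easy to unit test the message text without a real webhook.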

Amazon S3 bucket owner condition helps to validate correct bucket ownership

S3 request APIs can now include an optional bucket owner condition parameter containing an AWS account ID, which helps customers verify that the specified AWS account ID is associated with the bucket they are communicating with. When the bucket owner condition is used, S3 API requests succeed only if the bucket owner matches the …
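Over the REST API, this condition travels as the `x-amz-expected-bucket-owner` request header; in boto3 it is exposed as the `ExpectedBucketOwner` parameter. The small sketch below only validates the account ID and builds that header. `expected_owner_headers` is a hypothetical helper, and signing and sending the request are left to your HTTP client or SDK.

```python
import re
from typing import Dict

ACCOUNT_ID = re.compile(r"[0-9]{12}")  # AWS account IDs are 12 digits


def expected_owner_headers(account_id: str) -> Dict[str, str]:
    """Return the header that makes S3 reject the request if another account owns the bucket."""
    if not ACCOUNT_ID.fullmatch(account_id):
        raise ValueError(f"not a 12-digit AWS account ID: {account_id!r}")
    return {"x-amz-expected-bucket-owner": account_id}


print(expected_owner_headers("111122223333"))
```

With boto3, the equivalent would be something like `s3.get_object(Bucket="example-bucket", Key="data.csv", ExpectedBucketOwner="111122223333")` (bucket and key are placeholders). If a different account owns the bucket, S3 fails the request with HTTP 403 (Access Denied) instead of serving it.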

Amazon CloudWatch Synthetics now supports enhanced monitoring for Broken Link and GUI Workflow Blueprints

Broken password reset links or misconfigured buttons that prevent customers from taking an action often go unnoticed unless end customers report them. With CloudWatch Synthetics, you can continuously verify your customer experience even when there is no customer traffic on your web applications. This lets you discover issues before your customers do and react quickly to …

Announcing Data API for Amazon Redshift

Amazon Redshift can now be accessed using the built-in Data API, making it easy to build web-service-based applications and to integrate with services such as AWS Lambda, AWS AppSync, and AWS Cloud9. The Redshift Data API simplifies data access, ingest, and egress from languages supported by the AWS SDK, such as Python, Go, Java, Node.js, PHP, Ruby, and …
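The Data API is asynchronous: you submit a statement, poll its status, and then fetch the result. A minimal sketch, assuming `client` behaves like `boto3.client("redshift-data")` and the cluster is accessed with temporary credentials via `DbUser`. `run_query` is a hypothetical helper, and result pagination (`NextToken`) is not handled.

```python
import time
from typing import Any, Dict

TERMINAL_STATES = {"FINISHED", "FAILED", "ABORTED"}


def run_query(client: Any, sql: str, cluster: str, database: str, db_user: str,
              poll_seconds: float = 1.0, timeout: float = 300.0) -> Dict[str, Any]:
    """Submit SQL through the Redshift Data API, wait for it to finish, and return the result."""
    statement_id = client.execute_statement(
        ClusterIdentifier=cluster, Database=database, DbUser=db_user, Sql=sql
    )["Id"]
    deadline = time.monotonic() + timeout
    # Poll until the statement reaches a terminal state or we give up.
    while True:
        description = client.describe_statement(Id=statement_id)
        status = description["Status"]
        if status in TERMINAL_STATES:
            break
        if time.monotonic() > deadline:
            raise TimeoutError(f"statement {statement_id} still {status} after {timeout}s")
        time.sleep(poll_seconds)
    if status != "FINISHED":
        raise RuntimeError(f"statement {statement_id} {status}: {description.get('Error', '')}")
    # Statements such as INSERT produce no result set to fetch.
    if not description.get("HasResultSet", False):
        return {"Records": []}
    return client.get_statement_result(Id=statement_id)
```

Passing the client in, rather than creating it inside the function, keeps the helper testable with a stub. With a real client, `Records` holds rows of typed fields such as `{"longValue": 1}`.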

Amazon EKS now supports assigning EC2 security groups to Kubernetes pods

Previously, all pods on a node shared the same security groups. While IAM roles for service accounts solves the pod-level security challenge at the authentication layer, many organizations’ compliance requirements also mandate network segmentation as an additional defense-in-depth step. Kubernetes network policies provide an option for controlling network traffic within the cluster, …

EKS Now Supports Creation and Management of Fargate Profiles Using AWS CloudFormation

EKS Fargate profiles define which pods for your Amazon EKS clusters run on AWS Fargate, the AWS managed compute engine for containers. Previously, it was only possible to create and manage Fargate profiles using the EKS API or Console. Now, you can create and manage Fargate profiles using AWS CloudFormation. This means that you …

The Story of Data — Privacy By Design

Discuss the need for adopting frameworks like Privacy By Design very early in your data management life cycle. Every byte of data has a story to tell. The question is whether the story is being narrated accurately and securely. Usually, we focus sharply on the trends around data with a goal of …

AWS Launch Wizard now supports SAP deployments with SUSE Linux Enterprise Server 15 SP1 and 12 SP5

AWS Launch Wizard offers a guided way of sizing, configuring, and deploying AWS resources for SAP HANA and SAP HANA-based NetWeaver systems with a purpose-built, easy-to-use wizard. The following table shows all of the operating systems currently supported for different SAP components that can be deployed with AWS Launch Wizard: AWS Launch …

Meetings readiness checker APIs help developers ensure that end-users can join Amazon Chime SDK meetings from their devices

From the Amazon Chime SDK for JavaScript, a developer can call any of the nine meeting readiness checker methods. These consist of local tests for device setup and network tests that confirm the application can connect to Amazon Chime by briefly joining and leaving a test Amazon Chime SDK meeting. When executing network tests, the …

Amazon Lightsail now offers new OS blueprints

In addition to providing compute instances preinstalled with your favorite OS, Lightsail bundles include storage and a generous amount of data transfer, so you have everything you need to get up and running, all for a fixed monthly price. After your bundles are deployed, Lightsail’s intuitive management console makes it easy to track metrics, create …

AWS X-Ray launches anomaly detection-based actionable insights in preview

With this feature, you can determine the root cause of the issue, visualize the upstream and downstream services affected by the anomaly, and understand its impact on your end users. You can also view the incident timeline to understand when the issue started and how it progressed. AWS X-Ray Insights is available in the following …

Amazon CloudFront announces support for TLSv1.3 for viewer connections

Better performance: TLSv1.3 provides better performance with a simpler handshake process that requires fewer round trips. TLSv1.3 requires one round trip (1-RTT) to negotiate a new secure connection, compared to two round trips (2-RTT) for TLSv1.2, which translates into real-world performance improvements with lower first-byte latency. In our own internal tests in the US region …
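To make the round-trip difference concrete, here is a simplified model of the delay before a client can send its first HTTP request on a brand-new connection: one round trip for the TCP handshake plus the TLS round trips. It deliberately ignores server processing time, session resumption, and TLS 1.3 0-RTT data, and the 40 ms round-trip time is an assumed example value, not a measurement.

```python
def handshake_delay_ms(rtt_ms: float, tls_round_trips: int, tcp_round_trips: int = 1) -> float:
    """Delay before the first HTTP request can be sent on a new connection (simplified model)."""
    return rtt_ms * (tcp_round_trips + tls_round_trips)


rtt = 40.0  # assumed client-to-edge round-trip time in milliseconds
tls12 = handshake_delay_ms(rtt, tls_round_trips=2)  # TLSv1.2 full handshake: 2-RTT
tls13 = handshake_delay_ms(rtt, tls_round_trips=1)  # TLSv1.3 full handshake: 1-RTT
print(tls12, tls13, tls12 - tls13)  # 120.0 80.0 40.0
```

Under this model, TLSv1.3 saves exactly one RTT per new connection, so the absolute saving grows with the round-trip time between viewer and edge location.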

Presto Federated Queries

Getting Started with Presto Federated Queries using Ahana’s PrestoDB Sandbox on AWS. According to The Presto Foundation, Presto (aka PrestoDB), not to be confused with PrestoSQL, is an open-source, distributed, ANSI SQL compliant query engine built for running interactive, ad-hoc analytic queries against data sources of all sizes ranging from …

AWS Cost & Usage Report now offers Monthly Granularity

We are excited to announce that management (payer) accounts can now set up AWS Cost & Usage reports at a monthly level. The AWS Cost & Usage Report contains the most comprehensive set of billing data available. In addition to the amount and corresponding cost of your AWS service usage it also includes metadata such …

AWS announces General Availability of Amazon GameLift feature update

In April, we announced the release of this update to GameLift FleetIQ in preview. Configurations you set during preview will continue to work in GA. In addition to the preview feature set, we are also introducing new improvements for GameLift FleetIQ that enable you to use only On-Demand Instances and check game server instance statuses. As a …

Pause and Resume Workloads on M5a and R5a Instances with Amazon EC2 Hibernation

Hibernation saves effort in setting up the applications from scratch, saves time by reducing the bootstrapping time taken by applications, and saves cost by pausing the EC2 instances when not required. By using Hibernation, you can maintain a fleet of pre-warmed instances to get to a productive state faster without modifying your existing applications. Hibernation …

5 essential tips when using Apache Airflow to build an ETL pipeline for a database hosted on…

Tip 1: Start with the simplest DAG. Your DAG, the high-level outline that defines tasks in a particular order, should be as simple as possible. This is an obvious best practice in programming, but one that is easy to forget. Why should we start with a simple DAG? Below is the final DAG configuration requirement for my …
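The idea of tasks in a particular order can be illustrated without Airflow itself. A DAG is a mapping from each task to the tasks it depends on, and any topological order of it is a valid run order. This minimal sketch uses Python's standard `graphlib` module (Python 3.9+). The task names are hypothetical; in Airflow, the same chain would typically be written `extract >> transform >> load`.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical minimal ETL DAG: each task maps to the set of tasks it depends on.
simple_dag: Dict[str, Set[str]] = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
}


def run_order(dag: Dict[str, Set[str]]) -> List[str]:
    """Return one valid execution order; raises graphlib.CycleError if the graph has a cycle."""
    return list(TopologicalSorter(dag).static_order())


print(run_order(simple_dag))  # ['extract', 'transform', 'load']
```

A straight chain like this has exactly one valid order, which keeps debugging simple. Branches can be added once each step works on its own.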

AWS Site-to-Site VPN now supports Internet Key Exchange (IKE) initiation

This feature is now available in these AWS Regions: US East (N. Virginia), US East (Ohio), US West (Oregon), US West (N. California), EU (Ireland), EU (Frankfurt), EU (London), EU (Paris), EU (Stockholm), Asia Pacific (Singapore), Asia Pacific (Hong Kong), Asia Pacific (Tokyo), Asia Pacific (Sydney), Asia Pacific (Seoul), Asia Pacific (Mumbai), Middle East (Bahrain), …

Amazon SNS launches client library supporting message payloads of up to 2 GB

Large payload transmissions are billed as one SNS request and one S3 request, with billing for the payload based on the amount of data stored in S3. If you need to deliver messages with large payloads from Amazon SNS to Amazon SQS, you can use the existing SQS Extended Client Library, which automates the retrieval …
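The extended client relies on a claim-check pattern: a payload too large for an SNS message is stored in S3, and only a small reference is published. This is a stdlib-only sketch of that decision, not the library's implementation. `prepare_message`, the `store` callback (which in practice would wrap an S3 `put_object` call), and the JSON pointer format are all hypothetical. The size check also ignores message attributes, which count toward the 256 KB SNS limit as well.

```python
import json
import uuid
from typing import Callable

SNS_MAX_BYTES = 256 * 1024  # SNS message size limit (256 KB)


def prepare_message(body: str, store: Callable[[str, bytes], None], bucket: str,
                    threshold: int = SNS_MAX_BYTES) -> str:
    """Return the text to publish: the body itself, or a pointer to a copy stored in S3."""
    data = body.encode("utf-8")
    if len(data) <= threshold:
        return body  # small enough to publish directly
    key = str(uuid.uuid4())  # unique object key for the offloaded payload
    store(key, data)  # e.g. a wrapper around s3.put_object(Bucket=bucket, Key=key, Body=data)
    return json.dumps({"s3BucketName": bucket, "s3Key": key})
```

The consumer reverses the step: if a message parses as a pointer, it fetches the object from S3 before processing. That retrieval is what the SQS Extended Client Library automates on the receiving side.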

Web application to control a swarm of Raspberry Pis with an AI-enabled inference engine — Part 2

The robot arm arrived from AliExpress. It had no specific brand and was advertised as a DIY toy, which made it an affordable option: it cost me only $118 (+$40 in tariffs). Since gripping objects with a robot arm requires a lot of precision, which I could not possibly expect …

Distributed data pipelines made easy with AWS EKS and Prefect

How to set up a distributed cloud workflow orchestration system within minutes and focus on providing value rather than on managing clusters. Building distributed systems for ETL & ML data pipelines is hard. If you have tried implementing one yourself, you may have experienced that tying together a workflow orchestration …

AWS App Mesh controller for Kubernetes Version 1.1.1 now available with support for new mesh configuration controls

The AWS App Mesh controller for Kubernetes provides a way to configure and manage AWS App Mesh directly using Kubernetes. AWS App Mesh is a service mesh that provides application-level networking to standardize how your services communicate, giving you end-to-end visibility and ensuring high availability for your applications.

AWS Database Migration Service now supports MongoDB 4.0 as a source

AWS Database Migration Service (AWS DMS) has expanded functionality by adding support for MongoDB 4.0 as a source in AWS DMS v3.4.1. Using DMS, you can now perform live migrations from MongoDB 4.0 clusters to any AWS DMS supported target, including Amazon DocumentDB (with MongoDB compatibility), with minimal downtime. For the full list of supported …

Deploying Sklearn Machine Learning on AWS Lambda with SAM

On the command line, initialize a SAM application:
$ sam init
This will ask you four questions, which are used to build a starting template for our SAM application. Answer the questions as follows.
Prompts:
Which template source would you like to use? 1 - AWS Quick Start Templates
Which runtime would you like …

With an AWS Copilot, Give Kubernetes a Second Thought — Life With Data

New ECS CLI convenience makes it easier to deploy containers without managing infrastructure. Kubernetes is a fantastic container orchestration tool for scalable cloud computing applications. After years of development and use internally, Google open-sourced the tool in 2014, leading to an explosion of adoption from small businesses and enterprises …

Amazon Transcribe now supports speaker labeling for streaming transcription

Amazon Transcribe can label between two and 10 speakers within a single live audio stream. Popular use cases that can leverage speaker labels include real-time contact center phone calls, audio, live media broadcasts, and even patient-clinician interactions during telehealth sessions. Speaker labeling for streaming audio is supported at no additional cost in all AWS Regions …

AWS Firewall Manager now supports security groups on Application Load Balancers and Classic Load Balancers

AWS Firewall Manager now supports security groups on Application Load Balancers and Classic Load Balancers, allowing you to centrally configure and audit security groups associated with these resource types, across multiple accounts in your organization. Firewall Manager today supports security groups associated with EC2 instances and Elastic Network Interfaces (ENIs). With this launch, you can …

Amazon EC2 M6g, C6g and R6g instances powered by AWS Graviton2 processors are now available in Asia Pacific (Mumbai, Singapore, Sydney) regions

These instances are powered by AWS Graviton2 processors that are built utilizing 64-bit Arm Neoverse cores and custom silicon designed by AWS. AWS Graviton2 processors deliver a major leap in performance and capabilities over first-generation AWS Graviton processors, with 7x performance, 4x the number of compute cores, 2x larger caches, and 5x faster memory. AWS Graviton2 processors …

Amazon Corretto 8 & 11 support extended

Amazon is extending long-term support (LTS) for Amazon Corretto 8 from June 2023 to May 2026 and for Amazon Corretto 11 from August 2024 to September 2027. Long-term support (LTS) for Corretto includes security updates and specific performance enhancements released at least quarterly. Amazon Corretto is a no-cost, multi-platform, production-ready distribution of OpenJDK.

AWS Storage Gateway adds data protection features for Tape Gateway

WORM-enabled virtual tapes ensure that data on active tapes in your virtual tape library cannot be overwritten or erased, providing you an added layer of protection against malicious or accidental deletion of data. This new capability complements WORM capability of virtual tapes archived in Amazon S3 Glacier and Amazon S3 Glacier Deep Archive, providing you …

Amazon Kinesis Data Streams announces two new API features to simplify consuming data from Kinesis streams

Kinesis Client Library (KCL) helps you quickly build custom consumer applications by handling complex issues such as adapting to changes in stream volume, load-balancing streaming data, coordinating distributed workers, and processing data with fault-tolerance. KCL enables you to focus on business logic while building consumer applications. Customers using the latest KCL versions, KCL 1.14 for …

Amazon EKS support for Arm-based instances powered by AWS Graviton is now generally available

AWS Graviton processors are custom built by Amazon Web Services using 64-bit Arm Neoverse cores to deliver the best price performance for your cloud workloads running in Amazon EC2. The new general purpose (M6g), compute-optimized (C6g), and memory-optimized (R6g) instances deliver up to 40% better price/performance over comparable current generation x86-based instances for scale-out and …

Amazon Redshift Architecture

From 10,000 ft, Redshift looks like any other relational database, with fairly standard SQL and entities such as tables, views, stored procedures, and the usual data types. We’ll start with tables, as these are containers for persistent data storage and will allow us to dive vertically into the architecture. This is what Redshift looks like from 10,000 …

Amazon Comprehend adds five new languages to Custom Entity Recognition

Amazon Comprehend is a natural language processing (NLP) service that uses machine learning to analyze text documents and identify insights such as sentiment, entities, and topics from text. No ML expertise required. You can use Custom Entity Recognition to identify terms that are specific to your domain. For example, you can instantly extract product names, …

AWS Copilot CLI launches v0.3 focused on operations and configuration

Today, AWS Copilot CLI for Amazon Elastic Container Service launched version 0.3.0. Starting with this release, you can configure details about an AWS Copilot environment such as a pre-existing VPC, subnets, and CIDR ranges, allowing you to use infrastructure created outside of AWS Copilot. Additionally, you can now configure how AWS Copilot builds your services …

Amazon API Gateway now supports enhanced observability via access logs

Beginning today, customers can configure their HTTP, REST, and WebSocket APIs to include new variables in their access logs that provide enhanced observability of how API Gateway processes requests. The new access log variables provide customers with the information they need to troubleshoot issues with their API’s configuration, including latencies and status codes for each …