Data Tech Bridge

Posted on Mar 1

Amazon RDS - Cheat Sheet

AWS RDS Service Cheat Sheet for AWS Certified Data Engineer - Associate (DEA-C01)

Core Concepts and Building Blocks

Amazon Relational Database Service (RDS) is a managed relational database service that makes it easier to set up, operate, and scale a relational database in the cloud. It provides cost-efficient, resizable capacity for industry-standard relational databases and manages common database administration tasks.

Key components:

DB Instances: Primary building block of RDS
DB Engines: Supported database software (MySQL, PostgreSQL, Oracle, SQL Server, MariaDB, Aurora)
DB Parameter Groups: Configuration settings for database engines
Option Groups: Additional features for specific DB engines
Subnet Groups: Collection of subnets for DB instance placement in VPC
Security Groups: Control network access to DB instances
Snapshots & Backups: Point-in-time recovery mechanisms
Multi-AZ Deployments: High availability configuration
Read Replicas: Performance scaling for read operations

Mind Map: AWS RDS Components

Amazon RDS
├── DB Instances
│   ├── Instance Classes (T3, M5, R5, etc.)
│   ├── Storage Types (GP2, GP3, io1, io2)
│   └── Configurations (CPU, Memory, Storage)
├── Database Engines
│   ├── MySQL
│   ├── PostgreSQL
│   ├── MariaDB
│   ├── Oracle
│   ├── SQL Server
│   └── Aurora (MySQL/PostgreSQL compatible)
├── High Availability
│   ├── Multi-AZ Deployments
│   └── Automatic Failover
├── Scaling Options
│   ├── Vertical Scaling (Instance Resize)
│   ├── Storage Scaling
│   └── Read Replicas
├── Security
│   ├── VPC & Security Groups
│   ├── IAM Authentication
│   ├── Encryption (At-rest & In-transit)
│   └── SSL Connections
├── Monitoring & Management
│   ├── CloudWatch Metrics
│   ├── Enhanced Monitoring
│   └── Performance Insights
└── Backup & Recovery
    ├── Automated Backups
    ├── Manual Snapshots
    └── Point-in-Time Recovery

Key Points for AWS RDS

RDS supports six database engines: MySQL, PostgreSQL, MariaDB, Oracle, Microsoft SQL Server, and Amazon Aurora.
DB instances are the basic building blocks of RDS, consisting of isolated database environments in the cloud.
RDS offers three storage types: General Purpose SSD (gp2/gp3), Provisioned IOPS SSD (io1/io2), and Magnetic (standard, legacy).
General Purpose SSD (gp3) provides baseline performance of 3,000 IOPS and 125 MiB/s throughput, with ability to provision up to 16,000 IOPS and 1,000 MiB/s throughput.
General Purpose SSD (gp2) provides baseline 3 IOPS per GB with a minimum of 100 IOPS and can burst up to 3,000 IOPS.
Provisioned IOPS SSD (io1/io2) allows you to specify IOPS rate when creating or modifying a DB instance, with maximum ratio of 50:1 between IOPS and storage in GiB.
Maximum storage size for RDS instances is 64 TiB for most engines (16 TiB for SQL Server).
Maximum provisioned IOPS is 256,000 for io2 Block Express (supported only on specific instance types).
Multi-AZ deployments provide enhanced availability and durability by automatically provisioning and maintaining a synchronous standby replica in a different AZ.
During a planned or unplanned outage of your DB instance, Amazon RDS automatically switches to the standby replica (typically 60-120 seconds failover time).
Read replicas allow you to create read-only copies of your database to offload read traffic from the primary DB instance.
You can create up to 15 read replicas per source DB instance.
Read replicas can be promoted to standalone DB instances (breaking replication).
Cross-region read replicas are supported, allowing for global distribution of read workloads.
RDS supports automated backups with point-in-time recovery (PITR) for up to 35 days.
Manual DB snapshots can be retained indefinitely until explicitly deleted.
RDS instances can be encrypted at rest using AWS KMS keys, but encryption cannot be added to an existing unencrypted instance (must create new encrypted instance).
All engines support encryption of data in transit using SSL/TLS.
RDS supports IAM database authentication for MySQL and PostgreSQL, allowing token-based authentication.
Parameter groups allow you to manage database engine configuration with parameters that apply to all databases on the instance.
Option groups let you enable and configure additional features for specific database engines.
RDS instances can be placed in a VPC for network isolation, with security groups controlling access.
Enhanced Monitoring provides OS-level metrics for your DB instance with granularity as low as 1 second.
Performance Insights helps you monitor database load and analyze performance issues.
RDS supports storage autoscaling, automatically increasing storage when free space is low.
Maximum database size varies by engine: MySQL/MariaDB/PostgreSQL/Oracle (64 TiB), SQL Server (16 TiB).
RDS instances can be accessed only through the endpoint DNS name, not through IP addresses.
Each DB instance has a weekly maintenance window for system updates.
You can modify most DB instance settings without downtime, but some changes require an instance reboot.
RDS Proxy provides connection pooling and reduces connection management overhead for applications.
RDS supports major and minor version upgrades for all database engines.
Blue/Green deployments allow for safe database updates with minimal downtime.
RDS can automatically scale storage when approaching capacity limits (if enabled).
RDS supports resource-level permissions through IAM policies.
RDS instances can be tagged for cost allocation and organization.
RDS supports reserved instances for significant cost savings on long-term commitments.
RDS instances can be stopped for up to 7 days (storage costs still apply).
RDS supports integration with AWS Secrets Manager for credential management.
RDS can publish logs to CloudWatch Logs for centralized log management.
RDS supports CloudWatch Events for DB instance state changes.

Database Engine Comparison

Feature	MySQL	PostgreSQL	MariaDB	Oracle	SQL Server	Aurora
Max DB Size	64 TiB	64 TiB	64 TiB	64 TiB	16 TiB	128 TiB
Max Tables	Unlimited	Unlimited	Unlimited	Unlimited	Limited by storage	Unlimited
IAM Auth	Yes	Yes	No	No	No	Yes
Encryption	Yes	Yes	Yes	Yes	Yes	Yes
Read Replicas	15	15	15	5	5	15
Multi-AZ	Yes	Yes	Yes	Yes	Yes	Built-in
Engine Versions	5.7, 8.0	10-15	10.3-10.6	12c, 19c	2014-2019	MySQL 5.7/8.0, PostgreSQL 11-15
Global Database	No	No	No	No	No	Yes
Serverless	No	No	No	No	No	Yes (v1, v2)
Auto Scaling	Storage only	Storage only	Storage only	Storage only	Storage only	Compute & Storage

Instance Types and Use Cases

Instance Family	Use Case	Key Characteristics
T3/T4g	Dev/Test, Small Production	Burstable performance, low cost
M5/M6g/M6i	General Purpose	Balanced compute, memory, and network
R5/R6g/R6i	Memory-Optimized	High memory-to-vCPU ratio for memory-intensive workloads
X2g	Memory-Optimized	Highest memory-to-vCPU ratio for in-memory databases
C5/C6g/C6i	Compute-Optimized	High performance for compute-bound workloads
z1d	High Frequency	High frequency (4.0 GHz) for latency-sensitive workloads

Storage Types and Performance

Storage Type	Use Case	IOPS	Throughput	Cost
General Purpose SSD (gp3)	Most workloads	3,000 baseline, up to 16,000	125 MiB/s baseline, up to 1,000 MiB/s	$$
General Purpose SSD (gp2)	Dev/Test, small-to-medium DBs	3 IOPS/GB (min 100, max 16,000)	Limited by IOPS	$$
Provisioned IOPS SSD (io1)	I/O-intensive workloads	Up to 64,000	Limited by IOPS	$$$
Provisioned IOPS SSD (io2)	Critical, I/O-intensive workloads	Up to 64,000 (256,000 for Block Express)	Limited by IOPS	$$$
Magnetic (standard)	Legacy applications	N/A	N/A	$

Example Storage Calculation

For a database requiring 5,000 IOPS with 500 GB storage:

gp2: 500 GB × 3 IOPS/GB = 1,500 baseline IOPS (insufficient)
gp3: 3,000 baseline IOPS + 2,000 additional IOPS = 5,000 IOPS
io1/io2: Directly provision 5,000 IOPS

Cost comparison (example, prices vary by region):

gp3: $0.10/GB-month + $0.10/provisioned IOPS-month above 3,000 = $50 + $0.20 = $50.20/month
io1: $0.125/GB-month + $0.10/provisioned IOPS-month = $62.50 + $500 = $562.50/month

RDS Features and Details

Feature	Description	Limits/Notes
DB Instances	Primary building block of RDS	Up to 40 per region (default, can be increased)
Storage	SSD options for different performance needs	20 GB to 64 TiB (16 TiB for SQL Server)
Multi-AZ	Synchronous standby replica in different AZ	60-120 second failover time
Read Replicas	Asynchronous replicas for read scaling	Up to 15 per source instance
Automated Backups	Daily full backup with transaction logs	0-35 day retention period
Manual Snapshots	User-initiated backups	Retained until explicitly deleted
Point-in-Time Recovery	Restore to any point within backup retention period	5-minute recovery point objective (RPO)
Encryption	KMS-based encryption for data at rest	Cannot encrypt existing unencrypted instance
IAM Authentication	Token-based authentication	MySQL and PostgreSQL only
Enhanced Monitoring	OS-level metrics	1, 5, 10, 15, 30, or 60-second intervals
Performance Insights	Database performance analysis	7 days free, extended retention available
Event Notifications	SNS notifications for RDS events	Up to 20 subscriptions per region
Parameter Groups	Engine configuration settings	Engine-specific parameters
Option Groups	Additional database features	Engine-specific options
Storage Autoscaling	Automatic storage increases	Must enable, 10% free space threshold
RDS Proxy	Connection pooling service	Reduces connection management overhead
Blue/Green Deployments	Database update staging environment	Minimal downtime for major updates
Reserved Instances	Discounted pricing for 1-3 year commitments	Up to 60% cost savings
Global Databases	Cross-region replication	Aurora only, 1-second typical replication lag
Serverless	Automatic scaling based on workload	Aurora only (v1, v2)

RDS Limits

Maximum DB instances per region: 40 (default, can be increased)
Maximum storage per DB instance: 64 TiB (16 TiB for SQL Server)
Maximum IOPS with io1/io2: 256,000 (with io2 Block Express)
Maximum read replicas per source instance: 15
Maximum backup retention period: 35 days
Maximum parameter groups per region: 50
Maximum option groups per region: 20
Maximum event subscriptions per region: 20
Maximum DB security groups per region: 25
Maximum rules per DB security group: 20
Maximum DB subnet groups per region: 50
Maximum subnets per DB subnet group: 20
Maximum VPC security groups per DB instance: 5

Performance Considerations

Choose the appropriate instance class based on workload requirements (memory, CPU, network).
Use Provisioned IOPS (io1/io2) for I/O-intensive workloads requiring consistent performance.
For cost-effective performance, use gp3 with ability to independently configure IOPS and throughput.
Monitor storage IOPS utilization and increase provisioned IOPS if consistently above 90%.
Use read replicas to offload read traffic from the primary instance.
Consider Aurora for workloads requiring high throughput and low latency.
Optimize database queries and indexes to improve performance.
Use appropriate parameter group settings for your workload (buffer sizes, connection limits).
Monitor connection count to avoid hitting max_connections limit.
Use RDS Proxy to manage database connections efficiently for applications with many short-lived connections.
Enable Performance Insights to identify database load bottlenecks.
Use Enhanced Monitoring to identify OS-level resource constraints.
Consider placement in the same AZ as your application for lower latency (if not using Multi-AZ).
For write-heavy workloads, scale vertically by choosing a larger instance class.
For read-heavy workloads, scale horizontally using read replicas.
Implement connection pooling at the application level to reduce connection overhead.
Use appropriate transaction isolation levels for your workload.

Data Ingestion and Migration

AWS Database Migration Service (DMS) can be used for homogeneous and heterogeneous migrations to RDS.
Native database tools (mysqldump, pg_dump) can be used for smaller databases.
For large databases, consider using physical backups and AWS Snowball for initial data load.
RDS supports importing data from Amazon S3 for MySQL and PostgreSQL.
Use AWS SCT (Schema Conversion Tool) for heterogeneous migrations.
DMS supports continuous data replication with minimal downtime.
For high-volume data ingestion, consider batch processing to avoid performance impact.
Implement throttling in your application to control the rate of writes to the database.
For bulk data loads, consider temporarily disabling foreign key constraints and indexes.
RDS MySQL supports LOAD DATA FROM S3 for efficient data loading.
RDS PostgreSQL supports the COPY command for efficient data loading.
When migrating to RDS, ensure network bandwidth is sufficient for data transfer.
For replayable data ingestion pipelines, use change data capture (CDC) with DMS.
Consider using Aurora parallel query for faster data loading and analysis.
Implement idempotent data ingestion processes to handle retries safely.

Monitoring RDS with CloudWatch

Metric	Description	Threshold Recommendation
CPUUtilization	Percentage of CPU utilization	<80% sustained
DatabaseConnections	Number of client connections	<80% of max_connections
FreeableMemory	Amount of available RAM	>20% of total memory
FreeStorageSpace	Available storage space	>20% of allocated storage
ReadIOPS/WriteIOPS	I/O operations per second	<90% of provisioned IOPS
ReadLatency/WriteLatency	Average time for I/O operations	<20ms for SSD, <100ms for magnetic
DiskQueueDepth	Number of outstanding I/O requests	<10 for most workloads
ReplicaLag	How far behind read replica is from primary	Depends on RPO, typically <30s
SwapUsage	Amount of swap space used	Should be near zero
BinLogDiskUsage	Binary log space used (MySQL/MariaDB)	Monitor for growth trends
NetworkReceiveThroughput/NetworkTransmitThroughput	Network traffic to/from instance	Monitor for growth trends

Set up CloudWatch alarms for critical metrics like CPUUtilization, FreeStorageSpace, and DatabaseConnections.
Use Enhanced Monitoring for deeper insights into OS processes and resource utilization.
Performance Insights provides a Performance Management dashboard with key metrics like DB load and top SQL queries.
Create custom CloudWatch dashboards combining RDS metrics with application metrics for end-to-end monitoring.
Set up RDS Event subscriptions for operational notifications.

High Availability and Disaster Recovery

Multi-AZ deployments provide high availability within a region with automatic failover.
Read replicas can be promoted to primary in case of regional failure (manual process).
Cross-region read replicas provide disaster recovery capabilities across regions.
Aurora Global Database provides cross-region replication with fast failover for disaster recovery.
RDS automated backups and manual snapshots can be copied across regions for disaster recovery.
Recovery Time Objective (RTO) for Multi-AZ failover is typically 60-120 seconds.
Recovery Point Objective (RPO) for automated backups is typically 5 minutes or less.
Test failover procedures regularly to ensure they work as expected.
Document recovery procedures for different failure scenarios.
Implement application-level retry logic to handle temporary database unavailability.

Security Best Practices

Use VPC security groups to restrict access to RDS instances.
Enable encryption at rest for all production databases.
Use SSL/TLS for all database connections.
Implement IAM database authentication where supported.
Rotate database credentials regularly.
Use AWS Secrets Manager to manage and rotate database credentials.
Enable audit logging for database activity.
Use AWS Config and Security Hub to monitor RDS security compliance.
Implement least privilege access for database users.
Regularly patch database engines by upgrading to the latest minor version.

Cost Optimization

Use Reserved Instances for predictable workloads to save up to 60%.
Stop non-production instances when not in use (storage costs still apply).
Right-size instances based on CloudWatch metrics and Performance Insights.
Use Aurora Serverless for variable or unpredictable workloads.
Monitor and optimize storage usage to avoid unnecessary costs.
Use gp3 storage for better price-performance ratio compared to io1/io2.
Consider Multi-AZ only for production environments requiring high availability.
Use read replicas strategically only where needed for performance.
Set appropriate backup retention periods based on compliance requirements.
Use Storage Auto Scaling to avoid over-provisioning storage.

Implementing Throttling and Overcoming Rate Limits

RDS has service quotas like maximum DB instances per region (40 by default) that can be increased via AWS Support.
Implement connection pooling to manage database connections efficiently and avoid hitting max_connections limit.
Use exponential backoff and jitter in applications when encountering throttling or connection errors.
For bulk operations, implement batch processing with controlled throughput.
Monitor API call rates to avoid hitting AWS API throttling limits.
Use RDS Proxy to manage connection pools and reduce connection churn.
Implement circuit breakers in applications to prevent cascading failures during database overload.
For write-heavy workloads, consider sharding data across multiple RDS instances.
Use caching solutions like ElastiCache to reduce database load.
Implement rate limiting at the application layer to control database access.

Throughput and Latency Characteristics

RDS MySQL/PostgreSQL can typically handle thousands of transactions per second depending on instance size and configuration.
Aurora can provide up to 5x the throughput of standard MySQL and 3x the throughput of standard PostgreSQL.
Network latency between application and database is typically <1ms within same AZ, 1-2ms across AZs, and 50-100ms across regions.
Read replicas add replication lag (typically milliseconds to seconds) but can significantly increase read throughput.
Storage throughput varies by storage type: gp3 (125-1,000 MiB/s), io1/io2 (limited by IOPS and instance type).
Transaction commit latency is higher in Multi-AZ deployments due to synchronous replication.
Aurora Global Database typically has <1 second replication lag between regions.
RDS Proxy adds minimal latency (typically <1ms) while providing connection pooling benefits.
Larger instance types generally provide better network performance and higher EBS-optimized throughput.
I/O-intensive operations like full table scans can saturate storage throughput and increase latency for other operations.

Replayability of Data Ingestion Pipelines

Use AWS DMS with CDC (Change Data Capture) for continuous replication with the ability to restart from checkpoints.
Implement idempotent data processing to safely handle retries and duplicates.
Use Amazon MSK (Managed Streaming for Kafka) or Kinesis as a buffer for database writes with replay capability.
Store raw data in S3 before processing to enable reprocessing if needed.
Implement a dead letter queue (DLQ) for failed database operations to retry later.
Use transaction IDs or sequence numbers to track progress and enable resumption.
Implement checkpointing in data pipelines to restart from known good states.
Consider using AWS Glue for ETL processes with built-in retry mechanisms.
Use AWS Step Functions to orchestrate complex data ingestion workflows with error handling and retry logic.
Implement proper logging and monitoring to quickly identify and resolve data ingestion issues.

RDS Aurora Specific Features

Aurora provides up to 15 low-latency read replicas with typical lag under 100ms.
Aurora storage automatically scales in 10GB increments up to 128TB.
Aurora replicates data across 6 storage nodes in 3 AZs for high durability.
Aurora can automatically recover from physical storage failures.
Aurora Global Database supports cross-region replication with typical lag under 1 second.
Aurora Serverless v2 automatically scales capacity based on workload, from 0.5 to 128 ACUs (Aurora Capacity Units).
Aurora supports zero-downtime patching for most maintenance operations.
Aurora parallel query can improve analytical query performance by pushing processing to the storage layer.
Aurora supports backtrack, allowing you to "rewind" a database to a previous point in time without restoring from backup.
Aurora supports fast cloning using copy-on-write protocol, making database clones very storage-efficient.

Open Source Components in RDS

RDS Engine	Open Source Version	Key Differences in RDS Implementation
MySQL	Community Edition	Limited root access, automated management, some plugins disabled
PostgreSQL	Community Edition	Limited superuser access, automated management, some extensions pre-installed
MariaDB	Community Edition	Limited root access, automated management, some plugins disabled

RDS MySQL is based on the community edition but restricts certain features that require file system access.
RDS PostgreSQL supports many but not all PostgreSQL extensions (e.g., no custom C functions).
RDS MariaDB follows similar restrictions as MySQL regarding root access and filesystem access.
Aurora MySQL is wire-compatible with MySQL but uses a completely different storage architecture.
Aurora PostgreSQL is wire-compatible with PostgreSQL but uses Aurora's distributed storage system.
RDS provides automated minor version upgrades for all open source engines.
RDS parameter groups allow configuration of most but not all engine parameters.
RDS manages log rotation and backup management differently than self-managed open source databases.
RDS provides enhanced monitoring and metrics beyond what's available in open source versions.
RDS restricts direct OS access that would be available in self-managed open source databases.

Advanced Features and Integrations

RDS supports integration with AWS Lambda for custom automation via event notifications.
RDS can publish logs to CloudWatch Logs for centralized log management and analysis.
RDS integrates with AWS Backup for cross-service backup management.
RDS supports tagging for cost allocation and resource organization.
RDS can be integrated with AWS Directory Service for AD authentication (SQL Server and Oracle).
RDS PostgreSQL supports logical replication starting with version 10.
RDS MySQL supports GTID-based replication starting with version 5.7.
RDS Oracle supports Oracle Data Guard for managed disaster recovery.
RDS SQL Server supports native backup and restore to/from S3.
RDS PostgreSQL supports Foreign Data Wrappers for connecting to external data sources.

Best Practices for Data Engineering

Design database schemas with partitioning for large tables to improve query performance and manageability.
Implement appropriate indexing strategies based on query patterns.
Use connection pooling at the application layer to efficiently manage database connections.
Implement a caching strategy using ElastiCache to reduce database load for frequently accessed data.
Design for horizontal scaling using read replicas for read-heavy workloads.
Implement database sharding for write-heavy workloads that exceed single-instance capacity.
Use appropriate transaction isolation levels based on application requirements.
Implement proper error handling and retry logic in applications accessing RDS.
Regularly analyze and optimize slow queries using Performance Insights.
Implement a comprehensive monitoring strategy covering database performance, availability, and security.
Use AWS Glue Data Catalog to make RDS data discoverable for analytics.
Consider using Aurora for mission-critical applications requiring high throughput and availability.
Implement proper backup and disaster recovery procedures based on RPO/RTO requirements.
Use blue/green deployments for major version upgrades to minimize risk and downtime.
Implement a database change management process using infrastructure as code (e.g., AWS CloudFormation).
Consider data lifecycle management, archiving old data to S3 for cost optimization.
Use AWS DMS for continuous replication to data warehouses or data lakes for analytics.
Implement proper security controls including encryption, network isolation, and access management.
Consider multi-region architectures for global applications requiring low latency and high availability.
Regularly test recovery procedures to ensure they meet business requirements.

DEV Community