Before beginning the conversation about AWS anti-patterns, we should ask ourselves – what is an anti-pattern?
I have searched the web, and found the following quote:
“An antipattern is just like a pattern, except that instead of a solution, it gives something that looks superficially like a solution but isn’t one” (“Patterns and Antipatterns” by Andrew Koenig).
Key characteristics of antipatterns include:
- They are commonly used processes, structures, or patterns of action.
- They initially seem appropriate and effective.
- They ultimately produce more negative consequences than positive results.
- There exists a better, documented, and proven alternative solution.

In this blog post, I will review some of the common anti-patterns we see in AWS environments, and how to properly use AWS services.
Using a permissive IAM policy
This is common for organizations that migrate from on-prem to AWS and lack an understanding of how IAM policies work, or in development environments, where “we are just trying to check if some action will work and we will fix the permissions later…” (and in many cases, we fail to go back and limit the permissions).
In the example below, we see an IAM policy allowing access to all S3 buckets, including all actions related to S3:
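A minimal sketch of such a permissive policy:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:*",
      "Resource": "*"
    }
  ]
}
```

Any identity attached to this policy can perform every S3 action on every bucket in the account.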
In the example below, we see a strict IAM policy allowing access to specific S3 buckets, with specific S3 actions:
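A sketch of a stricter policy (the bucket name `example-bucket` and the action list are illustrative placeholders):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject"
      ],
      "Resource": "arn:aws:s3:::example-bucket/*"
    }
  ]
}
```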
Publicly accessible resources
For many years, deploying resources such as S3 buckets, EC2 instances, or RDS databases caused them to be publicly accessible, which made them prone to attacks from external or unauthorized parties.
In production environments, there is no reason to create publicly accessible resources (unless we are talking about static content served via a CDN). Ideally, EC2 instances will be deployed in a private subnet, behind an AWS NLB or AWS ALB, and RDS / Aurora instances will be deployed in a private subnet (behind a strict VPC security group).
In the case of EC2 or RDS, it depends on the target VPC into which you deploy the resources – the default VPC assigns a public IP, while creating a custom VPC allows us to decide whether we need a public subnet or not.
In the example below, we see an AWS CLI command for deploying an EC2 instance with public IP:
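A sketch of such a command (the AMI ID and subnet ID are placeholders):

```bash
# Launch an instance and explicitly request a public IP address
aws ec2 run-instances \
  --image-id ami-0abcdef1234567890 \
  --instance-type t3.micro \
  --subnet-id subnet-0123456789abcdef0 \
  --associate-public-ip-address
```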
In the example below, we see an AWS CLI command for deploying an EC2 instance without a public IP:
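The same launch command with the public IP explicitly disabled (IDs are placeholders):

```bash
# Launch an instance without assigning a public IP address
aws ec2 run-instances \
  --image-id ami-0abcdef1234567890 \
  --instance-type t3.micro \
  --subnet-id subnet-0123456789abcdef0 \
  --no-associate-public-ip-address
```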
In the case of S3 buckets, since April 2023, new buckets have “S3 Block Public Access” enabled by default, making them private.
In the example below, we see an AWS CLI command for creating an S3 bucket, while enforcing private access:
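A sketch of such commands (bucket name and region are placeholders):

```bash
# Create the bucket
aws s3api create-bucket --bucket example-bucket --region us-east-1

# Explicitly enforce the S3 Block Public Access settings on the bucket
aws s3api put-public-access-block \
  --bucket example-bucket \
  --public-access-block-configuration \
  "BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true"
```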
Using permissive network access
By default, when launching a Linux EC2 instance through the console, the suggested security group rule allows SSH access (port 22) from 0.0.0.0/0 (i.e., all IPs), which leaves the instance publicly accessible from the Internet.
As a rule of thumb – always implement the principle of least privilege, meaning, enforce minimal network access according to business needs.
In the case of EC2 instances, there are a couple of alternatives:
- Remotely connect to EC2 instances using EC2 instance connect or using AWS Systems Manager Session Manager.
- If you insist on connecting to a Linux EC2 instance using SSH, make sure you configure a VPC security group to restrict access through SSH protocol from specific (private) CIDR. In the example below, we see an AWS CLI command for creating a strict VPC security group:
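A sketch of such commands (VPC ID, security group ID, and CIDR range are placeholders):

```bash
# Create a security group in the target VPC
aws ec2 create-security-group \
  --group-name ssh-restricted \
  --description "Allow SSH from the internal network only" \
  --vpc-id vpc-0123456789abcdef0

# Allow SSH (port 22) only from a private CIDR range
aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 \
  --protocol tcp --port 22 \
  --cidr 10.0.0.0/16
```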
- In the case of Kubernetes Pods, one of the alternatives is to create a network security policy, to restrict access to SSH protocol from specific (private) CIDR, as we can see in the example below:
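A sketch of such a Kubernetes NetworkPolicy (namespace, Pod labels, and CIDR range are placeholders):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: restrict-ssh
  namespace: default
spec:
  # Apply the policy to Pods carrying this label
  podSelector:
    matchLabels:
      app: ssh-enabled
  policyTypes:
    - Ingress
  ingress:
    - from:
        # Allow traffic only from a private CIDR range
        - ipBlock:
            cidr: 10.0.0.0/16
      ports:
        - protocol: TCP
          port: 22
```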
Using hard-coded credentials
This is a pattern organizations have followed for many years: storing (cleartext) static credentials in application code, configuration files, automation scripts, code repositories, and more.
Anyone with read access to any of the above gains access to the credentials and can use them to harm the organization (from data leakage to costly resource deployment, such as VMs for Bitcoin mining).
Below are alternatives for using hard-coded credentials:
- Use an IAM role to gain temporary access to resources, instead of using static (or long-lived) credentials.
- Use AWS Secrets Manager or AWS Systems Manager Parameter Store to generate, store, retrieve, rotate, and revoke any static credentials. Connect your applications and CI/CD processes to AWS Secrets Manager, to pull the latest credentials. In the example below we see an AWS CLI command for an ECS task pulling a database password from AWS Secrets Manager:
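A sketch of registering such a task definition (account ID, image, role, and secret ARN are placeholders); the container receives the secret value as the `DB_PASSWORD` environment variable at runtime:

```bash
aws ecs register-task-definition \
  --family demo-app \
  --requires-compatibilities FARGATE \
  --network-mode awsvpc \
  --cpu 256 --memory 512 \
  --execution-role-arn arn:aws:iam::123456789012:role/ecsTaskExecutionRole \
  --container-definitions '[
    {
      "name": "app",
      "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/demo-app:latest",
      "essential": true,
      "secrets": [
        {
          "name": "DB_PASSWORD",
          "valueFrom": "arn:aws:secretsmanager:us-east-1:123456789012:secret:prod/db-password"
        }
      ]
    }
  ]'
```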
Ignoring service cost
Almost every AWS service has its own pricing model, which we need to be aware of while planning an architecture. Sometimes the pricing is fairly easy to understand – such as EC2 on-demand (pay for the time an instance is running) – and sometimes the cost estimation can be fairly complex, such as Amazon S3 (storage cost per storage class, actions such as PUT or DELETE, egress data, data retrieval from archive, etc.).
When deploying resources in an AWS environment, we may find ourselves paying thousands of dollars every month, simply because we ignore the cost factor.
There is no substitute for visibility into cloud costs – we still have to pay for the services we deploy and consume, but with a few simple steps, we can gain at least basic visibility into the costs before we go bankrupt.
In the example below, we can see a monthly budget created in an AWS account to send email notifications when the monthly budget reaches $500:
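A sketch of such a budget (account ID and email address are placeholders); the notification fires when actual spend reaches 100% of the $500 limit:

```bash
aws budgets create-budget \
  --account-id 123456789012 \
  --budget '{
    "BudgetName": "monthly-budget",
    "BudgetLimit": {"Amount": "500", "Unit": "USD"},
    "TimeUnit": "MONTHLY",
    "BudgetType": "COST"
  }' \
  --notifications-with-subscribers '[
    {
      "Notification": {
        "NotificationType": "ACTUAL",
        "ComparisonOperator": "GREATER_THAN",
        "Threshold": 100,
        "ThresholdType": "PERCENTAGE"
      },
      "Subscribers": [
        {"SubscriptionType": "EMAIL", "Address": "billing@example.com"}
      ]
    }
  ]'
```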
Naturally, the best advice is to embed cost in any design consideration, as explained in “The Frugal Architect” (https://www.thefrugalarchitect.com/)
Failing to use auto-scaling
One of the biggest benefits of the public cloud and modern applications is the use of auto-scaling capabilities to add or remove resources according to customer demands.
Without autoscaling, our applications may hit resource limits (such as CPU, memory, or network) or suffer availability issues (in case an application was deployed on a single EC2 instance or a single RDS node), with a direct impact on customers – or we may pay a high cost (in case we have provisioned more compute resources than required).
Many IT veterans think of auto-scaling as the ability to add more compute resources, such as additional EC2 instances, ECS tasks, DynamoDB capacity, Aurora replicas, etc.
Autoscaling is not just about adding resources, but also the ability to adjust the number of resources (i.e., compute instances/replicas) to actual customer demand.
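As an illustration, a target tracking policy on an EC2 Auto Scaling group adjusts capacity in both directions around a metric target (the group name and target value are placeholders):

```bash
# Keep average CPU utilization around 50% by adding or removing instances
aws autoscaling put-scaling-policy \
  --auto-scaling-group-name web-asg \
  --policy-name cpu-target-tracking \
  --policy-type TargetTrackingScaling \
  --target-tracking-configuration '{
    "PredefinedMetricSpecification": {"PredefinedMetricType": "ASGAverageCPUUtilization"},
    "TargetValue": 50.0
  }'
```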
A good example of a scale-out scenario (i.e., adding more compute resources) is a publicly accessible web application under a DDoS attack. Auto-scaling allows us to add more compute resources to keep the application accessible to legitimate customers' requests, until the DDoS attack is handled by the ISP or by AWS (through the AWS Shield Advanced service).
A good example of a scale-down scenario (i.e., removing compute resources) is 24 hours after Black Friday or Cyber Monday when an e-commerce website receives less traffic from customers, and fewer resources are required. It makes sense when we think about the number of required VMs, Kubernetes Pods, or ECS tasks, but what about databases?
Some services, such as Aurora Serverless v2, support scaling to 0 capacity, which automatically pauses after a period of inactivity by scaling down to 0 Aurora Capacity Units (ACUs), allowing you to benefit from cost reduction for workloads with inactivity periods.
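A sketch of enabling this on an existing cluster (the cluster identifier and maximum capacity are placeholders):

```bash
# Allow the cluster to scale between 0 and 4 ACUs; at 0 ACUs it auto-pauses
aws rds modify-db-cluster \
  --db-cluster-identifier my-aurora-cluster \
  --serverless-v2-scaling-configuration MinCapacity=0,MaxCapacity=4
```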
Failing to leverage storage class
A common mistake when building applications in AWS is choosing the default storage option (such as Amazon S3 or Amazon EFS) without considering data access patterns; as a result, we may pay a lot of money every month to store objects/files (such as logs or snapshots) that are not accessed regularly.
As with any work with AWS services, we need to review the service documentation, understand the data access patterns for each workload that we design, and choose the right storage service and storage class.
Amazon S3 is the most commonly used storage solution for cloud-native applications (logs, static content, AI/ML, data lakes, and more).
When using S3, consider the following:
- For unpredictable data access patterns (for example when you cannot determine when or how often objects will be accessed), choose Amazon S3 Intelligent-Tiering.
- If you know the access pattern of your data (for example, logs accessed for 30 days and then archived), choose S3 lifecycle policies.

Amazon EFS is commonly used when you need to share file storage with concurrent access (such as multiple EC2 instances reading from a shared storage, or multiple Kubernetes Pods writing data to a shared storage). When using EFS, consider the following:
- For unpredictable data access patterns (for example if files are moved between performance-optimized tiers and cost-optimized tiers), choose Amazon EFS Intelligent Tiering.
- If you know the access pattern of your data (for example logs older than 30 days), configure lifecycle policies.
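Sketches of both lifecycle configurations mentioned above (bucket name, prefix, and file system ID are placeholders):

```bash
# S3: archive objects under the logs/ prefix to Glacier after 30 days
aws s3api put-bucket-lifecycle-configuration \
  --bucket example-bucket \
  --lifecycle-configuration '{
    "Rules": [
      {
        "ID": "archive-logs",
        "Status": "Enabled",
        "Filter": {"Prefix": "logs/"},
        "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}]
      }
    ]
  }'

# EFS: move files not accessed for 30 days to the Infrequent Access storage class
aws efs put-lifecycle-configuration \
  --file-system-id fs-0123456789abcdef0 \
  --lifecycle-policies TransitionToIA=AFTER_30_DAYS
```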
Summary
The deeper we dive into application design and architecture, the more anti-patterns we will find. Our applications will run, but they will be inefficient, costly, insecure, and more.
In this blog post, we have reviewed anti-patterns from various domains (such as security, cost, and resource optimization).
I encourage readers to read the AWS documentation (including the AWS Well-Architected Framework and the AWS Decision Guides), consider what you are trying to achieve, and design your architectures accordingly.
Embrace a dynamic mindset - Always question your past decisions – there might be better or more efficient ways to achieve similar goals.
About the author
Eyal Estrin is a cloud and information security architect, an AWS Community Builder, and the author of the books Cloud Security Handbook and Security for Cloud Native Applications, with more than 20 years in the IT industry.
You can connect with him on social media (https://linktr.ee/eyalestrin).
Opinions are his own and not the views of his employer.