DEV Community

Paul SANTUS for AWS Community Builders


How I - well, AWS WAF and CloudFront - saved the day for my client

I was working on the migration to AWS of my client, an e-retailer, when I received a phone call: “Paul, we are in trouble: our site has been under a denial-of-service attack for a week and we are losing money! Can you help?”

Rushing the migration project was not an option: the client had not yet containerized their app, and we hadn't run any data migration or load tests. But as I wrote in a previous blog post, Cloud services can also benefit on-premises infrastructure. Time to prove it!

Initial analysis revealed that the attacker was using multiple IPs (from the Tor network) to target the site's login page. This page makes database calls, and the database became overloaded, causing latency at first, then outages, throughout the system.

Details of the URLs called (easily visible once WAF is activated)

I got to work right away. Thanks to Terraform, within half a day I had a functional stack in a test environment, ready to be promoted to production.

The tech stack

Here is a diagram of the technical stack deployed to counter the attack my client faced:

On-premise infrastructure protected by AWS CloudFront and WAF

Here are the main additions to the existing stack:

  • Instead of pointing directly at my client's on-premises infrastructure, DNS now sends visitors to CloudFront.
    • CloudFront is a managed Content Delivery Network (CDN): it serves content, cached or not, from edge locations close to users.
    • During the incident, what I was initially after was not the caching feature (which reduces the load on the servers and the latency on the client side), but the ability to expose an HTTPS endpoint acting as a proxy between visitors and my client's infrastructure.
  • Before relaying requests to my "origin" (the existing infrastructure), CloudFront has them inspected by AWS WAF.
    • WAF is a Web Application Firewall, which inspects HTTP requests.
  • On AWS WAF, I configured rules based on AWS managed rule groups. Here are the rules that proved most useful in stopping the attack:
    • The AWSManagedRulesAnonymousIpList rule group contains one rule that specifically targets known Tor exit nodes and the most frequently used VPN services, and another that lists hosting providers (whose machines may be used as zombies). This rule group did 95% of the job.
    • The second, AWSManagedRulesATPRuleSet, specifically protects login pages by analyzing the requests made to them: do they include all the expected login form fields? Is an IP responsible for multiple authentication failures?
    • In addition to these rules, as a precaution, we put in place the "usual" rules: SQL injection, PHP vulnerabilities, OWASP Top 10, etc.
    • Finally, we added a rule allowing IPs to be whitelisted (my client's business model involves quite a lot of traffic from a few partners, whose IPs were caught by the hosting provider list mentioned above).
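For readers who want to reproduce this, here is a minimal Terraform sketch of such a web ACL. It is not the exact configuration used for my client: the resource names, the placeholder partner CIDR, the priorities, and the choice of AWSManagedRulesSQLiRuleSet, AWSManagedRulesPHPRuleSet and AWSManagedRulesCommonRuleSet to stand for the "usual" rules are all assumptions. WAF resources with CLOUDFRONT scope must be created with a provider configured for us-east-1. The ATP rule is left out here, since it needs special care (see the cost section).

```hcl
# Sketch only: names, CIDR and priorities are illustrative assumptions.
# CLOUDFRONT-scoped WAF resources must be created in us-east-1.

locals {
  # AWS managed rule groups, mapped to their evaluation priority
  managed_groups = {
    AWSManagedRulesAnonymousIpList = 10 # Tor exit nodes, VPNs, hosting providers
    AWSManagedRulesSQLiRuleSet     = 20 # SQL injection
    AWSManagedRulesPHPRuleSet      = 30 # PHP-specific vulnerabilities
    AWSManagedRulesCommonRuleSet   = 40 # core rule set, common OWASP Top 10 risks
  }
}

resource "aws_wafv2_ip_set" "partners" {
  name               = "partners-allowlist"
  scope              = "CLOUDFRONT"
  ip_address_version = "IPV4"
  addresses          = ["203.0.113.0/24"] # placeholder documentation range
}

resource "aws_wafv2_web_acl" "main" {
  name  = "site-protection"
  scope = "CLOUDFRONT"

  default_action {
    allow {}
  }

  # Evaluated first: Allow is terminating, so partners skip every rule below
  rule {
    name     = "allow-partners"
    priority = 0

    action {
      allow {}
    }

    statement {
      ip_set_reference_statement {
        arn = aws_wafv2_ip_set.partners.arn
      }
    }

    visibility_config {
      cloudwatch_metrics_enabled = true
      metric_name                = "allow-partners"
      sampled_requests_enabled   = true
    }
  }

  # One rule per managed group; "none" keeps each group's own Block actions
  dynamic "rule" {
    for_each = local.managed_groups
    content {
      name     = rule.key
      priority = rule.value

      override_action {
        none {}
      }

      statement {
        managed_rule_group_statement {
          vendor_name = "AWS"
          name        = rule.key
        }
      }

      visibility_config {
        cloudwatch_metrics_enabled = true
        metric_name                = rule.key
        sampled_requests_enabled   = true
      }
    }
  }

  visibility_config {
    cloudwatch_metrics_enabled = true
    metric_name                = "site-protection"
    sampled_requests_enabled   = true
  }
}
```

Putting the allowlist at priority 0 is what keeps partner traffic away from the hosting-provider list: once a request is allowed, WAF stops evaluating further rules.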

Implementation and result

We moved my client's main DNS zone to Route 53 (luckily, the preparatory inventory of existing DNS records had already been done). This brought at least two benefits:

  • Route 53's automation, combined with Terraform, let me quickly create the DNS records that AWS Certificate Manager (ACM) needs to validate my client's domain before issuing its TLS (SSL) certificates.
  • Route 53 supports an alias “A” record at the root (apex) of the domain, pointing to a dynamic target such as CloudFront. A CNAME cannot be used there: per RFC 1034, a CNAME cannot coexist with other records, and the apex always carries the zone's SOA and NS records.
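Both points can be sketched in Terraform with the usual DNS-validation pattern for aws_acm_certificate. The zone name, resource names and the aws_cloudfront_distribution.site reference are assumptions. Certificates attached to CloudFront must be issued in us-east-1, hence the provider alias.

```hcl
# Sketch only: names and the referenced distribution are hypothetical.
provider "aws" {
  alias  = "us_east_1"
  region = "us-east-1" # CloudFront only accepts ACM certificates from this region
}

data "aws_route53_zone" "main" {
  name = "mydomain.fr"
}

resource "aws_acm_certificate" "site" {
  provider                  = aws.us_east_1
  domain_name               = "mydomain.fr"
  subject_alternative_names = ["api.mydomain.fr"]
  validation_method         = "DNS"
}

# One validation record per domain name on the certificate
resource "aws_route53_record" "cert_validation" {
  for_each = {
    for dvo in aws_acm_certificate.site.domain_validation_options : dvo.domain_name => {
      name   = dvo.resource_record_name
      record = dvo.resource_record_value
      type   = dvo.resource_record_type
    }
  }

  zone_id         = data.aws_route53_zone.main.zone_id
  name            = each.value.name
  type            = each.value.type
  records         = [each.value.record]
  ttl             = 60
  allow_overwrite = true
}

# Waits until ACM sees the records and issues the certificate
resource "aws_acm_certificate_validation" "site" {
  provider                = aws.us_east_1
  certificate_arn         = aws_acm_certificate.site.arn
  validation_record_fqdns = [for r in aws_route53_record.cert_validation : r.fqdn]
}

# Alias A record at the zone apex, where a CNAME is not allowed
resource "aws_route53_record" "apex" {
  zone_id = data.aws_route53_zone.main.zone_id
  name    = "mydomain.fr"
  type    = "A"

  alias {
    name                   = aws_cloudfront_distribution.site.domain_name
    zone_id                = aws_cloudfront_distribution.site.hosted_zone_id
    evaluate_target_health = false
  }
}
```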

We created records such as origin.mydomain.fr in this zone, and my client did the required work on their web server to handle requests made to this address (including a TLS certificate, so that traffic between CloudFront and the origin is encrypted in transit).

Once this was tested, we switched the DNS entries for mydomain.fr and api.mydomain.fr to CloudFront.

To prevent WAF bypass (in case the attacker discovers the origin hostnames or simply targets my client's server IP directly), CloudFront was configured to send a "secret" header with each origin request, making it easy for the on-premises infrastructure to filter out any traffic that bypasses CloudFront.
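Here is a sketch of the relevant parts of such a distribution. The header name X-Origin-Verify, the origin_secret variable and the referenced certificate and web ACL resources are hypothetical; the AWS-managed cache and origin request policies are looked up by name rather than by ID. The on-premises side then only has to reject requests whose header is missing or wrong.

```hcl
# Sketch only: header name, variable and referenced resources are hypothetical.
variable "origin_secret" {
  type      = string
  sensitive = true # keeps the shared secret out of plan output
}

data "aws_cloudfront_cache_policy" "disabled" {
  name = "Managed-CachingDisabled"
}

data "aws_cloudfront_origin_request_policy" "all_viewer_except_host" {
  name = "Managed-AllViewerExceptHostHeader"
}

resource "aws_cloudfront_distribution" "site" {
  enabled    = true
  aliases    = ["mydomain.fr", "api.mydomain.fr"]
  web_acl_id = aws_wafv2_web_acl.main.arn # WAFv2 web ACLs are referenced by ARN

  origin {
    domain_name = "origin.mydomain.fr"
    origin_id   = "on-prem"

    custom_origin_config {
      http_port              = 80
      https_port             = 443
      origin_protocol_policy = "https-only" # CloudFront-to-origin traffic stays encrypted
      origin_ssl_protocols   = ["TLSv1.2"]
    }

    # Added to every request CloudFront forwards to the origin
    custom_header {
      name  = "X-Origin-Verify"
      value = var.origin_secret
    }
  }

  default_cache_behavior {
    target_origin_id         = "on-prem"
    viewer_protocol_policy   = "redirect-to-https"
    allowed_methods          = ["GET", "HEAD", "OPTIONS", "PUT", "POST", "PATCH", "DELETE"]
    cached_methods           = ["GET", "HEAD"]
    cache_policy_id          = data.aws_cloudfront_cache_policy.disabled.id # proxy only, no caching yet
    origin_request_policy_id = data.aws_cloudfront_origin_request_policy.all_viewer_except_host.id
  }

  restrictions {
    geo_restriction {
      restriction_type = "none"
    }
  }

  viewer_certificate {
    acm_certificate_arn      = aws_acm_certificate_validation.site.certificate_arn
    ssl_support_method       = "sni-only"
    minimum_protocol_version = "TLSv1.2_2021"
  }
}
```

Because this origin request policy does not forward the viewer's Host header, CloudFront sends origin.mydomain.fr as the Host, which is why the web server has to answer on that name.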

The result was immediate: we made the switch at 8 pm, and the site became fully available again right away. At 9 pm, the attacker stopped the attack (only to try again the next day).

The image below shows allowed traffic in orange and blocked traffic in blue. We were receiving 6,000 requests per minute, more than twice the usual traffic:
Traffic blocked and allowed by WAF

A word on cost / FinOps

WAF costs $0.60 per million requests analyzed by the basic managed rules (the category that covers all of our rules except one). That's less than $5 per day to protect my client.

Be careful, though! Advanced rule groups like Account Takeover Prevention (ATP) are billed $1 per 1,000 calls (yes, 1,000, not 1,000,000), after a free tier of 10,000 calls.
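A quick back-of-envelope comparison, using only the prices above and the traffic figures from this post (6,000 requests per minute during the attack, usual traffic under half that), and ignoring WAF's fixed monthly fees:

```latex
\begin{aligned}
\text{requests per day at the attack rate} &= 6000 \times 60 \times 24 = 8\,640\,000 \\
\text{basic rules, attack rate} &= 8.64 \times \$0.60 \approx \$5.18 \\
\text{basic rules, usual traffic } (<3000/\text{min}) &< 4.32 \times \$0.60 \approx \$2.59 \\
\text{ATP on every request, attack rate} &= \frac{8\,640\,000 - 10\,000}{1000} \times \$1 = \$8\,630
\end{aligned}
```

The last line shows why the ATP rule needs careful scoping: sending every request through it at the attack rate would cost more than 1,600 times as much as the basic rules.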

And at first, our configuration looked like this:
ATP rule, triggered too often, can be costly

In 24 hours, we burned through $700 of WAF usage. Fortunately, I had set up cost anomaly alarms when designing the landing zone! A single support ticket (category “dispute a charge”) was all it took for AWS to graciously wipe the slate clean! [NB: in my experience, AWS has always waived large bills resulting from configuration errors; this very good commercial policy, along with the quality of their support, is one of the reasons it is my favorite cloud provider.]
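Working backwards from that bill with the ATP price quoted above (assuming, as an approximation, that the whole $700 came from ATP and that the 10,000 free calls had not been used yet):

```latex
\begin{aligned}
\text{billed ATP calls} &= \frac{\$700}{\$1 / 1000} = 700\,000 \\
\text{total ATP calls} &\approx 700\,000 + 10\,000 = 710\,000 \\
\text{average rate} &\approx \frac{710\,000}{24 \times 60} \approx 493 \text{ per minute}
\end{aligned}
```

That is roughly 8% of the 6,000 requests per minute WAF was seeing during the attack: even a small slice of the traffic going through ATP adds up quickly.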

In short, we fixed it by placing the ATP rule last in priority order and, above all, by running it only on requests carrying a label set by another rule, which tags requests to the /connection path.
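A sketch of these two rules, to be added inside an existing aws_wafv2_web_acl resource. Only the /connection path comes from our setup; the label name, rule names, priorities and login form field names (email, password) are assumptions, and the managed_rule_group_configs block requires a recent version of the AWS provider. The request_inspection block tells ATP where to find the credentials in the login request.

```hcl
# Fragment: these rule blocks go inside an existing resource "aws_wafv2_web_acl".
# Label name, priorities and form field names are hypothetical.

  # Count-only rule: never blocks, just labels requests to the login path
  rule {
    name     = "label-login-path"
    priority = 50

    action {
      count {}
    }

    rule_label {
      name = "app:login"
    }

    statement {
      byte_match_statement {
        search_string         = "/connection"
        positional_constraint = "STARTS_WITH"

        field_to_match {
          uri_path {}
        }

        text_transformation {
          priority = 0
          type     = "NONE"
        }
      }
    }

    visibility_config {
      cloudwatch_metrics_enabled = true
      metric_name                = "label-login-path"
      sampled_requests_enabled   = true
    }
  }

  # ATP evaluated last (highest priority number), and only on labelled requests
  rule {
    name     = "atp"
    priority = 100

    override_action {
      none {}
    }

    statement {
      managed_rule_group_statement {
        vendor_name = "AWS"
        name        = "AWSManagedRulesATPRuleSet"

        managed_rule_group_configs {
          aws_managed_rules_atp_rule_set {
            login_path = "/connection"

            request_inspection {
              payload_type = "FORM_ENCODED"
              username_field { identifier = "email" }
              password_field { identifier = "password" }
            }
          }
        }

        # Requests without the label are not evaluated by ATP, which keeps the bill down
        scope_down_statement {
          label_match_statement {
            scope = "LABEL"
            key   = "app:login"
          }
        }
      }
    }

    visibility_config {
      cloudwatch_metrics_enabled = true
      metric_name                = "atp"
      sampled_requests_enabled   = true
    }
  }
```

The labelling rule must have a lower priority number than the ATP rule, otherwise the label does not exist yet when the scope-down statement is checked.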

Relief when we see the traffic going through the ATP rule go down

What a relief to see the traffic passing through the ATP rule go down!

An additional benefit of CloudFront

After a well-deserved rest, it was time to bring my client an additional benefit: enabling CloudFront caching for all the static resources served by the application.

Thanks to Terraform, it's not very complicated: the following block caches all GIF files.

```hcl
ordered_cache_behavior {
  path_pattern             = "*.gif"
  allowed_methods          = ["GET", "HEAD", "OPTIONS", "PUT", "POST", "PATCH", "DELETE"]
  cached_methods           = ["GET", "HEAD", "OPTIONS"]
  target_origin_id         = local.origin_domain
  viewer_protocol_policy   = "redirect-to-https"
  cache_policy_id          = aws_cloudfront_cache_policy.cachingoptimizez_with_v_header.id
  # Hard-coded ID of the AWS-managed "Managed-AllViewerExceptHostHeader" policy:
  # forwards all viewer headers, cookies and query strings, except the Host header
  origin_request_policy_id = "b689b0a8-53d0-40ab-baf2-68738e2966ac"
}
```
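The same pattern extends to the other static assets. Here is a sketch using a Terraform dynamic block; the list of file patterns is an assumption, while the cache policy, origin and origin request policy ID are the ones from the block above. Unlike that block, it only allows read methods, which should be enough for static files.

```hcl
# Sketch: the pattern list is illustrative; adjust it to the assets the app actually serves.
locals {
  static_patterns = ["*.gif", "*.png", "*.jpg", "*.svg", "*.css", "*.js", "*.woff2"]
}

# Inside resource "aws_cloudfront_distribution": one cache behavior per pattern,
# generated in list order (CloudFront evaluates ordered behaviors in that order)
dynamic "ordered_cache_behavior" {
  for_each = local.static_patterns
  content {
    path_pattern             = ordered_cache_behavior.value
    allowed_methods          = ["GET", "HEAD", "OPTIONS"] # static files only need read methods
    cached_methods           = ["GET", "HEAD", "OPTIONS"]
    target_origin_id         = local.origin_domain
    viewer_protocol_policy   = "redirect-to-https"
    cache_policy_id          = aws_cloudfront_cache_policy.cachingoptimizez_with_v_header.id
    origin_request_policy_id = "b689b0a8-53d0-40ab-baf2-68738e2966ac" # Managed-AllViewerExceptHostHeader
  }
}
```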

Here too, the effect was immediate. A few minutes later, almost 90% of requests were served by CloudFront, taking quite a load off my client's infrastructure and improving full page load times for their customers!
Nearly 90% of resources served by the cache

Let's talk!

If you need help migrating to the Cloud or helping your dev teams take advantage of the many services available, do not hesitate to contact me via LinkedIn or my website.

TerraCloud, both feet on the ground and head in the Cloud!

Top comments (2)

Armando Contreras

Great content! What about Shield Standard? In theory, it is enabled for any network edge solution on AWS.

Paul SANTUS

Hi @acontreras_mp !

Shield Standard only protects your workloads against network (Layer 3) and transport (Layer 4) attacks, for instance bots sending lots of TCP SYN packets.

In this case (notwithstanding the fact that the actual workloads didn't run on AWS), the attacker went to the trouble of sending application-level requests: they actually completed the TCP connection, established a TLS-encrypted session, and then sent an HTTP payload.

Overloading a server with such requests is more costly for the attacker, but requires fewer connections (thousands, not millions) than network-level attacks.

  • On a scalable infrastructure, they would increase hosting costs, but could also give the attacker chances to take control of the server through a malicious payload.
  • On a non-scalable infrastructure, they would mostly act as a DDoS.

Such attacks are those covered by WAF (I'm not familiar with how Shield Advanced, which also includes WAF, would have blocked it).