Jimmy Dahlqvist for AWS Heroes

Posted on Feb 3

PEP and PDP for Secure Authorization with Cognito

#aws #cloud #serverless #authz

In one of my previous posts, Building a serverless connected BBQ as SaaS - Part 4 - AuthZ, I touched on authentication and authorization with distributed PEPs (Policy Enforcement Points) and a centralized PDP (Policy Decision Point). In this post I will dig deeper and expand on that setup. I'll explore how these concepts work in practice, the benefits they offer, and how we can leverage them in a serverless architecture using AWS Lambda, API Gateway, and Cognito User Pools.

Additionally, I’ll cover the Role-Based Access Control (RBAC) model, how to implement it using Cognito Groups and DynamoDB, and how caching can boost the performance of our authorization system.

The entire setup, with detailed deployment instructions and all the code, can be found in Serverless Handbook PEP and PDP.

Let's start with a short recap.

Authentication (AuthN) vs. Authorization (AuthZ)

It’s crucial to distinguish between Authentication and Authorization. The two terms often get mixed up, and I have had to explain the difference on many occasions, but they serve very different purposes.

Authentication (AuthN)

Authentication is all about verifying identity. It answers the question, Who are you? When a user logs into our application, authentication ensures that they are who they claim to be. This could involve something as simple as a username and password or more complex multi-factor authentication (MFA).

Authorization (AuthZ)

Once a user’s identity is authenticated, authorization kicks in. This process answers the question, What can you do? Authorization determines what resources, data, and actions a user is permitted to access based on their roles and permissions.

What are PEP and PDP?

Before diving into how to implement them in AWS, it’s essential to understand the roles that PEP and PDP play in authorization.

PEP (Policy Enforcement Point)

In simple terms, the PEP is the gatekeeper. It is the point in our system where access control decisions are enforced. When a user attempts to access a protected resource, the PEP is the component responsible for allowing or denying the request based on the user’s permissions.

In our case, the Lambda Authorizer in API Gateway acts as the PEP. The Lambda Authorizer intercepts every incoming API request, validates the JWT (typically issued by a Cognito User Pool or another identity provider), and forwards the user’s information (claims) to the PDP for authorization evaluation.

The PEP ensures that the JWT is valid, checks its expiration, verifies its signature, and validates claims (like aud, iss, and sub). It then passes the claims to the PDP for a final decision on whether the user is authorized to access the requested resource.
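The claim checks listed above can be spelled out with the standard library alone. This is a minimal sketch, not the repository's code: it assumes the signature has already been verified (for example by PyJWT), and `validate_claims` and `TokenValidationError` are hypothetical names. In practice PyJWT's `jwt.decode` already rejects expired tokens by default and can check the issuer through its `issuer` argument; for a Cognito User Pool the issuer has the form https://cognito-idp.<region>.amazonaws.com/<user-pool-id>. The audience check is left out here because Cognito ID and access tokens store the app client ID in different claims.

```python
import time
from typing import Any, Mapping, Optional


class TokenValidationError(Exception):
    """Raised when a decoded token fails one of the claim checks."""


def validate_claims(
    claims: Mapping[str, Any],
    expected_issuer: str,
    now: Optional[float] = None,
    leeway_seconds: int = 0,
) -> None:
    """Check exp, iss and sub on claims whose signature was already verified.

    Hypothetical helper: raises TokenValidationError on the first failing check.
    """
    now = time.time() if now is None else now

    # exp is a Unix timestamp in seconds
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)):
        raise TokenValidationError("missing or non-numeric exp claim")
    if now > exp + leeway_seconds:
        raise TokenValidationError("token has expired")

    # The issuer must be our own User Pool, not some other token issuer
    if claims.get("iss") != expected_issuer:
        raise TokenValidationError("unexpected issuer")

    # sub identifies the user and is used as the principalId later on
    if not claims.get("sub"):
        raise TokenValidationError("missing sub claim")
```

Passing `now` explicitly makes the function easy to unit test without waiting for a real token to expire.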

PDP (Policy Decision Point)

The PDP is where the authorization logic resides. Once the PEP checks the JWT and ensures that the token is valid, the PDP determines whether the user is allowed to access the requested resource based on their roles, permissions, or policies.

The PDP is a separate implementation, a separate microservice; in our case a Lambda function that makes the actual authorization decision. It reads the user’s roles from the groups claim in the JWT (cognito:groups from Cognito) and compares them against the permissions required to access a specific resource, which are stored in a data store. In our case we'll use DynamoDB.

The PDP validates whether the user’s role (like Admin, User, or Manager) grants the permissions needed to access the resource (e.g., GET /admin, POST /profile). The PDP can also incorporate additional business logic, such as time-based access or geo-fencing.
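To make "additional business logic" concrete, here is a minimal sketch of a time-based rule the PDP could apply on top of the role check. Everything in it is hypothetical: the access window (weekdays, 08:00 to 18:00 UTC) and the function names are illustrative and not part of the PDP built later in this post.

```python
from datetime import datetime, time, timezone


def within_access_window(
    now: datetime,
    start: time = time(8, 0),
    end: time = time(18, 0),
    weekdays_only: bool = True,
) -> bool:
    """Return True if the timezone-aware `now` falls inside the window, evaluated in UTC."""
    now_utc = now.astimezone(timezone.utc)
    if weekdays_only and now_utc.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        return False
    return start <= now_utc.time() < end


def decide(role_allows: bool, now: datetime) -> str:
    """Combine the role-based result with the time rule; both must pass to Allow."""
    return "Allow" if role_allows and within_access_window(now) else "Deny"
```

Because the rule lives in the PDP, it applies to every PEP without any change to the authorizers.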

Benefits of using PEP and PDP in Authorization

Implementing distributed PEPs and a centralized PDP offers several benefits, especially as our application scales.

Separation of Concerns: By splitting enforcement (PEP) from decision-making (PDP), we get cleaner, more maintainable code. The Lambda Authorizer (PEP) focuses purely on validation and enforcement, while the PDP is dedicated to policy evaluation.

Reduced Latency: By placing PEPs close to where decisions need to be enforced, we can reduce latency, and a caching strategy reduces it even further.

Management: With a centralized PDP, all of our authorization logic is centralized in one location. This makes it easier to manage and update policies as our requirements evolve. Whether it’s modifying roles or adding new permission sets, having a central PDP reduces the overhead of updating policies in multiple places.

Consistency and Compliance: Every request is evaluated against the same set of policies, ensuring consistent decision-making across our system.

Scalability: Both the PEP and PDP components scale independently based on demand. If our system needs to handle a larger volume of requests, API Gateway and Lambda can scale automatically. Additionally, the PDP can be optimized for performance by implementing caching.

Flexibility: A PDP allows us to adapt the authorization model to our needs. If our requirements change (for example, moving to attribute-based access control (ABAC) or introducing a more granular permission system), we can easily modify the PDP to accommodate these changes without affecting other parts of the system.

Using PEP and PDP in AWS with Serverless Architecture

In AWS, the PEP and PDP integration fits perfectly with serverless components like Lambda and API Gateway.

PEP - API Gateway Lambda Authorizer

When a client sends a request to our API Gateway endpoint, the Lambda Authorizer (PEP) intercepts the request before it reaches our backend service. Our implementation performs the following key steps.

JWT Validation: It decodes the JWT, validates the signature, and checks if the token has expired.

Forwarding Claims: After verifying the token, the Lambda Authorizer forwards the claims (such as sub and cognito:groups) to the PDP for further authorization checks. In our solution we actually forward the entire JWT.
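Before either step, the PEP has to pull the token out of the request. A minimal sketch of that step, assuming a REST API REQUEST authorizer event whose header names may arrive in whatever case the client used: `extract_bearer_token` is a hypothetical helper that looks the header up case-insensitively and strips the `Bearer` scheme.

```python
from typing import Mapping, Optional


def extract_bearer_token(headers: Optional[Mapping[str, str]]) -> Optional[str]:
    """Return the raw JWT from the Authorization header, or None if there is none.

    Header names are compared case-insensitively ("Authorization" vs "authorization").
    """
    if not headers:
        return None
    normalized = {name.lower(): value for name, value in headers.items()}
    value = normalized.get("authorization", "").strip()
    if not value:
        return None

    scheme, sep, token = value.partition(" ")
    if not sep:
        # Assumption: a value without a scheme is treated as a bare token,
        # but a lone "Bearer" carries no token at all
        return None if value.lower() == "bearer" else value
    if scheme.lower() == "bearer" and token.strip():
        return token.strip()
    # Any other scheme (e.g. Basic) is not a JWT
    return None
```

Returning `None` lets the authorizer raise `Exception("Unauthorized")`, which API Gateway turns into a 401.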

To reduce the number of calls to both our PEP and our PDP, we can use the authorization cache built into API Gateway.

PDP - Authorization logic Lambda function

In our case, the PDP is implemented as a separate Lambda function. It receives the entire JWT, or just the claims, and runs the authorization logic in several steps:

Check the user’s role (using the groups claim from Cognito).
Query a DynamoDB table that contains role-to-permission mappings (e.g., which roles have access to which API endpoints).
Evaluate whether the user’s role grants the required permission for the requested resource or API endpoint.
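The steps above can be sketched without AWS by replacing the DynamoDB table with a dictionary keyed on the same PK and SK values. This is an illustration under assumptions, not the post's PDP: the PDP in this post uses only the first group in cognito:groups, whereas this sketch evaluates every group and lets an explicit Deny win over an Allow, similar to how IAM policy evaluation treats explicit denies. `PERMISSIONS` and `evaluate` are hypothetical names.

```python
from typing import Dict, Optional, Sequence, Tuple

# Hypothetical in-memory stand-in for the role-permission table:
# keys are (PK, SK) pairs, values hold the Effect attribute.
Permissions = Dict[Tuple[str, str], Dict[str, str]]

PERMISSIONS: Permissions = {
    ("Admin", "GET /unicorn"): {"Effect": "Allow"},
    ("Test", "POST /unicorn"): {"Effect": "Allow"},
    ("Developer", "DELETE /unicorn"): {"Effect": "Deny"},
}


def evaluate(
    groups: Optional[Sequence[str]],
    action: str,
    resource: str,
    permissions: Permissions = PERMISSIONS,
) -> str:
    """Return "Allow" or "Deny" for a caller with the given Cognito groups."""
    if not groups:
        return "Deny"  # no role, no access
    sort_key = f"{action} {resource}"  # same SK format as the DynamoDB table
    effects = {
        permissions[(role, sort_key)]["Effect"]
        for role in groups
        if (role, sort_key) in permissions
    }
    if "Deny" in effects:
        return "Deny"  # an explicit Deny on any group wins
    return "Allow" if "Allow" in effects else "Deny"  # default deny
```

The default-deny fallback matters: a role with no matching item must never be treated as allowed.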

ID Token vs Access Token

As we implement the PEP and PDP workflow, it’s essential to understand the difference between ID Tokens and Access Tokens, as both are often used in authorization workflows.

ID Token

The ID Token is primarily used for authentication and contains information about who the user is. It contains claims about the identity of the authenticated user, such as name, email, and phone_number.

Access Token

The Access Token is used for authorization: it grants the user access to protected resources. The Access Token contains information about the user’s permissions, such as what resources they are allowed to access and the scopes they have been granted, which define what the user can do (e.g., read:profile, write:profile). Note that a Cognito access token does not include the aud claim; the app client ID is carried in the client_id claim instead.
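Because the two token types carry the app client ID in different claims, an audience check has to look at token_use first. The sketch below is a hypothetical helper that assumes Cognito's claim layout: ID tokens have token_use "id" and aud, access tokens have token_use "access" and client_id. It also explains why passing `audience=` to PyJWT's `jwt.decode` rejects a Cognito access token: PyJWT then requires an aud claim.

```python
from typing import Any, Mapping


def audience_matches(claims: Mapping[str, Any], app_client_id: str) -> bool:
    """Check that a verified Cognito token was issued for the given app client."""
    token_use = claims.get("token_use")
    if token_use == "id":
        # ID tokens carry the app client ID in aud
        return claims.get("aud") == app_client_id
    if token_use == "access":
        # Access tokens have no aud; the app client ID is in client_id
        return claims.get("client_id") == app_client_id
    # Unknown or missing token_use: refuse rather than guess
    return False
```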

Token customization in Cognito

Previously, the Pre token generation Lambda trigger could only customize the ID Token, which is why the ID Token was often used for authorization as well. With the new V2 trigger event in Cognito User Pools, we can customize both the ID token and the access token.
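A minimal sketch of a V2 pre token generation handler, assuming the trigger is attached to the User Pool with the V2_0 event version. It copies the first Cognito group into a hypothetical role claim on the access token; the claim name is illustrative only and is not used by the PEP or PDP in this post, which read cognito:groups directly. Check the Cognito documentation for which user pool feature plans support access token customization.

```python
from typing import Any, Dict


def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    """Pre token generation trigger (V2 event): add a hypothetical "role"
    claim, taken from the user's Cognito groups, to the access token."""
    group_config = (event.get("request") or {}).get("groupConfiguration") or {}
    groups = group_config.get("groupsToOverride") or []
    if not groups:
        # No group means no role claim; the PDP will deny by default
        return event

    response = event.get("response") or {}
    # Assumption: the first group is the role, mirroring the PDP in this post
    response["claimsAndScopeOverrideDetails"] = {
        "accessTokenGeneration": {
            "claimsToAddOrOverride": {"role": groups[0]},
        },
    }
    event["response"] = response
    return event
```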

Implementing PEP and PDP

With the introduction completed, let's dig into implementing a PEP and PDP with RBAC. Our PEP will be the Lambda Authorizer in API Gateway and our PDP will be a separate Lambda function. The PDP will use Cognito Groups and DynamoDB for the RBAC authorization logic.

Architecture Overview

Just as a reminder, the entire code and all of the architecture can be found on Serverless Handbook PEP and PDP

In this solution we implement our PEP as a Lambda Authorizer in API Gateway. The PDP is also implemented as a Lambda function. We assign users a role using Cognito Groups and keep a role-to-permission mapping in DynamoDB.

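Let's walk through the flow of an API request: the client calls API Gateway, the Lambda Authorizer (PEP) validates the JWT and invokes the PDP Lambda function, and the PDP looks up the role's permissions in DynamoDB and returns Allow or Deny, which the PEP turns into a policy for API Gateway.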

Note that we will not put an API Gateway in front of our PDP. Instead, our PEP invokes the PDP Lambda function directly. This approach has pros and cons, of course.
On the plus side, latency is lower, since a direct Lambda invocation is often faster than an API call, and cost is lower, since we don't pay for an API Gateway request. On the downside, we create tighter coupling, which can make it harder to change the PDP implementation. We would also need to implement a separate cache in the PDP, whereas with an API Gateway in front we could rely on the API Gateway cache.

However, the approach has to be chosen case by case; there is no golden rule for exactly how to implement this.
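Since the PEP calls the PDP directly, it should fail closed if that call goes wrong. A minimal sketch, assuming the payload and response shapes used by the PEP and PDP in this post: `ask_pdp` is a hypothetical helper, and `lambda_client` is a boto3 Lambda client (or any object with the same `invoke` method). Note that boto3's `invoke` returns normally even when the invoked function raised an error; the error is signalled by the `FunctionError` field in the response.

```python
import json
from typing import Any


def ask_pdp(lambda_client: Any, function_name: str, token: str, method: str, path: str) -> str:
    """Invoke the PDP Lambda synchronously and return "Allow" or "Deny".

    Fails closed: invocation errors, exceptions raised inside the PDP,
    and malformed responses all result in "Deny".
    """
    try:
        response = lambda_client.invoke(
            FunctionName=function_name,
            InvocationType="RequestResponse",
            Payload=json.dumps({"jwt_token": token, "resource": path, "action": method}),
        )
        # The call itself succeeded, but the PDP function raised an error
        if response.get("FunctionError"):
            return "Deny"
        payload = json.loads(response["Payload"].read())
        effect = json.loads(payload["body"]).get("effect")
    except Exception:
        return "Deny"
    # Only accept the two values the PDP is expected to return
    return effect if effect in ("Allow", "Deny") else "Deny"
```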

Deploy authentication and Cognito

The first thing we will do is deploy and set up Cognito and the resources needed for login. We will set up the Cognito User Pool, configure the managed login, and deploy a simple website that handles the callbacks from Cognito and displays our JWT tokens. For simplicity, it will just be a static HTML page served from CloudFront with some Lambda@Edge functions. I will use the setup that I described in this blog post, so for a deep dive I recommend reading that.

As a first step, deploy the Lambda@Edge functions, CloudFront distribution, and SSL certificate from Serverless Handbook PEP and PDP.

Next, let's deploy and set up Cognito. We will create the User Pool, an app client, the managed login style, and so on.

AWSTemplateFormatVersion: "2010-09-09"
Transform: "AWS::Serverless-2016-10-31"
Description: Creates the User Pool and Client used for Authentication
Parameters:
  ApplicationName:
    Type: String
    Description: The application that owns this setup.
  DomainName:
    Type: String
    Description: The domain name to use for cloudfront
  HostedAuthDomainPrefix:
    Type: String
    Description: The domain prefix to use for the UserPool hosted UI <HostedAuthDomainPrefix>.auth.[region].amazoncognito.com

Resources:
  UserPool:
    Type: AWS::Cognito::UserPool
    Properties:
      UsernameConfiguration:
        CaseSensitive: false
      AutoVerifiedAttributes:
        - email
      UserPoolName: !Sub ${ApplicationName}-user-pool
      Schema:
        - Name: email
          AttributeDataType: String
          Mutable: false
          Required: true
        - Name: name
          AttributeDataType: String
          Mutable: true
          Required: true

  UserPoolClient:
    Type: AWS::Cognito::UserPoolClient
    Properties:
      UserPoolId: !Ref UserPool
      GenerateSecret: True
      AllowedOAuthFlowsUserPoolClient: true
      CallbackURLs:
        - !Sub https://${DomainName}/signin
      AllowedOAuthFlows:
        - code
        - implicit
      AllowedOAuthScopes:
        - phone
        - email
        - openid
        - profile
      SupportedIdentityProviders:
        - COGNITO

  HostedUserPoolDomain:
    Type: AWS::Cognito::UserPoolDomain
    Properties:
      Domain: !Ref HostedAuthDomainPrefix
      ManagedLoginVersion: 2
      UserPoolId: !Ref UserPool

  ManagedLoginStyle:
    Type: AWS::Cognito::ManagedLoginBranding
    Properties:
      ClientId: !Ref UserPoolClient
      UserPoolId: !Ref UserPool
      UseCognitoProvidedValues: true

  UserPoolIdParameter:
    Type: AWS::SSM::Parameter
    Properties:
      Name: !Sub /${ApplicationName}/userPoolId
      Type: String
      Value: !Ref UserPool
      Description: SSM Parameter for the User Pool Id
      Tags:
        ApplicationName: !Ref ApplicationName

  UserPoolHostedUiParameter:
    Type: AWS::SSM::Parameter
    Properties:
      Name: !Sub /${ApplicationName}/userPoolHostedUi
      Type: String
      Value: !Sub https://${HostedAuthDomainPrefix}.auth.${AWS::Region}.amazoncognito.com/login?client_id=${UserPoolClient}&response_type=code&scope=email+openid+phone+profile&redirect_uri=https://${DomainName}/signin
      Description: SSM Parameter for the User Pool Hosted UI
      Tags:
        ApplicationName: !Ref ApplicationName

Outputs:
  CognitoUserPoolJwksUri:
    Value: !Sub https://cognito-idp.${AWS::Region}.amazonaws.com/${UserPool}/.well-known/jwks.json
    Description: The UserPool jwks uri
    Export:
      Name: !Sub ${AWS::StackName}:jwks-url
  CognitoUserPoolID:
    Value: !Ref UserPool
    Description: The UserPool ID
  CognitoAppClientID:
    Value: !Ref UserPoolClient
    Description: The app client
    Export:
      Name: !Sub ${AWS::StackName}:app-audience
  CognitoUrl:
    Description: The url
    Value: !GetAtt UserPool.ProviderURL
  CognitoHostedUI:
    Value: !Sub https://${HostedAuthDomainPrefix}.auth.${AWS::Region}.amazoncognito.com/login?client_id=${UserPoolClient}&response_type=code&scope=email+openid+phone+profile&redirect_uri=https://${DomainName}/signin
    Description: The hosted UI URL

With this deployment done, we can move over to the Console and create groups that users can be added to. I will create three groups: Admin, Developer, and Test. Click Create group and give it a name. The group name represents the role the user will have and determines what permissions they get; more on that setup further down.

We can then create some users and assign them to one of the groups.

To test this setup, we can navigate to the webpage deployed with the CloudFront distribution and inspect the JWT tokens stored in cookies.

If we copy the access token and decode it (I use jwt.io), we can see that my user has the cognito:groups claim, which our PEP and PDP will use later for permissions.
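jwt.io works well, but if you would rather not paste tokens into a website, the header and payload can also be inspected locally. The sketch below uses only the Python standard library; `inspect_jwt` is a hypothetical helper, and it deliberately does not verify the signature, so it is for debugging only and must never feed an authorization decision.

```python
import base64
import json
from typing import Any, Dict


def decode_segment(segment: str) -> Dict[str, Any]:
    """Decode one base64url-encoded JWT segment (header or payload)."""
    # JWTs strip base64 padding, so add it back before decoding
    padded = segment + "=" * (-len(segment) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def inspect_jwt(token: str) -> Dict[str, Any]:
    """Return the header and payload of a JWT WITHOUT verifying the signature."""
    header_b64, payload_b64, _signature = token.split(".")
    return {"header": decode_segment(header_b64), "payload": decode_segment(payload_b64)}
```

For example, `inspect_jwt(access_token)["payload"].get("cognito:groups")` shows the groups the PDP will see.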

Setup and deploy PDP

Next we can deploy our Authorization service, our PDP, responsible for making permission decisions.

The logic will be implemented in a Lambda function, and to manage role-based permissions we will create a DynamoDB table that stores the permissions for each role. Each permission defines a resource the role can access; this could be a specific API endpoint and HTTP method, but is of course not limited to that. We'll model the table data as follows:

PK (Partition Key): The role (e.g., Admin, Developer).
SK (Sort Key): The HTTP method and endpoint, e.g. GET /unicorn.
Action: The action e.g. GET, PUT, WRITE, READ, LIST etc
Resource: The Resource, for example the endpoint /unicorn
Effect: The Effect, Allow or Deny
Description: A description of the permission.

PK	SK	Action	Resource	Effect	Description
Admin	GET /unicorn	GET	/unicorn	Allow	Admin can access all unicorns
Test	POST /unicorn	POST	/unicorn	Allow	Test can post on unicorns
Developer	DELETE /unicorn	DELETE	/unicorn	Deny	Developer cannot delete a unicorn

This allows us to efficiently look up permissions for each role using a simple DynamoDB query.
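The example rows can be turned into table items with a small seeding script. This is a sketch under assumptions, not part of the repository: `ROWS`, `to_item`, and `seed` are hypothetical names, and the SK is built as `"<ACTION> <resource>"`, the same format the PDP's `validate_permission` queries on.

```python
from typing import Any, Dict, List, Sequence, Tuple

Row = Tuple[str, str, str, str, str]  # (role, action, resource, effect, description)

ROWS: List[Row] = [
    ("Admin", "GET", "/unicorn", "Allow", "Admin can access all unicorns"),
    ("Test", "POST", "/unicorn", "Allow", "Test can post on unicorns"),
    ("Developer", "DELETE", "/unicorn", "Deny", "Developer cannot delete a unicorn"),
]


def to_item(role: str, action: str, resource: str, effect: str, description: str) -> Dict[str, str]:
    """Build one table item; SK uses the "<ACTION> <resource>" format the PDP queries on."""
    if effect not in ("Allow", "Deny"):
        raise ValueError(f"Effect must be Allow or Deny, got {effect!r}")
    return {
        "PK": role,
        "SK": f"{action} {resource}",
        "Action": action,
        "Resource": resource,
        "Effect": effect,
        "Description": description,
    }


def seed(table: Any, rows: Sequence[Row] = ROWS) -> List[Dict[str, str]]:
    """Write all rows using a boto3 DynamoDB Table resource's batch_writer()."""
    items = [to_item(*row) for row in rows]
    with table.batch_writer() as batch:
        for item in items:
            batch.put_item(Item=item)
    return items
```

Usage would be along the lines of `seed(boto3.resource("dynamodb").Table(table_name))`, where `table_name` is the permissions table created by the PDP stack.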

AWSTemplateFormatVersion: "2010-09-09"
Transform: "AWS::Serverless-2016-10-31"
Description: PDP service with the role-permission mapping table
Parameters:
  ApplicationName:
    Type: String
    Description: Name of owning application
  UserManagementStackName:
    Type: String
    Description: The name of the stack that contains the user management part, e.g the Cognito UserPool

Globals:
  Function:
    Timeout: 30
    MemorySize: 2048
    Architectures:
      - arm64
    Runtime: python3.12

Resources:
  PermissionsTable:
    Type: AWS::DynamoDB::Table
    Properties:
      TableName:
        Fn::Sub: ${ApplicationName}-pdp-role-permission-map
      BillingMode: PAY_PER_REQUEST
      AttributeDefinitions:
      - AttributeName: PK
        AttributeType: S
      - AttributeName: SK
        AttributeType: S
      KeySchema:
      - AttributeName: PK
        KeyType: HASH
      - AttributeName: SK
        KeyType: RANGE

  LambdaPDPFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: Lambda/AuthZ
      Handler: authz.handler
      Policies:
        - DynamoDBReadPolicy:
            TableName: !Ref PermissionsTable
      Environment:
        Variables:
          JWKS_URL:
            Fn::ImportValue: !Sub ${UserManagementStackName}:jwks-url
          AUDIENCE:
            Fn::ImportValue: !Sub ${UserManagementStackName}:app-audience
          PERMISSIONS_TABLE:
            !Ref PermissionsTable

Outputs:
  PDPLambdaArn:
    Value: !GetAtt LambdaPDPFunction.Arn
    Description: The ARN of the PDP Lambda Function
    Export:
      Name: !Sub ${AWS::StackName}:pdp-lambda-arn
  PDPLambdaName:
    Value: !Ref LambdaPDPFunction
    Description: The Name of the PDP Lambda Function
    Export:
      Name: !Sub ${AWS::StackName}:pdp-lambda-name

Role Authorization logic

The PDP Lambda will decode the JWT, retrieve the role from the cognito:groups claim, and query the DynamoDB table to check if the role has permission to access the requested resource.

import os
import json
import jwt
import boto3
from jwt import PyJWKClient
from botocore.exceptions import ClientError

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table(os.environ["PERMISSIONS_TABLE"])
JWKS_URL = os.environ["JWKS_URL"]
AUDIENCE = os.environ["AUDIENCE"]

# Created once per execution environment so fetched signing keys are reused
jwks_client = PyJWKClient(JWKS_URL)


def handler(event, context):
    # The PEP invokes this function directly with the token, path and method
    jwt_token = event["jwt_token"]
    resource = event["resource"]
    action = event["action"]

    return check_authorization(jwt_token, action, resource)


def check_authorization(jwt_token, action, resource):
    decoded_token = None
    try:
        signing_key = jwks_client.get_signing_key_from_jwt(jwt_token)

        # Verifies signature and expiry. Cognito access tokens have no aud
        # claim (they carry client_id), so the audience is checked manually.
        decoded_token = jwt.decode(
            jwt_token,
            signing_key.key,
            algorithms=["RS256"],
            options={"verify_aud": False},
        )
        if decoded_token.get("client_id", decoded_token.get("aud")) != AUDIENCE:
            raise Exception("Unauthorized: token not issued for this app client")

        # The first Cognito group is used as the role
        groups = decoded_token.get("cognito:groups") or []
        role = groups[0] if groups else None

        if not role:
            raise Exception("Unauthorized: Role not found in the token")

        if validate_permission(role, action, resource):
            response_body = generate_access(
                decoded_token["sub"], "Allow", action, resource
            )

            return {
                "statusCode": 200,
                "body": json.dumps(response_body),
                "headers": {"Content-Type": "application/json"},
            }

    except Exception as e:
        print(f"Authorization error: {str(e)}")

    # Default deny; decoded_token is None if the token could not be decoded
    principal = decoded_token["sub"] if decoded_token else "unknown"
    response_body = generate_access(principal, "Deny", action, resource)

    return {
        "statusCode": 403,
        "body": json.dumps(response_body),
        "headers": {"Content-Type": "application/json"},
    }


def validate_permission(role, action, resource):
    print(f"validate_permission Role: {role}, Action: {action}, Resource: {resource}")
    try:
        # PK is the role, SK is "<METHOD> <path>", e.g. "GET /unicorn"
        response = table.query(
            KeyConditionExpression="PK = :role AND SK = :endpoint",
            ExpressionAttributeValues={
                ":role": role,
                ":endpoint": f"{action} {resource}",
            },
        )
        items = response["Items"]
        # No matching item means no permission (default deny)
        return bool(items) and items[0]["Effect"] == "Allow"
    except ClientError as e:
        print(f"Error querying DynamoDB: {e}")
        return False


def generate_access(principal, effect, action, resource):
    return {
        "principalId": principal,
        "effect": effect,
        "action": action,
        "resource": resource,
    }

Deploy API and PEP

Now we can deploy our API and the PEP, our Lambda Authorizer.

AWSTemplateFormatVersion: "2010-09-09"
Transform: "AWS::Serverless-2016-10-31"
Description: Create the Unicorn API and the PEP Lambda Authorizer
Parameters:
  ApplicationName:
    Type: String
    Description: Name of owning application
  UserManagementStackName:
    Type: String
    Description: The name of the stack that contains the user management part, e.g the Cognito UserPool
  PDPStackName:
    Type: String
    Description: The name of the stack that contains the PDP service

Globals:
  Function:
    Timeout: 30
    MemorySize: 2048
    Runtime: python3.12

Resources:
  LambdaGetUnicorn:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: Lambda/API/GetUnicorn
      Handler: handler.handler
      Events:
        GetUnicorns:
          Type: Api
          Properties:
            Path: /unicorn
            Method: get
            RestApiId: !Ref UnicornApi

  UnicornApi:
    Type: AWS::Serverless::Api
    Properties:
      Description: API for creating and managing Unicorns
      Name: !Sub ${ApplicationName}-api
      StageName: prod
      OpenApiVersion: '3.0.1'
      AlwaysDeploy: true
      EndpointConfiguration: REGIONAL
      Cors:
        AllowMethods: "'GET,PUT,POST,DELETE,OPTIONS'"
        AllowHeaders: "'Content-Type,Authorization,X-Amz-Date,X-Api-Key,X-Amz-Security-Token'"
        AllowOrigin: "'*'"
      Auth:
        AddDefaultAuthorizerToCorsPreflight: false
        Authorizers:
          LambdaRequestAuthorizer:
            FunctionArn: !GetAtt LambdaApiAuthorizer.Arn
            FunctionPayloadType: REQUEST
            Identity: 
              Headers: 
                - Authorization
              ReauthorizeEvery: 600
        DefaultAuthorizer: LambdaRequestAuthorizer

  LambdaApiAuthorizer:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: Lambda/Authorizer/
      Handler: auth.handler
      Policies:
        - LambdaInvokePolicy:
            FunctionName: 
              Fn::ImportValue: !Sub ${PDPStackName}:pdp-lambda-name
      Environment:
        Variables:
          JWKS_URL:
            Fn::ImportValue: !Sub ${UserManagementStackName}:jwks-url
          AUDIENCE:
            Fn::ImportValue: !Sub ${UserManagementStackName}:app-audience
          PDP_AUTHZ_ENDPOINT: 
            Fn::ImportValue: !Sub ${PDPStackName}:pdp-lambda-name

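We set our PEP as the default authorizer, so it is attached to every resource and method. To reduce the number of calls to our PDP, the authorization cache in API Gateway is enabled with a TTL of 600 seconds (ReauthorizeEvery). Keep in mind that the cache key is the identity source, here the Authorization header, so a cached policy is reused for every method and resource called with the same token. Since our policy only covers the methodArn of the request that produced it, endpoints that need separate decisions should also have the HTTP method and resource path added as identity sources.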

PEP Authorization logic

The PEP Lambda Authorizer will decode the JWT, check the validity, and then call the PDP for a final permission decision.

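import os
import json
import jwt
import boto3
from jwt import PyJWKClient

lambda_client = boto3.client("lambda")
# Created once per execution environment so fetched signing keys are reused
jwks_client = PyJWKClient(os.environ["JWKS_URL"])


def handler(event, context):
    # Log the request but not the headers, to keep tokens out of the logs
    print(f"Authorizing {event.get('httpMethod')} {event.get('path')}")
    # Header names keep the client's casing, so look them up case-insensitively
    headers = {k.lower(): v for k, v in (event.get("headers") or {}).items()}
    token = headers.get("authorization", "")
    path = event["path"]
    method = event["httpMethod"]

    if not token:
        # Raising "Unauthorized" makes API Gateway respond with 401
        raise Exception("Unauthorized")

    token = token.replace("Bearer ", "")

    decoded_token = None
    try:
        signing_key = jwks_client.get_signing_key_from_jwt(token)

        # Verifies signature and expiry. Cognito access tokens have no aud
        # claim (they carry client_id), so the audience is checked manually.
        decoded_token = jwt.decode(
            token,
            signing_key.key,
            algorithms=["RS256"],
            options={"verify_aud": False},
        )
        if decoded_token.get("client_id", decoded_token.get("aud")) != os.environ["AUDIENCE"]:
            raise Exception("Token not issued for this app client")

        data = {
            "jwt_token": token,
            "resource": path,
            "action": method,
        }

        # Synchronous call to the PDP Lambda function
        response = lambda_client.invoke(
            FunctionName=os.environ["PDP_AUTHZ_ENDPOINT"],
            InvocationType="RequestResponse",
            Payload=json.dumps(data),
        )

        response_payload = json.loads(response["Payload"].read())
        body = json.loads(response_payload["body"])
        effect = body["effect"]

        return generate_policy(decoded_token["sub"], effect, event["methodArn"])

    except Exception as e:
        print(f"Authorization error: {str(e)}")

    # Default deny; decoded_token is None if the token could not be decoded
    principal = decoded_token["sub"] if decoded_token else "anonymous"
    return generate_policy(principal, "Deny", event["methodArn"])


def generate_policy(principal_id, effect, resource):
    # Policy that API Gateway evaluates (and caches) for this request
    return {
        "principalId": principal_id,
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {"Action": "execute-api:Invoke", "Effect": effect, "Resource": resource}
            ],
        },
    }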

Importance of caching

Caching is important in optimizing our authorization flow. By reducing calls to the PDP and speeding up decision-making, caching helps improve the overall performance, scalability, and cost-efficiency of our application.

Reduce Latency: By caching role and permission data, the PEP avoids repeated calls to our PDP, leading to faster response times and lower latency for each request.
Decrease PDP Load: Caching minimizes the number of calls made to our PDP, reducing the risk of hitting rate limits or throttling.
Improve Scalability: With fewer requests hitting our PDP, our architecture can scale more efficiently.
Lower Costs: Caching reduces the need for repeated PDP invocations, which directly lowers Lambda invocation costs.
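The API Gateway cache sits in front of the PEP. Because our PEP invokes the PDP directly, any caching inside the PDP has to be built separately, and a simple option is an in-memory TTL cache kept at module level in the PDP Lambda. The sketch below is an assumption, not part of the repository: `TTLCache` and `cached_permission` are hypothetical names, and `lookup` would be something like the post's `validate_permission`. The cache lives per Lambda execution environment, so each warm environment has its own copy, and permission changes in DynamoDB take up to the TTL to become visible.

```python
import time
from typing import Callable, Dict, Hashable, Optional, Tuple


class TTLCache:
    """Tiny in-memory cache where each entry expires after ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[Hashable, Tuple[float, object]] = {}

    def get(self, key: Hashable) -> Optional[object]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: drop it and report a miss
            return None
        return value

    def put(self, key: Hashable, value: object) -> None:
        self._entries[key] = (self._clock() + self._ttl, value)


def cached_permission(
    cache: TTLCache,
    role: str,
    action: str,
    resource: str,
    lookup: Callable[[str, str, str], bool],
) -> bool:
    """Return the cached decision for (role, action, resource), calling lookup on a miss.

    Both True and False are cached; only None marks a miss.
    """
    key = (role, action, resource)
    hit = cache.get(key)
    if hit is not None:
        return bool(hit)
    allowed = lookup(role, action, resource)
    cache.put(key, allowed)
    return allowed
```

The key deliberately excludes the token: once the token is verified, the decision depends only on role, action, and resource, so users who share a role also share cache entries.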

Summary and conclusion

Implementing PEP and PDP in our authorization flow offers a highly scalable, flexible, and secure way to control access to resources. By leveraging AWS Lambda and API Gateway, we can build a serverless authorization system that separates authentication and authorization concerns, scales with demand, and simplifies policy management.

With the addition of Role-Based Access Control and DynamoDB for storing permissions, combined with API Gateway's authorization cache for enhanced performance, we can create an authorization solution that fits both current and future needs.

Understanding the difference between ID Tokens and Access Tokens ensures that our system uses each appropriately, helping us build a more secure and efficient authorization system.

Happy coding, and stay secure!

Source Code: Serverless Handbook PEP and PDP