Jimmy Dahlqvist for AWS Heroes

Serverless self-service IoT certificate management - Part 2

This is the second part in the series about a Certificate Self Service setup for IoT projects. In this part we'll extend the API that we started in the first part. We'll add the possibility to create multiple intermediate CAs, and thereby the possibility to create server certificates from one CA and client/device certificates from a different one.

Using multiple intermediate CAs is a good practice from a security and management perspective. If an intermediate CA becomes compromised, you can revoke and rotate only the affected certificates, the ones signed by the compromised CA. If the same intermediate CA is used to sign all device certificates, a breach means that all device certificates need to be rotated. With several intermediate CAs we can split device certificates into "cells" and handle each cell independently.

As a reminder!!

WARNING

The solution I build in this series of posts is NOT suited for a production setup. It is purely meant for development environments and for learning purposes!


In a production environment we need:
  • Continuous monitoring and automated renewal of certificates.
  • Integration with hardware security modules (HSMs) for key storage.
  • Compliance with security standards.

For these needs, managed services like AWS Private CA, DigiCert IoT Trust Manager, and Let’s Encrypt are ideal.

Get the source code

As the source code for this project is fairly large, not all code is available in this post.
To get the full source code and deploy it yourself, visit Serverless-Handbook Self Service IoT Certificate management

Why Build a Self-Service API?

Once again I'd like to revisit why we would want to create this self service system. Why not just use a private CA or IoT Core from AWS, or a SaaS solution like DigiCert IoT Trust Manager?

For several of the SaaS solutions that exist you pay per certificate. In many of the teams I have been working with on IoT projects, we have been issuing many certificates per day for testing: several certs per device, per tenant, and so on. Automated tests have generated certs over and over again. We discovered that using some form of self signed certificates, with a self service API, made us more cost efficient. It also enabled us to test different scenarios with several intermediate CAs.

From a learning perspective, new engineers that didn't have that much experience with IoT and certificates could test and learn in a safe and good way, without breaking the bank.

So for my teams a self-service API for certificate management allowed:

  • Automation: Devices and servers can request and renew certificates programmatically.
  • Scalability: As our IoT environment grows, the API can handle the increasing demand for certificates.
  • Learning and Testing: Before adopting a managed service, building your own certificate system helps you understand how PKI works.

Architecture overview

Let's start by going back to the architecture and look at the overview for this setup. There are some new parts introduced this time.

Image showing the architecture overview

First, a certificate inventory is introduced. Information about certificates is stored in a DynamoDB table, which allows querying for certificates based on the signing parent. To populate the inventory, the Lambda functions responsible for creating certificates post an event onto an Amazon EventBridge event bus, which invokes a Step Functions state machine that populates the inventory. This state machine uses the newly released JSONata support.

The creation flow looks like this, with small differences depending on whether a CA or a leaf (server / client) certificate is being created. Everything below the dotted line runs asynchronously.

Image showing the architecture overview

Lastly, a new Lambda function, responsible for listing and fetching certificates, is created.

Update REST API

Now, let's look at the updated API. We'll add three new endpoints: one for creating new device certificates and two for listing and fetching certificates.

| Endpoint | Method | Description |
| --- | --- | --- |
| /certificates/root | POST | Create a new Root CA. |
| /certificates/intermediate | POST | Create a new Intermediate CA. |
| /certificates/server | POST | Create a new server certificate. |
| /certificates/device | POST | Create a new device / client certificate. |
| /certificates | GET | List and search certificates. |
| /certificates/{certificate} | GET | Get a single certificate. |

I decided to use separate paths (e.g., /certificates/root, /certificates/intermediate, /certificates/server, /certificates/device) rather than a single endpoint with a type parameter (e.g., /certificates with type as input) as I feel this aligns better with REST principles and improves the API’s readability and usability.

The /certificates/device endpoint is very similar to the other endpoints that create certificates. The difference is that it creates a random UUID that becomes the device ID.

To query for certificates, the /certificates endpoint accepts two query parameters: parent, which is the base64 encoded FQDN of the parent, e.g. bbq.example.com, and limit, which restricts how many certificates are returned. A further extension would be to also include a lastevaluatedkey parameter to support pagination.
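
As a rough sketch of how a client could build such a request: the snippet below assumes standard base64 (the API may just as well expect the URL-safe variant), and both the helper names and the base URL are placeholders that do not come from the project.

```python
import base64
from urllib.parse import urlencode


def encode_parent(parent_fqdn: str) -> str:
    """Base64-encode the parent FQDN for use as the `parent` query parameter."""
    return base64.b64encode(parent_fqdn.encode("utf-8")).decode("ascii")


def build_list_url(base_url: str, parent_fqdn: str, limit: int = 10) -> str:
    """Build the GET /certificates URL with the `parent` and `limit` parameters."""
    # urlencode percent-escapes characters such as '=' and '+' in the base64 value.
    query = urlencode({"parent": encode_parent(parent_fqdn), "limit": limit})
    return f"{base_url.rstrip('/')}/certificates?{query}"


# The base URL is a placeholder, not the real API endpoint.
url = build_list_url("https://api.example.com", "bbq.example.com", limit=5)
```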

Add certificate inventory

To add the certificate inventory, we start by extending the service with a DynamoDB table and an index that can be used to query for certificates, plus EventBridge and Step Functions for updating the inventory.

DynamoDB table

We'll use the FQDN as partition key and the ParentFQDN as the sort key. In the index we'll use the ParentFQDN as the partition key. This gives us the possibility to query for a single certificate and all certificates signed by a specific CA.

  InventoryTable:
    Type: AWS::DynamoDB::Table
    Properties:
      TableName: !Sub ${ApplicationName}-certificate-inventory
      BillingMode: PAY_PER_REQUEST
      AttributeDefinitions:
        - AttributeName: FQDN
          AttributeType: S
        - AttributeName: ParentFQDN
          AttributeType: S
      KeySchema:
        - AttributeName: FQDN
          KeyType: HASH
        - AttributeName: ParentFQDN
          KeyType: RANGE
      GlobalSecondaryIndexes: 
        - IndexName: parent-index
          KeySchema:
            - AttributeName: ParentFQDN
              KeyType: HASH
            - AttributeName: FQDN
              KeyType: RANGE
          Projection:
            ProjectionType: ALL
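
With this key design, both access patterns map to a DynamoDB Query. Because the table has a composite key, fetching by FQDN alone is a Query on the partition key rather than a GetItem. The sketch below only builds the keyword arguments that could be passed to a boto3 DynamoDB client's `query` call; the function names, the default limit and the table name are illustrative assumptions, not code from the project.

```python
def single_certificate_query(table_name: str, fqdn: str) -> dict:
    """Arguments for looking up a certificate by its own FQDN (table partition key)."""
    return {
        "TableName": table_name,
        "KeyConditionExpression": "FQDN = :fqdn",
        "ExpressionAttributeValues": {":fqdn": {"S": fqdn}},
    }


def signed_by_query(table_name: str, parent_fqdn: str, limit: int = 25) -> dict:
    """Arguments for listing certificates signed by a CA, using the parent-index GSI."""
    return {
        "TableName": table_name,
        "IndexName": "parent-index",
        "KeyConditionExpression": "ParentFQDN = :parent",
        "ExpressionAttributeValues": {":parent": {"S": parent_fqdn}},
        "Limit": limit,
    }


# Usage with boto3 (not executed here, the table name is a placeholder):
#   import boto3
#   client = boto3.client("dynamodb")
#   response = client.query(**signed_by_query("my-inventory-table", "bbq.example.com"))
#   certificates = response["Items"]
```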

EventBridge + StepFunctions

We prepared for this already in the first part. The event-bus is created by the template with common infrastructure.

To update the inventory we'll use an event-driven approach, where the Lambda functions post an event as soon as a certificate is created. This does however create an eventually consistent solution, where a read directly after a write might not return the latest result. As long as we are aware of this it should not cause us any problems, and the benefits of an event-driven approach outweigh that drawback. With an event-driven architecture we can extend and decouple logic in the future.

The functions will post an event with this structure:

{
    "Source": "certificates",
    "DetailType": "created",
    "Detail": {
        "FQDN": "domain name",
        "Type": "Root/Intermediate/Server/Client",
        "ParentFQDN": "Parent",
        "ValidUntil": "Valid to date"
    }
}
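
A function could publish this event with boto3's `put_events`. The sketch below builds a single entry; the function name, the bus name and the date format are assumptions made for illustration. One detail worth noting is that `Detail` has to be passed as a JSON string rather than a nested object.

```python
import json


def build_created_entry(bus_name, fqdn, cert_type, parent_fqdn, valid_until):
    """Build one PutEvents entry describing a newly created certificate."""
    detail = {
        "FQDN": fqdn,
        "Type": cert_type,
        "ParentFQDN": parent_fqdn,
        "ValidUntil": valid_until,
    }
    return {
        "Source": "certificates",
        "DetailType": "created",
        # PutEvents expects Detail as a JSON string, not a nested object.
        "Detail": json.dumps(detail),
        "EventBusName": bus_name,
    }


# All values below are placeholders.
entry = build_created_entry(
    "my-event-bus",
    "clients.bbq.example.com",
    "Intermediate",
    "bbq.example.com",
    "2026-01-01T00:00:00Z",
)

# Publishing with boto3 (not executed here):
#   import boto3
#   boto3.client("events").put_events(Entries=[entry])
```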

To create the Step Functions state machine, we add it to the CloudFormation template. We'll also add an event rule matching our event structure, so the state machine is invoked every time a certificate is created.

  CertificateCreatedExpress:
    Type: AWS::Serverless::StateMachine
    Properties:
      DefinitionUri: certificate-created-statemachine/statemachine.asl.yaml
      Tracing:
        Enabled: true
      Logging:
        Destinations:
          - CloudWatchLogsLogGroup:
              LogGroupArn: !GetAtt CertificateCreatedStateMachineLogGroup.Arn
        IncludeExecutionData: true
        Level: ALL
      DefinitionSubstitutions:
        EventBridgeBusName:
          Fn::ImportValue: !Sub ${CommonInfraStackName}:event-bus-name
        InventoryTable: !Ref InventoryTable
        ApplicationName: !Ref ApplicationName
      Policies:
        - Statement:
            - Effect: Allow
              Action:
                - logs:*
              Resource: "*"
        - DynamoDBCrudPolicy:
            TableName: !Ref InventoryTable
      Events:
        CertificateCreatedEvent:
          Type: EventBridgeRule
          Properties:
            EventBusName:
              Fn::ImportValue: !Sub ${CommonInfraStackName}:event-bus-name
            Pattern:
              source:
                - certificates
              detail-type:
                - created
      Type: EXPRESS
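
Note that the event pattern uses the lowercase `source` and `detail-type` keys. That is the shape of the event as EventBridge delivers it, as opposed to the `Source` and `DetailType` fields used when publishing. The snippet below is a simplified illustration of how this particular pattern selects events; it is not EventBridge's full matching logic, and the sample values are placeholders.

```python
# The rule pattern from the template, expressed as a Python dict.
PATTERN = {"source": ["certificates"], "detail-type": ["created"]}


def matches(pattern: dict, event: dict) -> bool:
    """Return True if every pattern field lists the event's value for that field.

    Handles only top-level string fields with exact-value lists, which is all
    the pattern above needs. Real EventBridge patterns support much more.
    """
    return all(event.get(field) in allowed for field, allowed in pattern.items())


# Shape of the event as delivered to the target (values are placeholders).
delivered = {
    "source": "certificates",
    "detail-type": "created",
    "detail": {
        "FQDN": "device.clients.bbq.example.com",
        "Type": "Client",
        "ParentFQDN": "clients.bbq.example.com",
        "ValidUntil": "placeholder",
    },
}

print(matches(PATTERN, delivered))                         # True
print(matches(PATTERN, {**delivered, "source": "other"}))  # False
```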

As of now our state machine definition is not that big, but it leaves room for us to extend it later.

Comment: Certificate service - Store Certificate Info
QueryLanguage: JSONata
StartAt: Debug
States:
  Debug:
    Type: Pass
    Next: Store Certificate Info
    Assign:
      FQDN: "{% $states.input.detail.FQDN %}"
      ParentFQDN: "{% $states.input.detail.ParentFQDN %}"
      Type: "{% $states.input.detail.Type %}"
      ValidUntil: "{% $states.input.detail.ValidUntil %}"
  Store Certificate Info:
    Type: Task
    Resource: arn:aws:states:::dynamodb:putItem
    Arguments:
      TableName: ${InventoryTable}
      Item:
        FQDN:
          S: "{% $FQDN %}"
        ParentFQDN:
          S: "{% $ParentFQDN %}"
        Type:
          S: "{% $Type %}"
        ValidUntil:
          S: "{% $ValidUntil %}"
    End: true
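
To make the data flow easier to follow, here is a plain Python mirror of what the two states do. It is only an illustration of the mapping and not code that runs in Step Functions; the function names and sample values are made up for the example.

```python
def assign_variables(state_input: dict) -> dict:
    """Mirror of the Debug state's Assign block: pick fields from the event detail."""
    detail = state_input["detail"]
    return {
        "FQDN": detail["FQDN"],
        "ParentFQDN": detail["ParentFQDN"],
        "Type": detail["Type"],
        "ValidUntil": detail["ValidUntil"],
    }


def build_put_item_arguments(table_name: str, variables: dict) -> dict:
    """Mirror of the Task state's Arguments: every attribute is stored as a string (S)."""
    return {
        "TableName": table_name,
        "Item": {name: {"S": value} for name, value in variables.items()},
    }


# Placeholder event, shaped like the one EventBridge delivers to the state machine.
event = {
    "source": "certificates",
    "detail-type": "created",
    "detail": {
        "FQDN": "clients.bbq.example.com",
        "Type": "Intermediate",
        "ParentFQDN": "bbq.example.com",
        "ValidUntil": "2026-01-01T00:00:00Z",
    },
}
arguments = build_put_item_arguments("my-inventory-table", assign_variables(event))
```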

As you might see, we use two of the features recently released for Step Functions: variables and JSONata.

Variables

With variables we can use Assign to create variables that are available to the states that follow in the state machine. So now we can create and assign data in an early state and use it throughout, with no need to recreate the information in every state. This is a very welcome addition. To demonstrate this, I create variables in the very first state and then use them in the next one. With JSONata, variables are accessed as {% $variableName %}; with JSONPath, as $variableName.

JSONata

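JSONata is also a new addition. You select the query language by setting QueryLanguage to either JSONPath (the default) or JSONata. The field can be set at the top level of the definition, as done here, or on an individual state, but within a single state you have to pick one of them; there is no possibility to mix and match. In JSONata, expressions are wrapped in {% %} instead of using the traditional $. paths.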

Add client cert creation

Creating a device (client) certificate is almost the same as creating a server certificate. The difference is that I don't want to specify the full domain. Instead I only specify the FQDN of the signing intermediate certificate, and the logic generates a new UUID that is used as the device part of the name, resulting in e.g. uuid.clients.bbq.example.com.
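
A minimal sketch of that naming logic could look like the following. The function name is hypothetical, and the use of a version 4 (random) UUID is an assumption based on the description above.

```python
import uuid


def new_device_fqdn(intermediate_fqdn: str) -> tuple[str, str]:
    """Generate a random device ID and the FQDN for its certificate."""
    device_id = str(uuid.uuid4())
    return device_id, f"{device_id}.{intermediate_fqdn}"


device_id, fqdn = new_device_fqdn("clients.bbq.example.com")
```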

We add a new Lambda function and add it to our API.

  LambdaGenerateDeviceCertificate:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: Lambda/API/GenerateDeviceCert
      Handler: handler.handler
      Layers:
        - !Ref UtilsLayer
      Policies:
        - S3FullAccessPolicy:
            BucketName: 
              Fn::ImportValue: !Sub "${CommonInfraStackName}:certificate-bucket-name"
        - EventBridgePutEventsPolicy:
            EventBusName:
              Fn::ImportValue: !Sub "${CommonInfraStackName}:event-bus-name"    
      Events:
        CreateDeviceCertApi:
          Type: Api
          Properties:
            Path: /certificates/device
            Method: post
            RestApiId: !Ref GenerateCertificatesApi 

Introduce Lambda Layer

As we now have some common utility functions shared between five different Lambda functions, I decided to put these in a Lambda Layer. I normally don't use Layers, but this time I felt it would be a good approach, and it made the code structure a bit easier. To create and use the layer we need to create the Layer version and set the Lambda functions to use it.

We add the creation of the Layer to the template and update our Lambda functions.

  UtilsLayer:
    Type: AWS::Serverless::LayerVersion
    Properties:
      LayerName: UtilsLayer
      # Description is a property of the layer version, not build metadata.
      Description: "Utils code for Lambda functions"
      ContentUri: Lambda/Layer
      CompatibleRuntimes:
        - python3.12
    Metadata:
      BuildMethod: python3.12

  LambdaGenerateRootCA:
    Type: AWS::Serverless::Function
    Properties:
      ....
      Layers:
        - !Ref UtilsLayer
      .....
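
The contents of the layer are not listed in this post. As an illustration of the kind of helper that is convenient to share between the functions, here is a hypothetical response formatter for API Gateway's Lambda proxy integration. The function name and the fields in the sample body are assumptions, not the project's actual utilities.

```python
import json


def api_response(status_code: int, body: dict) -> dict:
    """Format a response in the shape the Lambda proxy integration expects."""
    return {
        "statusCode": status_code,
        "headers": {"Content-Type": "application/json"},
        # The proxy integration requires the body to be a string.
        "body": json.dumps(body),
    }


# Example: what a create function could return after issuing a certificate.
response = api_response(201, {"FQDN": "clients.bbq.example.com"})
```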

List / Get certificates

Finally we add the possibility to list and get certificates.

  LambdaListGetCertificates:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: Lambda/API/ListGetCertificates
      Layers:
        - !Ref UtilsLayer
      Handler: handler.handler
      Environment:
        Variables:
          DYNAMODB_TABLE: !Ref InventoryTable
          DYNAMODB_INDEX: parent-index
      Policies:
        - S3FullAccessPolicy:
            BucketName: 
              Fn::ImportValue: !Sub "${CommonInfraStackName}:certificate-bucket-name"
        - EventBridgePutEventsPolicy:
            EventBusName:
              Fn::ImportValue: !Sub "${CommonInfraStackName}:event-bus-name"
        - DynamoDBCrudPolicy:
            TableName: !Ref InventoryTable
      Events:
        ListCertificatesApi:
          Type: Api
          Properties:
            Path: /certificates
            Method: get
            RestApiId: !Ref GenerateCertificatesApi
        GetCertificatesApi:
          Type: Api
          Properties:
            Path: /certificates/{certificate}
            Method: get
            RestApiId: !Ref GenerateCertificatesApi

One major difference between this function and the other four is that it does not have a single responsibility. It handles all the GET methods, both listing certificates and fetching a single certificate. There are many design approaches to choose from when building an API with Lambda functions: single purpose functions, a Lambdalith, or read/write separation. I decided to use single purpose functions for the write operations; even if there are similarities between them, I felt they were different enough to be kept apart. For the read functionality I put the logic in one function, as getting one certificate or several is very similar. The only real difference is whether a list is returned.
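
A minimal sketch of how one function can serve both GET routes is shown below. It relies on the fact that API Gateway's proxy event carries `pathParameters` and `queryStringParameters`, either of which can be None. The lookups are injected as callables so the routing can be tried without AWS access; their names, the default limit and the response bodies are assumptions, and base64 decoding of the parameters is left out.

```python
import json


def route(event: dict, get_one, list_many) -> dict:
    """Dispatch a GET request to either the single-item or the list operation."""
    # API Gateway sends None (not {}) when there are no parameters.
    path_params = event.get("pathParameters") or {}
    query_params = event.get("queryStringParameters") or {}

    if "certificate" in path_params:
        item = get_one(path_params["certificate"])
        if item is None:
            return {"statusCode": 404, "body": json.dumps({"message": "Not found"})}
        return {"statusCode": 200, "body": json.dumps(item)}

    limit = int(query_params.get("limit", 25))
    items = list_many(query_params.get("parent"), limit)
    return {"statusCode": 200, "body": json.dumps({"certificates": items})}


# In-memory stand-ins for the DynamoDB lookups (placeholder data).
fake_store = {
    "device-1.clients.bbq.example.com": {
        "FQDN": "device-1.clients.bbq.example.com",
        "ParentFQDN": "clients.bbq.example.com",
    }
}


def fake_list(parent, limit):
    """Return stored certificates, optionally filtered by parent, up to limit."""
    found = [c for c in fake_store.values() if parent in (None, c["ParentFQDN"])]
    return found[:limit]


single = route(
    {"pathParameters": {"certificate": "device-1.clients.bbq.example.com"}},
    fake_store.get,
    fake_list,
)
listing = route(
    {"pathParameters": None,
     "queryStringParameters": {"parent": "clients.bbq.example.com", "limit": "5"}},
    fake_store.get,
    fake_list,
)
```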

This approach also gives us a simple form of Command Query Responsibility Segregation (CQRS), with the commands (writes) and the queries (reads) handled by separate functions.

Conclusion

In this second part we extended our self service API with functionality to create device certificates, and we introduced an inventory and the possibility to list and get certificates. Stay tuned for the next part, where we will increase the security of the API and the certificate storage. We will also extend the functionality for listing and fetching certificates.

As a reminder!!

This is built for learning. Use managed services for production.

While this API is a great learning tool, services like AWS Private CA or Let’s Encrypt are better suited for production.

To get the full source code and deploy it yourself, visit Serverless-Handbook Self Service IoT Certificate management

Final Words

Don't forget to follow me on LinkedIn and X for more content, and read the rest of my blogs.

As Werner says! Now Go Build!
