AWS Lambda: OCR and Text Translation in the AWS Cloud

Extracting text from images and translating it automatically can be a complex process, but with AWS Lambda, Amazon Rekognition, and Amazon Translate, you can create a powerful, serverless workflow that handles both Optical Character Recognition (OCR) and translation seamlessly. Whether for automating document processing, translating visual content, or analyzing multilingual images, this setup offers a scalable solution in the cloud.

Introduction to OCR and AWS Lambda

OCR (Optical Character Recognition) is a technology that identifies and extracts text from images. AWS Lambda is a serverless service that executes code in response to events. By combining it with Amazon Rekognition for OCR and Amazon Translate for language translation, we can automate text extraction and translation without managing servers.

In this article, you’ll learn:

  • How to use AWS Lambda to trigger Amazon Rekognition for OCR.
  • How to send extracted text to Amazon Translate for translation.
  • How to automate this workflow so the user receives translated text as the final output.

If you’re curious to see this in action, I’ve recorded a video tutorial demonstrating text extraction and translation with AWS. You’ll find a step-by-step example of using Lambda, Rekognition, and Translate to recognize text in images and translate it into any language that Amazon Translate supports.

Step 1: Setting Up S3 and Lambda

First, let’s prepare the necessary resources to store images and trigger the OCR and translation process.

  • Create an S3 Bucket – This will be used to store your images. Lambda can automatically respond to new file uploads in S3 by triggering the OCR and translation workflow.
  • Configure AWS Lambda – Create a Lambda function that runs when an image is uploaded to the S3 bucket. Python and Node.js are both good choices, since the AWS SDK is available for each runtime.
  • IAM Permissions – Assign the Lambda function a role with permissions to access S3, Amazon Rekognition, and Amazon Translate.
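As a rough sketch of the permissions in the last bullet, the policy document below is built as a Python dictionary and printed as JSON. The bucket name `my-ocr-images` and the helper name `build_policy` are placeholders, not names from this setup. The `comprehend:DetectDominantLanguage` action is included on the assumption that automatic source-language detection is used, since Amazon Translate relies on Amazon Comprehend for that.

```python
import json


def build_policy(bucket_name: str) -> dict:
    """Return an IAM policy document for the OCR and translation function.

    bucket_name is a placeholder; substitute the bucket that stores your images.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Read access to uploaded images (Rekognition reads the
                # object using the function's credentials)
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            },
            {
                # OCR, translation, and (assumed) automatic language detection
                "Effect": "Allow",
                "Action": [
                    "rekognition:DetectText",
                    "translate:TranslateText",
                    "comprehend:DetectDominantLanguage",
                ],
                "Resource": "*",
            },
        ],
    }


print(json.dumps(build_policy("my-ocr-images"), indent=2))
```

The function also needs permission to write logs to CloudWatch, which the AWS managed policy AWSLambdaBasicExecutionRole typically covers.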

Step 2: Implementing Text Recognition with Amazon Rekognition

Once an image is uploaded to S3, Lambda will use Amazon Rekognition to detect text in the image.

Here’s sample Python code that performs both OCR and translation:

import boto3
from urllib.parse import unquote_plus

# Create clients once per execution environment so warm invocations reuse them
rekognition_client = boto3.client('rekognition')
translate_client = boto3.client('translate')

def lambda_handler(event, context):
    # Image details from the S3 event (object keys arrive URL-encoded)
    record = event['Records'][0]['s3']
    bucket_name = record['bucket']['name']
    image_key = unquote_plus(record['object']['key'])

    # OCR with Rekognition, reading the image directly from S3
    response = rekognition_client.detect_text(
        Image={'S3Object': {'Bucket': bucket_name, 'Name': image_key}}
    )

    # Keep only LINE detections; WORD detections repeat the same text
    detected_text = " ".join(
        text['DetectedText']
        for text in response['TextDetections']
        if text['Type'] == 'LINE'
    )

    # Default to an empty result so the return value is defined when no text is found
    translated_text = ""
    if detected_text:
        # Translate the text, letting Amazon Translate detect the source language
        translated_text = translate_client.translate_text(
            Text=detected_text,
            SourceLanguageCode="auto",
            TargetLanguageCode="pl"
        )['TranslatedText']

        print(f"Translation: {translated_text}")

    return {
        'statusCode': 200,
        'body': translated_text
    }

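Each entry in the Rekognition response also carries a confidence score, so the text-extraction step can be made a little stricter. The sketch below works on a hand-written dictionary shaped like a `detect_text` response, so it runs without AWS access. The function name `extract_lines` and the threshold of 80 are illustrative assumptions, not values from the handler above.

```python
def extract_lines(response: dict, min_confidence: float = 80.0) -> str:
    """Join LINE detections whose confidence meets the threshold.

    min_confidence is an arbitrary example value; tune it for your images.
    """
    lines = [
        detection["DetectedText"]
        for detection in response.get("TextDetections", [])
        if detection.get("Type") == "LINE"
        and detection.get("Confidence", 0.0) >= min_confidence
    ]
    return " ".join(lines)


# Hand-written sample in the shape of a detect_text response
sample_response = {
    "TextDetections": [
        {"DetectedText": "Hello world", "Type": "LINE", "Confidence": 99.1},
        {"DetectedText": "Hello", "Type": "WORD", "Confidence": 99.3},
        {"DetectedText": "world", "Type": "WORD", "Confidence": 98.9},
        {"DetectedText": "n0isy", "Type": "LINE", "Confidence": 42.0},
    ]
}

print(extract_lines(sample_response))  # prints "Hello world"
```

Filtering on LINE avoids sending the same text twice, because every line is also reported word by word.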

Step 3: Translating Text with Amazon Translate

In this example:

  • Lambda calls Amazon Rekognition, which scans the image for text and returns it.
  • The text is then sent to Amazon Translate, which translates it into Polish (the target can be any language Amazon Translate supports).
  • For multilingual applications, Amazon Translate can detect the source language automatically when you pass SourceLanguageCode="auto", so you don’t need to know the language of the image text in advance.
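When automatic detection is used, the `translate_text` response also reports which source language was detected, which can be handy for logging. A minimal sketch, assuming the client is passed in as an argument so the function can be exercised with a stub; the function name `translate_with_detection` and the returned dictionary keys are made up for illustration.

```python
def translate_with_detection(translate_client, text: str, target_lang: str = "pl") -> dict:
    """Translate text and report the source language Amazon Translate detected.

    translate_client is expected to behave like boto3.client("translate").
    """
    response = translate_client.translate_text(
        Text=text,
        SourceLanguageCode="auto",      # let the service detect the language
        TargetLanguageCode=target_lang,
    )
    return {
        "detected_source": response["SourceLanguageCode"],
        "target": response["TargetLanguageCode"],
        "translated_text": response["TranslatedText"],
    }
```

Inside Lambda this would be called as `translate_with_detection(boto3.client("translate"), detected_text)`.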

Additional Features: Changing Image Format, Resizing, and Adding Watermarks

Aside from OCR and translation, you may want to process images by changing their format, resizing, or adding watermarks. This can be done with AWS Lambda using additional libraries like Pillow. I have a video tutorial on advanced image processing, where I cover how to resize images, convert formats, and add watermarks using AWS Lambda. If you’re interested in learning these additional steps, feel free to check it out!
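For a rough idea of what such processing looks like, here is a small Pillow sketch that resizes an image, stamps a text watermark, and converts it to PNG. It assumes Pillow is available to the function (for example through a layer); the function name `resize_and_watermark`, the 800-pixel limit, and the watermark text are illustrative choices, not the method from the video.

```python
from io import BytesIO

from PIL import Image, ImageDraw


def resize_and_watermark(image_bytes: bytes, max_size=(800, 800), text="sample") -> bytes:
    """Shrink an image to fit max_size, add a text watermark, return PNG bytes."""
    image = Image.open(BytesIO(image_bytes)).convert("RGB")

    # thumbnail() resizes in place, keeps the aspect ratio, and never enlarges
    image.thumbnail(max_size)

    # Draw the watermark near the top-left corner using Pillow's default font
    draw = ImageDraw.Draw(image)
    draw.text((10, 10), text, fill=(255, 255, 255))

    # Saving with format="PNG" performs the format conversion
    output = BytesIO()
    image.save(output, format="PNG")
    return output.getvalue()
```

In a Lambda function, `image_bytes` would come from reading the uploaded object from S3, and the returned bytes could be written back to a different bucket or prefix.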

Step 4: Optimizing and Scaling the Process

  • Batch Processing – Each S3 upload normally invokes Lambda once. For many images, you can route the S3 notifications through an Amazon SQS queue so that Lambda receives them in batches, or schedule processing for larger workloads.
  • Adding a Layer for Libraries – As mentioned above, if you need advanced image processing, you can add a layer to your Lambda function with libraries like Pillow to handle resizing, watermarking, or format conversion.
  • Storing Translated Results in S3 – For larger projects, consider storing translated text in S3 for easy retrieval and reference.
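The last bullet could be sketched as follows: the translated text is written back to S3 as a small JSON document next to a key derived from the image name. The `translations/` prefix, the JSON field names, and both function names are assumptions chosen for illustration, and the S3 client is passed in so the code can be tried with a stub.

```python
import json
import posixpath


def result_key(image_key: str, target_lang: str) -> str:
    """Derive an output key such as translations/uploads/menu.pl.json."""
    base, _extension = posixpath.splitext(image_key)
    return f"translations/{base}.{target_lang}.json"


def store_translation(s3_client, bucket: str, image_key: str,
                      original: str, translated: str, target_lang: str = "pl") -> str:
    """Write the OCR text and its translation to S3 and return the object key.

    s3_client is expected to behave like boto3.client("s3").
    """
    key = result_key(image_key, target_lang)
    body = json.dumps(
        {
            "source_image": image_key,
            "original_text": original,
            "translated_text": translated,
            "target_language": target_lang,
        },
        ensure_ascii=False,  # keep non-ASCII characters readable in the file
    )
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=body.encode("utf-8"),
        ContentType="application/json",
    )
    return key
```

If results go into the same bucket that triggers the function, restrict the trigger to an image prefix or suffix (or use a separate bucket) so that writing the JSON file does not invoke the function again.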

Use Cases and Benefits

Combining AWS Lambda, Amazon Rekognition, and Amazon Translate opens up flexible possibilities for OCR and translation, including:

  • Automated processing and translation of documents in multiple languages.
  • Processing visual content such as screenshots or product images.
  • Creating a fully automated translation system for marketing and e-commerce.

Conclusion

With AWS, you can build a scalable OCR and translation system that works without server management. AWS Lambda provides the flexibility to trigger workflows, and Amazon Rekognition and Amazon Translate let you recognize and translate text without training or hosting your own models.

For a full demonstration of OCR and translation, or to learn about advanced image processing, feel free to check out my video tutorials on these topics. Each tutorial walks through practical examples, showing how easy it can be to set up powerful automation in the AWS cloud.
