My kids love to write and draw. To reduce screen time and improve their writing skills, I encourage them to scribble their thoughts on paper. So far, so good. But keeping the hard copies is a bit difficult, so I would like to convert these handwritten notes into digital text [foreseeing some Kindle plans once it grows into a small bundle :-) ].
Let’s solve this the AWS way.
Solution Description
Let’s build a solution using S3 replication and event notifications, a serverless compute layer built on an AWS Lambda function, and Amazon Textract. The architecture works as follows.
Two buckets are created: a source bucket and a destination bucket. The source bucket holds the images the user uploads as input.
The drawings are uploaded to the source bucket with a tag, say ‘picture’. S3 Same-Region Replication is enabled on the source bucket and filtered on that tag, so tagged drawings are copied to the destination bucket.
A folder, say ‘images’, is created in the source bucket for the handwritten text. An S3 event notification is enabled on the source bucket for the prefix ‘images/’, so that any upload to this folder generates an event notification.
A Lambda function is set up to receive these event notifications. It invokes Textract with the uploaded image as input. Textract scans the image and returns the detected words and lines to the function.
The Lambda function consolidates the result into sentences and saves it to the destination bucket as text.
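Handwritten lines rarely end where sentences end, so "consolidating into sentences" means re-joining the lines and splitting again on punctuation. The article's actual logic is in a Node.js index.js that is not reproduced here. What follows is a hypothetical Python sketch of one way to do it. It assumes the input is the list of LINE texts Textract returned, in reading order.

```python
import re


def lines_to_sentences(lines):
    """Merge OCR'd lines into one string, then split it into sentences.

    Hypothetical helper for illustration; not taken from the article's repo.
    """
    # Handwritten lines often break mid-sentence, so join them with spaces first.
    text = " ".join(line.strip() for line in lines if line.strip())
    # Collapse any repeated whitespace left over from the join.
    text = re.sub(r"\s+", " ", text)
    # Split after '.', '!' or '?' followed by whitespace, keeping the punctuation.
    parts = re.split(r"(?<=[.!?])\s+", text)
    return [part for part in parts if part]


if __name__ == "__main__":
    print(lines_to_sentences(["Once upon a time there", "was a cat. It liked", "milk!"]))
```

A plain regex split is naive (abbreviations like "Mr." will split early), but kids' stories rarely need more.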
An event notification on the destination bucket publishes to an SNS topic, which sends an email once the process is complete.
Serverless Features in the Spotlight
S3 Replication
Amazon Simple Storage Service (S3) can automatically replicate S3 objects to help you reduce costs and protect data.
Two main replication options are:
Cross-Region Replication (CRR): copies S3 objects to buckets in different AWS Regions.
Same-Region Replication (SRR): copies S3 objects between buckets in the same AWS Region.
S3 also offers Replication Time Control (RTC), which is designed to replicate 99.99 percent of new objects within 15 minutes and is backed by a service-level agreement.
S3 Event Notification
The Amazon S3 Event Notifications feature lets you receive notifications when certain events happen in your S3 bucket.
To enable it, you add a notification configuration to the bucket. The configuration specifies the events you want S3 to publish and the destination where S3 should send them, for example a Lambda function, an SNS topic, or an SQS queue.
Computation Layer with Lambda
AWS Lambda is a serverless, event-driven compute service that runs your code without provisioning or managing servers.
Textract
Amazon Textract is a service that automatically extracts printed text, handwriting, and data from documents and images.
It uses machine learning (ML) models to do this.
SNS
Amazon Simple Notification Service (Amazon SNS) is a fully managed messaging service. It enables communication between applications through publish/subscribe (pub/sub) messaging. It also supports application-to-person communication, sending messages to users at scale via SMS, mobile push, and email.
Let’s Build.
Step 1 : Create Buckets
Navigate to the S3 console and create two buckets in the same Region, for example ‘textractsourcebucket’ and ‘textractdestinationbucket’, with versioning enabled. Bucket names are globally unique, so you may need to choose different names.
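If you prefer scripting Step 1 over clicking through the console, here is a minimal Python sketch. It assumes the boto3 S3 client API (`create_bucket`, `put_bucket_versioning`), and the client is passed in as a parameter. `RecordingClient` is a hypothetical stand-in that only records calls, so the sketch can be dry-run without AWS credentials. For a real run, pass `boto3.client("s3", region_name="ap-south-1")` instead.

```python
def create_versioned_bucket(s3, name, region):
    """Create `name` in `region`, then enable versioning (required for replication)."""
    if region == "us-east-1":
        # us-east-1 is the default location and is not accepted as a LocationConstraint.
        s3.create_bucket(Bucket=name)
    else:
        s3.create_bucket(
            Bucket=name,
            CreateBucketConfiguration={"LocationConstraint": region},
        )
    s3.put_bucket_versioning(
        Bucket=name,
        VersioningConfiguration={"Status": "Enabled"},
    )


class RecordingClient:
    """Hypothetical stand-in for boto3.client("s3") that only records calls."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        # Any unknown attribute becomes a method that logs its name and keyword args.
        def record(**kwargs):
            self.calls.append((name, kwargs))
        return record


if __name__ == "__main__":
    dry_run = RecordingClient()
    for bucket in ("textractsourcebucket", "textractdestinationbucket"):
        create_versioned_bucket(dry_run, bucket, "ap-south-1")
    for call in dry_run.calls:
        print(call)
```

Injecting the client keeps the logic testable and lets the same function run against a real account unchanged.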
Step 2 : Creating Roles
We need two roles: one for S3 replication and another for the Lambda function.
Role 1 : Let the S3 replication feature create this role for you in Step 3, so skip it for now.
Role 2 : Create a role for the Lambda function with CloudWatch Logs, Textract, and S3 access.
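Instead of broad managed policies, Role 2 can be scoped to what the function actually does. Below is a least-privilege sketch, built as a Python dict and printed as JSON. It assumes the function calls Textract's `DetectDocumentText`. If the repo's code uses another Textract API such as `AnalyzeDocument`, adjust that action. It also assumes the function reads only from the source bucket and writes only to the destination bucket. The role additionally needs the usual trust policy for `lambda.amazonaws.com`, which is not shown.

```python
import json


def lambda_role_policy(source_bucket, destination_bucket):
    """Least-privilege permissions sketch for the Lambda execution role (Role 2)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # Let the function write its logs to CloudWatch Logs.
                "Effect": "Allow",
                "Action": ["logs:CreateLogGroup", "logs:CreateLogStream", "logs:PutLogEvents"],
                "Resource": "*",
            },
            {   # Assumed Textract API: synchronous text detection.
                "Effect": "Allow",
                "Action": "textract:DetectDocumentText",
                "Resource": "*",
            },
            {   # Read uploaded notes (Textract reads S3 with the caller's permissions).
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{source_bucket}/*",
            },
            {   # Write the extracted text to the destination bucket.
                "Effect": "Allow",
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{destination_bucket}/*",
            },
        ],
    }


if __name__ == "__main__":
    policy = lambda_role_policy("textractsourcebucket", "textractdestinationbucket")
    print(json.dumps(policy, indent=2))
```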
Step 3 : Enabling replication
Navigate to the Management tab of the source bucket and create a replication rule.
Give the rule a name. Replication requires versioning on both buckets, so enable it if you have not already.
Limit the rule's scope with a tag filter, because the drawings will be uploaded with the tag ‘picture’.
Select the destination bucket created in Step 1 as the destination.
Allow S3 replication feature to create a role for you with required permissions.
Save the replication configuration.
When prompted, you can choose not to replicate existing objects.
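For reference, the tag-filtered rule from this step can also be expressed as the configuration that boto3's `put_bucket_replication` accepts. This is a minimal sketch. S3 tags are key-value pairs and the article names only ‘picture’, so the tag key `type` with value `picture` is a hypothetical choice, as are the rule ID and the role ARN you pass in.

```python
def replication_config(role_arn, destination_bucket, tag_key="type", tag_value="picture"):
    """ReplicationConfiguration for a single tag-filtered Same-Region Replication rule.

    tag_key/tag_value default to a hypothetical tag ("type" = "picture").
    """
    return {
        "Role": role_arn,
        "Rules": [
            {
                "ID": "replicate-pictures",  # hypothetical rule name
                "Priority": 1,
                "Status": "Enabled",
                # Only objects carrying this tag are replicated.
                "Filter": {"Tag": {"Key": tag_key, "Value": tag_value}},
                # Tag-filtered rules cannot replicate delete markers, so keep this Disabled.
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": f"arn:aws:s3:::{destination_bucket}"},
            }
        ],
    }


if __name__ == "__main__":
    print(replication_config("arn:aws:iam::123456789012:role/replication-role",
                             "textractdestinationbucket"))
```

With a real client, the call would be `s3.put_bucket_replication(Bucket="textractsourcebucket", ReplicationConfiguration=replication_config(role_arn, "textractdestinationbucket"))`.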
Step 4 : Creating lambda function
The Lambda function receives the event notification from S3 when a handwritten note is uploaded. It calls Textract to extract the text, consolidates the result, and saves it to the destination bucket as a text file.
Navigate to the Lambda console and create a new function with Node.js as the runtime.
Under ‘Change default execution role’, choose to use an existing role and select the Lambda role created in Step 2 (Role 2). Copy the contents of index.js from the GitHub repo below into the function's code editor, after adjusting values such as the Region and bucket names if needed.
https://github.com/asnakhader/textract
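The repo's index.js is the actual implementation. The Python sketch below only illustrates the same flow: Textract reads the image directly from S3, the LINE blocks are kept, and a .txt file is written. Several details are assumptions. It assumes the synchronous `DetectDocumentText` API, boto3-style clients passed in as parameters, and a hypothetical naming scheme where `images/note1.jpg` becomes `images/note1.txt` in the destination bucket.

```python
def process_note(textract, s3, source_bucket, key, destination_bucket):
    """Run Textract on one uploaded image and save its text lines as a .txt file.

    `textract` and `s3` are boto3-style clients; the output key scheme is hypothetical.
    """
    # Textract reads the image straight from S3, so no download is needed.
    response = textract.detect_document_text(
        Document={"S3Object": {"Bucket": source_bucket, "Name": key}}
    )
    # Keep only LINE blocks; WORD blocks repeat the same text word by word.
    lines = [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
    ]
    text = "\n".join(lines)
    # images/note1.jpg -> images/note1.txt (same folder layout in the destination bucket).
    out_key = key.rsplit(".", 1)[0] + ".txt"
    s3.put_object(
        Bucket=destination_bucket,
        Key=out_key,
        Body=text.encode("utf-8"),
        ContentType="text/plain; charset=utf-8",
    )
    return out_key
```

Writing to a different bucket than the one that triggers the function avoids an accidental invocation loop.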
Create a test event to try out the function. A sample is given below; it was captured from the CloudWatch logs when the S3 event notification fired.
```json
{
  "Records": [
    {
      "eventVersion": "2.1",
      "eventSource": "aws:s3",
      "awsRegion": "ap-south-1",
      "eventTime": "2022-09-23T04:59:35.441Z",
      "eventName": "ObjectCreated:Put",
      "userIdentity": {
        "principalId": "AWS:BBBBBBB"
      },
      "requestParameters": {
        "sourceIPAddress": "00.00.00.174"
      },
      "responseElements": {
        "x-amz-request-id": "XDG0P50ZJ02XX7JR",
        "x-amz-id-2": "c+xP2nQ780GjtppxeXERgXK9OJt7ZTLUqR941EE9/y74GhIBoQX5ZRb2leJWD40XpkFFK82HZR/lfT/lTR0n/Atnr26OmFK5wzx3F7npKJc="
      },
      "s3": {
        "s3SchemaVersion": "1.0",
        "configurationId": "storyUpload",
        "bucket": {
          "name": "<your source bucket name>",
          "ownerIdentity": {
            "principalId": "BBBBBBB"
          },
          "arn": "arn:aws:s3:::<your source bucket name>"
        },
        "object": {
          "key": "<your object name>",
          "size": 1522574,
          "eTag": "c5fa617910f9118efa01ddcbd82e1433",
          "versionId": "Q9yd0M4L2otyIdURpFOgnZdi46C54y91",
          "sequencer": "00632D3D375CC84842"
        }
      }
    }
  ]
}
```
Run the test and check that the function returns a success response.
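The sample event shows where the useful fields live: `Records[].s3.bucket.name` and `Records[].s3.object.key`. Object keys in S3 event notifications are URL-encoded, so a note named `my story.jpg` arrives as `my+story.jpg` and must be decoded before it is passed to S3 or Textract. A minimal Python sketch of that extraction step follows; the function name is hypothetical.

```python
from urllib.parse import unquote_plus


def objects_from_event(event):
    """Yield (bucket, key) pairs from an S3 event notification payload."""
    for record in event.get("Records", []):
        s3_info = record["s3"]
        bucket = s3_info["bucket"]["name"]
        # Keys arrive URL-encoded (a space becomes '+'), so decode them first.
        key = unquote_plus(s3_info["object"]["key"])
        yield bucket, key


if __name__ == "__main__":
    sample = {"Records": [{"s3": {"bucket": {"name": "textractsourcebucket"},
                                  "object": {"key": "images/my+story%281%29.jpg"}}}]}
    print(list(objects_from_event(sample)))
```

Looping over `Records` rather than reading only the first entry keeps the sketch correct if S3 ever delivers more than one record in an event.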
Step 5 : Setting up S3 Event Notification
Navigate to the S3 console and select the source bucket.
Create a folder, say ‘images’, for uploading the handwritten notes.
Navigate to the ‘Properties’ tab and create event notifications.
Create an event notification with the prefix ‘images/’ and the suffix ‘.jpg’, since the handwritten notes will be uploaded to this folder as .jpg images. S3 object keys do not start with a slash, so a prefix of ‘/images’ would never match.
Enable it for all object creation events and set the destination to the Lambda function created earlier.
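The same notification can be described as the `NotificationConfiguration` accepted by boto3's `put_bucket_notification_configuration`. The Python sketch below builds it with a placeholder function ARN. The `Id` reuses `storyUpload` from the sample event. The sketch also includes `key_matches`, a hypothetical local approximation of S3's prefix/suffix filter, to show why ‘/images’ fails. When configured programmatically rather than in the console, S3 also needs permission to invoke the function (Lambda's `add_permission`).

```python
def lambda_notification_config(function_arn, prefix="images/", suffix=".jpg"):
    """NotificationConfiguration that sends object-created events to a Lambda function."""
    return {
        "LambdaFunctionConfigurations": [
            {
                "Id": "storyUpload",
                "LambdaFunctionArn": function_arn,
                # Every kind of object creation (Put, Post, Copy, multipart upload).
                "Events": ["s3:ObjectCreated:*"],
                "Filter": {
                    "Key": {
                        "FilterRules": [
                            # No leading slash: keys look like "images/note1.jpg".
                            {"Name": "prefix", "Value": prefix},
                            {"Name": "suffix", "Value": suffix},
                        ]
                    }
                },
            }
        ]
    }


def key_matches(key, prefix, suffix):
    """Hypothetical local approximation of S3's prefix/suffix key filter."""
    return key.startswith(prefix) and key.endswith(suffix)


if __name__ == "__main__":
    print(key_matches("images/note1.jpg", "images/", ".jpg"))
    print(key_matches("images/note1.jpg", "/images", ".jpg"))
```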
Step 6 : Create SNS Topic
Navigate to the SNS Console and create a topic.
Navigate to the Access Policy tab and attach a policy that allows the destination S3 bucket to publish messages to the topic.
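A typical topic access policy for this grants the S3 service principal `SNS:Publish`. It is restricted with conditions to one bucket and one account. The Python sketch below builds that document. The topic ARN and account ID in the example are placeholders.

```python
import json


def sns_topic_policy(topic_arn, bucket_name, account_id):
    """Topic access policy that lets a single S3 bucket publish to the topic."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowS3Publish",
                "Effect": "Allow",
                "Principal": {"Service": "s3.amazonaws.com"},
                "Action": "SNS:Publish",
                "Resource": topic_arn,
                "Condition": {
                    # Only events from this bucket...
                    "ArnLike": {"aws:SourceArn": f"arn:aws:s3:::{bucket_name}"},
                    # ...owned by this account may publish.
                    "StringEquals": {"aws:SourceAccount": account_id},
                },
            }
        ],
    }


if __name__ == "__main__":
    policy = sns_topic_policy("arn:aws:sns:ap-south-1:123456789012:story-done",
                              "textractdestinationbucket", "123456789012")
    print(json.dumps(policy, indent=2))
```

You can paste the printed JSON into the console. Alternatively, apply it with boto3 via `sns.set_topic_attributes(TopicArn=..., AttributeName="Policy", AttributeValue=json.dumps(policy))`.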
Create a subscription for this topic with an email endpoint.
Confirm the subscription by clicking the confirmation link in the email you receive.
Step 7 : Setting up email notification for process completion
Navigate to the destination bucket and create an event notification.
Select the previously created SNS topic as the destination and save the changes.
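Programmatically, this step is a `TopicConfigurations` entry in the destination bucket's notification configuration. The Python sketch below uses a hypothetical `Id`. It also adds a `.txt` suffix filter, which is an assumption, not part of the original setup: because replicated drawings also land in the destination bucket, the filter keeps emails limited to the extracted text files. Note that boto3's `put_bucket_notification_configuration` replaces the bucket's whole notification configuration rather than appending to it.

```python
def topic_notification_config(topic_arn, suffix=".txt"):
    """NotificationConfiguration that publishes object-created events to an SNS topic.

    The suffix filter is an optional assumption; drop it to notify on every object.
    """
    return {
        "TopicConfigurations": [
            {
                "Id": "processComplete",  # hypothetical name
                "TopicArn": topic_arn,
                "Events": ["s3:ObjectCreated:*"],
                # Only the extracted .txt files trigger an email, not replicated drawings.
                "Filter": {"Key": {"FilterRules": [{"Name": "suffix", "Value": suffix}]}},
            }
        ]
    }


if __name__ == "__main__":
    print(topic_notification_config("arn:aws:sns:ap-south-1:123456789012:story-done"))
```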
Step 8 : Testing
Navigate to the S3 source bucket and upload a drawing with the tag ‘picture’.
Upload the handwritten note images to the ‘images’ folder.
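Both test uploads can also be scripted. This is a minimal Python sketch assuming a boto3-style S3 client. `put_object` takes object tags as a URL-encoded query string in its `Tagging` parameter. The tag key `type` is a hypothetical choice, since the article names only the value ‘picture’. The `drawings/` key used in the example is also hypothetical; it is chosen to stay outside the `images/` prefix so that it does not trigger the Lambda function.

```python
from urllib.parse import urlencode


def upload_drawing(s3, bucket, key, body, tags):
    """Upload a drawing with object tags, e.g. tags={"type": "picture"} (hypothetical key)."""
    # put_object expects tags as a URL-encoded query string such as "type=picture".
    s3.put_object(Bucket=bucket, Key=key, Body=body, Tagging=urlencode(tags))


def upload_note(s3, bucket, filename, body):
    """Upload a handwritten note under the images/ prefix that triggers the Lambda function."""
    key = f"images/{filename}"
    s3.put_object(Bucket=bucket, Key=key, Body=body, ContentType="image/jpeg")
    return key
```

With a real client this would look like `upload_drawing(boto3.client("s3"), "textractsourcebucket", "drawings/cat.jpg", open("cat.jpg", "rb").read(), {"type": "picture"})`.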
Navigate to the destination bucket and check its contents. S3 replication has copied the drawing uploaded with the tag ‘picture’.
Check the contents of the images folder.
The lambda function with Textract has converted the handwritten notes uploaded as .jpg images into textual format. An email notification is received indicating process completion. Woohoo!
Final Product
Excuse the spelling / grammar mistakes pls. Big shoutout to Textract for reading this :-)
At the end of the day, the whole purpose of technology is to ease human lives and save us from monotonous, repetitive work. Do you agree?
Tail End: Now that the kids have photographed all their stories and handed them to me, the ball is in my court to convert them and keep them safe.