This document covers how to use an AWS Glue crawler to extract data from S3, automatically add tables to a Glue database, and run queries on them from Dremio or Athena.
Setup Diagram
Steps to follow
Create an S3 bucket and upload the raw data, e.g., CSV or JSON files.
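If you prefer to script this step, here is a minimal boto3 sketch; the bucket name, region, and file names are placeholders, not values from this walkthrough:

```python
import boto3

s3 = boto3.client("s3", region_name="us-east-1")

# Create the bucket (placeholder name; us-east-1 needs no location config).
s3.create_bucket(Bucket="my-raw-data-bucket")

# Upload a raw CSV file; the crawler will infer the schema from it later.
s3.upload_file("sales.csv", "my-raw-data-bucket", "raw/sales.csv")
```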
Go to the AWS Glue console and create a Glue database.
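The same step via the API, assuming a placeholder database name of `raw_data_db`:

```python
import boto3

glue = boto3.client("glue", region_name="us-east-1")

# Create the Glue database that the crawler will populate.
glue.create_database(
    DatabaseInput={
        "Name": "raw_data_db",
        "Description": "Tables discovered by the S3 crawler",
    }
)
```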
Go to the Tables page and select Add tables using crawler in the top-right corner.
This should land you on the AWS Glue crawler setup page.
Follow the steps below to fill in the details:
- Name - Enter the Crawler name
Add data source
- Data source - Select S3
- Location of S3 data - Select In this account (if your data lives in the same account)
- S3 path - Browse to the S3 bucket that contains the data, and don't forget to add a trailing forward slash
- Subsequent crawler runs - Select Crawl all sub-folders
Click Add an S3 data source
Click Next → Configure security settings
Click Create new IAM role and give the role a name. This creates the IAM role the Glue crawler needs to read the data in the S3 bucket.
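If you want to create that role yourself instead, here is a sketch with a placeholder role name; note that besides the AWS-managed Glue policy, the role also needs read access to your specific bucket (not shown):

```python
import json
import boto3

iam = boto3.client("iam")

# Trust policy that lets the Glue service assume this role.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "glue.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

iam.create_role(
    RoleName="GlueCrawlerRole",  # placeholder name
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)

# Attach the AWS-managed Glue service policy.
iam.attach_role_policy(
    RoleName="GlueCrawlerRole",
    PolicyArn="arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole",
)
```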
Next, set output and scheduling
- Target database - Select the database you created earlier (or choose default / create a new one)
- Crawler schedule - On Demand
Next → Review and Create → Create Crawler
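The whole crawler setup above can also be scripted. A minimal boto3 sketch, where the crawler name, role ARN, database name, and S3 path are placeholders matching the earlier steps:

```python
import boto3

glue = boto3.client("glue", region_name="us-east-1")

glue.create_crawler(
    Name="raw-data-crawler",
    Role="arn:aws:iam::123456789012:role/GlueCrawlerRole",  # placeholder
    DatabaseName="raw_data_db",
    Targets={"S3Targets": [{"Path": "s3://my-raw-data-bucket/raw/"}]},
    # Omitting Schedule leaves the crawler on demand only.
)
```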
Now the crawler has been created successfully and you can run it.
It will take a few minutes to extract the data from the S3 bucket; once it is done, you should see the state as Ready.
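You can start the crawler and wait for that Ready state programmatically as well (crawler name is the same placeholder as above):

```python
import time
import boto3

glue = boto3.client("glue", region_name="us-east-1")

glue.start_crawler(Name="raw-data-crawler")

# Poll until the crawler returns to the READY state.
while True:
    state = glue.get_crawler(Name="raw-data-crawler")["Crawler"]["State"]
    print(f"Crawler state: {state}")
    if state == "READY":
        break
    time.sleep(30)
```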
Now you should be able to see a table added to the Glue database.
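To verify from code rather than the console, a quick sketch against the placeholder database:

```python
import boto3

glue = boto3.client("glue", region_name="us-east-1")

# List the tables the crawler added to the database.
for table in glue.get_tables(DatabaseName="raw_data_db")["TableList"]:
    print(table["Name"])
```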
- Go to Dremio → Add the Glue catalog as a source
- Name - Enter a name for the Glue catalog source
- Region - Select the AWS region
- Authentication - AWS Access key
Click Save, and you can now run queries on the Glue database from Dremio or Athena!
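For the Athena route, you can also kick off a query programmatically. A minimal sketch, assuming the placeholder database `raw_data_db`, a crawled table named `sales`, and a results bucket you own (Athena requires an S3 location for query output):

```python
import boto3

athena = boto3.client("athena", region_name="us-east-1")

response = athena.start_query_execution(
    QueryString="SELECT * FROM sales LIMIT 10",
    QueryExecutionContext={"Database": "raw_data_db"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results-bucket/"},
)
print("Query execution id:", response["QueryExecutionId"])
```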