Jaira Encio for AWS Community ASEAN

Posted on Jun 11, 2021 • Edited on Nov 7, 2022

Creating an API that runs Selenium via AWS Lambda

#serverless #lambda #selenium

Being an automation tester, my job is to automate everything. While running my test scripts from the terminal, I realised that I was the only one who could execute them. What if someone else, like the developers or the project manager, wants to run them? Cloning my repo, installing the libraries, and running the script would be a tedious task for them. So I decided to put my test script in a serverless function and make it accessible through an API request.

I experimented with various AWS services, such as creating my own Lambda function and trying out the features of API Gateway, CodePipeline, and so on. After several attempts, I was finally able to run my script inside Lambda. Then I researched how to invoke my Lambda function through an API.

This results in higher productivity and saves time. Engineers can focus on more important work because automated tests run without human interaction. It is like a portable testing device that anyone can run. Because the tests run fast, developers get test reports almost instantly, so they can react quickly whenever a failure occurs. Test automation also makes it easier to ship program updates quickly. As a result, automated testing leads to a more responsive team, a better user experience, and higher customer satisfaction.

Overview

Create 2 Lambda layers: one containing the Selenium library and one containing ChromeDriver and a headless Chromium binary
Reference the created layers in the function's serverless.yaml, then deploy

Creating Selenium Lambda Layer

Python libraries must be placed in python/lib/python3.6/site-packages/ inside the layer zip so that the runtime can import them.

Download Selenium into the layer directory



$ pip3.6 install -t selenium/python/lib/python3.6/site-packages selenium==3.8.0
$ cd selenium
$ zip -r python.zip python/
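If you would rather script the packaging step, the same zip can be produced with Python's standard library. This is a minimal sketch that assumes the selenium/python/... layout created by the pip command above; build_layer_zip and list_top_level are hypothetical helper names, not part of any AWS tooling.

```python
import shutil
import zipfile
from pathlib import Path


def build_layer_zip(layer_dir: str, archive_base: str) -> str:
    """Zip the python/ folder inside layer_dir, keeping python/ as the top-level entry.

    Roughly equivalent to: cd layer_dir && zip -r python.zip python/
    Returns the path of the created .zip file.
    """
    root = Path(layer_dir)
    if not (root / "python").is_dir():
        raise FileNotFoundError(f"{root / 'python'} not found; run pip install -t first")
    # root_dir is where archiving starts; base_dir is the folder recorded in the archive.
    return shutil.make_archive(archive_base, "zip", root_dir=str(root), base_dir="python")


def list_top_level(archive_path: str) -> set:
    """Return the set of top-level names in a zip (for a layer this should be {'python'})."""
    with zipfile.ZipFile(archive_path) as zf:
        return {name.split("/", 1)[0] for name in zf.namelist()}
```

For example, build_layer_zip("selenium", "selenium/python") writes selenium/python.zip, matching the output of the shell commands.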

Once finished, create the Lambda layer and upload the zip file:

1. In the AWS Console, go to Lambda > Layers
2. Click Create layer
3. Enter the following layer configuration



Name: selenium
Description: Selenium layer
Upload zip file created: python.zip
Compatible runtimes: Python 3.6

4. Click Create
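The console steps can also be scripted. Below is a hedged sketch using boto3's publish_layer_version call; it assumes boto3 is installed and AWS credentials are configured, and the helper names are hypothetical. Building the request is kept separate from the AWS call so it can be checked offline.

```python
from pathlib import Path


def layer_request(name, description, zip_path, runtimes):
    """Build keyword arguments for Lambda's PublishLayerVersion API."""
    return {
        "LayerName": name,
        "Description": description,
        # Direct upload of the zip bytes; larger archives would go through S3 instead.
        "Content": {"ZipFile": Path(zip_path).read_bytes()},
        "CompatibleRuntimes": list(runtimes),
    }


def publish_layer(request, region="ap-southeast-2"):
    """Publish the layer and return its versioned ARN (needs AWS credentials)."""
    import boto3  # imported lazily so layer_request works without boto3 installed

    client = boto3.client("lambda", region_name=region)
    response = client.publish_layer_version(**request)
    return response["LayerVersionArn"]
```

For example, publish_layer(layer_request("selenium", "Selenium layer", "selenium/python.zip", ["python3.6"])) mirrors the configuration above. Because python3.6 is deprecated, Lambda may no longer accept it as a compatible runtime; newer identifiers such as python3.9 follow the same pattern.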

Note: You can use whatever Selenium version you prefer; just select the matching compatible runtime when uploading your package.

Creating Chromedriver Lambda layer

Download ChromeDriver



$ mkdir -p chromedriver
$ cd chromedriver
$ curl -SL https://chromedriver.storage.googleapis.com/2.37/chromedriver_linux64.zip > chromedriver.zip
$ unzip chromedriver.zip
$ rm chromedriver.zip
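Here is a Python alternative to the curl/unzip steps, as a sketch. One detail worth knowing: unlike the unzip command, Python's zipfile module does not restore Unix permission bits on extraction, so the extracted binary has to be made executable explicitly. The URL is the one from the commands above; extract_executable and download are hypothetical helpers.

```python
import io
import os
import stat
import urllib.request
import zipfile

CHROMEDRIVER_URL = "https://chromedriver.storage.googleapis.com/2.37/chromedriver_linux64.zip"


def extract_executable(zip_bytes: bytes, member: str, dest_dir: str) -> str:
    """Extract one member from an in-memory zip and mark it executable."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        path = zf.extract(member, dest_dir)
    # zipfile ignores the stored permission bits, so add the execute bits ourselves.
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return path


def download(url: str) -> bytes:
    """Fetch a URL into memory (needs network access)."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()
```

For example, extract_executable(download(CHROMEDRIVER_URL), "chromedriver", "chromedriver") leaves an executable chromedriver/chromedriver, like the shell steps.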

Download the headless Chromium binary (an Amazon Linux build from the serverless-chrome project)



$ curl -SL https://github.com/adieuadieu/serverless-chrome/releases/download/v1.0.0-41/stable-headless-chromium-amazonlinux-2017-03.zip > headless-chromium.zip
$ unzip headless-chromium.zip
$ rm headless-chromium.zip

Zip the driver and binary together



$ ls
chromedriver headless-chromium
$ zip -r chromedriver.zip chromedriver headless-chromium
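Lambda extracts a layer's contents into /opt, so chromedriver and headless-chromium must sit at the root of chromedriver.zip to end up at /opt/chromedriver and /opt/headless-chromium, the paths the handler uses. The zip command above keeps each file's permission bits. If you build the archive from Python instead, a sketch like this one (build_binary_layer and entry_modes are hypothetical names) can force both entries to mode 0o755 so they remain executable even if the source files lost their execute bit.

```python
import os
import shutil
import stat
import zipfile


def build_binary_layer(archive_path, files):
    """Zip each file at the archive root, forcing mode 0o755 so it stays executable."""
    with zipfile.ZipFile(archive_path, "w") as zf:
        for src in files:
            # Root-level entry name: becomes /opt/<name> once Lambda extracts the layer
            info = zipfile.ZipInfo.from_file(src, arcname=os.path.basename(src))
            info.compress_type = zipfile.ZIP_DEFLATED
            # Unix permission bits live in the high 16 bits of external_attr
            info.external_attr = (stat.S_IFREG | 0o755) << 16
            # Stream the file so large binaries are not loaded into memory at once
            with open(src, "rb") as fh, zf.open(info, "w") as dest:
                shutil.copyfileobj(fh, dest)
    return archive_path


def entry_modes(archive_path):
    """Map each entry name to the permission bits stored in the archive."""
    with zipfile.ZipFile(archive_path) as zf:
        return {i.filename: (i.external_attr >> 16) & 0o777 for i in zf.infolist()}
```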

Once finished, create the Lambda layer and upload the zip file:

1. In the AWS Console, go to Lambda > Layers
2. Click Create layer
3. Enter the following layer configuration



Name: chromedriver
Description: chrome driver and binary layer
Upload zip file created: chromedriver.zip
Compatible runtimes: Python 3.6

4. Click Create

Creating Lambda Function

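Lambda extracts the contents of each layer into the /opt directory, and the Python runtime adds /opt/python and /opt/python/lib/python3.x/site-packages to its module search path. That is why the function code can import selenium directly and find the binaries at /opt/chromedriver and /opt/headless-chromium.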
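Errors such as "cannot find Chrome binary" can mean that one of these /opt paths is wrong. Here is a small pre-flight check, as a sketch, that the handler could run before starting Chrome. The default paths are the ones used in this tutorial, and missing_binaries is a hypothetical helper.

```python
import os

# Paths used in this tutorial: layer contents are extracted under /opt.
DEFAULT_BINARIES = ("/opt/chromedriver", "/opt/headless-chromium")


def missing_binaries(paths=DEFAULT_BINARIES):
    """Return a list of (path, problem) tuples for binaries that cannot be run."""
    problems = []
    for path in paths:
        if not os.path.isfile(path):
            problems.append((path, "not found"))
        elif not os.access(path, os.X_OK):
            problems.append((path, "not executable"))
    return problems
```

In the handler, you could return {"statusCode": 500, "body": str(problems)} when missing_binaries() is non-empty, instead of letting webdriver.Chrome fail with a less specific message.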

File Structure


/lambda/               # Lambda function
├── handler.py         # source code of the Lambda function
└── serverless.yaml    # Serverless Framework config

Code

Copy the code below to /lambda/handler.py



from selenium import webdriver
from selenium.webdriver.chrome.options import Options


def main(event, context):
    options = Options()
    # The Chromium binary comes from the chromedriver layer, extracted under /opt
    options.binary_location = '/opt/headless-chromium'
    # Flags commonly used to run headless Chrome inside Lambda
    options.add_argument('--headless')
    options.add_argument('--no-sandbox')
    options.add_argument('--single-process')
    options.add_argument('--disable-dev-shm-usage')

    # Selenium 3.x signature: driver path first, options passed as chrome_options
    driver = webdriver.Chrome('/opt/chromedriver', chrome_options=options)
    try:
        driver.get('https://www.google.com/')
    finally:
        # quit() closes the browser and stops chromedriver, even if get() fails
        driver.quit()

    response = {
        "statusCode": 200,
        "body": "Selenium Headless Chrome Initialized"
    }

    return response
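The handler returns the shape that API Gateway's Lambda proxy integration expects: an integer statusCode and a string body. If you want to return structured data instead (for example, the page title), the body has to be serialized first. Here is a sketch with a hypothetical proxy_response helper.

```python
import json


def proxy_response(status_code, payload, headers=None):
    """Build a Lambda proxy-integration response; the body must be a string."""
    is_text = isinstance(payload, str)
    body = payload if is_text else json.dumps(payload)
    merged = {"Content-Type": "text/plain" if is_text else "application/json"}
    merged.update(headers or {})
    return {"statusCode": status_code, "headers": merged, "body": body}
```

For example, read title = driver.title before calling driver.quit(), then return proxy_response(200, {"message": "Selenium Headless Chrome Initialized", "title": title}).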

Copy the code below to /lambda/serverless.yaml.



service: selenium-lambda

provider:
  name: aws
  runtime: python3.6
  region: ap-southeast-2
  timeout: 900  # seconds (Lambda's maximum); API Gateway itself ends HTTP requests after ~29 s by default

functions:
  main:
    memorySize: 1000
    handler: handler.main
    events:
      - http:
          path: test
          method: get

    layers:
      # Replace {} with your AWS account ID; the final number is the layer version
      - arn:aws:lambda:ap-southeast-2:{}:layer:chromedriver:2
      - arn:aws:lambda:ap-southeast-2:{}:layer:selenium:2

resources:
  Resources:
    ApiGatewayRestApi:
      Properties:
        BinaryMediaTypes:
          - "*/*"

Deploy Lambda Function

From the /lambda directory, run:



$ sls deploy

Output



Serverless: Stack update finished...
Service Information
service: selenium-lambda
stage: dev
region: ap-southeast-2
stack: selenium-lambda-dev
api keys:
  None
endpoints:
  GET - https://{name}.execute-api.ap-southeast-2.amazonaws.com/dev/test
functions:
  main: selenium-lambda-dev-main

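When the function is invoked directly (for example, with a test event in the Lambda console), it returns the object below. Through the API Gateway endpoint, you should get an HTTP 200 response whose body is "Selenium Headless Chrome Initialized".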



{
"statusCode": 200,
"body": "Selenium Headless Chrome Initialized"
}
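To let teammates (or a CI job) check the deployed API without cloning the repo, a short script can call the endpoint and verify the response. Below is a sketch using only the standard library. The URL keeps the {name} placeholder from the deploy output, and check_endpoint is a hypothetical helper whose opener parameter exists only so it can be exercised without a network.

```python
import urllib.request

# Placeholder from the deploy output; replace {name} with your API ID.
ENDPOINT = "https://{name}.execute-api.ap-southeast-2.amazonaws.com/dev/test"


def check_endpoint(url, opener=urllib.request.urlopen, timeout=60):
    """GET the URL and return (status, body_text); urlopen raises on 4xx/5xx."""
    with opener(url, timeout=timeout) as resp:
        return resp.status, resp.read().decode("utf-8")
```

Once {name} is filled in, check_endpoint(ENDPOINT) should return (200, "Selenium Headless Chrome Initialized").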

The deployment automatically creates a CloudFormation stack and an S3 bucket for the deployment artifacts.

Deprecation note:

Since the Python 3.6 runtime has been deprecated, you can try the following package versions, which are compatible with Python 3.9. Note that if the packages exceed Lambda's size limits for zip deployments, you can upload the zip through S3 instead or use a Docker container image.



$ python3.9 -m pip install -t selenium/python/lib/python3.9/site-packages selenium==4.5.0
$ cd selenium
$ zip -r python.zip python/

Name: selenium
Description: Selenium layer
Upload zip file created: python.zip
Compatible runtimes: Python 3.9
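Before creating the layers, you can check how close the packages are to Lambda's quota of 250 MB unzipped for a function's code plus all of its layers. Here is a sketch that sums the uncompressed sizes recorded in each zip. unzipped_size_mb is a hypothetical helper, and the constant reflects AWS's documented quota at the time of writing.

```python
import zipfile

LAMBDA_UNZIPPED_LIMIT_MB = 250  # function code + layers, unzipped (AWS quota)


def unzipped_size_mb(*archives):
    """Total uncompressed size, in MB, of all entries in the given zip archives."""
    total = 0
    for path in archives:
        with zipfile.ZipFile(path) as zf:
            total += sum(info.file_size for info in zf.infolist())
    return total / (1024 * 1024)
```

For example, unzipped_size_mb("selenium/python.zip", "chromedriver/chromedriver.zip") gives the combined layer size to compare against LAMBDA_UNZIPPED_LIMIT_MB.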



$ mkdir -p chromedriver
$ cd chromedriver
$ curl -SL https://chromedriver.storage.googleapis.com/107.0.5304.62/chromedriver_linux64.zip > chromedriver.zip
$ unzip chromedriver.zip
$ rm chromedriver.zip



$ curl -SL https://github.com/adieuadieu/serverless-chrome/releases/download/v1.0.0-57/stable-headless-chromium-amazonlinux-2.zip > headless-chromium.zip
$ unzip headless-chromium.zip
$ rm headless-chromium.zip
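If the driver still fails to start after you swap in newer releases, two things are worth checking from inside Lambda: whether each binary can run at all, and whether the versions line up. Current ChromeDriver releases are meant for the Chrome/Chromium version with the same major number. The diagnostic sketch below (helper names are hypothetical) runs each binary with --version and compares major versions. You could call it from the handler and print the results to CloudWatch Logs. A return code of 127, as in the "Status code was: 127" errors reported in the comments, typically means the binary could not be executed, for example because a shared library it depends on is missing.

```python
import re
import subprocess


def probe_version(path):
    """Run `<path> --version` and return (returncode, combined output).

    returncode is None when the binary could not be started at all.
    capture_output/text need Python 3.7+ (fine on the python3.9 runtime).
    """
    try:
        proc = subprocess.run([path, "--version"], capture_output=True,
                              text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return None, str(exc)
    return proc.returncode, (proc.stdout + proc.stderr).strip()


def major_version(text):
    """Extract the major number from a string like 'ChromeDriver 107.0.5304.62 (...)'."""
    match = re.search(r"(\d+)\.\d+", text)
    return int(match.group(1)) if match else None


def versions_match(driver_text, browser_text):
    """True when both version strings carry the same major number."""
    d, b = major_version(driver_text), major_version(browser_text)
    return d is not None and d == b
```

For example, print(probe_version('/opt/chromedriver'), probe_version('/opt/headless-chromium')) in the handler shows whether each binary starts and which version it reports.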

References for new releases:
chromedriver release
https://chromedriver.chromium.org/downloads

serverless-chrome release
https://github.com/adieuadieu/serverless-chrome/releases

Check out Achim's comment on how he ran Python 3.9 with a Docker image:
https://dev.to/achimgrolimund/comment/22d99

Jaira Encio Linkedin

Top comments (48)

prenit-wankhede • Jun 7 '22

Thanks a ton brother for simple yet elegant walk through.
I have been trying so many tutorials and ways to get it to work but no luck.

With selenium version, chromedriver version and headless-chrome version as mentioned in the post, finally got it working. Thanks a bunch !

Awolad Hossain • Jan 28 '22

@jairaencio It's working great. But I can't use the selenium-stealth plugin. Getting an error. Message: unknown error: Chrome failed to start: exited abnormally\n (Driver info: .....

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium_stealth import stealth

def lambda_handler(event, context):
    options = Options()
    options.binary_location = '/opt/headless-chromium'    
    options.add_argument("start-maximized")
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    options.add_experimental_option('useAutomationExtension', False)

    driver = webdriver.Chrome('/opt/chromedriver',chrome_options=options)

    stealth(driver,
        languages=["en-US", "en"],
        vendor="Google Inc.",
        platform="Win32",
        webgl_vendor="Intel Inc.",
        renderer="Intel Iris OpenGL Engine",
        fix_hairline=True,
        )

    driver.get('https://quizlet.com/446134722/it-management-flash-cards/')

    driver.close();
    driver.quit();

    response = {
        "statusCode": 200,
        "body": "Selenium Headless Chrome Initialized"
    }

    return response

Jaira Encio • Jan 31 '22

Hi @awolad I think you need to include the stealth library package in your lambda layer. Notice in my tutorial I have 2 different lambda layers for my selenium and chromedriver package. You can create another lambda layer or just simply include it in the 2 layers

Awolad Hossain • Jan 31 '22

@jairaencio Yes, I've added the stealth library package in the selenium lambda layer. There is no import error.

Jaira Encio • Jan 31 '22

Great! Always happy to help :)

Awolad Hossain • Jan 31 '22

@jairaencio Sorry, It's not solved yet. I mean the error is not related to the import the stealth package issue. Because the package is already in my lambda layer. The driver fails to load when the stealth package is used.

Jaira Encio • Jan 31 '22 • Edited

does the error only occur when you add selenium-stealth library? Upon checking I noticed that others are experiencing issue in their local machines just by using stealth. You could try adding options.add_argument("--disable-blink-features=AutomationControlled") . Then try if it works both on your local and lambda.

Awolad Hossain • Jan 31 '22

Yes.

With the selenium-stealth default options like following:

options = Options()
options.binary_location = '/opt/headless-chromium'
options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
# options.add_argument("--headless")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)

driver = webdriver.Chrome('/opt/chromedriver', chrome_options=options)

stealth(driver,
        languages=["en-US", "en"],
        vendor="Google Inc.",
        platform="Win32",
        webgl_vendor="Intel Inc.",
        renderer="Intel Iris OpenGL Engine",
        fix_hairline=True,
        )

I'm getting error: Message: unknown error: Chrome failed to start: exited abnormally\n (Driver info: .....

By using this post options like following:

options = Options()
options.binary_location = '/opt/headless-chromium'
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--single-process')
options.add_argument('--disable-dev-shm-usage')

driver = webdriver.Chrome('/opt/chromedriver', chrome_options=options)

stealth(driver,
     languages=["en-US", "en"],
     vendor="Google Inc.",
      platform="Win32",
      webgl_vendor="Intel Inc.",
      renderer="Intel Iris OpenGL Engine",
       fix_hairline=True,
)

I'm getting error: "'WebDriver' object has no attribute 'execute_cdp_cmd'"

Jaira Encio • Jan 31 '22

I'm seeing this article related to "execute_cdp_cmd" error. Apparently they used pip install --pre selenium to be able to execute CDP commands github.com/SeleniumHQ/selenium/iss...

Awolad Hossain • Jan 31 '22

I also tried that but not working. I forgot to mention that. It would be helpful for us if you try with the selenium-stealth package and update this post. Because some websites we can't scrape without the selenium-stealth package. Thanks!

Da Shen • Oct 24 '21

tested working.. good article. note that Python runtime has to be 3.6. It won't work otherwise.

Achim Grolimund • Oct 21 '22 • Edited

Hey @jairaencio
Hello everyone, thanks for the guide. but it seems to me that this method no longer works without a Dockerimage.

I keep getting the error:

{
  "errorMessage": "Message: Service /opt/chromedriver unexpectedly exited. Status code was: 127\n",
  "errorType": "WebDriverException",
  "requestId": "ef0d3b0d-ee3f-4822-ba11-9dd40920680b",
  "stackTrace": [
    "  File \"/var/task/app.py\", line 29, in main\n    driver = webdriver.Chrome(service=s, options=op)\n",
    "  File \"/opt/python/lib/python3.9/site-packages/selenium/webdriver/chrome/webdriver.py\", line 69, in __init__\n    super().__init__(DesiredCapabilities.CHROME['browserName'], \"goog\",\n",
    "  File \"/opt/python/lib/python3.9/site-packages/selenium/webdriver/chromium/webdriver.py\", line 89, in __init__\n    self.service.start()\n",
    "  File \"/opt/python/lib/python3.9/site-packages/selenium/webdriver/common/service.py\", line 98, in start\n    self.assert_process_still_running()\n",
    "  File \"/opt/python/lib/python3.9/site-packages/selenium/webdriver/common/service.py\", line 110, in assert_process_still_running\n    raise WebDriverException(\n"
  ]
}

Python 3.9
Selenium 4.5.0
chromedriver 106.0.5249.61
headless-chromium v1.0.0-57

This is my Makefile to create everything I need. At the end I upload everything inside the lambda folder (code + 2 layers).

BOLD := \033[1m
NORMAL := \033[0m
GREEN := \033[1;32m

.DEFAULT_GOAL := help
HELP_TARGET_DEPTH ?= \#
.PHONY: help
help: # Show how to get started & what targets are available
    @printf "This is a list of all the make targets that you can run, e.g. $(BOLD)make dagger$(NORMAL) - or $(BOLD)m dagger$(NORMAL)\n\n"
    @awk -F':+ |$(HELP_TARGET_DEPTH)' '/^[0-9a-zA-Z._%-]+:+.+$(HELP_TARGET_DEPTH).+$$/ { printf "$(GREEN)%-20s\033[0m %s\n", $$1, $$3 }' $(MAKEFILE_LIST) | sort
    @echo


install: clean # Format all files (terraform fmt --recursive .)
    python3.9 -m pip install -t python/lib/python3.9/site-packages selenium==4.5.0 --upgrade
    python3.9 -m pip install -t python/lib/python3.9/site-packages wget --upgrade
    curl -SL https://chromedriver.storage.googleapis.com/106.0.5249.61/chromedriver_linux64.zip > chromedriver.zip
    curl -SL https://github.com/adieuadieu/serverless-chrome/releases/download/v1.0.0-57/stable-headless-chromium-amazonlinux-2.zip > headless-chromium.zip
    mkdir -p lambda
    zip -9 -r lambda/python.zip python
    unzip chromedriver.zip
    unzip headless-chromium.zip
    zip -9 -r lambda/chromedriver.zip chromedriver headless-chromium
    rm -f chromedriver.zip headless-chromium.zip chromedriver headless-chromium

build: # Create all layers to upload to the Lambda Function
    zip -9 -r lambda/app.zip app.py

clean:
    rm -rf lambda python

And here is an overview of my Python script:

import re
import wget

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait


def main(event, context):
    regex = r"[ \w-]+?(?=\.txt\.gz)"

    s = Service('/opt/chromedriver')
    op = webdriver.ChromeOptions()
    op.binary_location = '/opt/headless-chromium'
    op.add_argument('--headless')
    op.add_argument('--no-sandbox')
    op.add_argument('--disable-dev-shm-usage')
    op.add_argument('--disable-gpu')
    op.add_argument('--disable-dev-tools')
    op.add_argument("--disable-extensions")
    op.add_argument('--no-zygote')
    op.add_argument('--single-process')
    op.add_argument('--enable-logging')
    op.add_argument('--log-level=0')
    op.add_argument("--disable-notifications")
    op.add_argument('--v=99')
    driver = webdriver.Chrome(service=s, options=op)

    driver.get(
        "https://xxxxxx")

    (.....)

    driver.close()

Is there another way to run it directly in a lambda without a docker image?

Best Regards

Jaira Encio • Oct 25 '22

Great take on this, Achim 💪 I still haven't gotten back to this, but your method of using a Docker image is very useful for everyone as well 👏🏽

Rocco • Feb 6 '23

I have the configuration proposed in the edited version of the article, but I get the same error. I built everything on Amazon Linux using the python3.9 command as in the example, but I keep getting the same error.

Did you find any solution?

Achim Grolimund • Feb 6 '23

Yes, it needs a special tool inside the Docker image. I will post it here today as soon as I'm on my computer.

Nadia-Ou • Aug 9 '23

did you find any solution, please?

rajans163 • Dec 13 '22

Hi Dear...were you able to get resolution of this issue. I am getting the same issue in my lambda.
Please share the resolution as I am completely stuck.

quibski • Jun 11 '21

Surely many Devs and QA will benefit from this. Hopefully a demo can be made/shown

tchua • Jun 16 '21

+1 to this, a demo would be great!

Jaira Encio • Jun 16 '21

uhm hahaha

Christine John • Jun 23 '22

I'm getting the following error -
selenium.common.exceptions.WebDriverException: Message: unknown error: cannot find Chrome binary
(Driver info: chromedriver=2.37.544315(730aa6a5fdba159ac9f4c1e8cbc59bf1b5ce12b7),platform=Linux 4.14.255-276-224.499.amzn2.x86_64 x86_64)

Could someone help me please? :(

awspipe • Jul 19 '22

Hi, I am new to AWS Lambda, so please excuse any obvious questions...
I was able to create the 2 layers you mentioned, for Python and Chromium.
But I have no idea how I can run serverless.yaml...
You've mentioned "sls deploy", but I don't have Linux to run this.

Any other alternative to run this? Thanks

Jaira Encio • Aug 5 '22

I'm not a Windows user, but here is what I found: codegrepper.com/code-examples/shel...

Christine John • Jun 23 '22

@jairaencio

Jaira Encio • Jul 6 '22

Hi, this error would likely happen if the location for your chromedriver is incorrect. (lambda layer)

sebifc • Jul 18 '22

Hi! I tried to run it in a Lambda function and the execution keeps running until it hits the 600-second timeout.
I had downloaded the chromium 93 driver and the headless chrome 93 as well.

I'm using the following code:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from webdriver_manager.chrome import ChromeDriverManager
from selenium_stealth import stealth

def lambda_handler(event, context):
    options = Options()
    options.binary_location = '/opt/headless-chromium'
    options.add_argument("start-maximized")
    #options.add_experimental_option("excludeSwitches", ["enable-automation"])
    #options.add_experimental_option('useAutomationExtension', False)
    #options.add_argument("--disable-blink-features=AutomationControlled")
    #options.add_argument('--headless')
    options.add_argument('--no-sandbox')
    options.add_argument('--single-process')
    options.add_argument('--disable-dev-shm-usage')
    options.add_argument('--disable-notifications')
    options.add_argument("--enable-javascript")

    driver = webdriver.Chrome('/opt/chromedriver',chrome_options=options)

    stealth(driver,
        languages=["en-US", "en"],
        vendor="Google Inc.",
        platform="Win32",
        webgl_vendor="Intel Inc.",
        renderer="Intel Iris OpenGL Engine",
        fix_hairline=True,
        )

    driver.get('https://trends.google.com/trends/trendingsearches/daily?geo=US')

    driver.close();
    driver.quit();

    response = {
        "statusCode": 200,
        "body": "Selenium Headless Chrome Initialized"
    }

    return response

Jaira Encio • Aug 5 '22

I'm guessing that the timeout issue is caused by the deployment package size. You would have to use container images to solve it.

chrisjeriel • Jun 11 '21

Great work! A well-thought-out article, straightforward and concise. Looking forward more advanced implementations.

cesarcastmore • Jun 19 '23 • Edited

Hello!

I just have one question. When I was uploading my image to Lambda, I noticed that it required a lot of memory, and I think that could significantly increase costs. What are the differences between layers and Docker in Lambda?

I was following this documentation and managed to get it working with version 3.7, but I was unsuccessful with version 3.9. Some of the comments below suggest using Docker, but I realized that using Docker requires increasing the Lambda's memory. Do you know of any alternatives to Docker that won't consume a lot of memory and increase costs?

Jaira Encio • Jul 22 '23

As of now, there is nothing other than Docker that works for this. I don't recommend setting up an instance either. We can only hope that AWS improves Lambda pricing and memory.

rajans163 • Dec 14 '22

@jairaencio

I tried the same stuff for Python3.8(same chrome driver as you did for 3.9) but got the below error. Please help

START RequestId: ce98c274-1b51-46a5-968e-cdabe1e08a2a Version: $LATEST
[ERROR] WebDriverException: Message: Service /opt/chromedriver unexpectedly exited. Status code was: 127

Traceback (most recent call last):
  File "/var/task/lambda_function.py", line 13, in lambda_handler
    driver = webdriver.Chrome('/opt/chromedriver')
  File "/opt/python/selenium/webdriver/chrome/webdriver.py", line 69, in __init__
    super().__init__(DesiredCapabilities.CHROME['browserName'], "goog",
  File "/opt/python/selenium/webdriver/chromium/webdriver.py", line 89, in __init__
    self.service.start()
  File "/opt/python/selenium/webdriver/common/service.py", line 98, in start
    self.assert_process_still_running()
  File "/opt/python/selenium/webdriver/common/service.py", line 110, in assert_process_still_running
    raise WebDriverException(
END RequestId: ce98c274-1b51-46a5-968e-cdabe1e08a2a
REPORT RequestId: ce98c274-1b51-46a5-968e-cdabe1e08a2a Duration: 610.08 ms Billed Duration: 611 ms Memory Size: 128 MB Max Memory Used: 47 MB Init Duration: 253.40 ms

Jaira Encio • Dec 25 '22

Hi, you might want to follow this approach using a Docker image: dev.to/achimgrolimund/comment/22d99


DEV Community
