DEV Community

Toshipy

Simple search and library application with Next.js x OpenSearch x Lambda(Python)

Background

Whenever I have implemented a search function at work, I have fetched the data from the backend and then filtered and displayed it on the frontend.

However, I was concerned that performance would degrade as the amount of data grew, and I also thought this approach would struggle with complex search requirements.

I was about to start learning serverless architecture anyway. After some research, I found an introductory example on YouTube that was built on AWS serverless services, so I decided to try it myself.

Output

Technology Stack

  • Frontend: TypeScript, Next.js (App Router, v15)
  • Backend: Python
  • Infrastructure: Lambda, API Gateway, DynamoDB, OpenSearch
  • Authentication: Auth0 (omitted in this article)
  • Containers: Docker (omitted in this article)
  • Local environment setup: Cloudflare Tunnel

Architecture

Frontend layer:

  • Users access the application through a browser
  • The UI is built from components using Next.js
  • An Auth0 authentication server handles user authentication

Backend layer:

  • A FastAPI (Python) application is exposed through API Gateway
  • Lambda processes the requests from the frontend

Data storage layer:

  • DynamoDB is the main data store
  • OpenSearch serves as the search engine
  • DynamoDB Streams handles data synchronization and updates

Backend

Build OpenSearch in a local environment (Docker)

First, I ran OpenSearch in Docker and made it reachable from Lambda through a Cloudflare Tunnel. I then verified that I could communicate with the OpenSearch instance running in the local environment.

Cloudflare Tunnel is a Cloudflare service that securely exposes a local service to the Internet through an encrypted tunnel, without requiring a public IP address or open inbound ports.

The following docker-compose.yml defines this environment. It starts OpenSearch and OpenSearch Dashboards, which Lambda can then reach via the Cloudflare Tunnel.

version: '3'
services:
  opensearch-node1:
    image: opensearchproject/opensearch:2.18.0
    container_name: opensearch-node1
    environment:
      - discovery.type=single-node
      - node.name=opensearch-node1
      # Local development only: the security plugin is disabled
      - plugins.security.disabled=true
      - "_JAVA_OPTIONS=-XX:UseSVE=0"
      - OPENSEARCH_INITIAL_ADMIN_PASSWORD=${OPENSEARCH_INITIAL_ADMIN_PASSWORD}
      - http.host=0.0.0.0
      - transport.host=127.0.0.1
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
    volumes:
      - opensearch-data1:/usr/share/opensearch/data
    ports:
      - 9200:9200
      - 9600:9600
    networks:
      - opensearch-net
  opensearch-dashboards:
    image: opensearchproject/opensearch-dashboards:2.18.0
    container_name: opensearch-dashboards
    ports:
      - 5601:5601
    expose:
      - '5601'
    environment:
      # OPENSEARCH_HOSTS: '["https://opensearch-node1:9200", "https://opensearch-node2:9200"]'
      - OPENSEARCH_HOSTS=http://opensearch-node1:9200
      - DISABLE_SECURITY_DASHBOARDS_PLUGIN=true
    networks:
      - opensearch-net

volumes:
  opensearch-data1:

networks:
  opensearch-net:
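To confirm that the container is reachable, a small script along the following lines can be used. This is a minimal sketch, not the check used in the original project: the function names summarize_cluster_info and check_opensearch are hypothetical, and it assumes the unauthenticated HTTP endpoint on port 9200 that the compose file above exposes.

```python
import json
import urllib.request
from typing import Any, Dict


def summarize_cluster_info(info: Dict[str, Any]) -> str:
    """Build a one-line summary from the JSON that OpenSearch returns at GET /."""
    version = info.get("version", {})
    distribution = version.get("distribution", "unknown")
    return f"{info.get('cluster_name')} / {distribution} {version.get('number')}"


def check_opensearch(base_url: str, timeout: float = 5.0) -> str:
    """Fetch the root endpoint and summarize it; raises if the node is unreachable."""
    with urllib.request.urlopen(base_url, timeout=timeout) as response:
        return summarize_cluster_info(json.load(response))
```

Calling check_opensearch("http://localhost:9200") tests the local port mapping. Passing the public hostname of the tunnel instead exercises the same path that Lambda would use.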

Building Infrastructure with Serverless Framework

The overall infrastructure is structured such that the search and registration APIs are defined in API Gateway, processing is done in Lambda, DynamoDB is used for storage, and OpenSearch is used as the search engine.

service: search-api

provider:
  name: aws
  runtime: python3.9
  region: ap-northeast-1
  environment: ${file(env.yml)}
  iam:
    role:
      statements:
        - Effect: Allow
          Action: dynamodb:*
          Resource: !GetAtt BooksTable.Arn
        - Effect: Allow
          Action: es:ESHttp*
          Resource:
            - !Sub arn:aws:es:${AWS::Region}:${AWS::AccountId}:domain/${self:provider.environment.OPENSEARCH_DOMAIN_NAME}/*

package:
  individually: true
  patterns:
    - '!**'
    - 'requirements.txt'

functions:
  api:
    handler: src.main.handler
    package:
      include:
        - src/**/*.py
    events:
      - httpApi:
          path: /docs
          method: get
      - httpApi:
          path: /openapi.json
          method: get
      - httpApi:
          path: /search
          method: get
      - httpApi:
          path: /books
          method: ANY
    layers:
      - Ref: PythonRequirementsLambdaLayer

  syncToOpensearch:
    handler: src.dynamodb_stream.handler
    package:
      include:
        - src/**/*.py
    events:
      - stream:
          type: dynamodb
          arn: !GetAtt BooksTable.StreamArn
    layers:
      - Ref: PythonRequirementsLambdaLayer

resources:
  Resources:
    BooksTable:
      Type: AWS::DynamoDB::Table
      Properties:
        TableName: Books
        AttributeDefinitions:
          - AttributeName: id
            AttributeType: S
        KeySchema:
          - AttributeName: id
            KeyType: HASH
        BillingMode: PAY_PER_REQUEST
        StreamSpecification:
          StreamViewType: NEW_AND_OLD_IMAGES

    OpenSearchDomain:
      Type: AWS::OpenSearchService::Domain
      Properties:
        DomainName: ${self:provider.environment.OPENSEARCH_DOMAIN_NAME}
        ClusterConfig:
          InstanceCount: 1
          InstanceType: t3.small.search
        EBSOptions:
          EBSEnabled: true
          VolumeSize: 10
          VolumeType: gp2
        EncryptionAtRestOptions:
          Enabled: true
        NodeToNodeEncryptionOptions:
          Enabled: true
        DomainEndpointOptions:
          EnforceHTTPS: true
        AdvancedSecurityOptions:
          Enabled: true
          InternalUserDatabaseEnabled: true
          MasterUserOptions:
            MasterUserName: ${self:provider.environment.OPENSEARCH_USERNAME}
            MasterUserPassword: ${self:provider.environment.OPENSEARCH_PASSWORD}

plugins:
  - serverless-python-requirements

custom:
  pythonRequirements:
    layer: true

Features:

  • Python 3.9 runtime
  • Dependencies are managed by the serverless-python-requirements plugin and packaged as a Lambda layer
  • OpenSearch runs in a single-node configuration (t3.small.search)
  • The api function uses the FastAPI framework (hence the /docs endpoint) to provide search and CRUD operations on book data
  • DynamoDB Streams is enabled on the table, so a Lambda function runs whenever the table changes
  • syncToOpensearch: the function that synchronizes data from DynamoDB to OpenSearch
  • The DynamoDB table (BooksTable) stores the book data that OpenSearch indexes
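The article does not show src.dynamodb_stream.handler, so the following is only a minimal sketch of what the synchronization step could look like. The helper names from_dynamodb and to_index_operations are hypothetical, and the type conversion is deliberately simplified to the attribute types a book record is likely to use.

```python
from typing import Any, Dict, List


def from_dynamodb(value: Dict[str, Any]) -> Any:
    """Convert one DynamoDB attribute value such as {"S": "abc"} to plain Python."""
    type_tag, raw = next(iter(value.items()))
    if type_tag in ("S", "BOOL"):
        return raw
    if type_tag == "N":
        # DynamoDB sends numbers as strings
        try:
            return int(raw)
        except ValueError:
            return float(raw)
    if type_tag == "NULL":
        return None
    if type_tag == "L":
        return [from_dynamodb(item) for item in raw]
    if type_tag == "M":
        return {key: from_dynamodb(item) for key, item in raw.items()}
    if type_tag == "SS":
        return list(raw)
    raise ValueError(f"unsupported DynamoDB type: {type_tag}")


def to_index_operations(event: Dict[str, Any], index: str = "books") -> List[Dict[str, Any]]:
    """Translate a DynamoDB Streams event into index/delete operations."""
    operations: List[Dict[str, Any]] = []
    for record in event.get("Records", []):
        doc_id = from_dynamodb(record["dynamodb"]["Keys"]["id"])
        if record["eventName"] in ("INSERT", "MODIFY"):
            # NewImage holds the full item after the change
            image = record["dynamodb"]["NewImage"]
            body = {name: from_dynamodb(attr) for name, attr in image.items()}
            operations.append({"op": "index", "index": index, "id": doc_id, "body": body})
        elif record["eventName"] == "REMOVE":
            operations.append({"op": "delete", "index": index, "id": doc_id})
    return operations
```

Inside the Lambda handler, each "index" operation would then be passed to client.index(index=..., id=..., body=...) and each "delete" operation to client.delete(index=..., id=...) on the opensearch-py client. Keeping the translation separate from the client calls makes it testable without AWS access.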

Search API

I wanted to test the API with Swagger UI during development, so I used Python's FastAPI, which generates the OpenAPI (Swagger) documentation automatically.

Features:

  • Both the title and the body text (story) are searched
  • The OpenSearch client connects over HTTPS with IAM-based authentication
  • DynamoDB and OpenSearch have separate roles: writes go to DynamoDB and searches go to OpenSearch (the registration API is omitted from this article)
  • hits → the number of matching documents (result["hits"]["total"]["value"])
  • results → the list of search results, each converted to a SearchResponse object

from fastapi import FastAPI, HTTPException
from mangum import Mangum

# SearchResponse, SearchResult, get_dynamodb_table, get_opensearch_client and
# build_opensearch_query come from the schema and utils modules shown below.

table = get_dynamodb_table()

app = FastAPI(
  title="Search API",
  description="Search API for OpenSearch",
  version="1.0.0"
)

@app.get("/books", response_model=SearchResult, summary="Search for a character", description="Search for a character by name")
async def search(keyword: str = None):
  try:
    client = get_opensearch_client()
    query = build_opensearch_query(keyword)
    result = client.search(index="books", body=query)
    # Total number of matching documents
    hits = result["hits"]["total"]["value"]
    # Convert each returned document into the response schema
    results = [
      SearchResponse(
        id=str(hit['_source']['id']),
        title=hit['_source']['title'],
        story=hit['_source']['story'],
        attributes=hit['_source']['attributes'],
        created_at=hit['_source']['created_at'],
        updated_at=hit['_source']['updated_at']
      )
      for hit in result["hits"]["hits"]
    ]

    return SearchResult(hits=hits, results=results)
  except Exception as e:
    raise HTTPException(status_code=500, detail=str(e))

# Mangum adapts the ASGI app to the Lambda handler interface
handler = Mangum(app)
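To make the hits and results fields more concrete, here is a minimal sketch that runs the same extraction against a hand-written sample response. The sample data and the helper name parse_search_response are made up for illustration; only the nesting of hits.total.value and hits.hits[]._source follows the search response format used in the code above.

```python
from typing import Any, Dict, List, Tuple


def parse_search_response(result: Dict[str, Any]) -> Tuple[int, List[Dict[str, Any]]]:
    """Return (total match count, list of stored documents) from a search response."""
    total = result["hits"]["total"]["value"]
    documents = [hit["_source"] for hit in result["hits"]["hits"]]
    return total, documents


# Hand-written sample in the shape of a search response (illustrative data)
sample_response = {
    "hits": {
        "total": {"value": 1, "relation": "eq"},
        "hits": [
            {
                "_index": "books",
                "_id": "1",
                "_score": 1.2,
                "_source": {"id": "1", "title": "Momotaro", "story": "A boy born from a peach."},
            }
        ],
    }
}

total, documents = parse_search_response(sample_response)
print(total, [doc["title"] for doc in documents])
```

Note that the total counts every matching document, while the hits list only contains the page that was returned, so hits can be larger than the length of results.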

The schema and utility code is as follows.

from pydantic import BaseModel
from typing import List

class SearchResponse(BaseModel):
    id: str
    title: str
    story: str
    attributes: List[str]
    created_at: str
    updated_at: str

class SearchResult(BaseModel):
    hits: int
    results: List[SearchResponse]

class CreateBook(BaseModel):
    title: str

class Book(BaseModel):
    id: str
    title: str
    story: str
    attributes: List[str]
    created_at: str
    updated_at: str

from opensearchpy import OpenSearch, RequestsHttpConnection
from requests_aws4auth import AWS4Auth
import os
import boto3

def get_opensearch_client():
    endpoint = os.environ.get('OPENSEARCH_ENDPOINT', '')
    host = endpoint.replace('https://', '').replace('http://', '')

    credentials = boto3.Session().get_credentials()
    region = os.environ.get('AWS_REGION', 'ap-northeast-1')

    auth = AWS4Auth(
        credentials.access_key,
        credentials.secret_key,
        region,
        'es',
        session_token=credentials.token
    )

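    return OpenSearch(
        hosts=[{'host': host, 'port': 443}],
        http_auth=auth,
        use_ssl=True,
        # Certificate verification is disabled here; set verify_certs=True
        # when the endpoint presents a valid certificate.
        verify_certs=False,
        connection_class=RequestsHttpConnection
    )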

def get_dynamodb_table():
  return boto3.resource('dynamodb').Table('Books')

OPENSEARCH_INDEX = 'books'

OPENSEARCH_MAPPING = {
  "mappings": {
    "properties": {
      "id": {"type": "keyword"},
      "title": {"type": "text"},
      "story": {"type": "text"},
      "attributes": {"type": "keyword"},
      "created_at": {"type": "date"},
      "updated_at": {"type": "date"}
    }
  }
}

def build_opensearch_query(keyword):
  return {
    "query": {
      "multi_match": {
        "query": keyword,
        "fields": ["title", "story"]
      }
    }
  }
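The search endpoint declares keyword with a default of None, so the query builder can be called without any search text. The following is a minimal sketch of one way to handle that case. Falling back to match_all is an assumption on my part, not the behavior of the original code, and build_query is a hypothetical name.

```python
from typing import Any, Dict, Optional


def build_query(keyword: Optional[str]) -> Dict[str, Any]:
    """Build a search body; fall back to match_all when no keyword is given."""
    if keyword is None or not keyword.strip():
        # No search text: return every document instead of sending an empty multi_match
        return {"query": {"match_all": {}}}
    return {
        "query": {
            "multi_match": {
                "query": keyword.strip(),
                "fields": ["title", "story"],
            }
        }
    }
```

Another reasonable choice would be to reject an empty keyword with a 400 response in the endpoint. Which one fits depends on whether the UI should list all books before the user types anything.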

Frontend

Separation of Concerns between Server Actions and Services

Separating concerns between Server Actions and Services as follows makes the code easier to read. It also makes clear whether an error occurred during schema validation, during request processing, or in the response.

Server Actions (next-safe-action):

  • Schema validation of input values with zod
  • Interface called from the client
  • Type safety
  • Error handling (application-level errors)

Service:

  • Lets the implementation focus on business logic
  • API call implementation
  • Response handling
  • Error handling (errors related to API communication and business logic)

'use server'

import { actionClient } from '@/lib/actions/safe-action'
import { searchBooks } from '@/lib/services/books/search-books'
import { searchQuerySchema } from '@/lib/zod/schemas/books'

export const searchBooksAction = actionClient
  .schema(searchQuerySchema)
  .action(async input => {
    const books = await searchBooks({
      params: { keyword: input.parsedInput.keyword }
    })
    return books
  })

import 'server-only'
import { path, handleFailed, handleSucceed } from '@/lib/services/utils'
import { SearchBooksRequest, SearchBooksResponse } from '@/lib/types/books'

export const searchBooks = async ({
  params: { keyword }
}: SearchBooksRequest): Promise<SearchBooksResponse> => {
  const url = path(`/books?keyword=${encodeURIComponent(keyword)}`)
  return fetch(url, {
    headers: {
      'Content-Type': 'application/json'
    },
    method: 'GET',
    cache: 'no-store'
  })
    .then(handleSucceed)
    .catch(handleFailed)
}


shadcn/ui provides modern UI components.

I used the Command component in this implementation.
https://ui.shadcn.com/docs/components/command

With shadcn/ui you can skip the fiddly parts that are specific to a search UI, such as dropdown and focus management, and concentrate on the business logic of the search function.
I recommend it because it significantly reduces the time needed to implement the UI.

Fast Linter & Formatter by Biome

Since this project uses TypeScript, I used Biome instead of ESLint and Prettier.

Biome has the following features:

  • Fast linter and formatter written in Rust
  • Combines the functionality of ESLint and Prettier in a single tool
  • Designed for performance, so checks run very quickly
  • Easy to set up and adopt (with VS Code, you only need to install the Biome extension and customize biome.json)

For example, with a Makefile like the following, make check runs the type check, linter, and format check from the terminal.
make format then formats the code according to the Biome rules you customized.



biome-check:
	npx biome check ./src

biome-write:
	npx biome check --write ./src

tsc-check:
	npx tsc --noEmit

check:
	make biome-check
	make tsc-check

format:
	make biome-write


https://biomejs.dev/ja/formatter/

Finally

If I have a chance in the future, I would like to introduce Japanese morphological analysis tools such as Kuromoji and Sudachi to improve the accuracy of searching Japanese text.
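As a rough idea of what that could involve, the sketch below derives a mapping that analyzes the text fields with the kuromoji analyzer. It assumes the analysis-kuromoji plugin is installed on the cluster, which the Docker setup in this article does not do. BASE_MAPPING repeats the mapping from the utils code, and with_japanese_analyzer is a hypothetical helper.

```python
from copy import deepcopy
from typing import Any, Dict, Iterable

# Same field layout as OPENSEARCH_MAPPING in the utils code
BASE_MAPPING: Dict[str, Any] = {
    "mappings": {
        "properties": {
            "id": {"type": "keyword"},
            "title": {"type": "text"},
            "story": {"type": "text"},
            "attributes": {"type": "keyword"},
            "created_at": {"type": "date"},
            "updated_at": {"type": "date"},
        }
    }
}


def with_japanese_analyzer(
    mapping: Dict[str, Any],
    fields: Iterable[str] = ("title", "story"),
    analyzer: str = "kuromoji",
) -> Dict[str, Any]:
    """Return a copy of the mapping whose text fields use the given analyzer."""
    updated = deepcopy(mapping)
    for field in fields:
        updated["mappings"]["properties"][field]["analyzer"] = analyzer
    return updated


japanese_mapping = with_japanese_analyzer(BASE_MAPPING)
```

Because the analyzer of an existing field cannot simply be swapped in place, applying this would mean creating a new index with the new mapping and reindexing the documents into it.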

This article covers a lot of ground, so some parts may be hard to follow or only partly explained, but I wrote it in the hope that it will be useful to someone! 🙇