
Yaroslav Yarmoshyk

The cost of a self-hosted LLM in AWS

Introduction

There are numerous reasons why you might want to run an LLM locally, isolated from the internet, instead of using the public OpenAI, Meta, or DeepSeek APIs.

For me, the most important are the following:

  1. Data privacy
    • Some industries (healthcare, finance, legal) require sensitive or proprietary data to remain on-premises or within specific geographic regions
    • Stringent regulations (e.g., HIPAA, GDPR) are easier to comply with when you avoid data transfers to external third-party services.
  2. Security
    • The generated or processed content must remain confidential; a local solution avoids sending queries to an external API.
    • You have end-to-end control (network, physical access, encryption at rest/in transit) when models are self-hosted.

Anything that includes your clients' PII or confidential business information should never be uploaded to public services.

To meet these requirements while still using an LLM to boost your organization's performance, a local setup in your own cloud account can be a silver bullet.
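
To make the setup concrete, here is a minimal sketch of what "self-hosted" looks like on such an instance. vLLM and the Hugging Face model ID are my choices for illustration, not something prescribed above; the Llama weights are gated, so you need an approved Hugging Face token available in the environment:

```python
# Minimal self-hosted inference sketch using vLLM (one possible stack).
# Everything runs on your own EC2 instance -- no queries leave your network.
from vllm import LLM, SamplingParams

# Assumption: you have access to the gated Llama weights on Hugging Face.
llm = LLM(model="meta-llama/Llama-3.2-1B-Instruct")

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Summarize our data-retention policy."], params)

for output in outputs:
    print(output.outputs[0].text)
```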


What is the financial side of this setup?

I prepared an approximate cost forecast for running the Llama 3.2 models in the AWS cloud, based on their hardware requirements.

In my calculations I covered the following two cases:

  1. The LLM is online only during working hours (40 hrs/week).
  2. The LLM is available 24/7 (168 hrs/week).

In both cases, no savings plans, reserved instances, or upfront payments are included.
| Llama Model | Possible EC2 Instance | Instance Details | Monthly Price (40 hrs/week) | Monthly Price (168 hrs/week) |
| --- | --- | --- | --- | --- |
| Llama 3.2 1B Instruct | g4dn.xlarge | 16 GB RAM, 4 vCPUs, 1× NVIDIA T4 (16 GB) | $91.42 | $383.98 |
| Llama 3.2 3B Instruct | g4dn.2xlarge | 32 GB RAM, 8 vCPUs, 1× NVIDIA T4 (16 GB) | $130.70 | $548.96 |
| Llama 3.2 11B Vision | g5.8xlarge | 128 GB RAM, 32 vCPUs, 1× NVIDIA A10G (24 GB) | $429.33 | $1,803.04 |
| Llama 3.2 90B Vision | g5.48xlarge | 768 GB RAM, 192 vCPUs, 8× NVIDIA A10G (192 GB total) | $2,834.85 | $11,906.24 |

Notes on the Table

  1. Possible EC2 Instance was selected based on the Llama 3.2 model requirements:
    • For the smaller Instruct models (1B, 3B), a single g4dn or g5 instance with an NVIDIA T4 should be enough.
    • For 11B Vision, the g5.8xlarge meets the minimum 22 GB VRAM requirement (its A10G has 24 GB VRAM).
    • For 90B Vision, you typically need multiple high-end GPUs. The g5.48xlarge offers 8× A10G GPUs (24 GB each = 192 GB total VRAM) plus sufficient CPU and RAM.
  2. Monthly Price was calculated from approximate On-Demand hourly rates in us-west-2 (Oregon). Prices are shown for:
    • 160 hours/month (40 hrs/week)
    • 720 hours/month (24×7 usage). Actual AWS rates may vary slightly by region and can change over time; the sketch after these notes shows the arithmetic.
  3. Storage: the selected instances typically come with local NVMe SSD volumes. In production, you'll often attach an EBS volume to meet or exceed the required disk space. EBS costs are not included in the prices above.
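
As a sanity check, the monthly figures are simply hourly rate × hours per month. A minimal sketch of that arithmetic (the hourly rates below are illustrative placeholders, not quoted AWS prices; always pull current rates from the pricing calculator):

```python
# Monthly cost = on-demand hourly rate x hours the instance runs per month.
# Rates below are illustrative assumptions -- check current AWS pricing.
HOURS_WORKING = 40 * 4      # 40 hrs/week ~= 160 hours/month
HOURS_ALWAYS_ON = 24 * 30   # 24x7 ~= 720 hours/month

hourly_rates_usd = {        # assumed example on-demand rates, us-west-2
    "g4dn.xlarge": 0.526,
    "g4dn.2xlarge": 0.752,
}

for instance, rate in hourly_rates_usd.items():
    print(f"{instance}: "
          f"${rate * HOURS_WORKING:,.2f}/mo (working hours), "
          f"${rate * HOURS_ALWAYS_ON:,.2f}/mo (24/7)")
```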

Here is a link to the pricing calculator. You can use it as a baseline in your cost forecasts.

Optimization Options

  1. Reserved Instances or Savings Plans can drastically reduce hourly rates.
  2. Spot Instances offer lower prices but can be interrupted.
  3. For large models, you might also explore distributed training/inference techniques to scale across multiple smaller GPUs.
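
And if you go with the working-hours scenario, the savings only materialize if the instance is actually stopped off-hours. A minimal boto3 sketch of the start/stop handlers (the instance ID and region are hypothetical placeholders; EventBridge-triggered Lambdas are one way to schedule them):

```python
# Stop/start the LLM host on a schedule so you pay for ~160 hrs/month.
import boto3

ec2 = boto3.client("ec2", region_name="us-west-2")
INSTANCE_ID = "i-0123456789abcdef0"  # hypothetical placeholder


def start_for_workday():
    """Run at the start of business hours (e.g. via EventBridge + Lambda)."""
    ec2.start_instances(InstanceIds=[INSTANCE_ID])


def stop_after_workday():
    """Run at the end of business hours."""
    ec2.stop_instances(InstanceIds=[INSTANCE_ID])
```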

Always confirm instance pricing using the official AWS Pricing Calculator or up-to-date AWS documentation for G4 instances and G5 instances.
