Introduction
There are numerous reasons why you might want to run an LLM locally, isolated from the internet, instead of using the public OpenAI, Meta, or DeepSeek APIs.
For me, the most important are the following:
- Data privacy
  - Some industries (healthcare, finance, legal) require sensitive or proprietary data to remain on-premises or within specific geographic regions.
  - Stringent regulations (e.g., HIPAA, GDPR) are easier to satisfy by avoiding data transfers to external third-party services.
- Security
  - The content, generated or processed, must remain confidential. A local solution prevents sending queries to an external API.
  - You have end-to-end control (network, physical access, encryption at rest/in transit) when models are self-hosted.
Anything that includes your clients' PII or confidential business information should never be uploaded to public services.
To meet these requirements while still using LLMs to boost your organization's performance, a local setup in your own cloud account can be the silver bullet.
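One practical upside of this approach: servers like vLLM and Ollama expose OpenAI-compatible endpoints, so client code barely changes when you swap a public API for a self-hosted one. Here is a minimal sketch; the base URL, API key, model name, and prompt are placeholder assumptions for your own deployment:

```python
# Minimal sketch: call a self-hosted, OpenAI-compatible endpoint inside
# your own network instead of a public API. The base_url, model name,
# and prompt are placeholders for your deployment.
from openai import OpenAI

client = OpenAI(
    base_url="http://10.0.0.5:8000/v1",  # private address of your LLM host
    api_key="not-needed-locally",        # many local servers ignore the key
)

response = client.chat.completions.create(
    model="llama-3.2-3b-instruct",  # whatever name your server registers
    messages=[{"role": "user", "content": "Summarize this contract clause."}],
)
print(response.choices[0].message.content)
```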
What about the financial side of this setup?
I prepared an approximate forecast for running the Llama 3.2 models on AWS, based on their hardware requirements.
In my calculations I covered the following two cases:
- The LLM is online only during working hours (40 h/week); a minimal scheduling sketch follows this list.
- The LLM is available 24/7 (168 h/week).

No Savings Plans, Reserved Instances, or upfront payments are included.
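The 40 h/week case assumes the instance is stopped outside working hours. A minimal sketch of the start/stop calls with boto3 (the instance ID and region are placeholders; in practice you would trigger these functions from scheduled EventBridge rules or a Lambda):

```python
# Minimal sketch: start/stop the LLM host so it only runs during
# working hours. The instance ID and region are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-west-2")
INSTANCE_ID = "i-0123456789abcdef0"  # hypothetical instance ID

def start_llm_host() -> None:
    """Call at the start of the working day (e.g., via EventBridge)."""
    ec2.start_instances(InstanceIds=[INSTANCE_ID])

def stop_llm_host() -> None:
    """Call at the end of the working day."""
    ec2.stop_instances(InstanceIds=[INSTANCE_ID])
```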
| Llama Name | Possible EC2 Instance | Instance Details | Monthly Price (40 hrs/week) | Monthly Price (168 hrs/week) |
| --- | --- | --- | --- | --- |
| Llama 3.2 1B Instruct | g4dn.xlarge | 16 GB RAM, 4 vCPUs, 1× NVIDIA T4 GPU | $91.42 | $383.98 |
| Llama 3.2 3B Instruct | g4dn.2xlarge | 32 GB RAM, 8 vCPUs, 1× NVIDIA T4 GPU | $130.70 | $548.96 |
| Llama 3.2 11B Vision | g5.8xlarge | 128 GB RAM, 32 vCPUs, 1× NVIDIA A10G GPU (24 GB VRAM) | $429.33 | $1,803.04 |
| Llama 3.2 90B Vision | g5.48xlarge | 768 GB RAM, 192 vCPUs, 8× NVIDIA A10G GPUs (192 GB total VRAM) | $2,834.85 | $11,906.24 |
Notes on the Table
- Possible EC2 Instance was selected based on the Llama 3.2 model requirements:
  - For the smaller Instruct models (1B, 3B), a single g4dn (NVIDIA T4) or g5 (NVIDIA A10G) instance should be enough.
  - For 11B Vision, the g5.8xlarge meets the minimum ~22 GB VRAM requirement (its A10G has 24 GB VRAM).
  - For 90B Vision, you typically need multiple GPUs. The g5.48xlarge offers 8× A10G GPUs (24 GB each = 192 GB total VRAM) plus sufficient CPU and RAM.
- Monthly Price was calculated from approximate On-Demand hourly rates in us-west-2 (Oregon), for 160 hours/month (40 hrs/week) and 720 hours/month (24×7 usage). Actual AWS rates may vary slightly by region and can change over time; a quick sanity-check sketch follows these notes.
- Storage: The selected instances typically come with local NVMe SSD volumes. In production, you’ll often attach an EBS volume to meet or exceed the required disk space. EBS costs are not included in the prices above.
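The monthly figures are just an hourly rate multiplied by hours of use, so they are easy to reproduce. A minimal sketch follows; the hourly rates are approximate us-west-2 Linux On-Demand figures at the time of writing, so the output will not match the table to the cent:

```python
# Rough monthly cost from an On-Demand hourly rate.
# These hourly rates are approximate us-west-2 Linux On-Demand figures
# at the time of writing; always verify against current AWS pricing.
HOURLY_RATES_USD = {
    "g4dn.xlarge": 0.526,
    "g4dn.2xlarge": 0.752,
    "g5.8xlarge": 2.448,
    "g5.48xlarge": 16.288,
}

def monthly_cost(instance: str, hours_per_month: int) -> float:
    return HOURLY_RATES_USD[instance] * hours_per_month

for name in HOURLY_RATES_USD:
    print(
        f"{name}: ${monthly_cost(name, 160):,.2f} (40 h/week) / "
        f"${monthly_cost(name, 720):,.2f} (24x7)"
    )
```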
Here is a link to the pricing calculator. You can use it as a baseline in your cost forecasts.
Optimization Options
- Reserved Instances or Savings Plans can drastically reduce hourly rates.
- Spot Instances offer lower prices but can be interrupted.
- For large models, you might also explore distributed training/inference techniques to scale across multiple smaller GPUs, as sketched below.
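For example, the 90B model can be sharded across the eight GPUs of a g5.48xlarge with tensor parallelism. Here is a minimal text-only sketch using vLLM; the model ID, sampling settings, and prompt are assumptions to adjust for your deployment:

```python
# Minimal text-only sketch of tensor-parallel inference with vLLM.
# tensor_parallel_size=8 shards the model across the 8 GPUs of a
# g5.48xlarge; the model ID is the Hugging Face repo name and assumes
# you have access to the gated Llama weights.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.2-90B-Vision-Instruct",
    tensor_parallel_size=8,
)
params = SamplingParams(max_tokens=256, temperature=0.2)
outputs = llm.generate(["Explain our data-retention policy in plain terms."], params)
print(outputs[0].outputs[0].text)
```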
Always confirm instance pricing using the official AWS Pricing Calculator or up-to-date AWS documentation for G4 instances and G5 instances.