Deploying Alibaba's Qwen-2.5 model on AWS using Amazon SageMaker involves several steps, including preparing the environment, downloading and packaging the model, creating a custom container (if necessary), and deploying it to an endpoint. Below is a step-by-step guide for deploying Qwen-2.5 on AWS SageMaker.
Prerequisites:
- AWS Account: You need an active AWS account with permissions to use SageMaker.
- SageMaker Studio or Notebook Instance: This will be your development environment where you can prepare and deploy the model.
- Docker: If you need to create a custom container, Docker will be required locally.
- Alibaba Model Repository Access: Ensure that you have access to the Qwen-2.5 model weights and configuration files from Alibaba’s ModelScope or Hugging Face repository.
Step 1: Set Up Your SageMaker Environment
- Launch SageMaker Studio:
  - Go to the AWS Management Console.
  - Navigate to Amazon SageMaker > SageMaker Studio.
  - Create a new domain or use an existing one.
  - Launch a Jupyter notebook instance within SageMaker Studio.
- Install Required Libraries: Open a terminal in SageMaker Studio or your notebook instance and install the necessary libraries (Qwen-2.5 needs a reasonably recent transformers release):

pip install boto3 sagemaker transformers torch
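With the libraries installed, you can resolve the execution role and a default artifact bucket programmatically instead of hard-coding them later. Below is a minimal sketch, assuming the notebook runs with a SageMaker execution role attached:

import sagemaker
from sagemaker import get_execution_role

# Resolve the role and a default S3 bucket for this account/region
session = sagemaker.Session()
role = get_execution_role()
default_bucket = session.default_bucket()
print(role, default_bucket)

The values printed here can stand in for the hard-coded role ARN and bucket name used in the later steps.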
Step 2: Download the Qwen-2.5 Model
You can download the Qwen-2.5 model from Alibaba’s ModelScope or Hugging Face repository. For this example, we’ll assume you are using Hugging Face.
- Download the Model Locally: Use the transformers library to download the model:
from transformers import AutoModelForCausalLM, AutoTokenizer

# Use the exact repository ID for the variant you want, e.g. a 7B instruct checkpoint
model_name = "Qwen/Qwen2.5-7B-Instruct"  # replace with the Qwen-2.5 size/variant you need

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Save the model and tokenizer locally
model.save_pretrained("./qwen-2.5")
tokenizer.save_pretrained("./qwen-2.5")
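Optionally, a quick local sanity check confirms the downloaded weights load and generate before you package them. This sketch reuses the tokenizer and model objects from the snippet above (the prompt is just an example):

# Smoke test: generate a few tokens locally before packaging
prompt = "What is the capital of France?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))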
- Package the Model: After downloading the model, package it into a .tar.gz file so that it can be uploaded to S3. SageMaker extracts this archive into /opt/ml/model, so the model files should sit at the root of the archive rather than inside a nested directory. (A pure-Python alternative to this and the upload step is sketched after the upload snippet.)

tar -czvf qwen-2.5.tar.gz -C qwen-2.5 .
- Upload the Model to S3: Upload the packaged model to an S3 bucket:
import boto3
s3 = boto3.client('s3')
s3.upload_file("qwen-2.5.tar.gz", "your-s3-bucket-name", "qwen-2.5/qwen-2.5.tar.gz")
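If you prefer to keep the packaging and upload in Python, the tarfile module plus the SageMaker session helper achieve the same result. Below is a minimal sketch, with the bucket name as a placeholder; the arcname argument keeps the files at the archive root, which is what SageMaker expects:

import os
import tarfile
import sagemaker

# Package the saved model so its files sit at the root of the archive
with tarfile.open("qwen-2.5.tar.gz", "w:gz") as tar:
    for name in os.listdir("./qwen-2.5"):
        tar.add(os.path.join("./qwen-2.5", name), arcname=name)

# Upload and capture the s3:// URI to use later as model_data
model_data = sagemaker.Session().upload_data(
    path="qwen-2.5.tar.gz",
    bucket="your-s3-bucket-name",
    key_prefix="qwen-2.5"
)
print(model_data)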
Step 3: Create a Custom Inference Container (Optional)
If you want to use a pre-built container from AWS, you can skip this step; the SageMaker Hugging Face containers can serve the model without any custom code, and a sketch of that route follows this paragraph. If you need to customize the inference logic, you can build a custom Docker container instead. Keep in mind that a serving container must implement SageMaker's hosting contract (respond to /ping and /invocations on port 8080); the usual shortcut is to extend one of AWS's prebuilt Deep Learning Containers, which already bundle the SageMaker Inference Toolkit that picks up the SAGEMAKER_PROGRAM entry point used below.
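As a point of reference, here is a minimal sketch of the prebuilt route using the SageMaker Hugging Face LLM (TGI) container. It assumes the container version the SDK resolves supports your chosen Qwen-2.5 variant; the model ID, GPU count, and instance type are illustrative:

from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

role = "arn:aws:iam::<your-account-id>:role/<your-sagemaker-role>"

# Resolve a Hugging Face TGI serving image for the current region
llm_image = get_huggingface_llm_image_uri("huggingface")

# The container downloads the model from the Hugging Face Hub at startup
hf_model = HuggingFaceModel(
    image_uri=llm_image,
    role=role,
    env={
        "HF_MODEL_ID": "Qwen/Qwen2.5-7B-Instruct",  # illustrative variant
        "SM_NUM_GPUS": "1",
    },
)

predictor = hf_model.deploy(initial_instance_count=1, instance_type="ml.g5.2xlarge")

With this route, the manual download, packaging, and custom container work below can often be skipped entirely.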
- Create a Dockerfile: Create a Dockerfile that installs the necessary dependencies and sets up the inference script.
# NOTE: a bare Python image does not speak SageMaker's hosting protocol on its own.
# In practice, extend one of AWS's prebuilt PyTorch inference Deep Learning Containers,
# which ship with the SageMaker Inference Toolkit that honors SAGEMAKER_PROGRAM.
FROM python:3.10

# Install dependencies
RUN pip install --upgrade pip
RUN pip install transformers torch boto3

# Copy the inference script
COPY inference.py /opt/ml/code/inference.py

# Entry point picked up by the SageMaker Inference Toolkit
ENV SAGEMAKER_PROGRAM inference.py
- Create the Inference Script: Create an inference.py file that handles loading the model and performing inference.
import json
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer from the extracted model artifact
def model_fn(model_dir):
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForCausalLM.from_pretrained(model_dir)
    return {"model": model, "tokenizer": tokenizer}

# Parse incoming requests
def input_fn(request_body, request_content_type):
    if request_content_type == 'application/json':
        input_data = json.loads(request_body)
        return input_data['text']
    else:
        raise ValueError(f"Unsupported content type: {request_content_type}")

# Perform inference
def predict_fn(input_data, model_dict):
    model = model_dict["model"]
    tokenizer = model_dict["tokenizer"]
    inputs = tokenizer(input_data, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=256)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Serialize the response
def output_fn(prediction, response_content_type):
    return json.dumps({"generated_text": prediction})
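Before building the image, you can exercise the four handlers directly against the directory saved in Step 2. Below is a minimal sketch, assuming ./qwen-2.5 exists locally and inference.py is on the Python path:

import json
from inference import model_fn, input_fn, predict_fn, output_fn

# Run the full handler chain the way SageMaker would
model_dict = model_fn("./qwen-2.5")
payload = input_fn(json.dumps({"text": "What is the capital of France?"}), "application/json")
prediction = predict_fn(payload, model_dict)
print(output_fn(prediction, "application/json"))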
- Build and Push the Docker Image: Build the Docker image and push it to Amazon Elastic Container Registry (ECR). The qwen-2.5-inference repository must already exist in your registry; a boto3 sketch for creating it follows these commands.

# Build the Docker image
docker build -t qwen-2.5-inference .

# Tag the image for ECR
docker tag qwen-2.5-inference:latest <aws_account_id>.dkr.ecr.<region>.amazonaws.com/qwen-2.5-inference:latest

# Authenticate Docker to ECR and push the image
aws ecr get-login-password --region <region> | docker login --username AWS --password-stdin <aws_account_id>.dkr.ecr.<region>.amazonaws.com
docker push <aws_account_id>.dkr.ecr.<region>.amazonaws.com/qwen-2.5-inference:latest
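The push above fails if the target repository does not exist yet; one way to create it (and tolerate the case where it already does) is via boto3, as in this sketch, with the region as a placeholder matching the commands above:

import boto3

ecr = boto3.client("ecr", region_name="<region>")
try:
    # Create the repository that the docker push targets
    ecr.create_repository(repositoryName="qwen-2.5-inference")
except ecr.exceptions.RepositoryAlreadyExistsException:
    pass  # already created on a previous run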
Step 4: Deploy the Model on SageMaker
- Create a SageMaker Model: Use the SageMaker Python SDK to create a model object. If you created a custom container, specify the ECR image URI.
from sagemaker.model import Model
from sagemaker.serializers import JSONSerializer
from sagemaker.deserializers import JSONDeserializer

role = "arn:aws:iam::<your-account-id>:role/<your-sagemaker-role>"
model_data = "s3://your-s3-bucket-name/qwen-2.5/qwen-2.5.tar.gz"
image_uri = "<aws_account_id>.dkr.ecr.<region>.amazonaws.com/qwen-2.5-inference:latest"

model = Model(
    image_uri=image_uri,
    model_data=model_data,
    role=role,
    name="qwen-2.5-model"
)
- Deploy the Model to an Endpoint: Deploy the model to a SageMaker endpoint. Choose an instance type that can actually hold the model: a CPU instance such as ml.m5.large is far too small for a multi-billion-parameter Qwen-2.5 checkpoint, so a GPU instance (for example ml.g5.2xlarge for the 7B model) is a more realistic choice. Attaching JSON serializers here lets you pass plain dictionaries to the predictor in the next step.

predictor = model.deploy(
    initial_instance_count=1,
    instance_type='ml.g5.2xlarge',  # size this to your model variant
    serializer=JSONSerializer(),
    deserializer=JSONDeserializer()
)
Step 5: Test the Endpoint
Once the endpoint is deployed, you can test it by sending inference requests.
# Test the endpoint (the JSON serializer attached at deploy time handles encoding)
data = {"text": "What is the capital of France?"}
response = predictor.predict(data)
print(response)
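Clients that do not use the SageMaker Python SDK can call the same endpoint through the low-level runtime API. Below is a minimal boto3 sketch, reusing the endpoint name from the predictor above:

import json
import boto3

runtime = boto3.client("sagemaker-runtime")
response = runtime.invoke_endpoint(
    EndpointName=predictor.endpoint_name,
    ContentType="application/json",
    Body=json.dumps({"text": "What is the capital of France?"}),
)
print(json.loads(response["Body"].read()))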
Step 6: Clean Up
To avoid unnecessary charges, delete the endpoint and any associated resources when you're done.
predictor.delete_endpoint()
predictor.delete_model()
Conclusion
You have successfully deployed Alibaba's Qwen-2.5 model on AWS using Amazon SageMaker. You can now use the SageMaker endpoint to serve real-time inference requests. Depending on your use case, you can scale the deployment by adjusting the instance type and count.