New implementation on watsonx.ai: automate RAG pipeline development & deployment with an SDK

#llm #rag #autoai #watsonx
Introduction

A new feature has been added to, and announced on, the watsonx.ai platform: AutoAI can now automate and accelerate the search for an optimized, production-quality Retrieval-Augmented Generation (RAG) pattern based on the user's data and use case.
What does this feature bring to users?

This feature takes the complexity out of choosing which LLM, document chunking technique, and retrieval method work best for your RAG use case. For the full explanation, see this post by Armand Ruiz, VP of AI at IBM.
Sample implementation
Set up the environment
Before you use the sample code in this notebook, you must perform the following setup tasks:

Create a watsonx.ai Runtime Service instance (a free plan is offered and information about how to create the instance can be found here).
Install and import the required modules and dependencies
!pip install -U 'ibm-watsonx-ai[rag]>=1.2.4' | tail -n 1
!pip install -U "langchain_community>=0.3,<0.4" | tail -n 1
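The first install line requires ibm-watsonx-ai 1.2.4 or later. A minimal sketch for confirming the installed version from Python, assuming plain numeric `major.minor.patch` version strings (pre-release or post-release suffixes are not handled):

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(text: str) -> tuple:
    """Turn a plain 'X.Y.Z' version string into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Return True when `installed` is at least `minimum` (numeric parts only)."""
    # Tuples compare element by element, so (1, 10, 0) > (1, 2, 4)
    return parse_version(installed) >= parse_version(minimum)


if __name__ == "__main__":
    try:
        installed = version("ibm-watsonx-ai")
        print(installed, "OK" if meets_minimum(installed, "1.2.4") else "too old")
    except PackageNotFoundError:
        print("ibm-watsonx-ai is not installed")
```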
Defining the watsonx.ai credentials
This cell defines the credentials required to work with the watsonx.ai Runtime service.

Action: Provide the IBM Cloud user API key. For details, see documentation.

import getpass
from ibm_watsonx_ai import Credentials

credentials = Credentials(
    url="https://us-south.ml.cloud.ibm.com",
    api_key=getpass.getpass("Please enter your watsonx.ai api key (hit enter): "),
)
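`getpass` prompts interactively, which blocks unattended runs. A minimal sketch that reads the key from an environment variable first and only prompts as a fallback; the variable name `WATSONX_APIKEY` is a hypothetical choice for this example, not something the SDK requires:

```python
import getpass
import os


def resolve_api_key(env_var: str = "WATSONX_APIKEY") -> str:
    """Return the API key from `env_var`, prompting only if it is unset or empty."""
    # Hypothetical variable name; use whatever your environment defines.
    key = os.environ.get(env_var, "").strip()
    if key:
        return key
    return getpass.getpass("Please enter your watsonx.ai api key (hit enter): ")
```

It can then be passed as `api_key=resolve_api_key()` when building `Credentials`.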
Defining the project id
The foundation model requires a project ID that provides the context for the call. We will try to obtain the ID directly from the project in which this notebook runs; if this fails, you will be prompted to enter the project ID.

import os

try:
    project_id = os.environ["PROJECT_ID"]
except KeyError:
    project_id = input("Please enter your project_id (hit enter): ")
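A mistyped project ID usually only surfaces later as an API error. The project ID in this run's output (b9156b62-8f2a-4a40-8570-990fdd5d67cb) is a UUID, so a sketch like the following can catch copy-paste mistakes early; treating every project ID as a UUID is an assumption:

```python
import uuid


def normalize_project_id(raw: str) -> str:
    """Strip whitespace and check that the value parses as a UUID.

    Assumes watsonx.ai project IDs are UUIDs, as in this example's run output.
    Raises ValueError for anything else.
    """
    candidate = raw.strip()
    try:
        # str(uuid.UUID(...)) returns the canonical lowercase, hyphenated form
        return str(uuid.UUID(candidate))
    except ValueError as exc:
        raise ValueError(f"Not a valid project ID: {raw!r}") from exc
```

Usage: `project_id = normalize_project_id(project_id)` right after the cell above.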
Create an instance of APIClient with authentication details.

from ibm_watsonx_ai import APIClient

client = APIClient(credentials=credentials, project_id=project_id)

RAG Optimizer definition
Defining a connection to training data
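Upload the training data to the project as a data asset (stored in the project's COS bucket) and then define a connection to this file. This example uses the Base page of the ibm_watsonx_ai SDK documentation.

The code in the next cell downloads the page (if it is not already cached locally) and uploads it to the project as a data asset.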

import os
import requests

url = "https://ibm.github.io/watsonx-ai-python-sdk/base.html"
document_filename = "base.html"

# Download the page only if it is not already cached locally
if not os.path.isfile(document_filename):
    response = requests.get(url, timeout=30)
    response.raise_for_status()  # fail early on HTTP errors
    with open(document_filename, "w", encoding="utf-8") as file:
        file.write(response.text)

document_asset_details = client.data_assets.create(name=document_filename, file_path=document_filename)

document_asset_id = client.data_assets.get_id(document_asset_details)
document_asset_id
Creating data asset...
SUCCESS
'4f76e9c4-724e-45a2-8099-2d93f2746db3'
Define a connection to training data.

from ibm_watsonx_ai.helpers import DataConnection

input_data_references = [DataConnection(data_asset_id=document_asset_id)]
Defining a connection to test data
Upload a JSON file that will be used for benchmarking to the project as a data asset, and then define a connection to this file. This example uses content from the ibm_watsonx_ai SDK documentation.

benchmarking_data_IBM_page_content = [
    {
        "question": "How can you set or refresh user request headers using the APIClient class?",
        "correct_answer": "client.set_headers({'Authorization': 'Bearer <token>'})",
        "correct_answer_document_ids": [
            "base.html"
        ]
    },
    {
        "question": "How to initialise Credentials object with api_key",
        "correct_answer": "credentials = Credentials(url = 'https://us-south.ml.cloud.ibm.com', api_key = '***********')",
        "correct_answer_document_ids": [
            "base.html"
        ]
    }
]
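Each benchmark record above has the same three keys, and `correct_answer_document_ids` refers to the uploaded document name (`base.html`). A small sketch that checks records before they are written and uploaded; the required keys are inferred from this example rather than taken from the AutoAI RAG specification:

```python
REQUIRED_KEYS = ("question", "correct_answer", "correct_answer_document_ids")


def validate_benchmark(records, known_documents=None):
    """Return human-readable problems found in benchmark records (empty list if none).

    `known_documents` is an optional set of uploaded document names (for example
    {"base.html"}); when given, every referenced document ID must be in it.
    """
    problems = []
    for i, record in enumerate(records):
        missing = [key for key in REQUIRED_KEYS if key not in record]
        if missing:
            problems.append(f"record {i}: missing {', '.join(missing)}")
            continue  # no point checking values that are absent
        if not str(record["question"]).strip():
            problems.append(f"record {i}: empty question")
        doc_ids = record["correct_answer_document_ids"]
        if not isinstance(doc_ids, list) or not doc_ids:
            problems.append(f"record {i}: correct_answer_document_ids must be a non-empty list")
        elif known_documents is not None:
            unknown = [d for d in doc_ids if d not in known_documents]
            if unknown:
                problems.append(f"record {i}: unknown document ids {', '.join(unknown)}")
    return problems
```

For this notebook, `validate_benchmark(benchmarking_data_IBM_page_content, {document_filename})` should return an empty list.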
The code in the next cell saves the test data as a JSON file and uploads it to the project as a data asset.

import json

test_filename = "benchmarking_data_Base.json"

if not os.path.isfile(test_filename):
    with open(test_filename, "w") as json_file:
        json.dump(benchmarking_data_IBM_page_content, json_file, indent=4)

test_asset_details = client.data_assets.create(name=test_filename, file_path=test_filename)

test_asset_id = client.data_assets.get_id(test_asset_details)
test_asset_id
Creating data asset...
SUCCESS
'84b59630-65a4-466d-b174-400928fb9634'
Define connection information to testing data.

test_data_references = [DataConnection(data_asset_id=test_asset_id)]
RAG Optimizer configuration
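Provide the input information for the AutoAI RAG optimizer:

name - experiment name
description - experiment description
foundation_models - foundation models to consider for answer generation
embedding_models - embedding models to consider for indexing and retrieval
retrieval_methods - retrieval methods to consider
chunking - document chunking settings to consider (method, chunk size, chunk overlap)
max_number_of_rag_patterns - maximum number of RAG patterns to create
optimization_metrics - target optimization metrics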
from ibm_watsonx_ai.experiment import AutoAI

experiment = AutoAI(credentials, project_id=project_id)

rag_optimizer = experiment.rag_optimizer(
    name='AutoAI RAG run - Base documentation',
    description="AutoAI RAG Optimizer on ibm_watsonx_ai Base documentation",
    foundation_models=["ibm/granite-13b-chat-v2"],
    embedding_models=["ibm/slate-125m-english-rtrvr"],
    retrieval_methods=["simple"],
    chunking=[
        {
            "chunk_size": 512,
            "chunk_overlap": 64,
            "method": "recursive"
        }
    ],
    max_number_of_rag_patterns=4,
    optimization_metrics=[AutoAI.RAGMetrics.ANSWER_CORRECTNESS]
)
Configuration parameters can be retrieved via get_params().

rag_optimizer.get_params()
{'name': 'AutoAI RAG run - ModelInference documentation',
 'description': 'AutoAI RAG Optimizer on ibm_watsonx_ai ModelInference documentation',
 'chunking': [{'chunk_size': 512, 'chunk_overlap': 64, 'method': 'recursive'}],
 'embedding_models': ['ibm/slate-125m-english-rtrvr'],
 'retrieval_methods': ['simple'],
 'foundation_models': ['ibm/granite-13b-chat-v2'],
 'max_number_of_rag_patterns': 4,
 'optimization_metrics': ['answer_correctness']}
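The run details further down show that, with one foundation model, one embedding model, one retrieval method, and one chunking setting fixed, the four patterns differ only in the vector-store distance metric (cosine or euclidean) and the number of retrieved chunks (3 or 5), which is consistent with `max_combinations` being 4. The sketch below enumerates that grid with `itertools.product` to make the arithmetic explicit. It only illustrates the size of the observed search space and is not how AutoAI actually selects candidates; the distance metrics and chunk counts are taken from this run's output, not from a documented list of options.

```python
from itertools import product

# Values passed to rag_optimizer() in this example
foundation_models = ["ibm/granite-13b-chat-v2"]
embedding_models = ["ibm/slate-125m-english-rtrvr"]
chunking = [{"chunk_size": 512, "chunk_overlap": 64, "method": "recursive"}]
retrieval_methods = ["simple"]

# Values observed in this run's results (assumed, not a documented option list)
distance_metrics = ["cosine", "euclidean"]
numbers_of_chunks = [3, 5]

grid = [
    {
        "generation.model_id": fm,
        "embeddings.model_id": em,
        "chunking": ch,
        "retrieval.method": rm,
        "vector_store.distance_metric": dm,
        "retrieval.number_of_chunks": k,
    }
    for fm, em, ch, rm, dm, k in product(
        foundation_models, embedding_models, chunking,
        retrieval_methods, distance_metrics, numbers_of_chunks,
    )
]

print(len(grid))  # 1 * 1 * 1 * 1 * 2 * 2 = 4
```

Adding a second foundation model would double this grid to 8 candidates, more than `max_number_of_rag_patterns=4`; presumably that cap then limits how many patterns are actually built.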

RAG Experiment run
Call the run() method to trigger the AutoAI RAG experiment. You can use either interactive mode (a synchronous job, background_mode=False, as below) or background mode (an asynchronous job, background_mode=True).

run_details = rag_optimizer.run(
    input_data_references=input_data_references,
    test_data_references=test_data_references,
    background_mode=False
)

##############################################

Running 'efb6f9ce-8057-4fc1-9d84-5d2d78ffcf69'

##############################################


pending.................
running....
completed
Training of 'efb6f9ce-8057-4fc1-9d84-5d2d78ffcf69' finished successfully.
You can use the get_run_status() method to monitor AutoAI RAG jobs in background mode.

rag_optimizer.get_run_status()
'completed'
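In background mode, run() returns immediately and you poll get_run_status() yourself. A minimal polling sketch that takes the status function as a parameter, so it can be tested without a live run. The states 'pending', 'running' and 'completed' appear in the output above; treating 'failed' and 'canceled' as terminal states is an assumption.

```python
import time
from typing import Callable

# 'completed' is seen in this run; 'failed' and 'canceled' are assumed terminal states
TERMINAL_STATES = {"completed", "failed", "canceled"}


def wait_for_run(get_status: Callable[[], str], interval: float = 10.0,
                 timeout: float = 3600.0) -> str:
    """Poll `get_status` until it returns a terminal state or `timeout` expires."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run still '{status}' after {timeout} seconds")
        time.sleep(interval)
```

Usage: `final_status = wait_for_run(rag_optimizer.get_run_status, interval=15)`.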

Comparison and testing of RAG Patterns
You can list the trained patterns and their evaluation metrics as a pandas DataFrame by calling the summary() method. Use the DataFrame to compare all discovered patterns and select the most promising one for further testing.

summary = rag_optimizer.summary()
summary
| Pattern_Name | mean_answer_correctness | mean_faithfulness | mean_context_correctness | chunking.method | chunking.chunk_size | chunking.chunk_overlap | embeddings.model_id | vector_store.distance_metric | retrieval.method | retrieval.number_of_chunks | generation.model_id |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Pattern4 | 0.7083 | 0.2317 | 1.0 | recursive | 512 | 64 | ibm/slate-125m-english-rtrvr | cosine | simple | 3 | ibm/granite-13b-chat-v2 |
| Pattern1 | 0.5833 | 0.2045 | 1.0 | recursive | 512 | 64 | ibm/slate-125m-english-rtrvr | cosine | simple | 5 | ibm/granite-13b-chat-v2 |
| Pattern2 | 0.5833 | 0.2372 | 1.0 | recursive | 512 | 64 | ibm/slate-125m-english-rtrvr | euclidean | simple | 5 | ibm/granite-13b-chat-v2 |
| Pattern3 | 0.5833 | 0.2117 | 1.0 | recursive | 512 | 64 | ibm/slate-125m-english-rtrvr | euclidean | simple | 3 | ibm/granite-13b-chat-v2 |
Additionally, you can pass the scoring parameter to the summary() method to rank the RAG patterns by that metric, starting with the best.

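summary = rag_optimizer.summary(scoring="faithfulness")

To inspect the full run details, including each pattern's settings and test-data metrics, call get_run_details():

rag_optimizer.get_run_details()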
{'entity': {'completed_at': '2025-01-10T10:15:30.808Z',
  'hardware_spec': {'id': 'a6c4923b-b8e4-444c-9f43-8a7ec3020110', 'name': 'L'},
  'input_data_references': [{'location': {'href': '/v2/assets/4f76e9c4-724e-45a2-8099-2d93f2746db3?project_id=b9156b62-8f2a-4a40-8570-990fdd5d67cb',
     'id': '4f76e9c4-724e-45a2-8099-2d93f2746db3'},
    'type': 'data_asset'}],
  'message': {'level': 'info', 'text': 'AAR019I: AutoAI execution completed.'},
  'parameters': {'constraints': {'chunking': [{'chunk_overlap': 64,
      'chunk_size': 512,
      'method': 'recursive'}],
    'embedding_models': ['ibm/slate-125m-english-rtrvr'],
    'foundation_models': ['ibm/granite-13b-chat-v2'],
    'max_number_of_rag_patterns': 4,
    'retrieval_methods': ['simple']},
   'optimization': {'metrics': ['answer_correctness']},
   'output_logs': True},
  'results': [{'context': {'iteration': 1,
     'max_combinations': 4,
     'rag_pattern': {'composition_steps': ['chunking',
       'embeddings',
       'vector_store',
       'retrieval',
       'generation'],
      'duration_seconds': 16,
      'location': {'evaluation_results': 'default_autoai_rag_out/efb6f9ce-8057-4fc1-9d84-5d2d78ffcf69/Pattern1/evaluation_results.json',
       'indexing_notebook': 'default_autoai_rag_out/efb6f9ce-8057-4fc1-9d84-5d2d78ffcf69/Pattern1/indexing_inference_notebook.ipynb',
       'inference_notebook': 'default_autoai_rag_out/efb6f9ce-8057-4fc1-9d84-5d2d78ffcf69/Pattern1/indexing_inference_notebook.ipynb'},
      'name': 'Pattern1',
      'settings': {'chunking': {'chunk_overlap': 64,
        'chunk_size': 512,
        'method': 'recursive'},
       'embeddings': {'model_id': 'ibm/slate-125m-english-rtrvr',
        'truncate_input_tokens': 512,
        'truncate_strategy': 'left'},
       'generation': {'context_template_text': '[Document]\n{document}\n[End]',
        'model_id': 'ibm/granite-13b-chat-v2',
        'parameters': {'decoding_method': 'greedy',
         'max_new_tokens': 1000,
         'min_new_tokens': 1},
        'prompt_template_text': '<|system|>\nYou are Granite Chat, an AI language model developed by IBM. You are a cautious assistant. You carefully follow instructions. You are helpful and harmless and you follow ethical guidelines and promote positive behavior.\n<|user|>\nYou are a AI language model designed to function as a specialized Retrieval Augmented Generation (RAG) assistant. When generating responses, prioritize correctness, i.e., ensure that your response is grounded in context and user query. Always make sure that your response is relevant to the question.\nAnswer Length: detailed\n{reference_documents}\n{question} \n<|assistant|>'},
       'retrieval': {'method': 'simple', 'number_of_chunks': 5},
       'vector_store': {'datasource_type': 'chroma',
        'distance_metric': 'cosine',
        'index_name': 'autoai_rag_efb6f9ce_20250110101318',
        'operation': 'upsert',
        'schema': {'fields': [{'description': 'text field',
           'name': 'text',
           'role': 'text',
           'type': 'string'},
          {'description': 'document name field',
           'name': 'document_id',
           'role': 'document_name',
           'type': 'string'},
          {'description': 'chunk starting token position in the source document',
           'name': 'start_index',
           'role': 'start_index',
           'type': 'number'},
          {'description': 'chunk number per document',
           'name': 'sequence_number',
           'role': 'sequence_number',
           'type': 'number'},
          {'description': 'vector embeddings',
           'name': 'vector',
           'role': 'vector_embeddings',
           'type': 'array'}],
         'id': 'autoai_rag_1.0',
         'name': 'Document schema using open-source loaders',
         'type': 'struct'}}}},
     'software_spec': {'name': 'autoai-rag_rt24.1-py3.11'}},
    'metrics': {'test_data': [{'ci_high': 0.6667,
       'ci_low': 0.5,
       'mean': 0.5833,
       'metric_name': 'answer_correctness'},
      {'ci_high': 0.2541,
       'ci_low': 0.155,
       'mean': 0.2045,
       'metric_name': 'faithfulness'},
      {'mean': 1.0, 'metric_name': 'context_correctness'}]}},
   {'context': {'iteration': 2,
     'max_combinations': 4,
     'rag_pattern': {'composition_steps': ['chunking',
       'embeddings',
       'vector_store',
       'retrieval',
       'generation'],
      'duration_seconds': 13,
      'location': {'evaluation_results': 'default_autoai_rag_out/efb6f9ce-8057-4fc1-9d84-5d2d78ffcf69/Pattern2/evaluation_results.json',
       'indexing_notebook': 'default_autoai_rag_out/efb6f9ce-8057-4fc1-9d84-5d2d78ffcf69/Pattern2/indexing_inference_notebook.ipynb',
       'inference_notebook': 'default_autoai_rag_out/efb6f9ce-8057-4fc1-9d84-5d2d78ffcf69/Pattern2/indexing_inference_notebook.ipynb'},
      'name': 'Pattern2',
      'settings': {'chunking': {'chunk_overlap': 64,
        'chunk_size': 512,
        'method': 'recursive'},
       'embeddings': {'model_id': 'ibm/slate-125m-english-rtrvr',
        'truncate_input_tokens': 512,
        'truncate_strategy': 'left'},
       'generation': {'context_template_text': '[Document]\n{document}\n[End]',
        'model_id': 'ibm/granite-13b-chat-v2',
        'parameters': {'decoding_method': 'greedy',
         'max_new_tokens': 1000,
         'min_new_tokens': 1},
        'prompt_template_text': '<|system|>\nYou are Granite Chat, an AI language model developed by IBM. You are a cautious assistant. You carefully follow instructions. You are helpful and harmless and you follow ethical guidelines and promote positive behavior.\n<|user|>\nYou are a AI language model designed to function as a specialized Retrieval Augmented Generation (RAG) assistant. When generating responses, prioritize correctness, i.e., ensure that your response is grounded in context and user query. Always make sure that your response is relevant to the question.\nAnswer Length: detailed\n{reference_documents}\n{question} \n<|assistant|>'},
       'retrieval': {'method': 'simple', 'number_of_chunks': 5},
       'vector_store': {'datasource_type': 'chroma',
        'distance_metric': 'euclidean',
        'index_name': 'autoai_rag_efb6f9ce_20250110101349',
        'operation': 'upsert',
        'schema': {'fields': [{'description': 'text field',
           'name': 'text',
           'role': 'text',
           'type': 'string'},
          {'description': 'document name field',
           'name': 'document_id',
           'role': 'document_name',
           'type': 'string'},
          {'description': 'chunk starting token position in the source document',
           'name': 'start_index',
           'role': 'start_index',
           'type': 'number'},
          {'description': 'chunk number per document',
           'name': 'sequence_number',
           'role': 'sequence_number',
           'type': 'number'},
          {'description': 'vector embeddings',
           'name': 'vector',
           'role': 'vector_embeddings',
           'type': 'array'}],
         'id': 'autoai_rag_1.0',
         'name': 'Document schema using open-source loaders',
         'type': 'struct'}}}},
     'software_spec': {'name': 'autoai-rag_rt24.1-py3.11'}},
    'metrics': {'test_data': [{'ci_high': 0.6667,
       'ci_low': 0.5,
       'mean': 0.5833,
       'metric_name': 'answer_correctness'},
      {'ci_high': 0.3194,
       'ci_low': 0.155,
       'mean': 0.2372,
       'metric_name': 'faithfulness'},
      {'mean': 1.0, 'metric_name': 'context_correctness'}]}},
   {'context': {'iteration': 3,
     'max_combinations': 4,
     'rag_pattern': {'composition_steps': ['chunking',
       'embeddings',
       'vector_store',
       'retrieval',
       'generation'],
      'duration_seconds': 25,
      'location': {'evaluation_results': 'default_autoai_rag_out/efb6f9ce-8057-4fc1-9d84-5d2d78ffcf69/Pattern3/evaluation_results.json',
       'indexing_notebook': 'default_autoai_rag_out/efb6f9ce-8057-4fc1-9d84-5d2d78ffcf69/Pattern3/indexing_inference_notebook.ipynb',
       'inference_notebook': 'default_autoai_rag_out/efb6f9ce-8057-4fc1-9d84-5d2d78ffcf69/Pattern3/indexing_inference_notebook.ipynb'},
      'name': 'Pattern3',
      'settings': {'chunking': {'chunk_overlap': 64,
        'chunk_size': 512,
        'method': 'recursive'},
       'embeddings': {'model_id': 'ibm/slate-125m-english-rtrvr',
        'truncate_input_tokens': 512,
        'truncate_strategy': 'left'},
       'generation': {'context_template_text': '[Document]\n{document}\n[End]',
        'model_id': 'ibm/granite-13b-chat-v2',
        'parameters': {'decoding_method': 'greedy',
         'max_new_tokens': 1000,
         'min_new_tokens': 1},
        'prompt_template_text': '<|system|>\nYou are Granite Chat, an AI language model developed by IBM. You are a cautious assistant. You carefully follow instructions. You are helpful and harmless and you follow ethical guidelines and promote positive behavior.\n<|user|>\nYou are a AI language model designed to function as a specialized Retrieval Augmented Generation (RAG) assistant. When generating responses, prioritize correctness, i.e., ensure that your response is grounded in context and user query. Always make sure that your response is relevant to the question.\nAnswer Length: detailed\n{reference_documents}\n{question} \n<|assistant|>'},
       'retrieval': {'method': 'simple', 'number_of_chunks': 3},
       'vector_store': {'datasource_type': 'chroma',
        'distance_metric': 'euclidean',
        'index_name': 'autoai_rag_efb6f9ce_20250110101349',
        'operation': 'upsert',
        'schema': {'fields': [{'description': 'text field',
           'name': 'text',
           'role': 'text',
           'type': 'string'},
          {'description': 'document name field',
           'name': 'document_id',
           'role': 'document_name',
           'type': 'string'},
          {'description': 'chunk starting token position in the source document',
           'name': 'start_index',
           'role': 'start_index',
           'type': 'number'},
          {'description': 'chunk number per document',
           'name': 'sequence_number',
           'role': 'sequence_number',
           'type': 'number'},
          {'description': 'vector embeddings',
           'name': 'vector',
           'role': 'vector_embeddings',
           'type': 'array'}],
         'id': 'autoai_rag_1.0',
         'name': 'Document schema using open-source loaders',
         'type': 'struct'}}}},
     'software_spec': {'name': 'autoai-rag_rt24.1-py3.11'}},
    'metrics': {'test_data': [{'ci_high': 0.6667,
       'ci_low': 0.5,
       'mean': 0.5833,
       'metric_name': 'answer_correctness'},
      {'ci_high': 0.219,
       'ci_low': 0.2044,
       'mean': 0.2117,
       'metric_name': 'faithfulness'},
      {'mean': 1.0, 'metric_name': 'context_correctness'}]}},
   {'context': {'iteration': 4,
     'max_combinations': 4,
     'rag_pattern': {'composition_steps': ['chunking',
       'embeddings',
       'vector_store',
       'retrieval',
       'generation'],
      'duration_seconds': 24,
      'location': {'evaluation_results': 'default_autoai_rag_out/efb6f9ce-8057-4fc1-9d84-5d2d78ffcf69/Pattern4/evaluation_results.json',
       'indexing_notebook': 'default_autoai_rag_out/efb6f9ce-8057-4fc1-9d84-5d2d78ffcf69/Pattern4/indexing_inference_notebook.ipynb',
       'inference_notebook': 'default_autoai_rag_out/efb6f9ce-8057-4fc1-9d84-5d2d78ffcf69/Pattern4/indexing_inference_notebook.ipynb'},
      'name': 'Pattern4',
      'settings': {'chunking': {'chunk_overlap': 64,
        'chunk_size': 512,
        'method': 'recursive'},
       'embeddings': {'model_id': 'ibm/slate-125m-english-rtrvr',
        'truncate_input_tokens': 512,
        'truncate_strategy': 'left'},
       'generation': {'context_template_text': '[Document]\n{document}\n[End]',
        'model_id': 'ibm/granite-13b-chat-v2',
        'parameters': {'decoding_method': 'greedy',
         'max_new_tokens': 1000,
         'min_new_tokens': 1},
        'prompt_template_text': '<|system|>\nYou are Granite Chat, an AI language model developed by IBM. You are a cautious assistant. You carefully follow instructions. You are helpful and harmless and you follow ethical guidelines and promote positive behavior.\n<|user|>\nYou are a AI language model designed to function as a specialized Retrieval Augmented Generation (RAG) assistant. When generating responses, prioritize correctness, i.e., ensure that your response is grounded in context and user query. Always make sure that your response is relevant to the question.\nAnswer Length: detailed\n{reference_documents}\n{question} \n<|assistant|>'},
       'retrieval': {'method': 'simple', 'number_of_chunks': 3},
       'vector_store': {'datasource_type': 'chroma',
        'distance_metric': 'cosine',
        'index_name': 'autoai_rag_efb6f9ce_20250110101318',
        'operation': 'upsert',
        'schema': {'fields': [{'description': 'text field',
           'name': 'text',
           'role': 'text',
           'type': 'string'},
          {'description': 'document name field',
           'name': 'document_id',
           'role': 'document_name',
           'type': 'string'},
          {'description': 'chunk starting token position in the source document',
           'name': 'start_index',
           'role': 'start_index',
           'type': 'number'},
          {'description': 'chunk number per document',
           'name': 'sequence_number',
           'role': 'sequence_number',
           'type': 'number'},
          {'description': 'vector embeddings',
           'name': 'vector',
           'role': 'vector_embeddings',
           'type': 'array'}],
         'id': 'autoai_rag_1.0',
         'name': 'Document schema using open-source loaders',
         'type': 'struct'}}}},
     'software_spec': {'name': 'autoai-rag_rt24.1-py3.11'}},
    'metrics': {'test_data': [{'ci_high': 0.75,
       'ci_low': 0.6667,
       'mean': 0.7083,
       'metric_name': 'answer_correctness'},
      {'ci_high': 0.2589,
       'ci_low': 0.2044,
       'mean': 0.2317,
       'metric_name': 'faithfulness'},
      {'mean': 1.0, 'metric_name': 'context_correctness'}]}}],
  'results_reference': {'location': {'path': 'default_autoai_rag_out',
    'training': 'default_autoai_rag_out/efb6f9ce-8057-4fc1-9d84-5d2d78ffcf69',
    'training_status': 'default_autoai_rag_out/efb6f9ce-8057-4fc1-9d84-5d2d78ffcf69/training-status.json',
    'training_log': 'default_autoai_rag_out/efb6f9ce-8057-4fc1-9d84-5d2d78ffcf69/output.log',
    'assets_path': 'default_autoai_rag_out/efb6f9ce-8057-4fc1-9d84-5d2d78ffcf69/assets'},
   'type': 'container'},
  'running_at': '2025-01-10T10:12:50.000Z',
  'state': 'completed',
  'step': 'generation',
  'test_data_references': [{'location': {'href': '/v2/assets/84b59630-65a4-466d-b174-400928fb9634?project_id=b9156b62-8f2a-4a40-8570-990fdd5d67cb',
     'id': '84b59630-65a4-466d-b174-400928fb9634'},
    'type': 'data_asset'}],
  'timestamp': '2025-01-10T10:19:50.024Z'},
 'metadata': {'created_at': '2025-01-10T10:10:59.861Z',
  'description': 'AutoAI RAG Optimizer on ibm_watsonx_ai ModelInference documentation',
  'id': 'efb6f9ce-8057-4fc1-9d84-5d2d78ffcf69',
  'modified_at': '2025-01-10T10:15:30.971Z',
  'name': 'AutoAI RAG run - ModelInference documentation',
  'project_id': 'b9156b62-8f2a-4a40-8570-990fdd5d67cb'}}
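The run details are a nested dictionary: each entry in `entity['results']` holds a pattern's name under `context.rag_pattern.name` and its test-set metrics under `metrics.test_data`. A small sketch, based only on the structure printed above, that flattens those into `{pattern: {metric: mean}}` and picks the best pattern for a chosen metric:

```python
def pattern_metrics(run_details: dict) -> dict:
    """Map each pattern name to {metric_name: mean} from get_run_details() output."""
    table = {}
    for result in run_details["entity"]["results"]:
        name = result["context"]["rag_pattern"]["name"]
        table[name] = {
            m["metric_name"]: m["mean"] for m in result["metrics"]["test_data"]
        }
    return table


def best_pattern_for(table: dict, metric: str) -> str:
    """Name of the pattern with the highest mean for `metric` (first one wins on ties)."""
    return max(table, key=lambda name: table[name][metric])
```

With the run above, `best_pattern_for(pattern_metrics(rag_optimizer.get_run_details()), "answer_correctness")` picks Pattern4, while ranking by "faithfulness" picks Pattern2, so the "best" pattern depends on which metric you rank by.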
Get selected pattern
Get the RAGPattern object from the RAG Optimizer experiment. If no pattern_name is passed, get_pattern() returns the best pattern; here the name is taken from the first row of the summary.

# summary is ordered by the scoring metric passed above, so the first row is the best pattern for that metric
best_pattern_name = summary.index.values[0]
print('Best pattern is:', best_pattern_name)

best_pattern = rag_optimizer.get_pattern(pattern_name=best_pattern_name)
best_pattern
The pattern details can be retrieved by calling the get_pattern_details() method:

rag_optimizer.get_pattern_details(pattern_name=best_pattern_name)
Create the index/collection
Build the solution on the best pattern by indexing additional documents.

You can check which index_name you are working on:

best_pattern.vector_store._index_name
from langchain_community.document_loaders import WebBaseLoader

urls = [
    "https://ibm.github.io/watsonx-ai-python-sdk/fm_embeddings.html",
    "https://ibm.github.io/watsonx-ai-python-sdk/fm_custom_models.html",
    "https://ibm.github.io/watsonx-ai-python-sdk/fm_text_extraction.html"
]
docs_list = WebBaseLoader(urls).load()
doc_splits = best_pattern.chunker.split_documents(docs_list)
USER_AGENT environment variable not set, consider setting it to identify your requests.
best_pattern.indexing_function(doc_splits)
['26e3b6934e2d26b5016e48a72f8066e6e6c46f842921dc3850a9d7e90db422b7',
 '2a41a4ffaa8030ef315178951f17656100e491e07cbc27e3fd5c246e47297470',
 'df6e5765dcfdfa245e80b5f526c4b6ae9bf661b94f46263f6d10fc235670a649',
 '3cafdf9aa72070f81f85138d6548290f69ce1daf06c2ff77833acb109c54daf1',
 '800812f60df6a099a6540f30db77eada51455f15d781f1ecc59a830206e9ee4a',
 '5c177d22821e150afc95ea18a34a94bd4e4cc18fc906fa9f48067a73e4fd9a14',
 'd83f23a1d666979ab04abca6229d981af044ce09fbef23a0d180eb5756c50bc8',
 '987e2db87207c400067195d32d74c46002c6dbea3d63cbcf2bf7739375075424',
 '75ad5d3f99270fe00cfea3f68f9825fabd66206746ab7b62fe7d7533cb249c78',
 '03222e428bf11a47ab5b4622437bb34aca042c6e889fbca7817b1d80500e954a',
 '9d83427cd6341b0d3be46ae5ecef1dafb4ddcafb63540ec12a005c9d1deb92bf',
 'da6f13c46cf0bd9e4fc79ef9f80cb89a42fcd260bc9c9397bfa349333fc312eb',
 '710988569ce081d6edc51cc23903d248a2017451afc6b75c36e69a34ee02c468',
 '531720786df6b24f44b66afb5e99e5d16f51d78f3a399df9f24e929e0c6e37ae',
 '9673f7cc44e2c6b59af016a5f97efe77682acab9a46bdf3f57863c0353db5f3d',
 '73a56b3bf7e6ccd5105013c1ebc8d73a45ed64e48daaea25ad95a9ea0e530f66',
 'fd6e62fbdf29810ab9312b9ca0bba262b62a0991e6f36474a22818575d688ef0',
 'c1f9e2d2b63d7a68bd8405caacd3ee7d4271a935503e9fc4ad132d863f0f9152',
 'c1ddf6459ad12eb0aad568ec5da83f03097f6442eb144133f0f2b788afd2c088',
 '6956b13358764b6c097b6ab421e8e74a30088bad09ef937f2ffb50c9ddf5898e',
 '860961359a43043a8fd90d8b7376473edd0fd39ce1b6b724c1ff8323a01758dd',
 '68799f2438b62ad530437694832105a3fc974eb7f8fb9e55ef83395793911577',
 '2352e4a519641e8a66a5b87b0015c3a02ecbdc5aa1de62f7a904038d2acda69d',
 'b686b759549d9781b9f66325db9f6034e6afaeac60de759afeb4b874b34b3cd6',
 'e4a6a6b3c45d63bbc81b143815e6ac7b0a3998e38ce6cc590fa427fd0a1e7bed',
 'cf632d17fdaabb5000d5bc1d07af8a8356277ec3cd5ec5a367d5915bf44ad634',
 'f61cc206172c9f07ad86f8c5320e44e662fd88b8cc49252b2722fdad6bf884e5',
 'ee8cc57587a324ac3c43b44e0669b169d4c3139a31adce852d688ca6b19f0fc6',
 '531b89f8fb66af799ec63adaf4e8163aa86a415f59f71ba403c71a7a1586b8e0',
 'c16e03a96347002fc7365194f0f63ed22d76f8b1b4699fcd391be5dedc1c005f',
 '2029010a59ab1a1b1b8f360ed3d6532e77176ff31c6bf9d9dffa9b37b23f7dee',
 'e4f459e29d7080ed5a85c0942dfaf355d231cee59d53648170867507e7df6d5d',
 '18446c0fa3966431f1c74dd3db41217e76b9c969e7e1de96b4669d26bdbd4f68',
 '1973580c0f13fb48f067d79f597590b4db4d75c164bd09ceab943de09c161df1',
 '54619f9b8ce803679610afc8a4c4e316c16fff73d67031bf8c1f83b8fd9a7b31',
 'bd84c8df19f0ac48edda33555ce69c21507d0e666bb1433805647f99df0959bb',
 '3095fd98675edfec0e467b7b4e1ce38dbd5eba89e7ee9cfa03ba944852313724',
 'a8bd89d8e09e3a95060956b88a3d9b85dff06a6f3bdff60f6efe3bd5b2b7d8f2',
 'c277e67493de9ee9e92b82030b57c9e021b384c126a46ecea6445e6c63b912c6',
 '87f5a49d9549eb4260a2a66bce63f367fa39e9e915cb6790c99e290257988e6a',
 '2fddccc515c8d4d23d2d1e75ac4dc391f413fd2c6baf378d2bb76110bbb60a19',
 '001b26fe1f1a0a6b642b34eba602bf838947cfff794415787960e855fa68c91c',
 'a2cb2616f62ab0831970ae4889e45685168d869dd11ee8f792ae389c3efb913c',
 'b2b99fd59119815de9bc3dfd29d46aeaf4929e7eb4875c4e53b527d2fd982eab',
 '481b335b0d8f1fc74be755601b13981af0fdbe4befa99c7c9ec59eb0e2ed001c',
 '6e8c8f4479328c60de924d7636089ad176041648938dc6156000577e52caffcb',
 'c7484d65bb4e88c10a710b68a0b444b002e73ebef345ff15627f9f8b2ce4c1ad',
 '17858cb4104c0c36a9f1287888351599dc3103d2b57ca182f85089ab087fdf3c',
 '2e1e623e9a5e5d25c96e3e63c40dff4839c63787bdfa423ea6b115ee1cc6e67c',
 '303835f544bbcec40d2063472f10bb7189ed514bf93bcdfae201e6e5d163787f',
 'd25fa313eeceafd6b3f16ad27794e7d816b3d26cab0fec035eee862e6b7e3e6a',
 '68e3e02337f85ecd81da415de431d4be81b7230cfbf35976398f7a569f782273',
 '547676950eb909affba820fda6e0e3a4e741197e597214ef86c6c8da3d135e61',
 '04b0c30d3de9df3176bca372e27fb30d71a7d1aad77d60ca8e0f3571298f174c',
 '236c80504dd62f867ba64780339310ff6da2c1207d3bdad2601175d92ad490d6',
 '61c6b52998e65348bb296a7a8befcdc96f0af6506690a275e1892a08f5bf6496',
 '3dbba258aab554d28bb2d6e55d05c730dfa2d21a4e2d5abe50d81dba8cc5055f',
 'a7d024c9244d327a30df25e0b7dc63659e72bb959407f51947959b84de30b3f7',
 '2568930293951272143b719c7b613ffef404cc11fcb0c48913e070b80ce362a9',
 '6152bbd7066755eb867681c0e39170d6c9c7494fe915285b575d94e407057728',
 '864c04bab6375d7cae2a99d0deaf5af954259842144674c67ffc088baa04130e',
 '1739949dd889dc912a77c1045b0e2a6bfe4cfd7f051fb6749f5a6d46227c36e8',
 '96c1df2eb0305d0faf8c18399e818cfc6be287c66ca416e9161f25d5b06243f2',
 '0c6b4cf5c33d84c97271da2fbddfd910914b80c63ca22cd4e9f9f0ee54eda8ef',
 'bdbd0974ded5efd7349bf9f493a0628df9b65245446fcb177e1c2186b1fe9c8c',
 '0ec434ccae697cd03bf14f41247e5bc22da92c1c99dd96d538715161f6d52c03',
 '321f0fc064aa3a50ef0a6df155b33ce054fe5046003c1c4a19870fa1bda21831',
 'e7e0b918538d1831863485c25ea55c7a05505e77332c3f96600fd2bd706f93a1',
 'ee2d58e72cdf6f056861a2523fd183ae2f26e7678430520f05f823a0127478df',
 'fdebb9d726075059af7622033d727f5c36f411c71f291e1d3eea449cf5e568ed']
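`best_pattern.chunker` applies the pattern's chunking settings (method recursive, chunk_size 512, chunk_overlap 64). To show what the size/overlap pair means, here is a deliberately simplified sliding-window splitter. It is not the recursive method, which prefers to break on separators such as paragraphs and sentences, and measuring chunk_size in characters is an assumption made for this illustration:

```python
def sliding_window_chunks(text: str, chunk_size: int = 512,
                          chunk_overlap: int = 64) -> list:
    """Split `text` into windows of `chunk_size` characters, each sharing
    `chunk_overlap` characters with the previous window."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

For a 1,000-character text this yields windows of 512, 512 and 104 characters, with each window starting 448 characters after the previous one.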
Query the RAGPattern locally, to test it.
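The scoring payload is a dictionary with a single input-data entry holding the questions and an access token, and the response lists `fields` alongside rows of `values`. Two hypothetical helpers that build the same payload shape and turn a response into one dict per question; the default key `"input_data"` is assumed to be the value of `client.deployments.ScoringMetaNames.INPUT_DATA`, so pass that meta name explicitly if you prefer not to rely on it:

```python
def build_scoring_payload(questions: list, access_token: str,
                          input_key: str = "input_data") -> dict:
    """Build a payload shaped like the one passed to inference_function() below."""
    if not questions:
        raise ValueError("at least one question is required")
    return {input_key: [{"values": list(questions), "access_token": access_token}]}


def parse_predictions(response: dict) -> list:
    """Turn {'predictions': [{'fields': [...], 'values': [[...], ...]}]} into dicts."""
    prediction = response["predictions"][0]
    return [dict(zip(prediction["fields"], row)) for row in prediction["values"]]
```

Usage: `payload = build_scoring_payload(questions, client.token, client.deployments.ScoringMetaNames.INPUT_DATA)`, then `parse_predictions(best_pattern.inference_function()(payload))[0]["answer"]`.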

questions = ["How to add Task Credentials?"]

payload = {
    client.deployments.ScoringMetaNames.INPUT_DATA: [
        {
            "values": questions,
            "access_token": client.token
        }
    ]
}

best_pattern.inference_function()(payload)
{'predictions': [{'fields': ['answer', 'reference_documents'],
   'values': [["\nTo add Task Credentials, you can use the `client.task_credentials.store()` method. This method requires no parameters and will create new task credentials if they do not already exist. If the list of task credentials is empty, this method will automatically add them.\n\nHere's an example of how to add Task Credentials:\n\n```python\nfrom ibm_watsonx_ai import APIClient\n\n# Initialize the APIClient object if needed\nfrom ibm_watsonx_ai import APIClient\nclient = APIClient(credentials)\n\n# Add Task Credentials\nclient.task_credentials.store()\n```\n\nNote: If you are using a custom foundation model, you will need to add Task Credentials before deploying the model. Failure to do so will result in token expiration issues.",
     [{'page_content': 'With task credentials, you can deploy a custom foundation model and avoid token expiration issues.\nFor more details, see Adding task credentials.\nTo list available task credentials, use the list method:\nclient.task_credentials.list()\n\n\nIf the list is empty, you can create new task credentials with the store method:\nclient.task_credentials.store()\n\n\nTo get the status of available task credentials, use the get_details method:\nclient.task_credentials.get_details()',
       'metadata': {'document_id': '8500262700953266120',
        'language': 'en',
        'sequence_number': 7,
        'source': 'https://ibm.github.io/watsonx-ai-python-sdk/fm_custom_models.html',
        'start_index': 0,
        'title': 'Custom models - IBM watsonx.ai'}},
      {'page_content': 'Initialize an APIClient object¶\nInitialize an APIClient object if needed. For more details about supported APIClient initialization, see Setup.\nfrom ibm_watsonx_ai import APIClient\n\nclient = APIClient(credentials)\nclient.set.default_project(project_id=project_id)\n# or client.set.default_space(space_id=space_id)\n\n\n\n\nAdd Task Credentials¶\n\nWarning\nIf not already added, Task Credentials are required on IBM watsonx.ai for IBM Cloud to make a deployment.',
       'metadata': {'document_id': '8500262700953266120',
        'language': 'en',
        'sequence_number': 6,
        'source': 'https://ibm.github.io/watsonx-ai-python-sdk/fm_custom_models.html',
        'start_index': 0,
        'title': 'Custom models - IBM watsonx.ai'}},
      {'page_content': 'Note\nWhen the credentials parameter is passed, one of these parameters is required: [project_id, space_id].\n\n\nHint\nYou can copy the project_id from the Project’s Manage tab (Project -> Manage -> General -> Details).\n\nExample:\n from ibm_watsonx_ai import Credentials\n from ibm_watsonx_ai.foundation_models import Embeddings\n from ibm_watsonx_ai.metanames import EmbedTextParamsMetaNames as EmbedParams\n from ibm_watsonx_ai.foundation_models.utils.enums import EmbeddingTypes',
       'metadata': {'document_id': '-541837184247348180',
        'language': 'en',
        'sequence_number': 10,
        'source': 'https://ibm.github.io/watsonx-ai-python-sdk/fm_embeddings.html',
        'start_index': 0,
        'title': 'Embeddings - IBM watsonx.ai'}},
      {'page_content': 'Parameters:\n\ncredentials (Credentials, optional) – credentials to the Watson Machine Learning instance\nproject_id (str, optional) – ID of the Watson Studio project, defaults to None\nspace_id (str, optional) – ID of the Watson Studio space, defaults to None\napi_client (APIClient, optional) – initialized APIClient object with a set project ID or space ID. If passed, credentials and project_id/space_id are not required, defaults to None\n\n\nRaises:',
       'metadata': {'document_id': '87506292822977493',
        'language': 'en',
        'sequence_number': 6,
        'source': 'https://ibm.github.io/watsonx-ai-python-sdk/fm_text_extraction.html',
        'start_index': 0,
        'title': 'Text Extractions - IBM watsonx.ai'}},
      {'page_content': 'Initialize the APIClient object¶\nInitialize the APIClient object if needed. For information about supported APIClient initialization, see Setup.\nfrom ibm_watsonx_ai import APIClient\n\nclient = APIClient(credentials)\nclient.set.default_project(project_id=project_id)\n# or client.set.default_space(space_id=space_id)\n\n\n\n\nList model specifications¶\n\nWarning\nOnly applicable for IBM watsonx.ai for IBM Cloud Pak® for Data 4.8.4 and later.',
       'metadata': {'document_id': '8500262700953266120',
        'language': 'en',
        'sequence_number': 17,
        'source': 'https://ibm.github.io/watsonx-ai-python-sdk/fm_custom_models.html',
        'start_index': 0,
        'title': 'Custom models - IBM watsonx.ai'}}]]]}]}
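The response above nests the generated answer and its supporting chunks inside the `values` list. Below is a minimal sketch for pulling them out. It assumes the full response has the shape `{'predictions': [{'fields': [...], 'values': [[answer, reference_documents]]}]}`, because only the tail of that structure is visible here. The helper name `extract_answer_and_sources` and the `fields` names in the sample are hypothetical and not part of the SDK.

```python
from typing import Any


def extract_answer_and_sources(response: dict[str, Any]) -> tuple[str, list[str]]:
    """Return the generated answer and the unique source URLs of its reference documents.

    Assumed (hypothetical) shape:
    {'predictions': [{'fields': [...], 'values': [[answer, [doc, ...]]]}]}
    where each doc is a dict with 'page_content' and 'metadata'.
    """
    # First prediction, first row of values: [answer, reference_documents]
    answer, documents = response["predictions"][0]["values"][0]
    sources: list[str] = []
    for doc in documents:
        source = doc.get("metadata", {}).get("source")
        # Keep each source URL once, preserving first-seen order
        if source and source not in sources:
            sources.append(source)
    return answer.strip(), sources


# Trimmed-down sample shaped like the output above (field names are assumptions)
sample = {
    "predictions": [{
        "fields": ["answer", "reference_documents"],
        "values": [[
            "\nTo add Task Credentials, use client.task_credentials.store().",
            [
                {"page_content": "...", "metadata": {"source": "https://ibm.github.io/watsonx-ai-python-sdk/fm_custom_models.html"}},
                {"page_content": "...", "metadata": {"source": "https://ibm.github.io/watsonx-ai-python-sdk/fm_embeddings.html"}},
                {"page_content": "...", "metadata": {"source": "https://ibm.github.io/watsonx-ai-python-sdk/fm_custom_models.html"}},
            ],
        ]],
    }]
}

answer, sources = extract_answer_and_sources(sample)
print(answer)
print(sources)
```

Deduplicating the sources is useful here because several retrieved chunks often come from the same documentation page, as in the output above.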

Historical runs
In this section, you learn how to work with historical RAG Optimizer jobs (runs).

To list historical runs, use the list() method with the 'rag_optimizer' filter.

experiment.runs(filter='rag_optimizer').list()
   timestamp                 run_id                                state      auto_pipeline_optimizer name
0  2025-01-10T10:15:30.971Z  efb6f9ce-8057-4fc1-9d84-5d2d78ffcf69  completed  AutoAI RAG run - ModelInference documentation
1  2025-01-09T15:13:26.515Z  555cb99c-925b-4f71-9e09-83533ed22fd3  completed  AutoAI RAG run - ModelInference documentation
2  2025-01-09T12:58:25.539Z  e0b4281c-8908-433b-a762-b68c9a7e3b09  completed  AutoAI RAG run - ModelInference documentation
3  2025-01-09T09:49:10.264Z  71d650bb-c357-468a-87cb-e461242c68b3  completed  AutoAI RAG run - ModelInference documentation
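Because all four runs share the same name, it can help to select a run programmatically instead of by eye. Below is a minimal sketch that picks the most recent completed run. It assumes the table has been converted to a list of row dictionaries (for example with pandas' `DataFrame.to_dict('records')`, if `list()` returns a DataFrame in your SDK version). The helper `latest_completed_run_id` is hypothetical.

```python
from datetime import datetime
from typing import Optional


def latest_completed_run_id(rows: list[dict[str, str]]) -> Optional[str]:
    """Return the run_id of the most recent run whose state is 'completed', or None."""
    completed = [row for row in rows if row.get("state") == "completed"]
    if not completed:
        return None
    # The trailing 'Z' is replaced with '+00:00' because datetime.fromisoformat
    # only accepts the 'Z' suffix from Python 3.11 onwards.
    latest = max(
        completed,
        key=lambda row: datetime.fromisoformat(row["timestamp"].replace("Z", "+00:00")),
    )
    return latest["run_id"]


# Rows copied from the runs table above (name column omitted)
runs = [
    {"timestamp": "2025-01-10T10:15:30.971Z", "run_id": "efb6f9ce-8057-4fc1-9d84-5d2d78ffcf69", "state": "completed"},
    {"timestamp": "2025-01-09T15:13:26.515Z", "run_id": "555cb99c-925b-4f71-9e09-83533ed22fd3", "state": "completed"},
    {"timestamp": "2025-01-09T12:58:25.539Z", "run_id": "e0b4281c-8908-433b-a762-b68c9a7e3b09", "state": "completed"},
    {"timestamp": "2025-01-09T09:49:10.264Z", "run_id": "71d650bb-c357-468a-87cb-e461242c68b3", "state": "completed"},
]

print(latest_completed_run_id(runs))  # efb6f9ce-8057-4fc1-9d84-5d2d78ffcf69
```

Parsing the timestamps rather than comparing the strings keeps the selection correct even if some rows carry a different UTC offset.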
run_id = run_details['metadata']['id']
run_id
'efb6f9ce-8057-4fc1-9d84-5d2d78ffcf69'
Get the executed optimizer's configuration parameters
experiment.runs.get_rag_params(run_id=run_id)
{'name': 'AutoAI RAG run - ModelInference documentation',
 'description': 'AutoAI RAG Optimizer on ibm_watsonx_ai ModelInference documentation',
 'chunking': [{'chunk_overlap': 64, 'chunk_size': 512, 'method': 'recursive'}],
 'embedding_models': ['ibm/slate-125m-english-rtrvr'],
 'retrieval_methods': ['simple'],
 'foundation_models': ['ibm/granite-13b-chat-v2'],
 'max_number_of_rag_patterns': 4,
 'optimization_metrics': ['answer_correctness']}
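When several runs carry the same name, diffing their configurations is a quick way to tell them apart. Below is a minimal sketch that compares two dictionaries shaped like the `get_rag_params()` output above. In practice you would fetch each one with `experiment.runs.get_rag_params(run_id=...)`. `diff_rag_params` is a hypothetical helper. `params_b` is an invented variation for illustration only and does not reflect the parameters of any of the listed runs.

```python
from typing import Any


def diff_rag_params(old: dict[str, Any], new: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
    """Return {key: (old_value, new_value)} for every key whose value differs.

    A key missing from one side is reported with None on that side.
    """
    changes: dict[str, tuple[Any, Any]] = {}
    for key in sorted(old.keys() | new.keys()):
        if old.get(key) != new.get(key):
            changes[key] = (old.get(key), new.get(key))
    return changes


# Parameters printed above for run efb6f9ce-... (description omitted)
params_a = {
    "name": "AutoAI RAG run - ModelInference documentation",
    "chunking": [{"chunk_overlap": 64, "chunk_size": 512, "method": "recursive"}],
    "embedding_models": ["ibm/slate-125m-english-rtrvr"],
    "retrieval_methods": ["simple"],
    "foundation_models": ["ibm/granite-13b-chat-v2"],
    "max_number_of_rag_patterns": 4,
    "optimization_metrics": ["answer_correctness"],
}
# Hypothetical variation: the same configuration with a larger chunk size
params_b = {**params_a, "chunking": [{"chunk_overlap": 64, "chunk_size": 1024, "method": "recursive"}]}

for key, (before, after) in diff_rag_params(params_a, params_b).items():
    print(f"{key}: {before} -> {after}")
```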
Get the historical rag_optimizer instance and its training details
historical_opt = experiment.runs.get_rag_optimizer(run_id)
List the trained patterns for the selected optimizer
historical_opt.summary()
Pattern_Name  mean_answer_correctness  mean_faithfulness  mean_context_correctness  chunking.method  chunking.chunk_size  chunking.chunk_overlap  embeddings.model_id           vector_store.distance_metric  retrieval.method  retrieval.number_of_chunks  generation.model_id
Pattern4      0.7083                   0.2317             1.0                       recursive        512                  64                      ibm/slate-125m-english-rtrvr  cosine                        simple            3                           ibm/granite-13b-chat-v2
Pattern1      0.5833                   0.2045             1.0                       recursive        512                  64                      ibm/slate-125m-english-rtrvr  cosine                        simple            5                           ibm/granite-13b-chat-v2
Pattern2      0.5833                   0.2372             1.0                       recursive        512                  64                      ibm/slate-125m-english-rtrvr  euclidean                     simple            5                           ibm/granite-13b-chat-v2
Pattern3      0.5833                   0.2117             1.0                       recursive        512                  64                      ibm/slate-125m-english-rtrvr  euclidean                     simple            3                           ibm/granite-13b-chat-v2
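The summary above is ordered by mean_answer_correctness, and Pattern1 through Pattern3 tie at 0.5833. Below is a minimal sketch of one possible tie-break. It assumes you want a higher mean_faithfulness to win among equal answer-correctness scores. This ranking rule is an illustration only and does not describe how AutoAI orders patterns. The scores are copied from the table above, and `rank_patterns` is a hypothetical helper.

```python
# (pattern name, mean_answer_correctness, mean_faithfulness) copied from the summary above
scores = [
    ("Pattern4", 0.7083, 0.2317),
    ("Pattern1", 0.5833, 0.2045),
    ("Pattern2", 0.5833, 0.2372),
    ("Pattern3", 0.5833, 0.2117),
]


def rank_patterns(rows: list[tuple[str, float, float]]) -> list[str]:
    """Sort by answer correctness, then by faithfulness, both descending."""
    ranked = sorted(rows, key=lambda row: (row[1], row[2]), reverse=True)
    return [name for name, _, _ in ranked]


print(rank_patterns(scores))  # ['Pattern4', 'Pattern2', 'Pattern3', 'Pattern1']
```

Sorting on a tuple key applies the secondary metric only when the primary scores are equal, so Pattern4 stays first regardless of its faithfulness.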

Clean up
To delete the current experiment, use the cancel_run method.

Warning: once you delete an experiment, you can no longer refer to it.

rag_optimizer.cancel_run()
'SUCCESS'
To clean up all created assets of the following types, follow this sample notebook:

experiments
trainings
pipelines
model definitions
models
functions
deployments
Conclusion

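This new feature simplifies implementing RAG on watsonx.ai. You no longer choose the chunking settings, embedding model, retrieval method and foundation model by hand. AutoAI searches for the best-performing RAG patterns and ranks them against the optimization metrics you select.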
Useful link(s)

Watson Machine Learning examples: https://github.com/IBM/watson-machine-learning-samples/tree/master