Chatbot with Semantic Kernel - Part 6: AI Connectors 🔌

One of Semantic Kernel's key features is the ability to easily swap between different AI providers. This allows us to compare different models and their performance to find the model that best suits our use case.

Semantic Kernel supports major AI providers including OpenAI, Google, Azure, Mistral, Meta, Hugging Face, and others. Although all these platforms provide Large Language Models, each model differs in its characteristics and capabilities:

  • Modality: Input and output formats (text, image, video, audio, etc.) differ between models and platforms.
  • Speed: Some models generate responses faster than others.
  • Cost: Bigger and more powerful models (especially reasoning models) have a higher cost per token.
  • Structured output: The ability of a model to generate a predictable response that follows a defined JSON schema (see the sketch after this list). This advanced feature is not available on all models.
  • Function calling: This is the ability of the model to invoke plugins (or tools) defined as native code. Although most major providers support function calling, it remains limited in the small language models ecosystem.
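
To make the structured output point concrete, here is a minimal sketch of how it is typically requested through the prompt execution settings in the Python semantic-kernel package. Passing a Pydantic model as response_format is an assumption that holds for recent versions of the library, so check your installed release:

from pydantic import BaseModel
from semantic_kernel.connectors.ai.open_ai import OpenAIChatPromptExecutionSettings

# Hypothetical schema the model is asked to follow
class MovieReview(BaseModel):
    title: str
    rating: int
    summary: str

# Constrain the response to the MovieReview JSON schema
settings = OpenAIChatPromptExecutionSettings(response_format=MovieReview)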

One of the supported platforms that I find particularly interesting, because of the possibilities it opens up, is Ollama.

Ollama

Ollama is an open-source tool for running Large Language Models locally. Although LLMs are large and complex models, some of them (especially Small Language Models) can run comfortably on a laptop. Running models locally is very useful for AI developers: it keeps everything offline, makes it effortless to switch between open models, and avoids the complexity and costs associated with cloud-based models.

Ollama can be installed on Windows, Linux, and macOS from its webpage. Once installed, we can explore its library of models to find one that fits our purpose. Keep in mind that each model requires different hardware capacity to run locally; although the reality is more complex, the size of the model (in billions of parameters) gives a rough estimate of its hardware needs. Additionally, each model in the library includes tags with useful information: function calling (tool) support, modality, etc. To run a model locally, simply open a terminal and run ollama run <model-name>:<flavour> (e.g., ollama run phi4 or ollama run llama3.2:1b).

Ollama - Run model

Once the model is running, it can be accessed from the terminal, through its REST API, or via libraries and frameworks (see the sketch below). In the next section, we will use models served by Ollama with Semantic Kernel.

Ollama - Interact from terminal
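
As a quick illustration of the API option, the sketch below calls the local HTTP endpoint that Ollama exposes by default on http://localhost:11434; the model name is just an example:

import requests

# Send a single (non-streaming) chat request to the local Ollama server
response = requests.post(
    'http://localhost:11434/api/chat',
    json={
        'model': 'llama3.2:1b',   # any model already pulled with `ollama run` or `ollama pull`
        'messages': [{'role': 'user', 'content': 'Hello, who are you?'}],
        'stream': False,          # return one JSON object instead of a stream of chunks
    },
)
print(response.json()['message']['content'])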

Keep in mind that this post only showcases the most basic usage of Ollama. The tool offers a much more complete set of features, such as model customization, templated prompts, and more.

Using Ollama with Semantic Kernel

Although this section focuses on Ollama, most of the explanation can easily be adapted to any other AI connector supported by Semantic Kernel. At this link you can find all the connectors currently supported for chat completion services.

First, we need to install the specific semantic-kernel package for Ollama:

pip install semantic-kernel[ollama]

Next, we define the settings for the connector. There are different ways of defining these settings; in this example, we use settings defined via environment variables. You can find here all the settings defined in Semantic Kernel for the different connectors.

OLLAMA_CHAT_MODEL_ID        = "..."     # Chat completion model
OLLAMA_TEXT_MODEL_ID        = "..."     # Text completion model
OLLAMA_EMBEDDING_MODEL_ID   = "..."     # Embedding model
OLLAMA_HOST                 = "..."     # URL of the Ollama server; defaults to localhost if not set
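
Alternatively, the same values can be passed programmatically when the connector is instantiated. This is a minimal sketch, assuming the OllamaChatCompletion constructor accepts ai_model_id and host arguments (verify against the version of semantic-kernel you have installed):

from semantic_kernel.connectors.ai.ollama import OllamaChatCompletion

# Explicit configuration instead of environment variables (argument names are assumptions)
chat_service = OllamaChatCompletion(
    service_id='chat_completion',
    ai_model_id='llama3.2:1b',       # a model previously pulled with Ollama
    host='http://localhost:11434',   # local Ollama server
)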

Finally, we inject the Ollama services into the Kernel.

# Import dependencies
from semantic_kernel.connectors.ai.function_choice_behavior import FunctionChoiceBehavior
from semantic_kernel.connectors.ai.ollama import (
    OllamaChatCompletion
)

...

# Inject service into Kernel
self.kernel.add_service(OllamaChatCompletion(service_id='chat_completion'))

# Retrieve service from Kernel
self.chat_service = self.kernel.get_service(type=OllamaChatCompletion)

# Get default settings for the service
self.chat_settings = self.kernel.get_prompt_execution_settings_from_service_id(service_id='chat_completion')

# Define function calling behavior. Use NoneInvoke if the model does not support function calling
if support_tool:
    self.chat_settings.function_choice_behavior = FunctionChoiceBehavior.Auto()
else:
    self.chat_settings.function_choice_behavior = FunctionChoiceBehavior.NoneInvoke()
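
With the service and settings in place, a completion can be requested roughly as follows. This is a minimal sketch that assumes we are still inside the same class and an async context; get_chat_message_content is the asynchronous method exposed by Semantic Kernel's chat completion services:

from semantic_kernel.contents import ChatHistory

# Build a small conversation and ask the configured model for a reply
chat_history = ChatHistory()
chat_history.add_user_message("Give me three ideas for a weekend trip.")

response = await self.chat_service.get_chat_message_content(
    chat_history=chat_history,
    settings=self.chat_settings,
    kernel=self.kernel,   # lets the model invoke plugins registered on the Kernel
)
print(response.content)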

Another common scenario, particularly when we want to compare providers, is to keep a single code base where switching between them is straightforward. In that case, we can define an environment variable GLOBAL_LLM_SERVICE that specifies which provider to use:

# Import dependencies (paths correspond to recent semantic-kernel releases)
import os

from semantic_kernel import Kernel
from semantic_kernel.connectors.ai.function_choice_behavior import FunctionChoiceBehavior
from semantic_kernel.connectors.ai.ollama import OllamaChatCompletion
from semantic_kernel.connectors.ai.open_ai import (
    AzureAudioToText,
    AzureChatCompletion,
    AzureTextToAudio,
    OpenAIAudioToText,
    OpenAIChatCompletion,
    OpenAITextToAudio,
)

llm_service = os.environ['GLOBAL_LLM_SERVICE']

# Define services per AI connector
services = {  
    'AzureOpenAI': [  
        ('chat_completion', AzureChatCompletion),  
        ('audio_to_text_service', AzureAudioToText),  
        ('text_to_audio_service', AzureTextToAudio)  
    ],  
    'OpenAI': [  
        ('chat_completion', OpenAIChatCompletion),  
        ('audio_to_text_service', OpenAIAudioToText),  
        ('text_to_audio_service', OpenAITextToAudio)  
    ],  
    'Ollama': [  
        ('chat_completion', OllamaChatCompletion)  
    ]  
}

self.kernel = Kernel()

# Init services
for service_id, service_class in services.get(llm_service, []):  
    self.kernel.add_service(service_class(service_id=service_id))

# Set settings
self.chat_settings = self.kernel.get_prompt_execution_settings_from_service_id(service_id='chat_completion')
self.chat_settings.function_choice_behavior = (  
    FunctionChoiceBehavior.Auto() if support_tool else FunctionChoiceBehavior.NoneInvoke()
)

# Retrieve services
chat_completion = self.kernel.get_service(service_id='chat_completion')
if llm_service in ['AzureOpenAI', 'OpenAI']:
    audio_to_text_service = self.kernel.get_service(service_id='audio_to_text_service')
    text_to_audio_service = self.kernel.get_service(service_id='text_to_audio_service')
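
For reference, a .env file for this setup could look like the sketch below. GLOBAL_LLM_SERVICE is our own variable; the Azure OpenAI and OpenAI variable names follow Semantic Kernel's settings conventions, so double-check them against the documentation for your version:

GLOBAL_LLM_SERVICE                  = "Ollama"          # or "AzureOpenAI" / "OpenAI"

# Ollama (local)
OLLAMA_CHAT_MODEL_ID                = "llama3.2:1b"
OLLAMA_HOST                         = "http://localhost:11434"

# Azure OpenAI
AZURE_OPENAI_ENDPOINT               = "..."
AZURE_OPENAI_API_KEY                = "..."
AZURE_OPENAI_CHAT_DEPLOYMENT_NAME   = "..."

# OpenAI
OPENAI_API_KEY                      = "..."
OPENAI_CHAT_MODEL_ID                = "..."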

Summary

In this chapter, we have added several AI connectors to the chatbot so it can work with different models from different providers. Additionally, we have explored in more detail how to run models locally with Ollama.

Remember that all the code is already available on my GitHub repository 🐍 PyChatbot for Semantic Kernel.
