Posted on Jul 21, 2024 • Edited on Nov 18, 2024

Combining .NET Aspire, Python, Docker (Remote), and Machine Learning Models for Summarising Photos

#dotnet #docker #aspire #ai

When it comes to cloud native technologies, there have been several ways to create local development environments that speed up onboarding new engineers or switching between projects and environments. Microsoft provides .NET Aspire, which lets us define our local development infrastructure in C# and spin it up, as opposed to using various Domain Specific Languages (DSLs) or tools such as Docker Compose.

Over the past few years, open source machine learning models have also gained popularity and have been lowering the barriers to entry for building applications that expose such functionality. Ollama is an open source tool with a simple message: "Get up and running with large language models."

The purpose of this post is to find out how flexible .NET Aspire can be under specific requirements, such as using a remote Docker host with a dedicated GPU. Having a fun use case, where we summarise photos using multimodal models such as LLaVA and eventually others, is an added bonus.

Under normal circumstances, defining our dependencies and then running our solution on whichever machine we are using would be straightforward. However, utilising a dedicated GPU (especially its VRAM) helps us iterate faster locally when using machine learning models. Often the laptops we are using do not have the necessary hardware or memory resources, and the fans get loud quickly if we try to run inference on ML models locally.

At times like this, what if we could use Aspire to manage orchestration as usual but run the containers remotely, over the local network, on a host with the necessary resources? From a high level, it would look like this:

This post covers this approach and is structured as follows:

Using a remote Docker host - How Docker allows running workloads on a remote host.
.NET Aspire - How to support remote Docker hosts when using .NET Aspire.
- Ensuring the containers listen on the LAN IP address.
- Injecting connection strings with the correct IP address into workloads.
How to add support for custom containers such as Ollama, or our own Python project running beside our .NET projects.
A demo application where we use a .NET Web API, Background Worker, PostgreSQL database, RabbitMQ and Ollama together to summarise photos from our library.

In this post, the term Docker implies any runtime that is compliant with the standards outlined by the Open Container Initiative. As such, Podman and Rancher Desktop are also expected to work besides Docker Engine. This allows for switching the backend while using the same CLI tools in pipelines and other environments instead of changing the configuration. The Podman documentation has a section dedicated to this concept.

DOCKER_HOST Environment Variable

Given that a container runtime consists of a CLI and an API component, it is possible to manage a Docker daemon on a remote host using the Docker CLI locally. This is similar to how kubectl and the Kubernetes API work together.

The image below illustrates this concept.

One way to achieve our objective of running containers on a remote host when using .NET Aspire is to use the SSH protocol as follows:

There is a host machine on our network and it is possible for us to ssh into it from the client machine.
- This machine has an Nvidia GPU and also has NVIDIA Docker configured.
Set the DOCKER_HOST environment variable on the client machine: export DOCKER_HOST=ssh://user@ip_address
Run docker commands as usual.
- Provided the ssh authentication is successful, the commands will be executed on the remote host.

It is even possible to do this without the environment variable:
docker -H ssh://user@ip_address ps
Once we specify the -H parameter with a valid value (SSH is what we are using in this use case, but there are other options), the command is executed against the remote Docker daemon.

This is all it takes to be able to run docker commands on a remote host on the local network.
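The two options above (the -H flag and the DOCKER_HOST variable) can be sketched from a script as well. The following is a minimal illustration only: the helper names docker_command and docker_environment are hypothetical, and ssh://user@ip_address is the same placeholder used above.

```python
import os


def docker_command(args, remote_host=None):
    """Build a docker CLI invocation, optionally targeting a remote daemon via -H."""
    command = ["docker"]
    if remote_host:
        # Equivalent to: docker -H ssh://user@ip_address <args>
        command += ["-H", remote_host]
    return command + list(args)


def docker_environment(remote_host=None, base_env=None):
    """Return a copy of the environment with DOCKER_HOST set when a remote host is given."""
    env = dict(os.environ if base_env is None else base_env)
    if remote_host:
        # Equivalent to: export DOCKER_HOST=ssh://user@ip_address
        env["DOCKER_HOST"] = remote_host
    return env


# Usage (requires docker and working ssh authentication, so it is not run here):
#   import subprocess
#   subprocess.run(docker_command(["ps"], "ssh://user@ip_address"), check=True)
#   subprocess.run(["docker", "ps"], env=docker_environment("ssh://user@ip_address"), check=True)
```

Either form leaves the rest of the command untouched, which is why tools that shell out to the Docker CLI can be pointed at a remote host without being changed.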

Transformers, Vision Transformers (ViTs) and Multimodal Models

Transformers have been widely adopted as a type of deep learning architecture that is effective at understanding and processing sequences of data. While the initial use case was natural language processing, recently they have proven their effectiveness in computer vision and multimodal models.

Some of the key concepts are:

Self-Attention Mechanism: This is the core feature of transformers. It allows the model to focus on different parts of the input data and understand the relationships between them. For example, in an image, self-attention helps the model to understand which parts of the image are important for a given task. Traditionally, we would rely on convolution operations, which focus on local spatial relationships. Attention mechanisms, however, can capture patterns that local spatial relationships alone may miss.
Architecture: Transformers consist of an encoder and a decoder. The encoder processes the input data (text / images), and the decoder generates the output (like translations or image descriptions in our case).
Multimodal Models: These models can process and integrate different types of data, such as text, images, and audio. Transformers are a great fit for this as they can handle sequences of any kind of data once we obtain the embeddings.
Vision Transformers: These are transformers specifically adapted for computer vision tasks. They treat images as sequences of smaller patches (like small blocks of the image), which allows them to apply the same powerful techniques used in language processing to understand and generate visual information.
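To make the "images as sequences of patches" idea concrete, here is a small pure-Python sketch. It is only an illustration of the patching step under simplifying assumptions (a single-channel image stored as a list of rows, and a hypothetical helper name image_to_patches); real Vision Transformers additionally project each patch to an embedding and add position information.

```python
def image_to_patches(image, patch_size):
    """Split a 2D image (list of rows) into a row-major sequence of flattened patches."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")

    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Flatten one patch_size x patch_size block into a single vector.
            patch = [
                image[top + dy][left + dx]
                for dy in range(patch_size)
                for dx in range(patch_size)
            ]
            patches.append(patch)
    return patches


# A 4x4 "image" whose pixel values are simply 0..15.
image = [[row * 4 + col for col in range(4)] for row in range(4)]
patches = image_to_patches(image, 2)
print(len(patches), "patches of length", len(patches[0]))
```

Each flattened patch then plays the role that a token plays in a language model, which is what lets the same attention machinery be reused for images.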

In this demo we will be utilising the following models using Ollama as well as Hugging Face.

Florence-2 variants: An advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks with relatively low resource requirements.
LLaVA variants: LLaVA is a multimodal model for general-purpose visual and language understanding, achieving impressive chat capabilities mimicking spirits of the multimodal GPT-4.

Ollama is an open source project that allows easy access to open source LLMs. Ollama provides a local CLI and API that can download the necessary models, which simplifies experimenting locally. Their slogan sums it all up: "Get up and running with large language models"

System Components

So how can we put it all together when using .NET Aspire?

The setup provided here includes sample images and should work with a valid remote Docker host as well as with everything running locally using lightweight models. The process will be covered later.

The demo project provides the following key functionality:

A .NET API that handles requests for importing / updating photos and sends them via a message bus.
A background worker that listens to messages and performs the import / summarise actions.
An Ollama container that serves LLaVA models using the built-in Ollama HTTP API.
A Python container built using Flask that runs inference on Florence-2 models and is triggered by an HTTP API call.
RabbitMQ, PostgreSQL and pgAdmin.

.NET Aspire

Although the primary use case of summarising photos has been a personal interest recently, the motivation behind this post is mainly curiosity about how .NET Aspire compares to using something like Docker Compose.

.NET Aspire provides an orchestration approach for local development where we can describe our component dependencies using C# code and then debug our solution end to end locally. We also get a nice dashboard where even the structured logs from the applications we are building and the components we are using are available.

Configuration and Running Remote containers transparently

As discussed, when we interact with the Docker CLI, depending on the settings, the Docker daemon we are interacting with could be on another machine.

    "http-remote-host-with-gpu": {
      "commandName": "Project",
      "dotnetRunMessages": true,
      "launchBrowser": true,
      "applicationUrl": "http://localhost:15273",
      "environmentVariables": {
        "DOCKER_HOST": "ssh://user@ip_address",
        "ENABLE_NVIDIA_DOCKER": "true",
        "VISION_MODEL": "llava:13b",
        "FLORENCE_MODEL": "Florence-2-large-ft",
        "ASPNETCORE_ENVIRONMENT": "Development",
        "DOTNET_ENVIRONMENT": "Development",
        "ASPIRE_ALLOW_UNSECURED_TRANSPORT": "true"
      }
    }

As we can see above, setting the DOCKER_HOST environment variable in launchSettings.json for the AppHost is all we need to do.

DOCKER_HOST: the user and the host that has the necessary resources to run our containers and allows access for that user via ssh.
ENABLE_NVIDIA_DOCKER: For Ollama, it is possible to use Nvidia Docker provided the host machine supports it. Selecting this option, regardless of the Docker host setting, enables Nvidia Docker in the container by passing the "--gpus=all" runtime argument to the container as below. It is important that the machine running the containers (local or not) has a compatible Nvidia GPU and has Nvidia Docker set up. On an Ubuntu machine, I am using Lambda Stack for this purpose.

if (useGpu)
{
    ollamaResourceBuilder.WithContainerRuntimeArgs("--gpus=all");
}
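For readers more at home in Python, the effect of the flag can be sketched as follows. This is not the AppHost code: the set of accepted truthy values, the helper names and the docker run line (official ollama/ollama image on Ollama's default port 11434) are assumptions for illustration only.

```python
def is_enabled(value):
    """Interpret an environment variable value as a boolean (accepted values are an assumption)."""
    return (value or "").strip().lower() in {"1", "true", "yes"}


def ollama_runtime_args(env):
    """Return the extra container runtime arguments implied by ENABLE_NVIDIA_DOCKER."""
    args = []
    if is_enabled(env.get("ENABLE_NVIDIA_DOCKER")):
        # Same runtime argument the C# snippet above passes to the container.
        args.append("--gpus=all")
    return args


def ollama_run_command(env):
    """Hypothetical 'docker run' line showing where the runtime arguments end up."""
    return ["docker", "run", "-d", *ollama_runtime_args(env), "-p", "11434:11434", "ollama/ollama"]


print(" ".join(ollama_run_command({"ENABLE_NVIDIA_DOCKER": "true"})))
```

Because the flag only adds a runtime argument, it is independent of where the container runs; combined with DOCKER_HOST, the GPU request is simply forwarded to whichever daemon is targeted.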

As we adjust the above flags, the containers we need will run either locally or remotely when we launch our Aspire project on a developer machine. This happens transparently: we do not need to perform any actions on the local or remote machine, and we debug and start our projects the same way regardless of which Docker host is used.

Gotchas

Although this is straightforward, the basic Docker rules still apply. When we define containers using the defaults, the exposed container ports will only be accessible from the local machine using localhost.

If we are running our containers on a remote host, we need to access these services using the LAN IP of the Docker host, and the container port mappings should be exposed on the correct network interface.

Once we do this, we also need to inject the right connection strings into our projects. For instance, a database connection string would originally point to localhost, but if our database container is on a remote host, we need to ensure the correct IP / port is reflected in the relevant connection strings so that our services work as expected.

These points will be illustrated in the following sections.
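As a rough sketch of the two gotchas, the host address can be derived from DOCKER_HOST and then reused both for the port binding and for the connection string. The helper names, the example IP address and the credentials below are made up for illustration; the demo project handles this in C# inside the AppHost.

```python
from urllib.parse import urlsplit


def docker_host_address(docker_host):
    """Return the host part of a DOCKER_HOST value, or localhost when no remote host is set."""
    if not docker_host:
        return "localhost"
    # unix:// sockets have no hostname, so they also fall back to localhost.
    return urlsplit(docker_host).hostname or "localhost"


def port_binding(host_address, host_port, container_port):
    """Docker publish syntax (ip:hostPort:containerPort) bound to a specific interface."""
    return f"{host_address}:{host_port}:{container_port}"


def postgres_connection_string(host, port, database, username, password):
    """Connection string in the key=value form commonly used by .NET PostgreSQL clients."""
    return f"Host={host};Port={port};Database={database};Username={username};Password={password}"


address = docker_host_address("ssh://user@192.168.1.50")  # example LAN IP, an assumption
print(port_binding(address, 5432, 5432))
print(postgres_connection_string(address, 5432, "photos", "postgres", "example-password"))
```

With no DOCKER_HOST set, the same helpers fall back to localhost, so the local and remote cases share one code path.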

Adding Ollama Resource

The extensibility around Aspire is great, and Jason Fauchelle from RAYGUN provided an excellent article on how to integrate Ollama with Aspire, which the code in this demo is based on.

Define a resource: OllamaResource.cs
- Responsible for:
  - Capturing what port / network interface to bind to.
  - Exposing the correct connection string.
  - Capturing parameters such as the name of the model to download.
Define the extensions to orchestrate the resource: OllamaResourceBuilderExtensions
- Responsible for:
  - Defining the container parameters such as image, tag and volumes.
  - Optionally adding GPU support for the container.
  - If running on a remote Docker host, setting the correct interface, or using localhost if no remote host is set.
Define a lifecycle hook: OllamaResourceLifecycleHook
- Responsible for:
  - Publishing updates when models are being downloaded so that these can be shown in the dashboard.


// Define the resource and pass the parameters from the environment variables.

var ollamaContainer = builder.AddOllama(hostIpAddress: dockerHost, modelName: ollamaVisionModel,
    useGpu: enableNvidiaDocker);

// Any other service depending on it will have the correct connection string injected after a .WithReference(ollamaContainer) call.
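The download progress that the lifecycle hook publishes can be pictured with a short Python sketch. It assumes the line-by-line JSON status updates that Ollama streams while pulling a model (objects with a status field and, during downloads, total and completed byte counts); the function name and the output format are hypothetical and are not taken from the demo's C# hook.

```python
import json


def describe_pull_progress(line):
    """Turn one streamed JSON status line from a model pull into a short progress message."""
    update = json.loads(line)
    status = update.get("status", "")
    total = update.get("total")
    completed = update.get("completed", 0)
    if total:
        # Only download steps carry byte counts; other steps report a status alone.
        percent = 100.0 * completed / total
        return f"{status}: {percent:.1f}%"
    return status


print(describe_pull_progress('{"status": "pulling manifest"}'))
print(describe_pull_progress('{"status": "downloading", "total": 200, "completed": 50}'))
```

A hook would forward each such message to the dashboard as the resource state while the model is still downloading.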

With all this done, we can use IOllamaApiClient in our code to run inference via API as below:

private async Task<ConversationContextWithResponse?> RunPrompt(string modelName, string prompt, long[]? context, string? base64Image)
{
    var attempt = 0;

    // Build a non-streaming request; the image is optional and is sent base64 encoded.
    var request = new GenerateCompletionRequest()
    {
        Prompt = prompt,
        Model = modelName,
        Stream = false,
        Context = context,
        Images = string.IsNullOrEmpty(base64Image) ? null : [base64Image]
    };

    var result = await ollamaApiClient.GetCompletion(request);

    // Retry while the model returns an empty response, up to MaxRetryAttempts additional calls.
    while (string.IsNullOrEmpty(result?.Response) && attempt++ < MaxRetryAttempts)
    {
        result = await ollamaApiClient.GetCompletion(request);
    }

    return result;
}
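For comparison, the same kind of request can be sent to Ollama's HTTP API directly. The sketch below uses only the Python standard library and targets the /api/generate endpoint; the function names, the retry count and the timeout are illustrative assumptions rather than part of the demo project.

```python
import base64
import json
import urllib.request


def build_generate_payload(model, prompt, image_bytes=None, context=None):
    """Build the JSON body for a non-streaming completion request."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    if image_bytes:
        # Multimodal models such as LLaVA receive images as base64 strings.
        payload["images"] = [base64.b64encode(image_bytes).decode("ascii")]
    if context:
        payload["context"] = context
    return payload


def generate(base_url, payload, max_attempts=3, timeout=120):
    """POST the payload to <base_url>/api/generate, retrying while the response text is empty."""
    request = urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    for _ in range(max_attempts):
        with urllib.request.urlopen(request, timeout=timeout) as response:
            body = json.loads(response.read())
        if body.get("response"):
            return body
    return None


# Usage (needs a reachable Ollama instance, so it is not executed here):
#   with open("photo.jpg", "rb") as f:
#       payload = build_generate_payload("llava:7b", "Describe this photo.", f.read())
#   result = generate("http://localhost:11434", payload)
```

When the containers run on a remote Docker host, base_url would use the LAN IP of that host instead of localhost.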

Building a Flask Application and Serving Hugging Face models

Fortunately, there is already an Aspire package that allows running Python projects as part of our setup, so we do not need to do much here besides focusing on our Python application.

We can write Python code, expose it as a web API via Flask or similar, and start it up when our Aspire project starts. As a bonus, we also get OpenTelemetry support for free.

This code exposes the functionality to run inference on Florence-2 model variants downloaded from Hugging Face. In particular, the Florence-2-base variants are relatively lightweight to run locally on a developer machine.

Provided we have created a Python project as documented in the Aspire-Python repository, it is pretty simple to start our Python API from our Aspire host:


var flaskAppFlorenceApi = builder.AddFlaskProjectWithVirtualEnvironment("florence2api",
        "../PhotoSearch.Florence2.API/src")
    .WithEnvironment("FLORENCE_MODEL", Environment.GetEnvironmentVariable("FLORENCE_MODEL"))
    .WithEnvironment("PYTHONUNBUFFERED", "0");

Once we do the above, we can set dependencies as usual in our Aspire AppHost and ensure connection strings are injected. In this example, the Python project runs locally on the developer machine and therefore there is no need to adjust the connection strings in the Aspire host.

The following are required to integrate Python code:

Create a directory in the solution directory for the Python project.
Initialise the project using rye (link at the bottom).
Write some Python code. Examples below.
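As a minimal sketch of what such a Flask endpoint might look like: the /summarise route, the JSON field names and the run_inference callback are hypothetical, and the actual Florence-2 inference code is deliberately left out. The only detail taken from the AppHost snippet above is the FLORENCE_MODEL environment variable.

```python
import os

DEFAULT_MODEL = "Florence-2-base-ft"


def configured_model(env=None):
    """Read the model name passed in by the AppHost via FLORENCE_MODEL."""
    env = os.environ if env is None else env
    return env.get("FLORENCE_MODEL") or DEFAULT_MODEL


def summarise(payload, run_inference, model):
    """Validate the request body and delegate to the inference callback."""
    image = (payload or {}).get("image")
    if not image:
        return {"error": "missing 'image' field"}, 400
    return {"model": model, "summary": run_inference(image)}, 200


def create_app(run_inference):
    """Application factory; Flask is imported here so the helpers above stay testable without it."""
    from flask import Flask, jsonify, request

    app = Flask(__name__)
    model = configured_model()

    @app.post("/summarise")
    def summarise_route():
        body, status = summarise(request.get_json(silent=True), run_inference, model)
        return jsonify(body), status

    return app
```

Keeping the request handling in a plain function makes it easy to unit test with a stubbed run_inference before wiring in the real model.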

Checking the Dashboard

Below we can see the traces originating from the Web API, passing to the Background Worker via RabbitMQ, and then, for each photo in the database, a call being made to the Florence-2 API (Python). In the summary view, we can see all calls correlated. If we then select the last trace, we can see the spans generated from all 3 services correlated as expected in the second image below. What I find really interesting is that this is due to utilising OpenTelemetry functionality and not some magic from Aspire, much like how Docker is utilised.
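The correlation across services relies on trace context being passed along with each call or message. A small sketch of the W3C traceparent header format, which OpenTelemetry uses by default, shows why spans from different processes end up in one trace. The helper names are hypothetical, and in practice the OpenTelemetry SDKs handle this propagation for us.

```python
import re
import secrets

# version - trace id (32 hex chars) - parent span id (16 hex chars) - flags
TRACEPARENT = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def parse_traceparent(header):
    """Split a traceparent header into its four fields, or return None if it is malformed."""
    match = TRACEPARENT.match(header.strip().lower())
    if not match:
        return None
    version, trace_id, parent_id, flags = match.groups()
    return {"version": version, "trace_id": trace_id, "parent_id": parent_id, "flags": flags}


def child_traceparent(header):
    """Create the header a downstream call would carry: same trace id, new span id."""
    parent = parse_traceparent(header)
    if parent is None:
        return None
    span_id = secrets.token_hex(8)  # 8 random bytes -> 16 hex characters
    return f"{parent['version']}-{parent['trace_id']}-{span_id}-{parent['flags']}"


incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
outgoing = child_traceparent(incoming)
print(parse_traceparent(outgoing)["trace_id"] == parse_traceparent(incoming)["trace_id"])
```

Since every hop keeps the trace id and only replaces the span id, any backend that understands the format, including the Aspire dashboard, can stitch the spans together.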

The project comes with an .http file that can be used to trigger the API calls to import photos into the DB as well as to summarise them using the models:

We can also see structured logs (besides traces and metrics) from our Python project as below.

Results

Here is a section illustrating the output from the various models for some of the images provided.

Model	Description	Objects
llava:7b	The image captures a vibrant concert scene. A stage set against a backdrop of the iconic AC/DC logo, illuminated by white and red lights, serves as the main focus. On this stage, two performers can be seen: one on the left wielding a microphone and another on the right strumming a guitar. The crowd in the foreground is filled with enthusiastic fans, their faces blurred, adding to the dynamic atmosphere of the concert. In the background, two large speakers stand as silent sentinels, ready to amplify the sound of the performance. The perspective of the image suggests it was taken from a distance, placing the viewer in the midst of the crowd, allowing them to take in the entirety of the event. Despite the absence of text, the image conveys the energy and excitement synonymous with live music performances.	stage, backdrop, logo, lights, performer, microphone, guitar, crowd, fans, speakers
llava:13b	The image shows a concert scene featuring the band AC/DC. On stage, there are three band members visible; one is playing a guitar while another stands behind a microphone, likely performing vocals or speaking to the audience. A third person appears to be controlling audio or lighting equipment. The stage is set against a large LED screen displaying the name AC/DC prominently in bold white letters on a red background. The lighting from the stage and the screens creates a dynamic atmosphere, typical of a live rock concert.\n\nIn the foreground, there are numerous blurred figures that represent the audience, suggesting the photo was taken during a performance when the crowd is actively engaged with the band. The colors in the image are vibrant with reds and whites dominating the stage's lighting, contributing to the energetic feel of the concert setting. There is no visible text providing additional context or information about the event or location within the image.	Band members on stage, LED screen with AC/DC written on it, Microphone stand, Audio/lighting equipment operator
Florence-2-base-ft	The stage is lit up. There are people sitting in the stands watching the concert. There is a large sign on the stage that says AC/C.D.	building, person
Florence-2-large-ft	A band is performing on a stage. There is a large sign on the stage that says AC/DC. There are lights on the ceiling above the stage. A large bell is hanging from the ceiling. A lot of people are standing in front of the stage watching the band.	drum, guitar, loudspeaker, person

Model	Description	Objects
llava:7b	The image presents a bustling scene centered around the iconic Royal Albert Hall, a renowned concert venue in London. This grand structure is immediately recognizable by its large circular shape and distinctive blue roof. The building's architectural details are clearly visible, showcasing the intricate patterns that make it stand out amidst the cityscape. In front of the building, there's a lively plaza where people can be seen enjoying their time. A tree with green leaves stands prominently on this plaza, providing shade to those under its branches. The plaza also features a statue, silently observing the hustle and bustle around it. The photo captures the building from a distance, allowing for a full view of its architecture and the activity surrounding it. The clear sky overhead contrasts with the greenery in the foreground, creating a harmonious blend of nature and urban development. Despite being just an image, this photograph tells a story of the vibrancy and energy of London, with its mix of historical architecture like the Royal Albert Hall and everyday life in the form of people enjoying their time in front of this landmark.	building, tree, statue
llava:13b	The image shows a view of the Royal Albert Hall, a famous concert venue in London, United Kingdom. The building has a distinctive architectural design with its large central dome and multiple arched windows around it. The facade is red brick with white detailing, and there are several people visible on the walkway in front of the hall, suggesting it might be a public space where visitors can explore or enjoy outdoor events. The sky is clear with some clouds scattered across it, indicating fair weather conditions when the photo was taken. There are trees framing the image at the edges, adding to the sense of an open park-like setting around this historic building.	Royal Albert Hall, concert venue, London, United Kingdom, architecture, dome, arched windows, red brick facade, white detailing, visitors, walkway, outdoor events, fair weather, sky, clouds, trees
Florence-2-base-ft	A large brown building with a dome on top of it. There is a large green tree next to the building. There are people standing in front of the building on the grass.	building
Florence-2-large-ft	A large brown building with a dome on top of it. There is a large green tree in front of the building. There are people standing on the sidewalk in front and on the grass.	building, house

Model	Description	Objects
llava:7b	The image depicts a vibrant and colorful scene of urban street art, specifically graffiti. Dominating the foreground is a large metal wall covered in various tags and pieces of graffiti. Some of the text visible on the wall includes "CHEW" "JIM BOB" and "TICKLES" The style of the artwork suggests it was done with spray paint, which is common for street art. The wall appears to be an outdoor structure as indicated by the natural light illuminating the scene. Behind the metal wall, there's a glimpse of what seems to be a building facade featuring more graffiti and a mural with text that reads "GRAND BARNACLE." The overall impression is one of a bustling urban environment with vibrant street art as its defining characteristic.	metal wall, tags, graffiti, spray paint, text, building facade, mural, GRAND BARNACLE, natural light
llava:13b	The image displays a wall covered in graffiti. The graffiti is done in various styles and colors, with some pieces appearing to be hand-drawn while others have a more stylized appearance, possibly painted with spray paint. The content of the graffiti includes both letters and illustrations. There are several pieces that seem to contain profanity or derogatory language, as indicated by the bold, capitalized text. Additionally, there are some images that appear to be representations of people, one of which is depicted in a fighting stance with its fists raised. The wall itself has a concrete texture and appears to be an outdoor structure, possibly a part of a public or urban space where graffiti is commonplace. There are no visible texts providing context about the location, artistic intent, or history of this specific piece of graffiti.	Wall covered in graffiti
Florence-2-base-ft	The wall is covered in graffiti. The graffiti is bright and colorful. The letters on the wall are large and red. There is a sun in the sky. The sun is bright yellow. There are palm trees in the painting. The shutters on the building are made of metal. The ground under the building is made of concrete.	poster
Florence-2-large-ft	The wall is covered in graffiti. The graffiti is bright and colorful. The letters on the wall are large and red. There is a sun in the sky. The sun is bright yellow. There are palm trees in the painting. The shutters on the building are made of metal. The ground under the building is made of concrete.	poster

Conclusion and Next Steps

In this post, we have gone through .NET Aspire, some open source models and tools that give convenient access to local inference, and how we can make use of Docker concepts with Aspire so that we can offload ML tasks to other machines that might be available on the local network.

I was not sure how this experience would compare to using a Docker Compose file that defines all the containers we need and then using the DOCKER_HOST value to run it on a remote host.

The code example here took a whole weekend and a couple of evenings to complete, so it is not exactly how we would implement such services for production. Given this was the first time I was touching Aspire in detail, that does not seem too bad. Potentially, with the Docker Compose approach, there would be less time needed to set up and configure, so I could spend more time on the actual application functionality. At the end of the process, though, Aspire felt somewhat natural, and in the long term it might be more effective as it also reduces barriers to entry in a team setting.

In addition, one area where Aspire has an advantage is logging, as I am not sure there are any dashboards that support OpenTelemetry logging as smoothly as the Aspire dashboard does. Traces and metrics are quite straightforward to set up manually but, last time I checked, OpenTelemetry logging functionality was still lagging.

At the end of the process, the .NET Aspire workflow seems natural, and given that it builds on the Docker CLI, OpenTelemetry and the relevant concepts without requiring us to extend that functionality, it feels like a good choice.

Next steps

Although it is straightforward to experiment with powerful models these days, there are key concepts that are crucial to success. Given that we have different configuration parameters as well as different models, how can we define the success criteria and tell whether our model is performing well or badly?

Evaluating model performance is crucial and requires further understanding of the concepts under the hood. Given that we have covered the initial setup and concepts in this post, in the next post we will experiment with measuring the performance of our system and potentially introduce a front end to help evaluate model performance.

Repository link: https://github.com/syamaner/photo-search

If there is any interest, I can summarise the steps to run the code locally in the repository.
