Jesse Williams for KitOps

Originally published at jozu.com

Deploying ML projects with Argo CD

Machine learning (ML) projects often involve numerous dependencies, convoluted model management processes, and frequently updated components like datasets, model parameters, and resulting artifacts. Consequently, deploying and managing ML projects effectively is both necessary and intimidating. As engineering teams embrace microservices architectures and machine learning models grow more complex, traditional deployment methods often fall short.

To address these issues, this article demonstrates how Argo CD, a Kubernetes continuous delivery tool, can simplify the deployment process and transform how ML engineers and data scientists implement their projects. You will also learn to effectively package and seamlessly share your ML projects using KitOps: a ModelKit-based packaging tool.

The combination of Argo CD and KitOps addresses an important gap in ML. While Argo CD makes deployment smoother, KitOps streamlines the packaging and sharing of ML projects. This combination enables teams to maintain a standardized approach to both deployment and project distribution, enhancing collaboration and reproducibility in ML workflows.

Argo CD

Argo CD is a declarative, GitOps continuous delivery tool for Kubernetes that assists with the deployment and management of applications. It embraces the GitOps approach, providing automated syncing and rollback capabilities out of the box. Argo CD's ability to manage numerous applications and environments across multiple Kubernetes clusters makes it ideal for large-scale ML projects. Other features of Argo CD include the following (a sample Application manifest is sketched after the list):

  • Maintaining uniformity in configuration across environments.
  • Automating deployments directly from Git repositories.
  • Enabling easy rollbacks and version control.
  • Minimizing human error and manual intervention.
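
Because Argo CD is declarative, an application can be described entirely in a manifest checked into Git. As a hedged illustration (the repository URL and path below are placeholders, not taken from this article's repo), a minimal Application manifest might look like this:

    apiVersion: argoproj.io/v1alpha1
    kind: Application
    metadata:
      name: fastapi
      namespace: argocd
    spec:
      project: default
      source:
        repoURL: https://github.com/<USERNAME>/<REPO>.git  # placeholder repository
        targetRevision: HEAD
        path: fastapi
      destination:
        server: https://kubernetes.default.svc
        namespace: default
      syncPolicy:
        automated: {}  # sync automatically whenever the Git state changes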

Let’s install Argo CD and use it to deploy a demo ML project locally.

Installing Argo CD

1/ Before installing Argo CD, make sure to install Docker, minikube, and kubectl. These tools are necessary for Argo CD and serve the following purposes (a quick way to verify the installations is shown after the list):

  • Docker: Provides a container runtime for building and running containerized applications, and is required by minikube to create a local Kubernetes cluster.
  • Minikube: Sets up a local Kubernetes cluster for deploying and managing Argo CD.
  • kubectl: Facilitates interaction with the Kubernetes cluster for installing, configuring, and managing Argo CD.
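
You can quickly confirm that all three tools are available before proceeding:

    docker --version
    minikube version
    kubectl version --client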

2/ Start minikube, create a new namespace, and install Argo CD by running the following commands:

    minikube start
    kubectl create namespace argocd
    kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml

3/ [Install the Argo CD CLI](https://argo-cd.readthedocs.io/en/stable/cli_installation/) using Homebrew, which is available on macOS, Linux, and Windows (via WSL):

brew install argocd 

4/ By default, the Argo CD API server is not exposed with an external IP. To access the API server, run:

    kubectl port-forward svc/argocd-server -n argocd 8080:443

The API server can then be accessed at https://localhost:8080.

5/ Log in to Argo CD using the CLI. The default username is admin, and you can retrieve the initial password by running:

    argocd admin initial-password -n argocd

Then, log in using the following command:

    argocd login 127.0.0.1:8080

6/ Finally, you will need to register a cluster with Argo CD so that you can deploy your applications. For this demonstration, you can use your local cluster. Make sure Docker is up and running before executing the commands below:

    kubectl config get-contexts -o name
    argocd cluster add docker-desktop  # use the context name returned above (e.g. minikube or docker-desktop)

Now, you are ready to create an ML application.

Creating a wine quality classifier

With Argo CD installed and your cluster set up, the next step is to train an ML model for deployment. You will also need to serve the trained model via an API. Follow the steps below to train a wine quality classifier and serve it through FastAPI endpoints.

1/ Download the wine quality dataset from Kaggle and save it as winequality.csv inside the dataset folder in your workspace.

2/ Install pandas and scikit-learn and freeze the requirements for reproducibility by executing the commands below:

    pip install pandas scikit-learn
    pip freeze > requirements.txt

3/ Create a new file, train.py, to train and save the final model. The file should contain the following code:

    # Importing necessary libraries
    import os

    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import classification_report
    import joblib

    # Load the dataset
    file_path = 'dataset/winequality.csv'  # Update with the correct file path if needed
    df = pd.read_csv(file_path)

    # Preprocessing: Separate features and target
    X = df.drop('quality', axis=1)
    y = df['quality']

    # Split the data into training and testing sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Train a Random Forest Classifier
    model = RandomForestClassifier(random_state=42)
    model.fit(X_train, y_train)

    # Evaluate the model
    y_pred = model.predict(X_test)
    print("Classification Report:")
    print(classification_report(y_test, y_pred))

    # Save the model to disk, creating the target directory if it does not exist
    model_dir = 'saved_model'
    os.makedirs(model_dir, exist_ok=True)
    model_path = os.path.join(model_dir, 'wine_quality_model.pkl')
    joblib.dump(model, model_path)
    print(f"Model saved to {model_path}")

The above code uses libraries like pandas and scikit-learn to load the data and train a random forest classifier. After the training is complete, the code saves the model locally to the directory named saved_model.

4/ Run the train.py script using python train.py. You should now see the final saved model in the saved_model directory.

At this point, your directory structure should look something like this:

.
├── dataset
│   └── winequality.csv
├── requirements.txt
├── saved_model
│   └── wine_quality_model.pkl
└── train.py

5/ You will also need to create an API for deployment, so install FastAPI. Since you have installed a new library, you will also need to freeze the dependencies again.

    pip install "fastapi[standard]"
    pip freeze > requirements.txt

6/ Create a new file, main.py, to load the saved model and expose an endpoint for your users:

    from fastapi import FastAPI
    import numpy as np
    import joblib
    from pydantic import BaseModel

    app = FastAPI()


    @app.get("/")
    def read_main():
        return {"message": "Welcome"}


    class WineData(BaseModel):
        fixed_acidity: float
        volatile_acidity: float
        citric_acid: float
        residual_sugar: float
        chlorides: float
        free_sulfur_dioxide: float
        total_sulfur_dioxide: float
        density: float
        pH: float
        sulphates: float
        alcohol: float


    @app.post("/winequality/")
    def analyze_wine_quality(wine_data: WineData):
        # Load the trained classifier from disk
        classifier = joblib.load("saved_model/wine_quality_model.pkl")

        # Arrange the features in the order the model was trained on
        predict_data = [
            wine_data.fixed_acidity,
            wine_data.volatile_acidity,
            wine_data.citric_acid,
            wine_data.residual_sugar,
            wine_data.chlorides,
            wine_data.free_sulfur_dioxide,
            wine_data.total_sulfur_dioxide,
            wine_data.density,
            wine_data.pH,
            wine_data.sulphates,
            wine_data.alcohol,
        ]
        predict_data = np.array(predict_data).reshape(1, -1)
        prediction = classifier.predict(predict_data)

        return {"quality": int(prediction[0])}

In the code above, the analyze_wine_quality(wine_data: WineData) function handles the endpoint defined by @app.post("/winequality/"). It loads the saved model and uses it to predict the quality of the wine.

7/ Start the application with uvicorn main:app --reload. You will now be able to send an API request to examine the wine quality. The application exposes a POST method at http://127.0.0.1:8000/winequality/. You can use Postman to send requests as shown in the image below.

Sample API request and response
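
Alternatively, you can send the same request with curl; the feature values below are purely illustrative:

    curl -X POST http://127.0.0.1:8000/winequality/ \
      -H "Content-Type: application/json" \
      -d '{"fixed_acidity": 7.4, "volatile_acidity": 0.7, "citric_acid": 0.0,
           "residual_sugar": 1.9, "chlorides": 0.076, "free_sulfur_dioxide": 11.0,
           "total_sulfur_dioxide": 34.0, "density": 0.9978, "pH": 3.51,
           "sulphates": 0.56, "alcohol": 9.4}'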

8/ Finally, you will need to create a Docker image for your application so that the Kubernetes manifests deployed through Argo CD can run it. As such, create a Dockerfile with the following contents:

    # Use the official Python base image
    FROM python:3.11-slim

    # Set the working directory inside the container
    WORKDIR /app

    # Copy the requirements file into the container
    COPY requirements.txt .
    # Install the Python dependencies
    RUN pip install --upgrade pip
    RUN pip install -r requirements.txt

    # Copy the application code into the container
    COPY . .
    # Expose the FastAPI app's default port (8000)
    EXPOSE 8000
    # Command to run the FastAPI app using uvicorn
    CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]

Create a Docker image using the command:

    docker build -t fastapi-app .

Now, you should see a new image fastapi-app in the list of your Docker images.

List of all Docker images
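
Before moving on, you can optionally run the container locally to confirm that the image works; the API should then respond at http://127.0.0.1:8000:

    docker run --rm -p 8000:8000 fastapi-app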

At this stage, you are ready to go back to Argo CD and continue deployment.

Deploying the API with Argo CD

With the ML model trained and the API set up, the next step is to use Argo CD to expose the trained machine learning model via the API. To do so, you will need to follow the steps below:

1/ Set the current namespace to argocd by running the following command:

    kubectl config set-context --current --namespace=argocd

2/ Create the deployment.yaml and svc.yaml files. For your convenience, you won't need to create them separately, as they have already been created for you in this repository inside the fastapi folder. A minimal sketch of what such a Deployment manifest might look like is shown below.
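
The sketch below is illustrative rather than the exact manifest from the repository: the resource name ml-api matches the Argo CD output shown later, and imagePullPolicy: Never assumes you have loaded the locally built image into the cluster (for example, with minikube image load fastapi-app):

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: ml-api
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: ml-api
      template:
        metadata:
          labels:
            app: ml-api
        spec:
          containers:
            - name: ml-api
              image: fastapi-app
              imagePullPolicy: Never  # assumes the image was loaded into the local cluster
              ports:
                - containerPort: 8000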

3/ Create the example application with the following command:

    argocd app create fastapi --repo https://github.com/bhattbhuwan13/argocd-example-apps.git --path fastapi --dest-server https://kubernetes.default.svc --dest-namespace default

To confirm that the application has been created, view its status:


    argocd app get fastapi
    Name:               argocd/fastapi
    Project:            default
    Server:             https://kubernetes.default.svc
    Namespace:          default
    URL:                https://127.0.0.1:8080/applications/fastapi
    Source:
    - Repo:             https://github.com/bhattbhuwan13/argocd-example-apps.git
      Target:
      Path:             fastapi
    SyncWindow:         Sync Allowed
    Sync Policy:        Manual
    Sync Status:        OutOfSync from  (36a9d33)
    Health Status:      Missing

    GROUP  KIND        NAMESPACE  NAME    STATUS     HEALTH   HOOK  MESSAGE
           Service     default    ml-api  OutOfSync  Missing
    apps   Deployment  default    ml-api  OutOfSync  Missing

4/ The application status is initially OutOfSync since the application has yet to be deployed and no Kubernetes resources have been created. To deploy the application, run:

    argocd app sync fastapi
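
Once the sync completes, you can also confirm that the underlying Kubernetes resources were created:

    kubectl get deployments,services -n default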

5/ You should now be able to see a healthy application running in the UI at https://localhost:8080/applications/argocd/fastapi.
Healthy application running in the Argo CD dashboard

Collaborating and sharing using KitOps

At this point, you have a trained model deployed on a local Kubernetes cluster. Now, if you want to share your code and artifacts with other engineers on the team, how do you do that? Well, one great tool is KitOps.

KitOps is an open-source project designed to enhance collaboration among stakeholders in AI/ML projects. At the heart of KitOps is the ModelKit, an OCI-compliant packaging format that allows smooth sharing of all necessary artifacts involved in the AI/ML model lifecycle. Key benefits of ModelKit include:

  • Version-controlled and secured packaging:
    Combine all project artifacts into a single bundle with versioning and SHA checksums for integrity.

  • Seamless integration:
    Works with OCI-compliant registries (e.g., Docker Hub and Jozu Hub) and integrates with popular tools like HuggingFace, ZenML, and Git.

  • Effortless dependency management:
    Ship dependencies alongside code for hassle-free execution.

Installing KitOps and sharing the project

To install Kit, you need to download the package, unarchive it, and move the kit executable to a location where your operating system can find it. On Linux, you can achieve this by running the following commands:

    wget https://github.com/jozu-ai/kitops/releases/latest/download/kitops-linux-x86_64.tar.gz

    tar -xzvf kitops-linux-x86_64.tar.gz

    sudo mv kit /usr/local/bin/

For Windows and macOS, please visit the official site, which contains the installation instructions.
Verify your installation by running the command kit version. Your output should look something like this:

    Version: 0.2.5-29dbdc4
    Commit: 29dbdc48bf2b5f9ee801d6454974e0b8474e916b
    Built: 2024-06-06T17:53:35Z
    Go version: go1.21.6

Once you have installed Kit, you will need to write a Kitfile to specify the different components of your code that need to be packaged. You can use any text editor to create a new file named Kitfile (without any extension) and enter the following details:

    manifestVersion: "1.0"
    package:
      name: Wine Classification
      version: 0.0.1
      authors: ["Bhuwan Bhatt"]
    model:
      name: wine-classification-v1
      path: ./saved_model
      description: Wine classification using sklearn
    datasets:
      - description: Dataset for the wine quality data
        name: training data
        path: ./dataset

    code:
      - description: Code for training
        path: .

There are five major components in the Kitfile above:

  • manifestVersion: Specifies the version for the Kitfile.
  • package: Specifies the metadata for the package.
  • model: Specifies the model details, which contain the model's name, its path, and human-readable description.
  • datasets: Similar to the model, specifies the path, name, and description for the dataset.
  • code: Specifies the directory containing code that needs to be packaged.

Once the Kit command line tool is installed and the Kitfile is ready, you will need to log in to a container registry. To log in to Docker Hub, use the command below:

    kit login docker.io  # You will be prompted for your username and password; the password input is hidden

You can then package the artifacts into a ModelKit using the following command:

    kit pack . -t docker.io/<USERNAME>/<CONTAINER_NAME>:<CONTAINER_TAG>
    # Example: `kit pack . -t docker.io/bhattbhuwan13/wine_classification:v1`

Finally, you can push the ModelKit to the remote hub:

    kit push docker.io/<USERNAME>/<CONTAINER_NAME>:<CONTAINER_TAG>
    # Example: kit push docker.io/bhattbhuwan13/wine_classification:v1 
Enter fullscreen mode Exit fullscreen mode

Now, developers can pull either specific components or the entire package from the ModelKit using a single command. To unpack only specific components, such as the datasets:

    kit unpack --datasets docker.io/<USERNAME>/<CONTAINER_NAME>:<CONTAINER_TAG>
    # Example: kit unpack --datasets docker.io/bhattbhuwan13/wine_classification:v1

Or, they can unpack the entire ModelKit on their own machine:

    kit unpack docker.io/<USERNAME>/<CONTAINER_NAME>:<CONTAINER_TAG>
    # Example: kit unpack docker.io/bhattbhuwan13/wine_classification:v1

At this stage, developers can run the necessary tests to verify that the model or code works as expected. Once the tests pass, they can use Argo CD to deploy the model to the production server by simply changing the cluster location in the steps above or by following this guide. A rough sketch of retargeting the application is shown below.
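
This sketch assumes the production cluster is already reachable from your kubeconfig; the context name and API server URL are placeholders:

    argocd cluster add <PROD_CONTEXT_NAME>
    argocd app set fastapi --dest-server https://<PROD_CLUSTER_API_SERVER>
    argocd app sync fastapi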

Get involved with the KitOps community

Thanks to the support and feedback of our community, KitOps is rapidly evolving. In fact, we just released KitOps v1.0 and are actively looking for design partners to help us shape our roadmap.

If you are interested in learning more about KitOps or meeting our team, please reach out on our community Discord channel.
