Machine learning (ML) projects often involve numerous dependencies, convoluted model management processes, and frequently changing components like datasets, model parameters, and resulting artifacts. Consequently, deploying and managing ML projects effectively is a necessary but intimidating task. As engineering teams embrace microservices architectures and machine learning models grow more complex, traditional deployment methods often fall short.
To address these issues, this article demonstrates how Argo CD, a Kubernetes continuous delivery tool, can simplify the deployment process and transform how ML engineers and data scientists implement their projects. You will also learn to effectively package and seamlessly share your ML projects using KitOps: a ModelKit-based packaging tool.
The combination of Argo CD and KitOps addresses an important gap in ML. While Argo CD makes deployment smoother, KitOps streamlines the packaging and sharing of ML projects. This combination enables teams to maintain a standardized approach to both deployment and project distribution, enhancing collaboration and reproducibility in ML workflows.
Argo CD
Argo CD is a declarative, GitOps continuous delivery tool for Kubernetes that assists with the deployment and management of applications. It embraces the GitOps approach: the desired state of your applications lives in Git, and Argo CD keeps the cluster in sync with it, enabling automated syncing and rollback (a minimal example manifest follows the list below). Argo CD's ability to manage numerous applications and environments across Kubernetes clusters makes it ideal for large-scale ML projects. Other features of Argo CD include:
- Maintaining uniformity in configuration across environments.
- Automating deployments directly from Git repositories.
- Enabling easy rollbacks and version control.
- Minimizing human error and manual intervention.
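To make the declarative approach concrete, here is a minimal sketch of an Argo CD Application manifest; the repository URL, path, and names are placeholders for illustration only:
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-ml-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/<USERNAME>/<REPO>.git
    targetRevision: HEAD
    path: manifests
  destination:
    server: https://kubernetes.default.svc
    namespace: default
Committing a manifest like this to Git lets Argo CD detect and apply changes automatically. (In this tutorial, you will create an equivalent application via the CLI instead.)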
Let’s install Argo CD and use it to deploy a demo ML project locally.
Installing Argo CD
1/ Before installing Argo CD, make sure to install Docker, minikube, and kubectl. These tools are necessary for Argo CD and serve the following purposes:
- Docker: Provides the container runtime for building and running containerized applications; minikube also relies on it to create the local Kubernetes cluster.
- Minikube: Sets up a local Kubernetes cluster for deploying and managing Argo CD.
- kubectl: Facilitates interaction with the Kubernetes cluster for installing, configuring, and managing Argo CD.
2/ Create a new namespace by running the following commands:
minikube start
kubectl create namespace argocd
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml
3/ [Install the Argo CD CLI](https://argo-cd.readthedocs.io/en/stable/cli_installation/) using Homebrew (available on macOS, Linux, and WSL):
brew install argocd
4/ By default, the Argo CD API server is not exposed with an external IP. To access the API server, run:
kubectl port-forward svc/argocd-server -n argocd 8080:443
The API server can then be accessed using https://localhost:8080.
5/ Log in to Argo CD using the CLI. The default username is admin, and you can retrieve the default password by running:
argocd admin initial-password -n argocd
Then, login using the following command:
argocd login 127.0.0.1:8080
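Because the port-forwarded API server uses a self-signed certificate, the CLI may warn you about an insecure connection; you can acknowledge the prompt, or pass the flags explicitly (you will still be prompted for the password):
argocd login 127.0.0.1:8080 --username admin --insecure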
6/ Finally, you will need to register a cluster to deploy your applications to. For this demonstration, you can add your local cluster. Make sure Docker is up and running, then list the available contexts and add yours (docker-desktop below; if you are using minikube, the context name is typically minikube):
kubectl config get-contexts -o name
argocd cluster add docker-desktop
Now, you are ready to create an ML application.
Creating a wine quality classifier
With Argo CD installed and your cluster set up, the next step is to train an ML model for deployment. You will also need to serve the trained model via an API. Follow the steps below to train a wine quality classifier and serve it through FastAPI endpoints.
1/ Download the wine quality dataset from Kaggle and save it as winequality.csv inside the dataset folder in your workspace.
2/ Install pandas and scikit-learn, and freeze the requirements for reproducibility by executing the commands below:
pip install pandas scikit-learn
pip freeze > requirements.txt
3/ Create a new file, train.py, to train and save the final model. The file should contain the following code:
# Importing necessary libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
import joblib
import os
# Load the dataset
file_path = 'dataset/winequality.csv' # Update with the correct file path if needed
df = pd.read_csv(file_path)
# Preprocessing: Separate features and target
X = df.drop('quality', axis=1)
y = df['quality']
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train a Random Forest Classifier
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)
# Evaluate the model
y_pred = model.predict(X_test)
print("Classification Report:")
print(classification_report(y_test, y_pred))
# Save the model to disk (create the output directory if it doesn't exist)
os.makedirs('saved_model', exist_ok=True)
model_path = 'saved_model/wine_quality_model.pkl'
joblib.dump(model, model_path)
print(f"Model saved to {model_path}")
The above code uses libraries like pandas and scikit-learn to load the data and train a random forest classifier. After the training is complete, the code saves the model locally to the saved_model directory.
4/ Run the train.py script using python train.py. You should now see the final saved model in the saved_model directory.
At this point, your directory structure should look something like this:
.
├── dataset
│ └── winequality.csv
├── requirements.txt
├── saved_model
│ └── wine_quality_model.pkl
└── train.py
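As an optional sanity check (assuming the directory layout above), you can load the saved model and run a single prediction before wiring it into an API:
import joblib
import pandas as pd

# Load the trained classifier and predict on the first row of the dataset
model = joblib.load("saved_model/wine_quality_model.pkl")
sample = pd.read_csv("dataset/winequality.csv").drop("quality", axis=1).head(1)
print(model.predict(sample))  # e.g., [5]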
5/ You will also need to create an API for deployment, so install FastAPI. Since you have installed a new library, freeze the dependencies again:
pip install "fastapi[standard]"
pip freeze > requirements.txt
6/ Create a main.py file that loads the saved model and exposes an endpoint for your users:
from fastapi import FastAPI
from pydantic import BaseModel
import joblib
import numpy as np

app = FastAPI()


@app.get("/")
def read_main():
    return {"message": "Welcome"}


# Request schema: one field per feature the model was trained on
class WineData(BaseModel):
    fixed_acidity: float
    volatile_acidity: float
    citric_acid: float
    residual_sugar: float
    chlorides: float
    free_sulfur_dioxide: float
    total_sulfur_dioxide: float
    density: float
    pH: float
    sulphates: float
    alcohol: float


@app.post("/winequality/")
def analyze_wine_quality(wine_data: WineData):
    # Load the trained classifier from disk
    classifier = joblib.load("saved_model/wine_quality_model.pkl")
    # Arrange the features in the order the model was trained on
    predict_data = np.array([
        wine_data.fixed_acidity,
        wine_data.volatile_acidity,
        wine_data.citric_acid,
        wine_data.residual_sugar,
        wine_data.chlorides,
        wine_data.free_sulfur_dioxide,
        wine_data.total_sulfur_dioxide,
        wine_data.density,
        wine_data.pH,
        wine_data.sulphates,
        wine_data.alcohol,
    ]).reshape(1, -1)
    prediction = classifier.predict(predict_data)
    return {"quality": int(prediction[0])}
In the code above, the analyze_wine_quality(wine_data: WineData) function handles requests to the endpoint defined by @app.post("/winequality/"). The function loads the saved model and uses it to predict the quality of the wine.
7/ You will now be able to send an API request to examine the wine quality. The application exposes a POST method at http://127.0.0.1:8000/winequality/, and you can use a client like Postman to send requests.
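If you prefer the command line, here is an illustrative curl request; the server is assumed to be running locally (for example, via uvicorn main:app), and the feature values below are sample numbers, not meaningful measurements:
curl -X POST http://127.0.0.1:8000/winequality/ \
  -H "Content-Type: application/json" \
  -d '{"fixed_acidity": 7.4, "volatile_acidity": 0.7, "citric_acid": 0.0,
       "residual_sugar": 1.9, "chlorides": 0.076, "free_sulfur_dioxide": 11.0,
       "total_sulfur_dioxide": 34.0, "density": 0.9978, "pH": 3.51,
       "sulphates": 0.56, "alcohol": 9.4}'
A successful call returns a JSON object like {"quality": 5}.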
8/ Finally, you will need to create a Docker image for your application so that you can register it with Argo CD. As such, create a Dockerfile with the following contents:
# Use the official Python base image
FROM python:3.11-slim
# Set the working directory inside the container
WORKDIR /app
# Copy the requirements file into the container
COPY requirements.txt .
# Install the Python dependencies
RUN pip install --upgrade pip
RUN pip install -r requirements.txt
# Copy the application code into the container
COPY . .
# Expose the FastAPI app's default port (8000)
EXPOSE 8000
# Command to run the FastAPI app using uvicorn
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
Create a Docker image using the command:
docker build -t fastapi-app .
Now, you should see a new image named fastapi-app in the list of your Docker images.
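Before handing the image over to Kubernetes, you can optionally sanity-check it by running the container locally and hitting the same endpoint on port 8000:
docker run --rm -p 8000:8000 fastapi-app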
At this stage, you are ready to go back to Argo CD and continue deployment.
Deploying API to Argo CD cluster
With the ML model trained and the API set up, the next step is to use Argo CD to expose the trained machine learning model via the API. To do so, you will need to follow the steps below:
1/ Set the current namespace to argocd by running the following command:
kubectl config set-context --current --namespace=argocd
2/ Create the deployment.yaml and svc.yaml files. For your convenience, you won't need to create them yourself: they have already been created in this repository inside the fastapi folder (a minimal sketch of their contents follows).
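The sketch below is for orientation only; the files in the repository are authoritative. It assumes the fastapi-app image is accessible to the cluster (for a minikube cluster, you can load a locally built image with minikube image load fastapi-app):
# deployment.yaml: runs the FastAPI container
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-api
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ml-api
  template:
    metadata:
      labels:
        app: ml-api
    spec:
      containers:
        - name: ml-api
          image: fastapi-app:latest
          imagePullPolicy: IfNotPresent  # use the locally loaded image
          ports:
            - containerPort: 8000
---
# svc.yaml: exposes the deployment inside the cluster
apiVersion: v1
kind: Service
metadata:
  name: ml-api
spec:
  selector:
    app: ml-api
  ports:
    - port: 8000
      targetPort: 8000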
3/ Create the example application with the following command:
argocd app create fastapi --repo https://github.com/bhattbhuwan13/argocd-example-apps.git --path fastapi --dest-server https://kubernetes.default.svc --dest-namespace default
To confirm that the application has been created, view its status:
argocd app get fastapi
Name: argocd/fastapi
Project: default
Server: https://kubernetes.default.svc
Namespace: default
URL: https://127.0.0.1:8080/applications/fastapi
Source:
- Repo: https://github.com/bhattbhuwan13/argocd-example-apps.git
Target:
Path: fastapi
SyncWindow: Sync Allowed
Sync Policy: Manual
Sync Status: OutOfSync from (36a9d33)
Health Status: Missing
GROUP KIND NAMESPACE NAME STATUS HEALTH HOOK MESSAGE
Service default ml-api OutOfSync Missing
apps Deployment default ml-api OutOfSync Missing
4/ The application status is initially OutOfSync since the application has yet to be deployed and no Kubernetes resources have been created. To deploy the application, run:
argocd app sync fastapi
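If you would rather have Argo CD apply future Git changes without a manual sync (the automated syncing mentioned earlier), you can switch the application's sync policy:
argocd app set fastapi --sync-policy automated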
5/ You should now be able to see a healthy application running in the UI at https://localhost:8080/applications/argocd/fastapi.
Collaborating and sharing using KitOps
At this point, you have a trained model deployed on a local Kubernetes cluster. Now, if you want to share your code and artifacts with other engineers on the team, how do you do that? Well, one great tool is KitOps.
KitOps is an open-source project designed to enhance collaboration among stakeholders in AI/ML projects. At the heart of KitOps is the ModelKit, an OCI-compliant packaging format that allows smooth sharing of all necessary artifacts involved in the AI/ML model lifecycle. Key benefits of ModelKit include:
- Version-controlled and secured packaging: Combines all project artifacts into a single bundle with versioning and SHA checksums for integrity.
- Seamless integration: Works with OCI-compliant registries (e.g., Docker Hub and Jozu Hub) and integrates with popular tools like HuggingFace, ZenML, and Git.
- Effortless dependency management: Ships dependencies alongside code for hassle-free execution.
Installing KitOps and sharing the project
To install Kit, you need to download the package, unarchive it, and move the kit executable to a location on your PATH, where your operating system can find it. On Linux, you can achieve this by running the following commands:
wget https://github.com/jozu-ai/kitops/releases/latest/download/kitops-linux-x86_64.tar.gz
tar -xzvf kitops-linux-x86_64.tar.gz
sudo mv kit /usr/local/bin/
For Windows and macOS, please visit the official site, which contains the installation instructions.
Verify your installation by running the command kit version. Your output should look something like this:
Version: 0.2.5-29dbdc4
Commit: 29dbdc48bf2b5f9ee801d6454974e0b8474e916b
Built: 2024-06-06T17:53:35Z
Go version: go1.21.6
Once you have installed Kit, you will need to write a Kitfile to specify the different components of your code that need to be packaged. You can use any text editor to create a new file named Kitfile (without any extension) and enter the following details:
manifestVersion: "1.0"
package:
  name: Wine Classification
  version: 0.0.1
  authors: ["Bhuwan Bhatt"]
model:
  name: wine-classification-v1
  path: ./saved_model
  description: Wine classification using sklearn
datasets:
  - description: Dataset for the wine quality data
    name: training data
    path: ./dataset
code:
  - description: Code for training
    path: .
There are five major components in the Kitfile above:
- manifestVersion: Specifies the version for the Kitfile.
- package: Specifies the metadata for the package.
- model: Specifies the model details, which contain the model's name, its path, and human-readable description.
- datasets: Similar to the model, specifies the path, name, and description for the dataset.
- code: Specifies the directory containing code that needs to be packaged.
Once the Kit command-line tool is installed and the Kitfile is ready, you will need to log in to a container registry. To log in to Docker Hub, use the command below:
kit login docker.io # Enter your username and password when prompted; the password input is hidden
You can then package the artifacts into a ModelKit using the following command:
kit pack . -t docker.io/<USERNAME>/<CONTAINER_NAME>:<CONTAINER_TAG>
# Example: `kit pack . -t docker.io/bhattbhuwan13/wine_classification:v1`
Finally, you can push the ModelKit to the remote hub:
kit push docker.io/<USERNAME>/<CONTAINER_NAME>:<CONTAINER_TAG>
# Example: kit push docker.io/bhattbhuwan13/wine_classification:v1
Now, developers can pull the entire package, or only the components they need, from the ModelKit with a single command. To unpack specific components (here, the datasets):
kit unpack --datasets docker.io/<USERNAME>/<CONTAINER_NAME>:<CONTAINER_TAG>
# Example: kit unpack --datasets docker.io/bhattbhuwan13/wine_classification:v1
Or, they can unpack the entire ModelKit on their own machine:
kit unpack docker.io/<USERNAME>/<CONTAINER_NAME>:<CONTAINER_TAG>
# Example: kit unpack docker.io/bhattbhuwan13/wine_classification:v1
At this stage, developers can run the necessary tests to verify that the model or code works as expected. Once the tests run successfully, they use Argo CD to deploy the model to the production server by simply changing the cluster location in the above steps or by following this guide.
Get involved with the KitOps community
Thanks to the support and feedback of our community, KitOps is rapidly evolving. In fact, we just released KitOps v1.0, and we are actively looking for design partners to help us shape our roadmap.
If you are interested in learning more about KitOps or meeting our team, please reach out on our community Discord channel.