Jesse Williams for KitOps

Originally published at jozu.com

Automating ML Pipeline with ModelKits + GitHub Actions

Building machine learning (ML) applications doesn’t end with training the models. Managing machine learning models often involves juggling multiple components—code, metadata, documentation, and more. Without a clear structure, this complexity can slow down development and create bottlenecks during deployment. To tackle these challenges, you need tools and workflows that simplify the process, ensure consistency, and support automation.

ModelKits package a model together with its artifacts (such as code, metadata, and documentation) into a single, consistent unit. Combined with GitHub Actions, they let you build CI/CD pipelines that automate tasks such as unpacking, packing, and publishing models.

This guide walks you through integrating ModelKits with GitHub Actions. By the end, you'll have a workflow that packs a Llama 3 ModelKit and pushes it to a registry every time you push to your repository.

Prerequisites

To follow along in this tutorial, you need the following:

  1. A GitHub account: Create a GitHub account if you don't already have one, then create a GitHub repository. This article uses a repository called kitops-githubactions.
  2. A container registry: You can use Jozu Hub, the GitHub Container Registry, or Docker Hub. This guide uses Jozu Hub.
  3. KitOps: Install the Kit CLI by following the KitOps installation guide.
  4. Familiarity with GitHub Actions basics: You’ll be working with workflows, jobs, and runners to automate your pipeline. In particular:
    • Workflows define the pipeline and consist of event triggers, jobs, and steps.
    • Events trigger your GitHub Actions workflow. Examples include a push to a branch, a pull request, or a manual run through workflow_dispatch.
    • Jobs contain steps that execute specific tasks, such as building or pushing ModelKits.
    • Runners are virtual machines that execute your workflows. This guide uses GitHub-hosted runners with an Ubuntu environment.
    • Actions are reusable steps that you combine to build jobs. For example, actions/checkout checks out your repository, and KitOps provides jozu-ai/gh-kit-setup, which downloads the Kit CLI and adds it to the PATH.

Install KitOps

First, make sure the Kit CLI is installed locally. Then run the command below to verify the installation:

kit version

You should see an output like the one shown in the image:

KitOps version

Log in to your Jozu Hub account and create a repository. This article uses a Jozu Hub repository named llama3-githubactions.

Create a Jozu Hub repository

Unpack the LLAMA3 ModelKit

In your local terminal, from the root of your repository, run the command below:

kit unpack jozu.ml/jozu/llama3-8b:8B-instruct-q5_0

This downloads the ModelKit and unpacks its contents into the current directory: a Kitfile, LICENSE, README.md, USE_POLICY.md, and llama3-8b-8B-instruct-q5_0.gguf.

Open the Kitfile. Its contents should look like this:

manifestVersion: 1.0.0
package:
  name: llama3
  version: 3.0.0
  description: Llama 3 family of large language models (LLMs), a collection of pretrained and instruction tuned generative text models in 8 and 70B sizes.
  authors: ['Meta Platforms, Inc.']
model:
  name: llama3-8b-8B-instruct-q5_0
  path: ./llama3-8b-8B-instruct-q5_0.gguf
  license: META LLAMA 3 COMMUNITY LICENSE AGREEMENT
  description: Llama 3 8B instruct model
code:
  - path: LICENSE
    description: License file.
  - path: README.md
    description: Readme file.
  - path: USE_POLICY.md
    description: Use policy file.

Next, organize these files into folders and update the Kitfile to match. An organized directory improves readability and makes collaboration easier. Your directory structure should look like this:

|-- models
        |-- llama3-8b-8B-instruct-q5_0.gguf
|-- docs
        |-- README.md
        |-- USE_POLICY.md
        |-- LICENSE
|-- Kitfile

Create a models folder and move llama3-8b-8B-instruct-q5_0.gguf from the root directory into it. Similarly, create a docs folder and move README.md, USE_POLICY.md, and LICENSE into it.
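If you prefer to script this step, the moves can be wrapped in a small shell function. This is a sketch, not part of KitOps: the function name organize_llama3_dir is hypothetical, and it assumes the files were unpacked into the target directory (the current directory by default) with the names listed above.

```shell
# Hypothetical helper (not part of KitOps): reorganize an unpacked Llama 3
# ModelKit directory into models/ and docs/. Defaults to the current directory.
organize_llama3_dir() {
  local dir="${1:-.}"
  # Create the target folders; -p makes this safe if they already exist.
  mkdir -p "$dir/models" "$dir/docs" || return 1
  # Model weights go into models/.
  mv "$dir/llama3-8b-8B-instruct-q5_0.gguf" "$dir/models/" || return 1
  # Supporting documents go into docs/.
  mv "$dir/README.md" "$dir/USE_POLICY.md" "$dir/LICENSE" "$dir/docs/" || return 1
}
```

Running organize_llama3_dir from the repository root produces the layout shown above. The Kitfile paths still need to be updated by hand.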

Modify your Kitfile to reflect the new directory structure.

manifestVersion: 1.0.0
package:
  name: llama3
  version: 3.0.0
  description: Llama 3 family of large language models (LLMs), a collection of pretrained and instruction tuned generative text models in 8 and 70B sizes.
  authors: ['Meta Platforms, Inc.']
model:
  name: llama3-8b-8B-instruct-q5_0
  path: models/llama3-8b-8B-instruct-q5_0.gguf
  license: META LLAMA 3 COMMUNITY LICENSE AGREEMENT
  description: Llama 3 8B instruct model
code:
  - path: docs/LICENSE
    description: License file.
  - path: docs/README.md
    description: Readme file.
  - path: docs/USE_POLICY.md
    description: Use policy file.

The Kitfile has four top-level sections:

  • manifestVersion: Specifies the Kitfile manifest version.
  • package: Contains metadata for your LLAMA3 package, such as its name, version, description, and authors.
  • model: Specifies the model's name, path, license, and description.
  • code: Lists the paths of additional files to package with the model; here, the license, README, and use policy files in the docs folder.
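Because every path in the Kitfile has to match a file on disk, a quick local check after moving files can catch a stale entry before the pipeline does. The sketch below is hypothetical and deliberately simple: it extracts path: values with sed, assuming the one-entry-per-line layout shown above, instead of using a real YAML parser.

```shell
# Hypothetical helper: report Kitfile "path:" entries that do not exist on disk.
# Assumes one "path: value" per line, as in the Kitfile above (not a YAML parser).
check_kitfile_paths() {
  local kitfile="${1:-Kitfile}"
  local base paths p missing=0
  # Kitfile paths are resolved relative to the Kitfile's own directory.
  base="$(dirname "$kitfile")"
  # Keep only the text after "path:", with or without a leading "- ".
  paths="$(sed -n 's/^[[:space:]]*-\{0,1\}[[:space:]]*path:[[:space:]]*//p' "$kitfile")" || return 1
  while IFS= read -r p; do
    [ -n "$p" ] || continue
    p="${p#./}"  # treat "./file" and "file" the same
    if [ ! -e "$base/$p" ]; then
      echo "missing: $p" >&2
      missing=1
    fi
  done <<EOF
$paths
EOF
  return "$missing"
}
```

Running check_kitfile_paths Kitfile prints each missing path to stderr and returns a non-zero status, so it can gate a kit pack call in a local script.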

Now that your Kitfile is ready, let's create a CI/CD pipeline that automatically packs, tags, and pushes the ModelKit to the Jozu Hub repository you created earlier.

Integrate with GitHub Actions

Before creating your workflow YAML file, configure two repository secrets: your Jozu Hub email and password. The GitHub Actions runner uses them later to authenticate with your Jozu Hub account. To create them, open your GitHub repository and go to Settings.

Settings

In the Security section of the sidebar, expand Secrets and variables and click Actions.

Actions Secrets

Click New repository secret and add two secrets: JOZU_EMAIL and JOZU_PASSWORD.

Secrets created

With your secrets in place, the next step is to create your workflow file.

Create your workflow file
In the root directory of your repository, create a .github/workflows directory. Inside it, create a file called workflow.yml and paste in the following code:

name: Deploy LLAMA3 to Jozu Hub
on:
  push:
    branches:
      - master
  workflow_dispatch:
permissions:
  id-token: write
  contents: read
  pull-requests: write
  issues: write
  actions: write
env:
  ARTIFACT_NAME: jozu-artifact
  REPOSITORY_NAME: llama3-githubactions
  TAG: latest
  USERNAME: emmanueloffisongetim  # replace with your Jozu Hub username

jobs:
  unpack-to-model:
    name: unpack-large-model
    runs-on: ubuntu-latest
    steps:
      - name: checkout repository
        uses: actions/checkout@v4
      - name: install kit
        uses: jozu-ai/gh-kit-setup@v1.0.0
      - name: kit version
        shell: bash
        run: |
            kit version

      - name: unpack llama3 model to models folder
        shell: bash
        run: kit unpack jozu.ml/jozu/llama3-8b:8B-instruct-q5_0 --model -d models

      - name: upload-artifact
        id: upload-artifact
        uses: actions/upload-artifact@v4
        with:
            name: ${{env.ARTIFACT_NAME}}
            path: .
            overwrite: true


  push-to-jozuhub:
    name: push-to-jozu
    runs-on: ubuntu-latest
    needs: unpack-to-model
    steps:
      - name: download-artifact
        uses: actions/download-artifact@v4
        with:
            name: ${{env.ARTIFACT_NAME}}

      - name: Display structure of downloaded files
        run: ls -R
      - name: install kit
        uses: jozu-ai/gh-kit-setup@v1.0.0
      - name: login-to-jozuhub
        shell: bash
        env: 
            JOZU_EMAIL: ${{secrets.JOZU_EMAIL}}
            JOZU_PASSWORD: ${{secrets.JOZU_PASSWORD}}
        run: kit login jozu.ml -u "$JOZU_EMAIL" -p "$JOZU_PASSWORD"
      - name: pack-modelkit
        shell: bash
        env:
            REPOSITORY_NAME: ${{env.REPOSITORY_NAME}}
            TAG: ${{env.TAG}}
            USERNAME: ${{env.USERNAME}}
        run: kit pack . -t jozu.ml/$USERNAME/$REPOSITORY_NAME:$TAG
      - name: push-modelkit
        shell: bash
        env:
            REPOSITORY_NAME: ${{env.REPOSITORY_NAME}}
            TAG: ${{env.TAG}}
            USERNAME: ${{env.USERNAME}}
        run: kit push jozu.ml/$USERNAME/$REPOSITORY_NAME:$TAG

This pipeline consists of two jobs: unpack-to-model and push-to-jozuhub.

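  • unpack-to-model: Checks out the repository, installs the Kit CLI, unpacks only the LLAMA3 model layer into the models folder, and uploads the working directory (including the model) as a GitHub artifact.
  • push-to-jozuhub: Runs after unpack-to-model completes. Because each job starts on a fresh runner, it downloads the artifact and installs the Kit CLI again, then logs in to Jozu Hub, packs the directory into a ModelKit, and pushes it to your Jozu Hub repository.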

Because the LLAMA3 model is large (approximately 5GB), committing it to the repository would bloat it unnecessarily, and GitHub rejects files larger than 100 MiB in regular pushes. Instead, the pipeline unpacks the model at run time and passes it between jobs as an artifact, which keeps the repository lightweight.
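Since the pipeline, not Git, is responsible for fetching the model, it helps to keep your local copy of the weights from being committed by accident. A minimal sketch, assuming the models layout above: the hypothetical ignore_model_weights function adds a models/*.gguf rule to .gitignore (skipping it if already present) and lists any files in the working tree larger than 100 MiB. It assumes GNU or BSD find, which support the M size suffix.

```shell
# Hypothetical helper: keep local model weights out of Git commits.
ignore_model_weights() {
  local dir="${1:-.}"
  local gitignore="$dir/.gitignore"
  local rule='models/*.gguf'
  touch "$gitignore" || return 1
  # Make sure the file ends with a newline before appending a new rule.
  if [ -s "$gitignore" ] && [ -n "$(tail -c 1 "$gitignore")" ]; then
    printf '\n' >> "$gitignore"
  fi
  # Append the rule only if an identical line is not already present.
  grep -qxF "$rule" "$gitignore" || printf '%s\n' "$rule" >> "$gitignore"
  # List files over 100 MiB outside .git. find does not read .gitignore,
  # so ignored weights are listed too; check that anything else is expected.
  find "$dir" -path "$dir/.git" -prune -o -type f -size +100M -print
}
```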

The pipeline runs on every push to the master branch, and the workflow_dispatch trigger also lets you start it manually from the Actions tab. It defines the following workflow-level environment variables:

  • ARTIFACT_NAME: The name assigned to the GitHub artifact.
  • REPOSITORY_NAME: The name of your Jozu Hub repository.
  • USERNAME: Your Jozu username.
  • TAG: The tag assigned to the packed ModelKit.
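The pack-modelkit and push-modelkit steps combine these variables into one reference of the form jozu.ml/USERNAME/REPOSITORY_NAME:TAG. When debugging a run locally, you can build the same reference in your shell. The helper below is hypothetical; its empty-variable check is an addition that the workflow itself does not perform.

```shell
# Hypothetical helper: build the same reference the workflow passes to
# `kit pack -t` and `kit push`, failing if any variable is unset or empty.
modelkit_ref() {
  if [ -z "${USERNAME:-}" ] || [ -z "${REPOSITORY_NAME:-}" ] || [ -z "${TAG:-}" ]; then
    echo "USERNAME, REPOSITORY_NAME and TAG must all be set" >&2
    return 1
  fi
  printf 'jozu.ml/%s/%s:%s\n' "$USERNAME" "$REPOSITORY_NAME" "$TAG"
}
```

For example, after setting USERNAME, REPOSITORY_NAME=llama3-githubactions, and TAG=latest, running kit pack . -t "$(modelkit_ref)" and then kit push "$(modelkit_ref)" mirrors the two workflow steps.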

Run your pipeline

When you push to the master branch, the pipeline runs automatically. To follow its progress, open the Actions tab of your repository, which displays a visual graph of the workflow's jobs.

Run your pipeline

After your pipeline run is completed, you will see the ModelKit in your Jozu Hub repository.

Image in Jozu Hub

Adding a new version

In a typical machine learning scenario, you may want to deploy a new version whenever a change occurs. This change could be a modification in the data, an update to the model weights, a code change, or a new version of the documentation. Let’s simulate this type of change and redeploy a new version.

Create a new file in your docs folder named extra_features.md, and add the following content:

This is a random text to test the addition of a new version.

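When you push to the master branch, a new build starts. Note that the workflow above pushes with the fixed tag latest, so each run moves that tag to the newest ModelKit. To publish each version under the commit SHA that produced it, as in this example where the tag is c7ca3f2, set TAG from the commit SHA instead of hard-coding it.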
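The sketch below shows one way to derive a short, commit-based tag like c7ca3f2. It relies on GITHUB_SHA, which GitHub Actions sets to the SHA of the commit that triggered the run; the function name short_sha_tag and the 7-character length (chosen to match the example tag) are assumptions, not part of the original workflow.

```shell
# Hypothetical helper: derive a short, commit-based ModelKit tag.
# Uses the first argument if given, otherwise GITHUB_SHA (set by GitHub Actions).
short_sha_tag() {
  local sha="${1:-${GITHUB_SHA:-}}"
  if [ -z "$sha" ]; then
    echo "no commit SHA available" >&2
    return 1
  fi
  # Keep the first 7 characters, the same length as the example tag c7ca3f2.
  printf '%s\n' "$sha" | cut -c1-7
}
```

In a bash run step placed before pack-modelkit, defining this function and then running echo "TAG=$(short_sha_tag)" >> "$GITHUB_ENV" exposes the value to later steps in the same job. Because the pack and push steps re-declare TAG from ${{env.TAG}}, confirm in a test run that they pick up the new value.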

Deployment of new version

The example above demonstrates a modification to a documentation file, but similar versioning and deployment workflows apply to various other changes in a machine learning pipeline. These changes can include data modifications, such as updating training datasets, preprocessing steps, or feature engineering techniques. Since data quality directly affects model performance, such changes often require retraining the model to maintain accuracy and relevance.

By combining GitHub Actions with ModelKits, you can streamline deployment and automatically publish a new ModelKit version whenever your model or its artifacts change.

Wrapping up

Manually deploying your AI projects every time you make a change can be time-consuming and frustrating. With GitHub Actions and KitOps, you can automate packaging your models with their artifacts and publishing them to a registry.

KitOps simplifies packaging models and managing dependencies, while GitHub Actions streamlines the deployment process by automatically triggering workflows whenever changes are pushed. This automation leads to faster, more reliable deployments and enhances team collaboration.

If you have questions about integrating KitOps into your workflow, join the conversation on Discord and start using KitOps today!
