Christian Lechner

Posted on Feb 25

Automating an Open Source Project with GitHub Actions

#devops #github #githubactions #automation

Introduction

If you work on GitHub, and maybe even maintain an open-source project, GitHub Actions is an excellent tool for automating tasks that go far beyond "typical" CI/CD automation.

In this blog post I describe the GitHub Actions setup that we run in our open-source projects, in the context of developing a tool for Terraform/OpenTofu that handles the import of existing setups.

Project Parameters

In this blog post we focus on the GitHub repository https://github.com/SAP/terraform-exporter-btp and its setup. The project is open source and contains a CLI written in Go.

The CLI handles Terraform tasks around the import of existing resources, so the repository also contains some Terraform configurations in the HashiCorp Configuration Language (HCL), mainly for the setup of the integration tests.

All changes to the repository must be made via pull requests. This is safeguarded by protecting the main branch via a ruleset. The configuration is part of the repository settings.

The CLI is published as a release that is built with GoReleaser.

All interaction with the community happens via the repository, namely via GitHub Issues and GitHub Discussions, including milestones that are assigned to the issues.

From a project management perspective, i.e., for managing the backlog, we use GitHub Projects. We work on one central project that contains the issues of all our Terraform-specific GitHub repositories.

The documentation of the CLI is also part of the repository and is published via GitHub Pages.

As we are part of an organization that belongs to a corporation, two more requirements come into play:

The repository should be attached to Sonar Cloud to have a standardized way of monitoring code quality.
The security vulnerability reporting is managed via a central approach of the organization. Hence, we have not activated GitHub's private vulnerability reporting.

Our Goal

Based on these parameters and requirements, we wanted to automate as many of the repetitive tasks we face in our daily work as possible and keep the overall quality of the repository and its contents as high as possible.

In addition, we wanted to make life as easy as possible for the users who file issues or contribute code.

This covers the code itself, of course, but also more generic tasks such as the handling of issues and PRs.

As we are on GitHub, it is a natural choice to use GitHub Actions for this goal. Let's see what we automated, how we did that and what to consider.

GitHub Action Areas

We use GitHub Actions (or other automation capabilities of GitHub) in different areas. We will discuss the different areas in dedicated sections.

We will cover the following topics:

Basic project hygiene
Issues
Pull requests
CI tasks, especially testing
Integration Tests
Releases
Documentation Generation

Let us walk through the different areas and learn what approach we took.

Basic project hygiene

Some basic functionality is always in use when we work on GitHub, namely:

Dependabot, to keep the project's dependencies up to date. We configured this via dependabot.yml to cover the ecosystems used in our project, like Golang or GitHub Actions.
CodeQL, for vulnerability scanning of the code. We use a configuration defined in a codeql.yml file. CodeQL is triggered on pull requests, on pushes to the main branch, and periodically.
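To make the Dependabot part more tangible, here is a minimal sketch of what such a dependabot.yml could look like. It assumes that the Go module and the workflows live at the repository root and that a weekly schedule is fine; both are assumptions for illustration and not taken from the project.

```yaml
# .github/dependabot.yml (illustrative sketch, not the project's actual file)
version: 2
updates:
  # Go modules; assumes go.mod sits in the repository root
  - package-ecosystem: "gomod"
    directory: "/"
    schedule:
      interval: "weekly"   # assumed cadence
  # Versions of the actions referenced in .github/workflows
  - package-ecosystem: "github-actions"
    directory: "/"
    schedule:
      interval: "weekly"   # assumed cadence
```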

We have also switched on Secret Scanning for generic secrets. This configuration is part of the repository settings and available under the Code Security settings.

As we have markdown files in our repository that contain links to places in the repository or to further external documentation, we want to:

have spell checking of the markdown files
have a periodic check of the links, because what is more annoying than clicking on a link and then getting a 404?

Both requirements are covered by GitHub Actions.

Checking the links that are part of the repository is handled by a periodically executed workflow. The core of the check is the lychee link checking action.

A nice feature of this action is that it creates a report whenever it detects problems, such as a site not being available.
We use this report to automatically create an issue via the Create Issue From File action. This way we do not need to check the results of the workflow runs but see them immediately in our issues.

Of course we do not want to have multiple issues for that, so if an issue exists and the workflow runs again, it should update the existing issue. We achieve this by combining the Create Issue From File action with the Find Last Issue action. To identify the issue, we use issue labels. Luckily the two actions work perfectly together to cover this requirement.
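The combination of the three actions could be sketched roughly like this. The schedule, the label name link-check and the file scope are assumptions, and the input and output names are given as I remember them from the READMEs of the actions, so please double-check them against the versions you pin.

```yaml
# .github/workflows/link-check.yml (illustrative sketch)
name: Link check

on:
  schedule:
    - cron: "0 6 * * 1"   # assumed: every Monday at 06:00 UTC
  workflow_dispatch:

permissions:
  contents: read
  issues: write           # needed to create or update the report issue

jobs:
  link-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Check links
        id: lychee
        uses: lycheeverse/lychee-action@v2
        with:
          # Assumed scope: all markdown files in the repository
          args: --no-progress './**/*.md'
          output: ./lychee/out.md
          fail: false     # keep the job green; the issue carries the result

      - name: Find an existing report issue
        id: last-issue
        if: steps.lychee.outputs.exit_code != 0
        uses: micalevisk/last-issue-action@v2
        with:
          state: open
          labels: link-check   # assumed label name

      - name: Create or update the report issue
        if: steps.lychee.outputs.exit_code != 0
        uses: peter-evans/create-issue-from-file@v5
        with:
          title: Link checker report
          content-filepath: ./lychee/out.md
          # Empty when no open issue was found, in which case a new one is created
          issue-number: ${{ steps.last-issue.outputs.issue-number }}
          labels: link-check
```

The label does double duty here: it marks the report issue and is the key that the lookup step uses to find it again on the next run.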

With these basic automations in place, let us look at the next topic, namely issues.

Issues

We use issues for bug reports as well as for feature requests. There are also freestyle issues that we might use internally, but the main interaction with users happens via bug reports and feature requests.

In general, we offer two different issue templates for bugs and feature requests to make the data entry and filing the necessary information as easy as possible.

To make this as convenient as possible for our users, we use issue forms, as they make data entry much easier by offering features like checkboxes, drop-down menus etc. The template also defines the labels that get attached to the issue once it is created.

What is there to automate once an issue is filed? First, we do not want to add the issues to our central project board manually, so we trigger a workflow whenever an issue is opened that does this for us.

This is not much work on our end, as GitHub provides an action for this called actions/add-to-project. As GitHub Projects are separate entities that are not directly linked to the repository, a GitHub token with the right permissions is needed.
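A minimal sketch of such a workflow is shown here. The project URL is a placeholder and the secret name ADD_TO_PROJECT_TOKEN is hypothetical; use whatever name you gave the token that carries the project permissions.

```yaml
# .github/workflows/add-issue-to-project.yml (illustrative sketch)
name: Add issue to project

on:
  issues:
    types: [opened]

jobs:
  add-to-project:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/add-to-project@v1.0.2
        with:
          # Placeholder URL; replace with the URL of your project board
          project-url: https://github.com/orgs/<org>/projects/<number>
          # Hypothetical secret name. The default GITHUB_TOKEN is scoped to
          # the repository and cannot access projects.
          github-token: ${{ secrets.ADD_TO_PROJECT_TOKEN }}
```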

For feature requests we want to make people aware that we prioritize based on the feedback from the community. To make these rules of the game obvious, we add an automated comment with a community note to every feature request using the Create or Update Comment action that adds a predefined text as a comment.
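As a rough sketch, the community note could be added like this. I assume here that the feature request form attaches the label enhancement and that this label is already part of the event payload when the issue is opened; the comment text is only a placeholder.

```yaml
# .github/workflows/community-note.yml (illustrative sketch)
name: Community note

on:
  issues:
    types: [opened]

permissions:
  issues: write

jobs:
  community-note:
    # Assumption: feature requests carry the label "enhancement"
    if: contains(github.event.issue.labels.*.name, 'enhancement')
    runs-on: ubuntu-latest
    steps:
      - uses: peter-evans/create-or-update-comment@v4
        with:
          issue-number: ${{ github.event.issue.number }}
          # Placeholder wording, not the project's actual note
          body: |
            Thanks for the feature request!
            We prioritize feature requests based on the feedback from the community.
```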

There is one scenario that probably has crossed your path as a maintainer of a project: what if you need to clarify something that was brought up via an issue?

Whenever we need a clarification from a reporter, we add a label named needs-author-feedback to the issue. This label gets removed whenever a new comment is added, via a (you guessed it) workflow that uses the Remove Labels action.
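A workflow along these lines could do the job. It is a sketch under the assumption that the action's token input defaults to the workflow token, which is how I remember it from the README.

```yaml
# .github/workflows/remove-feedback-label.yml (illustrative sketch)
name: Remove needs-author-feedback

on:
  issue_comment:
    types: [created]

permissions:
  issues: write

jobs:
  remove-label:
    # Only act when the label is actually present on the issue
    if: contains(github.event.issue.labels.*.name, 'needs-author-feedback')
    runs-on: ubuntu-latest
    steps:
      - uses: actions-ecosystem/action-remove-labels@v1
        with:
          labels: needs-author-feedback
```

Note that this sketch reacts to any new comment, as described above. If you only want the reporter's reply to count, you could extend the condition to compare github.event.comment.user.login with github.event.issue.user.login.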

What if the reporter does not answer? At some point in time, you probably want to get rid of the issue. We handle this with another workflow, using an action provided by GitHub called Close Stale Issues and PRs.

We use this in a periodic workflow to first mark the issue as stale after 15 days without a reply and finally close it after an additional 5 days. Every step is accompanied by a comment on the issue that gets created by the workflow. This keeps our repository clean without the need to continuously check for stale issues manually.
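The timing described above maps directly to the inputs of the action. Here is a minimal sketch; the cron expression and the message texts are assumptions for illustration.

```yaml
# .github/workflows/stale.yml (illustrative sketch)
name: Close stale issues and PRs

on:
  schedule:
    - cron: "30 1 * * *"   # assumed: once a day
  workflow_dispatch:

permissions:
  issues: write
  pull-requests: write

jobs:
  stale:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/stale@v9
        with:
          days-before-stale: 15   # mark as stale after 15 days of inactivity
          days-before-close: 5    # close 5 days after the stale mark
          # Placeholder texts for the comments the action leaves behind
          stale-issue-message: "This issue is stale because there has been no activity for 15 days."
          close-issue-message: "This issue was closed because there was no activity after it was marked as stale."
          stale-pr-message: "This PR is stale because there has been no activity for 15 days."
          close-pr-message: "This PR was closed because there was no activity after it was marked as stale."
```

If only issues that wait for the reporter should be considered, the action's only-labels input can restrict the run to a label such as needs-author-feedback.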

Note - What about questions? We use GitHub Discussions to cover this area, but there is no automation needed for us so far in this context.

That covers the handling of the issues. Let us look next at pull requests.

Pull requests

As mentioned before, we require pull requests (PRs) for any code change in the repository. To have a structured code review process, we have a pull request template in place that keeps the PRs uniform and helps the author as well as the reviewer with the main points that need to be described.

There are some repetitive tasks that need to be done for each PR and as we are lazy, the following tasks are delegated to a workflow:

Adding the PR to the central project board
Adding the creator of the PR as assignee except for PRs opened by Dependabot
Adding the next open milestone
Setting the default labels based on the PR title

We add the PR to the project board via the same action that we use for issues. The other three tasks require a bit more coding, as we did not find any fitting actions on the GitHub Marketplace. Fortunately, it is quite easy to fulfill the requirements with a bit of Bash or Node.js code.

Note - When diving into the GitHub API to find the right endpoints, you might be surprised that there is only a very limited number of dedicated endpoints for pull requests. The reason for this is that technically pull requests are issues with some features on top. Check out the endpoints for issues and you will probably find what you are looking for.

To make the creator of the PR the assignee, we use the GitHub CLI that is available in the GitHub-hosted runners with this code snippet:

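- name: Add creator as assignee
  # github.actor is the user who triggered the run; for a newly opened PR
  # that is the PR creator. PRs opened by Dependabot are skipped.
  if: ${{ github.actor != 'dependabot[bot]' }}
  env:
    GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
  run: |
    # Pull requests are issues in the REST API, so the issue endpoint is used
    gh api \
      --method POST \
      -H "Accept: application/vnd.github+json" \
      -H "X-GitHub-Api-Version: 2022-11-28" \
      /repos/${{ github.repository }}/issues/${{ github.event.number }}/assignees \
      -f "assignees[]=${{ github.actor }}"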

We use the information available in the action about the actor (github.actor) and call the API for the assignees. This is straightforward.

What about the next open milestone and the labels?

There is some more logic needed here, so we decided to use a small piece of Node.js code to achieve our goals. As a basis we use the GitHub Script action that contains a pre-authenticated Octokit REST client to call the API endpoints.

For the milestone requirement we must get the next open milestone and assign it to the PR. This is done via:

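- name: Add next milestone
  uses: actions/github-script@v7
  with:
    script: |
      // Fetch the open milestones, earliest due date first
      const milestones = await github.rest.issues.listMilestones({
        owner: context.repo.owner,
        repo: context.repo.repo,
        state: "open",
        sort: "due_on",
        direction: "asc"
      });

      // Nothing to assign if the repository has no open milestone
      if (milestones.data.length === 0) {
        core.info("No open milestone found, skipping");
        return;
      }

      await github.rest.issues.update({
        owner: context.repo.owner,
        repo: context.repo.repo,
        issue_number: context.issue.number,
        milestone: milestones.data[0].number
      });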

The GitHub Script action helps a lot, as you can directly interact with the API without wrapping your head around authentication. You also get the context information of the workflow injected. For larger pieces of code, I prefer dedicated files that I then call via Node.js instead of putting the code in the action. For a snippet this small, that would be a bit too much though.

As a last piece, we add the labels derived from the PR title, which must follow the Conventional Commits format. We use the type prefix of the title to decide which labels to set:

- name: Set default labels
  if: ${{ github.actor != 'dependabot[bot]' }}
  uses: actions/github-script@v7
  env:
    TITLE: ${{ github.event.pull_request.title }}
  with:
    script: |
      const title = process.env.TITLE;

      let defaultLabels = [];
      if (title.startsWith("feat:")) {
        defaultLabels.push("enhancement");
      } else if (title.startsWith("fix:")) {
        defaultLabels.push("bug");
      } else if (title.startsWith("docs:")) {
        defaultLabels.push("documentation");
      } else if (title.startsWith("test:")) {
        defaultLabels.push("test setup", "internal", "ignore-for-release");
      } else if (title.startsWith("refactor:")) {
        defaultLabels.push("internal", "ignore-for-release", "refactoring");
      } else {
        defaultLabels.push("internal", "ignore-for-release");
      }

      await github.rest.issues.addLabels({
        owner: context.repo.owner,
        repo: context.repo.repo,
        issue_number: context.issue.number,
        labels: defaultLabels
      });

We exclude PRs filed via Dependabot.

Now the reviewer can focus on the review and does not need to click around to fulfill organizational requirements. Don't get me wrong: the labels and the assignment to the project are important, but this could and should be automated.

Note - Maybe you are asking yourself if I am aware of the GraphQL endpoints of GitHub. Yes I am, but up to now I had only one situation where I saw a benefit in using the GraphQL API without breaking my hands to get a result. You can achieve everything mentioned here with GraphQL, but I am happy with the REST endpoints ... maybe I am also not smart enough to use GraphQL 😉

As with issues, we might run into stale PRs. Their handling uses the same workflow as for stale issues described in the previous section.

When opening a PR you also want to know if the code follows your quality standards, namely if tests pass etc. In the next section we describe what we automated when it comes to this area.

CI

When a PR is opened, the following workflows get triggered:

We execute a spell check to see if any typos made their way into the PR.
We validate the PR title to ensure that it follows the conventional commits logic. This is done via a workflow that uses the action-semantic-pull-request.
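The title validation needs very little configuration. The following sketch follows the setup shown in the README of the action as far as I remember it; the workflow name and file name are made up.

```yaml
# .github/workflows/pr-title.yml (illustrative sketch)
name: Validate PR title

on:
  pull_request_target:
    types: [opened, edited, synchronize, reopened]

permissions:
  pull-requests: read

jobs:
  validate-title:
    runs-on: ubuntu-latest
    steps:
      # No checkout needed: the action only inspects the PR metadata
      - uses: amannn/action-semantic-pull-request@v5
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```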

In addition, we have one main test workflow that bundles several checks, namely:

Can the Go project be built?
Are the unit tests executed successfully?
Is the documentation that is generated for the CLI up to date?

All these parts are Go CLI commands executed in the runner of the workflow.

As a last step we send the test coverage report from the unit tests to Sonar Cloud using the SonarQube Scan Action.
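Put together, the build, test and Sonar steps could look roughly like the following sketch. The file name, the coverage file name and the secret name SONAR_TOKEN are assumptions, and the check for the generated documentation is left out because it depends on project-specific commands.

```yaml
# .github/workflows/test.yml (illustrative sketch)
name: Test

on:
  pull_request:
  workflow_call:    # lets other workflows reuse this one

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0   # full history, useful for the Sonar analysis

      - uses: actions/setup-go@v5
        with:
          go-version-file: go.mod

      - name: Build
        run: go build ./...

      - name: Unit tests with coverage
        run: go test ./... -coverprofile=coverage.out

      # Assumes sonar-project.properties points sonar.go.coverage.reportPaths
      # to coverage.out
      - name: Sonar scan
        uses: SonarSource/sonarqube-scan-action@v4
        env:
          SONAR_TOKEN: ${{ secrets.SONAR_TOKEN }}   # assumed secret name
```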

Note - There is one important thing to mention here when it comes to external tools and Dependabot. You usually use an API key or the like to be able to call external tools like Sonar Cloud. You store this information as a secret in your GitHub repository.

This works fine until a PR gets opened by Dependabot. For security purposes, GitHub decided not to propagate the secret if the actor is Dependabot. To make the secret accessible to Dependabot, you must define it as a Dependabot secret.

These checks running for a PR are also defined as required status checks that must pass as part of the main branch protection ruleset.

Integration Test

Besides the unit tests that are part of the test workflow, we also implemented an integration test workflow.

This workflow is more complex and interacts with real infrastructure. The idea is to execute the CLI functionality which imports infrastructure into Terraform and compare the created state with the expected state on resource level.

The main flow is:

Install the CLI built from source
Run the CLI to generate the Terraform code
Execute a state import
Convert the state into JSON format
Compare the newly created state with a reference state

The comparison of the two JSON state files is done by a Node.js script that is called as part of the workflow.

Most of the workflow configuration deals with bringing the state files into the right format. What is worth mentioning is the storage of the reference state file.

We store this as a secret in the GitHub repository. There is only one small obstacle: you cannot directly store a JSON file as a secret.

Consequently, we use a little trick: we encode the JSON file as a base64 string and store the encoded string as a secret. When the workflow is executed we take the secret, decode it and pass it to the Node.js script.
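The decoding part of this trick boils down to one shell command. In the following sketch the secret name, the file names and the script path are hypothetical and only illustrate the idea.

```yaml
# Steps inside the integration test job (illustrative sketch).
# The secret was created once from the encoded file content, for example
# with `base64 -w0 reference-state.json` on Linux.
- name: Restore reference state
  env:
    # Hypothetical secret name holding the base64-encoded JSON file
    REFERENCE_STATE_B64: ${{ secrets.REFERENCE_STATE_B64 }}
  run: echo "$REFERENCE_STATE_B64" | base64 --decode > reference-state.json

- name: Compare states
  # Hypothetical script path and file names
  run: node .github/scripts/compare-states.js state.json reference-state.json
```

Passing the secret via env instead of inlining the expression in the command keeps the encoded value out of the generated shell script.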

Now that all tests are automated and in place, it is time for the fun part: releasing a new version.

Release

The moment of truth is for sure the creation of a new release of the CLI. We do so by pushing a new tag to the repository which triggers ...(drumroll) a workflow.
This workflow uses the concept of reusable workflows to trigger the tests before the release via GoReleaser starts. Better safe than sorry!

Be aware that you might need to explicitly propagate secrets to the called workflows (see documentation).

The workflow executes the following steps:

Call of the test workflow
Call of the integration test workflow
Execution of GoReleaser if the test workflows have been executed successfully
Generation of the documentation and deployment to GitHub Pages
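The steps above can be sketched as one workflow that chains reusable workflows via needs. The file names of the called workflows and the tag pattern are assumptions, and each called workflow must declare the workflow_call trigger.

```yaml
# .github/workflows/release.yml (illustrative sketch)
name: Release

on:
  push:
    tags:
      - "v*"   # assumed tag pattern

jobs:
  test:
    uses: ./.github/workflows/test.yml               # hypothetical file name
    secrets: inherit                                 # pass secrets on explicitly

  integration-test:
    uses: ./.github/workflows/integration-test.yml   # hypothetical file name
    secrets: inherit

  release:
    needs: [test, integration-test]   # runs only if both test jobs succeeded
    runs-on: ubuntu-latest
    permissions:
      contents: write                 # needed to create the GitHub release
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0              # GoReleaser needs the full tag history
      - uses: actions/setup-go@v5
        with:
          go-version-file: go.mod
      - uses: goreleaser/goreleaser-action@v6
        with:
          args: release --clean
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

  docs:
    needs: release
    permissions:
      contents: write                 # the docs workflow pushes to gh-pages
    uses: ./.github/workflows/docs.yml               # hypothetical file name
```

Using secrets: inherit is the shortest way to hand all secrets of the caller to the called workflow; alternatively you can pass selected secrets by name.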

[Screenshot: a release run]

The release notes are also created automatically by the GoReleaser step using GitHub-native features. The format is defined in a dedicated configuration file that maps the labels to sections and gives us nicely formatted notes with emojis.
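For illustration, such a label-to-section mapping could look like the following sketch of GitHub's release notes configuration. The section titles and emojis are made up; the labels are the ones set by the labeling step shown earlier. On the GoReleaser side, my assumption is that the changelog is switched to GitHub's generator via the setting changelog.use: github-native.

```yaml
# .github/release.yml (illustrative sketch, titles are assumptions)
changelog:
  exclude:
    labels:
      - ignore-for-release   # internal changes stay out of the notes
  categories:
    - title: "🚀 Features"
      labels:
        - enhancement
    - title: "🐛 Bug fixes"
      labels:
        - bug
    - title: "📚 Documentation"
      labels:
        - documentation
    - title: "Other changes"
      labels:
        - "*"               # catch-all for everything not matched above
```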

Note - There is also a pre-release check workflow that contains the two test workflows. We trigger this one manually when a release date is approaching to avoid unwanted surprises on release day.

Generation of GH Pages

The documentation of the CLI is provided via a GitHub page as part of the repository. We are using MkDocs to generate the content, but I think most of the tools in that area are well integrated with GitHub and GitHub Actions.

The corresponding workflow is therefore short: it installs the MkDocs CLI and then calls it with the gh-deploy option.

This workflow can be triggered manually but is also integrated into the release process as the final step. So, no manual work is needed for the documentation part.
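A minimal sketch of such a documentation workflow is shown here. The file name is an assumption, and depending on the theme and plugins of your site you will need to install more packages than just mkdocs.

```yaml
# .github/workflows/docs.yml (illustrative sketch)
name: Publish documentation

on:
  workflow_dispatch:   # manual trigger
  workflow_call:       # allows the release workflow to call it

permissions:
  contents: write      # gh-deploy pushes to the gh-pages branch

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.x"
      - run: pip install mkdocs   # add theme or plugin packages if needed
      - run: mkdocs gh-deploy --force
```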

What about Automating GitHub Projects?

GitHub Projects offers built-in automation, which we use to automatically set the status of an issue in the project whenever the issue is added, closed, or reopened.

We did not yet run into the need to add GitHub Actions here; however, it would be possible.

Points to Consider

There are two big points to consider when using GitHub Actions:

For many triggers, for example manual or scheduled runs, the workflow file must first land in the main branch before the workflow can be executed. From then on you can run it from a branch. So, whenever you add a new workflow, you are either a GitHub magician or you will probably need at least two PRs to end up with a working setup.
GitHub does not offer a way to run a workflow locally. There are projects that try to fill this gap; one prominent example is act. I had it running on a Windows machine until it stopped working (probably a Docker update killed it), and I also had some issues on a Mac with Apple silicon, again probably due to Docker. In the end it is cumbersome for the user, no matter what the root cause is.

These are points that you must take into account when developing workflows. It is a bit of an inconvenience, but in my experience worth it (and maybe you have more luck with act, which would remove at least some of the pain).

Conclusion

Maintaining one or maybe several open-source projects is work. Even if it is part of your job, you probably want to focus on improving the features and functions of the project and not deal with tedious repetitive tasks that could be automated.

In this blog post, we described the areas where we use GitHub Actions to automate these tasks, either by using and combining existing actions available on the GitHub Marketplace or by using some custom Bash or Node.js code in combination with the GitHub API.

Maybe you got some inspiration on what is possible, or maybe you know better ways to achieve this. If this is the case, I am curious to learn about them in the comments!

What comes next for us? We are continuously evaluating if there are other points worth automating. Currently there is no hot candidate on our list, but let's see what the future will bring.

With that - happy automating with GitHub!
