Raunak Jain

Posted on Feb 23

How to automatically remove completed Kubernetes Jobs created by a CronJob?

#kubernetes #devops

When you schedule recurring tasks with a CronJob, Kubernetes creates Jobs at the scheduled times. These Jobs run your tasks and then complete. Over time, completed Jobs can pile up and clutter your cluster. In this article, we will explain simple ways to automatically remove these completed Jobs. We use short sentences and simple words so that beginners can follow easily.

Introduction

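CronJobs help you run tasks on a schedule in Kubernetes. Each time a CronJob runs, it creates a Job. After a Job finishes, the Job object and its Pods stay in the cluster until something removes them. A CronJob keeps only a few finished Jobs by default (3 successful and 1 failed). With frequent schedules, raised limits, or Jobs created outside a CronJob, finished Jobs can still accumulate, take up space in etcd, and make your environment harder to manage.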

It is a common need to clean up these completed Jobs automatically. Kubernetes offers built-in features to do this. You can set limits on how many completed or failed Jobs to keep. You can also use a field called ttlSecondsAfterFinished in the Job specification to remove Jobs after a set time.

For more details on running batch jobs with CronJobs, please see How do I run batch jobs in Kubernetes with Jobs and CronJobs.

Why Remove Completed Jobs?

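A Job that you create directly is not deleted when it finishes. A Job created by a CronJob stays until the CronJob's history limits push it out. Either way, finished Jobs can build up over time. This buildup can:

Take up etcd storage and add load to API list requests.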
Make it hard to list and manage active Jobs.
Confuse monitoring and logging tools with outdated information.

Automatically removing completed Jobs keeps your cluster clean and reduces resource use. It also makes it easier to see which Jobs are still running or need attention.

Built-in Retention Settings in CronJobs

Kubernetes CronJobs come with settings that help manage the history of Jobs. Two important fields are:

successfulJobsHistoryLimit: This field tells Kubernetes how many successful (completed) Jobs to keep. For example, if you set it to 3, only the three most recent successful Jobs will be retained. The default is 3. Setting it to 0 keeps none.
failedJobsHistoryLimit: This field tells Kubernetes how many failed Jobs to keep. If you set it to 1, only the most recent failed Job will remain. The default is 1.

These fields help automatically delete old Jobs. They are defined in the CronJob spec. Here is a simple example of a CronJob YAML that uses these settings:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: my-cronjob
spec:
  schedule: "0 * * * *"  # Run every hour
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: my-job
              image: busybox
              args:
                - /bin/sh
                - -c
                - "echo Hello World; sleep 30"
          restartPolicy: OnFailure

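In this YAML file, Kubernetes keeps only the three most recent successful Jobs and one failed Job. Older Jobs are removed automatically, together with their Pods. These values match the defaults, so writing them out mainly makes the intent explicit. Change them to keep more or less history.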

For guidance on writing Kubernetes YAML files for your deployments and services, check out How do I write Kubernetes YAML files for deployments and services.

Using ttlSecondsAfterFinished

Another method to remove completed Jobs is to use the ttlSecondsAfterFinished field in the Job spec. This field specifies the time (in seconds) that a Job should be kept after it finishes. Once the time is up, Kubernetes automatically cleans up the Job.

Note that this feature has been stable since Kubernetes v1.23. It was beta and enabled by default in v1.21 and v1.22. On older clusters it was alpha and needed the TTLAfterFinished feature gate. When it is available, you can add the field to the jobTemplate in your CronJob. Here is an example:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: my-cronjob-ttl
spec:
  schedule: "0 * * * *"  # Run every hour
  jobTemplate:
    spec:
      ttlSecondsAfterFinished: 3600  # Remove Job 1 hour after completion
      template:
        spec:
          containers:
            - name: my-job
              image: busybox
              args:
                - /bin/sh
                - -c
                - "echo Hello with TTL; sleep 30"
          restartPolicy: OnFailure

In this YAML, each Job will be deleted 1 hour (3600 seconds) after finishing. This setting is handy if you want a time-based cleanup instead of a count-based cleanup.
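Because ttlSecondsAfterFinished belongs to the Job spec, it also works on a Job that is not created by a CronJob. The following is a minimal sketch. The name one-off-task and the 600-second value are assumptions chosen for illustration.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: one-off-task  # hypothetical name
spec:
  ttlSecondsAfterFinished: 600  # eligible for deletion 10 minutes after finishing
  template:
    spec:
      containers:
        - name: one-off-task
          image: busybox
          args:
            - /bin/sh
            - -c
            - "echo Hello from a one-off Job"
      restartPolicy: Never
```

This is useful for Jobs started by hand or by a CI pipeline, where no CronJob history limit applies.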

How It Works

When you use successfulJobsHistoryLimit and failedJobsHistoryLimit, the CronJob controller counts the finished Jobs that belong to the CronJob. It counts successful and failed Jobs separately. If a count exceeds its limit, Kubernetes deletes the oldest Jobs of that kind. This helps keep your Job list manageable.

The ttlSecondsAfterFinished field works differently. Kubernetes waits until the Job has finished, whether it succeeded or failed. Then, after the specified time has passed, the Job and its Pods are removed automatically. This allows you to keep a completed Job for a short period, which can be useful for debugging or auditing.

For more on how to manage the lifecycle of pods and Jobs, you might find it helpful to read How do I manage the lifecycle of a Kubernetes pod.

Best Practices

Here are some best practices when configuring automatic removal of completed Jobs:

Set Reasonable Limits: Choose values for successfulJobsHistoryLimit and failedJobsHistoryLimit that fit your workload. Keeping a few old Jobs is useful for debugging, but too many can clutter your environment.
Use ttlSecondsAfterFinished for Time-Based Cleanup: If your Jobs complete quickly and you do not need to keep them for long, use ttlSecondsAfterFinished. This is ideal for short-lived tasks.
Monitor Your CronJobs: Even with automatic cleanup, it is good to check your CronJobs regularly. Use kubectl get cronjob and kubectl get jobs to verify that cleanup is working as expected.
Test Changes in a Staging Environment: Before applying changes in production, test your CronJob settings in a development or staging cluster. This helps ensure that your cleanup settings work as intended and do not delete Jobs you still need.
Review Cluster Resources: Keeping too many completed Jobs can use up cluster resources like etcd storage. Automatic removal helps, but always monitor your cluster resource usage.

For a deeper understanding of how CronJobs work and how to manage batch jobs in Kubernetes, refer to How do I run batch jobs in Kubernetes with Jobs and CronJobs.

Troubleshooting

Sometimes, automatic cleanup settings might not work as expected. Here are a few troubleshooting tips:

Check YAML Configuration: Verify that you have correctly set the successfulJobsHistoryLimit, failedJobsHistoryLimit, or ttlSecondsAfterFinished fields in your CronJob YAML file. The history limits belong in the CronJob spec. The TTL field belongs in the Job spec under jobTemplate. Use a YAML validator if necessary.
Inspect Job Objects: Use the command kubectl get jobs to see if old Jobs are being removed. If they are not, review your CronJob configuration.
Review Cluster Version and Feature Gates: The ttlSecondsAfterFinished field is stable from Kubernetes v1.23. Older versions treat it as a beta or alpha feature, and alpha versions need the TTLAfterFinished feature gate. Ensure your cluster supports this feature and that it is enabled.
Logs and Events: Check the events with kubectl describe cronjob my-cronjob to see if there are any error messages related to Job cleanup.

If you continue to face issues, consider reviewing Kubernetes documentation or seeking help from community forums.

For more ideas on writing and managing Kubernetes YAML, you might find How do I write Kubernetes YAML files for deployments and services very useful.

Advanced Techniques

You can also combine both methods and use history limits together with a TTL. This approach gives you control over both the number of Jobs and the duration they are kept after completion. A Job is removed as soon as either rule applies, so the stricter setting wins. Tuning the two together lets you keep recent history for debugging while capping how much accumulates.
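A combined configuration could look like the sketch below. The name my-cronjob-combined and the specific values are assumptions for illustration, not recommendations.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: my-cronjob-combined  # hypothetical name
spec:
  schedule: "0 * * * *"  # Run every hour
  successfulJobsHistoryLimit: 3  # keep at most 3 successful Jobs
  failedJobsHistoryLimit: 5      # keep more failed Jobs for debugging
  jobTemplate:
    spec:
      ttlSecondsAfterFinished: 86400  # remove any Job 24 hours after it finishes
      template:
        spec:
          containers:
            - name: my-job
              image: busybox
              args:
                - /bin/sh
                - -c
                - "echo Hello with limits and TTL; sleep 30"
          restartPolicy: OnFailure
```

With an hourly schedule, the count limit removes successful Jobs long before the TTL expires. The TTL mainly bounds how long failed Jobs stay around.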

Another advanced approach is to use automation tools or scripts that periodically clean up Jobs. Although the built-in settings work well for most cases, custom scripts might be useful in special scenarios. These scripts can run as CronJobs themselves and delete Jobs based on custom criteria.
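One possible shape for such a cleanup CronJob is sketched below. Every name here (job-cleaner, the default namespace, the image) is an assumption. The image is a placeholder for any image that ships kubectl. The Role grants only what the delete command needs.

```yaml
# ServiceAccount the cleanup Pods run as (all names are hypothetical)
apiVersion: v1
kind: ServiceAccount
metadata:
  name: job-cleaner
  namespace: default
---
# Allow listing and deleting Jobs in this namespace only
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: job-cleaner
  namespace: default
rules:
  - apiGroups: ["batch"]
    resources: ["jobs"]
    verbs: ["get", "list", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: job-cleaner
  namespace: default
subjects:
  - kind: ServiceAccount
    name: job-cleaner
    namespace: default
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: job-cleaner
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: job-cleaner
  namespace: default
spec:
  schedule: "30 2 * * *"  # Run once a day at 02:30
  successfulJobsHistoryLimit: 1
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: job-cleaner
          containers:
            - name: kubectl
              # Placeholder: replace with an image that contains kubectl
              image: registry.example.com/tools/kubectl:latest
              command:
                - kubectl
                - delete
                - jobs
                - --namespace=default
                # Only Jobs that finished with exactly one successful Pod
                - --field-selector=status.successful=1
          restartPolicy: Never
```

This command deletes every matching Job in the namespace, including earlier runs of the cleaner itself. To narrow it, add a label to the Jobs you want cleaned and pass a --selector argument as the custom criterion.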

Summary and Final Thoughts

Automatically removing completed Kubernetes Jobs created by a CronJob is essential for keeping your cluster clean. You have two main options:

Retention Limits: Use successfulJobsHistoryLimit and failedJobsHistoryLimit in your CronJob spec to limit how many completed Jobs are kept. This method removes the oldest Jobs when the limit is exceeded.
Time-Based Cleanup: Use ttlSecondsAfterFinished in the Job spec to remove Jobs after a set time once they have finished.

Both methods can be combined to suit your needs. They help free up cluster resources and simplify management. Remember to monitor your CronJobs and test your settings in a safe environment before deploying to production.

For more insights on managing the lifecycle of your pods and Jobs, consider checking out How do I manage the lifecycle of a Kubernetes pod.

By following these practices and using the built-in features of Kubernetes, you can maintain a clean and efficient cluster. With proper setup, your CronJobs will run smoothly, and old Jobs will be automatically removed without manual intervention.

Happy coding and best of luck with your Kubernetes projects!

DEV Community
