Originally published at nootcode.com

How Does Kafka Log Compaction Work?

What is Log Compaction?

Log Compaction is Kafka's intelligent way of managing data retention. Instead of simply deleting old messages, it keeps the most recent value for each message key while removing outdated values. This approach is especially valuable when you need to maintain the current state of your data, such as with database changes or configuration settings.


How Log Compaction Works

1. Log Storage Structure

Kafka's cleaner views each partition's log as two regions, illustrated below:

  • Clean region: segments that have already been compacted
  • Dirty region: newer segments not yet compacted (the active segment currently being written is never cleaned)
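
A rough picture of the layout (the offsets are illustrative):

offsets:  0 ................ 4999 | 5000 ................ 9999+
          [  clean (compacted)  ]   [ dirty (awaiting cleaning) ]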

2. Compaction Process

The compaction process consists of two main phases (a toy sketch of both follows the list):

  1. Scanning Phase:

    • Scans the messages in the dirty region
    • Builds an in-memory "offset map" from each message key to the offset of its latest record
  2. Cleaning Phase:

    • Recopies the log, preserving only the most recent record for each key
    • Discards records superseded by a newer record with the same key
    • Keeps surviving records at their original offsets and in their original order
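
To make the two phases concrete, here's a toy simulation in plain Java — no Kafka dependency, with invented keys and values, and a list index standing in for the offset:

import java.util.*;

public class CompactionSketch {
    record Rec(String key, String value) {}

    public static void main(String[] args) {
        // A "dirty region" of keyed records; the list index plays the offset.
        List<Rec> dirty = List.of(
            new Rec("1001", "John"),
            new Rec("1002", "Alice"),
            new Rec("1001", "John Smith"));

        // Scanning phase: map each key to the offset of its latest record.
        Map<String, Integer> offsetMap = new HashMap<>();
        for (int offset = 0; offset < dirty.size(); offset++)
            offsetMap.put(dirty.get(offset).key(), offset);

        // Cleaning phase: a record survives only if it is still the latest
        // for its key; survivors keep their original offsets and order.
        for (int offset = 0; offset < dirty.size(); offset++)
            if (offsetMap.get(dirty.get(offset).key()) == offset)
                System.out.printf("offset=%d key=%s value=%s%n",
                        offset, dirty.get(offset).key(), dirty.get(offset).value());
        // Prints: offset=1 key=1002 value=Alice
        //         offset=2 key=1001 value=John Smith
    }
}

A real cleaner bounds its offset map with log.cleaner.dedupe.buffer.size, so a very large dirty region may be compacted over several runs.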

3. Compaction Triggers

Compaction kicks in when:

  • A log's dirty (uncompacted) ratio exceeds min.cleanable.dirty.ratio
  • A record has stayed uncompacted longer than max.compaction.lag.ms, which forces cleaning even below that ratio
  • A cleaner thread wakes up (it sleeps log.cleaner.backoff.ms between checks) and finds an eligible log

How to Configure Log Compaction?

Here's how to set broker-wide defaults in server.properties (a per-topic override is sketched after the block):

# Enable compaction as the default cleanup policy for topics
log.cleanup.policy=compact

# How long a cleaner thread sleeps when there is nothing to clean
log.cleaner.backoff.ms=30000

# Minimum dirty-to-total ratio before a log is eligible for cleaning
log.cleaner.min.cleanable.ratio=0.5

# Number of background cleaner threads
log.cleaner.threads=1
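
In practice compaction is usually enabled per topic rather than broker-wide. Here's a minimal sketch using Kafka's Java AdminClient (the topic name, partition count, replication factor, and bootstrap address are placeholders):

import org.apache.kafka.clients.admin.*;
import org.apache.kafka.common.config.TopicConfig;
import java.util.*;

public class CreateCompactedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (Admin admin = Admin.create(props)) {
            NewTopic topic = new NewTopic("user-profiles", 3, (short) 1)
                .configs(Map.of(
                    // Per-topic equivalents of the broker settings above
                    TopicConfig.CLEANUP_POLICY_CONFIG, TopicConfig.CLEANUP_POLICY_COMPACT,
                    TopicConfig.MIN_CLEANABLE_DIRTY_RATIO_CONFIG, "0.5"));
            admin.createTopics(List.of(topic)).all().get();
        }
    }
}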

Use Cases

Log compaction is best suited for the following scenarios:

1. Database Change Records

Example of user information updates:

  • Initial record: key=1001, value=John
  • Update record: key=1001, value=John Smith
  • After compaction: key=1001, value=John Smith
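
As a sketch, producing these updates with Kafka's Java client looks like this (the topic name is a placeholder); both records share key 1001, so compaction eventually keeps only the second value:

import org.apache.kafka.clients.producer.*;
import java.util.Properties;

public class UserUpdateProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Same key, two values: only "John Smith" survives compaction.
            producer.send(new ProducerRecord<>("user-profiles", "1001", "John"));
            producer.send(new ProducerRecord<>("user-profiles", "1001", "John Smith"));
        }
    }
}

The configuration-management case below works exactly the same way, just with the setting name as the key.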

2. System Configuration Management

Example of connection settings:

  • Initial config: key=max_connections, value=100
  • Updated config: key=max_connections, value=200
  • After compaction: key=max_connections, value=200

3. State Data Storage

  • Maintain latest entity states
  • Save storage space
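
The usual pattern here is to rebuild a state table by reading the compacted topic from the beginning and folding records into a map. A minimal sketch (topic name and group id are placeholders; a production rebuilder would also treat null-value "tombstone" records as deletes):

import org.apache.kafka.clients.consumer.*;
import java.time.Duration;
import java.util.*;

public class StateRebuilder {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "state-rebuilder");
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");

        Map<String, String> state = new HashMap<>();
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("user-profiles"));
            // Later records overwrite earlier ones, so the map converges on
            // the latest value per key — exactly what compaction preserves.
            long deadline = System.currentTimeMillis() + 10_000; // bounded read for the sketch
            while (System.currentTimeMillis() < deadline) {
                for (ConsumerRecord<String, String> rec : consumer.poll(Duration.ofMillis(500)))
                    state.put(rec.key(), rec.value());
            }
        }
        System.out.println(state);
    }
}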

Important Considerations

When using log compaction, keep these points in mind:

  1. Messages Must Have Keys

    • Compaction deduplicates by key, so every record needs one
    • Brokers reject records with a null key on compacted topics
  2. Impact on System Performance

    • Cleaning consumes CPU, memory (the dedupe buffer), and disk I/O
    • Tune log.cleaner.threads and log.cleaner.dedupe.buffer.size to match your workload
  3. Message Order Guarantees

    • Compaction never reorders records; it only removes them
    • Surviving records keep their original offsets, so order within a partition is preserved

Summary

Kafka's log compaction offers a smart way to manage data retention. It's a perfect fit when you only need the latest state of your data, saving storage space while keeping that state immediately available to consumers. Properly configured, it can significantly improve your Kafka cluster's storage efficiency.
