ZooKeeper keeps many distributed systems coordinated, so when a ZooKeeper server refuses to start, everything that depends on it feels it. I recently hit exactly that: a server that crashed on startup with an EOFException, sitting next to an intriguing 0-length transaction log file.
The Error:
ERROR [main:Util@214] - Last transaction was partial.
ERROR [main:ZooKeeperServerMain@66] - Unexpected exception, exiting abnormally
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
The server threw an EOFException while reading its transaction log on startup and exited abnormally, so it never came up.
The Mysterious 0-Length File:
Listing the ZooKeeper data directory, I noticed something odd:
-rw-rw-r-- 1 zookeeper zookeeper 1342177280 Sep 23 23:17 log.be45c2
-rw-rw-r-- 1 zookeeper zookeeper 0 Oct 1 14:31 log.be85e9
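To check other ensemble members for the same symptom, a small helper can list zero-length log.* files in a directory. This is a minimal sketch; the directory is passed as the first argument (the "." fallback is only a placeholder). On a typical install the log.* files live in a version-2 subdirectory of dataLogDir (or dataDir when no separate dataLogDir is configured).

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class EmptyLogFinder {

    // Returns every regular file in dir whose name matches the glob "log.*"
    // and whose size is exactly 0 bytes (like log.be85e9 above).
    static List<Path> findEmptyLogs(Path dir) throws IOException {
        List<Path> empty = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "log.*")) {
            for (Path p : stream) {
                if (Files.isRegularFile(p) && Files.size(p) == 0) {
                    empty.add(p);
                }
            }
        }
        return empty;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder default; pass the directory holding your log.* files.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path p : findEmptyLogs(dir)) {
            System.out.println("zero-length transaction log: " + p);
        }
    }
}
```

The helper only reports; it deletes nothing, so it is safe to run against a live data directory.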
Digging Deeper:
The zero-length file pointed to trouble with our transaction logs, and the server log agreed: the last transaction was partial, and reading it triggered the EOFException. But why was the newest log file empty in the first place?
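The frame in the stack trace is DataInputStream.readInt, which reads exactly four bytes and throws EOFException if the stream ends first. Our reading (an assumption, not verified against ZooKeeper's source) is that the server opened the empty log.be85e9 and tried to read an int from it. The sketch below only reproduces the JDK behaviour, not ZooKeeper's log-loading code; the class and method names are hypothetical.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class EmptyFileReadInt {

    // Reads one 4-byte big-endian int from the start of the file with
    // DataInputStream.readInt, the JDK call shown in the stack trace.
    // Returns true if the file ends before 4 bytes could be read.
    static boolean readIntHitsEof(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file);
             DataInputStream data = new DataInputStream(in)) {
            data.readInt();
            return false;
        } catch (EOFException e) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        Path empty = Files.createTempFile("log.", "");     // 0 bytes, like log.be85e9
        Path partial = Files.createTempFile("log.", "");   // only 3 of 4 bytes present
        Path complete = Files.createTempFile("log.", "");  // a full 4-byte int
        Files.write(partial, new byte[] {0, 0, 0});
        Files.write(complete, new byte[] {0, 0, 0, 42});

        System.out.println("empty:    EOF=" + readIntHitsEof(empty));    // EOF=true
        System.out.println("partial:  EOF=" + readIntHitsEof(partial));  // EOF=true
        System.out.println("complete: EOF=" + readIntHitsEof(complete)); // EOF=false

        Files.delete(empty);
        Files.delete(partial);
        Files.delete(complete);
    }
}
```

Both the empty file and the truncated 3-byte file fail the same way, which is why a log that was never written and a log whose last write was cut short can produce the same exception.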
Real Issue: Running Out of Space
The log file was 0 bytes long because the disk backing /var/lib/zookeeper was 100% full. ZooKeeper managed to create the new log file, but with no space left on the device it could not write anything into it.
Filesystem Size Used Avail Use% Mounted on
/dev/sde 25G 25G 0 100% /var/lib/zookeeper
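Catching this before the disk fills up is the real fix. As a minimal sketch, a check like the one below could run from cron or a monitoring agent. The 80% threshold is a hypothetical value, and the percentage is close to, but not exactly, df's Use% column, because df accounts for reserved blocks differently.

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class DataDirSpaceCheck {

    // Approximate percentage of the filesystem holding dir that is in use,
    // computed from the total size and the space usable by this JVM.
    static double usedPercent(Path dir) throws IOException {
        FileStore store = Files.getFileStore(dir);
        long total = store.getTotalSpace();
        long usable = store.getUsableSpace();
        if (total == 0) {
            return 0.0; // some pseudo filesystems report no size at all
        }
        return 100.0 * (total - usable) / total;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : "/var/lib/zookeeper");
        double threshold = 80.0; // hypothetical alert threshold; tune per install
        double used = usedPercent(dir);
        System.out.printf("%s is %.1f%% full%n", dir, used);
        if (used >= threshold) {
            System.out.println("WARNING: ZooKeeper data directory is running out of space");
            System.exit(1); // non-zero exit so cron or a monitor can alert on it
        }
    }
}
```

Exiting non-zero rather than only printing lets the same program plug into most schedulers and health checks without parsing its output.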
According to the official ZooKeeper documentation:
A ZooKeeper server will not remove old snapshots and log files, this is the responsibility of the operator. Every serving environment is different and therefore the requirements of managing these files may differ from install to install (backup for example).
The PurgeTxnLog utility implements a simple retention policy that administrators can use. The API docs contains details on calling conventions (arguments, etc...).
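Using the PurgeTxnLog Utility:
ZooKeeper ships zkCleanup.sh as a wrapper around PurgeTxnLog. Running it without arguments prints the usage: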
zookeeper@zk-0:/bin$ zkCleanup.sh
Usage:
PurgeTxnLog dataLogDir [snapDir] -n count
dataLogDir -- path to the txn log directory
snapDir -- path to the snapshot directory
count -- the number of old snaps/logs you want to keep, value should be greater than or equal to 3
zkCleanup.sh reads dataDir and dataLogDir from zoo.cfg, so to keep only the latest 5 snapshots and logs, passing the count is enough:
./zkCleanup.sh -n 5
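One detail worth knowing when reasoning about retention: the file names carry a hexadecimal zxid that is not zero-padded (compare log.be45c2 in the listing above), so sorting names as strings does not give chronological order. The dry-run sketch below picks which snapshot files a "keep the newest n" policy would drop. It is an illustration only, not PurgeTxnLog's implementation: as we understand it, PurgeTxnLog also keeps the transaction logs needed alongside the oldest retained snapshot, which this sketch ignores, and nothing here deletes files. Use zkCleanup.sh for the real cleanup; the class name is hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SnapshotRetentionDryRun {

    // Parses the hexadecimal zxid suffix of a name such as "snapshot.be85e9".
    static long zxidOf(Path file) {
        String name = file.getFileName().toString();
        return Long.parseUnsignedLong(name.substring(name.indexOf('.') + 1), 16);
    }

    // Returns, oldest first, the snapshot files that a "keep the newest `keep`"
    // policy would remove. Files whose suffix is not pure hex are skipped.
    static List<Path> snapshotsToPurge(Path dir, int keep) throws IOException {
        List<Path> snaps = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "snapshot.*")) {
            for (Path p : stream) {
                if (p.getFileName().toString().matches("snapshot\\.[0-9a-fA-F]+")) {
                    snaps.add(p);
                }
            }
        }
        // Sort numerically by zxid; unpadded hex names do not sort correctly as strings.
        snaps.sort(Comparator.comparingLong(SnapshotRetentionDryRun::zxidOf));
        int cut = Math.max(0, snaps.size() - keep);
        return new ArrayList<>(snaps.subList(0, cut));
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        int keep = 5; // same count as the zkCleanup.sh -n 5 example
        for (Path p : snapshotsToPurge(dir, keep)) {
            System.out.println("would purge: " + p.getFileName());
        }
    }
}
```

For example, with snapshot.9, snapshot.a, snapshot.10 and snapshot.ff present and keep set to 2, the sketch selects snapshot.9 and snapshot.a, whereas a plain string sort would wrongly treat snapshot.10 as the oldest.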
Conclusion:
An EOFException on startup looked like log corruption, but the root cause was mundane: the disk holding the data directory was full, so ZooKeeper created a new transaction log it could not write to. Because ZooKeeper does not remove old snapshots and logs on its own, operators need both a retention policy (such as PurgeTxnLog via zkCleanup.sh) and disk-space monitoring on the data directory.
ZooKeeper, with its quirks and challenges, remains a rewarding system to learn for anyone interested in distributed coordination.