Saturday, 14 March 2026

Proxmox Backup Server - Recover a ZFS Datastore that is 100% full

This has happened to me a couple of times now: my ZFS pool fills to 100%, which not only causes backups to fail but also prevents garbage collection from running. When this occurs, you might see errors like:

TASK ERROR: atime safety check failed: update atime failed for chunk/file "/mnt/datastore/ZFS-HDD/.chunks/bb9f/bb9f8df61474d25e71fa00722318cd387396ca1736605e1248821cc0de3d3af8" - ENOSPC: No space left on device


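The way I have fixed this is to temporarily reduce the space ZFS holds in reserve (the "slop space"), immediately run a garbage collection, then set the reserve back to what it was. The spa_slop_shift module parameter controls the reserve: ZFS holds back roughly 1/2^spa_slop_shift of the pool, so the default of 5 reserves about 1/32 of the pool and a value of 10 reserves about 1/1024. The steps are: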

1. Run a prune job to flag some backups for deletion, so that the garbage collection routine has something to actually remove.
2. In the shell, run:
echo 10 > /sys/module/zfs/parameters/spa_slop_shift

3. IMMEDIATELY run the garbage collection against the datastore. Expect this to be slow going. Make sure no backups try to run during this time.
4. After the garbage collection has freed up space, make sure you set the ZFS reserved space back to what it was before (5 is the default) by running:
echo 5 > /sys/module/zfs/parameters/spa_slop_shift

5. Run a verify job against your backups to detect any issues from running out of space.
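
Steps 2 to 4 can be wrapped in a small helper so the original value is always restored, even if the garbage collection fails. This is only a sketch, not something I have battle-tested: the function name with_reduced_slop and the SLOP_PARAM override are my own inventions for illustration, and it reads the current value rather than assuming it is 5.

```shell
#!/bin/sh
# Sketch only: temporarily shrink the ZFS slop space, run a command, then restore it.
# SLOP_PARAM is a hypothetical override, handy for a dry run against a scratch file.
SLOP_PARAM="${SLOP_PARAM:-/sys/module/zfs/parameters/spa_slop_shift}"

# Usage: with_reduced_slop NEW_SHIFT COMMAND [ARGS...]
with_reduced_slop() {
    new_shift="$1"
    shift
    # Remember the current value instead of assuming the default of 5.
    old_shift="$(cat "$SLOP_PARAM")" || return 1
    echo "$new_shift" > "$SLOP_PARAM" || return 1
    # Run the supplied command (for example, a garbage collection).
    "$@"
    status=$?
    # Always put the original value back, even if the command failed.
    echo "$old_shift" > "$SLOP_PARAM"
    return "$status"
}
```

As root, something like `with_reduced_slop 10 proxmox-backup-manager garbage-collection start ZFS-HDD` should then cover steps 2 to 4 in one go. That assumes the command blocks until the garbage collection task has finished; if it returns early on your version, the reserve would be restored too soon, so stick with the manual steps in that case.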


Done! If everything went as planned, your datastore is functional again and backups can now resume.
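
For a rough sense of how much room step 2 frees up, the reserve can be approximated as the pool size shifted right by spa_slop_shift. The sketch below uses a hypothetical 2 TiB pool and a helper name (slop_bytes) I made up; it ignores the lower and upper bounds that OpenZFS applies to the reserve, so treat the numbers as an estimate.

```shell
#!/bin/sh
# slop_bytes POOL_BYTES SHIFT: approximate the reserve as POOL_BYTES / 2^SHIFT.
# Assumption: ignores the minimum and maximum limits OpenZFS puts on the reserve.
slop_bytes() {
    echo $(( $1 >> $2 ))
}

pool_bytes=$(( 2 * 1024 * 1024 * 1024 * 1024 ))   # hypothetical 2 TiB pool
gib=$(( 1024 * 1024 * 1024 ))

echo "shift 5:  $(( $(slop_bytes "$pool_bytes" 5) / gib )) GiB reserved"
echo "shift 10: $(( $(slop_bytes "$pool_bytes" 10) / gib )) GiB reserved"
```

Under these assumptions the reserve drops from 64 GiB to 2 GiB, so roughly 62 GiB becomes writable for the garbage collection to work with.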
