Testing the self-healing of ZFS on Ubuntu 20.04

1. Overview

ZFS is a combined file system and logical volume manager. It includes protections against data corruption and built-in disk mirroring capabilities.

This guide will go through the process of installing ZFS on Ubuntu 20.04 LTS, setting up storage pools backed by fake disks (image files) in striped and mirrored vdev configurations, and then deliberately damaging a disk to test ZFS’s self-healing capabilities.

What you’ll learn

  • How to install ZFS
  • How to create a striped pool using image files and how this reacts to (fake) disk corruption
  • How to create a mirrored storage pool using image files
  • How ZFS automatically recovers a mirror from disk corruption
  • How to replace a failed disk (file) in a mirrored vdev

What you’ll need

  • Ubuntu Server or Desktop 20.04 LTS
  • 300MB free space

Disclaimer: while I work at Canonical, I do not have anything to do with ZFS in that capacity and I have authored this simply as an Ubuntu user interested in ZFS.


2. Installing ZFS

The main components of ZFS are maintained as a standard Ubuntu package, so to install them, simply run:

sudo apt install zfsutils-linux

After that, we can check if ZFS was installed correctly by running:

whereis zfs

You should see output similar to the following:

zfs: /sbin/zfs /etc/zfs /usr/share/man/man8/zfs.8.gz
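If you also want to know which version was installed, recent releases (including the OpenZFS 0.8.x that ships with Ubuntu 20.04) provide a version subcommand that prints both the userland and kernel module versions:

zfs version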

Now that we’re done installing the required packages, let’s create a storage pool!


3. Create and test a ZFS Pool with a Striped vdev

Creating image files to use as fake disks

We are going to create image files to use as fake disks for ZFS, so that we can experiment freely without putting any real data at risk.

First, let’s create a folder to work in:

mkdir test_zfs_healing
cd test_zfs_healing

Now let’s create two image files to use as our fake disks:

for FAKE_DISK in disk1.img disk2.img
do
	dd if=/dev/zero of=`pwd`/$FAKE_DISK bs=1M count=100
done

If you do an ls, you should now see two img files:

$ ls
disk1.img  disk2.img
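As an aside, sparse files should work just as well as fully written-out ones for this kind of experiment. If you would rather not write 200MB of zeros up front, a minimal alternative using truncate from coreutils is (skip it if you already ran the dd loop above):

truncate -s 100M `pwd`/disk1.img `pwd`/disk2.img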

Let’s save our working directory to a variable to make it easier to come back here later:

ZFS_TEST_DIR=`pwd`

Creating a Pool

We are going to create a striped vdev, like RAID-0, in which data is striped dynamically across the two “disks”. This is performant and lets us use most of our raw disk space, but it has no redundancy: it cannot survive damage to either disk.

To create a pool with a striped vdev, we run:

sudo zpool create test_pool_striped \
  `pwd`/disk1.img \
  `pwd`/disk2.img

If we run zpool list, we should see the new pool:

$ zpool list
NAME                     SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
test_pool_striped   160M   111K   160M        -         -     1%     0%  1.00x    ONLINE  -

Note that the size is 160M from our two 100MB raw disks (200MB total), so we have the use of most of the space.
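If you are curious how each backing file contributes to that figure, zpool list has a -v flag that breaks the pool down per vdev:

zpool list -v test_pool_striped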

If we run zpool status test_pool_striped, we should see the details of our fake disks:

$ zpool status test_pool_striped
  pool: test_pool_striped
 state: ONLINE
  scan: none requested
config:

	NAME                                      STATE     READ WRITE CKSUM
	test_pool_striped                         ONLINE       0     0     0
	  /home/user/test_zfs_healing/disk1.img  ONLINE       0     0     0
	  /home/user/test_zfs_healing/disk2.img  ONLINE       0     0     0

errors: No known data errors

Add text to the new pool

We can see where our pool has been mounted with:

zfs mount

and we should see something like:

$ zfs mount
test_pool_striped               /test_pool_striped
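If you prefer, you can also query the mountpoint property of the dataset directly:

zfs get mountpoint test_pool_striped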

First we’ll change the mountpoint to be owned by the current user:

sudo chown $USER /test_pool_striped

Then let’s change into that mountpoint:

cd /test_pool_striped

Then we will create a text file with some text in it:

echo "We are playing with ZFS. It is an impressive filesystem that can self-heal, but even it has limits." > text.txt

We can show the text in the file with:

cat text.txt
$ cat text.txt
We are playing with ZFS. It is an impressive filesystem that can self-heal, but even it has limits.

And we can look at the hash of the file with:

sha1sum text.txt
$ sha1sum text.txt 
c1ca4def6dc5d82fa6de97d2f6d429045e4f4065  text.txt
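If you want to make the later comparison easier, you could stash that checksum somewhere outside the pool, for example (the ~/striped.sha1 filename is just an arbitrary choice):

sha1sum /test_pool_striped/text.txt > ~/striped.sha1

and verify it later with sha1sum -c ~/striped.sha1.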

Deliberately damage a disk

First we will go back to our directory with the disk images:

cd $ZFS_TEST_DIR

Now we are going to write zeros over one of the disks to simulate data corruption or a partial disk failure of one of the disks in our pool.

WARNING!
This is a dangerous operation: dd will happily overwrite
whatever you point it at. Make sure you are writing over the
correct file!

dd if=/dev/zero of=$ZFS_TEST_DIR/disk1.img bs=1M count=100

and we should see output like:

$ dd if=/dev/zero of=$ZFS_TEST_DIR/disk1.img bs=1M count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0.173905 s, 603 MB/s

Now change back to the mountpoint:

cd /test_pool_striped/

and read the file:

cat text.txt
$ cat text.txt
We are playing with ZFS. It is an impressive filesystem that can self-heal, but even it has limits.

Oh. Let’s check the hash:

sha1sum text.txt
$ sha1sum text.txt 
c1ca4def6dc5d82fa6de97d2f6d429045e4f4065  text.txt

Everything seems fine…
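As an aside, the usual way to make ZFS re-read and verify every block on disk is a scrub; if you are curious what one reports on our vandalised pool you can run the following, but it is not needed for (and may complicate) the rest of this walkthrough:

sudo zpool scrub test_pool_striped
zpool status test_pool_striped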

Export the pool

I believe that, while the pool is imported, ZFS is serving the file from its in-memory cache and keeps its own copy of the pool configuration in memory, so the damage we have just done to the underlying file goes unnoticed. That is great, but it interferes with our testing! So we need to export the pool first:

cd $ZFS_TEST_DIR
sudo zpool export test_pool_striped

And if we run a zpool list, test_pool_striped should no longer appear.

Damage the disk again

So now let’s try damaging the “disk” again:

dd if=/dev/zero of=$ZFS_TEST_DIR/disk1.img bs=1M count=100

which gives:

$ dd if=/dev/zero of=$ZFS_TEST_DIR/disk1.img bs=1M count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0.173001 s, 606 MB/s
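Before re-importing, you can convince yourself that the backing file really is nothing but zeros now; hexdump collapses runs of identical bytes into a single * line, so the whole file shows up as only a few lines of output (hexdump ships in the bsdmainutils package, which should already be installed):

hexdump $ZFS_TEST_DIR/disk1.img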

We now try to re-import the pool with:

sudo zpool import -d $ZFS_TEST_DIR/disk2.img

And we are told that it could not do so because the pool has damaged devices or data:

$ sudo zpool import -d $ZFS_TEST_DIR/disk2.img
   pool: test_pool_striped
     id: 3823113642612529477
  state: FAULTED
 status: The pool metadata is corrupted.
 action: The pool cannot be imported due to damaged devices or data.
   see: http://zfsonlinux.org/msg/ZFS-8000-72
 config:

	test_pool_striped                         FAULTED  corrupted data
	  /home/user/test_zfs_healing/disk2.img  ONLINE

If you are lucky, you may instead see something more like:

$ sudo zpool import -d $ZFS_TEST_DIR/disk2.img
   pool: test_pool_striped
     id: 706836292853756916
  state: ONLINE
 status: One or more devices contains corrupted data.
 action: The pool can be imported using its name or numeric identifier.
   see: http://zfsonlinux.org/msg/ZFS-8000-4J
 config:

	test_pool_striped                         ONLINE
	  /home/user/test_zfs_healing/disk1.img  UNAVAIL  corrupted data
	  /home/user/test_zfs_healing/disk2.img  ONLINE

But in fact trying to import this with:

sudo zpool import test_pool_striped -d $ZFS_TEST_DIR/disk2.img

still does not import the pool (on my system this command hangs for a very long time, never succeeds, and blocks other zpool commands while it runs).

Clean up

Let’s delete our disk files:

cd $ZFS_TEST_DIR
rm disk1.img disk2.img

4. Create and test a ZFS Pool with a Mirrored vdev

Creating image files to use as fake disks

Let’s create two image files to use as our fake disks again:

for FAKE_DISK in disk1.img disk2.img
do
	dd if=/dev/zero of=`pwd`/$FAKE_DISK bs=1M count=100
done

Again, if you do an ls, you should now see two img files:

$ ls
disk1.img  disk2.img

Creating a Pool

This time, we are going to create a mirrored vdev, like RAID-1, in which a complete copy of all data is stored separately on each drive.

To create a mirrored pool, we run:

sudo zpool create test_pool_with_mirror mirror \
  `pwd`/disk1.img \
  `pwd`/disk2.img

Note the addition of the word mirror between the pool name and the disk names.

If we run zpool list, we should see the new pool:

$ zpool list
NAME                    SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
test_pool_with_mirror    80M   111K  79.9M        -         -     3%     0%  1.00x    ONLINE  -

But note that this time the size is only 80M, half what it was before. This makes sense, as we are storing two copies of everything (one on each disk), so we have half as much space.

If we run zpool status test_pool_with_mirror, we should see that the disks have been put into a mirror vdev named mirror-0:

$ zpool status test_pool_with_mirror
  pool: test_pool_with_mirror
 state: ONLINE
  scan: none requested
config:

	NAME                                        STATE     READ WRITE CKSUM
	test_pool_with_mirror                       ONLINE       0     0     0
	  mirror-0                                  ONLINE       0     0     0
	    /home/user/test_zfs_healing/disk1.img  ONLINE       0     0     0
	    /home/user/test_zfs_healing/disk2.img  ONLINE       0     0     0

errors: No known data errors

Add some data

We can see where our pool has been mounted:

$ zfs mount
test_pool_with_mirror          /test_pool_with_mirror

First we’ll change the mountpoint to be owned by the current user:

sudo chown $USER /test_pool_with_mirror

Then let’s change into that mountpoint:

cd /test_pool_with_mirror

Again we will create a text file with some text in it:

echo "We are playing with ZFS. It is an impressive filesystem that can self-heal. Mirror, mirror, on the wall." > text.txt

We can show the text in the file with:

cat text.txt
$ cat text.txt
We are playing with ZFS. It is an impressive filesystem that can self-heal. Mirror, mirror, on the wall.

And we can look at the hash of the file with:

sha1sum text.txt
$ sha1sum text.txt 
aad0d383cad5fc6146b717f2a9e6c465a8966a81  text.txt

Export the pool

As we learnt earlier, we first need to export the pool.

cd $ZFS_TEST_DIR
sudo zpool export test_pool_with_mirror

And, again, if we run a zpool list, test_pool_with_mirror should no longer appear.

Deliberately damage a disk

First we will go back to our directory with the disk images:

cd $ZFS_TEST_DIR

Now again we are going to write zeros over a disk to simulate a disk failure or corruption:

dd if=/dev/zero of=$ZFS_TEST_DIR/disk1.img bs=1M count=100

We see something like the following output:

$ dd if=/dev/zero of=$ZFS_TEST_DIR/disk1.img bs=1M count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0.172324 s, 608 MB/s

Re-import the pool

Now we are going to re-import our pool:

sudo zpool import -d $ZFS_TEST_DIR/disk2.img

And we see something like the following output:

$ sudo zpool import -d $ZFS_TEST_DIR/disk2.img
   pool: test_pool_with_mirror
     id: 5340127000101774671
  state: ONLINE
 status: One or more devices contains corrupted data.
 action: The pool can be imported using its name or numeric identifier.
   see: http://zfsonlinux.org/msg/ZFS-8000-4J
 config:

	test_pool_with_mirror                       ONLINE
	  mirror-0                                  ONLINE
	    /home/user/test_zfs_healing/disk1.img  UNAVAIL  corrupted data
	    /home/user/test_zfs_healing/disk2.img  ONLINE

As expected, disk1.img shows as corrupted, since we wrote over it with zeros. But in contrast to the striped pool earlier, the pool does not fail to import as FAULTED: the pool itself is ONLINE, disk2.img is ONLINE, and only the disk1.img we overwrote is UNAVAIL because of its corrupted data.

The output tells us that we can import the pool by using its name or ID, so let’s do that:

sudo zpool import test_pool_with_mirror -d $ZFS_TEST_DIR/disk2.img

Checking Pool Status

We can check the pool status with:

zpool status test_pool_with_mirror

And the output should look something like:

$ zpool status test_pool_with_mirror
  pool: test_pool_with_mirror
 state: ONLINE
status: One or more devices could not be used because the label is missing or
	invalid.  Sufficient replicas exist for the pool to continue
	functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-4J
  scan: none requested
config:

	NAME                                        STATE     READ WRITE CKSUM
	test_pool_with_mirror                       ONLINE       0     0     0
	  mirror-0                                  ONLINE       0     0     0
	    4497234452516491230                     UNAVAIL      0     0     0  was /home/user/test_zfs_healing/disk1.img
	    /home/user/test_zfs_healing/disk2.img  ONLINE       0     0     0

errors: No known data errors

So the pool is online and working, albeit in a degraded state. We can look at the file we wrote earlier:

$ cat /test_pool_with_mirror/text.txt 
We are playing with ZFS. It is an impressive filesystem that can self-heal. Mirror, mirror, on the wall.
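And the checksum still matches the one we recorded before damaging the disk:

sha1sum /test_pool_with_mirror/text.txt

which should print the same aad0d383… hash as before.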

Replacing the failed device

The status is telling us that we are missing a device and the pool is degraded, so let’s fix that.

Let’s create a new “disk” in our working directory:

cd $ZFS_TEST_DIR
dd if=/dev/zero of=`pwd`/disk3.img bs=1M count=100

Then, let’s follow the instructions from the zpool status and replace the disk:

sudo zpool replace test_pool_with_mirror $ZFS_TEST_DIR/disk1.img $ZFS_TEST_DIR/disk3.img
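On these tiny image files the resilver completes almost instantly, but on real disks it can take a while; zpool status accepts an optional refresh interval in seconds, so you can watch progress with (press Ctrl-C to stop):

zpool status test_pool_with_mirror 1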

Check the zpool status again

We can see how this disk replacement has affected things by checking zpool status test_pool_with_mirror:

$ zpool status test_pool_with_mirror
  pool: test_pool_with_mirror
 state: ONLINE
  scan: resilvered 274K in 0 days 00:00:00 with 0 errors on Sat Nov 27 22:43:37 2021
config:

	NAME                                        STATE     READ WRITE CKSUM
	test_pool_with_mirror                       ONLINE       0     0     0
	  mirror-0                                  ONLINE       0     0     0
	    /home/user/test_zfs_healing/disk3.img  ONLINE       0     0     0
	    /home/user/test_zfs_healing/disk2.img  ONLINE       0     0     0

errors: No known data errors

disk1.img has been replaced by disk3.img, and the scan line tells us that ZFS has “resilvered” the data from the surviving disk (disk2.img) onto the new disk (disk3.img).
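If you want extra reassurance that the new mirror really is consistent, you can ask ZFS to re-read and verify every block with a scrub and then look at the status again; on a pool this small it finishes almost immediately:

sudo zpool scrub test_pool_with_mirror
zpool status test_pool_with_mirror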

Removing the pool and cleaning up

We can now remove the test pool:

sudo zpool destroy test_pool_with_mirror

and it should no longer show in a zpool list.

Then we can remove the fake “disks” we created:

cd $ZFS_TEST_DIR
rm disk1.img disk2.img disk3.img
cd ..
rmdir $ZFS_TEST_DIR

5. That’s all!

Congratulations! We have covered:

  • How to install ZFS
  • How to create a striped pool using image files and how this reacts to (fake) disk corruption
  • How to create a mirrored storage pool using image files
  • How ZFS automatically recovers a mirror from disk corruption
  • How to replace a failed disk (file) in a mirrored vdev

Further reading