Replace Disk

Notes on replacing a disk in a ZFS pool

To replace a disk in a ZFS pool you would normally use the UI: select a spare drive that is at least the same size as the one being replaced, then assign it as the replacement for the original drive. An hour or so later the new drive has been populated with the data from the old one, and the original disk is removed from the pool so that it can be taken out.

The problem

In my case I was upgrading a two-drive pool from two 2TB drives to two 4TB drives, so I had to replace them one at a time: install the new drive in a spare bay, then get ZFS to replace the original disk with the new one.

This works fine as long as the new drive is the same size as, or larger than, the original.

In TrueNAS Core 13.0-Release, that process is broken and you have to use a script to perform this task.

The script works fine and the link above shows how to use it, but one issue I had was that TrueNAS uses gptids as the identifiers in the pool, so how do I work out which gptid belongs to the disk I want to remove?

If this were a degraded disk that would not be a problem, as the faulty member would stand out in the pool status. In my case I was increasing the size, so the drive's state is ONLINE and it is not easily identifiable.

The solution

List the specific pool with the disk to replace

Run zpool status <pool name>, where <pool name> is the name of the pool. In this example it's Pool1.

root@nas1[~]# zpool status Pool1
  pool: Pool1
 state: ONLINE
status: Some supported and requested features are not enabled on the pool.
        The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: scrub repaired 0B in 01:43:47 with 0 errors on Mon Jun 6 22:55:22 2022
config:

        NAME                                          STATE     READ WRITE CKSUM
        Pool1                                         ONLINE       0     0     0
          gptid/746cf510-8712-11ea-b4bb-28924a2f0b10  ONLINE       0     0     0
          gptid/041516c7-e5b1-11ec-bd44-28924a2f0b10  ONLINE       0     0     0

errors: No known data errors
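If you would rather not pick the gptids out of that output by eye, the member list can be pulled out with a few lines of Python. This is only a rough sketch: the function names pool_members and read_pool_status are made up for illustration, and the parsing assumes the members appear as gptid/<uuid> followed by a state column, as they do in the output above.

```python
import re
import subprocess

# Matches an indented "gptid/<uuid>  <STATE>" row from the config section.
GPTID_ROW = re.compile(r"^[ \t]*(gptid/[0-9a-fA-F-]+)[ \t]+(\S+)", re.MULTILINE)


def pool_members(status_text):
    """Return a list of (gptid, state) tuples found in `zpool status` output."""
    return GPTID_ROW.findall(status_text)


def read_pool_status(pool):
    """Run `zpool status <pool>` and return its output (needs zpool on PATH)."""
    result = subprocess.run(
        ["zpool", "status", pool],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

On the NAS itself, pool_members(read_pool_status("Pool1")) should give back the two gptid entries shown above, each paired with its state.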

Identify the disk to replace

Here we have two disks: ada1, which is a member of the pool and the one to replace, and ada2, which is our new one. The TrueNAS UI doesn't show gptids, so we need to identify which of the gptids above relates to the ada1 disk.

To do this we can use the glabel status command which lists the mappings:

root@nas1[~]# glabel status
                                      Name  Status  Components
gptid/95191534-e5ce-11ec-8a81-28924a2f0b10     N/A  ada0p1
gptid/746cf510-8712-11ea-b4bb-28924a2f0b10     N/A  ada1p2
gptid/041516c7-e5b1-11ec-bd44-28924a2f0b10     N/A  ada3p2
gptid/951ce3a4-e5ce-11ec-8a81-28924a2f0b10     N/A  ada0p3
gptid/745a2d41-8712-11ea-b4bb-28924a2f0b10     N/A  ada1p1
gptid/0404cfd6-e5b1-11ec-bd44-28924a2f0b10     N/A  ada3p1

From this we can see that drive ada1 partition 2 (ada1p2) maps to gptid/746cf510-8712-11ea-b4bb-28924a2f0b10 which is what we need.
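The same lookup can be scripted if you have several disks to work through. The sketch below is an illustration only: gptid_for_partition and partitions_for_disk are hypothetical helper names, and it assumes each data row of glabel status has exactly three columns (Name, Status, Components) as in the listing above.

```python
import re


def _rows(glabel_text):
    """Yield (name, component) for each gptid row of `glabel status` output."""
    for line in glabel_text.splitlines():
        fields = line.split()
        # Data rows have three columns: Name, Status, Components.
        if len(fields) == 3 and fields[0].startswith("gptid/"):
            yield fields[0], fields[2]


def gptid_for_partition(glabel_text, partition):
    """Return the gptid label for a partition such as 'ada1p2', or None."""
    for name, component in _rows(glabel_text):
        if component == partition:
            return name
    return None


def partitions_for_disk(glabel_text, disk):
    """Return {partition: gptid} for every labelled partition on a disk."""
    # Require 'p<number>' after the disk name so 'ada1' does not match 'ada10'.
    pattern = re.compile(re.escape(disk) + r"p\d+")
    return {
        component: name
        for name, component in _rows(glabel_text)
        if pattern.fullmatch(component)
    }
```

Feeding it the glabel status output and asking for "ada1" returns both of that disk's partitions. Only the gptid that also appears in zpool status for the pool is the one to pass to the replacement script.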

Run the script to replace the disk

First download the tool, either with curl -s https://raw.githubusercontent.com/truenas/gist/main/replace_disk.py -o replace_disk.py or by using the copy listed in the Resources section at the top right of this page.

Next run it with python3 replace_disk.py <pool_name> <gptid/####> <ada#>, substituting the pool name, the gptid of the disk to replace, and the device name of the new disk:

root@nas1[~]# python3 replace_disk.py Pool1 gptid/746cf510-8712-11ea-b4bb-28924a2f0b10 ada2
Replace initiated.

You should then see in the UI that the disk is being replaced, along with an ETA for when it will complete.
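If you prefer to check from the shell rather than the UI, the scan: field of zpool status is where ZFS reports this kind of activity (in the earlier output it showed the last scrub). The snippet below is a small sketch, not part of the replacement script: scan_summary is a hypothetical helper, and it assumes the zpool status layout shown earlier, where each field starts with a lowercase name and a colon.

```python
import re

# A new field starts with optional whitespace, a lowercase word and a colon.
FIELD_START = re.compile(r"\s*[a-z]+:")


def scan_summary(status_text):
    """Return the text of the 'scan:' field from `zpool status`, or None."""
    collected = []
    in_scan = False
    for line in status_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("scan:"):
            in_scan = True
            collected.append(stripped[len("scan:"):].strip())
        elif in_scan:
            # Stop at a blank line or at the next field (for example 'config:').
            if not stripped or FIELD_START.match(line):
                break
            collected.append(stripped)
    return " ".join(collected) if collected else None
```

Passing it the output of zpool status Pool1 while the replacement is running should return whatever ZFS currently reports on that line, which is expected to mention the resilver and how far along it is.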