Assumptions, Constraints and Considerations
I had a Linux software RAID5 (MD) device with three SATA disks
which I wished to expand to four disks to increase capacity.
The layering is as follows:
-------------------------------------------------------
Filesystem          | ext4                             |
-------------------------------------------------------
LVM logical volume  | /dev/mapper/vg-extdisk1/ortho    |
-------------------------------------------------------
LVM volume group    | VG vg-extdisk1                   |
-------------------------------------------------------
LVM physical volume | PV /dev/md0                      |
-------------------------------------------------------
Software RAID       | /dev/md0 (RAID5)                 |
-------------------------------------------------------
Physical disks      | /dev/sda | /dev/sdb | /dev/sdc   |
-------------------------------------------------------
In the following example, it is assumed that the new disk
will be /dev/sdd, and is a SATA disk of the same capacity
as the other disks in the array.
In the following, we manipulate the MD array /dev/md0.
The status of the array is available from the virtual
file /proc/mdstat which can be accessed as a text file,
for example:
$ cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid5 sdc[3] sdb[2] sda[0]
      1952870400 blocks super 1.2 [3/3] [UUU]
      bitmap: 1/15 pages [4KB], 65536KB chunk
Method
The storage is structured as several layers, each
abstracting the layer below:
- Physical disks
- SATA devices
- Software RAID device (mdadm)
- Abstracts the array of physical disks to a single
disk device with redundancy.
- LVM (Logical Volume Manager)
- Abstracts one or more block devices (LVM physical
volumes) into a volume group, from which logical
volume block devices are allocated.
- Filesystem (ext4 in this case)
- Manages the storage of files on the logical volume
block device.
With this layered architecture, each layer needs to be
expanded in turn to use the additional capacity:
- Physically add the new disk to the system and prepare
it by creating a partition table.
- Add the disk to the existing MD array device (/dev/md0)
as a component device. (Note that the role of the
component device in the array is not yet specified; it is
simply made available.)
- Grow the RAID5 array to add the capacity of the new disk.
- Resize the LVM physical volume to use the additional
capacity.
- Expand the LVM logical volume to use the additional
capacity.
- Expand the filesystem to use the additional capacity.
1. Prepare the new disk
Physically adding a new disk is left as an exercise for the
reader, but the disk should be recognised by the relevant disk
controller and presented to the operating system as a physical storage
device. In this article, we assume that the existing SATA
disks in the RAID5 array are /dev/sda, /dev/sdb and /dev/sdc,
and the new disk is /dev/sdd.
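The device name assigned to the new disk may differ on your
system; one quick way to confirm which device node it received
is to list the block devices, for example:
$ lsblk -d -o NAME,SIZE,MODEL
where the new, as yet unused, disk should appear alongside
sda, sdb and sdc.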
The disk then needs to be prepared to add it to the software
RAID array by creating a partition table. On Linux, the parted
or fdisk programs can be used. In the example below, I use parted.
This approach is taken from a blog post by Tobias Hoffman.
To create a new partition table on the new disk:
# parted -s -a optimal <device> mklabel gpt
where the arguments have the following meanings:
- -s
- Do not prompt for user interaction.
- -a optimal
- Align disk partitions to a multiple of the physical
block size that gives optimal performance.
- <device>
- The device corresponding to the newly-added disk;
/dev/sdd in this example case.
- mklabel gpt
- Create a new disk partition table (called a disklabel)
of the type 'GPT'.
(In Hoffman's blog post, the author has a software RAID device
made up of disk partitions. In my case, I use the disk device
itself (that is, /dev/sdd rather than /dev/sdd1), so his step
to create a disk partition is omitted.)
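For the new disk in this example, the concrete command is
therefore:
# parted -s -a optimal /dev/sdd mklabel gpt
and the resulting (empty) partition table can be checked with:
# parted /dev/sdd print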
2. Add the disk to the software RAID device
The SATA disk device (/dev/sdd) can now be added to the
software RAID (MD) device (/dev/md0) as a component device
with a command:
# mdadm --add <raid_device> <disk_device>
where:
- --add
- Tells mdadm to add a device to the array. (MANAGE mode is
assumed when --add is used, so it does not have to be
explicitly specified.)
- <raid_device>
- Specifies the target MDADM device; in this case /dev/md0
- <disk_device>
- Specifies the target disk device to add to the array;
in this case /dev/sdd
This does not yet actually use the disk; it just makes it
available to the target array as a (spare) component device.
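In this example, the concrete command is:
# mdadm --add /dev/md0 /dev/sdd
After this step the new disk should be listed as a spare in
the output of:
# mdadm --detail /dev/md0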
3. Grow the array
This is the most time-consuming step, in which the array is
reshaped to include the new component device as an active
device and its capacity is extended.
# mdadm --grow <raid_device> --raid-devices=<n>
where:
- --grow <raid_device>
- Instructs mdadm to change the size/shape of the
MD device <raid_device>; in this case /dev/md0
- --raid-devices=<n>
- Specifies the number of active devices in the array.
The number of active devices <n> plus the number of
spare devices must be equal to the number of component
devices. In this case, n=4 since we are growing the array
from three to four disks.
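In this example, growing the array from three to four active
devices:
# mdadm --grow /dev/md0 --raid-devices=4
mdadm then begins reshaping the array in the background.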
The progress of the array reshape and the estimated completion
time can be monitored by inspecting the /proc/mdstat
virtual text file, as described above. With my setup, this
took around 72 hours (three days) with slow (5,600 rpm) SATA
disks in an external array connected by an eSATA interface,
with the default MD system parameters.
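Rather than repeatedly typing the command, the reshape can be
watched with something like the following (the 60-second
refresh interval is an arbitrary choice):
$ watch -n 60 cat /proc/mdstat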
Rebuild Speed Tuning
Expanding the array is a fairly time-consuming process. There
are several kernel parameters affecting array performance,
documented in the md(4) man page, that can be adjusted. Because of
the variation in system configurations, the default parameters
are set so that the MD device "basically works" on a wide
range of systems, but could benefit from tuning to match the
characteristics and usage of your own system.
One parameter that affects the data sync (RAID rebuild) is
stripe_cache_size, which specifies the size (in pages per
device) of the "stripe cache" used for synchronising all write
operations to the array, and has a default value of 256
(valid values being 17 to 32,768). There is a classic memory-speed
trade-off for this setting, and setting it too high in a system
without enough RAM can result in memory exhaustion.
To see the current value:
$ cat /sys/block/md0/md/stripe_cache_size
256
Other parameters that affect the rebuild speed are speed_limit_min
and speed_limit_max, which are per-device rates in kilobytes per
second that specify the "goal" rebuild speed when there is, and
when there is not, other (non-rebuild) activity on the array,
respectively. The defaults are 1,000 and 200,000 KB/s respectively.
These parameters can be set before a rebuild, for example:
# echo 32768 > /sys/block/md0/md/stripe_cache_size
# echo 50000 > /proc/sys/dev/raid/speed_limit_min
"Your mileage may vary".
4. Grow the LVM physical volume
When an LVM physical volume is overlaid on a logical block
device that changes in size, it must be resized with the
pvresize command to recognise the additional capacity. The
current status of the LVM physical volumes and volume groups
can be inspected with the pvdisplay and vgdisplay commands,
and it is recommended to do so before and after manipulating the
logical volumes.
Resizing is accomplished with the command:
# pvresize <pv>
where:
- <pv>
- Is the physical volume. In this case, the LVM physical
volume is created on the top of the software RAID device,
so is /dev/md0.
This increases the size of the volume group hosted on the
physical volume.
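In this example, checking the physical volume before and
after the resize:
# pvdisplay /dev/md0
# pvresize /dev/md0
# vgdisplay vg-extdisk1
After the resize, vgdisplay should report the additional
capacity as free extents in the volume group.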
5. Grow the LVM logical volume
In my case, I had a single logical volume on the expanded
LVM volume group, and wished to expand it to fill the
available space.
# lvextend -l +100%FREE <logical_volume>
where:
- -l +100%FREE
- Specifies that the logical volume should be extended to
occupy all available free space in the volume group.
- <logical_volume>
- is the logical volume device
(in this case, /dev/mapper/vg-extdisk1/ortho)
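In this example, the concrete command is:
# lvextend -l +100%FREE /dev/mapper/vg-extdisk1/ortho
(lvextend also has an -r/--resizefs option that grows the
filesystem in the same step; here the filesystem is resized
separately in the next step.)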
6. Grow the filesystem
Finally, the filesystem needs to be expanded to use the
additional space in the expanded logical volume. For the
ext4 filesystem I was using in this case:
# resize2fs <logical_volume>
where:
- <logical_volume>
- is the name of the block device containing the filesystem
(in this case, /dev/mapper/vg-extdisk1/ortho)
This can be confirmed by using the df -h command.
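In this example, the concrete commands are:
# resize2fs /dev/mapper/vg-extdisk1/ortho
$ df -h /dev/mapper/vg-extdisk1/ortho
resize2fs can grow a mounted ext4 filesystem online, so the
logical volume does not need to be unmounted first.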
Winding Up
Expanding a software RAID device is fairly straightforward
once you recognise that each "layer" of the layered storage
stack must be expanded in turn, and work out the somewhat
baroque mdadm and LVM management commands.