REQ: Linux, Raid, Heartbeat
Backstory: When I joined Company X, it turned out that one of the major projects my predecessor had left unfinished was a highly available shared storage container. I use the term container as a general label, because the system itself consists of two completely separate servers. These machines are Penguin Computing Relion 2600SA 2U servers, each populated with six 2TB drives. We’ll call these machines alpha and beta. My job was to finish this project, get alpha syncing data with beta via DRBD, and make sure that the build-out was clean and stable. The request for a thorough performance audit of alpha and beta came out of a lot of ‘up in the air’ questions about the equipment being used, mainly because the prior sysadmin had been unable to complete the project due to hardware problems. I’ll explain the history of these compatibility issues later, as it is extremely relevant to the build-out of the final product.
For the most part, the services this system will offer are very basic. It’ll be used as an NFS export where production data backups will be stored. The storage layout is simple: on each machine, the base Linux (Ubuntu Server LTS) installation uses the following system partitions and RAID arrays, spread across the six drives with Linux software RAID.
/dev/sdx1 BIOS_BOOT (1MB)
/dev/sdx2 /boot RAID1 (200MB x 6)
/dev/sdx3 SWAP RAID10 (1GB x 6)
/dev/sdx4 / RAID10 (20GB x 6)
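The usable size of each array in the layout above follows from simple arithmetic. The sketch below is only an illustration: the function name is made up, and it assumes the RAID10 arrays use md's default layout with two copies of every block, which the layout above does not actually state.

```python
def usable_size(level, member_size, members, copies=2):
    """Return usable capacity of an md array, in the unit of member_size."""
    if level == "raid1":
        # Every member holds a full copy, so capacity equals one member.
        return member_size
    if level == "raid10":
        # Assumption: default md layout with 2 copies of each block,
        # so usable space is the raw total divided by the copy count.
        return member_size * members / copies
    raise ValueError(f"unsupported level: {level}")


# (mount point, RAID level, size of each member, unit) from the layout above
layout = [
    ("/boot", "raid1", 200, "MB"),
    ("swap", "raid10", 1, "GB"),
    ("/", "raid10", 20, "GB"),
]

for mount, level, size, unit in layout:
    print(f"{mount:6} {level:7} {usable_size(level, size, 6):g} {unit}")
```

Under those assumptions this works out to 200MB for /boot, 3GB of swap, and 60GB for /. If the installer's sizes are decimal gigabytes, 60GB is about 55.9GiB, which is roughly what df -h rounds to 56G.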
The first partition, as you can see, doesn’t have a mount point; it’s labeled as a BIOS boot partition. This is a partition that standard BIOS-based machines use in order to boot when the device’s partition table is a GPT label. It is used by GRUB2 only in BIOS-GPT setups. No such partition type exists under MBR partitioning (at least not for GRUB2). The partition is also not required if the system is UEFI based, as no embedding of boot sectors takes place in that case. It only needs to be 1MB, large enough for GRUB to install its image into, and it doesn’t require a filesystem. Once you have created the partition, it needs to be marked as a BIOS boot partition; this is a specific partition ‘type’ you select in Ubuntu’s partition manager.
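For anyone doing this outside the installer, here is a rough command-line equivalent of creating and flagging that partition. It is a sketch, not what I actually ran (I used Ubuntu’s partition manager): it assumes each disk already carries an empty GPT label, that the new partition ends up as partition 1, and the helper name and the partition name bios_boot are made up for illustration. The script only prints the parted commands; it does not run them.

```python
import shlex


def bios_boot_commands(disk):
    """Build (but do not run) parted commands for a 1MiB BIOS boot partition."""
    return [
        # Partition 1: 1MiB long, starting at the 1MiB mark, no filesystem.
        ["parted", "--script", disk, "mkpart", "bios_boot", "1MiB", "2MiB"],
        # Flag it so GRUB2 may embed its core image there.
        ["parted", "--script", disk, "set", "1", "bios_grub", "on"],
    ]


# Assumption: the six drives show up as /dev/sda through /dev/sdf.
disks = [f"/dev/sd{letter}" for letter in "abcdef"]

for disk in disks:
    for argv in bios_boot_commands(disk):
        print(shlex.join(argv))
```

Review the printed commands before running any of them by hand, since parted writes changes to disk immediately.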
If you do not create this partition, GRUB will not be able to install itself as the boot loader. I personally ran into this problem and spent over a day troubleshooting it. I also believe this is what kept the systems administrator before me from getting this system up and running. It took me 12 hours of troubleshooting and research to figure it out, mainly because a lot of the documentation and forum threads I read along the way didn’t suggest it as a concrete solution to the BIOS/GPT compatibility problem. If you’ve reached this page because you’re getting the
Lastly, the boot partition should be RAID1. Every member of a RAID1 array holds a complete copy of /boot, which keeps the boot loader’s job simple and lets the machine boot from any surviving disk. Once you’ve provisioned the appropriate environment on the disks, use the software RAID configuration in Ubuntu’s partition manager to create the appropriate MD devices. If you’re not sure how to do this, there are plenty of howtos available on the Internet, most likely including your particular Linux distribution’s web site. I’m not going to recreate the steps for building a software RAID array during a Linux installation, because there is already a plethora of walk-throughs on the subject; just use your Google skills.
Once you’ve partitioned your drives, created your MD devices, and set their respective mount points, you can proceed with the Linux installation. You’ll most likely be prompted to format the new RAID devices, and you should allow it so that they have usable file systems on them. In Ubuntu (and probably all other distros) you’ll be asked how to handle degraded arrays. I typically make the system wait when an array is degraded, so that I know exactly what’s going on when it occurs, am directly responsible for checking the health of the drives involved, and am aware when an array needs attention or is being rebuilt. If you don’t do this, you run a greater risk of corrupting your RAID data, so beware when setting this option.
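Staying aware of degraded arrays can also be scripted once the system is running. The following is a minimal sketch that scans text in the format of /proc/mdstat for arrays with fewer active members than expected. The function name is made up, and the sample text is illustrative, written by hand rather than captured from alpha or beta.

```python
import re

# An array entry spans two lines: "md0 : active raid1 sdb2[1] ..." followed
# by a status line that ends in something like "[6/5] [_UUUUU]".
ARRAY_RE = re.compile(r"^(md\d+) : .*\n.*\[(\d+)/(\d+)\] \[[U_]+\]", re.M)


def degraded_arrays(mdstat_text):
    """Return {array: (expected, active)} for arrays missing members."""
    result = {}
    for name, expected, active in ARRAY_RE.findall(mdstat_text):
        if int(active) < int(expected):
            result[name] = (int(expected), int(active))
    return result


# Illustrative sample only: md0 has lost one of its six members.
SAMPLE = """\
Personalities : [raid1] [raid10]
md0 : active raid1 sdb2[1] sdc2[2] sdd2[4] sde2[5] sdf2[3]
      195520 blocks [6/5] [_UUUUU]

md2 : active raid10 sda4[0] sdb4[1] sdc4[2] sdd4[4] sde4[5] sdf4[3]
      58589184 blocks 64K chunks 2 near-copies [6/6] [UUUUUU]

unused devices: <none>
"""

print(degraded_arrays(SAMPLE))
```

On a live machine you would pass in the contents of /proc/mdstat instead of the sample, for example from a cron job that mails you when the result is not empty.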
The end result of this portion should be a working Linux server with a /boot partition of ~200MB, a ~3GB swap array, and a / partition with ~52GB available. Your environment will look something like this:
root@alpha:~# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/md2               56G  907M   52G   2% /
none                  4.9G  336K  4.9G   1% /dev
none                  4.9G     0  4.9G   0% /dev/shm
none                  4.9G   40K  4.9G   1% /var/run
none                  4.9G     0  4.9G   0% /var/lock
none                  4.9G     0  4.9G   0% /lib/init/rw
none                   56G  907M   52G   2% /var/lib/ureadahead/debugfs
/dev/md0              179M   17M  154M  10% /boot
You can test the redundancy of the system by pulling the first drive in the array (/dev/sda) out of its bay and booting the machine. It will prompt you about the degraded array; you may then boot, insert the drive back into the bay, and repair the arrays with mdadm, which is the MD device manager in Linux. You can use mdadm for everything that you would typically use a real RAID appliance for, including expanding, monitoring, and auditing your software RAID arrays. For demonstration purposes we’ll use mdadm to show in detail the status of md0, md1, and md2.
root@alpha:~# mdadm --detail /dev/md0
/dev/md0:
Version : 00.90
Creation Time : Wed Oct 12 10:29:49 2011
Raid Level : raid1
Array Size : 195520 (190.97 MiB 200.21 MB)
Used Dev Size : 195520 (190.97 MiB 200.21 MB)
Raid Devices : 6
Total Devices : 6
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Tue Oct 18 16:55:25 2011
State : clean
Active Devices : 6
Working Devices : 6
Failed Devices : 0
Spare Devices : 0
UUID : b4a2dffa:8d6571b4:3ca875b3:fd9af448
Events : 0.92
Number   Major   Minor   RaidDevice   State
   0       8       2         0        active sync   /dev/sda2
   1       8      18         1        active sync   /dev/sdb2
   2       8      34         2        active sync   /dev/sdc2
   3       8      82         3        active sync   /dev/sdf2
   4       8      50         4        active sync   /dev/sdd2
   5       8      66         5        active sync   /dev/sde2
root@alpha:~# mdadm --detail /dev/md1
..
..
root@alpha:~# mdadm --detail /dev/md2
..
..
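If you want to check this output from a script instead of reading it by eye, the ‘Key : Value’ lines are easy to parse. Here is a minimal sketch; the function names are made up, and the sample is an abbreviated copy of the md0 output above. What counts as ‘healthy’ here is my own working definition, not anything mdadm itself defines.

```python
def parse_detail(text):
    """Parse the 'Key : Value' lines of `mdadm --detail` output into a dict."""
    fields = {}
    for line in text.splitlines():
        # Split on the first " : " only; values such as the UUID contain
        # colons without surrounding spaces and are left intact.
        key, sep, value = line.partition(" : ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def is_healthy(fields):
    """True when the array is not degraded and no member is missing or failed."""
    return (
        "degraded" not in fields.get("State", "degraded")
        and fields.get("Active Devices") == fields.get("Raid Devices")
        and fields.get("Failed Devices") == "0"
    )


# Abbreviated copy of the md0 output shown above.
SAMPLE = """\
/dev/md0:
     Raid Level : raid1
   Raid Devices : 6
          State : clean
 Active Devices : 6
 Failed Devices : 0
           UUID : b4a2dffa:8d6571b4:3ca875b3:fd9af448
"""

fields = parse_detail(SAMPLE)
print(fields["Raid Level"], is_healthy(fields))
```

When a check like this fails after a drive has been pulled and reinserted, the missing member can be put back with something like mdadm /dev/md0 --add /dev/sda2, after which the array resyncs on its own.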