sslbootmgr.cs.unm.edu documentation

I. What is it?

sslbootmgr.cs.unm.edu (from now on referred to as the “SSL Boot Manager” or just “the server”) is the continuation of Mud Douglas' system imaging project done as an Independent Study under Patrick Bridges. The goal is a scalable, automated, server-driven system for managing client disk images.

Practically speaking, the use for this is “experiment management”: I'm trying to provide a method for other SSL members to back up the contents of one of the Spider or Simpsons machines, so that they can safely reinstall or reconfigure the system, knowing they can revert to a previous state with relative ease.

II. Who supports it?

Ah, the $64000.00 question. I do. You can reach me at douglas@unm.edu. If it's something apocalyptic, it might take some time for me to get down to UNM to fix it, as I'll be back at my job next semester.

III. How does it work?

It's based on fairly straightforward community software, namely, Debian Linux (latest stable). I've set it up using established methods to handle diskless booting of clients, which is how I've made the backup and restore process server-driven and very friendly to headless machines. However, I've written a suite of scripts to take most of the headache and pain out of managing the various daemons and configuration files required to use a diskless boot environment. I've also created specific diskless boot images (with a few nifty tricks) which provide specific restore and backup functionality. I'll walk you through those in just a minute.

IV. System Overview

There are two main parts to the sslbootmgr system. They interact somewhat, but are most easily described separately. They are:

a. Diskless Boot System and Supporting Scripts

As you can find out reading on the Internet, there are four pieces required on the server side to support diskless booting of Linux clients. They are: dhcp, tftp (and tftp bootloading), a diskless-compatible (i.e. monolithic, static) kernel, and an NFS-mounted root directory. Of course, sslbootmgr has all of these basic items.

In case you need to tweak something, I have placed everything you would likely need to change in /root. Please be aware that tweaking these configurations could completely destroy the server if you aren't careful. Observe that /etc/exports and /etc/dhcpd.conf are links to root's etc directory. The TFTP root directory is shared from /root/tftpboot, again to keep things in one central place, and all NFS exports shared from the sslbootmgr reside in /root/export. I've also made some tweaks to /etc/init.d and so forth to make the system more secure, i.e. I've turned off some services.

What makes the diskless boot system on sslbootmgr really snazzy is the /root/scripts directory. While I'll get into the exact details of how to use it momentarily, you should know that there are four scripts:

  • dhcpgen - generate a dhcpd.conf file
  • pxegen - generate appropriate PXE bootload config files
  • nfsgen - generate an appropriate /etc/exports file
  • reload-all - reload all associated daemons and force rereads of their config files

b. Backup and Restore System

The whole point of this Independent Study was to deliver a system that would perform automated, server-driven backups and restores. Now, “automated” has a lot of different meanings, and in this case, it means “it doesn't do everything for you, because as a power user, you probably require flexibility”. So moving forward, I will expect a moderate amount of Linux knowledge, and some very basic scripting ability. Don't worry; I've scripted away most of the nasty stuff.

The B/R system has two parts. The first part lies inside of the diskless boot trees “sslbootmgr-restore” and “sslbootmgr-backup”. These are basically clones of the “sslbootmgr-rescue” image, which is nothing more than a plain vanilla, barebones Debian build. I've made minor adjustments to it, such as shutting off most of the services in rc2.d. Especially noteworthy is the fact that I use /etc/init.d/mkvar to set up a ramdisk for /var (so that multiple clients can share the same NFS-exported root directory simultaneously). The only other “special” item in these two boot trees is the existence of either /etc/init.d/run-backup or /etc/init.d/run-restore, respectively. These are called out of rc2.d once the system has completely come up. They are the brains of the automation, and rely heavily on NFS mounts to perform their tasks.

The other part of the Backup/Restore system is the mount /root/export/sslbootmgr. Inside this area, we keep logs of backups and restores (server-side, so that you can review them without plugging into a headless node). This area is also where you configure the parameters that drive the run-backup and run-restore scripts.

V. How to configure the diskless boot system

This is the first of three “practical” sections; the other two will show you how to back up and restore clients.

ssh into sslbootmgr.cs.unm.edu with the “root” account and the standard root password. Then, cd into /root/scripts. You'll find some files:

  • dhcpgen
  • pxegen
  • nfsgen
  • reload-all

The four files above are the scripts I mentioned before. But let's talk first about the two configuration files:

  • options.conf.simpsons
  • clients.conf.simpsons

As the names imply, they're written for the Simpsons machines, on which this system was tested. It was designed to port to the Spider machines with an absolute minimum of effort.

Looking inside the “options” file, you'll find some pretty obvious and well-commented parameters available for you to tune. They mostly affect the operation of the dhcp server. Please exercise caution when changing this file; for work on the Simpsons machines, you shouldn't need to change it at all. A misconfigured dhcpd.conf is a pain in the butt to troubleshoot, but if you create one, check /var/log/syslog for more details.
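When dhcpd refuses to start, pulling just its lines out of the syslog saves some scrolling. The helper below is a minimal sketch: the name dhcpd_log is made up for illustration, and it assumes the DHCP daemon tags its log lines with “dhcpd”. Only the log path /var/log/syslog comes from this page.

```shell
#!/bin/sh
# Sketch: show the most recent dhcpd messages from the system log.
# Assumption: the DHCP daemon tags its log lines with "dhcpd".
# dhcpd_log is an illustrative helper name, not part of sslbootmgr.

dhcpd_log() (
    log="${1:-/var/log/syslog}"   # log file to search
    count="${2:-20}"              # how many matching lines to show
    grep 'dhcpd' "$log" | tail -n "$count"
)

# Usage:
#   dhcpd_log                       # last 20 dhcpd lines from /var/log/syslog
#   dhcpd_log /var/log/syslog 50    # last 50
```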

The file you're more likely to change is the “clients” file. Each client has exactly (EXACTLY) four lines associated with it. You order clients in ascending numerical order from 1: “1 2 3 … 11 … 34” and so on. Don't skip numbers or other funny stuff; you will most likely just break the script.

The four directives are:

  !client-X <hostname> <mac_addr> <ip_addr>
  !kernel-X <filename_in_tftpboot_dir>
  !nfsroot-X <nfs_root_for_this_client>
  !option-X [zero or more kernel arguments passed at boot]

Even if you don't want to pass kernel args, you still need an !option-X line with nothing after it.
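Since a skipped number or a missing directive will most likely break the generators, it can help to sanity-check the “clients” file before running them. The following is a rough sketch, not one of the sslbootmgr scripts: check_clients is a hypothetical name, and it assumes each directive sits on its own line starting with “!”. It only verifies that every client number from 1 up to the highest one found has exactly one of each of the four directives.

```shell
#!/bin/sh
# Sketch: structural check of a clients file.
# Assumptions: one directive per line, each starting with "!";
# check_clients is an illustrative name, not part of sslbootmgr.

check_clients() {
    awk '
        BEGIN { n = split("client kernel nfsroot option", kinds, " "); max = 0 }
        /^!/ {
            key = $1                          # e.g. "!kernel-3"
            sub(/^!/, "", key)
            dash = index(key, "-")
            kind = substr(key, 1, dash - 1)   # "kernel"
            num = substr(key, dash + 1) + 0   # 3
            seen[kind, num]++
            if (num > max) max = num
        }
        END {
            if (max == 0) { print "no client entries found"; exit 1 }
            bad = 0
            for (i = 1; i <= max; i++)
                for (k = 1; k <= n; k++)
                    if (seen[kinds[k], i] != 1) {
                        printf "client %d: expected exactly one !%s-%d line, found %d\n", i, kinds[k], i, seen[kinds[k], i]
                        bad = 1
                    }
            exit bad
        }
    ' "$1"
}

# Usage:
#   check_clients /root/scripts/clients.conf.simpsons && echo "structure looks sane"
```

It does not validate the values themselves (MAC format, whether the kernel file exists, and so on); the generator scripts remain the final word on that.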

The rest of these are pretty obvious answers. The ones on the !client line are things you want to assign to your client. The !kernel line had better match a valid kernel bzImage located in the /root/tftpboot directory. I would recommend using my precompiled 2.6.18.2-diskless for most of your needs. It supports many devices well. If you need to recompile this kernel, you can find the config files and kernel source in /root/build/src/linux-2-6.18.2.

Now, once you set up these files, I've made configuring the diskless boot system a snap. Simply run, in order, “dhcpgen”, then “pxegen”, then “nfsgen”, then “reload-all”. Just like that, you will have reconfigured all of the subsystems required for diskless booting of your new client. And of course these scripts will give you (rudimentary) help with your config files, bailing out if you make an egregious error, and backing up the previous versions before committing changes.
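The four-step sequence can be wrapped in a small helper so that a failure in one generator keeps the later steps from running. This is only a sketch: regen_diskless and the SCRIPTS_DIR override are names invented here, and it assumes each script exits non-zero when it bails out on an error. The script names and /root/scripts are the only parts taken from this page.

```shell
#!/bin/sh
# Sketch: run the four scripts in the documented order,
# stopping at the first one that fails.
# Assumption: each script exits non-zero when it bails out on an error.
# regen_diskless and SCRIPTS_DIR are illustrative names, not part of sslbootmgr.

regen_diskless() (
    dir="${SCRIPTS_DIR:-/root/scripts}"
    for step in dhcpgen pxegen nfsgen reload-all; do
        echo "running $step"
        "$dir/$step" || { echo "$step failed; stopping" >&2; exit 1; }
    done
)

# Usage (as root on the server):
#   regen_diskless
```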

VI. Backing Up Your System

… is really a pretty straightforward task. Here's what you'll need to do:

  • First, set up the diskless subsystem to boot you to a known IP address (pick one, I use .242)
  • Next, reconfigure your client-X to boot to the sslbootmgr-backup image
  • Then, regenerate and reload all diskless boot system files as described above
  • Finally, you'll need to add yourself to some configs on the sslbootmgr-backup side. Here we go...

cd to /root/export/sslbootmgr/backup

First, you need to insert yourself into the scheme.map file. Make sure that your IP's not already in there; if it is, then somebody didn't houseclean when they finished backing up. Under the “IP Address” column, put the IP that you've instructed DHCP to reliably give your client. Then, under the “specfile” column, give a meaningful name for your future specfile. Try something like “spec.IPADDRESS”.
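Checking whether an IP is already listed is easy to script. The sketch below assumes scheme.map is whitespace-separated with the IP address in the first column; that layout is a guess based on the column names above, so check it against the real file. ip_in_scheme_map is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: succeed if the given IP already has an entry in a scheme.map file.
# Assumption: whitespace-separated columns, IP address in the first column.
# ip_in_scheme_map is an illustrative name, not part of sslbootmgr.

ip_in_scheme_map() {
    awk -v ip="$1" '$1 == ip { found = 1 } END { exit (found ? 0 : 1) }' "$2"
}

# Usage (the address is a placeholder):
#   if ip_in_scheme_map <ip_addr> /root/export/sslbootmgr/backup/scheme.map; then
#       echo "already listed; someone may not have cleaned up"
#   fi
```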

Now, you need to create a specfile. This very simple file tells the sslbootmgr-backup script which partitions on the local disk to back up. I'd advise you to copy one of the existing ones and modify it. There are two examples I've left for you: dolph-unipart is set up to back up a one-partition Linux install, with everything lumped onto sda1. If you have a more complicated install, refer to dolph-multipart for guidance. Pay special attention to the !root directive, for this is how you tell the script where the root directory exists.

Once you've set up these files correctly, simply turn on your client, and enable network booting from the PXE-compatible network card. You'll boot directly to the sslbootmgr server, back up, log to the server, and halt when finished. We don't reboot, so as to avoid an infinite loop of backups.

One last note; please go into /root/export/sslbootmgr/saved-images and do an “mv” to give your saved tree a more meaningful name. This helps to reduce clutter and makes culling unused images easier.
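A guarded rename avoids clobbering somebody else's image by accident. This is a minimal sketch: rename_saved_tree and the SAVED_DIR override are invented names, and the default directory is simply the one named in the paragraph above.

```shell
#!/bin/sh
# Sketch: rename a saved tree, refusing to overwrite an existing one.
# rename_saved_tree and SAVED_DIR are illustrative names, not part of sslbootmgr.

rename_saved_tree() (
    base="${SAVED_DIR:-/root/export/sslbootmgr/saved-images}"
    old="$base/$1"
    new="$base/$2"
    [ -d "$old" ] || { echo "no such saved tree: $old" >&2; exit 1; }
    [ -e "$new" ] && { echo "refusing to overwrite: $new" >&2; exit 1; }
    mv "$old" "$new"
)

# Usage (both names are placeholders):
#   rename_saved_tree <current-name> <meaningful-name>
```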

VII. Restoring Your System

… requires slightly more knowledge of scripting, because it's hard to offer power and flexibility AND have the system read your mind. But that aside, let's go over how to restore the tree you saved previously.

Step 1. Go into the diskless boot system configuration, as described above, and have your target system boot with a known IP, into the “sslbootmgr-restore” image.

Step 2. Change into the directory /root/export/sslbootmgr/restore

Step 3. This scheme.map file is different. You'll need to map the IP address you expect your client to receive to two scripts, a “partitioner” and a “postinstall”, and also provide an image name.

The partitioner is a script which abstracts away the process of partitioning, making filesystems, and mounting the disk partitions. We do this in order to provide maximum flexibility to the end user. In order to make the substantial task of writing such a script easier, we have provided an example for your reference. Please refer to the sfdisk manual page (“man sfdisk”) for more information on how to partition using sfdisk.
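To give a feel for the shape of such a script, here is a minimal sketch for the simplest case: one bootable Linux partition covering the whole disk. It is not the example shipped on the server. The function names, the DISK, TARGET and DRY_RUN variables, the choice of ext3, and /dev/sda as the default disk are all assumptions made for illustration; /mnt/local_root is the restore root mentioned on this page. With DRY_RUN set, it only prints what it would do.

```shell
#!/bin/sh
# Hypothetical partitioner sketch: one bootable Linux partition spanning the disk.
# DISK, TARGET and DRY_RUN are illustrative variables, not names used by sslbootmgr.
# WARNING: without DRY_RUN this destroys the existing partition table on $DISK.

run() {
    # Print the command in dry-run mode, otherwise execute it.
    if [ -n "$DRY_RUN" ]; then echo "+ $*"; else "$@"; fi
}

partition_single() (
    disk="${DISK:-/dev/sda}"
    target="${TARGET:-/mnt/local_root}"
    # sfdisk reads one line per partition: start,size,type,bootable.
    # Empty start and size mean "default start, rest of the disk";
    # L is the Linux partition type and * marks the partition bootable.
    printf ',,L,*\n' | run sfdisk "$disk" || exit 1
    run mkfs.ext3 "${disk}1" || exit 1
    run mkdir -p "$target" || exit 1
    run mount "${disk}1" "$target"
)

# Usage:
#   DRY_RUN=1 partition_single     # show the commands only
#   partition_single               # really partition, format and mount
```

A multi-partition layout would add one sfdisk input line, one mkfs call and one mount per partition, mounting the root partition first.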

The postinstall is a script which cleans up any loose ends after the filesystem has been dumped to the client's disk. These usually include installing the bootloader in the MBR, and generating a sane fstab file. Again, we have provided an example for your reference. We recommend the use of LILO instead of GRUB for the easiest scripting, but if you are comfortable automating a GRUB install, or wish to do it manually through a post-dump rescue disk boot, that's fine.
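As a rough illustration of those two loose ends, the sketch below writes a bare-bones fstab and then installs LILO into the restored tree. It is not the example shipped on the server: write_fstab and install_lilo are hypothetical names, and it assumes a single ext3 root partition and a usable /etc/lilo.conf already present inside the restored tree.

```shell
#!/bin/sh
# Hypothetical postinstall sketch for a single-partition restore.
# write_fstab and install_lilo are illustrative names, not part of sslbootmgr.
# Assumptions: one root partition, ext3 by default, and a usable
# /etc/lilo.conf already present inside the restored tree.

write_fstab() (
    root_dir="$1"          # where the restored tree is mounted
    root_dev="$2"          # root partition device
    fstype="${3:-ext3}"
    mkdir -p "$root_dir/etc" || exit 1
    {
        printf '%s\t/\t%s\tdefaults\t0\t1\n' "$root_dev" "$fstype"
        printf 'proc\t/proc\tproc\tdefaults\t0\t0\n'
    } > "$root_dir/etc/fstab"
)

install_lilo() {
    # lilo -r chroots into the given directory before installing the boot loader.
    lilo -r "$1"
}

# Usage on the client, after the tree has been dumped:
#   write_fstab /mnt/local_root /dev/sda1 && install_lilo /mnt/local_root
```

Swap and any additional partitions are left out here; they would each need their own fstab line.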

The image name is simple; it's just the directory in /root/export/sslbootmgr/saved-trees that you want to have dumped to your new directory tree rooted at /mnt/local_root during restore.

Once you have written these configuration files, reboot your client, and it should pull down the parameters you have provided. It will then partition the disk using your partitioner script, mount the filesystems, dump the saved tree onto the disk, and run your postinstall script. Of course, you will again have a server-side log for review.

Final Remarks

I didn't go into overwhelming detail here, because all the scripts and configuration files that support the sslbootmgr have been commented as well. Please refer to the comments in those files for specific advice on how and how not to use those files.

 
Last modified: 2008/01/07 12:37 (external edit)