OSDI'06 Work-in-Progress Statement

Operating system interference with parallel application performance is receiving a significant amount of attention in the high-performance computing (HPC) community. Custom lightweight operating systems from IBM and Cray allow applications to scale extremely well on several thousand to more than one hundred thousand processors. In contrast, studies have shown that interference in commodity operating system environments can lead to a 50% performance penalty compared to specialized solutions. Unfortunately, there is no general consensus as to the interference characteristics that are important to the performance of applications. These characteristics include the types of interference (e.g. CPU, network, cache), the frequency and duration of the noise, the scale at which noise becomes significant, and the impact of the balance of the system (e.g. relative processing and network performance).

Our goal is to build an infrastructure to synthetically generate a known quantity and type of noise in order to measure its performance impact on HPC applications. We are building this infrastructure into the Catamount lightweight operating system, which runs on nearly twenty-six thousand AMD Opteron cores on the Cray Red Storm system at Sandia National Laboratories. The extremely low noise profile of Catamount is ideal for creating an infrastructure to generate the noise signatures of many different HPC operating systems, such as Linux, so that the impact of noise can be properly analyzed and studied. Additionally, since Red Storm is a large-scale, highly balanced system, we can also explore the impact of system balance by using hardware-based mechanisms that degrade the compute and network performance of the machine.
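As an illustration only, the idea of a "known quantity and type of noise" can be sketched as a periodic CPU noise signature described by a period and a duration. The sketch below is not the Catamount implementation; `NoiseSignature`, `finish_time_us`, and all numeric values are hypothetical names and numbers chosen for the example. It assumes noise events fire at a fixed period, hold the CPU for a fixed duration, and that the application does no useful work while an event is running.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NoiseSignature:
    """Hypothetical periodic CPU noise: one event every period_us microseconds."""
    period_us: int    # time between the starts of consecutive noise events
    duration_us: int  # CPU time each noise event takes from the application

    def __post_init__(self):
        if self.duration_us < 0 or self.period_us <= self.duration_us:
            raise ValueError("need 0 <= duration_us < period_us")

    @property
    def cpu_fraction(self) -> float:
        # Long-run share of the CPU consumed by noise.
        return self.duration_us / self.period_us


def finish_time_us(work_us: int, sig: NoiseSignature, phase_us: int = 0) -> int:
    """Wall-clock time at which work_us of pure compute, started at t=0, completes.

    Noise events start at phase_us, phase_us + period_us, ... and preempt the
    application for duration_us each (an assumption of this sketch).
    """
    if work_us < 0 or phase_us < 0:
        raise ValueError("work_us and phase_us must be non-negative")
    t = 0
    remaining = work_us
    next_noise = phase_us
    while True:
        run = min(remaining, next_noise - t)  # compute until done or preempted
        t += run
        remaining -= run
        if remaining == 0:
            return t
        t += sig.duration_us                  # noise event holds the CPU
        next_noise += sig.period_us


if __name__ == "__main__":
    fine = NoiseSignature(period_us=1_000, duration_us=25)
    coarse = NoiseSignature(period_us=100_000, duration_us=2_500)
    for name, sig in (("fine", fine), ("coarse", coarse)):
        print(name, sig.cpu_fraction, finish_time_us(10_000, sig))
```

The two example signatures consume the same long-run fraction of the CPU but differ in frequency and duration, which is the kind of distinction a noise-injection framework would need to control independently.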

Thus far, we have constructed our framework, integrated it into Catamount, and verified that the framework itself has no significant impact on system performance or noise signature. Moreover, we have verified that we can use this infrastructure to emulate the CPU noise profiles of current HPC operating systems under many different workloads. For example, we have used our framework to generate a CPU noise profile similar to that of Linux on both a standard desktop and a Red Storm service node.

Further work on this project includes extending our framework and allowing changes to certain hardware characteristics. Currently, our noise framework generates only one type of interference: CPU interference. We are extending it to emulate other noise types, such as network and cache interference. In addition, we are examining the performance effects of noise at higher node counts and augmenting the current software environment so that certain hardware characteristics, such as processor speed and HyperTransport bus bandwidth, can be modified on a per-job basis.

The results of this work will provide significant insight into several important areas, including the discovery of the important noise characteristics of operating systems for HPC platforms, the sensitivity of applications to noise in combination with system balance, and the ability to project the behavior of applications given a particular noise signature and set of balance measurements. Moreover, this tool will be critical in directing the design and implementation of future lightweight operating systems tailored specifically for large-scale multi-core parallel processing systems.
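To make the notion of projecting behavior from a noise signature concrete, here is a minimal probabilistic sketch of why scale matters. It is not a model from this project: it assumes each node experiences the same periodic noise (hypothetical parameters `period_us` and `duration_us`), that noise phases are independent and uniformly random across nodes, and that the application synchronizes after each compute interval, so one delayed node delays all of them. The function names are illustrative.

```python
def hit_probability(work_us: float, period_us: float, duration_us: float) -> float:
    """Probability that a noise event overlaps a compute interval on one node.

    With a uniformly random phase, an event of length duration_us overlaps an
    interval of length work_us when it starts within a window of length
    work_us + duration_us, out of every period_us.
    """
    if period_us <= 0 or duration_us < 0 or work_us < 0:
        raise ValueError("period_us must be positive; other arguments non-negative")
    return min(1.0, (work_us + duration_us) / period_us)


def any_node_hit(nodes: int, work_us: float, period_us: float, duration_us: float) -> float:
    """Probability that at least one of `nodes` independent nodes is hit."""
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    p = hit_probability(work_us, period_us, duration_us)
    return 1.0 - (1.0 - p) ** nodes


if __name__ == "__main__":
    # Illustrative values only: 100 us of compute between synchronizations,
    # one 25 us noise event every 1000 us on each node.
    for nodes in (1, 10, 100, 1_000, 10_000):
        print(nodes, any_node_hit(nodes, work_us=100, period_us=1_000, duration_us=25))
```

Under these assumptions, a noise level that rarely disturbs a single node almost always delays some node at large node counts, so nearly every synchronized step pays the noise duration.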

 