Reducing Memory Pressure for Chip-Multiprocessors using Cache Injection
Overview
The goal of this project is to reduce memory bandwidth utilization for incoming network messages on multi-core, multiprocessor architectures. For high-performance applications, the lack of available memory bandwidth is likely to limit performance on this type of system. We are using cache injection to address this problem. Cache injection was originally proposed by the Novel Systems Architecture group at the IBM Austin Research Laboratory in 2004 and was used to improve TCP processing in commodity systems.
Project Lead
Edgar Leon and Arthur Maccabe are the main investigators, with Edgar serving as the PI for this project. Our external collaborators include Rolf Riesen and Ron Brightwell from Sandia National Laboratories, and Michal Ostrowski and Lixin Zhang from IBM Research.
Description
Current and future high-performance systems will be built from multi-core chips. Systems with tens of cores place greater demands on the memory system, and a lack of adequate memory bandwidth will limit application performance. To reduce memory bandwidth utilization, we propose cache injection of incoming network messages. With cache injection, incoming network data is placed directly into a processor's cache, so when the processor requests the data, it does not need to be fetched from memory. The objective of this work is to demonstrate that OS policies governing cache injection can reduce memory bandwidth utilization and improve application performance on high-performance systems.
Current architectures place incoming network data in main memory and let the processor fetch it into its cache. This fetch is usually performed by prefetching, which anticipates accesses to blocks of memory based on usage patterns; prefetching hides memory latency, but the data is still read from main memory. Cache injection takes a different approach: it places incoming network data directly into a processor's cache from the I/O bus. This technique reduces memory bandwidth utilization by eliminating the fetch from main memory.
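To make the bandwidth argument concrete, the following Python sketch counts main-memory transfers for one incoming message under the two receive paths. It is a toy model under stated assumptions, not the project's simulator: the names `Node`, `MemoryCounters`, and `traffic` are hypothetical, the message is assumed to fit in the cache, and a prefetcher is modeled as issuing the same number of memory reads as demand fetches, only earlier.

```python
"""Toy model of main-memory traffic for one incoming message (illustrative only).

Hypothetical assumptions, not the project's simulator:
- the message occupies `n_lines` cache lines and fits in the cache;
- each line the NIC writes to DRAM costs one memory write;
- each line the processor reads from DRAM costs one memory read;
- prefetching changes *when* lines are read from DRAM, not *how many*;
- the eventual write-back of injected lines is ignored.
"""
from dataclasses import dataclass, field


@dataclass
class MemoryCounters:
    reads: int = 0   # lines transferred DRAM -> cache
    writes: int = 0  # lines transferred I/O -> DRAM


@dataclass
class Node:
    cache: set = field(default_factory=set)
    mem: MemoryCounters = field(default_factory=MemoryCounters)

    def receive_to_memory(self, lines):
        # Conventional path: the NIC deposits the message in main memory.
        self.mem.writes += len(lines)

    def receive_injected(self, lines):
        # Cache injection: lines go from the I/O bus straight into the cache.
        self.cache.update(lines)

    def consume(self, lines):
        # The processor reads each line; a miss is fetched from DRAM
        # (by demand or by a prefetcher, either way one DRAM read).
        for line in lines:
            if line not in self.cache:
                self.mem.reads += 1
                self.cache.add(line)


def traffic(inject: bool, n_lines: int = 64) -> MemoryCounters:
    node = Node()
    msg = range(n_lines)
    if inject:
        node.receive_injected(msg)
    else:
        node.receive_to_memory(msg)
    node.consume(msg)
    return node.mem


if __name__ == "__main__":
    print("memory path:", traffic(inject=False))
    print("injection:  ", traffic(inject=True))
```

With the default 64-line message, the memory path costs 64 writes and 64 reads, while injection costs no memory transfers in this model. A real design may still incur a later write-back of injected lines, depending on how they are kept coherent with memory.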
One of the main challenges of cache injection is deciding when and where to inject data. If the application does not use the data promptly, cache injection can pollute the cache by evicting the application's working set. Therefore, a suitable injection policy depends on the application's usage pattern, and the performance benefits of this technique rely on choosing a good policy.
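The pollution risk can be illustrated with a small LRU model. The sketch below is hypothetical: it assumes a fully associative 8-line cache and a 6-line working set that the application reuses in order, and `LRUCache` and `working_set_misses` are illustrative names. It injects a message that is not consumed and then counts how many working-set accesses miss.

```python
from collections import OrderedDict


class LRUCache:
    """Fully associative LRU cache of `capacity` lines (a simplification)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.lines = OrderedDict()  # ordered from least to most recently used
        self.misses = 0

    def insert(self, line):
        # Place a line, evicting the least recently used line if full.
        if line in self.lines:
            self.lines.move_to_end(line)
            return
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)
        self.lines[line] = True

    def access(self, line):
        # Processor access: a miss fetches the line from memory.
        if line in self.lines:
            self.lines.move_to_end(line)
        else:
            self.misses += 1
            self.insert(line)


def working_set_misses(inject_lines: int, capacity: int = 8, ws: int = 6) -> int:
    cache = LRUCache(capacity)
    working_set = [("ws", i) for i in range(ws)]
    for line in working_set:        # warm the cache with the working set
        cache.access(line)
    cache.misses = 0
    for i in range(inject_lines):   # message injected but not consumed yet
        cache.insert(("msg", i))
    for line in working_set:        # application keeps reusing its working set
        cache.access(line)
    return cache.misses
```

Injecting 2 lines fits in the spare capacity and causes no misses, but injecting 4 lines displaces two working-set lines, and with LRU and this in-order reuse pattern the damage cascades into 6 misses. Real caches are set associative and may use other replacement policies, so the numbers are only illustrative.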
Good injection policies require the right information. This information is spread throughout the system: in the application's data usage patterns, the OS, the compiler, and the caches. Our investigation of injection policies considers the following information: (1) hints from the application passed through the compiler; (2) compiler and hardware hints normally used for prefetching; (3) scheduling information from the OS; and (4) cache line status from the cache directory.
Policy selection uses this information to decide when and where to inject, and it can be implemented in several locations: (1) the network interface controller (NIC); (2) the I/O bridge; and (3) the caches. Two examples illustrate how policy selection works. (1) A smart NIC with OS scheduling information can determine which processor's cache to inject into, based on the processor the consumer thread is running on. The NIC could also potentially schedule a thread on a particular processor to consume the data injected into that processor's cache. (2) A cache with hardware support can store a list of cache line addresses the application is expected to use in the next few cycles. This snooping cache then uses the list to decide whether to store a particular cache line traveling on the bus. This work is relevant because multi-core systems continue to increase pressure on the memory system.
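The two policy-selection examples can be sketched as follows. This is an illustrative model only: `IncomingLine`, `nic_target_core`, `HintedSnoopingCache`, and the `running_on` thread-to-core map are hypothetical names standing in for the OS scheduling information and the expected-address list described above, not an existing hardware or OS interface.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IncomingLine:
    address: int          # cache-line-aligned address (hypothetical field)
    consumer_thread: int  # thread the message is destined for (hypothetical field)


def nic_target_core(line: IncomingLine, running_on: dict) -> Optional[int]:
    """Example (1), sketched: a NIC that sees OS scheduling information
    injects into the cache of the core running the consumer thread.
    Returns None (write to main memory instead) if the thread is not running."""
    return running_on.get(line.consumer_thread)


class HintedSnoopingCache:
    """Example (2), sketched: a cache holding a list of line addresses the
    application expects to use soon; it keeps a snooped line only if listed."""

    def __init__(self, expected_addresses):
        self.expected = set(expected_addresses)
        self.lines = set()

    def snoop(self, line: IncomingLine) -> bool:
        if line.address in self.expected:
            self.lines.add(line.address)
            return True   # accept the injected line
        return False      # ignore it; the data goes to main memory only
```

For example, with `running_on = {7: 2}` a line destined for thread 7 is injected into core 2's cache, while a line for a thread that is not running falls back to main memory (`None`). A `HintedSnoopingCache([0x1000, 0x1040])` accepts a snooped line at `0x1000` and ignores one at `0x2000`.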
Road-Map and Milestones
To achieve the goal of this work, we will show:
- The benefits of cache injection on memory bandwidth utilization
- Cache injection outperforms prefetching
- Cache injection can be harmful without a good injection policy
- Cache injection improves application performance using OS and compiler information
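Through simulation, we have demonstrated the first two milestones, showing that cache injection outperforms prefetching and reduces memory bandwidth utilization by up to 96%. The remaining two milestones, showing that cache injection can be harmful without a good policy and that OS and compiler information improve application performance, remain future work.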