The purpose of this page is to quantify, with a few simple benchmarks, the overhead of running an application in a non-NUMA-aware environment. In each benchmark below we use the CPUSet and NUMA functionality of the Linux kernel to jail the computation and its data to specific CPUs and memory nodes. Note: both the instructions and the data of the application are jailed to the specified memory node.
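The page does not show how the jail is created. Below is a minimal sketch of one way to do it, assuming a cgroup-v1 cpuset hierarchy mounted at `/sys/fs/cgroup/cpuset` (an assumption about the test machine). The helper `jail_process` and the cpuset names are hypothetical; on a real system the writes need root.

```python
import os
from pathlib import Path


def jail_process(cpuset_root, name, cpus, mems, pid=None):
    """Create cpuset `name` under `cpuset_root` and move `pid` into it.

    Hypothetical helper. Assumes the cgroup-v1 cpuset layout, where the
    control files are called cpuset.cpus / cpuset.mems (a legacy
    "mount -t cpuset" mount drops the "cpuset." prefix). `cpus` and
    `mems` use the kernel list syntax, e.g. "1" or "0-3".
    """
    if pid is None:
        pid = os.getpid()  # default: jail the calling process
    cs = Path(cpuset_root) / name
    cs.mkdir(exist_ok=True)  # on cgroupfs, mkdir creates the control files
    # A cpuset must have CPUs and memory nodes before it accepts tasks.
    (cs / "cpuset.cpus").write_text(f"{cpus}\n")
    (cs / "cpuset.mems").write_text(f"{mems}\n")
    # Writing a PID to "tasks" moves that task into the cpuset.
    (cs / "tasks").write_text(f"{pid}\n")
    return cs
```

For the distant-memory configuration one would call `jail_process("/sys/fs/cgroup/cpuset", "stream_remote", "1", "0")` and then `os.execvp` the benchmark, so that it starts inside the jail and its later allocations come from node 0. A per-process alternative that creates no cpuset is to launch the benchmark under `numactl --physcpubind=1 --membind=0`.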
STREAM
The STREAM benchmark measures the sustainable memory bandwidth of a system (in MB/s) using four simple vector kernels: Copy, Scale, Add and Triad.
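A minimal sketch of the four kernels and of STREAM's bookkeeping, written in plain Python for readability (the real benchmark is C or Fortran and times each kernel over several repetitions). STREAM counts Copy and Scale as moving two arrays per element and Add and Triad as moving three, and reports 10^-6 x bytes divided by the best (minimum) time. The function names here are hypothetical.

```python
# Arrays read + written per element, as counted by STREAM.
STREAM_KERNELS = {
    "Copy": 2,   # c[j] = a[j]
    "Scale": 2,  # b[j] = q * c[j]
    "Add": 3,    # c[j] = a[j] + b[j]
    "Triad": 3,  # a[j] = b[j] + q * c[j]
}


def run_kernels(a, b, c, q):
    """One pass of each kernel, in STREAM's order, over equal-length lists."""
    n = len(a)
    for j in range(n):
        c[j] = a[j]
    for j in range(n):
        b[j] = q * c[j]
    for j in range(n):
        c[j] = a[j] + b[j]
    for j in range(n):
        a[j] = b[j] + q * c[j]


def bytes_moved(kernel, n, elem_size=8):
    """Bytes STREAM attributes to one pass of `kernel` over n elements."""
    return STREAM_KERNELS[kernel] * elem_size * n


def rate_mb_per_s(kernel, n, min_time_s, elem_size=8):
    """Reported rate: 1e-6 * bytes / best (minimum) time."""
    return 1e-6 * bytes_moved(kernel, n, elem_size) / min_time_s
```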
Memory Node 0, CPU 1
In this configuration our application runs on CPU 1, but its memory comes from node 0, which sits behind the other CPU (i.e., distant or remote memory).
| Function | Rate (MB/s) | Avg time (sec) | Min time (sec) | Max time (sec) |
|---|---|---|---|---|
| Copy | 1765.9066 | 0.0181 | 0.0181 | 0.0181 |
| Scale | 1654.8434 | 0.0194 | 0.0193 | 0.0194 |
| Add | 1841.1884 | 0.0261 | 0.0261 | 0.0261 |
| Triad | 2017.7251 | 0.0238 | 0.0238 | 0.0238 |
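As a sanity check on how the rates relate to the times, the sketch below inverts the rate formula for the distant-memory table. The array length N = 2,000,000 and 8-byte elements are assumptions, since the page does not state them; with those values each implied minimum time matches the tabulated one to the four printed decimal places.

```python
# Distant-memory results from the table above: (rate MB/s, min time s).
REMOTE = {
    "Copy": (1765.9066, 0.0181),
    "Scale": (1654.8434, 0.0193),
    "Add": (1841.1884, 0.0261),
    "Triad": (2017.7251, 0.0238),
}
# Arrays touched per element by each kernel (STREAM's byte accounting).
ARRAYS = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}
N_ASSUMED = 2_000_000  # assumption: array length, not stated on this page
ELEM = 8               # assumption: double-precision (8-byte) elements


def implied_min_time(kernel, rate_mb_s, n=N_ASSUMED, elem=ELEM):
    """Invert rate = 1e-6 * bytes / min_time to recover min_time."""
    return 1e-6 * ARRAYS[kernel] * elem * n / rate_mb_s


for k, (rate, tmin) in REMOTE.items():
    t = implied_min_time(k, rate)
    print(f"{k:6s} implied {t:.6f} s, table {tmin:.4f} s")
```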
Memory Node 0, CPU 0
In this configuration our application runs on CPU 0, and its memory comes from node 0, which is directly connected to that CPU (i.e., near or local memory).
| Function | Rate (MB/s) | Avg time (sec) | Min time (sec) | Max time (sec) |
|---|---|---|---|---|
| Copy | 2253.3742 | 0.0142 | 0.0142 | 0.0143 |
| Scale | 2190.8806 | 0.0146 | 0.0146 | 0.0146 |
| Add | 2441.7726 | 0.0197 | 0.0197 | 0.0197 |
| Triad | 2719.8578 | 0.0177 | 0.0176 | 0.0177 |
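The overhead follows directly from the two tables. This small sketch divides the distant-memory rate by the near-memory rate for each kernel; on these numbers, distant memory delivers roughly 22 to 26 percent less bandwidth (near memory is about 1.28x to 1.35x faster).

```python
# Rates in MB/s copied from the two tables on this page.
REMOTE = {"Copy": 1765.9066, "Scale": 1654.8434, "Add": 1841.1884, "Triad": 2017.7251}
LOCAL = {"Copy": 2253.3742, "Scale": 2190.8806, "Add": 2441.7726, "Triad": 2719.8578}


def remote_penalty(local, remote):
    """Fraction of near-memory bandwidth lost when memory is distant."""
    return {k: 1.0 - remote[k] / local[k] for k in local}


penalty = remote_penalty(LOCAL, REMOTE)
for kernel, p in penalty.items():
    speedup = LOCAL[kernel] / REMOTE[kernel]
    print(f"{kernel:6s} distant bandwidth {100 * p:.1f}% lower (near is {speedup:.2f}x)")
```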