The purpose of this page is to enumerate, with a few simple benchmarks, the overheads associated with running an application in a non-NUMA-aware environment. In each of the benchmarks below we use the CPUSet and NUMA functionality of the Linux kernel to jail the computation and data to specific CPUs and memory nodes. Note: both the instructions and the data of an application are jailed to the specified memory node.
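As a rough illustration of the CPU half of that jailing, the sketch below restricts the calling process to a single CPU. It uses Python's `os.sched_setaffinity`, which is a different mechanism from the cpusets used for the measurements on this page, and it does nothing about memory placement: the Python standard library has no interface for binding memory to a node, so that part is left out. The helper name `pin_to_cpu` is hypothetical.

```python
import os

def pin_to_cpu(cpu):
    """Restrict the calling process to one CPU and return the resulting affinity set.

    Linux only. This covers CPU placement, not memory-node placement.
    """
    # A pid of 0 means "the calling process".
    os.sched_setaffinity(0, {cpu})
    return os.sched_getaffinity(0)

if __name__ == "__main__":
    # Pick a CPU we are already allowed to run on, so the call cannot be rejected.
    allowed = sorted(os.sched_getaffinity(0))
    print("restricted to CPUs:", pin_to_cpu(allowed[-1]))
```

With only the CPU pinned, the kernel is still free to place pages on any memory node, which is why the benchmarks here constrain the memory node as well.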

STREAMS

The STREAMS benchmark measures the sustainable memory bandwidth of a system using four simple vector kernels: Copy, Scale, Add and Triad.
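The reported rate for each kernel follows from the number of bytes it moves and its best (minimum) time. The sketch below shows that arithmetic, assuming 8-byte elements and the usual counting of two arrays touched for Copy and Scale and three for Add and Triad. The array length of 2,000,000 elements is an assumption: it is not stated on this page, but it is consistent with the timings in the tables below. The function name `rate_mb_per_s` is hypothetical.

```python
BYTES_PER_ELEMENT = 8  # arrays hold 8-byte double-precision values

# Arrays read or written per element by each kernel:
#   Copy:  c[i] = a[i]            -> 2
#   Scale: b[i] = q * c[i]        -> 2
#   Add:   c[i] = a[i] + b[i]     -> 3
#   Triad: a[i] = b[i] + q * c[i] -> 3
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}

def rate_mb_per_s(kernel, n_elements, best_seconds):
    """Bandwidth in MB/s (1 MB = 10**6 bytes) for one kernel pass."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * BYTES_PER_ELEMENT * n_elements
    return bytes_moved / best_seconds / 1e6

if __name__ == "__main__":
    n = 2_000_000  # assumed array length, not given on this page
    for kernel, best in [("Copy", 0.0181), ("Add", 0.0261)]:
        print(f"{kernel}: {rate_mb_per_s(kernel, n, best):.1f} MB/s")
```

Because the times in the tables are rounded to four decimal places, rates recomputed this way only approximate the printed ones.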

Memory Node 0, CPU 1

In this configuration our application runs on CPU 1 but out of memory node 0, the memory that sits behind the other CPU (i.e. distant memory).

Function  Rate (MB/s)  Avg time (sec)  Min time (sec)  Max time (sec)
Copy        1765.9066          0.0181          0.0181          0.0181
Scale       1654.8434          0.0194          0.0193          0.0194
Add         1841.1884          0.0261          0.0261          0.0261
Triad       2017.7251          0.0238          0.0238          0.0238

Memory Node 0, CPU 0

In this configuration our application runs on CPU 0 out of memory node 0, the memory directly connected to that CPU (i.e. near memory).

Function  Rate (MB/s)  Avg time (sec)  Min time (sec)  Max time (sec)
Copy        2253.3742          0.0142          0.0142          0.0143
Scale       2190.8806          0.0146          0.0146          0.0146
Add         2441.7726          0.0197          0.0197          0.0197
Triad       2719.8578          0.0177          0.0176          0.0177
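To put the two tables side by side, the following sketch computes how much bandwidth is lost when the application runs out of distant memory instead of near memory. The rates are copied from the tables above; the function name `remote_penalty_percent` is hypothetical. For these numbers the loss comes out at roughly 22% to 26%, depending on the kernel.

```python
# Rates in MB/s, copied from the two tables above.
DISTANT = {"Copy": 1765.9066, "Scale": 1654.8434, "Add": 1841.1884, "Triad": 2017.7251}
NEAR = {"Copy": 2253.3742, "Scale": 2190.8806, "Add": 2441.7726, "Triad": 2719.8578}

def remote_penalty_percent(near, distant):
    """Percentage of near-memory bandwidth lost when memory is distant."""
    return {k: 100.0 * (1.0 - distant[k] / near[k]) for k in near}

if __name__ == "__main__":
    for kernel, pct in remote_penalty_percent(NEAR, DISTANT).items():
        print(f"{kernel:<6} {pct:5.1f}% lower from distant memory")
```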
 
Last modified: 2008/01/07 12:37 (external edit)