RDTSCP inaccuracy on Intel i7


In 2010, Gabriele Paoloni of Intel wrote a white paper, "How to Benchmark Code Execution Times on Intel® IA-32 and IA-64 Instruction Set Architectures", describing precise methods for measuring the clock cycles required to execute specific C code in a Linux environment using RDTSC/RDTSCP. In the paper he addresses three problems that can harm measurement reliability: the first is the instruction cache; the second is CPU preemption (task scheduling, interrupts, etc.); the third is out-of-order execution. He resolves these problems using a few kernel functions together with CPU instructions, and finally demonstrates an extremely reliable measurement overhead (i.e., measuring no instruction at all): a minimum of 44 cycles with a variance of 2~3. That's awesome!


Since Gabriele published his source code in the paper, it was straightforward for me to replicate his experiment, but I could not get comparable results. The experiment consists of two loops: the inner loop measures the empty code section 100K times, and the outer loop repeats the inner loop 1K times. However, even with identical source code, I could not obtain a stable result set on my Intel i7 workstation. In one trial, the minimum cycle count varied from 40 to 122, and the variance varied from 138 to 15326.


The CPU on my platform is an Intel Core i7-4779K CPU @ 3.50GHz. Maybe the i7 introduces new features that interfere with RDTSCP? Can anybody provide some ideas?

