What are P states and Turbo?

Source: Internet  Editor: 程序博客网  Time: 2024/06/06 10:05

Recently I found an interesting article about how Intel P states work. It is easy to understand and was written by Arjan van de Ven:

https://plus.google.com/+ArjanvandeVen/posts/dLn9T4ehywL


Some basics on CPU P states on Intel processors

There seem to be a lot of things people don't realize about how P state selection works on Intel processors, and arguably the documentation is slightly confusing in this regard... and things have been changing from generation to generation.

First.. why use the word "P state" and not "frequency"? This is important in terms of thinking about how this works.

"Clock frequency" is something that you measure over some period of time, basically an average of how fast a clock signal went up/down. It's something you can measure, but it's backwards looking. Intel CPUs expose two counters (aperf and mperf) via MSR registers, and if you read these two registers at two separate times (far enough apart to avoid rounding effects), the ratio of the deltas of these two registers gives you a very nice "average frequency" over your measurement interval. (The official SDM documentation has the exact formula for this.)
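
The aperf/mperf measurement described above can be sketched in a few lines of Python. This is a minimal sketch, not the SDM's wording: it assumes the commonly used form, average frequency = base frequency × Δaperf / Δmperf, where the base (nominal) frequency `base_mhz` is a value you have to supply yourself. The function name and the sample counter values are made up for illustration, and actually reading the two MSRs (which needs root access on Linux) is left out.

```python
COUNTER_MASK = (1 << 64) - 1  # aperf and mperf are 64-bit counters


def average_frequency_mhz(aperf_start, mperf_start, aperf_end, mperf_end, base_mhz):
    """Average clock frequency between two counter samples.

    Assumption: mperf advances at the base frequency and aperf at the
    actual frequency, so base_mhz * delta_aperf / delta_mperf is the
    average frequency over the interval. Check the SDM for the exact
    definition on your CPU.
    """
    # Masking the subtraction keeps the delta correct if a counter wrapped.
    delta_aperf = (aperf_end - aperf_start) & COUNTER_MASK
    delta_mperf = (mperf_end - mperf_start) & COUNTER_MASK
    if delta_mperf == 0:
        raise ValueError("mperf did not advance; take the samples further apart")
    return base_mhz * delta_aperf / delta_mperf


# Made-up samples: aperf advanced 1.5x as much as mperf on a 2000 MHz part.
print(average_frequency_mhz(1_000_000, 2_000_000, 4_000_000, 4_000_000, 2000.0))
```

Note that the result only looks backwards: it says what the core did between the two samples, not what it will do next.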

A P state is a number the OS tells the hardware regarding how much performance it would like to see on a certain (logical) cpu; a P state request is very much something forward looking.

So how are these related? 
In the ten year old, single core, no hyperthreading world, things were relatively simple. You could basically map a P state to some "frequency" that you'd get, and as the marketing folks told us, a higher frequency means more performance.

Today, things are much more complex in several key ways.

First of all, and this is important and different from 10 years ago... no matter which P state you ask for, when a logical processor is idle (C state), its frequency is typically 0. The exception to this "typically" is the lightest of the C states (C1), where the frequency is the lowest frequency the CPU supports, and not zero. (but going into C1 is pretty rare, and very short lived, so for this posting, I'm going to ignore C1).

A second important aspect is that of "coordination". For practical reasons, on current Intel processors, all the cores in a package share the same voltage. And because running at a lower frequency than possible at a certain voltage is inefficient, all the cores will also share the same clock frequency at any one time. Of course, except the cores that are idle, because their frequency is zero!
Because the OS will ask each individual logical processor for a separate P state, some reconciliation is needed between the different cores. This reconciliation is actually very simple: at any point in time, the frequency of all the cores is the maximum of what each of the individual cores wants. Of course, minus the idle cores. Their frequency is zero, and the maximum of "something" and "zero" is "something".

A simple example is appropriate here.
Let's take a two core system (core A and core B, which are initially both busy).
Core A wants a clock that ticks at 1 GHz, and Core B wants a clock that ticks at 2 GHz.
The maximum of 1 GHz and 2 GHz is .. 2 GHz, so Core A and Core B will both run at 2 GHz, even though Core A only asked for 1 GHz.
But now at time X, Core B goes idle. Since an idle core has a frequency of zero, and the maximum of zero and 1 GHz is 1 GHz... Core A now runs with a clock of 1 GHz.

The key thing here is that Core A gets a very variable behavior, independent of what it asked for, due to what Core B is doing.
Or in other words, the forward predictive value of a P state selection on a logical CPU is rather limited.
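
The two core example above can be written down as a toy model. This is only a sketch of the simplified rule described in the text (shared frequency = maximum request among non-idle cores, idle cores at zero); the function name and data layout are invented here, and real hardware coordination involves more than this.

```python
def coordinated_frequency_ghz(requests_ghz, idle):
    """Toy model of the coordination rule in the text.

    requests_ghz: dict mapping core name -> requested frequency in GHz
    idle: set of core names that are currently idle
    Returns a dict mapping core name -> frequency the core actually gets.
    """
    # Idle cores count as zero, so only busy cores take part in the maximum.
    active = [freq for core, freq in requests_ghz.items() if core not in idle]
    shared = max(active, default=0.0)
    return {core: (0.0 if core in idle else shared) for core in requests_ghz}


requests = {"A": 1.0, "B": 2.0}

# Both cores busy: A is pulled up to B's request.
print(coordinated_frequency_ghz(requests, idle=set()))

# Core B goes idle at time X: A drops back to what it asked for.
print(coordinated_frequency_ghz(requests, idle={"B"}))
```

Core A's request never changes between the two calls, yet the frequency it gets does, which is the point of the example.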

Sound complex? Now imagine that the GPU on die is in many ways like a CPU core.... and realize that what I described above is actually a simplification of reality.

Another development in the last few years has been that of "Turbo".
Some people call it "overclocking", but it isn't overclocking; it's all within the specs of the hardware. Turbo exists because in a multi-core system, it's possible to run a single core faster than the frequency that is on the label of the box when you buy the processor. This has to do with power budgets: when you buy a 35 Watt TDP CPU, the CPU isn't supposed to use more than 35 Watts. So if you have, say, 4 cores, that means each core by itself can use a little less than 9 Watts to fit that budget.
But if 3 of the 4 cores are idle... the one remaining core can use the whole 35 Watts. (Now add in that the GPU also counts into this 35 Watts as do several other shared resources, and it gets much more complex).
If this single core would be limited to 9 Watts instead of the full 35W even when the others are idle, a lot of potential performance is left on the table.
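
The power budget arithmetic above, as a small sketch. It assumes the simplest possible model, an even split of the TDP across the busy cores, and deliberately ignores the GPU and other shared resources that the text says also count against the budget; the function name is made up for illustration.

```python
def per_core_power_budget_watts(tdp_watts, active_cores):
    """Even split of the package TDP across the cores that are busy.

    Simplification: the GPU and other shared resources are ignored,
    although they draw from the same budget on real hardware.
    """
    if active_cores < 1:
        raise ValueError("at least one core must be active")
    return tdp_watts / active_cores


print(per_core_power_budget_watts(35, 4))  # all four cores busy: 8.75 W each
print(per_core_power_budget_watts(35, 1))  # three cores idle: 35.0 W for one core
```

The gap between the two results is the headroom that Turbo is designed to use.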

Now in the first processors that supported Turbo, the available "extra range" was limited, but this range has been growing and growing as core counts have gone up, power sensors have been added to the CPU and power levels have come down. (don't be surprised to see that your CPU has more levels in the turbo range than it has outside the turbo range)

What does this mean? Well, when the OS asks for a P state value that is in the "Turbo Range", it may not actually get the performance that maps to that level; the sum of the power in the system could be exceeding the allowed TDP value if that performance (clock frequency) was granted to all cores (remember from above that all running cores share clock frequency).
What you do get at any one point in time depends on what other cores and the GPU etc are doing.... and this will vary over time as cores go idle or become active, or as the GPU finishes a frame or starts a new complex frame... and even with temperature.
Or in other words, what frequency you get is highly dependent on other things including the C state selection policy and the graphics subsystem.

Another fun angle is that when a task is running completely memory bound, the performance of this task is basically independent of the clock frequency.... and some systems will detect this condition and temporarily lower the clock frequency to save power without reducing performance too much (all within the bounds of all the things I described above).

If it wasn't clear yet, a lot of what I described above varies from generation to generation quite a bit... and it's going to change quite a bit more in the next few years.

In the 3.9 kernel we've introduced a new controller driver for the P states, simply because the previous, 10+ year old algorithm wasn't cutting it anymore; too much has changed. By making the driver CPU generation specific, we can now select and tune algorithms for each specific generation, and do significantly better (30%+) than when we used a very generic algorithm.

Another thing to realize from all of this is that while it's easy to talk about and look at performance looking backwards (aperf/mperf allow us to do that), predicting performance going forward, even if you are very deliberately picking a P state value, is often near impossible since what you will actually get depends a LOT on what the other parts of the system are doing.
  • Arjan van de Ven
    +Magdalena Dobosz there is no difference between i3/i7 like this. Very likely you are confused in cpufreq telling you incorrect things; I do not know how you measure what the frequency actually is, but if you rely on the cpufreq sysfs field, it just tells you what the OS asked the hardware to do, not what you actually got....
    Nov 10, 2014
  • Magdalena Dobosz
    Thanks for your reply. I am reading the value from sysfs. Not from the "scaling_cur_freq" file though (which I know is connected to the frequency that the OS asked for), but from the "cpuinfo_cur_freq" file. I analysed the acpi-cpufreq code (for 3.2 kernel) and the value written to "cpuinfo_cur_freq" is taken from MSR_IA32_PERF_STATUS register. It should accurately indicate the actual p-state, shouldn't it? On the other hand, I wonder how to interpret this value when CPU is not in C0 state.
    Nov 11, 2014
  • Arjan van de Ven
    +Magdalena Dobosz it will accurately present what the kernel asked it ;-)
    Not the moment-to-moment actual...
    Nov 12, 2014
  • Dirk Brandewie
    PERF_STATUS (0x198) will show the P state the core will run at if not idle, after HW coordination. PERF_CTL (0x199) shows what the OS asked for. The current turbostat will display the frequency the core ran at, along with the effective frequency taking idle into account, over the sample time.
    Nov 21, 2014
  • Magdalena Dobosz
    Arjan: according to ACPI specification "Success or failure of the processor performance transition is determined by reading a Performance Status Register (PERF_STATUS) to determine the processor’s current performance state." and according to Intel Manual: "Reads of IA32_PERF_CTL determine the last targeted operating point. The current operating point can be read from IA32_PERF_STATUS. IA32_PERF_STATUS is updated dynamically." You are saying that MSR_IA32_PERF_STATUS will accurately present what the kernel asked. So does it mean that the documentation does not accurately describe the actual behaviour? 

    Also, you described cores coordination in modern Intel cpus ("all the cores will also share the same clock frequency at any one time" and "the frequency of all the cores is the maximum of what each of the individual cores wants." ). This is very interesting and I would like to write about it in my BSc thesis, but I would need some reference. Could you please provide some?
    Nov 29, 2014
  • Cyril Ingenierie CyrIng
    Hello
    Just to let you know that I've programmed XFreq: an Intel Core i7 monitoring GUI which gets C state data straight from the processor through the MSR registers
    Source code and build instructions at http://code.google.com/p/xfreq
    Dec 16, 2

