Clarification of General Concepts: User Space, Kernel Space, PGD and CR3

Source: Internet. Editor: 程序博客网. Time: 2024/05/22 06:32

PGD points to the page directory table of the currently executing process, which maps page directory entries to page tables and then to the actual physical pages (both user-space and kernel-space pages). If another process is scheduled, how are the kernel mappings handled? I have read that kernel space is common across all processes. This means that the lower 2 GB allocated to each process is common across all processes, so the PGDs and PGTs of all processes have the same entries for kernel-space memory.
Please shed some light on this. I am confused about these concepts.

Johnny Levin •Namaste Anshul,

Let's stick to the 32-bit case for this explanation:

You are correct that CR3 is, essentially, a pointer to the process's virtual memory map (it holds the physical address of the top-level page table). Threads of the same process have the same value of CR3. However, it is the UPPER 2GB (Windows; Linux: upper 1GB) that is common. This means that kernel space addresses start at 0x80000000 on Windows (Linux: 0xC0000000), and are, indeed, shared among processes.
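
To make the split concrete, here is a minimal sketch in C that classifies a 32-bit virtual address against the two boundaries mentioned above. The macro names and the helper `is_kernel_address` are made up for illustration; they are not taken from any OS header.

```c
#include <stdbool.h>
#include <stdint.h>

/* Start of kernel space for the two 32-bit layouts discussed above.
 * The macro names are hypothetical, chosen only for this sketch. */
#define KERNEL_BASE_2G_2G 0x80000000u /* 2GB user / 2GB kernel (Windows) */
#define KERNEL_BASE_3G_1G 0xC0000000u /* 3GB user / 1GB kernel (Linux)   */

/* Hypothetical helper: true if vaddr falls in the shared kernel region
 * for a layout whose kernel space begins at kernel_base. */
static bool is_kernel_address(uint32_t vaddr, uint32_t kernel_base)
{
    return vaddr >= kernel_base;
}
```

Note that the same address can land on different sides of the boundary depending on the layout: 0x80000000 is a kernel address under the 2GB/2GB split but still a user address under the 3GB/1GB split.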

Now, WHY do it? Because at the cost of 2GB (Windows, iOS) or 1GB (Linux, Android) of address space, you gain the advantage that the cost of a user/kernel context switch is the same as of a thread (because CR3 doesn't change). Additionally, you don't have to flush the Translation Lookaside Buffer (TLB). That optimizes system calls. So much, in fact, that even Mac OS X is now doing it in its 64-bit version.

Because CR3 points to a table of tables, it's actually very efficient. In PAE addressing (which is pretty much the default now), the 32-bit address is split as 2+9+9+12 bits:

- 2 bits: index into a table of 4 entries. Each entry covers 1GB. So sharing the top 1GB really means sharing just the last entry of this table! Sharing the top 2GB means sharing the last two entries.
- 9 bits: index into a table of 512 entries (2^9) of 64 bits (8 bytes) each, which point to page tables. Fits snugly in a 4K page.
- 9 bits: index into a table of 512 entries (2^9), each holding a 64-bit physical page address. Fits, again, in a 4K page.
- 12 bits: offset in page.
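
The 2+9+9+12 breakdown can be sketched with a few shifts and masks. This is an illustrative sketch only: the struct `pae_indices` and the function `pae_split` are names invented here, and the code only extracts the indices, it does not walk real page tables.

```c
#include <stdint.h>

/* Indices extracted from a 32-bit PAE virtual address (2+9+9+12).
 * Struct and field names are hypothetical, for illustration only. */
struct pae_indices {
    uint32_t pdpt;   /* bits 31..30: one of 4 entries, 1GB each       */
    uint32_t pd;     /* bits 29..21: one of 512 page-directory slots  */
    uint32_t pt;     /* bits 20..12: one of 512 page-table slots      */
    uint32_t offset; /* bits 11..0 : byte offset inside the 4K page   */
};

static struct pae_indices pae_split(uint32_t vaddr)
{
    struct pae_indices ix;
    ix.pdpt   = (vaddr >> 30) & 0x3u;
    ix.pd     = (vaddr >> 21) & 0x1FFu;
    ix.pt     = (vaddr >> 12) & 0x1FFu;
    ix.offset = vaddr & 0xFFFu;
    return ix;
}
```

For example, `pae_split(0xC0000000u).pdpt` is 3, the last of the four top-level entries, which is why sharing the top 1GB amounts to sharing that single entry; `pae_split(0x80000000u).pdpt` is 2, so a 2GB kernel region covers entries 2 and 3.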

64-bit is similar, breaking the address into 9+9+9+9+12, i.e. 48 bits. You don't use the entire 64 bits. With four table levels instead of three, the full address lookup is slower, making the TLB more important.
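
The 64-bit case follows the same pattern with four 9-bit indices. Again a sketch under assumptions: `lm_indices` and `lm_split` are invented names (the field names follow the usual x86-64 naming of the four levels), and the upper 16 bits of the address are simply ignored here.

```c
#include <stdint.h>

/* Indices extracted from the low 48 bits of a 64-bit virtual address
 * (9+9+9+9+12). Struct and function names are hypothetical. */
struct lm_indices {
    uint32_t pml4;   /* bits 47..39 */
    uint32_t pdpt;   /* bits 38..30 */
    uint32_t pd;     /* bits 29..21 */
    uint32_t pt;     /* bits 20..12 */
    uint32_t offset; /* bits 11..0  */
};

static struct lm_indices lm_split(uint64_t vaddr)
{
    struct lm_indices ix;
    ix.pml4   = (uint32_t)((vaddr >> 39) & 0x1FFu);
    ix.pdpt   = (uint32_t)((vaddr >> 30) & 0x1FFu);
    ix.pd     = (uint32_t)((vaddr >> 21) & 0x1FFu);
    ix.pt     = (uint32_t)((vaddr >> 12) & 0x1FFu);
    ix.offset = (uint32_t)(vaddr & 0xFFFu);
    return ix;
}
```

Each of the four indices selects an entry in one table, so a lookup that misses the TLB touches four tables instead of the three used by 32-bit PAE.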

Hope this helps, (reply/IM me if it doesn't)


Anshul Makkar •Namaste Johnny, :)
Thanks for the answer. It's really helpful and made the concept quite clear. Thanks again.
Could you please elaborate on this: "the cost of a user/kernel context switch is the same as of a thread (because CR3 doesn't change)"?

Also, please answer one last query: along similar lines, how is the stack handled? We are aware that each thread has its own local stack for user-mode operation. Then what is the role of the kernel stack, and is it different for each kernel thread?
Thanks
Anshul Makkar
www.justkernel.com

Johnny Levin •Sure:

During a context switch, the kernel changes threads (modern operating systems schedule only threads, not processes). Now, when two threads belong to the same process, their CR3 (pointer to the page table, as discussed before) has the same value, and the page tables need not be changed. This is great, because it means that the Translation Lookaside Buffer (TLB), which caches (shortcuts) the virtual-to-physical mapping, need not be flushed, and there is a good chance the actual full lookup (2+9+9 for 32-bit PAE, 9+9+9+9 for 64-bit) can be spared.

If, however, the context switch is between two unrelated threads - different values of CR3, i.e. different page tables - what we call a switch between two "processes" - then CR3 will need to be changed, and the TLB will likely be flushed. This means that subsequent lookups in the TLB will miss, and it will take time to repopulate the TLB.

So, this is (yet another) reason why context switches between processes are considerably slower than those between threads. Now, back to the kernel case: because it doesn't matter what process you are in, the top 1GB or 2GB is the same anyway, so moving to kernel space always costs the same as switching to a related thread. In other words, a kernel address always technically counts as if it is in *your* process, no matter what said process is. (Of course, it's not as simple as that: you can't just dereference a pointer to kernel space from user mode. This will get you a hardware-level fault, because you must be in Ring 0 to do that.)
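
The difference between the two kinds of switch can be sketched with a deliberately simplified toy model. Everything here (`toy_cpu`, `toy_switch`, the 4-slot "TLB") is invented for illustration; real hardware is more refined, and this only captures the rule "same CR3 keeps the TLB, different CR3 flushes it".

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TLB_SLOTS 4 /* arbitrary toy size, not a real TLB geometry */

/* Toy CPU state: the current page-table base and which cached
 * translations are still usable. All names are hypothetical. */
struct toy_cpu {
    uint32_t cr3;
    bool     tlb_valid[TLB_SLOTS];
};

/* Pretend the running thread has warmed up every TLB slot. */
static void toy_tlb_fill(struct toy_cpu *cpu)
{
    for (size_t i = 0; i < TLB_SLOTS; i++)
        cpu->tlb_valid[i] = true;
}

/* Count the translations that would still hit. */
static size_t toy_tlb_live(const struct toy_cpu *cpu)
{
    size_t live = 0;
    for (size_t i = 0; i < TLB_SLOTS; i++)
        if (cpu->tlb_valid[i])
            live++;
    return live;
}

/* Switch to a thread whose page tables start at next_cr3.
 * Returns true if the switch flushed the toy TLB. */
static bool toy_switch(struct toy_cpu *cpu, uint32_t next_cr3)
{
    if (cpu->cr3 == next_cr3)
        return false; /* same process: CR3 and TLB stay as they are */

    cpu->cr3 = next_cr3; /* different process: reload CR3 ... */
    for (size_t i = 0; i < TLB_SLOTS; i++)
        cpu->tlb_valid[i] = false; /* ... and lose the cached lookups */
    return true;
}
```

In this model, entering the kernel behaves like the first branch: the kernel's addresses live in the page tables that CR3 already points to, so nothing is reloaded and nothing is flushed.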

As for the other query - the kernel stack is the area the kernel uses for its function calls (and their associated automatic variables, etc.). This is NOT the same as the user-mode stack, as it is in kernel memory, and can only be accessed in kernel mode.

Now, as to whether the kernel stack is different, etc.: that's a tad more complicated, and dependent on OS "interpretation". Generally, in the process model (common to UNIX and others), it's safe to say threads will have their own kernel stack. This is because threads can be suspended in kernel mode (and, in fact, commonly *are*) in mid-system call (e.g. when blocking), and you have to store that thread state somewhere! A single stack for all threads simply won't cut it. Rather, the OS allocates some memory (8K on Linux, 12K on Win32, 24K on Win64, etc., depending on OS) per thread.
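
Why one stack per thread is needed can be shown with a small toy model. The names (`toy_thread`, `toy_kpush`, `toy_kpop`) are invented for this sketch, the 8K size is just the Linux figure quoted above, and the stack here grows upward for simplicity, unlike real x86 stacks.

```c
#include <stddef.h>
#include <stdint.h>

#define KSTACK_SIZE 8192 /* the 8K Linux figure quoted above */

/* Toy thread: a private kernel stack plus its saved stack position.
 * All names here are hypothetical. */
struct toy_thread {
    uint8_t kstack[KSTACK_SIZE];
    size_t  ksp; /* bytes in use; stays put while the thread is blocked */
};

/* Push one byte of "kernel state". Returns 0, or -1 if the stack is full. */
static int toy_kpush(struct toy_thread *t, uint8_t value)
{
    if (t->ksp >= KSTACK_SIZE)
        return -1;
    t->kstack[t->ksp++] = value;
    return 0;
}

/* Pop one byte of "kernel state". Returns 0, or -1 if the stack is empty. */
static int toy_kpop(struct toy_thread *t, uint8_t *out)
{
    if (t->ksp == 0)
        return -1;
    *out = t->kstack[--t->ksp];
    return 0;
}
```

If thread A pushes state and then blocks in mid-system call, thread B can push and pop on its own stack without disturbing anything A left behind; when A resumes, it pops exactly what it pushed. With a single shared stack, B's activity would be interleaved with A's saved state.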

You might benefit from this: http://msdn.microsoft.com/en-us/windows/hardware/gg463190


Caveat: Some OSes take crazy shortcuts, e.g. OS X (= XNU), which uses mechanisms such as continuations to give threads an option to discard their own stack and use a common 16K stack. This uses a common kernel stack for ALL threads, and makes for an even more efficient context switch.

Queries are free, you know. It doesn't have to be your last. I opened this group SO that people ask questions! As an old Hebrew saying goes - "Shy people will never learn". So keep them coming.

Hope this helps,

J