Archive | Process Scheduling RSS feed for this section

The Tortuous rode to extending the joys of extending Physical Memory Ranges and Page Sizes

My dear Brethren and Fellow travellers in the land x86 and the Linux Kernel, it is time to come clean on all modes that extend x86 page sizes and physical memory addressing.

So here we go Ka Boom Ka Beem: In 32-bit non-PAE mode, PSE and PSE36 (AKA PSE40), whereby PSE36/40 is a cheap way to get the extended memory addressing for 64GB/1TB, 4K/4M Paging that we created in PSE (Really!) with 32-bit PGDs and PTEs.

In 32-bit PAE mode, the architecture is 32-bit, but the PDPTEs, PDEs and PTEs extend to 64-bit, thereby extending Page sizes to 4K/2M (if PTE is not used), and memory addressing to 52-bits (4PB).

And lastly, but not leastly, we have IA-32E mode (true blue 64-bit architecture), with 48-bit linear addressing, 4K/2M/1G (if PDEs and PTEs are both not used) page sizes, 4PB of Memory addressing, and 64-bit PDPTEs, PDEs and PTEs.

John, did I get it all, and right also, this time ? DID you get the Cigar too ? Haaah ?

We all … must admit to an acute case of x86-itis, so the next post onwards, time to switch gears.  We will start looking next at the various implications to the Kernel of some of these x86-isms for which we have espoused eloquence in this and prior posts.

I will also note that we do do a good job of explaining these concepts in some of our training sessions, this feedback measured by course evaluations, and client (some of which are ISPs) feedbacks also.

I do hope everyone enjoyed this post. Thanks again

Comments { 0 }

Mitigating the Performance impacts of TLB Flushes on Context Switches

We all know, presumably, that MOV CR3 (the PDBR) is an essential part of the Linux Kernel’s context_switch routing. This is necessary, since the tables may have switched, but the MOV CR3 also flushes the TLB thereby forcing Page Table Walks.

Avoiding TLB flushes on Loads of CR3 are key to avoiding performace hits  on context switches.  In other words, a processor really needs to facilitate the storage of address space caching in the TLB across context switches.

In “pure architectures” (which the x86 is NOT, and for good reason of backward compatibility etc), the PID (Process ID) would have been “hashed” with TLB addressing, thereby avoiding the need for TLB Flushes on context switches. Not so with x86, since the PID is not part/parcel of the x86 Architecture.

Process-context identifiers (PCIDs) are a facility in x86 by which a logical processor may cache information for multiple linear-address spaces in the TLB, and preserve it across context switches.

As we noted above, The processor  retains cached Page-Table  information  when software switches to a different linear-address space by loading CR3, and presumable to a different Process (We ARE executing a context_switch)

A PCID is a 12-bit identifier, and may be thought of as a “Process-ID” for TLBs. If CR4.PCIDE = 0 (but 17 of CR4), the current PCID is always 000H; otherwise, the current PCID is the value of bits 11:0 of CR3. Non-zero PCIDs are enabled by setting the PCIDE flag (bit 17 of CR4).
When a logical processor creates entries in the TLBs (Section 4.10.2 of the x86 prog reference manual) and paging structure caches (Section 4.10.3), it associates those entries with the current PCID (Oh … such a loose association of PCID with PID). Note that this means that where the PGD is located is somehow being interpreted in the  PID “process context”.  When using entries in the TLBs and paging-structure caches to translate a linear address, a logical processor uses only those entries associated with the current PCID, and hence flushes of the TLB  are avoided.

With the x86, my dear brothers and sisters in grief and joy, we take what you can get, and run. In this case, where TLB flushes are avoided for what will turn out to be 99% of the *current* address space, that is more than we can bargain for with Intel. I say.. Good Job Intel.

 

Comments { 0 }

LinuxEco Cross-Functional Quiz#1

OK… we have talked of Huge Pages in an earlier blog, and we have referred to Kernel Preemption.

So we know now a Huge TLB Page can be GB sized…. and so therefore it stands to reason that a preemption is involved somewhere in  kernel API hugetlb_fault. Right ? GF-rins.

SO … where is it that the Kernel is Possibly attempting a Preemption  on a page fault on a huge page ? More importantly, WHY would we do that ? Do we dig it now ? Haaaaaah ? Tell me then.

We explain these specific x86 and Linux Kernel features, concepts and more in detail in our class Advanced Kernel Memory Management , the second offering of which is to be made Corporate-direct. We suggest also   the Spring Session 2012 UCSC-Extension Advanced Linux Kernel Programming

Please take note of these and other upcoming Linux kernel training sessions. As always, Feedback, Questions and Comments are appreciated and will be responded to.

Comments { 0 }

Avoiding TLB Flushes on Context Switches on x86 Processors: The PCID

CR3, CR4 are Control registers on x86 processors that are used configure and manage protected-mode functionality. These instructions may only be executed at CPL(Current Privilege) == 0 . i.e. in Kernel Mode.
In the x86, linear addresses are translated by the TLB into Physical addresses (assuming Page Table Walks have been done prior to look up etc).
A Cr3 load switches out Page Tables. In the Linux Kernel, it is executed during scheduling new process at context switch time (context_switch). Before the days of the PCID (see below), a load of CR3 flushed the TLB.
Avoiding TLB flushes on Loads of CR3 are key to avoiding performace hits  on context switches.  In other words, a processor really needs to facilitate the storage of multiple address spaces in the TLB across context switches.
Process-context identifiers (PCIDs) are a facility in x86 processors  by which a logical processor may cache information for multiple linear-address spaces in the TLB.

The processor may retain cached information when software switches to a different linear-address space with a different PCID e.g., by loading CR3.

When using entries in the TLBs and paging-structure caches to translate a linear address, a logical processor uses only those entries associated with the current PCID

i.e. A PCID is a 12-bit identifier, and be thought of as a “Process-ID” for TLBs.

If CR4.PCIDE = 0 (but 17 of CR4), the current PCID is always 000H; otherwise, the current PCID is the value of bits 11:0 of CR3.

Non-zero PCIDs are enabled by setting the PCIDE flag (bit 17 of CR4). Rules do apply on enabling PCID on x86 processors. Caveats emptor. Naturally, restrictions on the operating system may apply to take advantage of this mechanism. Context switches that require isolation of Linear addresses between processes must be done  with care. And/or  linear addresses between processes with PCID‘s enabled may overlap, and so will translations of Linear Addresses to Physical memory ! Ouch.

More on the implications of this for Linux will follow.

I  explain Linux Kernel concepts and more in my classes ( Advanced Linux Kernel Programming @UCSC-Extension, and also in other classes that I teach independently). As always, Feedback, Questions  and Comments are appreciated and will be responded to.

Comments { 0 }