Archive | Linux Kernel x86

Mitigating the Performance impacts of TLB Flushes on Context Switches

We all know, presumably, that a MOV to CR3 (the PDBR, or Page Directory Base Register) is an essential part of the Linux Kernel’s context_switch routine. This is necessary, since the page tables may have switched, but the MOV to CR3 also flushes the TLB, thereby forcing Page Table Walks.

Avoiding TLB flushes on loads of CR3 is key to avoiding performance hits on context switches. In other words, a processor really needs to facilitate keeping cached address-space translations in the TLB across context switches.

In “pure architectures” (which the x86 is NOT, and for good reasons of backward compatibility etc), the PID (Process ID) would have been “hashed” with TLB addressing, thereby avoiding the need for TLB flushes on context switches. Not so with x86, since the PID is not part and parcel of the x86 Architecture.

Process-context identifiers (PCIDs) are a facility in x86 by which a logical processor may cache information for multiple linear-address spaces in the TLB, and preserve it across context switches.

As we noted above, with PCIDs the processor retains cached Page-Table information when software switches to a different linear-address space by loading CR3, and presumably to a different Process (we ARE executing a context_switch).

A PCID is a 12-bit identifier, and may be thought of as a “Process-ID” for TLBs. If CR4.PCIDE = 0 (bit 17 of CR4), the current PCID is always 000H; otherwise, the current PCID is the value of bits 11:0 of CR3. Non-zero PCIDs are enabled by setting the PCIDE flag.
When a logical processor creates entries in the TLBs (Section 4.10.2 of the x86 programmer's reference manual) and paging-structure caches (Section 4.10.3), it associates those entries with the current PCID (Oh … such a loose association of PCID with PID). Note that the register holding the location of the PGD (CR3) also carries the PCID, so the PGD location is, in a sense, being interpreted in a per-process “process context”. When using entries in the TLBs and paging-structure caches to translate a linear address, a logical processor uses only those entries associated with the current PCID, and hence flushes of the TLB are avoided.
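
To make the tagging idea concrete, here is a minimal sketch in Python of a TLB whose entries are keyed by (PCID, virtual page number). The class name TaggedTLB, its methods and the page/frame numbers are all made up for illustration; a real TLB is a hardware structure with its own associativity and replacement policy, which this toy model ignores.

```python
# Toy model of a PCID-tagged TLB (illustrative assumption, not real hardware
# behaviour in every detail): entries are keyed by (pcid, virtual page number).

class TaggedTLB:
    def __init__(self):
        # Maps (pcid, virtual page number) -> physical frame number.
        self.entries = {}
        self.current_pcid = 0

    def load_cr3(self, pcid):
        """Model a CR3 load with PCIDs enabled: change the tag, keep the entries."""
        self.current_pcid = pcid

    def fill(self, vpn, pfn):
        """Cache a translation (as after a page-table walk) under the current PCID."""
        self.entries[(self.current_pcid, vpn)] = pfn

    def lookup(self, vpn):
        """Return the cached frame number, or None on a TLB miss."""
        return self.entries.get((self.current_pcid, vpn))


tlb = TaggedTLB()

tlb.load_cr3(pcid=1)                 # "process A" runs
tlb.fill(vpn=0x400, pfn=0x1234)

tlb.load_cr3(pcid=2)                 # context switch to "process B"
miss_in_other_space = tlb.lookup(0x400)    # B cannot see A's entry
tlb.fill(vpn=0x400, pfn=0x5678)            # same linear page, different frame

tlb.load_cr3(pcid=1)                 # switch back to "process A"
hit_after_switch_back = tlb.lookup(0x400)  # A's entry survived the switches
```

The point of the sketch is only that switching the tag replaces flushing: the entry filled under PCID 1 is invisible under PCID 2, yet still there when PCID 1 becomes current again.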

With the x86, my dear brothers and sisters in grief and joy, we take what we can get, and run. In this case, where TLB flushes are avoided for what will turn out to be 99% of the *current* address space, that is more than we could have bargained for with Intel. I say.. Good Job Intel.

 


Give me your X86 Mode, make it Real, or else forget about it !

OK… I am talking about my friend Carlos Santana, and also about the difference between x86 Intel Real Mode Addressing and Protected Mode Addressing.

We will hit it up with Carlos later on in this blog. But business first for now, shall we ?

We would like to eventually explore just why .. oh why … Real Mode is limited to 1MB Memory.

And while doing so, also understand x86 “Protected Mode” addressing. We start with an x86 assembly instruction (AT&T syntax, because that is what the Linux Kernel uses), and illustrate address formation at the most important fundamental levels:

movl (%ebp), %eax

Let us look at Protected Mode Address formation first:

In the instruction shown above, we use ebp to form the “effective” address (ebp IS the “effective” address), then we add the “base” from the default segment register to the “effective address” to come up with the “linear address”. (Strictly speaking, an access that uses EBP as its base register defaults to SS rather than DS; DS is the default for most other data accesses. The address formation is the same either way.) The linear address is then looked up in the TLB to translate it into a Physical Address (if Paging is enabled), or the Linear Address becomes the Physical Address if Paging is NOT enabled.

The “Selector” IS the value of the “Selector” component of the Segment Register (each Segment Register has a “Selector” component). In Protected Mode, bit 2 of the “Selector” (the third least-significant bit) is called the “TI” / Table Indicator. It is used to select one of two tables (0 for the GDT, 1 for the LDT). Bits 15:3 then index into the GDT or the LDT, and we come up with an 8-byte “Segment Descriptor”.

That “Segment Descriptor” has a 32-bit “Base”, which is added to the Effective Address (which, in our assembly instruction, is the value of General Purpose Register EBP), and the net result becomes the Linear Address. That Linear Address either goes through the Paging Translation to determine the Physical Memory Address (if Paging IS enabled), or IS the Physical Address if Paging is NOT enabled (as determined by the CR0.PG bit).
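
The address formation just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the descriptor tables are reduced to hypothetical lists that hold nothing but each descriptor's 32-bit base (real descriptors also carry limit and access bits), and the function names and table contents are made up. Bits 1:0 of the selector, the RPL, are decoded too even though the text above does not use them.

```python
# Sketch of protected-mode linear address formation (paging not modelled).

def decode_selector(selector):
    """Split a 16-bit selector into index (bits 15:3), TI (bit 2), RPL (bits 1:0)."""
    selector &= 0xFFFF
    return {
        "index": selector >> 3,
        "ti": (selector >> 2) & 1,   # 0 = GDT, 1 = LDT
        "rpl": selector & 0x3,
    }


def linear_address(selector, effective, gdt_bases, ldt_bases):
    """Linear address = descriptor base + effective address, wrapped to 32 bits."""
    fields = decode_selector(selector)
    table = ldt_bases if fields["ti"] else gdt_bases
    base = table[fields["index"]]
    return (base + effective) & 0xFFFFFFFF


# Hypothetical tables: entry 0 is the null descriptor, entry 1 is a flat
# segment with base 0 (what Linux uses), entry 2 has a non-zero base.
gdt_bases = [0x00000000, 0x00000000, 0x10000000]
ldt_bases = [0x00000000, 0x20000000]

ebp = 0x00002000
flat = linear_address(0x08, ebp, gdt_bases, ldt_bases)      # index 1, TI = 0
based = linear_address(0x10, ebp, gdt_bases, ldt_bases)     # index 2, TI = 0
from_ldt = linear_address(0x0F, ebp, gdt_bases, ldt_bases)  # index 1, TI = 1, RPL = 3
```

With the flat segment the linear address is simply the effective address, which is why segmentation is nearly invisible under Linux.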

All of the above applies to x86 Protected Mode. How about “Real Mode” which does not have Descriptors etc or even Paging Mode ?

Well, the Selector (the segment register value) and the register used for the offset (bp rather than ebp) are only 16 bits wide in real mode. The “Base” of the descriptor is replaced by the Selector left-shifted by 4 (i.e. multiplied by 16), and that is added to the effective address (bp).

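Note that this implies that the Linear Address is (Selector << 4) + Effective Address: a 16-bit Selector shifted left by 4 gives a 20-bit segment base, and 20 address bits reach 2^20 bytes = 1MB.

And we have just corroborated that x86 Real Mode is limited to 1MB of memory addressing.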
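
The real-mode arithmetic is small enough to check directly. Below is a sketch in Python; the function name and the address_lines parameter are illustrative, and wrapping the result to 20 bits assumes a machine with only 20 address lines (as on the original 8086), so that anything above 0xFFFFF wraps around to low memory.

```python
# Sketch of real-mode address formation: (segment << 4) + offset.

def real_mode_linear(segment, offset, address_lines=20):
    """Form a real-mode address from a 16-bit segment and a 16-bit offset."""
    raw = ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)
    # Assumption: only `address_lines` bits reach the bus, so higher bits wrap.
    return raw & ((1 << address_lines) - 1)


typical = real_mode_linear(0x1234, 0x0010)       # 0x12340 + 0x10
top_of_memory = real_mode_linear(0xFFFF, 0x000F) # last byte below 1MB
wrapped = real_mode_linear(0xFFFF, 0xFFFF)       # 0x10FFEF wraps to 0x0FFEF
```

The largest value the addition can produce is 0xFFFF0 + 0xFFFF = 0x10FFEF, just past the 1MB mark; with 20 address lines that last sliver wraps back to the bottom of memory, so nothing above 0xFFFFF is reachable.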

Of course, Carlos, x86 has now gone to Infinity and Beyond with Huge Pages and 48-bit Linear Addressing. And the implications are huge also with moving beyond the measly (and respected) 4K Page size to 2M/4M/1G Page sizes. We have a blog below on this, and more coming.

As in some other key areas, Memory Management plays a key and consuming role in optimizing for Multiprocessing / Multicore and Multithreaded execution models.
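
One way to get a feel for those page sizes is to count how many pages (and hence how many translations) it takes to cover the same region. The Python sketch below does that, and also splits a linear address into page number and offset. The helper names and the sample address are made up for illustration, and the sketch says nothing about which paging mode provides which size.

```python
# Sketch: page number / offset split and page counts for the sizes named above.

PAGE_SIZES = {
    "4K": 4 << 10,
    "2M": 2 << 20,
    "4M": 4 << 20,
    "1G": 1 << 30,
}


def split_address(linear, page_size):
    """Return (page number, offset within page) for a power-of-two page size."""
    offset_bits = page_size.bit_length() - 1
    return linear >> offset_bits, linear & (page_size - 1)


def pages_needed(region_bytes, page_size):
    """Number of pages needed to cover a region (rounded up)."""
    return -(-region_bytes // page_size)


one_gib = 1 << 30
counts = {name: pages_needed(one_gib, size) for name, size in PAGE_SIZES.items()}

small_split = split_address(0xC0101234, PAGE_SIZES["4K"])
huge_split = split_address(0xC0101234, PAGE_SIZES["2M"])
```

Covering 1GB takes 262144 translations with 4K pages but only 512 with 2M pages, which is presumably a large part of why bigger pages matter so much for the TLB.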

There is no substitute in this regard for our Course Advanced Kernel memory Management. Hit us up on it.

We explain these specific x86 features, Linux Kernel concepts and more in detail in my classes (Advanced Linux Kernel Programming @UCSC-Extension), and also in other classes that I teach independently. Please take note, and take advantage also, of upcoming training sessions. As always, Feedback, Questions and Comments are appreciated and will be responded to. Cheers!

Take it away Carlos Santana !


Linux Boot: The Beginning was startup_32, and it was with Linus, and it WAS Linus also

In an earlier post, we referred to startup_32. Well, __start and start_of_setup are the target of the INT 19h within the boot context, while the target of the first startup_32 is located at the 1M watermark, as we had mentioned.

What is also special about __start is that we see here the first instruction executed within the context of the kernel (in the case of the objdump shown, we have opted out of the “SAFE RESET of the Controller at config time”, because INT 19h presumably has done a good, thorough job in the boot context). From there we go on to create a nice clean stack and then check the magic codes of the boot sectors etc.

By the time we get to the first startup_32, we are in protected mode, and memory above 1MB can be accessed (it is the target of the decompress). The second startup_32 can therefore be located at the 1M watermark (0010 0000h), with VA relocation at c010 0000h.

What is so special about this watermark? It is the first time we have executed instructions beyond the addressing limits set by x86 real mode (which, as we all know, is limited in memory access to 0xFFFFF).
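
The two addresses quoted for the second startup_32 differ by a constant, and a few lines of Python make the relationship explicit. This is a sketch using only the numbers given above; the constant and function names are illustrative, and the difference it computes happens to match the default PAGE_OFFSET of a 32-bit x86 Linux kernel.

```python
# Sketch: the 1M watermark and its relocated virtual address.

PHYS_WATERMARK = 0x00100000   # 1M: physical location quoted above
VIRT_WATERMARK = 0xC0100000   # virtual address after relocation, quoted above
REAL_MODE_LIMIT = 0xFFFFF     # highest address reachable in real mode

# Constant offset between the two views of the same memory.
OFFSET = VIRT_WATERMARK - PHYS_WATERMARK


def phys_to_virt(phys):
    """Kernel virtual address for a physical address under this fixed offset."""
    return (phys + OFFSET) & 0xFFFFFFFF


def virt_to_phys(virt):
    """Inverse of phys_to_virt for addresses in the offset-mapped range."""
    return (virt - OFFSET) & 0xFFFFFFFF


def beyond_real_mode(phys):
    """True if the physical address cannot be reached from real mode."""
    return phys > REAL_MODE_LIMIT
```

For example, beyond_real_mode(PHYS_WATERMARK) is true while beyond_real_mode(0xFFFFF) is false: the watermark is exactly the first byte that real mode cannot reach.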

We discuss Linux Kernel startup/boot concepts and more in my classes, with kernel code walk-throughs and programming assignments (Advanced Linux Kernel Programming @UCSC-Extension, and also in other classes that I teach independently). Please take note, and take advantage also, of upcoming training sessions. Anand has also written production x86 protected-mode microcode, so is in a unique position to educate on that front. As always, Feedback, Questions and Comments are appreciated and will be responded to.


Avoiding TLB Flushes on Context Switches on x86 Processors: The PCID

CR3 and CR4 are Control Registers on x86 processors that are used to configure and manage protected-mode functionality. Instructions that access them may only be executed at CPL (Current Privilege Level) == 0, i.e. in Kernel Mode.
In the x86, linear addresses are translated by the TLB into Physical addresses (assuming the Page Table Walks that populate it have been done prior to the lookup).
A CR3 load switches Page Tables. In the Linux Kernel, it is executed when a new process is scheduled at context switch time (context_switch). Before the days of the PCID (see below), a load of CR3 flushed the TLB.
Avoiding TLB flushes on loads of CR3 is key to avoiding performance hits on context switches. In other words, a processor really needs to facilitate keeping translations for multiple address spaces in the TLB across context switches.
Process-context identifiers (PCIDs) are a facility in x86 processors by which a logical processor may cache information for multiple linear-address spaces in the TLB.

The processor may retain cached information when software switches to a different linear-address space with a different PCID, e.g. by loading CR3.

When using entries in the TLBs and paging-structure caches to translate a linear address, a logical processor uses only those entries associated with the current PCID.

A PCID is a 12-bit identifier, and may be thought of as a “Process-ID” for TLBs.

If CR4.PCIDE = 0 (bit 17 of CR4), the current PCID is always 000H; otherwise, the current PCID is the value of bits 11:0 of CR3.
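
The rule for the current PCID translates almost word for word into code. The Python sketch below operates on plain integers standing in for register contents; the function name and the sample CR3 value are made up for illustration, while the bit positions are the ones stated above.

```python
# Sketch: deriving the current PCID from CR3 and CR4 register values.

CR4_PCIDE = 1 << 17     # PCIDE flag: bit 17 of CR4
CR3_PCID_MASK = 0xFFF   # PCID field: bits 11:0 of CR3


def current_pcid(cr3, cr4):
    """Return the current PCID: 000H if CR4.PCIDE = 0, else CR3 bits 11:0."""
    if not (cr4 & CR4_PCIDE):
        return 0x000
    return cr3 & CR3_PCID_MASK


sample_cr3 = 0x12345042   # hypothetical value; low 12 bits are 0x042

pcid_disabled = current_pcid(sample_cr3, cr4=0)
pcid_enabled = current_pcid(sample_cr3, cr4=CR4_PCIDE)
```

With PCIDE clear the low bits of CR3 are simply not treated as a PCID, so every address space shares PCID 000H.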

Non-zero PCIDs are enabled by setting the PCIDE flag (bit 17 of CR4). Rules do apply on enabling PCIDs on x86 processors. Caveat emptor. Naturally, restrictions on the operating system may apply in order to take advantage of this mechanism. Context switches that require isolation of Linear Addresses between processes must be done with care. Otherwise, linear addresses of different processes with PCIDs enabled may overlap, and so will their translations of Linear Addresses to Physical memory! Ouch.

More on the implications of this for Linux will follow.

I explain Linux Kernel concepts and more in my classes (Advanced Linux Kernel Programming @UCSC-Extension, and also in other classes that I teach independently). As always, Feedback, Questions and Comments are appreciated and will be responded to.
