Archive | Memory Management RSS feed for this section

The Tortuous rode to extending the joys of extending Physical Memory Ranges and Page Sizes

My dear Brethren and Fellow travellers in the land x86 and the Linux Kernel, it is time to come clean on all modes that extend x86 page sizes and physical memory addressing.

So here we go Ka Boom Ka Beem: In 32-bit non-PAE mode, PSE and PSE36 (AKA PSE40), whereby PSE36/40 is a cheap way to get the extended memory addressing for 64GB/1TB, 4K/4M Paging that we created in PSE (Really!) with 32-bit PGDs and PTEs.

In 32-bit PAE mode, the architecture is 32-bit, but the PDPTEs, PDEs and PTEs extend to 64-bit, thereby extending Page sizes to 4K/2M (if PTE is not used), and memory addressing to 52-bits (4PB).

And lastly, but not leastly, we have IA-32E mode (true blue 64-bit architecture), with 48-bit linear addressing, 4K/2M/1G (if PDEs and PTEs are both not used) page sizes, 4PB of Memory addressing, and 64-bit PDPTEs, PDEs and PTEs.

John, did I get it all, and right also, this time ? DID you get the Cigar too ? Haaah ?

We all … must admit to an acute case of x86-itis, so the next post onwards, time to switch gears.  We will start looking next at the various implications to the Kernel of some of these x86-isms for which we have espoused eloquence in this and prior posts.

I will also note that we do do a good job of explaining these concepts in some of our training sessions, this feedback measured by course evaluations, and client (some of which are ISPs) feedbacks also.

I do hope everyone enjoyed this post. Thanks again

Comments { 0 }

Mitigating the Performance impacts of TLB Flushes on Context Switches

We all know, presumably, that MOV CR3 (the PDBR) is an essential part of the Linux Kernel’s context_switch routing. This is necessary, since the tables may have switched, but the MOV CR3 also flushes the TLB thereby forcing Page Table Walks.

Avoiding TLB flushes on Loads of CR3 are key to avoiding performace hits  on context switches.  In other words, a processor really needs to facilitate the storage of address space caching in the TLB across context switches.

In “pure architectures” (which the x86 is NOT, and for good reason of backward compatibility etc), the PID (Process ID) would have been “hashed” with TLB addressing, thereby avoiding the need for TLB Flushes on context switches. Not so with x86, since the PID is not part/parcel of the x86 Architecture.

Process-context identifiers (PCIDs) are a facility in x86 by which a logical processor may cache information for multiple linear-address spaces in the TLB, and preserve it across context switches.

As we noted above, The processor  retains cached Page-Table  information  when software switches to a different linear-address space by loading CR3, and presumable to a different Process (We ARE executing a context_switch)

A PCID is a 12-bit identifier, and may be thought of as a “Process-ID” for TLBs. If CR4.PCIDE = 0 (but 17 of CR4), the current PCID is always 000H; otherwise, the current PCID is the value of bits 11:0 of CR3. Non-zero PCIDs are enabled by setting the PCIDE flag (bit 17 of CR4).
When a logical processor creates entries in the TLBs (Section 4.10.2 of the x86 prog reference manual) and paging structure caches (Section 4.10.3), it associates those entries with the current PCID (Oh … such a loose association of PCID with PID). Note that this means that where the PGD is located is somehow being interpreted in the  PID “process context”.  When using entries in the TLBs and paging-structure caches to translate a linear address, a logical processor uses only those entries associated with the current PCID, and hence flushes of the TLB  are avoided.

With the x86, my dear brothers and sisters in grief and joy, we take what you can get, and run. In this case, where TLB flushes are avoided for what will turn out to be 99% of the *current* address space, that is more than we can bargain for with Intel. I say.. Good Job Intel.


Comments { 0 }

The Tortuous Road to extending Page Size -and- Memory Size Addressing

When the earth was young, and you were not born, Unca Intel created Real Mode (Read previous article).

Real mode had the capacity to address a whopping 1 MB of Memory (10-bit address space), with each “program” (there were no processes) addressing 64 KB of space (since the registers etc were 16-bits in size. Believe you me, in those days, it was mongo memory. Ah those were the days …

Then came progress, and the first KING of the Microprocessor Family the 80386, and with it 32-bit addressing and 4K paging. Someone had a bright idea at Intel, why not use the 10-bits reserved to index into the PTE as extensions of page-size ? Viola ! And so were born 4M Pages.

However, memory sizes were still set at 4GB (so only 1000 4M pages, no Cigar John !). AND … it will be noted along with 4M Pages came  with no new paging hierarchy.

SO .. the next step was to extend Memory sizes … why not ? 52 bits it was (1 TB PAE, go for it John, what the frig !) with a new paging hierarchy, and we got 4K and 2M Page sizes. Did we hear NEW Paging hierarchy with 64-bit PDEs and PTEs  ?


Ouuch. No John, No way, we NEED our cake (OLD Paging Hierarchy with 32-bit PDEs and PTEs) and wanna eat it too (Larger memory Addressing, Larger anyway then 4G).

SO .. we created PAE-36 (36-bit Memory Addressing,  64 GB of DDRx addressing  with the “old yet modified” 32-bit PDEs and PTEs) with 4M Paging

But John, we TOLD YOU really needed LARGER Page sizes AND … Larger Memory. And lets not worry, Just  GO with paging Hierarchy changes , OK ?

And Until, then no kiss-kiss no Bang Bang.

SO .. John created PSE-4M Page sizes with 40-bit Memory (256 GB DDRx Addressing) and called it a Job Well Done. This was the original 4MB Page-Size PSE format extention WHICH was not intended for extending Physical Addresses by using new 64-bit paging hierarchies in the first place, but eventually did wind up doing just so. Amen.

BUT did he not get 1TB memory addressing also ? Thanks for correcting me John. With the new paging hierarchy, (64-bit PDPTEs/PDEs etc) that comes with PA52, Memory addressing extends out to 4TB – 4PB. More than enough for the EXT3/4 file system.And onto extended file systems.

SO then … would one of my esteemed colleagues please enlighten me on the maximum size of a EXT3/4 file system ? And why it is so ? More when we discuss File Systems on this blog. We discuss these concepts during our talks on Linux Kernel Programming Advanced at UCSC-extension, and during the course on Memory Management taught, please note Event Calender on right.

Comments { 0 }