Archive | July, 2012

The Tortuous Road to extending Page Size -and- Memory Size Addressing

When the earth was young, and you were not born, Unca Intel created Real Mode (Read previous article).

Real mode had the capacity to address a whopping 1 MB of Memory (20-bit address space), with each “program” (there were no processes) addressing 64 KB of space (since the registers etc. were 16 bits in size). Believe you me, in those days, it was mongo memory. Ah, those were the days …
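For those who like to see the arithmetic, here is a minimal sketch of the Real Mode numbers. The function name and the sample segment:offset values are made up for illustration, and the wrap to 20 bits assumes original 8086-style behaviour (no A20 line enabled).

```python
# Real Mode address arithmetic (illustrative sketch, hypothetical names).
ADDRESS_BITS = 20   # width of the Real Mode physical address
OFFSET_BITS = 16    # width of the registers used as offsets


def real_mode_address(segment: int, offset: int) -> int:
    """Physical address = segment * 16 + offset, wrapped to 20 bits.

    The wrap is an assumption that models the original 8086.
    """
    return ((segment << 4) + offset) & ((1 << ADDRESS_BITS) - 1)


total_memory = 1 << ADDRESS_BITS   # 1 MB in bytes
segment_size = 1 << OFFSET_BITS    # 64 KB in bytes

print(total_memory // 1024, "KB addressable in total")
print(segment_size // 1024, "KB reachable through one 16-bit offset")
print(hex(real_mode_address(0xB800, 0x0010)))
```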

Then came progress, and the first KING of the Microprocessor Family, the 80386, and with it 32-bit addressing and 4K paging. Someone at Intel had a bright idea: why not use the 10 bits reserved to index into the Page Table as an extension of the page offset, and so of the page size ? Voila ! And so were born 4M Pages.

However, memory sizes were still set at 4GB (so only 1024 4M pages, no Cigar John !). AND … it will be noted that 4M Pages came with no new paging hierarchy.
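A rough sketch of how the same 32-bit linear address is carved up in the two cases may help. The function names and the sample address are hypothetical; the bit widths (10/10/12 for 4K pages, 10/22 for 4M pages) are the standard 32-bit layout.

```python
# Splitting a 32-bit linear address (illustrative sketch, hypothetical names).

def split_4k(linear: int):
    """4K pages: 10-bit directory index, 10-bit table index, 12-bit offset."""
    return (linear >> 22) & 0x3FF, (linear >> 12) & 0x3FF, linear & 0xFFF


def split_4m(linear: int):
    """4M pages: 10-bit directory index, 22-bit offset.

    The 10 bits that used to index the Page Table are folded into the offset.
    """
    return (linear >> 22) & 0x3FF, linear & 0x3FFFFF


# Number of 4M pages that fit in a 4 GB address space.
pages_4m = (1 << 32) // (1 << 22)

address = 0x00C03004  # arbitrary sample address
print(split_4k(address))
print(split_4m(address))
print(pages_4m)
```

Note that no extra table level is needed for the 4M case: the Page Directory entry maps the page directly.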

SO .. the next step was to extend Memory sizes … why not ? 52 bits it was (that is 4 PB with PAE, go for it John, what the frig !) with a new paging hierarchy, and we got 4K and 2M Page sizes. Did we hear NEW Paging hierarchy with 64-bit PDEs and PTEs ?


Ouch. No John, no way, we NEED our cake (OLD Paging Hierarchy with 32-bit PDEs and PTEs) and wanna eat it too (Larger memory Addressing, Larger anyway than 4G).

SO .. we created PSE-36 (36-bit Memory Addressing, 64 GB of DDRx addressing with the “old yet modified” 32-bit PDEs and PTEs) with 4M Paging.

But John, we TOLD YOU we really needed LARGER Page sizes AND … Larger Memory. And let's not worry, just GO with paging Hierarchy changes, OK ?

And until then, no kiss-kiss, no Bang Bang.

SO .. John created PSE-4M Page sizes with 40-bit Memory (1 TB DDRx Addressing) and called it a Job Well Done. This was the original 4MB Page-Size PSE format extension, WHICH was not intended for extending Physical Addresses by using new 64-bit paging hierarchies in the first place, but eventually did wind up doing just so. Amen.

BUT did he not get 1TB memory addressing also ? Thanks for correcting me John. With the new paging hierarchy (64-bit PDPTEs/PDEs etc) that comes with PA52, Memory addressing extends out to 4TB – 4PB. More than enough for the EXT3/4 file system. And onto extended file systems.
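Since the numbers fly around quite a bit in this story, here is a small sketch that turns a physical address width into the amount of memory it can reach. The helper name is hypothetical, and the list of widths is simply the set mentioned in this post plus Real Mode's 20 bits.

```python
# Address width to addressable memory (illustrative sketch, hypothetical names).
UNITS = ["B", "KB", "MB", "GB", "TB", "PB"]


def addressable(bits: int) -> str:
    """Return 2**bits bytes as a human-readable size using binary units."""
    size = 1 << bits
    unit = 0
    while size >= 1024 and unit < len(UNITS) - 1:
        size //= 1024
        unit += 1
    return f"{size} {UNITS[unit]}"


for bits in (20, 32, 36, 40, 52):
    print(f"{bits}-bit physical address -> {addressable(bits)}")
```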

SO then … would one of my esteemed colleagues please enlighten me on the maximum size of an EXT3/4 file system ? And why it is so ? More when we discuss File Systems on this blog. We discuss these concepts during our talks on Linux Kernel Programming Advanced at UCSC-extension, and during the course on Memory Management taught there; please note the Event Calendar on the right.


x86 Segmentation Protection Mechanisms: #1

At the onset, please accept my apologies for the X86 protection mechanisms.

I did not create it. It IS difficult to understand (huge understatement). It IS elegant also (Oucchh). I am merely a messenger that is making an attempt at explaining it. The complaint department on the x86 architecture is with the FBI.. just kidding.

X86 protections revolve around the Current Privilege Level (CPL). CPL is a two-bit field that can take on four privilege levels, 0,1,2,3.

NOTE here please the types of instructions that enable moves between different privilege levels etc.

A future post will follow that describes just how and when the CPL changes, but for now let us take for granted the following: when x86 processors wake up, they are in REAL MODE (Give Me Your Real Mode), and CPL is set to numerical “0”. This is the Highest Privilege an x86 processor can operate at, and the Linux Kernel operates at this level. All applications on x86 processors operate at Level 3.

This article focuses on a MOV into a Data Segment register (accomplished with a “MOV DS,AX” for example), and the protection mechanisms associated with that.

Note that a Code Segment CS is “just” another segment register, however a MOV into CS is done with a “transfer control” instruction (CALL FAR, JMP, Software-INT, RET, IRET etc), or an Asynchronous Event (Hardware Interrupt).

In the MOV DS, AX, the AX contains a 16-bit SELECTOR, whose low two bits (Bits 1:0) hold the Requestor Privilege Level (RPL) value. As explained in an earlier post, the SELECTOR also has a Table Indicator bit (“TI”, Bit 2) that is used to select either the GDT (Global Descriptor Table) or the LDT (Local Descriptor Table). The GDT and LDT are arrays of 8-byte DESCRIPTORs, and one of these DESCRIPTORs is chosen by the SELECTOR to be loaded into the Segment Register DS.
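As a minimal sketch, the three fields of a SELECTOR can be pulled apart with a few shifts and masks. The class and function names are hypothetical, and the sample value 0x002B is just an arbitrary selector chosen for illustration.

```python
# Decoding a 16-bit segment SELECTOR (illustrative sketch, hypothetical names).
from typing import NamedTuple


class Selector(NamedTuple):
    index: int  # SELECTOR[15:3], picks the DESCRIPTOR within the table
    ti: int     # SELECTOR[2], 0 = GDT, 1 = LDT
    rpl: int    # SELECTOR[1:0], Requestor Privilege Level


def decode_selector(sel: int) -> Selector:
    return Selector(index=(sel >> 3) & 0x1FFF, ti=(sel >> 2) & 1, rpl=sel & 3)


def descriptor_offset(sel: int) -> int:
    """Byte offset of the chosen 8-byte DESCRIPTOR within its table."""
    return decode_selector(sel).index * 8


fields = decode_selector(0x002B)  # arbitrary sample selector
print(fields)
print("table:", "LDT" if fields.ti else "GDT")
print("descriptor byte offset:", descriptor_offset(0x002B))
```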

That Chosen DESCRIPTOR has a two-bit (surprise surprise) DPL (Descriptor Privilege Level) field.

SO … the major “players” in Protection Mechanisms are the CPL, DPL and the RPL.

Here, it is important to note the special role that the Segment SELECTOR plays.

In addition to identifying the GDT or the LDT that will be used (as indicated by the “TI” bit of the SELECTOR), and the descriptor itself within the GDT or LDT (as identified by SELECTOR[15:3]), the RPL field of the selector plays a role of “WEAKENING” the CPL (Current Privilege Level).

The thinking is that the RPL represents a Privilege level of sorts, and the numerically larger of CPL and SELECTOR’s RPL is the CPL.Effective (Effective CPL) for the purposes of the Segment register Load.

The “MOV DS, AX” (into Segment Register) sets up the memory space that will become accessible to the program doing the MOV DS. The protection check performed (CPL.effective <= Descriptor.DPL) will either allow the MOV DS to complete or reject it.

IF the Protection checks fail for the MOV DS, the processor hardware will generate a #GP (General Protection) Fault.
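The check described above can be sketched in a few lines. This is only a model of the privilege comparison for a plain data segment load; the real hardware also checks table limits, descriptor type and the present bit, all of which are left out here. The exception class and function name are hypothetical.

```python
# Privilege check on a data segment load (illustrative sketch, hypothetical names).

class GeneralProtectionFault(Exception):
    """Stands in for the #GP fault raised by the processor."""


def load_data_segment(cpl: int, rpl: int, dpl: int) -> int:
    """Model of the MOV DS privilege check; returns the effective CPL on success.

    Assumes a plain data segment; limit, type and present checks are omitted.
    """
    effective_cpl = max(cpl, rpl)  # the RPL can only weaken the CPL
    if effective_cpl > dpl:
        raise GeneralProtectionFault(
            f"effective CPL {effective_cpl} is less privileged than DPL {dpl}"
        )
    return effective_cpl


# Kernel (CPL 0) loading a user data segment (DPL 3): allowed.
print(load_data_segment(cpl=0, rpl=0, dpl=3))

# Application (CPL 3) loading a kernel data segment (DPL 0): rejected.
try:
    load_data_segment(cpl=3, rpl=3, dpl=0)
except GeneralProtectionFault as fault:
    print("#GP:", fault)
```

Note the third case worth trying: CPL 0 with a SELECTOR whose RPL is 3 is also rejected for a DPL 0 segment, which is exactly the “WEAKENING” role of the RPL.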

This methodology for Segment register Loads is used, with some modifications, for the gamut of Segment register Loads within X86, wherever they may happen.

“Far” Jumps etc. load the Code Segment (CS) when there is a change in Privilege Levels, and a MOV SS loads the Stack Segment. On a MOV SS, a failed privilege check also raises #GP; the Stack Fault (#SS) is raised when the selected stack segment is marked not present.

From the above, it becomes clear that the Kernel has full and complete access to the User Mode memory. This is a capability that is necessary, for example, when in response to a READ system call, the Kernel reads data from Disks (In Privilege Level 0), then copies the Data into User Space (operating in Privilege Level 3). But the vice versa is not true.

Understanding the Linux x86 Server memory model is not possible without understanding x86 Segmentation, the Protection Mechanisms, and the Paging/TLB Mechanisms (To be discussed later).
