
The Tortuous Road to the Joys of Extending Physical Memory Ranges and Page Sizes

My dear Brethren and Fellow travellers in the land of x86 and the Linux Kernel, it is time to come clean on all the modes that extend x86 page sizes and physical memory addressing.

So here we go, Ka Boom Ka Beem: In 32-bit non-PAE mode we have PSE and PSE36 (AKA PSE40). PSE gave us 4K/4M Paging with 32-bit PGDs and PTEs, and PSE36/40 is a cheap way (Really!) to get extended physical memory addressing of 64GB/1TB on top of those same 4M pages and 32-bit entries.

In 32-bit PAE mode, the architecture (and the linear address) is still 32-bit, but the PDPTEs, PDEs and PTEs extend to 64-bit, thereby changing Page sizes to 4K/2M (2M if the PTE level is not used), and extending physical memory addressing up to 52 bits (4PB).
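To make the PAE layout concrete, here is a minimal Python sketch of how a 32-bit linear address gets carved up, assuming the standard 2/9/9/12 bit split (2 bits pick one of 4 PDPTEs, 9 bits pick a PDE, 9 bits pick a PTE, 12 bits are the byte offset). The function names are made up for this post; this is illustration, not kernel code.

```python
def pae_split_4k(linear: int):
    """Split a 32-bit linear address for a 4K page under PAE (2/9/9/12)."""
    return (
        (linear >> 30) & 0x3,    # PDPTE index: 2 bits, 4 entries
        (linear >> 21) & 0x1FF,  # PDE index: 9 bits, 512 entries
        (linear >> 12) & 0x1FF,  # PTE index: 9 bits, 512 entries
        linear & 0xFFF,          # byte offset inside the 4K page
    )


def pae_split_2m(linear: int):
    """Same address when the PDE maps a 2M page: the PTE level is skipped,
    so its 9 index bits join the offset (9 + 12 = 21 bits = 2M)."""
    return (
        (linear >> 30) & 0x3,
        (linear >> 21) & 0x1FF,
        linear & 0x1FFFFF,
    )


print(pae_split_4k(0xC0201234))  # (3, 1, 1, 564)
print(pae_split_2m(0xC0201234))  # (3, 1, 4660)
```

Note that the wider 64-bit entries are what buy the extra physical address bits; the linear address itself never grows past 32 bits in this mode.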

And lastly, but not leastly, we have IA-32E mode (true blue 64-bit architecture), with 48-bit linear addressing, 4K/2M/1G page sizes (2M if the PTE level is not used, 1G if neither PDEs nor PTEs are used), 4PB of physical Memory addressing, and 64-bit PML4Es, PDPTEs, PDEs and PTEs.
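Here is a small Python sketch of where the three IA-32E page sizes come from, assuming the usual 4-level layout of four 9-bit table indices plus a 12-bit offset. The helper names (`ia32e_split`, `page_size`) are mine, purely for illustration.

```python
LEVELS = ("PML4E", "PDPTE", "PDE", "PTE")
SHIFTS = (39, 30, 21, 12)


def ia32e_split(linear: int):
    """Return the four 9-bit table indices and the 12-bit offset
    of a 48-bit linear address (4-level paging, 4K page)."""
    linear &= (1 << 48) - 1  # keep only the 48 translated bits
    indices = {name: (linear >> shift) & 0x1FF
               for name, shift in zip(LEVELS, SHIFTS)}
    return indices, linear & 0xFFF


def page_size(levels_skipped: int) -> int:
    """Page size when the lowest `levels_skipped` table levels are not used.
    Each skipped level donates its 9 index bits to the page offset."""
    return 1 << (12 + 9 * levels_skipped)


addr = (1 << 39) | (2 << 30) | (3 << 21) | (4 << 12) | 5
print(ia32e_split(addr))
print(page_size(0), page_size(1), page_size(2))  # 4096 2097152 1073741824
```

So 4K is the full walk, 2M skips the PTE level, and 1G skips both the PDE and PTE levels.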

John, did I get it all, and get it right also, this time? DID you get the Cigar too? Haaah?

We all … must admit to an acute case of x86-itis, so the next post onwards, time to switch gears.  We will start looking next at the various implications to the Kernel of some of these x86-isms for which we have espoused eloquence in this and prior posts.

I will also note that we do a good job of explaining these concepts in some of our training sessions, going by course evaluations and by feedback from clients (some of which are ISPs).

I do hope everyone enjoyed this post. Thanks again


The Tortuous Road to extending Page Size -and- Memory Size Addressing

When the earth was young, and you were not born, Unca Intel created Real Mode (Read previous article).

Real mode had the capacity to address a whopping 1 MB of Memory (20-bit address space), with each “program” (there were no processes) addressing 64 KB of space at a time (since the registers etc. were 16 bits in size). Believe you me, in those days, it was mongo memory. Ah, those were the days …
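For the record, the 1 MB figure falls out of the real-mode segment:offset arithmetic. A minimal Python sketch, assuming 8086-style behaviour where the result wraps at 20 bits (later processors with the A20 line enabled do not wrap); the function name is my own:

```python
def real_mode_physical(segment: int, offset: int) -> int:
    """8086-style real-mode address: segment * 16 + offset, wrapped to 20 bits."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset are 16-bit values")
    return ((segment << 4) + offset) & 0xFFFFF  # 20 address lines


# The classic reset vector F000:FFF0 lands 16 bytes below the 1 MB mark.
print(hex(real_mode_physical(0xF000, 0xFFF0)))  # 0xffff0
# One segment register reaches at most 64 KB: offsets 0x0000 through 0xFFFF.
print(real_mode_physical(0x1000, 0xFFFF) - real_mode_physical(0x1000, 0x0000) + 1)  # 65536
```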

Then came progress, and the first KING of the Microprocessor Family, the 80386, and with it 32-bit addressing and 4K paging. Someone had a bright idea at Intel: why not take the 10 bits reserved to index into the page table and use them as an extension of the page offset instead? Voilà! 12 + 10 = 22 offset bits, and so were born 4M Pages.

However, memory sizes were still set at 4GB (so only 1024 4M pages, no Cigar John!). AND … it will be noted that 4M Pages came along with no new paging hierarchy.
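A quick Python sketch of that trick, assuming the classic 32-bit 10/10/12 split. The only difference between a 4K and a 4M translation is whether the middle 10 bits index a page table or simply become part of the offset. Function names are mine.

```python
def split_4k(linear: int):
    """Classic 32-bit paging: 10-bit directory index, 10-bit table index, 12-bit offset."""
    return (linear >> 22) & 0x3FF, (linear >> 12) & 0x3FF, linear & 0xFFF


def split_4m(linear: int):
    """4M page: same directory index, but the table index bits join the offset
    (10 + 12 = 22 bits, i.e. 4M bytes per page)."""
    return (linear >> 22) & 0x3FF, linear & 0x3FFFFF


PAGES_4M_IN_4GB = (1 << 32) // (1 << 22)

print(split_4k(0x00C03004))  # (3, 3, 4)
print(split_4m(0x00C03004))  # (3, 12292)
print(PAGES_4M_IN_4GB)       # 1024
```

Same hierarchy, same 32-bit entries, same 4GB ceiling: only the page got bigger.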

SO .. the next step was to extend Memory sizes … why not? Up to 52 bits it was (4 PB with PAE, go for it John, what the frig!) with a new paging hierarchy, and we got 4K and 2M Page sizes. Did we hear NEW Paging hierarchy with 64-bit PDEs and PTEs?

Ouuch. No John, No way, we NEED our cake (OLD Paging Hierarchy with 32-bit PDEs and PTEs) and wanna eat it too (Larger memory Addressing, Larger anyway than 4G).

SO .. we created PSE-36 (36-bit Memory Addressing, 64 GB of DDRx addressing with the “old yet modified” 32-bit PDEs and PTEs) with 4M Paging.

But John, we TOLD YOU we really needed LARGER Page sizes AND … Larger Memory. And let's not worry, just GO with paging hierarchy changes, OK?

And until then, no kiss-kiss, no Bang Bang.

SO .. John created PSE 4M Page sizes with 40-bit Memory addressing (1 TB of DDRx Addressing) and called it a Job Well Done. This was the original 4MB Page-Size PSE format extension, WHICH was not intended for extending Physical Addresses in the first place (that was supposed to be the job of the new 64-bit paging hierarchy), but eventually did wind up doing just so. Amen.
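How do you squeeze more than 32 address bits out of a 32-bit PDE? A hedged Python sketch, assuming the layout in Intel's manuals for 4M-page PDEs: bits 31:22 of the entry carry physical address bits 31:22, and the otherwise unused bits starting at bit 13 carry the high bits (4 of them for PSE-36, 8 of them for the 40-bit variant). The function name and the sample entries are invented for illustration.

```python
def pse_page_base(pde: int, high_bits: int = 4) -> int:
    """Physical base address of a 4M page described by a 32-bit PSE PDE.

    high_bits = 4 models PSE-36 (PDE bits 16:13 -> address bits 35:32),
    high_bits = 8 models the 40-bit variant (PDE bits 20:13 -> address bits 39:32).
    """
    low = pde & 0xFFC00000                           # address bits 31:22
    high = (pde >> 13) & ((1 << high_bits) - 1)      # address bits above 31
    return (high << 32) | low


# Hypothetical entry: Present + Page Size flags (0x81), base bits 31:22 = 0x004,
# and all four PSE-36 high bits set.
print(hex(pse_page_base(0x0041E081, high_bits=4)))  # 0xf00400000
# Hypothetical entry with all eight high bits set, decoded as the 40-bit variant.
print(hex(pse_page_base(0x001FE081, high_bits=8)))  # 0xff00000000
```

The paging hierarchy and the entry size stay exactly as they were; only a handful of spare bits in the 4M-page PDE got a new job.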

BUT did he not get 1TB memory addressing also? Yes he did: 40 bits is 1TB. Thanks for correcting me John. With the new paging hierarchy (64-bit PDPTEs/PDEs etc.) that comes with PA52, Memory addressing extends out into the 4TB to 4PB range, depending on how many physical address bits a given processor implements. More than enough for the EXT3/4 file system. And onto extended file systems.
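Since the bit counts and the byte counts keep tripping us (me included) up, here is a tiny Python helper that turns a number of address bits into an addressable size. It is plain arithmetic, nothing x86-specific, and the name `addressable` is my own.

```python
UNITS = ("B", "KB", "MB", "GB", "TB", "PB")


def addressable(bits: int) -> str:
    """Human-readable size of a byte-addressed space with `bits` address bits."""
    size = 1 << bits
    unit = 0
    while size >= 1024 and unit < len(UNITS) - 1:
        size //= 1024  # exact, since size is a power of two
        unit += 1
    return f"{size} {UNITS[unit]}"


for bits in (20, 32, 36, 40, 52):
    print(bits, "bits ->", addressable(bits))
# 20 bits -> 1 MB
# 32 bits -> 4 GB
# 36 bits -> 64 GB
# 40 bits -> 1 TB
# 52 bits -> 4 PB
```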

SO then … would one of my esteemed colleagues please enlighten me on the maximum size of an EXT3/4 file system? And why it is so? More when we discuss File Systems on this blog. We discuss these concepts during our talks on Linux Kernel Programming Advanced at UCSC-extension, and during the course on Memory Management that we teach. Please note the Event Calendar on the right.


Before the world of Linux Kernel Preemptions: a cooperative world

Kernel preemption tries to ensure fair usage of limited CPU resources. One way to understand kernel preemption is to explore its opposite, i.e. the way it WAS before the 2.6 Linux kernel (which is preemptive). Well, the way it WAS .. was a cooperative world: a Process that got the CPU was expected to play nice-nice and cooperatively hand over the CPU (e.g. at the EXIT system call), or the kernel could also, when it switched back to user-mode, decide to schedule a new process. Processes were expected to be graceful in letting others use CPU resources.

Processes had to deal with many issues to try to ensure this model “worked”. However, in addition to the hard problems this created, there are some architectural features that just plumb stall out CPU resources and ensure suboptimal CPU usage regardless of whatever processes could have done about it. In other words, Processes do not have visibility into the underlying mechanisms of the operating system / kernel itself.

Linux Kernel preemption ensures what I consider to be a somewhat fair (aka somewhat arbitrary) reallocation of CPU resources among processes, done by the one party with “all the knowledge under the sun” on the underlying “goings on”. As an example, if the Kernel KNOWS the system is taking interrupts at HZ rate (see blog below), then why not try to prioritize between existing processes, and give others a chance to run? If the Kernel KNOWS a process is about to stall on a resource that may take some time to become available … why not put the process to “sleep” and “wake” some other fortunate process up?
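To see the difference in miniature, here is a toy Python simulation of the two worlds. It is a sketch under loud assumptions: a single CPU, a simple round-robin queue, and tasks described only by the number of ticks they run before voluntarily giving up the CPU. None of the names (`run`, `timeslice`, the task names) come from the kernel.

```python
from collections import deque


def run(tasks, timeslice=None):
    """Simulate one CPU and return the timeline, one task name per tick.

    tasks: dict of name -> non-empty list of positive burst lengths
           (ticks the task runs before it yields voluntarily).
    timeslice=None  -> cooperative: a task keeps the CPU for its whole burst.
    timeslice=N     -> preemptive: the scheduler takes the CPU back after N ticks.
    """
    queue = deque((name, list(bursts)) for name, bursts in tasks.items())
    timeline = []
    while queue:
        name, bursts = queue.popleft()
        budget = bursts[0] if timeslice is None else min(bursts[0], timeslice)
        timeline.extend([name] * budget)
        bursts[0] -= budget
        if bursts[0] == 0:
            bursts.pop(0)                   # this burst is finished
        if bursts:
            queue.append((name, bursts))    # back of the line
    return timeline


tasks = {"hog": [6], "ui": [1, 1]}
print(run(tasks).index("ui"))               # 6: ui waits out the hog
print(run(tasks, timeslice=2).index("ui"))  # 2: ui gets in after one slice
```

Total work is the same 8 ticks either way; what changes is how long the little guy waits for his first turn.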

Well.. there are many reasons to NOT do that also (if processes are Real-time processes, for example), or to prevent “lockouts” etc. etc.

In the end, it boils down to a tradeoff between latency for the lucky few vs. throughput for the very many. And all shades in between, with considerations galore, a few of which are listed below:

Explicit and Implicit “Blocking”, Critical Code Section Synchronizations, Network and Block Device Processing Latencies and Throughputs, Interrupt Latencies, “Deferred Processing”, Safety in Preemptability (preventing lockouts because we have preempted tasks that should not have been), the denial of preemption and the recursive depth of such denials, relationships to interrupts and recursive relationships to the above, system-programming architectural considerations and requirements (Scheduler Priorities, Classes etc.), SMP / Cross Processor considerations, memory management, x86 Architectural considerations in Interrupt Latencies, etc. etc.
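The “recursive depth of denials” item deserves a picture. In the kernel, preempt_disable() and preempt_enable() nest by way of a per-task preemption counter, and a pending reschedule only happens once that counter drops back to zero. Below is a toy Python model of just that counting idea. The class `ToyCpu` and its fields are hypothetical simplifications of mine, not the kernel's data structures.

```python
class ToyCpu:
    """Toy model: preemption is allowed only while preempt_count == 0."""

    def __init__(self):
        self.preempt_count = 0     # nesting depth of "do not preempt me"
        self.need_resched = False  # a reschedule was requested but deferred
        self.switches = 0          # how many context switches happened

    def preempt_disable(self):
        self.preempt_count += 1

    def preempt_enable(self):
        if self.preempt_count == 0:
            raise RuntimeError("unbalanced preempt_enable")
        self.preempt_count -= 1
        # Leaving the outermost critical section: honor a deferred request.
        if self.preempt_count == 0 and self.need_resched:
            self._schedule()

    def timer_tick(self):
        """The tick wants to switch tasks; it may have to wait."""
        self.need_resched = True
        if self.preempt_count == 0:
            self._schedule()

    def _schedule(self):
        self.need_resched = False
        self.switches += 1


cpu = ToyCpu()
cpu.preempt_disable()      # outer critical section
cpu.preempt_disable()      # nested one
cpu.timer_tick()           # request is recorded, nothing switches yet
cpu.preempt_enable()       # still nested: no switch
print(cpu.switches)        # 0
cpu.preempt_enable()       # depth back to zero: the deferred switch happens
print(cpu.switches)        # 1
```

The longer the code sits at a non-zero depth, the longer everyone else waits, which is exactly the latency vs. safety tradeoff in the list above.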

Again, all this is probably a good review for our past students. We explain these specific x86 features, Linux Kernel concepts and more in detail in my classes ( Advanced Linux Kernel Programming @UCSC-Extension), and also in other classes that I teach independently. Please take note, and take advantage also, of upcoming training sessions. As always, Feedback, Questions and Comments are appreciated and will be responded to.


Some of these posts are so friggin hard to read !

That is the feedback I get. Yes, true. Put me on that list also.

This is true even to the initiated. This IS hard stuff. Make no bones about it. The fact that some of these posts may be difficult to read is perhaps one litmus test of the fact that good information is being communicated here.

And also we all learn more when we revisit Kernel topics “100% understood”. No such thing.

These posts are supposed to deliver the “click” to those who may be thinking on these issues. And to get the interested members of our workforce trained up in the collaborative, stupefying complexity that makes up the Linux kernel.

Kernel developers who wish to extend their reach beyond their immediate pales of influence, and Systems/Applications developers, will be the first-level beneficiaries. Also, the line between a great Linux System Admin and a Systems programmer is beginning to blur.

Additionally, I do hope the following is taken in the right light. I am merely being accurate when I say that instructive posts such as those posted here are more than what I got when I got started with Unix, and then Linux. We had to, and still have to, struggle, though I will also add that the ROI gets better with the years.

Now we have training sessions for the depths of the Kernel itself. We also discuss systems-level issues, and let the discussion go where it may (within the bounds of reason and time allocated to us). This is true with us, and elsewhere also we hope. These training sessions are very helpful, going by student feedback.

Also, in these sessions, it then becomes clear just why an in-person instruction and Q&A may help clarify matters beyond what may be even reasonably or remotely possible in these blogs.

Some of our upcoming posts will deal with the overall issues related to MPX / Multithreaded / Multicore systems, and their relationships to the Linux kernel itself. We also have a talk coming up at the Dojo on this topic. Please put it on your calendar.

And we will be blogging a bit(! / lot) more on Huge Pages etc…Cheers !
