Archive | Kernel Threads

Mitigating the Performance Impact of TLB Flushes on Context Switches

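We all know, presumably, that MOV CR3 (loading the page-directory base register, the PDBR) is an essential part of the Linux kernel's context_switch routine. It is necessary because the page tables may have changed (the incoming process may have a different mm), but on x86 the MOV to CR3 also flushes the (non-global) TLB entries, thereby forcing page-table walks for the translations that follow.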

Avoiding TLB flushes on loads of CR3 is key to avoiding performance hits on context switches. In other words, the processor needs a way to keep translations for several address spaces cached in the TLB across context switches.

In “pure architectures” (which the x86 is NOT, and for good reasons of backward compatibility, etc.), the PID (Process ID) would have been “hashed” into TLB addressing, thereby avoiding the need for TLB flushes on context switches. Not so with the x86, since the PID is not part and parcel of the x86 architecture.

Process-context identifiers (PCIDs) are a facility in x86 by which a logical processor may cache information for multiple linear-address spaces in the TLB, and preserve it across context switches.

As noted above, with PCIDs the processor retains cached page-table information when software switches to a different linear-address space by loading CR3, and presumably to a different process (we ARE executing a context_switch).

A PCID is a 12-bit identifier and may be thought of as a “Process ID” for TLBs. If CR4.PCIDE = 0, the current PCID is always 000H; otherwise, the current PCID is the value of bits 11:0 of CR3. Non-zero PCIDs are enabled by setting the PCIDE flag (bit 17 of CR4).
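To make the bit layout concrete, here is a minimal user-space sketch, assuming the layout just described (PCID in CR3 bits 11:0, enabled by CR4 bit 17) plus one more detail from the Intel SDM: when CR4.PCIDE = 1, bit 63 of the MOV-to-CR3 source operand is a no-flush hint. With bit 63 set, the load invalidates nothing; with it clear, it invalidates only the entries tagged with the new PCID (global pages excepted). The helpers make_cr3() and current_pcid() are hypothetical names for illustration, not kernel APIs, and actually writing CR3 of course requires ring 0.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only; these are not kernel APIs. */
#define CR4_PCIDE      (1ULL << 17)   /* CR4 bit 17: PCID enable */
#define CR3_PCID_MASK  0xFFFULL       /* CR3 bits 11:0: current PCID (when PCIDE = 1) */
#define CR3_NOFLUSH    (1ULL << 63)   /* MOV-to-CR3 operand bit 63: keep this PCID's entries
                                         (only meaningful when CR4.PCIDE = 1) */

/* Build a MOV-to-CR3 operand from a 4 KiB-aligned page-table root and a PCID. */
static uint64_t make_cr3(uint64_t pgd_phys, uint16_t pcid, int noflush)
{
    assert((pgd_phys & CR3_PCID_MASK) == 0);  /* root must be 4 KiB aligned */
    assert(pcid <= CR3_PCID_MASK);            /* a PCID is only 12 bits wide */

    uint64_t cr3 = pgd_phys | pcid;
    if (noflush)
        cr3 |= CR3_NOFLUSH;
    return cr3;
}

/* Decode the current PCID the way the processor does. */
static uint16_t current_pcid(uint64_t cr3, uint64_t cr4)
{
    if (!(cr4 & CR4_PCIDE))
        return 0;                             /* PCIDE = 0: PCID is always 000H */
    return (uint16_t)(cr3 & CR3_PCID_MASK);   /* PCIDE = 1: bits 11:0 of CR3 */
}
```

For example, make_cr3(0x1000, 5, 1) yields 0x8000000000001005: page-table root at physical 0x1000, PCID 5, and a request to keep PCID 5's cached translations.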
When a logical processor creates entries in the TLBs (Section 4.10.2 of the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3A) and paging-structure caches (Section 4.10.3), it associates those entries with the current PCID (oh … such a loose association of PCID with PID). In effect, the page-table root loaded into CR3 (the PGD) is being tagged with something like a PID “process context”. When using entries in the TLBs and paging-structure caches to translate a linear address, a logical processor uses only those entries associated with the current PCID. Entries left behind by other address spaces therefore cannot produce a wrong translation, so they need not be flushed when CR3 is reloaded.
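A toy model may help show why tagging entries makes the flush unnecessary. This is an illustrative sketch only, under simplifying assumptions (a tiny fully associative table, round-robin replacement, no global pages); toy_tlb and its functions are hypothetical, and real TLBs are hardware structures that look nothing like this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy PCID-tagged TLB: illustration only, not how hardware is built. */
#define TOY_TLB_ENTRIES 8

struct toy_tlb_entry {
    bool     valid;
    uint16_t pcid;   /* PCID that was current when the entry was created */
    uint64_t vpn;    /* virtual page number */
    uint64_t pfn;    /* physical frame number */
};

struct toy_tlb {
    struct toy_tlb_entry e[TOY_TLB_ENTRIES];
    size_t next;     /* round-robin replacement cursor */
};

/* Fill: tag the new translation with the current PCID. */
static void toy_tlb_fill(struct toy_tlb *t, uint16_t cur_pcid, uint64_t vpn, uint64_t pfn)
{
    assert(cur_pcid <= 0xFFF);   /* PCIDs are 12 bits */
    t->e[t->next] = (struct toy_tlb_entry){ true, cur_pcid, vpn, pfn };
    t->next = (t->next + 1) % TOY_TLB_ENTRIES;
}

/* Lookup: only entries tagged with the current PCID may be used. */
static bool toy_tlb_lookup(const struct toy_tlb *t, uint16_t cur_pcid, uint64_t vpn, uint64_t *pfn)
{
    for (size_t i = 0; i < TOY_TLB_ENTRIES; i++) {
        const struct toy_tlb_entry *x = &t->e[i];
        if (x->valid && x->pcid == cur_pcid && x->vpn == vpn) {
            *pfn = x->pfn;
            return true;
        }
    }
    return false;   /* miss: hardware would walk the page tables here */
}

/* Legacy behaviour (CR4.PCIDE = 0): a CR3 load drops every non-global entry.
 * This toy has no global pages, so everything goes. */
static void toy_tlb_flush_all(struct toy_tlb *t)
{
    for (size_t i = 0; i < TOY_TLB_ENTRIES; i++)
        t->e[i].valid = false;
}
```

In this model, two address spaces can map the same virtual page to different frames at the same time, and a “context switch” is nothing more than changing cur_pcid; only the legacy toy_tlb_flush_all() path throws everything away.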

With the x86, my dear brothers and sisters in grief and joy, we take what we can get, and run. In this case, where TLB flushes are avoided for what will turn out to be 99% of the *current* address space, that is more than we could have bargained for with Intel. I say... Good job, Intel.

 


The Linux boot sequence: start_kernel, and there was light

Continuing our discussion of kernel boot, start_kernel() runs as the implicit Process 0 (its PID was initialized to zero; let's not go looking for a process 0 with “ps”), AKA the “root thread”, the granddaddy of all processes to come. And the fall-back guy, as we will see.

Process 0 (PID 0, root thread) spawns off Process 1, known as the kernel_init (kernel thread) process, which will later exec /sbin/init (created, not scheduled to run... not just yet).

Then kthreadd, the kernel thread that starts off other kernel threads, is created: AKA Process 2.

Process 0 (PID 0) will become the idle thread for the CPU once it has nothing better to do.

Regardless, we then schedule() (we DID create a process or two, hopefully). Nota bene: when the kernel_init thread (the future /sbin/init) was created, only Processes 0 and 1 existed, i.e. Process 1 becometh /sbin/init. Per 2.6.32.2 at least, and for a while now.

As part of schedule(), processes may have popped out or popped in, so we will probably find something to run, right? After all we have just gone through, we have a right to expect something to run. Grins.

However, if nothing else is runnable, schedule() (eventually) comes back to us, and we fall into cpu_idle() for as long as it takes to find a process to run. Oh … this time under the aegis of Process 0, which, by our own definition, is the last man standing. And which, as we are well aware, cannot be “ps”-ed.
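The sequence above (PIDs handed out in creation order, then schedule(), with Process 0 falling back to idling) can be mimicked with a toy user-space model. This is a sketch under loudly simplifying assumptions: the task table, toy_spawn() and toy_pick_next() are hypothetical names, PIDs come from a plain counter rather than the kernel's PID allocator, and “scheduling” is just picking the first runnable task.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy task table: hypothetical, not the kernel's task_struct/runqueue. */
#define TOY_MAX_TASKS 8

struct toy_task {
    int pid;
    const char *name;
    bool runnable;
};

static struct toy_task toy_tasks[TOY_MAX_TASKS];
static int toy_nr_tasks;

/* Process 0 is not forked: the boot context simply is PID 0. */
static void toy_boot(void)
{
    toy_tasks[0] = (struct toy_task){ 0, "root thread", true };
    toy_nr_tasks = 1;
}

/* Create a task; in this toy, PIDs are handed out in creation order (1, 2, ...). */
static int toy_spawn(const char *name)
{
    assert(toy_nr_tasks < TOY_MAX_TASKS);
    int pid = toy_nr_tasks;
    toy_tasks[toy_nr_tasks++] = (struct toy_task){ pid, name, true };
    return pid;
}

/* Pick the first runnable task other than PID 0; if there is none,
 * fall back to PID 0, which then idles (the "last man standing"). */
static int toy_pick_next(void)
{
    for (int i = 1; i < toy_nr_tasks; i++)
        if (toy_tasks[i].runnable)
            return toy_tasks[i].pid;
    return 0;
}
```

Once kernel_init and kthreadd are both asleep, toy_pick_next() can only return PID 0, which in this model is the moment the boot task would sit in cpu_idle().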

Do note that a system-specific idle process could have been created by /sbin/init, with whatever PID it happened to get, since the init thread and its children could, in principle, have forked till kingdom come.

We did mention that we needed to enable and disable preemption along the way (and actually did), depending on whether we were ready to schedule() or not. Given that we are within the pale of initialization, and memory locations are of unknown origins and values, caution is indeed the better part of valor. So why not apply that principle to thread_info also? Right choice.
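In 2.6-era x86 kernels the preemption bookkeeping is a counter, preempt_count, kept in each thread's struct thread_info: preempt_disable() increments it, preempt_enable() decrements it, and the task may be preempted only when it is zero. The sketch below models only that nesting behaviour; toy_thread_info and the toy_* functions are hypothetical stand-ins, not the kernel's implementation (which also folds IRQ and softirq nesting into the same counter and checks for a pending reschedule on enable).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the preempt_count field of struct thread_info. */
struct toy_thread_info {
    int preempt_count;
};

static void toy_preempt_disable(struct toy_thread_info *ti)
{
    ti->preempt_count++;            /* nesting is allowed: just count */
}

static void toy_preempt_enable(struct toy_thread_info *ti)
{
    assert(ti->preempt_count > 0);  /* an unbalanced enable is a bug */
    ti->preempt_count--;
}

/* Preemption (an involuntary schedule()) is only safe at count zero. */
static bool toy_preemptible(const struct toy_thread_info *ti)
{
    return ti->preempt_count == 0;
}
```

The counter, rather than a simple on/off flag, is what lets nested disable/enable pairs in different functions compose safely: preemption comes back only when the outermost pair is closed.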

Such indeed are the joys of Linux kernel programming. Grins?

I explain these specific Linux kernel concepts, and more, in detail in my classes (Advanced Linux Kernel Programming @UCSC-Extension) and in other classes that I teach independently. Please take note, and take advantage, of upcoming training sessions. As always, feedback, questions and comments are appreciated and will be responded to.

Thanks

-Anand


Give me another jiffy, only a bit later

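Recall that jiffies is incremented once per timer tick and that HZ is the number of ticks per second. So jiffies + 10*HZ means (now + 10 seconds), and multiplying a tick count (for example, the difference between two jiffies values) by 1000 and dividing by HZ converts it to milliseconds; the kernel provides jiffies_to_msecs() for this.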
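A small user-space sketch of this arithmetic, assuming HZ = 250 (HZ is a build-time choice; 100, 250, 300 and 1000 are common). toy_time_after() uses the same signed-difference idea as the kernel's time_after() macro, and toy_jiffies_to_msecs() corresponds in spirit to jiffies_to_msecs(); both are hypothetical local re-implementations, not the kernel's code.

```c
#include <assert.h>

#define TOY_HZ 250UL   /* assumption: ticks per second, a kernel config choice */

/* Wraparound-safe "a is later than b": look at the sign of the difference. */
static int toy_time_after(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

/* Convert a tick count to milliseconds (ticks * 1000 / HZ).
 * Fine for the small values used here; may overflow for huge tick counts. */
static unsigned long toy_jiffies_to_msecs(unsigned long j)
{
    return j * 1000UL / TOY_HZ;
}
```

A naive `timeout > now` comparison gets the case where the counter wraps past ULONG_MAX wrong, which is why kernel code compares jiffies values through time_after() and friends rather than with < or >.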

An example of usage in the Linux kernel is the XD driver, where a timer handler is scheduled to run jiffies + 8*HZ later (the value stored in the expires member of a static struct timer_list).

The “watchdog handler” xd_watchdog, in this particular instance, wakes up a sleeping process.

Needless to say, all the usual caveats apply: the appropriate processes need to have been sleeping, timers must have been declared and initialized, etc.
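The watchdog pattern can be simulated in user space. The sketch below is a toy, not the kernel's timer API: toy_timer, toy_add_timer(), toy_run_timers() and the toy_sleeping flag are hypothetical stand-ins for struct timer_list, add_timer(), the kernel's timer expiry processing, and a wait queue. It assumes HZ = 250 and arms the timer for jiffies + 8*HZ, as in the XD example.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_HZ 250UL   /* assumption: ticks per second */

/* Hypothetical stand-in for struct timer_list: expiry time plus a callback. */
struct toy_timer {
    unsigned long expires;             /* absolute time, in jiffies */
    void (*function)(unsigned long);   /* 2.6-era style: callback takes an unsigned long */
    unsigned long data;
    bool pending;
};

static unsigned long toy_jiffies;      /* the simulated tick counter */
static bool toy_sleeping;              /* stands in for a process on a wait queue */

/* The "watchdog": like xd_watchdog, it wakes the sleeping process. */
static void toy_watchdog(unsigned long data)
{
    (void)data;
    toy_sleeping = false;
}

static void toy_add_timer(struct toy_timer *t)
{
    t->pending = true;
}

/* Called once per tick: fire the timer once its expiry time is reached
 * (same signed-difference test as the kernel's time_after_eq()). */
static void toy_run_timers(struct toy_timer *t)
{
    if (t->pending && (long)(toy_jiffies - t->expires) >= 0) {
        t->pending = false;
        t->function(t->data);
    }
}
```

In a real driver the handler runs from the timer softirq, so, unlike the process it wakes, it must not sleep.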

I will blog on this at length later. Please subscribe to our mailing list for automated updates on new blog entries.

I explain this specific Linux kernel concept and more in my classes (Advanced Linux Kernel Programming @UCSC-Extension, and also in other classes that I teach independently). As always, feedback, questions and comments are appreciated and will be responded to.

Thanks
