[kernel] Remove most irq blocking, add early EOI in ne2k driver#237

Open
Mellvik wants to merge 1 commit into master from netdrivers

Conversation


@Mellvik Mellvik commented Mar 17, 2026

Thanks to the new 'controlled' kernel interrupt regime, it has been possible to remove almost all IRQ protection from the ne2k network driver. The remaining 'concurrency risk' - an interrupt arriving while a kernel write call is sending data to the NIC - is handled via NIC interrupt blocking: masking all NIC interrupts at the chip.

Also, since there is very little activity in the driver interrupt routine that can be disturbed in bad ways by other interrupts, an EOI is sent to the PIC shortly after the interrupt handler is entered. This not only allows other interrupts to happen, it also opens up for the driver interrupt handler to perform time-consuming operations - like filling multiple buffers from the NIC - without affecting overall system responsiveness.

The changes have been heavily tested on slow hardware and have turned out to be very stable. While extremely slow under heavy load, there have been no hangs or lockups. ktcp needs more tuning for such situations, but that's a different story.

These driver changes pave the way for experimenting with new buffer strategies and other optimizations in the ne2k driver, as discussed in #212.


ghaerr commented Mar 17, 2026

This not only allows other interrupts to happen, it also opens up for the driver interrupt handler to perform time-consuming operations

While I haven't done a code review and assume that removing most IRQ protection is OK since most of it was not required anyway, I want to point out that disabling NIC interrupts and then sending an early EOI is definitely not the desired design pattern for writing drivers for kernels supporting a top half/bottom half interrupt execution structure.

It may be that disabling NIC interrupts while performing app writes is required, I would have to think more about that.

As discussed previously, sending an early EOI is almost never a good idea, and can only be fully mitigated by disabling the ability to create another interrupt on the hardware device. But doing so negates the entire purpose of the sophisticated kernel code handling and scheduling the top-half/bottom-half interrupt handling structure. The idea is that the driver itself should be structured such that top half interrupt execution is as fast as possible, while all other work is performed by the kernel-scheduled bottom half handler. The system itself handles the bottom half scheduling, rather than the driver author manipulating the PIC and hardware device in a one-off way in every driver. Playing with system components (e.g. the PIC) specially in one driver can affect other drivers' operation in ways not assumed by their authors. This kind of reminds me of the wild west DOS days where every driver was on its own, bringing with it unintended compatibility issues.

Sending an early EOI to supposedly "help the system" handle other interrupts then introduces its own new problem of requiring the NIC to be reprogrammed to not generate another interrupt - effectively muting the entire top half/bottom half design, and just force-pushing bottom half execution into the top half rather than rewriting the driver. A good analogy would be modifying the serial driver to disable the receive interrupt on every receive interrupt, then sending an early EOI to allow other interrupts, then reading the received byte... a total kluge and what for? Will it work? Yes, but such a design might only be required for systems like DOS where there is no system-wide method of scheduling kernel activity, such as we now have with working top/bottom half scheduled interrupt handlers.

It may be that adding the early EOI and disabling NIC interrupts is only one step in a multi-step process to getting to a real bottom half handler, but thought to mention this. If not, well it doesn't matter except that it's not the top/bottom half model, as each driver is apparently written "on its own", rather than becoming part of the new integrated kernel interrupt handling design for device drivers. The original intention of the top/bottom half driver model is to actually simplify driver writing - though various previous driver code does need to be sectioned out quite differently.


Mellvik commented Mar 17, 2026

Thanks @ghaerr - and yes, this is indeed experimental - and work in progress.

This not only allows other interrupts to happen, it also opens up for the driver interrupt handler to perform time-consuming operations

While I haven't done a code review and assume that removing most IRQ protection is OK since most of it was not required anyway, I want to point out that disabling NIC interrupts and then sending an early EOI is definitely not the desired design pattern for writing drivers for kernels supporting a top half/bottom half interrupt execution structure.

This is not what happens. NIC interrupts are disabled only when the kernel writes directly to the NIC, that is, the interrupt handler is not involved. As the PR notes say, in such cases there may be an interrupt while the write is in progress which is what the CLI/STI pair was there to protect against before. The change is less intrusive in a system wide perspective, that's all.

It may be that disabling NIC interrupts while performing app writes is required, I would have to think more about that.

They are, as described above - since direct writes are allowed. And they are hard to get rid of.

As discussed previously, sending an early EOI is almost never a good idea, and can only be fully mitigated by disabling the ability to create another interrupt on the hardware device. But doing so negates the entire purpose of the sophisticated kernel code handling and scheduling the top-half/bottom half interrupt handling structure. The idea is that the driver itself should be structured such that top half interrupt execution is as fast as possible, while all other work is performed by the kernel-scheduled bottom half handler. The system itself handles the bottom half scheduling, rather than the driver author manipulating the PIC and hardware device in a one-off way in every driver.

Like I said, this is experimental. Also, it seems to me we've had this discussion before, where I pointed out that this is not how the system works. First, yes - this may not be the best place to put the EOI and I'm interested in better alternatives. Second, the device is implicitly protected from generating new interrupts until the driver (and driver writer) allows it to. AFAIK this applies to most, maybe all devices. Until the driver has acknowledged all pending interrupts, the device will not field a new one. This means that, on a busy system, a single interrupt may cause the interrupt handler to handle many incoming and outgoing packets before exiting.

On a slow system in particular, but really on any system, this is going to affect overall system response in a noticeable way. I realise that the intention behind the bottom half is to take care of such situations, but a run every 10 ms is not going to fix it for a busy NIC. Also, the work to be done is the same regardless of whether it's done here or there, the point being to find the optimal way to utilize the system resources. That said, I certainly agree with you that there has to be an architecture, a standard way of doing things.

IOW - while I find the TH/BH structure advantageous and very well suited to the serial I/O requirements (as I recently got to experience firsthand with the now completed flow control implementation), it doesn't lend itself well to the NIC situation, where there are no realtime requirements (buffers at both ends) but 'big' and (sometimes) plenty fast transactions per interrupt.

I find this discussion interesting and valuable - there are always other considerations to take into account, and I'm certainly getting my share of surprises along the way.


ghaerr commented Mar 17, 2026

but a run every 10 ms is not going to fix it for a busy NIC.

That's not how bottom halves work, they aren't scheduled every 10ms. BHs are typically scheduled directly after the TH runs, and can't be put off further, unless a BH is already running. You're likely confusing the chosen aggregation of the serial bottom halves to a single serial bottom half run during the clock tick BH for reasons of efficiency. This was done to remove the overhead of tons of serial BHs run at a high frequency directly after the TH interrupt. But this is quite different than running BH code in the TH after an early EOI is sent, completely bypassing the intended system.

Also, the work to be done is the same regardless of whether it's done here or there,

That's also definitely not the way to look at it - precise knowledge of exactly when any given code will be executed is key to both understanding the TH/BH system as well as writing code for it - where, properly designed, the PIC is never touched by the interrupt handler, nor are device interrupts disabled by a handler. Handling those two items in a well-defined and consistent way is one of the main benefits of the whole system, and takes the driver author out of the hack-my-own-way approach to drivers, which introduces needless complexity, since the problem of fully understanding exactly when certain code can and should be executed is never figured out.

while I find the TH/BH structure advantageous and very well suited to the serial I/O requirements (as I recently got to experience firsthand with the now completed flow control implementation), it doesn't lend itself well to the NIC situation, where there are no realtime requirements (buffers at both ends) but 'big' and (sometimes) plenty fast transactions per interrupt.

Completely disagree: first, the serial I/O isn't actually handled with a normal BH routine, instead an aggregated clock tick BH is used, which isn't really normal for what the new TH/BH system was designed for. Second, all of Linux drivers use the TH/BH methodology, which forces the driver writer into handling high priority interrupts as TH code, which have to be, while force-removing all slow code from executing in the TH. Imagine what a mess Linux would be with its hundreds of device drivers if everyone hacked their own PIC/device access directly. Again - that's kind of my point - the design was previously proven by Linux, and adopted after careful thought into ELKS so that drivers (hopefully from TLVC) could be easily incorporated. The BH design was very thoroughly tested, and also allows for easy measurement of when top half execution overtakes bottom half scheduling. See Linux 2.0 clock tick source for details.

It doesn't really matter - one doesn't have to use elegant designs, and there are many ways to make things work. I'm only speaking for ELKS, not TLVC. My advice would be to read the code and fully understand BH execution before hacking the PIC in a driver to get around it. Another alternative is to just write NIC drivers however you want, using a different interrupt approach for every driver type. That's pretty much where we were five years ago though. I could go on, but I think I've said enough.


ghaerr commented Mar 17, 2026

This is not what happens. NIC interrupts are disabled only when the kernel writes directly to the NIC,

I see, got it. I was thinking that NIC interrupts were being disabled during the top half interrupt handler, which is quite different.

If the transmit operation is buffered, then this case can be relaxed, as the app can always transfer the user data to an available buffer without interrupts disabled, and the next transmit interrupt will use that buffer. If no buffers are available, then the process can sleep until a buffer is available, then do the above. In both cases interrupts never need to be disabled, and system throughput is higher due to the much lower chance of dropping packets while transferring ~1500 bytes to the NIC. That operation would happen in either the top or bottom half of the driver.

First, yes - this may not be the best place to put the EOI and I'm interested in better alternatives.

The better alternative is to let the low-level system IRQ handler send the proper EOI, after the C interrupt (top half) handler returns. This also guarantees no kernel stack overflow when the driver author goofs up. The top half should perform only what is absolutely necessary during interrupt time and let the rest execute in the bottom half. I would think that this would include all I/O being moved to the bottom half, but don't know for sure yet. And don't forget that by sending an "early EOI", the low level sends another one, which will screw things up big time if/when the system is in a nested interrupt state. This is because we're always sending a general end-of-interrupt command, 0x20 (or 0xA0 to the slave controller). I would assume you're already writing more code to send the proper EOI command based on the IRQ number (yet another reason not to do all this in the driver).

Second, the device is implicitly protected from making new interrupts until the driver (and driver writer) allows it to.

Well, yes, the whole idea is that the system handles the business of allowing new interrupts, not the driver author, who shouldn't be trying to reprogram the PIC by sending out possibly stray EOIs, etc. Of course, driver authors can do whatever they want, with the system taking the increased risk without others knowing. That's kind of why I'm arguing this point, because I don't want ELKS to become unstable because of improper driver design. Frankly, for ELKS, if a driver was submitted that reprogrammed the PIC, it would be rejected for stability reasons.

I would say the best way to think about what the split top/bottom half kernel design does very well is just what we're talking about - managing interrupt handlers. The TH always runs with interrupts enabled but with handler re-entry guaranteed not to happen since the hardware PIC is controlled by the kernel, and the BH also runs with interrupts enabled without the driver author ever having to worry about handler reentry, except at a lower priority. From a design perspective, there's no point (except introducing possible error cases) in a driver author worrying about or dealing with PIC or interrupt priorities since the mechanism handles it all for them. Note interrupts are always enabled during TH or BH, so we're not talking about disabling interrupts, but rather driver and/or kernel reentry issues.

AFAIK this applies to most, maybe all devices. Until the driver has acknowledged all pending interrupts, the device will not field a new one.

If this is indeed true for all devices, then there's just no need to hack the PIC or disable a device from generating an interrupt in an interrupt handler - just reorganize the code into top and bottom halves and everything will work, along with all the other drivers in the system, without worry.


Mellvik commented Mar 18, 2026

This is not what happens. NIC interrupts are disabled only when the kernel writes directly to the NIC,

I see, got it. I was thinking that NIC interrupts were being disabled during the top half interrupt handler, which is quite different.

I don't understand why this would be a point. Interrupts are always disabled as the interrupt handler is entered. Typically, the controller, whether a NIC or something else, cannot field a new interrupt until all unmasked interrupt bits have been explicitly cleared by the driver.

If the transmit operation is buffered, then this case can be relaxed, as the app can always transfer the user data to an available buffer without interrupts disabled, and the next transmit interrupt will use that buffer. If no buffers are available, then the process can sleep until a buffer is available, then do the above. In both cases interrupts never need to be disabled, and system throughput is higher due to the much lower chance of dropping packets while transferring ~1500 bytes to the NIC. That operation would happen in either the top or bottom half of the driver.

Yes, that's the point I was making in my latest post in the interrupt structure discussion issue #212. However, it's not that simple. And let's keep the NIC as an example. If no transmission is ongoing, filling a buffer will not get the transmission started, as there is no interrupt. IOW the transmit has to be at least initiated, possibly run to completion, from the kernel. Given how things work in the system, this is what happens most of the time even when output buffers are available. Similarly, on the read side, since the driver buffer capacity is typically (a lot) less than the NIC buffer size, there will sometimes be data available on the NIC but no interrupt to transfer more to the driver buffers. In both cases, the NIC must be prevented from interrupting while the transfer is going on. Which is now handled 'locally' without cli/sti.

First, yes - this may not be the best place to put the EOI and I'm interesting in better alternatives.

The better alternative is to let the low-level system IRQ handler send the proper EOI, after the C interrupt (top half) handler returns. This also guarantees no kernel stack overflow when the driver author goofs up. The top half should perform only what is absolutely necessary during interrupt time and let the rest execute in the bottom half. I would think that this would include all I/O being moved to the bottom half, but don't know for sure yet. And don't forget that by sending an "early EOI", the low level sends another one, which will screw things up big time if/when the system is in a nested interrupt state.

Yes, this is an important point - and one I haven't thought about, simply because that's the way the serial interrupt handler does it. But - presumably - the serial interrupt handler is operating at a lower level, which is why it does its own EOI?

Also, my understanding of the conclusions when you were experimenting with the TH/BH implementation was limited to the serial_BH handler which ended up in the timer. Now that I read the softirq code more carefully, I see how it works.

As to the NIC, and as commented in the #212 discussion, nothing really needs to be done in the top half, which is why the current setup works in the first place. The changes needed to 'switch' from TH to BH are minimal.

Based on this architecture the question above comes back: Can the bottom half handler be triggered in any other way than by an interrupt? Which - if yes - could be taken advantage of to kickstart an idle sender or receiver. [There are other ways to do this in the ne2k driver, like enabling RDMA Complete Interrupts, but it becomes complicated and resource consuming really fast.]

I'm wondering @ghaerr - you keep referring to 'hacking the PIC' and ranting about how bad that is. I still don't understand what this refers to. Unless sending an EOI like in serfast.S is 'hacking the PIC', in which case we have vastly diverging interpretations of the word 'hacking'.


Mellvik commented Mar 18, 2026

FWIW, running the entire ne2k interrupt handler in BH works well.

```c
/*
 * Interrupt handler
 */

static void ne2k_int(int irq, struct pt_regs *regs)
{
        mark_bh(NETWORK_BH);
}

void ne2k_int_bh(void)
{
        word_t stat, page;

        //outb(0x20,0x20);      /* EOI to primary controller */
        //set_irq();
        while (1) {
...
```

In `ne2k_open()`:

```c
                ne2k_start();
                //outb(0xff, net_port + EN0_ISR); /* EXPERIMENTAL; DELETE */
                init_bh(NETWORK_BH, ne2k_int_bh);
        }
        return 0;
```