Mike Shaver said:
> because heavy ethernet traffic occasionally generates bus errors that
> lock up the box. I'm going to take a look at what causes those
> tomorrow, hopefully.

These finally annoyed me to the point that I started looking at them today. I
was able to hack the buserr irq handler to set up a gdb frame so that
gdb gets control at the instruction that was interrupted. The problem
appears to be in sgiseeq.c, which is no great surprise since it occurs
during times of heavy network traffic. The bus error irqs always occur
when interrupts are reenabled in the ret_from_sys_call after a sgiseeq
irq. The hpc_ethregs tx_ctrl value is 0x1, indicating that transmit was
inactive, but there was an underflow. The tx_ndptr value is 0xffffffff.
The latter I think leads to the bus error. Look at kick_tx() being
called from sgiseeq_tx() during the handling of the interrupt. With
that value of tx_ndptr, kick_tx() would end up writing to 0xbffffff0,
which is not what we want to do. I don't have the HPC docs, so I'm probably
not going to be able to come up with a proper fix...
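
For anyone following along, here is a rough sketch of the address
arithmetic behind that bad write. It assumes the driver turns the
physical tx_ndptr into an uncached KSEG1 pointer the usual MIPS way
(mask to the low 29 bits, OR in 0xa0000000), which is what the kernel's
KSEG1ADDR() macro does. The function name kseg1_addr() and the two
constants are my own stand-ins for illustration, not code from
sgiseeq.c.

```c
#include <stdint.h>

/* Base of KSEG1, the unmapped, uncached segment on 32-bit MIPS. */
#define KSEG1_BASE 0xa0000000u
/* The low 29 bits carry the physical address inside a kseg. */
#define PHYS_MASK  0x1fffffffu

/*
 * Illustrative stand-in for KSEG1ADDR(): map a physical address to
 * its KSEG1 alias. Not taken from the driver source.
 */
static uint32_t kseg1_addr(uint32_t phys)
{
    return (phys & PHYS_MASK) | KSEG1_BASE;
}
```

kseg1_addr(0xffffffff) comes out as 0xbfffffff, i.e. the very top of
KSEG1, in the same last 16 bytes as the 0xbffffff0 seen above. The
exact low bits depend on how the descriptor pointer is aligned and
which field gets written, which I have not pinned down. A sane
descriptor address such as 0x08001000 would map to 0xa8001000 instead.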