On Tue, Jul 30, 2002 at 08:59:17AM +0200, Carsten Langgaard wrote:
> We have been discussing this before, but I really don't like the idea of
> solving the hazard problem with a branch. The branch will on some CPUs
> (especially if they have a long pipeline) be a much bigger penalty than
> we actually need to solve the hazard. On other CPUs (with branch
> prediction) we may not even solve the hazard problem.
The branch - which is also used by some other commercial OSes btw. - is
there for the R4000 / R4400, where this kind of taken branch implies a
total delay of three cycles: one for the branch delay slot plus two extra
cycles for the killed instructions following the branch delay slot. For
the R4600, R4700, R5000 and a bunch of derivatives I've verified against
the documentation that this extra penalty of two cycles does not exist,
nor do we need two extra cycles to handle the hazard. In other words, the
branch trick provides the best possible performance on a wide range of
processors.
> The 'nop' I used is not the solution either, instead we should use
> 'ssnop' instructions, which will make sure we also solve the hazard
> problem on superscalar CPUs. We also need to have a hazard barrier in
> the code labeled "not_vmalloc".
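For reference, here is a rough sketch of how 'nop' and 'ssnop' differ at
the encoding level: both are 'sll' instructions targeting $0, and only the
shift amount tells them apart. A superscalar CPU may issue a plain 'nop'
together with a neighbouring instruction, while 'ssnop' is defined to
issue on its own. The helper names below (mips_sll, mips_nop, mips_ssnop,
check_encodings) are made up for illustration and are not kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Encode "sll rd, rt, sa": SPECIAL opcode 0, rs field 0, funct 0.
 * Hypothetical helper, for illustration only. */
static uint32_t mips_sll(unsigned rd, unsigned rt, unsigned sa)
{
    return ((uint32_t)(rt & 31u) << 16) |
           ((uint32_t)(rd & 31u) << 11) |
           ((uint32_t)(sa & 31u) << 6);
}

/* nop is "sll $0, $0, 0": the all-zero instruction word. */
static uint32_t mips_nop(void)
{
    return mips_sll(0, 0, 0);
}

/* ssnop is "sll $0, $0, 1": same operation, shift amount 1. */
static uint32_t mips_ssnop(void)
{
    return mips_sll(0, 0, 1);
}

/* Sanity check of the two encodings discussed above. */
static void check_encodings(void)
{
    assert(mips_nop() == 0x00000000u);
    assert(mips_ssnop() == 0x00000040u);
}
```

So swapping 'nop' for 'ssnop' changes one bit field in the padding words;
how many of them a given hazard needs still depends on the CPU.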
The above trick was written with single-issue CPUs in mind. I'd have to
verify the pipeline timing again against the CPU manuals, but from memory
at least SB1 and R1x000 are fully protected against the hazards in