> regarding the hardware implementation of a 4KE (r4k style mmu
> if I remember) I'm wondering about the performance difference
> when the TLB has 16 pairs of entries (covering 128KBytes of
> data) or 32 pairs (covering 256KBytes).
> Does someone have a useful advise regarding the `nice spot'
> for TLB size?
As you expected, there is no really simple answer. The TLB is a
relatively large piece of logic, so it often isn't a trivial decision.
Applications - particularly embedded applications, which I suspect is
what you mean - vary a lot in the size of the mapped, user-space
working set. Some Linux-powered embedded devices do nearly all their
work in the kernel...
However, the measurements we've done at MIPS suggest that for
moderate-size workloads where the user-space programs are working
hard, a 16-entry TLB can thrash quite badly, making a significant dent
in performance.
So the advice I'd give is that if:
1. Your application has a non-trivial user space of any size;
2. The performance of userland code is significant;
then you should pick a 32-entry TLB, until and unless you have
measurements of your own application to show you don't need it.
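
If you want a rough feel for the effect before you can measure the real
thing, a toy model along the following lines may help. It is only a
sketch under stated assumptions, not the measurement setup mentioned
above: it assumes 4 KB pages with each entry mapping an even/odd page
pair (which is what gives the 128 KB and 256 KB figures in your
question), a fully associative TLB, and LRU replacement chosen purely to
keep the result deterministic. Real refill handlers typically replace
entries pseudo-randomly, so the absolute numbers will differ. The names
tlb_reach and count_misses are made up for this illustration.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumption: 4 KB pages, as implied by 16 pairs = 128 KB


def tlb_reach(pairs, page_size=PAGE_SIZE):
    """Bytes mapped when every entry holds an even/odd pair of pages."""
    return pairs * 2 * page_size


def count_misses(pairs, pages_touched, sweeps):
    """Touch pages 0..pages_touched-1 in order, `sweeps` times over.

    Returns the number of TLB misses for a fully associative TLB with
    `pairs` paired entries and LRU replacement (an assumption, see above).
    """
    tlb = OrderedDict()  # keys are VPN2 (page number >> 1), oldest first
    misses = 0
    for _ in range(sweeps):
        for vpn in range(pages_touched):
            vpn2 = vpn >> 1  # even and odd page share one entry
            if vpn2 in tlb:
                tlb.move_to_end(vpn2)  # hit: mark as most recently used
            else:
                misses += 1
                if len(tlb) >= pairs:
                    tlb.popitem(last=False)  # evict least recently used
                tlb[vpn2] = True
    return misses


if __name__ == "__main__":
    # Hypothetical workload: a 192 KB working set (48 pages) swept 10 times.
    for pairs in (16, 32):
        reach_kb = tlb_reach(pairs) // 1024
        print(pairs, "pairs:", reach_kb, "KB reach,",
              count_misses(pairs, 48, 10), "misses out of 480 page touches")
```

With this made-up 192 KB sweep the 16-pair TLB misses on every new page
pair in every pass (240 misses), while the 32-pair TLB takes only the 24
cold misses on the first pass. A cyclic sweep is close to the worst case
for LRU, so treat it as an illustration of the cliff once the working
set exceeds the TLB's reach, not as a prediction for your application.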