On Tuesday, February 5, 2002, at 10:10 AM, Jay Carlson wrote:
(Quick background for the list: Because there's such a large code size
penalty to PIC/abicalls, I resurrected the bad old Linux/SVR3
statically linked, dynamically loaded libraries, which are linked at
absolute locations. Shane Nay took this from a cute demo to a working
distribution for the Agenda VR3; Brian Webb helped. Typical code
reduction is ~25-40%, e.g. 391k -> 272k, about 30%.)
Oh yes, performance. Apps on the Agenda VR3 built with the snow ABI are
dramatically faster/more responsive. If you don't believe me, go search
the agenda-dev list and read the testimonials :-)
I don't fully understand why, though. Here are my speculations; bear in
mind that the VR3 and some of the other small boxes have 16-bit memory
interfaces with small i/d caches.
1) Better icache efficiency.
2) Fewer loads (and stalls) to get typical work done. In PIC, you need
a load per symbol reference, and that's every function call.
3) Better dcache efficiency. The GOT no longer needs to be hit for
those symbol references.
4) Reduced TLB usage. The GOT pages for each module are quite hot, so
now that we're no longer touching them, their 4k (ouch) TLB entries can
point somewhere more useful.
5) No symbol resolution at load time. For C++ apps, this can help
startup a lot. (Prelinking fixes this too.)
6) Better scheduling from gcc. egcs seemed to do a better job of
arranging loads ahead of use when building non-PIC; on the TX39, this
helps even more due to non-blocking loads.
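
To make points 2 and 3 concrete, here is a rough C model of the difference.
It is only a sketch: fake_got, add_one, call_direct and call_via_got are
made-up names, the table is hand-rolled rather than a real linker-built GOT,
and the compiler is free to generate something quite different from what
the comments suggest. The point is just that the PIC-style call has to read
a pointer out of a data table before it can jump.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callee standing in for any library function. */
int add_one(int x) { return x + 1; }

/* Non-PIC model: the target address is a link-time constant, so the
 * call itself needs no data load. */
int call_direct(int x) { return add_one(x); }

/* PIC model: a hand-rolled stand-in for the GOT (an assumption for
 * illustration, not the real thing). volatile keeps the compiler from
 * folding the table lookup away. */
typedef int (*fn_t)(int);
static fn_t volatile fake_got[] = { add_one };

int call_via_got(int x) {
    fn_t target = fake_got[0];   /* the extra data load per symbol reference */
    assert(target != NULL);      /* a real GOT slot is filled in by the loader */
    return target(x);
}
```

Both functions return the same value; only the path to the callee differs.
In the second one, the table lookup is the load that has to go through the
dcache (and the TLB entry covering the table).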