Cross-mailed to every arch mailing list I could find, because this patch
hits every arch. Please trim replies to the relevant mailing list.
Please cc: email@example.com on replies; I am not on every list.

At the moment (2.4.0-test5) all architectures have a common definition
for softirq_state, which occupies its own cache line with a lot of
unused padding. The definitions for local_irq_count and local_bh_count
are all over the place: some architectures use a cache aligned structure,
some use arrays of integers (which lets you play cache ping/pong), some even

Each arch defines its own set of mapping macros to get to
local_irq_count and local_bh_count. To add insult to injury, not all
architectures use the mapping macros; there are bits of code scattered
around that use hard coded array lookups, which are not optimized for
SMP vs non-SMP.
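
As a rough illustration of the difference (a user-space sketch, not code
from the patch): the array and every demo_/DEMO_ name below are made up
for the example, and DEMO_SMP merely stands in for the kernel's SMP
config option. A hard coded lookup always indexes by cpu, while a
mapping macro can collapse to element 0 on a non-SMP build.

```c
#include <assert.h>

#define DEMO_NR_CPUS 4          /* hypothetical stand-in for NR_CPUS */

/* One of the layouts described above: a plain array of integers,
 * standing in for something like __local_irq_count[]. */
static unsigned int demo_irq_count_array[DEMO_NR_CPUS];

#ifdef DEMO_SMP
/* SMP: each cpu gets its own slot. */
#define demo_irq_count(cpu) (demo_irq_count_array[(cpu)])
#else
/* UP: evaluate and discard cpu, then always use slot 0, so the
 * compiler sees a constant index. The pointer form keeps the
 * result an lvalue in standard C. */
#define demo_irq_count(cpu) (*((void)(cpu), &demo_irq_count_array[0]))
#endif

/* Hypothetical helpers that go through the wrapper, never the array. */
static void demo_irq_enter(int cpu)
{
    assert(cpu >= 0 && cpu < DEMO_NR_CPUS);
    ++demo_irq_count(cpu);
}

static void demo_irq_exit(int cpu)
{
    assert(cpu >= 0 && cpu < DEMO_NR_CPUS);
    --demo_irq_count(cpu);
}

static int demo_in_irq(int cpu)
{
    return demo_irq_count(cpu) != 0;
}
```

Code written against demo_irq_count(cpu) does not care how the counter
is stored, which is what makes it possible to change the layout later
without touching every caller.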

ftp://ftp.ocs.com.au/pub/2.4.0-test5-softirq-bh-merge.gz is a merge of
softirq, local_irq_count and local_bh_count into one cache aligned
structure. The aims were :-
* Put all these fields in a per-cpu cache line. They are hit by the
same code paths and are only ever updated from one cpu. Softirq was
already cache aligned but the other fields were not; in most cases we
were using multiple cache lines for this data. On most archs this
will save a cache line. In the worst case we use the same number of
lines but still get a code cleanup.
* Replace all the explicit references like __local_irq_count[cpu] with
wrapper macros. Some archs had already done this but there was
quite a bit of crud left.
* Replace multiple arch definitions and export of irq_stat with a
common one, optimized for SMP or non-SMP. I know that gcc should do
this, but proving it will always optimize was too messy so I went for
the safe case and hand optimized the definitions. Besides, s390 is
completely different (the story of my life ;).
* Create a standard definition for the wrappers that can be used by
almost all architectures. Only s390 and sparc64 are special cases
but every arch had its own slightly incompatible definition.
* Add a per-cpu syscall_count, ready for performance reporting on soft
interrupts as well as hard interrupts. This field is not being set
at the moment but will be easy to update once the above patch is

I have updated all architectures, but have only been able to test this
patch on ix86 and IA64. Could the other arch maintainers please try
this patch? It is almost guaranteed to contain some Assembler errors.
If your arch supports SMP, please try both UP and SMP.
I want to know if the patch works or not on each arch so I can send the
patch to Linus. So I would appreciate feedback on tests, whether
successful or not.

All architectures except s390 and sparc64 use the same definitions; see
include/linux/irq_cpustat.h. For all but s390, sparc64 and m68k,
please check the Assembler changes; the common code should be fine.
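
For readers who have not opened the patch, the general shape of such a
common header might look like the sketch below. This is a user-space
illustration under several assumptions, not the contents of
include/linux/irq_cpustat.h: the 32 byte line size, the two softirq
field names and every demo_/DEMO_ name are invented for the example.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* size_t */

#define DEMO_CACHE_LINE 32      /* assumed cache line size in bytes */
#define DEMO_NR_CPUS    4       /* hypothetical stand-in for NR_CPUS */

/* All per-cpu counters merged into one structure. Aligning the first
 * member aligns the whole structure and pads its size up to a
 * multiple of the line size. */
typedef struct {
    _Alignas(DEMO_CACHE_LINE) unsigned int softirq_active; /* assumed name */
    unsigned int softirq_mask;                             /* assumed name */
    unsigned int local_irq_count;
    unsigned int local_bh_count;
    unsigned int syscall_count;
} demo_irq_cpustat_t;

/* Each array element must start on its own cache line. */
static_assert(sizeof(demo_irq_cpustat_t) % DEMO_CACHE_LINE == 0,
              "per-cpu entries must not share a cache line");

static demo_irq_cpustat_t demo_irq_stat[DEMO_NR_CPUS];

#ifdef DEMO_SMP
#define DEMO_IRQ_STAT(cpu, member) (demo_irq_stat[(cpu)].member)
#else
/* UP: discard cpu and always use entry 0. */
#define DEMO_IRQ_STAT(cpu, member) (((void)(cpu), &demo_irq_stat[0])->member)
#endif

/* One wrapper per field; callers never index demo_irq_stat directly. */
#define demo_local_irq_count(cpu) DEMO_IRQ_STAT((cpu), local_irq_count)
#define demo_local_bh_count(cpu)  DEMO_IRQ_STAT((cpu), local_bh_count)
#define demo_syscall_count(cpu)   DEMO_IRQ_STAT((cpu), syscall_count)

/* Number of cache lines one cpu's entry occupies. */
static size_t demo_lines_per_cpu(void)
{
    return sizeof(demo_irq_cpustat_t) / DEMO_CACHE_LINE;
}
```

With 4 byte integers the five fields fit in a single assumed 32 byte
line, so code that touches the irq and bh counts together only pulls in
one line per cpu.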

S390    No irq_stat structure; the equivalent fields are stored in
        S390_lowcore. That moved cpu_data, ipl_device and the SMP info
        area up by 16 bytes. There was no need for local_bh_count and
        local_irq_count to be atomic. Please check the changes to
        __LC_ in lowcore.h. Also the use of lowcore instead of
        irq_stat[NR_CPUS] means that asm-s390/hardirq.h is quite
        different from the other archs; it has its own unique
        definitions for all the wrapper macros.

sparc64 For UP, it uses the common code; local_irq_count is an int in
        irq_stat. For SMP, local_irq_count is a brlock in cpu_data.
        asm-sparc64/hardirq.h defines a special SMP mapping for

m68k    The change to arch/m68k/atari/ataints.c is a complete guess.
        There were no examples of how to reference SYMBOL+8 from asm in