On Tue, 3 Feb 2015, Daniel Sanders wrote:
> From: Toma Tabacu <firstname.lastname@example.org>
> Change the type of csum_ipv6_magic's 'proto' argument from unsigned
> short to __u32.
> This fixes a type mismatch between the 'htonl(proto)' inline asm
> input, which is __u32, and the 'proto' output, which is unsigned
> This is the error message reported by clang:
> arch/mips/include/asm/checksum.h:285:27: error: unsupported inline asm: input
> with type '__be32' (aka 'unsigned int') matching output with type 'unsigned
> "0" (htonl(len)), "1" (htonl(proto)), "r" (sum));
> The changed code can be compiled successfully by both gcc and clang.
This definitely looks like a bug in clang to me. What this construct
means is both input #5 and output #1 live in the same register, and that
an `__u32' value is taken on input (from the result of the `htonl(proto)'
calculation) and an `unsigned short' value produced in the same register
on output, that'll be the value of the `proto' variable from there on. A
perfectly valid arrangement. This would be the right arrangement to use
with the MIPS16 SEH instruction for example. Has this bug been reported
to clang maintainers?
And I'd prefer to leave the declaration of `proto' alone as IPv6 network
protocol numbers are 16-bit quantities.
That said this code is indeed weird if not wrong, which is probably why
this arrangement resulted, in an attempt to prevent GCC from messing up
the registers used.
First and foremost both outputs, and especially #1, lack an earlyclobber.
This I imagine may have prompted GCC to overwrite one of the inputs, which
in turn is why whoever poked at this code decided to alias input #5 to
output #1. But as you can see in the asm there's no real aliasing between
input #5 and output #1. Input #5 is consumed early on (and even referred
to with `%5' rather than `%1', which would be the norm in the case of
actual aliasing), and the containing register reused for something else.
So the two operands can be separated. This is unlike input #4 vs output
#0, that is both read and written right away (and just as one'd expect
there's no reference to `%4' anywhere).
Output #0 can do without an earlyclobber as it is aliased to input #4 and
therefore cannot be assigned by GCC to another input. But it won't hurt
to have one too and it will set a good practice and serve a documentation
I suggest a fix like this then:
static __inline__ __sum16 csum_ipv6_magic(const struct in6_addr *saddr,
const struct in6_addr *daddr,
__u32 len, unsigned short proto,
: "=&r" (sum), "=&r" (tmp)
: "r" (saddr), "r" (daddr),
"0" (htonl(len)), "r" (htonl(proto)), "r" (sum));
Try and see if it works for you.
I wonder why this is an asm in the first place though. There's no rocket
science here that GCC couldn't handle. I guess it must have been very bad
at optimising a C equivalent then.