
Re: [PATCH 1/8] add lib/gcd.c

To: Alan Cox <>
Subject: Re: [PATCH 1/8] add lib/gcd.c
From: James Cloos <>
Date: Sat, 13 Jun 2009 15:54:38 -0400
Cc: "Linux-MIPS" <>, Florian Fainelli <>, Andrew Morton <>, Takashi Iwai <>, Ralf Baechle <>
Copyright: Copyright 2009 James Cloos
In-reply-to: <> (James Cloos's message of "Sat, 13 Jun 2009 11:50:15 -0400")
Openpgp: ED7DAEA6; url=
Openpgp-fingerprint: E9E9 F828 61A4 6EA9 0F2B 63E7 997A 9F17 ED7D AEA6
Original-recipient: rfc822;
References: <> <> <> <>
User-agent: Gnus/5.110011 (No Gnus v0.11) Emacs/23.0.92 (gnu/linux)
>>>>> "|" == James Cloos <> writes:
>>>>> "Alan" == Alan Cox <> writes:

|> Would the binary gcd algorithm not be a better fit for the kernel?

Alan> Could well be the shift based one is better for some processors only.

|> Very likely, I suspect.

|> In any case, I do not have the hardware to do any statistically
|> significant testing;

I take that back.  In case speed is a relevant issue, I ran a test
on my MX, which is a small Xen domU running on a:
| EFamily: 0 EModel: 0 Family: 6 Model: 15 Stepping: 11
| CPU Model: Core 2 Quad 
| Processor name string: Intel(R) Core(TM)2 Quad CPU    Q6600  @ 2.40GHz
Compiling with gcc-4.4 -march=native -O3, I got:

bgcd (binary gcd)
408.39user 0.05system 6:52.75elapsed 98%CPU

quick (the code in the kernel)
600.96user 0.16system 10:19.06elapsed 97%CPU

contfrac (the typical Euclid algo)
569.19user 0.12system 9:35.50elapsed 98%CPU

extended Euclid (calculates i and j such that g = i*a + j*b = gcd(a,b))
684.53user 0.13system 11:32.77elapsed 98%CPU
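
For readers unfamiliar with the extended variant, here is a minimal sketch of what "calculates g=ia+jb=gcd(a,b)" means in practice.  The name `ext_gcd` and the signed 64-bit types are assumptions made for illustration; this is not the code that was timed above.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not the benchmarked code): returns g = gcd(a, b)
 * and stores coefficients in *i and *j such that (*i)*a + (*j)*b == g.
 * Assumes non-negative inputs small enough that the products fit in int64_t.
 */
int64_t ext_gcd(int64_t a, int64_t b, int64_t *i, int64_t *j)
{
    int64_t old_r = a, r = b;   /* remainder sequence */
    int64_t old_s = 1, s = 0;   /* coefficients of a */
    int64_t old_t = 0, t = 1;   /* coefficients of b */

    while (r != 0) {
        int64_t q = old_r / r;
        int64_t tmp;

        /* Each step keeps old_s*a + old_t*b == old_r true. */
        tmp = old_r - q * r; old_r = r; r = tmp;
        tmp = old_s - q * s; old_s = s; s = tmp;
        tmp = old_t - q * t; old_t = t; t = tmp;
    }

    *i = old_s;
    *j = old_t;
    return old_r;
}
```

The extra bookkeeping per iteration (two more multiply-and-subtract updates on top of the division) is consistent with this variant being the slowest of the four timings, though the sketch itself was not measured.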

I also tried on an old Alpha at freeshell; it had gcc-3.3; gcc's -S
output looks like it uses hardware div there, just like it does on
x86 and amd64.  The bgcd, though, was 10-16 times faster than either
version of Euclid's algo.

On my laptop's P3M, binary gcd was about twice as fast as Euclid.

So, although modern processors are *much* better at int div, the
binary gcd algo is still faster.
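
To make the comparison concrete, here is a minimal sketch of the two approaches: a shift-and-subtract (binary) gcd and a division-based Euclid.  The function bodies are assumptions written for illustration; the name `bgcd` echoes the thread, but this is not the bgcd code that was posted, and `euclid_gcd` is a hypothetical name.

```c
/*
 * Sketch of a binary gcd: uses only shifts, comparisons and subtraction.
 * Illustrative only; not the bgcd code posted earlier in the thread.
 */
unsigned long bgcd(unsigned long a, unsigned long b)
{
    unsigned int shift = 0;

    if (a == 0)
        return b;
    if (b == 0)
        return a;

    /* Strip the powers of two common to both operands. */
    while (((a | b) & 1) == 0) {
        a >>= 1;
        b >>= 1;
        shift++;
    }

    /* Any remaining factors of two in a are not shared with b. */
    while ((a & 1) == 0)
        a >>= 1;

    do {
        /* Likewise for b: drop factors of two, leaving b odd. */
        while ((b & 1) == 0)
            b >>= 1;

        /* Both are odd here; keep a <= b, then subtract. */
        if (a > b) {
            unsigned long t = a;
            a = b;
            b = t;
        }
        b -= a;
    } while (b != 0);

    /* Restore the common powers of two. */
    return a << shift;
}

/* Division-based Euclid, for comparison; one modulo per iteration. */
unsigned long euclid_gcd(unsigned long a, unsigned long b)
{
    while (b != 0) {
        unsigned long r = a % b;
        a = b;
        b = r;
    }
    return a;
}
```

The binary version typically runs more iterations, but each one avoids an integer divide, which is where the difference on hardware with slow division would come from.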

The timings on the Alpha and the laptop were of:

    for (a=0xFFF; a > 0; a--)
        for (b=a; b > 0; b--)

For the Core 2 times quoted above, I started with a=0xFFFF.
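
A sketch of how such a sweep could be wrapped for timing follows.  Everything beyond the two loop headers is an assumption: the loop body, the `sweep` and `time_sweep` helpers, the stand-in `ref_gcd`, and the use of `clock()` (the figures above look like output from time(1), not from in-program timing).

```c
#include <time.h>

typedef unsigned long (*gcd_fn)(unsigned long, unsigned long);

/* Stand-in gcd (plain Euclid) so the sketch is self-contained. */
static unsigned long ref_gcd(unsigned long a, unsigned long b)
{
    while (b != 0) {
        unsigned long r = a % b;
        a = b;
        b = r;
    }
    return a;
}

/*
 * Calls f on every pair with 0 < b <= a <= start, using the same loop
 * shape as above.  The results are summed and returned so that an
 * optimizing compiler cannot discard the calls.
 */
unsigned long sweep(gcd_fn f, unsigned long start)
{
    unsigned long a, b, sum = 0;

    for (a = start; a > 0; a--)
        for (b = a; b > 0; b--)
            sum += f(a, b);

    return sum;
}

/* Returns the CPU seconds used by one sweep; the checksum goes to *sum. */
double time_sweep(gcd_fn f, unsigned long start, unsigned long *sum)
{
    clock_t t0 = clock();

    *sum = sweep(f, start);
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}
```

Calling `time_sweep(ref_gcd, 0xFFF, &sum)` would correspond to the smaller run; starting at 0xFFFF covers roughly 256 times as many pairs, since the pair count grows with the square of the start value.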

And I forgot to mention:  the bgcd code I posted was based on
some old notes of mine which most likely trace to TAoCP.

James Cloos <>         OpenPGP: 1024D/ED7DAEA6
