This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: RFC Power PC G3 optimized sqrtf function.


libc-alpha-owner@sourceware.org wrote on 12/14/2006 01:05:57 PM:

> Hi everybody,
>
> This is my 1st post and attempt at contributing to glibc
>
Thanks Conn. To get started, submissions to libc are normally in the form
of a patch with a changelog header. Please review
http://www.gnu.org/prep/standards/standards.html section 6.8.

>   I have written a sqrtf function that is much faster on a PowerPC G3 than
> the original one used. It uses the frsqrte instruction and Newton
> Goldschmidt iterations to get a result. I have had testing done on G3 and G4
> processors and it has results that conform to IEEE. Testing on G5's has
> shown it gets within at least 2 bits of the correct answer. This shouldn't
> matter because the G5 has a hardware sqrtf instruction. It may work on 603,
> 603e, 604, and 604e processors as well but I have not tested on them. It
> will not work on a 601 processor.
>
Next, I assume you intend to add this to the powerpc-cpu add-on using
--with-cpu=g3 configuration?

In this case we need to place your e_sqrtf.S file in an appropriate directory
so that it does not impact PowerPCs that do have fsqrt. For example:

./powerpc-cpu/sysdeps/powerpc/powerpc32/g3/fpu/e_sqrtf.S

You will also need an Implies file in the sysdeps/unix/sysv/linux tree to
make sure your new directory is early enough in the search order to
override the e_sqrtf in libc trunk.

For example:

./powerpc-cpu/sysdeps/unix/sysv/linux/powerpc/powerpc32/g3/fpu/Implies

would contain:

powerpc/powerpc32/g3/fpu

If you want g4 to default to the g3 implementation, create
powerpc/powerpc32/g4/fpu directories with Implies files referencing the
powerpc/powerpc32/g3/fpu directories. Similarly for 603, 604, ...

See the powerpc-cpu README for more details.

Your patch should reflect this directory layout.

>   The limiting factor on IEEE conformance is the frsqrte instruction must
> produce a result that is within 1/59th of the correct value. A timing test
> on all valid values using the current glibc function takes about 26 minutes
> on a iMac g3 400MHz machine. With my implementation it takes about 21
> minutes.
>

Not sure what you are getting at here. The PowerPC Arch 2.0x (V1.x also)
states that frsqrte is "correct to one part in 32". Does your algorithm
require better precision than the Arch provides? The Arch does say that
results may vary between implementations. So does the G3/G4 frsqrte provide
better than 1/32 precision?
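
To make the precision question concrete, here is a minimal C sketch of how
the starting error of a reciprocal square root estimate shrinks under plain
Newton-Raphson refinement. It is not the submitted e_sqrtf.S and does not
reproduce its Newton/Goldschmidt scheme. All function names are hypothetical,
the hardware estimate is simulated by scaling a known exact value, and the
arithmetic is done in double so that only the iteration error is visible.

```c
/* Illustrative only: all names here are hypothetical, not glibc code. */

/* One Newton-Raphson step for y ~ 1/sqrt(x):
   y_next = y * (1.5 - 0.5 * x * y * y).
   A relative error e becomes roughly 1.5 * e * e. */
static double newton_rsqrt_step(double x, double y)
{
    return y * (1.5 - 0.5 * x * y * y);
}

/* Relative error of y against the known exact value of 1/sqrt(x). */
static double rel_error(double y, double exact)
{
    double e = y / exact - 1.0;
    return e < 0.0 ? -e : e;
}

/* Start from an estimate that is off by start_err (a stand-in for a
   hardware estimate such as frsqrte), apply `steps` Newton steps, and
   return the remaining relative error.  The caller supplies x together
   with its exact reciprocal square root, e.g. x = 4.0, exact = 0.5. */
static double error_after_steps(double x, double exact,
                                double start_err, int steps)
{
    double y = exact * (1.0 + start_err);
    for (int i = 0; i < steps; i++)
        y = newton_rsqrt_step(x, y);
    return rel_error(y, exact);
}
```

With x = 4.0, exact = 0.5 and a starting error of 1/32, the remaining error
is still above 2^-24 after two steps and falls below it after three. The
sketch ignores single-precision rounding inside the iteration and the final
multiply that turns 1/sqrt(x) into sqrt(x), so it says nothing about where a
bound such as 1/59 would come from in a real implementation.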

> Please read the header for more details and give me some feedback.
>
> P.S. Do I need to file copyright assignment papers for this?
>
Yes you do.

Steven J. Munroe
Linux on Power Toolchain Architect
IBM Corporation, Linux Technology Center

