This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [RFC][BZ #17943] Use long for int_fast8_t
- From: Rich Felker <dalias at libc dot org>
- To: Ondřej Bílka <neleai at seznam dot cz>
- Cc: libc-alpha at sourceware dot org
- Date: Wed, 11 Feb 2015 08:59:37 -0500
- Subject: Re: [RFC][BZ #17943] Use long for int_fast8_t
- Authentication-results: sourceware.org; auth=none
- References: <20150208110426 dot GA28729 at domone> <20150209181324 dot GE23507 at brightrain dot aerifal dot cx> <20150211133409 dot GA24480 at domone>
On Wed, Feb 11, 2015 at 02:34:09PM +0100, Ondřej Bílka wrote:
> On Mon, Feb 09, 2015 at 01:13:24PM -0500, Rich Felker wrote:
> > On Sun, Feb 08, 2015 at 12:04:26PM +0100, Ondřej Bílka wrote:
> > > Hi, as in bugzilla entry what is rationale of using char as int_fast8_t?
> > >
> > > It is definitely slower with division, following code is 25% slower on
> > > haswell with char than when you use long.
> >
> > This claim is nonsense. It's a compiler bug. If the 8-bit divide
> > instruction is slow, then the compiler should use 32-bit or 64-bit
> > divide instructions to divide 8-bit types. (Note: there's actually no
> > such thing as a division of 8-bit types; formally, they're promoted
> > to int, so it's the compiler being stupid if it generates a slow 8-bit
> > divide instruction for operands that are formally int!) There's no
> > reason to use a different type for the _storage_.
> >
> That is also nonsense, you cannot get same speed as 32bit instruction
> without having 8bit instruction with same performance.
>
> Compiler must add extra truncation instructions to get correct result
> which slows it down, otherwise it gets wrong result for cases like (128+128)%3
The additional need to truncate can arise when using 32-bit types on a
64-bit arch, but normally you defer this truncation until storage
(in which case it's usually automatic), promotion, or division (in
which case it's hopefully dominated by the time to divide). If your
concern is the need to truncate before division/remainder operations,
I think that cost is dwarfed by the added cache/memory pressure of
wasting half your space.
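
To make the (128+128)%3 case concrete, here is a minimal sketch in C. The
function names are hypothetical and used only for illustration; uint8_t
stands in for an 8-bit storage type. It shows that the arithmetic itself
is done in int after promotion, and that a result differs only once the
sum has been truncated by storing it back into an 8-bit object.

```c
#include <assert.h>
#include <stdint.h>

/* Operands are promoted to int, so the sum is 256 and no truncation
   happens before the remainder is taken. */
static int rem_promoted(uint8_t a, uint8_t b, uint8_t d)
{
    return (a + b) % d;
}

/* The sum is stored into an 8-bit object first, so it wraps modulo 256
   before the remainder is taken. */
static int rem_truncated(uint8_t a, uint8_t b, uint8_t d)
{
    uint8_t sum = (uint8_t)(a + b);
    return sum % d;
}

/* Hypothetical self-check for the case discussed in this thread. */
static void demo(void)
{
    assert(rem_promoted(128, 128, 3) == 1);   /* 256 % 3 */
    assert(rem_truncated(128, 128, 3) == 0);  /* 0 % 3 after wrap */
}
```

With a = b = 128 and d = 3, the first function yields 1 and the second
yields 0, so which one the compiler must implement depends on whether the
source stores the intermediate sum in the 8-bit type.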
Rich