This is the mail archive of the
mailing list for the binutils project.
Re: [Patch, avr] Relax LDS/STS to IN/OUT if symbol is in I/O address range
- From: Erik Christiansen <dvalin at internode dot on dot net>
- To: binutils at sourceware dot org
- Date: Sun, 12 Oct 2014 00:01:46 +1100
- Subject: Re: [Patch, avr] Relax LDS/STS to IN/OUT if symbol is in I/O address range
- Authentication-results: sourceware.org; auth=none
- References: <20140929104545 dot GA984 at atmel dot com> <CADOs=zaSvw-8Lpg5Te1v0YE19KBh6uN8udCpsNAWuvfpL7YHbA at mail dot gmail dot com> <20140930021023 dot GA14314 at atmel dot com> <5431C74C dot 5030500 at gjlay dot de> <20141006042945 dot GA1261 at atmel dot com> <20141009103449 dot GF3842 at atmel dot com> <CADOs=zbn053kb85j7GxSAVpbyGJcjRTwF5x70H0zhFrb=UrXhw at mail dot gmail dot com> <20141010043448 dot GA1213 at atmel dot com> <54390E30 dot 3060108 at gjlay dot de>
- Reply-to: dvalin at internode dot on dot net
On 11.10.14 13:02, Georg-Johann Lay wrote:
> I must admit that I don't like this kind of "optimization"...
Pushing that bias goes against specific behaviour and policy documented
in the source code. Please read in <avr/sfr_defs.h>:
"and GCC will do the right thing (use short I/O instructions if
I.e., the duality of IN/OUT and LDS/STS is recognised as fact, and
optimisation is mandated.
So either the LTO is OK, or policy and documentation must be changed.
But there are stronger reasons for proceeding with the optimisation.
> 1) There are applications that rely on exact instruction timing,
> e.g. USB-device in software. If just one instruction has a
> timing other than expected the driver won't work.
Yes, we have probably all written them, and relied on them - when they
are written in assembler. You might like to refamiliarise yourself with:
$ man 3 assembler # or if not in manpath:
$ man -l /usr/share/doc/avr-libc/man/man3/assembler.3.gz
where its use for "Code for very time-critical applications." is
recommended, for obvious reasons. If foolhardy enough to attempt such a
thing in C, with the vagaries of successive versions of gcc between us
and the generated code, the rational programmer knows and understands
that --relax is designed and specified to change code timing. Resorting
to a bit of in-line assembler is the only reliable and professional way
to do it in C.
The proposed relaxation is in reality no different from any other
> 2) OUT is not a shorter version of STS:
Besides the avr-gcc source code, Atmel does not agree with you either.
Please read the ATmega328p datasheet section "7.3 SRAM Data Memory",
where it explicitly points out that OUT works for lower I/O space, but
I/O must morph to STS in Extended I/O space. I.e. the datasheet confirms
that OUT _is_ a shorter version of STS, with addressing constraints,
much like rcall vs call.
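To make the addressing constraint concrete, here is a minimal sketch in C
of the range test that decides whether an LDS/STS could become IN/OUT. It
assumes the classic AVR layout described in the ATmega328P datasheet, where
I/O addresses 0x00..0x3F appear at data addresses 0x20..0x5F. The helper
names fits_in_out() and io_address() are hypothetical, chosen only for
illustration, and are not taken from the patch under discussion.

```c
#include <stdbool.h>
#include <stdint.h>

/* Data-space window reachable by IN/OUT on classic AVR cores such as the
   ATmega328P: I/O addresses 0x00..0x3F map to data addresses 0x20..0x5F. */
#define IO_DATA_BASE 0x20u
#define IO_DATA_LAST 0x5Fu

/* Hypothetical helper: true if an LDS/STS to data_addr targets a register
   that IN/OUT can also reach. */
static bool fits_in_out(uint16_t data_addr)
{
    return data_addr >= IO_DATA_BASE && data_addr <= IO_DATA_LAST;
}

/* Hypothetical helper: the 6-bit I/O address that IN/OUT would encode.
   Only meaningful when fits_in_out(data_addr) is true. */
static uint8_t io_address(uint16_t data_addr)
{
    return (uint8_t)(data_addr - IO_DATA_BASE);
}
```

For example, PORTB on the ATmega328P sits at data address 0x25, so it
passes the test and maps to I/O address 0x05, while a register in Extended
I/O space such as TCCR1A at 0x80 fails it and must stay LDS/STS.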
> There was a silicon bug in some devices where OUT /STS behaved
> differently (or IN / LDS).
That is a non sequitur, which cannot be presented as a rational
argument against a simple LTO. Given time to reconsider, I am confident
that you will not continue to insist that avr-gcc generate code
primarily to support faulty silicon - unless you can show where Atmel
continues to market that faulty silicon in preference to the working
> 3) The benefit is limited to very special cases of libraries that
> use I/O where the access instruction is not known at compile time.
Please think again - the case is in reality not special at all. A core
optimisation which falls to the linker is shortening the code whenever
any of many unresolved externals falls into an address range which allows
a shorter jump, call, or memory access than the one generated by the
compiler to cover the worst case. The proposed LTO falls exactly into the
last category (memory access), and is applicable to ALL unresolved
externals in I/O space.
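For comparison, the call case can be sketched the same way. This is only an
illustration of the range test, assuming word addresses and the RCALL
encoding from the AVR instruction set manual (a signed 12-bit word offset
relative to the following instruction). The name fits_rcall() is
hypothetical. A real linker must also cope with addresses shifting as bytes
are deleted, which this sketch ignores.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if a CALL located at word address insn_addr
   could be replaced by RCALL to reach word address target_addr.
   RCALL encodes k with -2048 <= k <= 2047 and jumps to PC + k + 1. */
static bool fits_rcall(uint32_t insn_addr, uint32_t target_addr)
{
    int32_t k = (int32_t)target_addr - (int32_t)(insn_addr + 1);
    return k >= -2048 && k <= 2047;
}
```

The I/O relaxation is the same kind of decision: once the final address is
known, check whether it fits the shorter encoding, and shrink if it does.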
[ some confusion about sbi elided ]
It is true that sbi also uses I/O addresses, but there are no
optimisations available, since there is no alternative addressing mode.
So sbi is not relevant to this discussion.
> Even if Load / Store can be done as IN / OUT in the end, it's not
> smart to write a library that way...
This unsubstantiated personal view has not been supported by any cogent
argument. It is in fact IN / OUT which _must_ be done as Load / Store in
Extended I/O Memory, just as a call is needed when a rcall won't reach.
Given a little more time to refer to Atmel datasheets, where the duality
is described, and think it over, I am confident that you will understand
that the shorter addressing mode reaches the same on-chip register, and
the optimisation is exactly analogous to dropping a long jump back to a
relative, once it is known that the address fits in the shorter
Let's try to be real here - the optimisation has been working for me for
14 years or more. It is time that avr-gcc fixed the missed optimisation
A real person has two reasons for doing anything ... a good reason and
the real reason.