This is the mail archive of the binutils@sourceware.org mailing list for the binutils project.



Re: RFC: Supporting quoted symbol/label names


On Tue, Aug 18, 2015 at 10:36 AM, Nick Clifton <nickc@redhat.com> wrote:
> Hi Guys,
>
>   I have been looking at PR 18581, which complains about the ARM
>   assembler not accepting function names containing a dash:
>
> https://sourceware.org/bugzilla/show_bug.cgi?id=18581
>
>   Initially I was going to reject the bug as invalid, since normally
>   symbol names do not contain dashes, but two things changed my mind -
>   the symbols were being provided inside double quotes, and technically
>   there is nothing in any file format standard to forbid such names.
>   (It also helped that this output was being produced by LLVM, so this
>   is not just a theoretical problem).
>
>   Although the PR talks about the ARM port, the problem is generic, and
>   in fact the filer even includes an x86 test case.  So I have been
>   looking at a solution.  It turns out however that there are some
>   pretty deep assumptions in GAS about how symbol names can be extracted
>   from the input stream, and changing them all can be quite daunting.
>   Not one to be stymied however I gave it a go and the result is the
>   attached patch.
>
>   The solution I have chosen is to modify the get_symbol_end() function
>   so that it will allow any text enclosed between double quotes.  This
>   is the function that is used in most places in GAS to read a symbol,
>   label or operand name, but it does have one drawback: the function is
>   expected to return the character that terminated the symbol name -
>   typically a colon, comma, newline or space.  But in the case of a
>   quoted string, this character will be a double quote.  Much of the
>   code that calls get_symbol_end is not expecting this, so I have had
>   to do a lot of hacking to handle this eventuality.
>
>   I have tested the patch on 127 different toolchains with no
>   regressions, so I think that it probably does work.  But what do
>   people think ?  Is supporting "double quoted" symbol names a good idea ?

From an end-user perspective, I would have found this useful when I was
working on the Objective-C mangling format (and still would if I were to
try again).

The current mechanism for mangling in Objective-C suffers from some
limitations; chiefly, it cannot be demangled unambiguously.
Given a selector of:
    +[Class(Category) foo:bar:]
or -[Class(Category) foo:bar:]
it turns the leading plus or minus character into _c_ or _i_ respectively,
followed by Class_Category_,
so we get _i_Class_Category_
Where we run into problems is that it then turns the colons into underscores,
so we end up with _i_Class_Category_foo_bar_

This works fine until you get methods like +[Class _foo:bar:], which
contain underscores of their own: a demangler can no longer tell which
underscores were originally colons.
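
To make the ambiguity concrete, here is a rough sketch of the scheme as
described above. It is a simplification based only on this description, not
the compiler's actual code; the function name mangle() and the regular
expression are made up for illustration.

```python
import re

# Hypothetical parser for "+[Class(Category) sel:ector:]" style names.
_METHOD = re.compile(r"^([+-])\[(\w+)(?:\((\w+)\))?\s+([\w:]+)\]$")

def mangle(method):
    """Mangle a method name following the scheme described in this mail.

    Assumption: a method without a category gets an empty category field.
    """
    m = _METHOD.match(method)
    if m is None:
        raise ValueError("not a method name: %r" % method)
    kind, cls, category, selector = m.groups()
    prefix = "_c_" if kind == "+" else "_i_"
    # Every colon in the selector becomes an underscore, which is lossy.
    return prefix + cls + "_" + (category or "") + "_" + selector.replace(":", "_")

# Three different methods that all produce the same symbol:
a = mangle("-[Class(Category) foo:bar:]")
b = mangle("-[Class(Category) foo_bar:]")
c = mangle("-[Class(Category_foo) bar:]")
print(a, a == b == c)
```

Since all three inputs collapse to one symbol, no demangler can recover
the original method name from the symbol alone.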

Being able to quote them and use the full +[](): characters in the
symbol would be the ideal way to do this.  I believe this is what
Apple does in its symbol naming, so it wouldn't surprise me if this
was the reason behind this LLVM behaviour.
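
As a rough illustration of what quoting buys us, here is a small sketch of
a reader that accepts either a bare symbol or a double-quoted one and
reports the character that follows the name.  This is not the GAS code or
the proposed patch; read_symbol() and SYMBOL_CHARS are hypothetical names,
and escape sequences inside the quotes are deliberately not handled.

```python
import string

# Assumed set of characters allowed in an unquoted symbol name.
SYMBOL_CHARS = set(string.ascii_letters + string.digits + "_.$")

def read_symbol(text, pos=0):
    """Read one symbol name starting at text[pos].

    Returns (name, terminator), where terminator is the character that
    follows the name, or "" at end of input.
    """
    if pos < len(text) and text[pos] == '"':
        end = text.find('"', pos + 1)
        if end == -1:
            raise ValueError("unterminated quoted symbol name")
        name = text[pos + 1:end]
        nxt = end + 1          # skip the closing quote
    else:
        nxt = pos
        while nxt < len(text) and text[nxt] in SYMBOL_CHARS:
            nxt += 1
        name = text[pos:nxt]
    terminator = text[nxt] if nxt < len(text) else ""
    return name, terminator

print(read_symbol('"-[Class(Category) foo:bar:]":'))
print(read_symbol("foo-bar:"))
```

With the quoted form the whole method name survives as one label; the
bare form stops at the first character outside the allowed set.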

There are some characters that gas already accepts which we could use,
$ among them, but I wasn't convinced it was worth the effort of
switching mangling formats and introducing a third mangling format
besides the GNU and the Apple ones.

I believe gdb already supports the Apple mangling format alongside
the GNU one, even if it is never triggered on GNU targets.

Thanks

