This is the mail archive of the binutils@sourceware.org mailing list for the binutils project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [BUG?] Re: [patch]: Fix auto-import for pe targets


2009/1/4 Kai Tietz <ktietz70@googlemail.com>:
> Hi Dave,
>
> 2009/1/3 Dave Korn <dave.korn.cygwin@googlemail.com>:
>> [ In case this post fails to pick up the correct References: header, the
>> original thread can be seen in the the mailing list archive at
>> http://sourceware.org/ml/binutils/2008-09/threads.html#00204 ]
>>
>> Kai Tietz wrote:
>>> Hello,
>>>
>>> this patch fixes the broken auto-import feature for pe(and pe+) targets.
>>>
>>> ChangeLog
>>>
>>> 2008-09-26  Kai Tietz  <kai.tietz@onevision >
>>>
>>>         * ld/pe-dll.c (make_import_fixup_entry): Make sure reloc is
>>> addend.
>>
>>    Hello Kai (and happy new year),
>>
>>  Funnily enough, in my usage, this patch breaks the fixed auto-import
>> feature for PE targets!  You didn't give any details in your post, so let me
>> explain the problem I've been having (in depth and at length, since there may
>> be bystanders who aren't familiar with how PE DLL auto-import works):
>>
>>  Here's my simple testcase:
>>
>> -----------------------------------<cout.C>-----------------------------------
>> #include <iostream>
>>
>> main() {
>> std::cout << "Invalid vector range " << 3
>>  << " caught in f()" << std::endl;
>>  return 0;
>> }
>> -----------------------------------<cout.C>-----------------------------------
>>
>> and it compiles to this:
>>
>> -----------------------------------<cout.s>-----------------------------------
>>        .text
>> .globl _main
>>        .def    _main;  .scl    2;      .type   32;     .endef
>> _main:
>> LFB923:
>>        .loc 1 4 0
>>        leal    4(%esp), %ecx
>> LCFI9:
>>        andl    $-16, %esp
>>        pushl   -4(%ecx)
>> LCFI10:
>>        pushl   %ebp
>> LCFI11:
>>        movl    %esp, %ebp
>> LCFI12:
>>        pushl   %ecx
>> LCFI13:
>>        subl    $20, %esp
>> LCFI14:
>>        .loc 1 4 0
>>        call    ___main
>>        .loc 1 6 0
>>        movl    $LC0, 4(%esp)
>>        movl    $__ZSt4cout, (%esp)
>> -----------------------------------<cout.s>-----------------------------------
>>
>>  Now, that reference to __ZSt4cout works fine when you're linking against
>> static libstdc++, but I've got a DLL.  So, this is a data import, and it needs
>> a fixup: we need to point an import table (of size 1) at the address field of
>> the movl opcode, so that the run-time loader points it at the data export in
>> the DLL.
>>
>>  However at runtime we can see this hasn't happened:
>>
>> -----------------------------------<gdb.txt>----------------------------------
>> (gdb) disass
>> Dump of assembler code for function main:
>> 0x00401176 <main+0>:    lea    0x4(%esp),%ecx
>> 0x0040117a <main+4>:    and    $0xfffffff0,%esp
>> 0x0040117d <main+7>:    pushl  -0x4(%ecx)
>> 0x00401180 <main+10>:   push   %ebp
>> 0x00401181 <main+11>:   mov    %esp,%ebp
>> 0x00401183 <main+13>:   push   %ecx
>> 0x00401184 <main+14>:   sub    $0x14,%esp
>> 0x00401187 <main+17>:   call   0x401288 <__main>
>> 0x0040118c <main+22>:   movl   $0x40204c,0x4(%esp)
>> 0x00401194 <main+30>:   movl   $0x40514c,(%esp)
>> 0x0040119b <_fu0___ZSt4cout+4>: call   0x4011f0 <_ZStlsISt11char_traitsIcEERSt13
>> basic_ostreamIcT_ES5_PKc>
>> -----------------------------------<gdb.txt>----------------------------------
>>
>>  The movl above should in fact have been relocated directly to the address
>> of std::cout in the cygstdc++ dll.  Instead, it has been statically linked to
>> the __imp_ entry for std::cout.
>>
>> -----------------------------------<gdb.txt>----------------------------------
>> (gdb) x/1xw 0x0040514c
>> 0x40514c <_imp___ZSt4cout>:     0x6c4caf00
>> (gdb) print &std::cout
>> $9 = (ostream *) 0x6c4caf00
>> (gdb)
>> -----------------------------------<gdb.txt>----------------------------------
>>
>>  So, the symbol _fu0___ZSt4cout is pointing at the address field of the
>>
>>        movl    $__ZSt4cout, (%esp)
>>
>> instruction, as it should be.  But somehow it ends up at runtime pointing to
>>
>> (gdb) x/1xw 0x0040514c
>> 0x40514c <_imp___ZSt4cout>:     0x6c4caf00
>>
>> the import stub.  It should have been handled as an import and pointed
>> directly to the symbol imported from the DLL, no?  Why this doesn't happen?
>>
>>  Well, it looks like the import table generated for the fixup is broken:
>> it has zero entries.  Here's what "dumpbin /imports" (from MSVC) shows
>>
>> -----------------------------------<imports>----------------------------------
>>
>> /win/i/FSF-Gcc/release/gcc4-4.3.2-2/build $ dumpbin /imports cout.exe
>> Microsoft (R) COFF Binary File Dumper Version 6.00.8168
>> Copyright (C) Microsoft Corp 1992-1998. All rights reserved.
>>
>>
>> Dump of file cout.exe
>>
>> File Type: EXECUTABLE IMAGE
>>
>>  Section contains the following imports:
>>
>> [ .. snip healthy import dir entries for regular _imp_ imports .. ]
>>
>>    cygstdc++-6.dll
>>                40119B Import Address Table
>>                4050EC Import Name Table
>>                     0 time date stamp
>>                     0 Index of first forwarder reference
>>
>> -----------------------------------<imports>----------------------------------
>>
>>  Looking at that last entry there, it has two problems: the address of the
>> IAT (the address where the imported addresses are going to be stored at, which
>> should be the address of the address operand field of the movl opcode shown
>> back above) is off by four, and there are no import entries.  (This
>> arrangement even crashes depends.exe!)  You have to look at the raw hex dumps
>> to see what's really going on.  Here's the fixup itself:
>>
>> ---------------------------------<raw binary>---------------------------------
>> Microsoft (R) COFF Binary File Dumper Version 6.00.8168
>> Copyright (C) Microsoft Corp 1992-1998. All rights reserved.
>>
>>
>> Dump of file cout.exe
>>
>> PE signature found
>>
>> File Type: EXECUTABLE IMAGE
>>
>>  [ ... snip! ... ]
>>
>>            5000 [     3B8] RVA [size] of Import Directory
>>
>>  [ ... snip! ... ]
>>
>> SECTION HEADER #5
>>  .idata name
>>     3B8 virtual size
>>    5000 virtual address
>>     400 size of raw data
>>     E00 file pointer to raw data
>>       0 file pointer to relocation table
>>       0 file pointer to line numbers
>>       0 number of relocations
>>       0 number of line numbers
>> C0300040 flags
>>         Initialized Data
>>         RESERVED - UNKNOWN
>>         RESERVED - UNKNOWN
>>         Read Write
>>
>> RAW DATA #5
>>  [ ... snip! ... ]
>>
>>  00405050: EC 50 00 00 00 00 00 00 00 00 00 00 A8 53 00 00  ìP..........¨S..
>>  00405060: 9B 11 00 00                                      ›...
>> ---------------------------------<raw binary>---------------------------------
>>
>>  This is an IMAGE_IMPORT_DESCRIPTOR struct, and what we're seeing is an RVA
>> of 0x50EC for the OriginalFirstThunk field, the list of symbols to import;  an
>> RVA of 0x53A8 for the Name field, and an RVA of 0x119B for the FirstThunk
>> field, AKA the IAT, the memory area where the imported symbol address(es)
>> is/are going to get stored to.  The Name field holds the RVA of the DLL name
>> to import from: the RVA 0x53A8 corresponds to the actual address 0x4053A8:
>>
>> ---------------------------------<raw binary>---------------------------------
>>  004053A0:                         63 79 67 73 74 64 63 2B          cygstdc+
>>  004053B0: 2B 2D 36 2E 64 6C 6C 00                          +-6.dll.
>> ---------------------------------<raw binary>---------------------------------
>>
>>  The RVAs 0x50EC and 0x119B correspond to the addresses shown in the
>> "dumpbin /imports" output as "Import Name Table" (0x4050EC) and "Import
>> Address Table" (0x40119B) and as mentioned before, the IAT pointer is off by
>> four from the actual address (0x401197) of the symbol _fu0___ZSt4cout.  Now we
>> look at the Import Name Table:
>>
>> ---------------------------------<raw binary>---------------------------------
>>  004050E0: 14 53 00 00 00 00 00 00 8C 52 00 00 00 00 00 00  .S......ŒR......
>>  004050F0: 00 00 00 00 60 51 00 00 6C 51 00 00 7C 51 00 00  ....`Q..lQ..|Q..
>> ---------------------------------<raw binary>---------------------------------
>>
>>  What we expect to see is a word at 0x4050EC pointing to the import name,
>> followed immediately by a zero sentinel; instead there's just a zero
>> immediately.  But look at the word immediately before (0x4050E8): that's a
>> plausible RVA in the .idata section, and sure enough at that address we find
>> the missing import (ordinal and name):
>>
>> ---------------------------------<raw binary>---------------------------------
>>  00405280:                                     78 07 5F 5A              x._Z
>>  00405290: 53 74 34 63 6F 75 74 00                          St4cout.
>> ---------------------------------<raw binary>---------------------------------
>>
>>  In other words, the 0x50EC is also off by four.  I stepped through the code
>> to see how it was getting this way, and it's caused by that line your patch
>> added in pe-dll.c/make_import_fixup_entry().  That function generates a fixup
>> that looks like this:
>>
>> ----------------------------------<pe-dll.c>----------------------------------
>>        .section        .idata$2
>>        ; OriginalFirstThunk
>>        .rva            __nm_thnk_SYM (singleton thunk with name of func)
>>        ; TimeDateStamp
>>        .long           0
>>        ; ForwarderChain
>>        .long           0
>>        ; Name
>>        .rva            __my_dll_iname (name of dll)
>>        ; FirstThunk
>>        .rva            __fuNN_SYM (pointer to reference (address) in text)
>> ----------------------------------<pe-dll.c>----------------------------------
>>
>> and it generates the three RVAs by pointing relocs of type BFD_RELOC_RVA at
>> the offsets 0, 12 and 16 in the fixup entry.  Your patch added the line
>>
>> ----------------------------------<pe-dll.c>----------------------------------
>>  d2[0] = d2[16] = PE_IDATA5_SIZE; /* Reloc addend.  */
>> ----------------------------------<pe-dll.c>----------------------------------
>>
>> just after these relocs are generated, and the results is that the first and
>> last of these RVAs are generated with an addend of four relative to the
>> symbols they point at, _nm_thnk_XXXXX (the lost-and-found name table shown
>> above) and __fuNN_SYM (the destination in the program text to write the
>> imported address to, the single-entry IAT) - resulting in the broken import
>> table I describe in the hex dumps above.
>>
>>  Sure enough, I have tested a version of ld with this line disabled, and it
>> works (for me) in the way I would expect, dumpbin shows a single import in the
>> fixup image import descriptor, depends.exe doesn't crash, at runtime the
>> program code shows it's been correctly relocated to point directly into the DLL:
>>
>> -----------------------------------<gdb.txt>----------------------------------
>> Breakpoint 1, main () at cout.C:4
>> 4       main() {
>> (gdb) disass
>> Dump of assembler code for function main:
>> 0x00401176 <main+0>:    lea    0x4(%esp),%ecx
>> 0x0040117a <main+4>:    and    $0xfffffff0,%esp
>> 0x0040117d <main+7>:    pushl  -0x4(%ecx)
>> 0x00401180 <main+10>:   push   %ebp
>> 0x00401181 <main+11>:   mov    %esp,%ebp
>> 0x00401183 <main+13>:   push   %ecx
>> 0x00401184 <main+14>:   sub    $0x14,%esp
>> 0x00401187 <main+17>:   call   0x401288 <__main>
>> 0x0040118c <main+22>:   movl   $0x40204c,0x4(%esp)
>> 0x00401194 <main+30>:   movl   $0x6c4caf00,(%esp)
>> 0x0040119b <_fu0___ZSt4cout+4>: call   0x4011f0 <_ZStlsISt11char_traitsIcEERSt13
>> basic_ostreamIcT_ES5_PKc>
>> -----------------------------------<gdb.txt>----------------------------------
>>
>> and above all else the program works!  (Pretending to libstdc++ that some
>> random address in your import section is a fully constructed ostream object
>> turns out to be a no-no, <g>!)
>>
>>  I wonder if we are having a 32-vs-64-bit clash here, because I know you
>> mostly work on win64?  I'm sure I don't understand your patch.  At first I
>> thought maybe it was something to do with the difference between
>> IMAGE_THUNK_DATA32 and IMAGE_THUNK_DATA64 and you were trying to offset a
>> 32-bit reloc so it fell into the low part of a 64-bit field, and then I
>> started thinking about endianness, and then I got confused and decided to ask
>> you about it instead.  But I'm pretty sure that the right thing for
>> i686-pc-cygwin is not to have those offsets there; what target were you
>> working on at the time?
>>
>>    cheers,
>>      DaveK
>>

Sorry, it was a bit late yesterday, when I answered your e-mail. I try
to express myself a bit more understandable.

Yes, I think this was a thinko related to the issue that in pe-dll.c
(make_one ...) the  import sections .idata$4 and .idata$5 are prefixed
by a zero element. But for auto-imports there are no prefix for
idata$4 and idata$5 generated.
The issue I saw here before was that for existing import tables the
initial zero element let point the fixup to the wrong location. This
was the reasone to add wrongly the offset to the auto-fixup import
directory.
I sent some days ago a patch to remove these leading zero elements to
idata$4 and idata$5. This should fix the auto-import for 32-bit, too.
The issue by those leading zero elements is, that IAT and import hash
table were build in a none coff spec conformance way.
The problem is, that for the first import directory everything is fine
(beside that the size of the idata$5 section isn't IAT size), For more
then one import library, we get by this old behavior between each
"import hash table" (idata$4) two zero element, like for the import
address table (IAT). This isn't as spec tells (I refer here to coff
spec 8.1).

Could you give the idata$4/idata$5 patch of mine a chance to verify,
that your problem is solved by it, too?

Cheers,
Kai

-- 
|  (\_/) This is Bunny. Copy and paste
| (='.'=) Bunny into your signature to help
| (")_(") him gain world domination


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]