This is the mail archive of the
gdb-patches@sources.redhat.com
mailing list for the GDB project.
Re: [WIP] New dwarf2 reader - updated 07-02-2001
- To: Jim Blandy <jimb at zwingli dot cygnus dot com>
- Subject: Re: [WIP] New dwarf2 reader - updated 07-02-2001
- From: Daniel Berlin <dan at cgsoftware dot com>
- Date: Tue, 31 Jul 2001 16:45:36 -0400
- cc: gdb-patches at sources dot redhat dot com
- References: <npg0bc96ou.fsf@zwingli.cygnus.com>
--On Tuesday, July 31, 2001 3:36 PM -0500 Jim Blandy
<jimb@zwingli.cygnus.com> wrote:
>
> Daniel Berlin <dan@cgsoftware.com> writes:
>> Jim Blandy <jimb@zwingli.cygnus.com> writes:
>> > - MD5 generates a 128-bit checksum. Any reason you only use the first
>> > 32 bits in your hash? It looks like checksum_die and
>> > checksum_partial_die both cut it off at eight digits.
>> I use the first 64 bits, just like GCC does for checksumming DIE's and
>> duplicate elimination. Though i'll give you gcc also includes the
>> name attached to the die. I'll happily make the two algorithms
>> *exactly* the same if you like.
>
> Eight hex digits * 4 bits per hex digit is 32 bits. No?
>
> static inline char *
> checksum_die (struct die_info *die)
> {
> struct md5_ctx ctx;
> char *retval;
> int i;
> unsigned char checksum[16];
> md5_init_ctx (&ctx);
> process_full_die (die, &ctx);
> md5_finish_ctx (&ctx, checksum);
> retval = xcalloc (1, 9);
> for (i = 0; i < 4; ++i)
> sprintf (retval + (i * 2), "%.2x", checksum[i]);
> retval[8] = 0;
> return retval;
> }
>
> The "MD5 never clashes" thesis is okay by me. But to say that,
> therefore, the first 32 bits of MD5 never clash, is not okay.
>
> Using a function yielding n distinct hash values, the probability is
> 1/n that they clash, thus 1-1/n that they don't clash. For p
> independent pairs, the chances are (1-1/n)^p that none clash. For a
> set of d dies, there are d(d-1) pairs. So in our case, using 32 bits
> of MD5 for 10000 distinct dies, the chance of no clashes are
> (1-1/(2^32))^(10000(10000-1)), or 0.977. Thus a 2.3% chance of a
> clash. This sucks.
>
> Using 64 bits, as you intended, is much better, but why not use the
> full 128? Honestly, why play this kind of game at all?
Because it's faster than trying to keep all the die data.
> It's not that
> hard to do the job perfectly --- just use the significant die contents
> themselves as the tree key. No collisions, no hash function, simpler.
>
Um, think about how much memory that would use.
We'd have to keep copies of die contents around.
>> > - Why are you using a splay tree keyed by MD5 hash values, instead of
>> > a hash table? The only advantage of a splay over a hash table is
>> > ordered retrieval, but your keys don't have any meaningful order.
>> I hate the hash table interface for libiberty, it's annoying to use,
>> IMHO.
>> Other than that, no reason. I was meaning to make a nice generic
>> dictionary interface, and talked with DJ about whether i could just
>> use libdict, but since it's not GPL, and i don't have time to
>> duplicate it's nice interface right now, i just went with what was
>> easiest to program.
>
> Okay.
>
>> > - Assuming MD5 is perfect, you don't checksum all attribute forms
>> > (DW_FORM_flag, for example). This means that you can get false
>> > duplicates, if two dies differ only in (say) a flag's value. How is
>> > read_comp_unit_dies set up to tolerate that?
>> I only checksummed the stuff gcc generates (since i based the
>> checksumming code on gcc's), and I forgot that gcc
>> doesn't generate some forms.
>
> GCC does generate DW_FORM_flag:
>
> .byte 0x2 ; uleb128 0x2; (abbrev code)
> .byte 0x2e ; uleb128 0x2e; (TAG: DW_TAG_subprogram)
> .byte 0x1 ; DW_children_yes
> .byte 0x1 ; uleb128 0x1; (DW_AT_sibling)
> .byte 0x13 ; uleb128 0x13; (DW_FORM_ref4)
> .byte 0x3f ; uleb128 0x3f; (DW_AT_external)
> .byte 0xc ; uleb128 0xc; (DW_FORM_flag)
> .byte 0x3 ; uleb128 0x3; (DW_AT_name)
> .byte 0x8 ; uleb128 0x8; (DW_FORM_string)
> .byte 0x3a ; uleb128 0x3a; (DW_AT_decl_file)
> .byte 0xb ; uleb128 0xb; (DW_FORM_data1)
> .byte 0x3b ; uleb128 0x3b; (DW_AT_decl_line)
> .byte 0xb ; uleb128 0xb; (DW_FORM_data1)
> .byte 0x27 ; uleb128 0x27; (DW_AT_prototyped)
> .byte 0xc ; uleb128 0xc; (DW_FORM_flag)
> .byte 0x49 ; uleb128 0x49; (DW_AT_type)
> .byte 0x13 ; uleb128 0x13; (DW_FORM_ref4)
> .byte 0x11 ; uleb128 0x11; (DW_AT_low_pc)
> .byte 0x1 ; uleb128 0x1; (DW_FORM_addr)
> .byte 0x12 ; uleb128 0x12; (DW_AT_high_pc)
> .byte 0x1 ; uleb128 0x1; (DW_FORM_addr)
> .byte 0x40 ; uleb128 0x40; (DW_AT_frame_base)
> .byte 0xa ; uleb128 0xa; (DW_FORM_block1)
> .byte 0x0
> .byte 0x0
>
>> > - In read_comp_unit_dies, when you find a duplicate die, you skip to
>> > its sibling. What if the parent die is identical, but the children
>> > dies differ?
>>
>> I don't believe this is possible in any language.
>
> How about this:
>
> namespace X {
> namespace A {
> int m, n;
> }
> }
>
> namespace Y {
> namespace A {
> int o, p;
> }
> }
>
> The dies for the two 'A' namespaces have the name DW_AT_name
> attribute, but they're clearly different.
>
This won't do it, they'll have different decl line attributes.
And we only eliminate at the top level anyway.
namespace A
{
int o, p;
}
namespace A
{
int m, n;
}
won't screw us either, since the decl line number will still differ.
and, if you implemented it according to spec, one would be a different tag
than the other.
> Just in principle, this seems sloppy. Dies are just arbitrary tree
> nodes. There's no reason to assume that if two parent nodes are
> identical, we can skip the second one and all its children.
It's not sloppy.
Elimination at the top level is normal in other debuggers i know of that do
elimination.