This is the mail archive of the gdb@sources.redhat.com mailing list for the GDB project.
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]
Re: struct environment

From: David Carlton <carlton at math dot stanford dot edu>
To: Daniel Jacobowitz <drow at mvista dot com>
Cc: gdb <gdb at sources dot redhat dot com>
Date: 06 Sep 2002 10:20:38 -0700
Subject: Re: struct environment
References: <ro1lm6gcg9r.fsf@jackfruit.Stanford.EDU><20020906150620.GA19200@nevyn.them.org>
Daniel Jacobowitz <drow@mvista.com> writes:
> On Thu, Sep 05, 2002 at 01:50:40PM -0700, David Carlton wrote:

>> So I want to add a type 'struct environment' to GDB.  It will be
>> completely opaque, with a variety of internal implementations
>> available (similar to the way that 'struct block' allows you to use
>> either a hash table or a simple list of symbols) sharing a common
>> external interface.  I want to start by converting 'struct block' over
>> to using this, then converting the global environment over to using
>> this, then (probably) 'struct type', and finally getting namespaces to
>> work well.

> Why do you want to have multiple available implementations?  I think
> the overhead for always hashing is small enough.  There are only two
> reasons struct block allows hash tables or linked lists:
>   - overloading of the meaning of that list to represent a list of
>     function arguments, which is ordered
>   - warts in mdebugread that I was not patient enough to overcome when
>     I finally merged in hash table support

> I suppose the first reason is a legitimate one for multiple
> implementations; we could mark an environment as 'ordered'.  Or we
> could stop overloading the meaning of the list that way.  I don't know
> which is better.

The global block in jv-lang.c may also pose similar problems to those
in mdebugread.c, for what that's worth.  (I'm not sure: it might be
cleaner than mdebugread.c.)  But you've basically put your finger on
it: some blocks currently have a reason to be ordered, and some pieces
of code try to construct the environments on the fly, growing them as
necessary.  Which, incidentally, I don't think we should discourage:
instead of requiring code to calculate the size of environments before
building them, we might as well support environments that can be built
incrementally and then finalized.  That way the common bookkeeping
code gets moved into the environment stuff, so people don't have to
keep on redoing it.

And there's another, even more important reason: the global
environment (and, in the future, namespaces).  The global environment
spans multiple files, uses lazy evaluation (i.e. partial_symbols), and
there's even minimal_symbols to shoehorn in there somehow.  So it's
going to need its own special implementation (which will be much more
complicated than the implementations for blocks).

At some point in the future, it might be possible to use a single
implementation for everything; it would have to support:

* Growing as needed
* Fast lookups
* Being able to retrieve all entries in the order in which they were
  added
* Whatever extra cruft is necessary to support environments like the
  global environment.

This is certainly possible (heaps solve the first two problems, for
example), but I think moving over to a single implementation can and
should be postponed until much later in the game.  It'll be easy
enough to do, if desirable, once existing code gets converted over to
using environments.

>> * What constructors/destructors should there be?  buildsym.c will need
>> constructors that create environments of a fixed size on an obstack
>> that are implemented using either a hashtable or a list.  jv-lang.c
>> and mdebugread.c both need constructors for environments allocated
>> using xmalloc whose final size isn't known initially; these will
>> need destructors, and perhaps some sort of 'finalize' function to
>> call once we know everything's been added to the environments in
>> question.  (I'll look into those situations more closely soon.)

> More preferably, just a finalize method that copies them onto the
> obstack.

Good idea.

> I'm not sure if pointers to symbol entries are saved in the process
> of constructing the list, which would make that hard; hopefully not.

> There is no reason to be concerned about performance of the finalize
> method in this case.  If someone is concerned they can fix the way
> mdebugread generates symbol lists so that it uses the same hash tables
> everyone else does.

>> * There needs to be a way to add symbols to an environment while
>> constructing it.  Maybe
>> 
>> void env_add_symbol(struct environment *, struct symbol *);

> Does there?  Right now most lists are constructed first into a pending
> list, and only then hashed into an environment.

I was thinking of env_add_symbol as the way to hash the symbols into
the environment.  I'll look at how the pending lists work; if all the
code uses the same data structure, then of course there should be a
constructor that takes the entire pending list rather than requiring
each symbol to be added individually.

> I'd rather not have to use growable hash tables.

Indeed.  On the other hand, as I mentioned above, it seems plausible
to me that the responsibility for managing the lists of pending
symbols should, in the future, be tranferred to environments rather
than the code that builds the blocks.  So it seems plausible to me to
have a linked list implementation of environments that is great for
adding new symbols and lousy for lookups, and then have finalize
functions that convert linked list implementations to either hash
tables or arrays (depending on whether or not you needed ordering),
which could no longer grow.

>> * There should be a shallow lookup function, i.e. one that looks up a
>> symbol just in one environment, without going up to a parent
>> environment.  (The latter will get added in once we get to
>> converting over the global environment.)  Maybe
>> 
>> struct symbol *env_lookup_shallow(struct environment *,
>> const char *);

> You may need more information than this.  Sometimes
> lookup_block_symbol needs to look up the symbol (i.e. demangled)
> which has a particular mangled name - this is a wart and not one I'm
> fond of but it isn't quite ready to die yet.

I think that I'm tentatively planning to defer issues like mangled
vs. demangled names until I convert over the global environment: that
will be the time to think about exactly what sorts of deep lookup
functions will be necessary, for example.  Having said that, I just
looked at lookup_block_symbol more closely, and I'm not sure that my
planned iterators would be quite enough to handle it.  (But I might be
able to delete the relevant code, see below.)

>> * There should probably be iterators that allow one to examine every
>> symbol in an environment.  One possibility, to save on memory
>> management headaches, would be to allow only one active iterator per
>> environment, and have that stored as part of the environment; so we
>> would have
>> 
>> struct symbol *env_lookup_first(struct environment *);
>> struct symbol *env_lookup_next(struct environment *);
>> 
>> (Where these return NULL if we've seen all the symbols in the
>> environment.)  I'll have to double-check to make sure that existing
>> use of iteration over symbols blocks is compatible with this, but
>> I'm almost positive it is.

> In most cases yes.  See the ALL_BLOCK_SYMBOLS macro.

Right, that's what I was thinking of: this could replace that easily
enough as long as there weren't two copies of ALL_BLOCK_SYMBOLS
running on the same block simultaneously, which seems plausible to
me.

> There are a couple of places where I could not use it for some
> reason, though.

>> * It's vaguely possible that some code might need to count the number
>> of symbols in an environment, but I don't think so; presumably
>> that's only used by iterators or by code that could be moved to the
>> inside of struct environment.
>> 
>> * When scanning GDB for uses of 'struct block', I noticed that some
>> code wants to sort symbols in the block.  I'll have to look at where
>> and why that's happening, but I don't think it's anything serious.
>> (When it's necessary, it can presumably be added to a 'finalize'
>> function as in the discussion of constructors above.)

> It's irrelevant - it was only for speeding up searching.  Note that
> hashed blocks are never sorted and blocks with a function are never
> sorted.  mdebugread-style blocks are simply legacy.

Great.  That probably means that I can just skip the sorting and
delete code in functions like lookup_block_symbols that handles sorted
linear blocks as a special case.

David Carlton
carlton@math.stanford.edu
Follow-Ups:
- Re: struct environment
  - From: Daniel Jacobowitz
References:
- struct environment
  - From: David Carlton
- Re: struct environment
  - From: Daniel Jacobowitz
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]