This is the mail archive of the guile@cygnus.com mailing list for the guile project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]

Re: Outline for Guile Generational GC


Dale Jordan <dalej@alum.mit.edu> writes:

> I read your most excellent notes on Guile's generational gc.  I
> started thinking about all the issues you raised and came up an
> outline for a possible implementation.  I am not very familiar with
> most of Guile so there may be some gaping holes in what follows.

That's ok, there were (and most certainly still are) some gaping holes
in the notes, as well.  The only way to really find them is to try it
and see what breaks (which is probably cheaper than memorizing all of
the guile code, especially the fearsome eval).

> Guile's non-copying gc mode prevents the traditional technique of
> associating a generation with a heap segment.  Hence, generational
> information must be recorded somewhere.  Adding it to the cell doesn't
> seem feasible unless the tag structure is radically changed; even then
> few bits would be available.  An obvious thought is to use a vector of
> bytes (one for each cell) in parallel with each heap segment; subtracting
> the base of the segment from a pointer and shifting right by log2 of
> cell size gives an index into the vector.  However, finding the
> segment base for an arbitrary pointer also means doing a lookup in a
> segment address table which is somewhat clumsy, given that this must
> be done for every pointer encountered in the mark phase.

Yep, this is big slap against it. It also suffers with objects outside
the heap (I'm not entirely sure how all of them get there, but I think
they're out there, and I've got the segfaults to back me up, though
they could have been from other bugs). This is one of the annoyances
of guile, in that it isn't as simple as a vanilla scheme interpreter
(the complexities also make guile more pleasant to target with
extension code, but that doesn't decrease the occasionally large
annoyance factor).

> I think the following scheme would make everything much less painless.
> (The discussion assumes a 32-bit implementation; there is an obvious
> 64-bit analog.)  Consider a heap segment to be a collection of
> super-cells each containing space for 8 contiguous cells.

To keep the correspondance between what we're currently calling heap
segments (larger chunks of contiguous memory), let's call them
chunklets (which also has the benefit that, if they're implemented,
and there are bugs, I can say 'Argh!  Chunklet fudge!').

>   The first
> cell is not used as a cell but as an 8-byte vector, containing a byte
> for each of the remaining cells (with the final byte open for other
> use).  The byte corresponding to each cell (call it a Cell_Info) will
> contain the generation number and some other flags discussed below.
> The super-cells are aligned so that their addresses are a multiple of
> 0x40; this can always be done by wasting a few bytes in segments that
> do not begin on a page boundary.
> 
> Now, given a pointer p to a cell we can derive a pointer to its
> Cell_Info by:
>   #define SCM_CELL_INFOP(p) ((p) & 0x3f)/8 + ((p) & -0x40)
>   (some casting also required)
> This code sequence should only take 4 to 6 instructions depending on
> the processor architecture.
> 
> So what's in Cell_Info?  My thoughts go something like this:
> 
> struct Cell_Info {
>    int generation:4;
>    int car_newer:1;
>    int car_older:1;
>    int cdr_newer:1;
>    int cdr_older:1;
>    };
> 
> struct Super_Cell {
>    Cell_Info cell_info[7];
>    char any_out;
>    Cell cells[7];
>    };
> 
> The generation field is, of course, the cell's generation number.
> Four bits is probably overkill; three might be enough, with the fourth
> bit used for something else, like a pinned flag (because of reference
> from the stack), but I'm not sure if this is useful to be kept
> semi-permanently.  

Actually, one of the reasons I went beyond a byte is that we probably
should be keeping more than (8-16) possible generations. In the small
case, it seems like overkill, but if we stop at such a small number,
the last generation is likely to become large quickly. Also, since we
are going to have extra info per object, it's a good idea to put the
regular gc marks in there as well (we'll need two to flip flop, I
think), so that we don't need to trot all over a bunch of heap
segments removing marks to prevent guile from seeing illegal values (I
learned that one the hard way ;).

> The car/cdr-newer/older flags are set by the write
> barrier.  The newer or older flag is set when the pointer is to a
> member of a younger or older generation respectively.  As an
> optimization, we can set a flag in the any_out byte of the super-cell
> to indicate (at least) one of the cells has a pointer outside its
> generation.
> 
> The initial marking phase from the root set now has lots of
> opportunities for tree pruning.  If a pointer goes outside the
> generation being collected, it need not be followed for additional
> marking.  Unfortunately, a second marking phase must scan the whole
> heap for pointers into the generation being collected, but we can make
> effective use of the Cell_Info bits to lighten the work load.  (The
> normal copying collector wins big by not having to do this; this is
> the price we have to pay for a non-copying collector.)

This is the strongest case for storing possible pointers to a
generation separately. 

An improvement here is to keep at least some minimal information on a
per segment basis (where we could point... it's not perfect, but
avoids some unnecessary scanning). The part we really want to avoid is
scans through any large amount of heap space, because this is almost
always going to cost more than tracing live objects.
 
> So the second mark phase looks something like:
> 
>   int n = generation_being_collected;
>   Super_Cell *scp = initial_value;
>   while (/* some condition */) {
>     if (scp->any_out)
>        for (i=0; i<7; ++i) {
>          int sign = scp->cell_info[i].generation - n;
>          if (sign < 0) {
>             if (scp->cell_info[i].car_older) {
>                SCM *the_car = SCM_CAR(scp->cells[i]);  /* this is safe! */
>                if (SCM_CELL_INFOP(the_car)->generation == n)
>                   mark(the_car);
>                }
>             if (scp->cell_info[i].cdr_older) {
>                SCM *the_cdr = SCM_CDR(scp->cells[i]);
>                if (SCM_CELL_INFOP(the_cdr)->generation == n)
>                   mark(the_cdr);
>                }
>             }
>           else if (sign > 0) {
>             if (scp->cell_info[i].car_newer) {
>                SCM *the_car = SCM_CAR(scp->cells[i]);
>                if (SCM_CELL_INFOP(the_car)->generation == n)
>                   mark(the_car);
>                }
>             if (scp->cell_info[i].cdr_newer) {
>                SCM *the_cdr = SCM_CDR(scp->cells[i]);
>                if (SCM_CELL_INFOP(the_cdr)->generation == n)
>                   mark(the_cdr);
>                }
>           else
>              ;   /* cell is in generation being collected; */
>                  /* it's either already marked or will be marked by */
>                  /* being pointed to by another generation or is garbage */
>           }
>        }
>    ++scp;
>    }
> 
> Finally, the sweep phase will do the usual work of scanning for cells
> of the proper generation and unmarking them and either incrementing
> their generation number or placing them on the free list.  Of course,
> the any_out flags of any super cells with a newly collected cell must
> be recomputed.
>
> After thinking about the issues you raised in your notes, I think that
> babies should be treated specially.  I would propose that they be
> allocated in an "incubator" rather than in a normal generation heap
> segment.  This incubator would not consist of super-cells but be
> treated as a vector of cells.  Allocation would not require a free
> list, but just testing for exhaustion of the area and incrementing a
> pointer by 8.  

That is an excellent idea. 

> The write barrier for babies would be handled by
> keeping a list of pairs: the cell addresses that were stored into an
> older generation cell and the address of that cell.  This way a full
> secondary scan of the heap to find references to babies is not
> required.  The final difference of the incubator is that live cells
> that are not pinned are copied into the higher generation segments,
> using the normal copying gc techniques and allocating from the free
> list (but also updating older generation pointers from the write
> barrier list).
> 
> If there are no pinned cells, the whole incubator is ready to be
> reused as one contiguous allocation area; if there are n pinned cells,
> then we have (at most) n+1 contiguous areas that can each be allocated
> sequentially.  If we get too many pinned cells, we can just allocate a
> new incubator, but I think that long-term pinned cells will be fairly
> rare.

I may have stressed them a bit too much.

> The advantages of the incubator are that we have a lot of locality
> during allocation and during baby collection, which is the most
> frequent gc activity.  Since the incubator will contain mostly garbage
> (if the generational assumptions hold), the copying is minimal and we
> can avoid building a free list for the incubator.  We may prematurely
> copy some babies created at the end of the cycle that will soon become
> garbage.  

This can still be avoided by keeping a bit per object in the incubator
(I like that :), and moving them on if the bit is set (means a bit
more junk in the incubator, but we are talking about a very small
space, that won't be too difficult to scan). Or a second incubator
(probably better) that they have to live through to make it to the
heap proper. This is definately worth the mem, since the overhead for
an object that graduates is greater.

> We also might have to allocate a new heap segment if the
> normal free list is exhausted by the copying.  This affects the
> croaking geezer problem by filling out free cells, but may end up
> mixing generations too much.  As you point out, we can order the free
> list as desired.  Identifying babies complicates the calculation of
> the Cell_Info pointer (which doesn't exist for babies).  The secondary
> mark phase described above doesn't need to worry about this as elders
> never have their newer bits set when pointing to babies.  The root set
> trace and the write barrier must do a range check against the
> incubator bounds to see if a pointer is to a baby.
>
> The write barrier by now is non-trivial, but it only needs to be dealt
> with in set-car! and set-cdr! (and letrec?) which should be rare in
> truly schemeful programming.  We need to determine the generations of
> the cell being modified and the cell being referenced (if the pointer
> is to a pair). If the referenced cell is a baby and the modified cell
> is not a baby, we add the two cell addresses to the incubator list; if
> both are non-babies and of different generations, we set the
> car/cdr-newer/older flags as appropriate and the any-out flag.

Actually, this hits the unfortunate problem of user objects, as
well. Tenatively, what will happen is:

1) Normal 'scheme c' code will use something like SCM_ASSIGN(x, val),
   rather than x=val.

2) User smobs will have to take special care when they can contain
   pointers to the heap

And probably a lot more. I've not sketched this out completely, but it
seems likely that a memory protected write barrier won't be feasible,
from both a performance, and pain in the butt for everyone (including
me), standpoint.

> IMHO the heap segment allocation strategy needs to be modified from
> the current Guile for a generational gc.  It starts out by allocating
> 32k cells and then triples the heap on each further allocation (as
> long the OS will agree).  While selecting the appropriate parameters
> will require lots of tuning, I would think the growth should be more
> linear.  We are gc-ing the incubator fairly frequently so the normal
> heap segments should stay more in line with longer-term storage needs.

Cool mind reading... from my news page last night (oh yeah, there's a
news page about the work on the gcs', too...
http://home.thezone.net/~gharvey/guile/guile.html is the head of it
all)

    I've taken out the expmem stuff, because that's really the wrong
    track solution for this gc (and probably in general, since it only
    acts to mask gc problems).

I don't think a gc should ever throw gobs of memory at a problem, and
hope it goes away.
  
> There is no particular reason that the incubator should be the same
> size as a heap segment; it should be sized so that it can be gc-ed in
> a human-imperceptible amount of time.  

My inclination is to make all segments small (currently, a small
multiple of the page size, though it can go to the page size
itself). This has the really nice property of not making copying much
of an issue, since even sparse segments won't be wasting extraordinary
amounts of memory (although, that's even less a problem with the ideas
you've given here). It also allows the luxury of not having to force
kid objects in with the geezers, if the segment is mostly a bunch of
non-contiguous cells.

It also means that, if we use the scan for pointers method of getting
roots, there will be much less to scan.

> If we provide a gc-blocking
> mode that continues collecting the incubator, and doesn't try to
> collect older generations, but just allocates new heap segments as
> necessary, we would have a servicable semi-incremental collector that
> might satisfy MDJ's needs for smooth GUI performance.  This would only
> work if there were eventually some pause or restart points in
> applications when normal gc could be resumed or there were very few
> non-baby pairs surviving.

This should be a safe assumption for 99% of the applications out
there. There is usually some place you can hide a pause, where the
user expects a pause anyway (saving a file, loading a file, for a
game, moving to the next screen/level), and won't be terribly annoyed
if it's half a second, or a full second.
 
> I don't have a strong feeling about scheduling higher generation
> collections; since we are not copying them, there is no area to fill up
> to trigger a collection.  You suggested a frequency of 1/(n+1), which
> seems reasonable.  Perhaps we need some sort of adaptive scheduling
> which pays attention to the rates of change of the sizes of various
> generations.

Yep. I'd also like to provide the ability to program a generation
selection function through scheme, as well as some pre-tuned
behaviours for the various types of apps. This isn't something that
needs an awful lot of thought right now, since the best way to
determine is to actually have a gc to test out ideas on, and an
application is likely to have some little quirk that it wants to deal
with that probably can't be predicted.

> I see two scenarios that most long-running applications will have some
> mixture of: 1) a substantial data base is built up during a startup
> phase and then a steady state phase is entered and various numbers of
> cells are allocated and most soon become garbage, with some minimal
> growth of the long term data; and 2) the application operates in a
> cyclical manner, accumulating data for a medium-long time period,
> after which a great deal of garbage is suddenly produced and the cycle
> repeats.  In the first scenario spending a lot of time in higher
> generations is relatively wasteful, while it is essential in the
> second.  Probing for which generations are changing size by running
> gcs shouldn't affect long-running programs too much.  If they aren't
> long-running it doesn't really matter.

For short running, the gc I'm working on now (as a testbed for most of
the ideas, so it's not a complete waste of time, even if it doesn't
prove tremendously useful) might be sufficient, although the reduction
in scanning that a gen gc provides could make the gen gc mostly the
better choice... this one does a full mark, but only sweeps a small
portion of the segments, just removing marks from the segments it
isn't sweeping (this will be much better once a way to separate the
marks from the objects is implemented and working).
 
> Well, it seems no meaningful discussion of gc can be short.  I hope
> some of these ideas fit in with your current thinking.  Sorry if my
> presentation is too pedantic, but I have to work things out well
> enough that they don't seem completely idiotic or wishful.

I (and I guess most people, although maybe it would require torture to
get them to admit it >;) prefer it when someone explains too much,
rather than too little. The less I have to infer, the better! I really
appreciate the feedback (though this is usually implicit [I hope],
since I'm busy thinking about it and forgetting my manners).

Would it be alright with you if I put up your mail on my guile page?
There's certainly enough here that it will take a while for me to
integrate the ideas with the notes (is that ok, as well?)


-- 
Greg