This is the mail archive of the guile@sources.redhat.com mailing list for the Guile project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]

Failure: explicit mark stack


Hello together.

I tried to speed up guile's gc with an explicit mark stack.  Abstract:  an
absolute failure:  Performance decreased by about 5%.

For those of you that are interested in the details:  The current
scm_gc_mark works as follows (in principle):

  scm_gc_mark (obj)
  {
  loop:
    if (immediate (obj))
      return;
    if (marked (obj))
      return;
    switch (type_of (obj)) {
      cons_pair:
  	scm_gc_mark (car (obj));
  	obj = cdr (obj);
  	goto loop;
      type_with_no_references:
  	return;
      type_with_one_reference:
  	obj = the_one_reference (obj);
  	goto loop;
      type_with_more_references:
  	for each reference 'ref':
  	  scm_gc_mark (ref);
  	return;
    }
  }

(Actually, the test whether the object is marked is done _after_ the type
dispatch, since the different types have their mark bits in different
places.  But this is just to show you the concept.)

As a test, I used the guile version with my enhancements that cause a
gc with every access to a cell.  This is an absolute gc stress test,
since it means that there is essentially nothing else than gc's going
on.  The following code for example:
  guile -c '(set-debug-cell-accesses! 1)(quit)'
takes about 45 seconds to complete :-)

45.78u 0.03s 0:48.95 93.5%
45.25u 0.06s 0:45.85 98.8%
45.29u 0.02s 0:46.17 98.1%

A second test worked as follows:
  guile -c '(define foo (map list (iota 10000)))(set-debug-cell-accesses! 1)(quit)'

68.93u 0.06s 1:09.61 99.1%
69.29u 0.02s 1:10.56 98.2%
69.29u 0.06s 1:10.06 98.9%

I tried two different ways to add an explicit mark stack:

  static SCM stack[10000];
  static unsigned int stackptr = 0;

  scm_gc_mark (obj)
  {
  loop:
    if (immediate (obj)) 
      goto fetch_tos;
    if (marked (obj))
      goto fetch_tos;
    switch (type_of (obj)) {
      cons_pair:
        stack[stackptr++] = car (obj);
  	obj = cdr (obj);
  	goto loop;
      type_with_no_references:
  	return;
      type_with_one_reference:
  	obj = the_one_reference (obj);
  	goto loop;
      type_with_two_references:
        stack[stackptr++] = the_first_reference (obj);
  	obj = the_second_reference (obj);
  	goto loop;
      type_with_more_references:
  	for each reference 'ref':
  	  scm_gc_mark (ref);
  	return;
    }

  fetch_tos:
    if (stackptr) {
      obj = stack[--stackptr];
      goto loop;
    }
  }

That is, I just changed a couple of types to use the stack, but kept a
single loop around the whole thing.  For more complicated things like
vectors, structs etc.  I just kept the recursive calls.  This is OK,
since we simply could not get around recursive calls, because the
innards of the smob mark functions might hold recursive calls anyway.
I did not add any stack overflow checks, but chose a stack depth of
10000, which for a test should be sufficient.

The results were about 5% worse with the explicit stack.  :-(

47.56u 0.03s 0:48.22 98.6%
47.62u 0.02s 0:48.39 98.4%
47.57u 0.04s 0:48.17 98.8%

Also with the second test, the results with the explicit stack were worse:

73.66u 0.03s 1:14.29 99.1%
73.61u 0.02s 1:14.30 99.0%
73.71u 0.02s 1:14.64 98.7%

I also tried the same tests with a different approach to adding an
explicit stack:

  static SCM stack[10000];
  static unsigned int stackptr = 0;

  scm_gc_mark (obj)
  {
  loop_1:
    if (immediate (obj)) 
      return;
    if (marked (obj))
      return;
    switch (type_of (obj)) {
      cons_pair:
        stack[stackptr++] = car (obj);
  	obj = cdr (obj);
  	goto loop_2;
      type_with_no_references:
  	return;
      type_with_one_reference:
  	obj = the_one_reference (obj);
  	goto loop_1;
      type_with_two_references:
        stack[stackptr++] = the_first_reference (obj);
  	obj = the_second_reference (obj);
  	goto loop_2;
      type_with_more_references:
  	for each reference 'ref':
  	  scm_gc_mark (ref);
  	return;
    }

  loop_2:
    if (immediate (obj)) 
      goto fetch_tos;
    if (marked (obj))
      goto fetch_tos;
    switch (type_of (obj)) {
      cons_pair:
        stack[stackptr++] = car (obj);
  	obj = cdr (obj);
  	goto loop_2;
      type_with_no_references:
  	goto fetch_tos;
      type_with_one_reference:
  	obj = the_one_reference (obj);
  	goto loop_2;
      type_with_two_references:
        stack[stackptr++] = the_first_reference (obj);
  	obj = the_second_reference (obj);
  	goto loop_2;
      type_with_more_references:
  	for each reference 'ref':
  	  scm_gc_mark (ref);
  	goto fetch_tos;
    }

  fetch_tos:
    if (stackptr) {
      obj = stack[--stackptr];
      goto loop_2;
    }
  }

That is, processing is done in two quite similar loops.  While the first
loop is processed, it is assumed that the stack is empty.  Thus, as soon
as an immediate or already marked object appears, it is allowed to return
immediately.  But, as soon as one single object is placed on the stack,
further processing is done by the second loop.

Again, the performance was worse with the explicit stack.  This time I
also distinguished between compilation with and without optimization (O2),
but since I realized that optimization did not bring much gain here and
since even the results with optimization were worse than without the
explicit stack, I did not try out the optimized versions for the other
cases.

with no optimization

47.15u 0.02s 0:47.74 98.8%
47.14u 0.04s 0:47.73 98.8%
47.25u 0.02s 0:47.60 99.3%

with -O2

45.93u 0.05s 0:46.56 98.7%
45.79u 0.03s 0:46.23 99.1%
45.71u 0.04s 0:46.38 98.6%

Second test with no optimization

73.84u 0.02s 1:14.55 99.0%
73.72u 0.04s 1:14.50 99.0%
73.87u 0.03s 1:14.75 98.8%

with -O2

71.71u 0.03s 1:12.39 99.1%
71.91u 0.02s 1:13.10 98.3%
71.88u 0.06s 1:12.91 98.6%


The machine on which I performed these tests is:
  > uname -a
  SunOS krantor 5.7 Generic sun4u sparc SUNW,UltraSPARC-IIi-Engine


And I really had thought that by using an explicit stack we could achieve
some speedup :-(

Best regards
Dirk


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]