This is the mail archive of the guile@cygnus.com mailing list for the guile project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]

New uniform vectors design - Draft I



Please feel free to critique this to little pieces.
Thank you.

(use Emacs outline mode to read this.)

------------------>8 cut cut 8<-----------------------

A Rough Take at Redesigning Something Important,
Or How to Write in Bad English and Make a Fool of Yourself
in Public.

* Problems to solve and Grand Goals to achieve.
  --------------------------------------------

** The following problems with uniform arrays now.

(maybe the list is not complete).
*** All uniform vectors have to be allocated by Guile and GC-managed.

Can't hand Guile a pointer to your matrix and have it treated as
Scheme array.

*** There are only so many types that can be put in a uniform vector.

What if I want an array of long doubles?  Of my own type?

*** Uniform vectors consume many tc7 tags.

This just seems wasteful, aestetically.  Oh, and all those long switch
statements in unif.c...

** These are the goals that the design presented here wants to achieve:

*** More freedom in vector storage management.

I want completely GC-managed vectors, completely unmanaged vectors and
vectors that should be just freed by the GC (think about the result
of, say, strdup).

*** Indirect vectors with unmodifiable contents.

Covers substrings, handles to read-only vectored fields of records,
and maybe more.

*** Any type you want.

I want to be able to have uniform vectors of records, for example.  In
general, I want to have a well-defined framework to describe types of
uniform vector (and record, for that matter) contents.

*** Have mercy on those tc7 tags.

I say in advance that this goal is not exactly achieved, but the
situation gets better.

* Steps to Take.
  -------------

** Concepts.

*** Uniform vector storage/access models.

**** Completely managed.

Created by Guile, modifiable, contents are marked (if applicable),
storage is freed by GC (what we have now).

**** Freed by GC.

Created "outside", modifiable, contents are not marked, storage is
freed by GC.

**** Indirect modifiable.

Points to innards to some other vector (or records), or just
somewhere.  GC-protects the pointee (simply by keeping it's SCM handle
and marking it).  Contents are not marked, storage is not freed.

**** Indirect read-only.

Like the above, but modification of contents is not allowed.

** Tag shuffle.

*** Bits 1 and 3 of the tag denote the vector storage/access model.

00 - managed
01 - freed
10 - indirect
11 - indirect read-only

*** Bits 4 and 5 denote the type.

01 - chars (for strings)
10 - SCM (regular Scheme vectors)
11 - explicit type (for all the rest, more on this later).

*** Summary of changed tags.

    Current                 Proposed                   Notes
--------------------------------------------------------------------------
0001101 - vector         0001101 - free             Up for grabs
0001111 - wvect          0001111 - free             Up for grabs

0010101 - string         001x1x1 - strings
0010111 - substring               |
0011101 - free                    |
0011111 - free                    |

0100101 - uvect          010x1x1 - SCM vectors
0100111 - lvector                 |
0101101 - fvect                   |
0101111 - dvect                   |

0110101 - cvect          011x1x1 - typed vectors    All the rest.
0110111 - svect                   |
0111101 - contin                  |
0111111 - cclo                    |

1000111 - bvect          1000111 - free             Up for grabs
1001101 - byvect         1001101 - contin           Moved
1001111 - ivect          1001111 - cclo             Moved

** Types.

*** Contained type descriptions.

We introduce a new smob type, called Contained Type.  It holds a
following info struct:

struct scm_contained_type_info {
    SCM name;
    scm_sizet size;

    SCM (*box)(SCM type, void *data, SCM owner);
    void (*unbox)(SCM type, void *where, scm_sizet n);

    int (*equalp)(SCM type, void *data1, void *data2);

    SCM (*mark)(SCM type, void *data);
    scm_sizet (*free)(SCM type, void *data);
    void (*init)(SCM type, void *data);

    enum {
      SCM_CONTAINED_TYPE_SIMPLE,
      SCM_CONTAINED_TYPE_RECORD,
      SCM_CONTAINED_TYPE_POINTER
    } flavour;

    union {
      struct scm_record_type_descriptor *rtd;
      SCM pointee_type;
    } data;
}

The members should be pretty self-explanatory.

The `type' and `owner' arguments to the methods are needed to
accomodate record and pointer types, and are ignored for simple atomic
types.

*** How do we use the type descriptions.

SCM vectors and strings have their type encoded in the tag.

SCM vectors and strings are special because they are commonly used, so
the space saving from having the type in the tag is significant.

The rest have the type handle (SCM) put before the data (or pointer
whereof).

All vector access and modification routines, as well as GC mark and
sweep, use the type descriptor to do their job.

** Various implementation notes.

(Note that there's no implementation yet, just the notes :-)

*** Type descriptor protection.

There may be a problem when freeing a vector (or a record) that
happened to be the last instance of a type, because the type can be
freed before the instance, and the reference to the appropriate `free'
method will cause a nice segfault.

To solve this problem, the types are kept in a weak hash table with a
guardian (see Dybvig's paper on guardians for definition).

I've implemented guardians for Guile and will post the code here soon.

*** Owner protection for indirect vectors.

Indirect vectors carry their owner's (pointee's) SCM handle around and
make sure to mark it.

They also keep a pointer to the owner's innards (their base).

This may seem wasteful of space, but it's actually not.

The other way is to keep the owner's handle and the starting index.

Our way has one advantage: a simple way to make an indirect vector out
of a simple C pointer - we just set the owner to SCM_BOOL_F or
something.

*** Freeing the freed vectors.

Is there a need to make it possible to pass user's freeing function to
call when freeing vectors?  The current design doesn't address it, but
it certainly can be done.

*** Read-onliness.

Atomic types will honour it.  Pointer types will honour it.  Records
will honour it too.

The rest (contained types defined by user) will have excersize some
discipline.

Not that there would be any real need to define new contained types
once Guile gets records...

------------------>8 cut cut 8<-----------------------

good night,
mike.