This is the mail archive of the
mailing list for the Guile project.
New shared substring implementation
- To: email@example.com
- Subject: New shared substring implementation
- From: forcer <firstname.lastname@example.org>
- Date: Mon, 31 May 1999 09:59:22 +0200
Lately, i was wondering why there was the need for (explicitely)
shared substrings at all.
As hinted in guile-ref, a copy-on-write strategy might make
make-shared-substring useless, since normal substring could
return a shared substring which will be copied on the first
write. This would add a simple check on each string mutation
function, but could increase performance drastically otherwise
and retain a very simple interface.
I liked the idea and wanted to implement it in guile. Taking a
look at strings.[ch] (and symbol.h), i noticed that the length of
a string is encoded in it's SCM object, while the SCM obj's CDR
has to point to the char* directly.
If i wanted to change this, i had to change the implementation of
symbols as well (and probably take in alot of incompatibilities
with other parts of guile that use strings)
I assume that such pointer plays - negative indices to store
information, etc. - are common in guile. One question arises:
Is (SCM_CAR(val)>>8) that much faster than SCM_CDR(val)->length?
Is SCM_CDR(val)[-1] that much faster than SCM_CDR(val)->something?
Well, my thoughts about the copy-on-write idea stumbled on one
(define str1 "foo") ; a string is allocated as usual
(define str2 (substring str1 0 1)) ; str2 is now a shared
; substring of str1
(string-set! str1 0 #\b) ; since str1 is not a shared
; substring, it's not copied,
; and thus str2 => "b"
A solution to this problem would be to have (substring) mark the
string it's applied to to be a "substring" as well. The problem
then is that:
(define str1 "foo")
(define str2 (substring str1 2 3)) ; str1 and str2 are marked as
; shared substrings
(string-set! str1 0 #\b) ; str1 now points to a newly
; malloc'd copy of the string,
; and str2 points to the
; original string
(string-set! str2 0 #\b) ; str2 is now a copy of the
; string as well, and there's
; no pointer to the original
; string left :]
Does anyone have an idea on how to solve this elegantly?
(one *could* use a reference count there. I don't see any
possibility of how a circular dependency might come up there)
((email . "email@example.com") (www . "http://forcix.cx/")
(irc . "forcer@#linux.de (IRCnet)") (gpg . "/other/forcer.gpg"))