This is the mail archive of the guile@cygnus.com mailing list for the guile project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]

Re: stable sorting


On Mon, 4 Jan 1999, Per Bothner wrote:
> > In applications which work with large amounts of data it is essential
> > that we can sort the data "in place".  Stable algorithms usually
> > require O(N) extra space
> 
> First, merge sort does not require extra space when sorting linked
> lists.  It does require extra space when sorting arrays, but only
> one pointer per record.  (When sorting large amounts of data,
> records are usually much larger than a pointer.)
> 
> Secondly, this is Scheme.  We are supposed to "optimize" for
> correctness, generality, and ease-of-use, not minor space savings.
> If sorting needs 10% or even 20% extra space for a work area,
> who cares?  If you do, you are probably willing to spell
> out unstable-sort or use a more complex interface (such as how
> to manage buffers).

I certainly agree to some extent but what/who says that sorting
beeing stable is a correctness criterion? In most situations you are
happy to get the data sorted as quick and efficient as possible. In
those cases where you really need to care for the order of the equal
items you can certainly also spell out "stable-sort", which also
makes the application program much clearer.

> Applications which work with large amounts of data use
> external sorting algorithms based on merge sort.

In my application, (data mining of huge data bases), I do a lot
of sorting and I need the most efficient algorithm available
and have mostly been using quick sort for that.

> > and significant extra time to execute.
> 
> Quicksort has a really fast inner loop, when the comparison
> is cheap and can be inlined.  I.e. it is great for sorting
> integer arrays with a hard-coded comparison function.
> Of course that is totally irrelevant for real applications,
> or for a Scheme sort routine where the comparison is a
> function parameter.  For a library sort routine, do you
> have real data that merge sort would take "significant
> extra time to execute?"

At the moment we use quick sort for arrays and merge sort for
lists. The performance is currently about the same, i.e. about
500 ms for a list/vector 100000 long on a PII300 and a subr as
comparision function.

	Roland Orre