This is the mail archive of the
gsl-discuss@sourceware.org
mailing list for the GSL project.
Re: Robust linear least squares
- From: Patrick Alken <patrick dot alken at Colorado dot EDU>
- To: "gsl-discuss at sourceware dot org" <gsl-discuss at sourceware dot org>
- Date: Fri, 24 May 2013 14:12:05 -0600
- Subject: Re: Robust linear least squares
- References: <CAGJVmuLLgYvVG+H_2ccctxmrB0Esm+No6zC1=+N6YWCTWOpuqw at mail dot gmail dot com>
Hi Tim,
Yes I remember and I do still think it would be a very nice function
to have in GSL. My main worry at this point is the need to call the
_alloc and _free functions for this function. Looking through the
statistics chapter of the manual, none of the functions there require
alloc/free calls, so it would be really nice to implement spearman in a
similar way.
One issue we need to worry about is that once we introduce a new
workspace (gsl_stats_spearman) into GSL, it will be there for a long
time, since we need to keep GSL binary compatible across releases (so
that users won't have to recompile their code and can just link to the
new library).
If there is no other way to nicely implement this function then so be
it - we will include a spearman workspace, but I'd really like to
exhaust other options first.
For example, I know you've looked at the Apache implementation. Have
you looked at the R implementation as well to get any ideas? Also,
Numerical Recipes implements this function by allocating 2 vectors
(your ranks1/ranks2 vectors) inside the spear function. Numerical
Recipes doesn't seem to need an additional sort vector (your d) or
permutation vector (p). Do you think there is a way to eliminate these 2
parameters? Then perhaps the user could simply pass in a double
array of size 2*n which you could use as your ranks1/ranks2 vectors,
eliminating the need for spearman_workspace.
Alternatively, do you think there is any clever way to compute the
ranks in-place in the data1/2 vectors, so you won't have to allocate
additional ranks1/ranks2 vectors?
Finally, I know I asked you before about the ties_trace realloc call
- it looks like this variable is reallocated to size 'nties' on the
fly. Is there any way to count the number of ties initially, so that
this only needs to be allocated once?
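
As a rough illustration of counting ties up front, the sketch below
makes one pass over an already sorted array. It assumes ties_trace
holds one entry per group of tied values, which I have not checked
against the actual code; count_tie_groups and tie_correction are
hypothetical names. The second function shows an alternative that
needs no array at all: it accumulates the sum of t^3 - t over tie
groups of size t, which is the quantity that the usual tie-corrected
formulas use.

```c
#include <stddef.h>

/* Count the groups of two or more equal values in an ascending array.
   With this count known, a per-group array can be allocated once. */
size_t count_tie_groups(const double *sorted, size_t n)
{
  size_t i = 0, ngroups = 0;
  while (i < n) {
    size_t j = i + 1;
    while (j < n && sorted[j] == sorted[i]) j++;
    if (j - i > 1) ngroups++;   /* a run of length >= 2 is a tie group */
    i = j;
  }
  return ngroups;
}

/* Sum of t^3 - t over all runs of equal values of length t. Runs of
   length 1 contribute zero, so untied values drop out on their own. */
double tie_correction(const double *sorted, size_t n)
{
  size_t i = 0;
  double s = 0.0;
  while (i < n) {
    size_t j = i + 1;
    double t;
    while (j < n && sorted[j] == sorted[i]) j++;
    t = (double)(j - i);
    s += t * t * t - t;
    i = j;
  }
  return s;
}
```

For the sorted values {1, 2, 2, 3, 3, 3, 4} there are two tie groups
(sizes 2 and 3), and the correction sum is (8 - 2) + (27 - 3) = 30.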
I don't see any realloc calls in the Numerical Recipes
implementation, so I'd like to ask you to try to understand how they do
it (also look at GNU R, which may be more professionally written).
Perhaps look at Octave too?
Sorry to be a stickler about this, but I do think it's worth trying to
eliminate the alloc/free calls for this function. Even with the
alloc/free calls there is still a performance hit due to the realloc
calls of ties_trace. I may have some time next week to look into this a
bit more myself.
Patrick
On 05/24/2013 01:23 PM, Timothée Flutre wrote:
Hello Patrick,
About the next release: a while ago I proposed some code (+tests) to
compute the Spearman rank correlation coefficient. I uploaded my code
on Savannah (http://savannah.gnu.org/bugs/?36199) and it is also
available on GitHub (https://github.com/timflutre/spearman). At least
one person asked on the mailing list if this coefficient was implemented
(http://savannah.gnu.org/bugs/?37728) so I think it would be useful to
add it.
I tried to follow the GSL guidelines as closely as possible so that it
should be possible to integrate the code easily into the next release.
I would be glad to help in this matter if necessary.
Best,
Tim