This is the mail archive of the gsl-discuss@sourceware.org mailing list for the GSL project.



Re: [Help-gsl] Spearman rank correlation coefficient


Hello,

A month ago I proposed an implementation of the Spearman rank
correlation coefficient, as it is missing from GSL (see the emails
below). I have taken some of the advice into account, and the updated
code is available here:
https://gist.github.com/1784199#file_spearman_v2.c

I haven't received any reply since then. I'm not an experienced C
programmer, so my code may need further improvement, but it could
still be useful to others. Could I therefore submit it to the GSL main
trunk? I've never done that before; can someone tell me what to do?
Should I request "developer write access", for instance?

Thanks in advance,
Tim
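For reference, Spearman's coefficient is Pearson's correlation computed on ranks. Writing R_i and S_i for the ranks of x_i and y_i, and assuming the common convention that tied values receive the average of the ranks they span, the definition and the familiar no-ties shortcut (with d_i = R_i - S_i) are given below. The worked check uses small made-up data, x = (1, 2, 3, 4) and y = (1, 2, 2, 3), to show that the shortcut is only approximate once ties are present, which is why ranking followed by a Pearson computation is the safer route.

```latex
% Definition: Pearson correlation of the rank vectors R and S
r_s = \frac{\sum_{i=1}^{n} (R_i - \bar{R})(S_i - \bar{S})}
           {\sqrt{\sum_{i=1}^{n} (R_i - \bar{R})^2 \, \sum_{i=1}^{n} (S_i - \bar{S})^2}}

% Exact only when there are no ties:
r_s = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n (n^2 - 1)}, \qquad d_i = R_i - S_i

% Worked check: R = (1, 2, 3, 4), S = (1, 2.5, 2.5, 4), d = (0, -0.5, 0.5, 0)
r_s = \frac{4.5}{\sqrt{5 \cdot 4.5}} = \sqrt{0.9} \approx 0.9487
\qquad \text{versus} \qquad
1 - \frac{6 \cdot 0.5}{4 \cdot 15} = 0.95
```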


2012/2/11 Timothée Flutre <timflutre@gmail.com>
>
> Thanks for your input!
>
> 1) Here is the text of the license under which the Apache code is released:
> http://www.apache.org/licenses/LICENSE-2.0. Indeed, it seems that we
> would have to retain their copyright notice. Is this a problem? After all,
> there are not many different algorithms for computing the Spearman
> coefficient...
>
> 2) I have made the changes and now have "gsl_stats_spearman_alloc" and
> "gsl_stats_spearman_free" functions for the four arrays ranks1,
> ranks2, d and p. I added the code as a 2nd file to the same gist:
> https://gist.github.com/1784199#file_spearman_v2.c
>
> 3) Yes, we don't know in advance how many ties there will be. That's
> why I reallocate inside the loop. I don't see how to do it
> differently.
>
> 4) I added a function performing tests, using the data defined in
> statistics/test_float_source.c. What do I do now? Do I need write
> access to the GSL repository on Savannah? Or could someone else commit
> it for me?
>
> Thanks,
> Tim
>
>
> On Thu, Feb 9, 2012 at 6:04 PM, Patrick Alken
> <Patrick.Alken@colorado.edu> wrote:
> >
> > Hello,
> >
> > It would be best to move this discussion over to gsl-discuss. I think
> > it would be very useful to have this function in GSL. Just a few comments on
> > your code:
> >
> > 1) The code looks clean and nicely commented. One issue is that since
> > you appear to have followed the apache code very closely, there may be a
> > licensing issue - I don't know if the Apache license is compatible with the
> > GPL. On a quick check, it's possible we can use it, but it seems we need to
> > preserve the original copyright notice.
> >
> > 2) Dynamic allocation - it looks like you dynamically allocate 5
> > different arrays to do the calculation. It would be better to either make
> > functions like gsl_stats_spearman_alloc and gsl_stats_spearman_free, or to
> > pass in a pre-allocated workspace as one of the function arguments. Since
> > you're using workspaces of different types (double, size_t), it's probably
> > better to make the alloc/free functions.
> >
> > 3) One of your dynamically allocated arrays is realloc()'d in a loop. Is
> > this because the size of the array is unknown before the loop? Perhaps there
> > is a way to avoid the realloc() calls.
> >
> > 4) We also need to think of some automated tests that can be added to
> > statistics/test.c to test this function exhaustively and make sure it's
> > working correctly - even if that consists simply of known output values for
> > a few different input cases.
> >
> > Good work,
> > Patrick Alken
> >
> >
> > On 02/09/2012 04:26 PM, Timothée Flutre wrote:
> >>
> >> Hello,
> >>
> >> I noticed that only the Pearson correlation coefficient is implemented
> >> in the GSL
> >> (http://www.gnu.org/software/gsl/manual/html_node/Correlation.html).
> >> However, in quantitative genetics, several authors use the
> >> Spearman coefficient (for instance, Stranger et al., "Population genomics of
> >> human gene expression", Nature Genetics, 2007) because it is less
> >> influenced by outliers.
> >>
> >> Current high-throughput data require computing this coefficient several
> >> million times. Thus I implemented the computation of the Spearman
> >> coefficient in GSL-like code. In fact, one just needs to rank the input
> >> vectors and then compute the Pearson coefficient on the ranks. For the
> >> ranking, I was inspired by the code from the Apache Math module.
> >>
> >> I thought it could be useful to other users to add my code to
> >> the file "covariance_source.c" of GSL
> >> (http://bzr.savannah.gnu.org/lh/gsl/trunk/annotate/head:/statistics/covariance_source.c#L77).
> >> So here is the code: https://gist.github.com/1784199
> >>
> >> I am not very proficient in C, so even if it is not possible to
> >> include the code in the GSL, don't hesitate to give me advice.
> >>
> >> Thanks,
> >> Tim
> >>
> >

