This is the mail archive of the gsl-discuss@sourceware.org mailing list for the GSL project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Sparse matrix extension


Hi Patrick,

On Sun, Feb 7, 2016 at 6:25 PM, Patrick Alken <alken@colorado.edu> wrote:
> On 02/06/2016 06:03 PM, Alexis Tantet wrote:
>> Hi Patrick,
>>
>> Thanks for working on it. Following, some comments:
>>
>> On Sun, Feb 7, 2016 at 1:02 AM, Patrick Alken <alken@colorado.edu> wrote:
>>> Hello,
>>>
>>>   I've been working my way through your patch. There is a new branch on
>>> the git called 'sparse' with all the current changes. You've done a lot
>>> of work on your patch so its taking me a while to get through all of it.
>>>
>>>   The main functionality of CRS is in there, along with basic stuff like
>>> memcpy, transpose_memcpy, transpose. Also dgemv works (and hence also
>>> the GMRES linear solver).
>> Since the GMRES is already implemented, do you think it would be worth
>> using the Arnoldi method to find eigenvalues and eigenvectors of
>> sparse non-hermitian matrices?
>
> If you think you can make a nice implementation, go for it. We could add
> it to the gsl_eigen interface.
>

The basic elements of the Arnoldi iteration and the eigenvalue solver
for dense non-hermitian matrices are there.
The question is then which restarting method to use, e.g. the
implicite restart of ARPACK.
I'll see what I can do.

>>
>>> I did some initial work on the fprintf/fscanf
>>> for the triplet case - for the compressed matrices, I am wondering if we
>>> should stick to a standard format such as Harwell-Boeing, I'm not sure
>>> at the moment. We might be able to avoid this issue by making a binary
>>> fwrite/fread interface which just directly writes the 3 arrays to disk.
>> I agree that a standard format would be best but I am not aware of any
>> dominant format.
>> Is Harwell-Boeing one of them?
>> It seems important to me to have both binary and formatted input/output
>> (for coherence with gsl_matrix and because formatted files are more
>> handy to share).
>> Also, it is important to have some header indicating how many elements
>> should be read
>> and what is the dimension of the matrix.
>
> I'm not aware of a standard format either, I'll try and look into this
> further to come up with a good solution.
>
>>
>>> Things I still need to add for the CRS storage:
>>>
>>> 1. spmatrix_add
>>> 2. spblas_dgemm
>>> 3. fprintf/fscanf
>>> 4. transpose - for now I'll keep it the way you coded it, but I think it
>>> would be better to keep the matrix in the same storage format during a
>>> transpose, ie: the user might want to pass A^T to an external solver
>>> library which requires CCS and they might not expect the transpose
>>> routine to change it to CRS. I did a quick search for in-place transpose
>>> algorithms in CCS/CRS and they do appear to exist though I haven't had
>>> time to implement it yet.
>> Fine, but then I think it is important to add a separate function, say
>> "transpose_switch_major",
>> for which the transposition is performed by just changing the major
>> (i.e. row->col or col->row)
>> since this is the cheapest way to transpose while some libraries may
>> work equally well with CRS or CCS.
>
> Ok I like this idea too since it is very efficient.
>
>>
>>> I removed the inner/outer idx stuff from gsl_spmatrix - its not
>>> necessary and also I'd like to avoid modifying the gsl_spmatrix struct
>>> since any changes would break binary compatibility for the next release.
>> That makes perfect sense, but then you need to test each time for the
>> major to know what is the size of the pointer?
>
> I am just making separate routines for CCS and CRS (ie: gsl_spmatrix_ccs
> and gsl_spmatrix_crs to do triplet -> compressed). I think its simpler
> this way.
>
>>
>>> Also your modification to gsl_spmatrix_set - I see what you're trying to
>>> do but I think its better to add a new function:
>>>
>>> double *gsl_spmatrix_ptr(gsl_spmatrix *m, size_t i, size_t j)
>>>
>>> which returns a pointer to the data array for element (i,j), and the
>>> user can then add whatever value they want (this is how its done with
>>> gsl_matrix and gsl_vector). I think it should be straightforward to add
>>> this function, possibly even for the compressed storage too.
>> Then I would also add a compressed matrix constructor (or an option) in which
>> gsl_spmatrix_ptr is used to sum duplicate entries instead of gsl_spmatrix_set.
>> This possibility is offered by some other libraries such as Eigen
>> and is for example useful to build matrices counting (e.g. transitions).
>
> I'm not sure what you mean. I've added a new function gsl_spmatrix_ptr
> to the git, which as far as I can tell does exactly what your
> sum_duplicate flag does. It searches the matrix for an (i,j) element,
> and if found returns a pointer. If not found a null pointer is returned.
> This makes it easy for the user to modify A(i,j) after it has been added
> to the matrix. Are you thinking of something else? Can you point me to
> the Eigen routine?
>

What I meant is to have the equivalent of gsl_spmatrix_compress,
with the difference that gsl_spmatrix_ptr is used instead of gsl_spmatrix_set,
so has to build the compressed matrix from triplets, summing the
duplicates, instead of replacing them.
This is what is done here :
The http://eigen.tuxfamily.org/dox/classEigen_1_1SparseMatrix.html#a5bcf3187e372ff7cea1e8f61152ae49b

Best,
Alexis

>>
>>> Anyway, just wanted to let you know things are progressing (though a
>>> little slowly).
>> Thanks a lot! Would you like me to work on something in particular?
>>
>> Best,
>> Alexis
>>
>>> Patrick
>>>
>>> On 01/20/2016 11:31 AM, Alexis Tantet wrote:
>>>> Hi Patrick,
>>>>
>>>> I've cloned gsl.git, made my modifications to spmatrix/ and spblas/
>>>> and pushed them to master at:
>>>> https://github.com/atantet/gsl
>>>>
>>>> This time, I've tried to modify the code as little as possible. I've
>>>> usually added comments labelled "AT" with each modification.
>>>> I also have a few questions about the code which I've labelled "?AT".
>>>> Finally, I've added the corresponding tests to both test.c (successful
>>>> on my machine).
>>>>
>>>> I see that duplicates are handled for the triplet format by replacing.
>>>> Thus, I removed the function taking care of sorting and summing
>>>> duplicates that I had added to _compress (the trick was to transpose
>>>> with copy to reorder first, transpose back and remove the duplicates
>>>> then, taken from Eigen C++). Instead, I have added an option to sum
>>>> the duplicates (useful e.g. when counting links of a graph or
>>>> transitions of a Markov chain). One may need the third option of
>>>> repeating the triplets, but then avl_insert should also be modified.
>>>>
>>>> However, if I've understood well, no sorting of the inner indices is
>>>> performed right now. This could be problematic in the future (e.g. in
>>>> functions such as _equal).
>>>>
>>>> For _spdgemm for CRS matrices, I've used the trick (A B)^t = B^t A^t
>>>> to convert the problem to CSC. I've tested it successfully but you may
>>>> not like the coding style.
>>>>
>>>> I hope that helps,
>>>> Best,
>>>> Alexis
>>>>
>>>> On Tue, Jan 19, 2016 at 9:50 PM, Patrick Alken <alken@colorado.edu> wrote:
>>>>> On 01/19/2016 12:55 PM, Alexis Tantet wrote:
>>>>>> Hi Patrick,
>>>>>>
>>>>>> Thanks for the quick reply! I wanted to be sure that this contribution
>>>>>> is useful before to spend time on the merging with the latest version.
>>>>>> I will create the gsl.git repository and work on it during the week.
>>>>>>
>>>>>> I had already had a look at the documentation but did not know about
>>>>>> the iterative solvers (a link between each modules would be useful).
>>>>>> My contribution indeed fits in the sparse matrix module + the update
>>>>>> of the dgemm and dgemv functions to support CRS (an update may also be
>>>>>> needed for the solvers).
>>>>> The solvers only call dgemv (as far as I remember) so they shouldn't need an
>>>>> update once dgemv is updated.
>>>>>
>>>>>> I have also developed a simple C++ object allowing to use gsl_spmatrix
>>>>>> as a user-defined matrix in ARPACK++ (a maintained fork of ARPACK++
>>>>>> can be found at https://github.com/m-reuter/arpackpp), allowing to
>>>>>> avoid having to use other libraries such as superLU. It could be
>>>>>> useful to others, maybe as an extension. Now that I think about it,
>>>>>> the iterative solvers could also be used to support the shift and
>>>>>> invert modes (see ARPACK++ documentation). What do you think (I could
>>>>>> work on it)?
>>>>> I've never used ARPACK, but if you want to make an extension to interface
>>>>> GSL/ARPACK its fine with me - such an extension would never be added to the
>>>>> main GSL code since GSL tries to be as standalone as possible.
>>>>>
>>>>>
>>>>>> If you have major comments, the sooner the better, so that I can work
>>>>>> on them while merging.
>>>>>>
>>>>>> Thank you for your interest,
>>>>>> Alexis
>>>>>>
>>>>>> On Tue, Jan 19, 2016 at 6:02 PM, Patrick Alken <alken@colorado.edu> wrote:
>>>>>>> Hi Alexis,
>>>>>>>
>>>>>>>    This looks like very good work! Adding compressed row storage has been
>>>>>>> on
>>>>>>> my todo list for a while. The 'gslsp' extension is unfortunately very out
>>>>>>> of
>>>>>>> date, and the current git contains newer code (including a GMRES
>>>>>>> iterative
>>>>>>> linear solver). I removed the gslsp extension from the web page a while
>>>>>>> back
>>>>>>> to reflect this. You can browse the latest manual to see the current
>>>>>>> sparse
>>>>>>> matrix capabilities (http://www.gnu.org/software/gsl/manual/gsl-ref.pdf)
>>>>>>> -
>>>>>>> there are 3 chapters: sparse matrices, sparse blas and sparse linear
>>>>>>> algebra
>>>>>>> - it looks like your contributions will fit into the sparse matrices
>>>>>>> chapter.
>>>>>>>
>>>>>>>    Would you be able to verify that your changes are compatible with the
>>>>>>> current gsl.git repository? This will make it much easier for me to merge
>>>>>>> everything into the git when ready. It would be best if you made a new
>>>>>>> branch of gsl.git, and add your changes so I can then pull them from
>>>>>>> github
>>>>>>> or somewhere. I will try to find some time in the next few days to look
>>>>>>> over
>>>>>>> your code.
>>>>>>>
>>>>>>> Thanks again,
>>>>>>> Patrick
>>>>>>>
>>>>>>>
>>>>>>> On 01/19/2016 09:43 AM, Alexis Tantet wrote:
>>>>>>>> Dear GSLers,
>>>>>>>>
>>>>>>>> As a scientist rather than a developer, I have developed an extension
>>>>>>>> of the sparse matrix module (CRS, I/O, manipulation, see below), which
>>>>>>>> I have tested. These modifications conserve the structure of the
>>>>>>>> original module and be useful for a large number of sparse matrices
>>>>>>>> users.
>>>>>>>>
>>>>>>>> I'm not familiar with the contributing process here. My repository can
>>>>>>>> be found there:
>>>>>>>> https://github.com/atantet/gslsp
>>>>>>>> Unfortunately, I did not know of the gsl.git repository and I forked it
>>>>>>>> froml:
>>>>>>>> https://github.com/drjerry/gslsp ,
>>>>>>>> which seems to be a bit older than gsl.git.
>>>>>>>>
>>>>>>>> How can I push/merge to gsl.git ? Should it be as an update or another
>>>>>>>> extension? Is it necessary to adapt to the newest version of the code
>>>>>>>> ?
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>> Alexis Tantet
>>>>>>>>
>>>>>>>> CHANGES.md:
>>>>>>>>
>>>>>>>> Extension of the sparse matrix module of GSL
>>>>>>>>
>>>>>>>> ===================================
>>>>>>>>
>>>>>>>> Introduction
>>>>>>>> ------------
>>>>>>>>
>>>>>>>> Usages of sparse matrices are numerous in scientific computing.
>>>>>>>> When numerical linear algebra problems become large, sparse
>>>>>>>> matrices become necessary to avoid memory overload and unnecessary
>>>>>>>> computations, at the cost of element access and matrix construction.
>>>>>>>> As a result, most large scale linear solvers or eigen solvers perform
>>>>>>>> on sparse matrices.
>>>>>>>>
>>>>>>>> Fortunately, a very useful sparse matrix module has recently been
>>>>>>>> introduced to GSL.
>>>>>>>> However, important features are still lacking, such has
>>>>>>>> Compressed Row Storage (CRS) matrices, input/output functions and
>>>>>>>> other matrix properties and manipulation functions.
>>>>>>>> This new version attempts to address this, conserving the original
>>>>>>>> structure of the module and conventions.
>>>>>>>>
>>>>>>>> Major changes
>>>>>>>> -------------
>>>>>>>>
>>>>>>>> * Add CRS format and update functions manipulating compressed matrices :
>>>>>>>>        - additional flag GSL_SPMATRIX_CRS and macro GSLSP_ISMATRIX (
>>>>>>>> gsl_spmatrix.h )
>>>>>>>>        - additional members innerSize and outerSize used to iterate
>>>>>>>> matrix elements ( gsl_spmatrix.h )
>>>>>>>>        - rename some variables for coherence ( gsl_spmatrix.h , *.c )
>>>>>>>>        - update all functions on compressed matrices ( *.c )
>>>>>>>> * Allow to sum duplicate elements when compressing ( spcompress.c ) :
>>>>>>>>        - modify gsl_spmatrix_compress
>>>>>>>>        - add    gsl_spmatrix_sum_duplicate
>>>>>>>> * CCS <-> CRS and fast transpose inplace in spswap.c :
>>>>>>>>        - add gsl_spmatrix_switch_major
>>>>>>>>        - add gsl_spmatrix_transpose
>>>>>>>> * Add printing and scanning functions in spio.c :
>>>>>>>>        - add gsl_spmatrix_fprintf
>>>>>>>>        - add gsl_spmatrix_fscanf
>>>>>>>> * Add manipulation functions in spmanip.c (particularly useful for
>>>>>>>> Markov chain transition matrices) :
>>>>>>>>        - add gsl_spmatrix_get_rowsum : get vector of sum over row
>>>>>>>> elements
>>>>>>>>        - add gsl_spmatrix_get_colsum : get vector of sum over column
>>>>>>>> elements
>>>>>>>>        - add gsl_spmatrix_div_rows   : divide all elements of each row
>>>>>>>> by a vector element
>>>>>>>>        - add gsl_spmatrix_div_cols   : divide all elements of each
>>>>>>>> column by a vector element
>>>>>>>> * Add test functions in atprop.c :
>>>>>>>>        - add gsl_spmatrix_gt_elements : greater than test for each
>>>>>>>> matrix
>>>>>>>> element
>>>>>>>>        - add gsl_spmatrix_ge_elements : greater or equal than test for
>>>>>>>> each matrix element
>>>>>>>>        - add gsl_spmatrix_lt_elements : lower than test for each matrix
>>>>>>>> element
>>>>>>>>        - add gsl_spmatrix_le_elements : lower or equal than test for
>>>>>>>> each matrix element
>>>>>>>>        - add gsl_spmatrix_any         : test if any non-zero element in
>>>>>>>> matrix
>>>>>>>>
>>>>>>>> Other minor changes have been made, such as error tests.
>>>>>>>> test.c has also been updated to test new features.
>>>>
>>
>>
>



-- 
Alexis Tantet


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]