This is the mail archive of the gsl-discuss@sources.redhat.com mailing list for the GSL project.


vector-matrix dot product


On most modern serial computers, the typical cache line is 128 bits --
four 32-bit single-precision or two 64-bit double-precision
floating-point numbers.  When the matrix-vector product

        Ax

where A is an m by n matrix object and x is an n-element column vector,
is implemented in Fortran, striding between the elements in a row of
A may result in a cache miss on every access if A is a very large
matrix: Fortran stores matrices by columns, so consecutive elements
of a row lie a full column length apart in memory.
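
To make the stride concrete, here is a minimal sketch in plain C;
since C stores matrices by rows, the same problem shows up when the
inner loop walks down a column, which is the mirror image of the
Fortran case above.  The names m, n, A, x and y are illustrative
only, not taken from any library.

#include <stddef.h>

/* y = A^T x (equivalently y = xA) for a row-major m-by-n matrix A.
 * The inner loop steps through A with a stride of n doubles, so for
 * large n each access may touch a different cache line. */
void atx_strided (size_t m, size_t n, const double *A,
                  const double *x, double *y)
{
  size_t i, j;
  for (j = 0; j < n; j++)
    {
      double sum = 0.0;
      for (i = 0; i < m; i++)
        sum += A[i * n + j] * x[i];   /* stride n: likely cache miss */
      y[j] = sum;
    }
}
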
In this case, it might be a good idea to store the transpose of matrix A

        B = A^T

in memory instead of matrix A and compute the matrix-vector dot product

        B^Tx

instead of the matrix-vector product.  A similar problem occurs when
the vector-matrix product xA is implemented in C, which stores
matrices by rows.  In that case it might be better to store the
transpose of matrix A

        B = A^T

in memory instead of matrix A and compute the vector-matrix dot product

        xB^T 

instead of the vector-matrix product xA.
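
As a minimal sketch of that remedy in plain C with row-major storage
(the names are illustrative, not part of the GSL API): store B = A^T
and form each element of xB^T as a dot product of x with a contiguous
row of B, so the inner loop runs with unit stride on both operands.

#include <stddef.h>

/* y = xA computed from the stored transpose B = A^T (B is n-by-m,
 * row-major):  y[j] = sum_i x[i] A[i][j] = sum_i B[j][i] x[i].
 * Row j of B is contiguous, so both operands are read with unit
 * stride. */
void xa_via_transpose (size_t m, size_t n, const double *B,
                       const double *x, double *y)
{
  size_t i, j;
  for (j = 0; j < n; j++)
    {
      const double *Bj = B + j * m;   /* contiguous row j of B */
      double sum = 0.0;
      for (i = 0; i < m; i++)
        sum += Bj[i] * x[i];          /* unit stride */
      y[j] = sum;
    }
}

In practice one would usually leave the loop ordering to the BLAS
layer (e.g. gsl_blas_dgemv, which accepts a transpose argument)
rather than write the loops by hand; the sketch is only meant to show
where the unit stride comes from.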

Similarly, the matrix-matrix dot product

        A^TX

in Fortran or

        XA^T

in C has a slight advantage over the matrix-matrix product.
Block algorithms can be applied to the matrix-matrix dot product
just as they are applied to the matrix-matrix product.
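
The same idea as a plain C sketch (names illustrative): the
matrix-matrix dot product C = XA^T reads both operands along
contiguous rows, and its three loops can be tiled into blocks exactly
as for the ordinary product.

#include <stddef.h>

/* C = X A^T with row-major storage: X is m-by-p, A is n-by-p and
 * C is m-by-n.  Each inner product runs over row i of X and row j
 * of A, both contiguous, so every operand is read with unit stride. */
void xat (size_t m, size_t n, size_t p,
          const double *X, const double *A, double *C)
{
  size_t i, j, l;
  for (i = 0; i < m; i++)
    {
      const double *Xi = X + i * p;       /* row i of X */
      for (j = 0; j < n; j++)
        {
          const double *Aj = A + j * p;   /* row j of A */
          double sum = 0.0;
          for (l = 0; l < p; l++)
            sum += Xi[l] * Aj[l];
          C[i * n + j] = sum;
        }
    }
}

A blocked version would partition i, j and l into tiles chosen so
that the rows being reused stay resident in cache; the loop structure
itself is unchanged.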
