CSCMatrix OpMulMatrix uses O(numRows) memory, which is too much for some applications #767
Comments
Hi @darkjh, almost every sparse matrix multiply algorithm I know of uses O(numRows) (or equivalent) temporary memory. For example, here's scipy: https://github.com/scipy/scipy/blob/f2ec91c4908f9d67b5445fbfacce7f47518b35d1/scipy/sparse/sparsetools/csr.h#L533 And here's CSparse (which powers MATLAB's sparse routines, IIRC): Can you say a bit more about your use case? I can look into doing something with blocks. |
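To make the O(numRows) temporary concrete, here is a minimal pure-Python sketch of a column-by-column CSC multiply with a dense accumulator. It is an illustration only, not Breeze's or scipy's actual code; the function name `csc_matmul` and the `(col_ptr, row_idx, values)` tuple layout are assumptions made for this example.

```python
def csc_matmul(n_rows, a, b):
    """Compute C = A * B for CSC matrices given as (col_ptr, row_idx, values).

    n_rows is the number of rows of A (and therefore of C).
    Hypothetical helper for illustration; not a Breeze or scipy API.
    """
    a_ptr, a_idx, a_val = a
    b_ptr, b_idx, b_val = b
    n_cols = len(b_ptr) - 1

    # These two arrays are the O(numRows) temporaries discussed in the thread.
    workspace = [0.0] * n_rows   # dense accumulator for one output column
    marker = [-1] * n_rows       # marker[i] == j means row i was touched in column j

    c_ptr, c_idx, c_val = [0], [], []
    for j in range(n_cols):
        touched = []
        for p in range(b_ptr[j], b_ptr[j + 1]):
            k, b_kj = b_idx[p], b_val[p]
            # Scatter column k of A, scaled by B[k, j], into the workspace.
            for q in range(a_ptr[k], a_ptr[k + 1]):
                i = a_idx[q]
                if marker[i] != j:
                    marker[i] = j
                    workspace[i] = 0.0
                    touched.append(i)
                workspace[i] += a_val[q] * b_kj
        # Gather the touched rows back into sparse form, in row order.
        for i in sorted(touched):
            c_idx.append(i)
            c_val.append(workspace[i])
        c_ptr.append(len(c_idx))
    return c_ptr, c_idx, c_val
```

The work per output column is proportional to the nonzeros involved, but the two temporaries are sized by the row count no matter how sparse the inputs are.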
@dlwh AFAIK we can only be sparse in one direction; for CSC that is the rows, not the columns. We use CSC for our ML algorithms, and each CSC matrix contains one partition of our dataset. Each column represents a feature vector, which is very sparse because we use hashing to handle all the features. |
Sorry for being so slow on this. I did look into fixing this, but it got super tricky and I abandoned it. |
@dlwh Hi, np! Can you share some insights into the potential fix? Some links? Maybe I can lend some brain power to this issue. |
The right approach to doing this (I think) is to make the buffer be min(numRows, BLOCK_SIZE) and loop over columns multiple times if necessary. Something tripped me up (it's been a while and I forget exactly what), but it was some mix of painful and nonobvious, and I put it down and never picked it back up.
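One possible reading of the blocked-buffer idea, sketched in pure Python: the accumulator holds only min(numRows, BLOCK_SIZE) slots, and each output column is processed once per row block. This is an assumption about what the suggestion means, not the abandoned Breeze patch; `csc_matmul_blocked`, `block_size`, and the `(col_ptr, row_idx, values)` layout are hypothetical names, and the sketch assumes row indices are sorted within each column of A.

```python
from bisect import bisect_left


def csc_matmul_blocked(n_rows, a, b, block_size=1024):
    """Compute C = A * B for CSC matrices using a bounded dense buffer.

    Matrices are (col_ptr, row_idx, values) triples; row indices within each
    column of A are assumed sorted. Hypothetical helper for illustration.
    """
    a_ptr, a_idx, a_val = a
    b_ptr, b_idx, b_val = b
    n_cols = len(b_ptr) - 1

    # Buffer of min(numRows, BLOCK_SIZE) slots (at least 1 to keep range() valid).
    buf_len = max(1, min(n_rows, block_size))
    workspace = [0.0] * buf_len
    occupied = [False] * buf_len

    c_ptr, c_idx, c_val = [0], [], []
    for j in range(n_cols):
        # Revisit column j of B once per row block [lo, hi).
        for lo in range(0, n_rows, buf_len):
            hi = min(lo + buf_len, n_rows)
            touched = []
            for p in range(b_ptr[j], b_ptr[j + 1]):
                k, b_kj = b_idx[p], b_val[p]
                # Binary search for the entries of A's column k inside the block.
                start = bisect_left(a_idx, lo, a_ptr[k], a_ptr[k + 1])
                stop = bisect_left(a_idx, hi, start, a_ptr[k + 1])
                for q in range(start, stop):
                    slot = a_idx[q] - lo
                    if not occupied[slot]:
                        occupied[slot] = True
                        workspace[slot] = 0.0
                        touched.append(slot)
                    workspace[slot] += a_val[q] * b_kj
            # Emit this block's rows in order and release the slots for reuse.
            for slot in sorted(touched):
                c_idx.append(lo + slot)
                c_val.append(workspace[slot])
                occupied[slot] = False
        c_ptr.append(len(c_idx))
    return c_ptr, c_idx, c_val
```

The trade-off is time for memory: every column of B is revisited once per row block, so a small `block_size` with a very large row count means many passes, even when most blocks turn out to be empty.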
|
In the v1.0 release, the OpMulMatrix impl for CSCMatrix has changed: https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/linalg/operators/CSCMatrixOps.scala#L725
In the multiplication a dense array is allocated. For a large sparse matrix (which CSC matrices are designed for), this would not work ...
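A back-of-the-envelope comparison shows why a dense per-row array hurts when the row count comes from feature hashing. The sizes below are hypothetical, not measurements from this issue, and the sketch assumes 8-byte values and 4-byte indices; the function names are made up for this example.

```python
def dense_workspace_bytes(n_rows, bytes_per_value=8):
    """Size of a dense accumulator holding one value per row."""
    return n_rows * bytes_per_value


def csc_storage_bytes(n_cols, nnz, bytes_per_value=8, bytes_per_index=4):
    """Approximate CSC storage: values, row indices, and column pointers."""
    return nnz * (bytes_per_value + bytes_per_index) + (n_cols + 1) * bytes_per_index


# Hypothetical partition: 2**30 hashed feature rows, 10,000 columns,
# 100 nonzeros per column.
n_rows, n_cols, nnz = 2**30, 10_000, 1_000_000
workspace = dense_workspace_bytes(n_rows)   # 8 GiB
matrix = csc_storage_bytes(n_cols, nnz)     # about 12 MB
```

Under these assumptions the temporary array is several hundred times larger than the matrix being multiplied.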