
Sparse matrices #511

Closed

@Jake-Moss Jake-Moss commented Feb 23, 2024

I've started looking into the sparse matrix options and thought it best to begin grouping some notes and work related to it.

Format wise we have a couple options

  • CSR, compressed sparse row
    Stores 3 vectors: values, column indices, and row pointers. Fast row access, slow column access. Fast matrix multiplication.
    Not great for incremental construction.
    Block encoding is possible

  • CSC, compressed sparse column
    Same as CSR but for fast column access

  • COO, list of coordinates
    Solid for incremental construction, one element per entry. Random access is fastest when sorted by row then column index.
    Horrible for just about anything else

  • LIL, list of lists
    Good for incremental construction, builds whole rows at a time.
    A couple of feasible implementations:

    • vector of pointers to vectors
      Gives expandable and variably sized rows; rows are nullable, but slicing column (or row) wise is horrible for spatial locality. Row (or column) slices, though, are very good.
      Requires that all rows (or columns) are allocated, not great for huge matrices
    • offset encoded
      Instead of storing all rows, we can store an array of offsets indicating where each stored row's pointer lives. Much sparser, with fast row slicing, but row lookups require bisection
  • DOK, dictionary of keys
    Good for incremental construction, builds single elements at a time
    Typical implementation would be a hash map of (row, col) -> value; great for very sparse matrices, or matrices with small groupings of elements

    • Block encoding
      It's possible to encode small, dense blocks of the matrix. Say there's a 1k x 1k sparse float32 matrix: that's 4 bytes per entry, so with a cache-line size of 64 bytes you could store a 4x4 dense block in a single cache line. This is really useful for storing small dense clusters of values.
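As a minimal sketch of the DOK idea above, incremental single-element construction can be done with a plain Python dict keyed by (row, col); the `add_entry` helper here is hypothetical, not part of any existing codebase:

```python
# Hypothetical sketch: DOK-style incremental construction with a plain dict.
# Keys are (row, col) tuples; values accumulate, which suits results where
# the same cell may be touched repeatedly.
dok = {}

def add_entry(dok, row, col, value):
    # Accumulate into the (row, col) cell, creating it on first touch.
    dok[(row, col)] = dok.get((row, col), 0.0) + value

add_entry(dok, 0, 2, 1.5)
add_entry(dok, 0, 2, 0.5)  # duplicate coordinate accumulates
add_entry(dok, 3, 1, 2.0)
```

A real implementation would likely live behind a nogil-friendly structure rather than a Python dict, but the access pattern is the same.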

All these formats are implementable ourselves; I don't believe we need to pull in anything special to construct or manipulate them. Provided our formatting is standard and correct, conversion to a scipy.sparse matrix is trivial. Storage using omx should also be fine.
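To illustrate how trivial the scipy conversion is: COO triplets accumulated during construction (illustrative values only) can be handed straight to `scipy.sparse`, which also sums duplicate coordinates when converting to CSR:

```python
import numpy as np
from scipy import sparse

# COO triplets accumulated during construction (illustrative values)
rows = np.array([0, 0, 3])
cols = np.array([2, 2, 1])
vals = np.array([1.5, 0.5, 2.0])

# scipy sums duplicate (row, col) entries on COO -> CSR conversion
coo = sparse.coo_matrix((vals, (rows, cols)), shape=(4, 4))
csr = coo.tocsr()
```
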

Our choice (or rather choices) then really comes down to the manner in which we wish to construct the select link results. @pedrocamargo do you think the existing select link results would be similar to the ones we'll be producing with the planned system? I want to take a look at the general order in which we construct the matrix, and what the end result looks like.

@jamiecook

@Jake-Moss - would it not be possible to just use scipy.sparse format directly and avoid defining our own format/implementation?

@Jake-Moss

@Jake-Moss - would it not be possible to just use scipy.sparse format directly and avoid defining our own format/implementation?

The scipy sparse module is really just a small wrapper around 3 numpy arrays; the classes just provide convenient access to these formats. I don't plan on inventing our own format, but rather implementing a similar wrapper that we can access without the GIL, while encapsulating all the logic for disk reading/writing/indexing, and allowing us to later extend the AequilibraE matrix class with sparse support. I want to do it in a way that is completely compatible with the scipy matrices as well; they even provide methods to construct a sparse object given the 3 arrays.
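For reference, the "3 arrays" construction mentioned above looks like this: a `scipy.sparse.csr_matrix` can be built directly from the values, column-indices, and row-pointer arrays (values here are illustrative):

```python
import numpy as np
from scipy import sparse

# The three CSR arrays: values, column indices, and row pointers (indptr).
data = np.array([1.0, 2.0, 3.0])
indices = np.array([0, 2, 1])
indptr = np.array([0, 2, 2, 3])  # row 0 has 2 entries, row 1 none, row 2 one

m = sparse.csr_matrix((data, indices, indptr), shape=(3, 3))
```

So any wrapper that keeps its three arrays in standard CSR layout can hand them to scipy at zero cost.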

All the formats I mentioned above are implemented by scipy as well.

@Jake-Moss Jake-Moss deleted the branch AequilibraE:sprase_matrices February 27, 2024 03:18
@Jake-Moss Jake-Moss closed this Feb 27, 2024
@Jake-Moss

Whoops, I didn't think renaming a branch would close the PR. We can live with the typo for now.

@Jake-Moss Jake-Moss restored the sprase_matrices branch February 27, 2024 03:22
@Jake-Moss Jake-Moss reopened this Feb 27, 2024
@Jake-Moss

Something that didn't occur to me originally is that we require thread-safe construction. I think this can be resolved by simply treating the matrix as a collection of sparse rows, potentially empty, during construction, assuming we parallelise over origins. I don't believe this is a big issue, but it does limit how much I can develop without seeing concrete usage.
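A minimal sketch of the per-origin idea: if each worker owns exactly one origin's row, rows can be built in isolation with no locking and stacked at the end. The `build_row` helper and its contents are entirely hypothetical placeholders for the real assignment/select-link computation:

```python
import numpy as np
from scipy import sparse

def build_row(origin, n_dest):
    # Hypothetical placeholder: real code would fill this row from the
    # per-origin select link computation. Each call touches only its own
    # data, so calls are trivially safe to run on separate threads.
    cols = np.array([origin % n_dest])
    vals = np.array([1.0])
    row_idx = np.zeros(1, dtype=int)
    return sparse.csr_matrix((vals, (row_idx, cols)), shape=(1, n_dest))

n = 4
rows = [build_row(o, n) for o in range(n)]  # each iteration parallelisable
result = sparse.vstack(rows, format="csr")
```

The merge step (`vstack`) is the only serial part, and it is cheap because the per-row arrays are simply concatenated.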

@Jake-Moss

Superseded by #515. Work was copied over.

@Jake-Moss Jake-Moss closed this Apr 30, 2024