Sparse matrices #511

Jake-Moss · 2024-02-23T06:03:34Z

I've started looking into the sparse matrix options and thought it best to begin grouping some notes and work related to them.

Format-wise we have a few options:

CSR, compressed sparse row
Stores 3 vectors: the values, the column index of each value, and the row pointers (offsets to where each row starts). Fast row access, slow column access. Fast matrix multiplication.
Not great for incremental construction.
Block encoding is possible
CSC, compressed sparse column
Same as CSR but for fast column access
COO, list of coordinates
Solid for incremental construction, single element per entry. Random access is fastest when sorted by row then col index.
Horrible for just about anything else
LIL, list of lists
Good for incremental construction, builds whole rows at a time.
Couple of feasible implementations, could be:
- vector of pointers to vectors
  Gives expandable and variably sized rows, and rows are nullable, but slicing column-wise (or row-wise, if storing columns) is horrible for spatial locality. Row (or col) slices, though, are very good.
  Requires that all row (or col) pointers are allocated, not great for huge matrices
- offset encoded
  Instead of storing a pointer for every row, we can store an array of offsets recording where each non-empty row's pointer is. Much sparser with fast row slicing, but lookups for rows would require bisection
DOK, dictionary of keys
Good for incremental construction, builds single elements at a time
Typical implementation would be a hash map of (row, col) -> value. Great for very sparse matrices, or matrices with small groupings of elements
- Block encoding
  It's possible to encode small, dense blocks of the matrix. Say there's a 1k * 1k sparse float32 matrix: at 4 bytes per entry and a cache line size of 64 bytes, you could store a 4x4 dense block in a single cache line. This is really useful for storing small dense clusters of values.
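
To make the CSR and COO layouts above concrete, here is a minimal pure-Python sketch (not the implementation used in this PR) that turns COO triplets into the three CSR vectors and then slices out a row. The function names `coo_to_csr` and `csr_row` are hypothetical, chosen only for illustration.

```python
def coo_to_csr(n_rows, entries):
    """Convert (row, col, value) triplets into the three CSR vectors."""
    # Sort by row then col so that each row's entries are contiguous.
    entries = sorted(entries)
    data = [v for _, _, v in entries]
    indices = [c for _, c, _ in entries]
    # Count entries per row, then take a running sum so that
    # indptr[i]:indptr[i + 1] is the slice of data/indices for row i.
    indptr = [0] * (n_rows + 1)
    for r, _, _ in entries:
        indptr[r + 1] += 1
    for i in range(n_rows):
        indptr[i + 1] += indptr[i]
    return data, indices, indptr


def csr_row(data, indices, indptr, i):
    """Return row i as a {col: value} dict; this is the fast access path."""
    start, stop = indptr[i], indptr[i + 1]
    return dict(zip(indices[start:stop], data[start:stop]))


# COO entries can arrive in any order, which is why COO suits incremental construction.
entries = [(2, 1, 5.0), (0, 0, 1.0), (0, 3, 2.0), (2, 2, 6.0)]
data, indices, indptr = coo_to_csr(3, entries)
row_two = csr_row(data, indices, indptr, 2)
```

Row 1 has no entries, so its slice in `indptr` is empty. A column lookup, by contrast, would have to scan every row, which is the CSR trade-off noted above.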

All these formats are implementable ourselves; I don't believe we need to pull in anything special to construct or manipulate them. Provided our formatting is standard and correct, conversion to a scipy.sparse matrix is trivial. Storage using OMX should also be fine.
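
As a rough illustration of why the conversion is cheap, the sketch below hands three hand-built arrays in standard CSR layout to `scipy.sparse.csr_matrix`. The array contents and dtypes are made up for the example and are not taken from AequilibraE.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Three arrays in standard CSR layout for a 3 x 4 matrix (illustrative values).
data = np.array([1.0, 2.0, 5.0, 6.0], dtype=np.float32)
indices = np.array([0, 3, 1, 2], dtype=np.int32)  # column of each value
indptr = np.array([0, 2, 2, 4], dtype=np.int32)   # row i is data[indptr[i]:indptr[i + 1]]

# scipy accepts the (data, indices, indptr) triple directly.
mat = csr_matrix((data, indices, indptr), shape=(3, 4))
dense = mat.toarray()
```

The same pattern works for the other formats, for example `coo_matrix((data, (row, col)), shape=...)` for coordinate triplets.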

Our choice (or rather choices) then really comes down to the manner in which we wish to construct the select link results. @pedrocamargo do you think the existing select link results would be similar to the ones we'll be producing with the planned system? I want to take a look at the general order in which we construct the matrix, and what the end result looks like.

jamiecook · 2024-02-25T00:29:33Z

@Jake-Moss - would it not be possible to just use scipy.sparse format directly and avoid defining our own format/implementation?

Jake-Moss · 2024-02-26T05:04:31Z

> @Jake-Moss - would it not be possible to just use scipy.sparse format directly and avoid defining our own format/implementation?

The scipy sparse module is really just a small wrapper around 3 numpy arrays; the classes and such just provide convenient access to these formats. I don't plan on using our own format, but rather on implementing a similar wrapper that we can access without the GIL, while encapsulating all the logic for disk reading/writing/indexing, and allowing us to later extend the AequilibraE matrix class with sparse support. I want to do it in a way that is completely compatible with the scipy matrices as well; they even provide methods to construct a sparse object given the 3 arrays.
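
A minimal sketch of what such a scipy-compatible wrapper could look like, in plain Python rather than the GIL-free code described above. The class `CooArrays` and its methods are hypothetical names for illustration only; disk I/O and indexing are left out.

```python
from dataclasses import dataclass

import numpy as np
from scipy.sparse import coo_matrix


@dataclass
class CooArrays:
    """Hypothetical thin wrapper: just three arrays plus a shape."""

    row: np.ndarray
    col: np.ndarray
    data: np.ndarray
    shape: tuple

    def to_scipy(self):
        # scipy builds a COO matrix directly from (data, (row, col)).
        return coo_matrix((self.data, (self.row, self.col)), shape=self.shape)

    @classmethod
    def from_scipy(cls, mat):
        # Any scipy sparse matrix can be converted to COO and unpacked.
        coo = mat.tocoo()
        return cls(coo.row, coo.col, coo.data, coo.shape)


wrapped = CooArrays(
    row=np.array([0, 2]),
    col=np.array([1, 3]),
    data=np.array([1.5, 2.5]),
    shape=(3, 4),
)
as_scipy = wrapped.to_scipy()
round_trip = CooArrays.from_scipy(as_scipy.tocsr())
```

Because the wrapper only holds arrays in the standard layout, moving to and from scipy is a matter of passing them through, which is the compatibility goal described above.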

All the formats I mentioned above are implemented by scipy as well

Jake-Moss · 2024-02-27T03:22:33Z

Whoops I didn't think renaming a branch would close the PR. We can live with the typo for now

Jake-Moss · 2024-02-27T06:57:07Z

Something that didn't occur to me originally is that we require thread-safe construction. I think this can be resolved by simply treating the matrix as a set of sparse rows, potentially empty, during construction, assuming we parallelise over origins. I don't believe this to be a big issue, but it limits how much I can develop without seeing concrete usage.
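
The row-per-origin idea can be sketched as follows, assuming one task per origin and that each task writes only to its own row slot. This is a pure-Python illustration with `ThreadPoolExecutor`, not the GIL-free implementation; `build_row` is a hypothetical stand-in for the real per-origin work.

```python
from concurrent.futures import ThreadPoolExecutor


def build_row(origin, n_cols):
    # Hypothetical stand-in for per-origin results: {col: value}, possibly empty.
    return {(origin * 2 + k) % n_cols: float(origin + 1) for k in range(origin % 3)}


def build_rows_parallel(n_rows, n_cols, workers=4):
    rows = [None] * n_rows  # one slot per origin; None marks an empty row

    def task(origin):
        row = build_row(origin, n_cols)
        # Each task touches only rows[origin], so no lock is needed here.
        rows[origin] = row or None

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(task, range(n_rows)))
    return rows


def rows_to_csr(rows):
    """Stitch the finished rows into CSR vectors once all threads are done."""
    data, indices, indptr = [], [], [0]
    for row in rows:
        if row:
            for col in sorted(row):
                indices.append(col)
                data.append(row[col])
        indptr.append(len(indices))
    return data, indices, indptr


rows = build_rows_parallel(n_rows=4, n_cols=5)
data, indices, indptr = rows_to_csr(rows)
```

The only serial step is the final stitch into CSR, which runs after all origins have finished.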

Jake-Moss · 2024-04-30T07:44:24Z

Superseded by #515. Work was copied over

A change to pr

156454b

Jake-Moss deleted the branch AequilibraE:sprase_matrices February 27, 2024 03:18

Jake-Moss closed this Feb 27, 2024

Jake-Moss restored the sprase_matrices branch February 27, 2024 03:22

Jake-Moss reopened this Feb 27, 2024

Jake-Moss closed this Apr 30, 2024
