I've started looking into the sparse matrix options and thought it best to begin grouping some notes and work related to it.
Format-wise, we have a few options:
CSR, compressed sparse row
Stores three vectors: values, column indices, and row pointers (offsets marking where each row starts in the other two). Fast row access, but not column access. Fast matrix multiplication.
Not great for incremental construction.
Block encoding is possible
CSC, compressed sparse column
Same as CSR but for fast column access
COO, list of coordinates
Solid for incremental construction, single element per entry. Random access is fastest when sorted by row then col index.
Horrible for just about anything else
LIL, list of lists
Good for incremental construction, builds whole rows at a time.
Couple of feasible implementations, could be:
Gives expandable, variably sized rows, and rows are nullable, but slicing column-wise (or row-wise, if columns are stored) is horrible for spatial locality. Row (or column) slices, though, are very good.
Requires that all rows (or columns) are allocated, which is not great for huge matrices.
Instead of storing all rows, we can store an array of offsets recording where each row's pointer is. Much sparser, with fast row slicing, but row lookups would require bisection.
DOK, dictionary of keys
Good for incremental construction, builds single elements at a time
A typical implementation would use a hash map of (row, col) -> value. Great for very sparse matrices, or matrices with little grouping of elements.
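To make the offset-based LIL idea above a bit more concrete, here is a rough sketch in pure Python. The class name `SparseRowLil` and its methods are made up for illustration, and it assumes rows are the stored axis; it is not a proposal for the final layout. Only rows that hold at least one entry are allocated, and finding a row is a bisection over the sorted row ids.

```python
from bisect import bisect_left


class SparseRowLil:
    """Hypothetical LIL variant that only allocates non-empty rows."""

    def __init__(self):
        self.row_ids = []  # sorted ids of the rows that exist
        self.rows = []     # rows[i] holds (col, value) pairs for row_ids[i], sorted by col

    def _find_row(self, row):
        # Bisection over the sorted row ids.
        pos = bisect_left(self.row_ids, row)
        found = pos < len(self.row_ids) and self.row_ids[pos] == row
        return pos, found

    def set(self, row, col, value):
        pos, found = self._find_row(row)
        if not found:
            # Allocate the row lazily, keeping row_ids sorted.
            self.row_ids.insert(pos, row)
            self.rows.insert(pos, [])
        entries = self.rows[pos]
        cols = [c for c, _ in entries]
        i = bisect_left(cols, col)
        if i < len(entries) and entries[i][0] == col:
            entries[i] = (col, value)       # overwrite an existing element
        else:
            entries.insert(i, (col, value))  # insert, keeping columns sorted

    def get_row(self, row):
        pos, found = self._find_row(row)
        return list(self.rows[pos]) if found else []


m = SparseRowLil()
m.set(900, 3, 1.5)
m.set(2, 7, 0.25)
m.set(900, 1, 4.0)
row_900 = m.get_row(900)  # whole-row slice, contiguous in memory
```

Row slices stay cheap because each row is one list; a column slice would still have to visit every stored row.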
It's possible to encode small, dense blocks of the matrix. Say there's a 1k * 1k sparse float32 matrix, so that's 4 bytes per entry. With a cache line size of 64 bytes, you could store a 4x4 dense block (16 entries * 4 bytes = 64 bytes) in a single cache line. This is really useful for storing small dense clusters of values.
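As a rough illustration of the block idea, the sketch below keys a dictionary by block coordinates and stores each 4x4 block as 16 contiguous float32 values. `BlockedDok` and `BLOCK` are hypothetical names, and Python's `array` module does not guarantee cache-line alignment, so this only shows the indexing arithmetic, not the actual memory behaviour.

```python
from array import array

BLOCK = 4  # 4x4 float32 values = 16 * 4 bytes = 64 bytes per block


class BlockedDok:
    """Hypothetical dictionary of dense 4x4 blocks keyed by (block_row, block_col)."""

    def __init__(self):
        self.blocks = {}

    def set(self, row, col, value):
        key = (row // BLOCK, col // BLOCK)
        block = self.blocks.get(key)
        if block is None:
            # Allocate a zero-filled block the first time it is touched.
            block = array("f", [0.0] * (BLOCK * BLOCK))
            self.blocks[key] = block
        # Row-major position inside the block.
        block[(row % BLOCK) * BLOCK + (col % BLOCK)] = value

    def get(self, row, col):
        block = self.blocks.get((row // BLOCK, col // BLOCK))
        if block is None:
            return 0.0
        return block[(row % BLOCK) * BLOCK + (col % BLOCK)]


m = BlockedDok()
m.set(5, 6, 2.5)
m.set(4, 7, 1.0)  # lands in the same block as (5, 6)
m.set(100, 0, 3.0)
```

Neighbouring elements share one allocation, so a small dense cluster costs one dictionary entry rather than one per element.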
We can implement all of these formats ourselves; I don't believe we need to pull in anything special to construct or manipulate them. Provided our formatting is standard and correct, conversion to a scipy.sparse matrix is trivial. Storage using omx should also be fine.
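As a sketch of what "standard and correct" could look like, the snippet below builds a matrix incrementally as a DOK and then converts it to the three CSR arrays. The function names `dok_to_csr` and `csr_row` are made up for illustration. The output uses the `(data, indices, indptr)` layout, which is the triple that `scipy.sparse.csr_matrix((data, indices, indptr), shape=(M, N))` accepts; scipy itself is deliberately not imported here.

```python
def dok_to_csr(dok, n_rows):
    """Convert {(row, col): value} into CSR arrays (data, indices, indptr)."""
    data, indices = [], []
    indptr = [0] * (n_rows + 1)
    # Sorting the keys gives row-major order, with columns sorted within each row.
    for (row, col), value in sorted(dok.items()):
        data.append(value)
        indices.append(col)
        indptr[row + 1] += 1  # count the entries in each row
    # A prefix sum turns the per-row counts into row start offsets.
    for r in range(n_rows):
        indptr[r + 1] += indptr[r]
    return data, indices, indptr


def csr_row(data, indices, indptr, row):
    """Return the (col, value) pairs of one row; a single contiguous slice."""
    start, end = indptr[row], indptr[row + 1]
    return list(zip(indices[start:end], data[start:end]))


# Incremental construction, one element at a time, in arbitrary order.
dok = {}
dok[(2, 1)] = 5.0
dok[(0, 3)] = 1.0
dok[(0, 0)] = 2.0

data, indices, indptr = dok_to_csr(dok, n_rows=3)
first_row = csr_row(data, indices, indptr, 0)
```

The same sorted pass would work for a COO list, so either construction format can feed the final CSR result.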
Our choice (or rather choices) then really comes down to the manner in which we wish to construct the select link results. @pedrocamargo do you think the existing select link results would be similar to the ones we'll be producing with the planned system? I want to take a look at the general order in which we construct the matrix, and what the end result looks like.