
Support for row- and column-major order #3

Open
@grothesque

I reference a discussion from rust-ndarray/ndarray#1272 (comment):

@grothesque wrote:

Is your choice motivated by BLAS/LAPACK being (marginally) more efficient for column-major data?

Do I understand correctly that mdarray is column major in the sense that the restricted layouts are column major? But the fully strided layout can accept any (fixed rank) strided array, right? Right now in Rust we cannot have a fully generic mdspan like in C++, but it should be possible to have a set of useful layouts for both column-major and row-major within a single library, or do you see a problem with this?

@fre-hu replied:

The choice is only to have a convention, and then column major is common for linear algebra. It is used both for memory layout and to give the order of dimensions in iteration.

Using strided layout with row major data will work, but operations that depend on iteration order will have worse access pattern. It works fine for interfacing though, and internally one could make a copy or reverse indices.

To have full support for both row and column major would require one more generic parameter for the order. I had it in an earlier version, but removed it as it made both the library and interface more complex. C++ mdspan gets around this since it is quite thin.
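To make the access-pattern point in the reply concrete, here is a minimal sketch in plain Rust. It does not use mdarray's API; the function names are hypothetical. It computes dense strides for both orders and lists the memory offsets visited when a 2-D array is traversed with the first index varying fastest, which is the column-major iteration convention.

```rust
/// Strides (in elements) of a dense column-major array:
/// the first dimension is contiguous.
fn col_major_strides(shape: &[usize]) -> Vec<usize> {
    let mut strides = vec![1; shape.len()];
    for i in 1..shape.len() {
        strides[i] = strides[i - 1] * shape[i - 1];
    }
    strides
}

/// Strides (in elements) of a dense row-major array:
/// the last dimension is contiguous.
fn row_major_strides(shape: &[usize]) -> Vec<usize> {
    let mut strides = vec![1; shape.len()];
    for i in (0..shape.len().saturating_sub(1)).rev() {
        strides[i] = strides[i + 1] * shape[i + 1];
    }
    strides
}

/// Memory offsets visited when a 2-D array is traversed with the
/// first index varying fastest (hypothetical helper for illustration).
fn offsets_first_index_fastest(shape: [usize; 2], strides: &[usize]) -> Vec<usize> {
    let mut out = Vec::with_capacity(shape[0] * shape[1]);
    for j in 0..shape[1] {
        for i in 0..shape[0] {
            out.push(i * strides[0] + j * strides[1]);
        }
    }
    out
}
```

For shape (2, 3), column-major data is visited at offsets 0, 1, 2, 3, 4, 5, while row-major data viewed through a strided layout is visited at 0, 3, 1, 4, 2, 5: correct, but no longer sequential in memory.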

From my point of view, row-major is arguably more relevant for a Rust array library than column major:

  • Rust is much more a spiritual heir to C/C++ than Fortran.
  • expr![[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]] having shape (3, 2) and not (2, 3) by default seems surprising.
  • NumPy is and likely will remain the array library to which most people are first exposed. Despite its MATLAB lineage, and consistent with the Python/C world to which it belongs, NumPy uses row-major order by default. Moreover, seamless interoperability with NumPy (at least potentially) seems like an important feature of a Rust array library. (Cf. the success of Polars.)

So, if there can be only one, I'd vote for row major 😇...

However, as far as memory layout goes, it should be possible to have both without an additional generic parameter, right? Just like C++ mdspan has layout_right, layout_left, and layout_stride.
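A rough sketch of what this could look like, assuming the array type is already generic over a layout: the three mdspan-style layouts become three types implementing one trait, so no separate "order" parameter is needed for the memory layout itself. All names here (`Layout2`, `LayoutLeft`, `LayoutRight`, `LayoutStride`, `View`) are hypothetical and restricted to rank 2 for brevity; this is not mdarray's actual design.

```rust
/// Hypothetical layout trait: maps a 2-D index to a linear offset.
trait Layout2 {
    fn shape(&self) -> [usize; 2];
    fn offset(&self, index: [usize; 2]) -> usize;
}

/// Column major (first index contiguous), like mdspan's layout_left.
struct LayoutLeft { shape: [usize; 2] }
/// Row major (last index contiguous), like mdspan's layout_right.
struct LayoutRight { shape: [usize; 2] }
/// Arbitrary strides, like mdspan's layout_stride.
struct LayoutStride { shape: [usize; 2], strides: [usize; 2] }

impl Layout2 for LayoutLeft {
    fn shape(&self) -> [usize; 2] { self.shape }
    fn offset(&self, [i, j]: [usize; 2]) -> usize { i + j * self.shape[0] }
}

impl Layout2 for LayoutRight {
    fn shape(&self) -> [usize; 2] { self.shape }
    fn offset(&self, [i, j]: [usize; 2]) -> usize { i * self.shape[1] + j }
}

impl Layout2 for LayoutStride {
    fn shape(&self) -> [usize; 2] { self.shape }
    fn offset(&self, [i, j]: [usize; 2]) -> usize {
        i * self.strides[0] + j * self.strides[1]
    }
}

/// A borrowed 2-D view, generic over element type and layout only.
struct View<'a, T, L: Layout2> {
    data: &'a [T],
    layout: L,
}

impl<'a, T, L: Layout2> View<'a, T, L> {
    /// Element access; panics if the computed offset is out of bounds.
    fn get(&self, index: [usize; 2]) -> &T {
        &self.data[self.layout.offset(index)]
    }
}
```

With the same six-element buffer, a `LayoutRight` view of shape (2, 3) reads element (1, 0) from offset 3, while a `LayoutLeft` view reads it from offset 1. The sketch only addresses indexing; it says nothing about which iteration order is efficient.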

The problem seems to be more about ensuring efficient order of dimensions when iterating. One possibility would be to have both ("iterate_left_to_right", and "iterate_right_to_left"), and then only one (or none) would be efficient for a given array.

To treat the general case efficiently, there could be a function to (statically or dynamically) reorder dimensions into either layout (if possible).
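The dynamic variant of such a function could be sketched as follows. The idea is to sort the dimensions by stride and check that the result is dense; if it is, the permutation turns the array into a column-major one, and reversing it gives a row-major one. The function names are hypothetical, and the check is deliberately simplified: it assumes positive strides and treats size-1 dimensions like any other, so it may reject some arrays that a more careful implementation would accept.

```rust
/// Returns a permutation `p` of the dimensions such that, visited in the
/// order `p[0], p[1], ...`, the array is dense column major, or `None`
/// if no such reordering exists (under the simplified check used here).
fn permutation_to_col_major(shape: &[usize], strides: &[usize]) -> Option<Vec<usize>> {
    assert_eq!(shape.len(), strides.len());
    // Sort dimension indices by increasing stride.
    let mut perm: Vec<usize> = (0..shape.len()).collect();
    perm.sort_by_key(|&d| strides[d]);
    // A dense layout has stride 1 first, then the running product of sizes.
    let mut expected = 1;
    for &d in &perm {
        if strides[d] != expected {
            return None;
        }
        expected *= shape[d];
    }
    Some(perm)
}

/// Same as above, but for dense row major: the slowest-varying
/// dimension comes first.
fn permutation_to_row_major(shape: &[usize], strides: &[usize]) -> Option<Vec<usize>> {
    permutation_to_col_major(shape, strides).map(|mut p| {
        p.reverse();
        p
    })
}
```

For a row-major array of shape (2, 3) with strides (3, 1), the column-major permutation swaps the two dimensions and the row-major permutation is the identity. A view with strides (6, 2) has gaps between elements, so both functions return `None`.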

All of this would not require an additional generic parameter (I believe).
