(No, we're not really a corporation. I couldn't get the @tatami or @tatami-galaxy handles because they were already taken.)
tatami is a C++ library that implements read access to a variety of matrix representations through a common interface. Developers can use this interface to create applications that can extract data from diverse matrix representations, whether it be dense or sparse, row-major or column-major, in-memory or file-backed, with or without delayed operations, and so on. It was initially developed to support scalable analyses of genomics datasets, where the matrix representation can be easily substituted depending on the circumstances. tatami-powered applications can thus be used to process small in-memory matrices on a laptop, or large file-backed matrices on a high performance compute node.
Matrix representations currently supported by tatami include:
- Dense row/column major matrices, with user-defined storage modes and containers.
- Compressed sparse row/column matrices, with user-defined storage modes and containers.
- Matrices generated by delayed operations, inspired by those in DelayedArray.
- Matrices backed by HDF5 files, either as dense datasets or in a compressed sparse format.
- Matrices backed by dense or sparse TileDB arrays.
- Wrappers around R matrices that cannot be transformed into the above types, for use within R packages.
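To make the "common interface" idea concrete, here is a minimal, self-contained sketch of how dense, compressed sparse row, and delayed representations can sit behind one read-only interface. This is not tatami's actual API: `ToyMatrix`, `ToyDenseMatrix`, `ToyCsrMatrix`, `ToyDelayedAdd`, `fetch_row()` and `row_sums()` are all hypothetical names invented for illustration. The real interface is `tatami::Matrix`, documented in the repository.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical read-only interface (NOT tatami::Matrix, just an illustration).
class ToyMatrix {
public:
    virtual ~ToyMatrix() = default;
    virtual std::size_t nrow() const = 0;
    virtual std::size_t ncol() const = 0;
    // Writes row `r` into `buffer`, which must hold at least ncol() values.
    virtual void fetch_row(std::size_t r, double* buffer) const = 0;
};

// Dense row-major storage.
class ToyDenseMatrix final : public ToyMatrix {
public:
    ToyDenseMatrix(std::size_t nr, std::size_t nc, std::vector<double> values)
        : nr_(nr), nc_(nc), values_(std::move(values)) {}
    std::size_t nrow() const override { return nr_; }
    std::size_t ncol() const override { return nc_; }
    void fetch_row(std::size_t r, double* buffer) const override {
        assert(r < nr_);
        std::copy_n(values_.begin() + r * nc_, nc_, buffer);
    }
private:
    std::size_t nr_, nc_;
    std::vector<double> values_;
};

// Compressed sparse row storage: only non-zero values are kept.
class ToyCsrMatrix final : public ToyMatrix {
public:
    ToyCsrMatrix(std::size_t nr, std::size_t nc, std::vector<double> values,
                 std::vector<std::size_t> indices, std::vector<std::size_t> pointers)
        : nr_(nr), nc_(nc), values_(std::move(values)),
          indices_(std::move(indices)), pointers_(std::move(pointers)) {}
    std::size_t nrow() const override { return nr_; }
    std::size_t ncol() const override { return nc_; }
    void fetch_row(std::size_t r, double* buffer) const override {
        assert(r < nr_);
        std::fill_n(buffer, nc_, 0.0);  // zeros are implicit in the sparse format
        for (std::size_t i = pointers_[r]; i < pointers_[r + 1]; ++i) {
            buffer[indices_[i]] = values_[i];
        }
    }
private:
    std::size_t nr_, nc_;
    std::vector<double> values_;
    std::vector<std::size_t> indices_, pointers_;
};

// Delayed operation: the scalar is added only when a row is requested,
// so the underlying matrix is never copied or modified.
class ToyDelayedAdd final : public ToyMatrix {
public:
    ToyDelayedAdd(std::shared_ptr<const ToyMatrix> base, double scalar)
        : base_(std::move(base)), scalar_(scalar) {}
    std::size_t nrow() const override { return base_->nrow(); }
    std::size_t ncol() const override { return base_->ncol(); }
    void fetch_row(std::size_t r, double* buffer) const override {
        base_->fetch_row(r, buffer);
        for (std::size_t c = 0; c < ncol(); ++c) buffer[c] += scalar_;
    }
private:
    std::shared_ptr<const ToyMatrix> base_;
    double scalar_;
};

// Application code sees only the interface, never the storage details.
std::vector<double> row_sums(const ToyMatrix& mat) {
    std::vector<double> sums(mat.nrow()), buffer(mat.ncol());
    for (std::size_t r = 0; r < mat.nrow(); ++r) {
        mat.fetch_row(r, buffer.data());
        sums[r] = std::accumulate(buffer.begin(), buffer.end(), 0.0);
    }
    return sums;
}
```

Because `row_sums()` only talks to the interface, the same function works unchanged whether it is handed a dense matrix, a sparse matrix, or a delayed wrapper around either one. That substitution is the property the list above describes.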
tatami is implemented as a header-only C++ library and can be dropped directly into an existing C++ project, either manually or via CMake.
Check out the repository for more details on the tatami::Matrix interface.
tatami extensions implement more complex matrix representations and can be included by applications on an as-needed basis. As these often depend on additional third-party libraries, they are not included as part of the core library.
- tatami_hdf5 provides representations for HDF5-backed matrices.
- tatami_tiledb provides representations for TileDB-backed matrices.
- tatami_r provides representations for unknown R matrices.
- tatami_mtx enables loading of Matrix Market files into tatami matrices.
The beachmat R package (Bioconductor, GitHub) vendors the tatami library,
allowing R package developers to compile their C++ code against the tatami interface.
It also implements the initializeCpp() generic that maps an abstract R matrix to a suitable tatami representation for immediate use by the package's C++ code.
beachmat has several extension packages that mirror those of tatami.
These can be imported by package developers or users on an as-needed basis, assuming that the additional dependencies are acceptable.
- beachmat.hdf5 vendors tatami_hdf5 and implements initializeCpp() methods for HDF5Array classes.
- beachmat.tiledb vendors tatami_tiledb and implements initializeCpp() methods for the TileDBArray class.
The mattress package vendors the tatami library,
allowing Python package developers to... well, see the beachmat description above.
Like beachmat, mattress implements the initialize() generic function that maps various Python matrices into a suitable C++ representation.
The scran.js JavaScript package compiles tatami to WebAssembly to enable single-cell analyses in the browser via kana.
The SingleR R package uses beachmat to implement matrix representations for the associated C++ cell type annotation library.
The scranpy Python package uses mattress to perform single-cell analyses based on the libscran libraries.
Got a use case that you'd like to advertise? Make a PR and add it here!
Hopefully this will be a bit less lonely soon.
Lun ATL, Pagès H, Smith ML (2018). "beachmat: A Bioconductor C++ API for accessing high-throughput biological data from a variety of R matrix types." PLoS Comput. Biol., 14(5), e1006135. doi:10.1371/journal.pcbi.1006135