Daf.Queries.QueryColumns
@@ -3991,6 +4005,12 @@
diff --git a/docs/v0.1.0/.documenter-siteinfo.json b/docs/v0.1.0/.documenter-siteinfo.json
index f84bbf3..fc8e8bd 100644
--- a/docs/v0.1.0/.documenter-siteinfo.json
+++ b/docs/v0.1.0/.documenter-siteinfo.json
@@ -1 +1 @@
-{"documenter":{"julia_version":"1.10.4","generation_timestamp":"2024-06-15T18:38:38","documenter_version":"1.4.1"}}
\ No newline at end of file
+{"documenter":{"julia_version":"1.10.4","generation_timestamp":"2024-06-15T21:49:45","documenter_version":"1.4.1"}}
\ No newline at end of file
diff --git a/docs/v0.1.0/index.html b/docs/v0.1.0/index.html
index c75c04a..e1279c9 100644
--- a/docs/v0.1.0/index.html
+++ b/docs/v0.1.0/index.html
@@ -1118,6 +1118,12 @@
Daf.Queries.FrameColumns
+
+
+Daf.Queries.GroupBy
diff --git a/docs/v0.1.0/objects.inv b/docs/v0.1.0/objects.inv
index 110c185..1a2bd17 100644
Binary files a/docs/v0.1.0/objects.inv and b/docs/v0.1.0/objects.inv differ
diff --git a/docs/v0.1.0/queries.html b/docs/v0.1.0/queries.html
index f83ec7d..393256e 100644
--- a/docs/v0.1.0/queries.html
+++ b/docs/v0.1.0/queries.html
@@ -603,7 +603,7 @@ get_frame(
daf::DafReader,
axis::QueryString,
- [columns::Maybe{Union{AbstractVector{<:AbstractString}, QueryColumns}} = nothing;
+ [columns::Maybe{FrameColumns} = nothing;
cache::Bool = true]
)::DataFrame end
@@ -628,17 +628,7 @@ columns
is not specified, the data frame will contain all the vector properties of the axis, in alphabetical order (since
DataFrame
-
has no concept of named rows, the 1st column will contain the name of the axis entry). Otherwise,
-columns
-
may be a vector of names of vector properties (e.g.,
-["batch", "age"]
-
), or a
-
-QueryColumns
-
- vector of pairs mapping a column name to a query that fetches a value for each entry of the
-axis
-
.
+ has no concept of named rows, the 1st column will contain the name of the axis entry).
By default, this will cache results of all queries. This may consume a large amount of memory. You can disable it by specifying
Specify the columns of a data frame. This can be either a vector of names of vector properties (e.g.,
+cache = false
@@ -655,6 +645,30 @@
Daf.Queries.FrameColumns
+
+ —
+Type
+
+["batch", "age"]
+
), or a
+
+QueryColumns
+
+ vector of pairs mapping a column name to a query that fetches a value for each entry of some axis.
+Daf.Queries.QueryColumns
@@ -3991,6 +4005,12 @@
Daf.Queries.FrameColumns
+
+
+Daf.Queries.GroupBy
diff --git a/docs/v0.1.0/search_index.js b/docs/v0.1.0/search_index.js
index baaf029..d188130 100644
--- a/docs/v0.1.0/search_index.js
+++ b/docs/v0.1.0/search_index.js
@@ -1,3 +1,3 @@
var documenterSearchIndex = {"docs":
-[{"location":"storage_types.html#Storage-types","page":"Storage types","title":"Storage types","text":"","category":"section"},{"location":"storage_types.html","page":"Storage types","title":"Storage types","text":"Daf.StorageTypes","category":"page"},{"location":"storage_types.html#Daf.StorageTypes","page":"Storage types","title":"Daf.StorageTypes","text":"Only a restricted set of scalar, matrix and vector types is stored by Daf.\n\nThe set of scalar types is restricted because we need to be able to store them in disk files. This rules out compound types such as Dict. This isn't an issue for vector and matrix elements but is sometimes bothersome for \"scalar\" data (not associated with any axis). If you find yourself needed to store such data, you'll have to serialize it to a string. By convention, we use JSON blobs for such data to maximize portability between different systems.\n\nJulia supports a potentially infinite variety of ways to represent matrices and vectors. Daf is intentionally restricted to specific representations. This has several advantages:\n\nDaf storage formats need only implement storing these restricted representations, which lend themselves to simple storage in consecutive bytes (in memory and/or on disk). These representations also allow for memory-mapping the data from disk files, which allows Daf to deal with data sets larger than the available memory.\nClient code need only worry about dealing with these restricted representations, which limits the amount of code paths required for efficient algorithm implementations. However, you (mostly) need not worry about this when invoking library functions, which have code paths covering all common matrix types. You do need to consider the layout of the data, though (see below).\n\nThis has the downside that Daf doesn't support efficient storage of specialized matrices (to pick a random example, upper triangular matrices). 
This isn't a great loss, since Daf targets storing arbitrary scientific data (especially biological data), which in general is not of any such special shape. The upside is that all matrices stored and returned by Daf have a clear MatrixLayouts (regardless of whether they are dense or sparse). This allows user code to ensure it is working \"with the grain\" of the data, which is much more efficient.\n\n\n\n\n\n","category":"module"},{"location":"storage_types.html#Storable-types","page":"Storage types","title":"Storable types","text":"","category":"section"},{"location":"storage_types.html","page":"Storage types","title":"Storage types","text":"Daf.StorageTypes.StorageInteger\nDaf.StorageTypes.StorageFloat\nDaf.StorageTypes.StorageNumber\nDaf.StorageTypes.StorageScalar\nDaf.StorageTypes.StorageScalarBase\nDaf.StorageTypes.StorageVector\nDaf.StorageTypes.StorageMatrix","category":"page"},{"location":"storage_types.html#Daf.StorageTypes.StorageInteger","page":"Storage types","title":"Daf.StorageTypes.StorageInteger","text":"StorageInteger = Union{Int8, UInt8, Int16, UInt16, Int32, UInt32, Int64, UInt64}\n\nInteger number types that can be used as scalars, or elements in stored matrices or vectors.\n\n\n\n\n\n","category":"type"},{"location":"storage_types.html#Daf.StorageTypes.StorageFloat","page":"Storage types","title":"Daf.StorageTypes.StorageFloat","text":"StorageFloat = Union{Float32, Float64}\n\nFloating point number types that can be used as scalars, or elements in stored matrices or vectors.\n\n\n\n\n\n","category":"type"},{"location":"storage_types.html#Daf.StorageTypes.StorageNumber","page":"Storage types","title":"Daf.StorageTypes.StorageNumber","text":"StorageNumber = Union{Bool, StorageInteger, StorageFloat}\n\nNumber types that can be used as scalars, or elements in stored matrices or vectors.\n\n\n\n\n\n","category":"type"},{"location":"storage_types.html#Daf.StorageTypes.StorageScalar","page":"Storage 
types","title":"Daf.StorageTypes.StorageScalar","text":"StorageScalar = Union{StorageNumber, <:AbstractString}\n\nTypes that can be used as scalars, or elements in stored matrices or vectors.\n\nThis is restricted to StorageNumber (including Booleans) and strings. It is arguably too restrictive, as in principle we could support any arbitrary isbitstype. However, in practice this would cause much trouble when accessing the data from other systems (specifically Python and R). Since Daf targets storing scientific data (especially biological data), as opposed to \"anything at all\", this restriction seems reasonable.\n\n\n\n\n\n","category":"type"},{"location":"storage_types.html#Daf.StorageTypes.StorageScalarBase","page":"Storage types","title":"Daf.StorageTypes.StorageScalarBase","text":"StorageScalarBase = Union{StorageNumber, AbstractString}\n\nFor using in where clauses when a type needs to be a StorageScalar. That is, write where {T <: StorageScalarBase} instead of where {T <: StorageScalar}, because of the limitations of Julia's type system.\n\n\n\n\n\n","category":"type"},{"location":"storage_types.html#Daf.StorageTypes.StorageVector","page":"Storage types","title":"Daf.StorageTypes.StorageVector","text":"StorageVector{T} = AbstractVector{T} where {T <: StorageScalar}\n\nVectors that can be directly stored (and fetched) from Daf storage.\n\nThe element type must be a StorageScalar, to allow storing the data in disk files. Vectors of strings are supported but will be less efficient.\n\n\n\n\n\n","category":"type"},{"location":"storage_types.html#Daf.StorageTypes.StorageMatrix","page":"Storage types","title":"Daf.StorageTypes.StorageMatrix","text":"StorageMatrix{T} = AbstractMatrix{T} where {T <: StorageNumber}\n\nMatrices that can be directly stored (and fetched) from Daf storage.\n\nThe element type must be a StorageNumber, to allow efficient storage of the data in disk files. 
That is, matrices of strings are not supported.\n\nnote: Note\nAll matrices we store must have a clear MatrixLayouts, that is, must be in either row-major or column-major format.\n\n\n\n\n\n","category":"type"},{"location":"storage_types.html#Constructors","page":"Storage types","title":"Constructors","text":"","category":"section"},{"location":"storage_types.html#Storable-types-2","page":"Storage types","title":"Storable types","text":"","category":"section"},{"location":"storage_types.html","page":"Storage types","title":"Storage types","text":"Daf.StorageTypes.sparse_vector\nDaf.StorageTypes.sparse_matrix_csc","category":"page"},{"location":"storage_types.html#Daf.StorageTypes.sparse_vector","page":"Storage types","title":"Daf.StorageTypes.sparse_vector","text":"sparse_vector(dense::StorageMatrix)::SparseVector\n\nCreate a sparse vector using the smallest unsigned integer type needed for this size of matrix.\n\n\n\n\n\n","category":"function"},{"location":"storage_types.html#Daf.StorageTypes.sparse_matrix_csc","page":"Storage types","title":"Daf.StorageTypes.sparse_matrix_csc","text":"sparse_matrix_csc(dense::StorageMatrix)::SparseMatrixCSC\n\nCreate a sparse matrix using the smallest unsigned integer type needed for this size of matrix.\n\n\n\n\n\n","category":"function"},{"location":"storage_types.html#Index","page":"Storage types","title":"Index","text":"","category":"section"},{"location":"storage_types.html","page":"Storage types","title":"Storage types","text":"Pages = [\"storage_types.md\"]","category":"page"},{"location":"generic_functions.html#Generic-Functions","page":"Generic Functions","title":"Generic Functions","text":"","category":"section"},{"location":"generic_functions.html","page":"Generic Functions","title":"Generic 
Functions","text":"Daf.GenericFunctions\nDaf.GenericFunctions.dedent\nDaf.GenericFunctions.AbnormalHandler\nDaf.GenericFunctions.handle_abnormal","category":"page"},{"location":"generic_functions.html#Daf.GenericFunctions","page":"Generic Functions","title":"Daf.GenericFunctions","text":"Functions that arguably should belong in a more general-purpose package.\n\nWe do not re-export the functions and supporting types defined here from the top-level Daf namespace. That is, even if using Daf, you will not have these generic names polluting your namespace. If you do want to reuse them in your code, explicitly write using Daf.GenericFunctions.\n\n\n\n\n\n","category":"module"},{"location":"generic_functions.html#Daf.GenericFunctions.dedent","page":"Generic Functions","title":"Daf.GenericFunctions.dedent","text":"dedent(string::AbstractString; indent::AbstractString = \"\")::String\n\nGiven a possibly multi-line string with a common indentation in each line, strip this indentation from all lines, and replace it with indent. Will also strip any initial and/or final line breaks.\n\n\n\n\n\n","category":"function"},{"location":"generic_functions.html#Daf.GenericFunctions.AbnormalHandler","page":"Generic Functions","title":"Daf.GenericFunctions.AbnormalHandler","text":"The action to take when encountering an \"abnormal\" (but recoverable) operation.\n\nValid values are:\n\nIgnoreHandler - ignore the issue and perform the recovery operation.\n\nWarnHandler - emit a warning using @warn.\n\nErrorHandler - abort the program with an error message.\n\n\n\n\n\n","category":"type"},{"location":"generic_functions.html#Daf.GenericFunctions.handle_abnormal","page":"Generic Functions","title":"Daf.GenericFunctions.handle_abnormal","text":"handle_abnormal(message::Function, handler::AbnormalHandler)::Nothing\n\nCall this when encountering some abnormal, but recoverable, condition. Follow it by the recovery code.\n\nThis will error if the handler is ErrorHandler, and abort the program. 
If it is WarnHandler, it will just @warn and return. If it is IgnoreHandler it will just return.\n\nThe message is a function that should return an AbstractString to use. For efficiency, it is not invoked if ignoring the condition.\n\n\n\n\n\n","category":"function"},{"location":"generic_functions.html#Index","page":"Generic Functions","title":"Index","text":"","category":"section"},{"location":"generic_functions.html","page":"Generic Functions","title":"Generic Functions","text":"Pages = [\"generic_functions.md\"]","category":"page"},{"location":"tokens.html#Tokens","page":"Tokens","title":"Tokens","text":"","category":"section"},{"location":"tokens.html","page":"Tokens","title":"Tokens","text":"Daf.Tokens","category":"page"},{"location":"tokens.html#Daf.Tokens","page":"Tokens","title":"Daf.Tokens","text":"The only exported functions from this module are escape_value and unescape_value, which are useful when embedding values into query strings. The rest of the module is documented to give insight into how a query string is broken into Tokens.\n\nIdeally Daf should have used some established parser generator module for parsing queries, making all this unnecessary. However, as of writing this code, Julia doesn't seem to have such a parser generator solution. 
Therefore, this module provides a simple tokenize function with rudimentary pattern matching which is all we need to parse queries (whose structure is \"trivial\").\n\n\n\n\n\n","category":"module"},{"location":"tokens.html#Escaping","page":"Tokens","title":"Escaping","text":"","category":"section"},{"location":"tokens.html","page":"Tokens","title":"Tokens","text":"Daf.Tokens.escape_value\nDaf.Tokens.unescape_value\nDaf.Tokens.is_value_char\nDaf.Tokens.VALUE_REGEX","category":"page"},{"location":"tokens.html#Daf.Tokens.escape_value","page":"Tokens","title":"Daf.Tokens.escape_value","text":"escape_value(value::AbstractString)::String\n\nGiven some raw value (name of an axis, axis entry or property, or a parameter value), which may contain special characters, return an escaped version to be used as a single value Token.\n\nWe need to consider the following kinds of characters:\n\nSafe (is_value_char) characters include a - z, A - Z, 0 - 9, _, +, -, and ., as well as any non-ASCII (that is, Unicode) characters. Any sequence of these characters will be considered a single value Token. These cover all the common cases (including signed integer and floating point values).\nAll other ASCII characters are (at least potentially) special, that is, may be used to describe an operation.\nPrefixing any character with a \\ allows using it inside a value Token. This is useful if some name or value contains a special character. For example, if you have a cell whose name is ACTG:Plate1, and you want to access the name of the batch of this specific cell, you will have to write / cell = ACTG\\:Plate1 : batch.\n\nnote: Note\nThe \\ character is also used by Julia inside \"...\" string literals, to escape writing non-printable characters. For example, \"\\n\" is a single-character string containing a line break, and therefore \"\\\\\" is used to write a single \\. Thus the above example would have to be written as \"cell = ACTG\\\\:Plate1 : batch\". 
This isn't nice.Luckily, Julia also has raw\"...\" string literals that work similarly to Python's r\"...\" strings (in Julia, r\"...\" is a regular expression, not a string). Inside raw string literals, a \\ is a \\ (unless it precedes a \"). Therefore the above example could also be written as raw\"/ cell = ACTG\\:Plate1 : batch, which is more readable.\n\nBack to escape_value - it will prefix any special character with a \\. It is useful if you want to programmatically inject a value. Often this happens when using $(...) to embed values into a query string, e.g., do not write a query / $(axis) @ $(property) as it is unsafe, as any of the embedded variables may contain unsafe characters. You should instead write something like / $(escape_value(axis)) @ $(escape_value(property)).\n\n\n\n\n\n","category":"function"},{"location":"tokens.html#Daf.Tokens.unescape_value","page":"Tokens","title":"Daf.Tokens.unescape_value","text":"unescape_value(escaped::AbstractString)::String\n\nUndo escape_value, that is, given an escaped value with a \\ characters escaping special characters, drop the \\ to get back the original string value.\n\n\n\n\n\n","category":"function"},{"location":"tokens.html#Daf.Tokens.is_value_char","page":"Tokens","title":"Daf.Tokens.is_value_char","text":"is_value_char(character::Char)::Bool\n\nReturn whether a character is safe to use inside a value Token (name of an axis, axis entry or property, or a parameter value).\n\nThe safe characters are a - z, A - Z, 0 - 9, _, +, -, and ., as well as any non-ASCII (that is, Unicode) characters.\n\n\n\n\n\n","category":"function"},{"location":"tokens.html#Daf.Tokens.VALUE_REGEX","page":"Tokens","title":"Daf.Tokens.VALUE_REGEX","text":"VALUE_REGEX = r\"^(?:[0-9a-zA-Z_.+-]|[^\\x00-\\xFF])+\"\n\nA sequence of is_value_char is considered to be a single value Token. This set of characters was chosen to allow expressing numbers, Booleans and simple names. 
Any other (ASCII, non-space) character may in principle be used as an operator (possibly in a future version of the code). Therefore, use escape_value to protect any value you embed into the expression.\n\n\n\n\n\n","category":"constant"},{"location":"tokens.html#Encoding","page":"Tokens","title":"Encoding","text":"","category":"section"},{"location":"tokens.html","page":"Tokens","title":"Tokens","text":"Daf.Tokens.encode_expression\nDaf.Tokens.decode_expression","category":"page"},{"location":"tokens.html#Daf.Tokens.encode_expression","page":"Tokens","title":"Daf.Tokens.encode_expression","text":"encode_expression(expr_string::AbstractString)::String\n\nGiven an expression string to parse, encode any non-ASCII (that is, Unicode) character, as well as any character escaped by a \\, such that the result will only use is_value_char characters. Every encoded character is replaced by _XX using URI encoding, but replacing the % with a _ so we can deal with unescaped % as an operator, so we also need to encode _ as _5F, so we need to encode \\_ as _5C_5F. 
Isn't encoding fun?\n\n\n\n\n\n","category":"function"},{"location":"tokens.html#Daf.Tokens.decode_expression","page":"Tokens","title":"Daf.Tokens.decode_expression","text":"decode_expression(encoded_string::AbstractString)::String\n\nGiven the results of encode_expression, decode it back to its original form.\n\n\n\n\n\n","category":"function"},{"location":"tokens.html#Tokenization","page":"Tokens","title":"Tokenization","text":"","category":"section"},{"location":"tokens.html","page":"Tokens","title":"Tokens","text":"Daf.Tokens.Token\nDaf.Tokens.tokenize\nDaf.Tokens.SPACE_REGEX","category":"page"},{"location":"tokens.html#Daf.Tokens.Token","page":"Tokens","title":"Daf.Tokens.Token","text":"struct Token\n is_operator::Bool\n value::AbstractString\n token_index::Int\n first_index::Int\n last_index::Int\n encoded_string::AbstractString\nend\n\nA parsed token of an expression.\n\nWe distinguish between \"value\" tokens and \"operator\" tokens using is_operator. A value token holds the name of an axis, axis entry or property, or a parameter value, while an operator token is used to identify a query operation to perform. In both cases, the value contains the token string. This goes through both decode_expression and unescape_value so it can be directly used as-is for value tokens.\n\nWe also keep the location (first_index .. last_index) and the (encoded) expression string, to enable generating friendly error messages. There are no line numbers in locations because in Daf we squash our queries to a single-line, under the assumption they are \"relatively simple\". This allows us to simplify the code.\n\n\n\n\n\n","category":"type"},{"location":"tokens.html#Daf.Tokens.tokenize","page":"Tokens","title":"Daf.Tokens.tokenize","text":"tokenize(string::AbstractString, operators::Regex)::Vector{Token}\n\nGiven an expression string, convert it into a vector of Token.\n\nWe first convert everything that matches the SPACE_REGEX into a single space. 
This squashes the expression into a single line (discarding line breaks and comments), and the squashed expression is used for reporting errors. This is reasonable for dealing with Daf queries which are expected to be \"relatively simple\".\n\nWhen tokenizing, we discard the spaces. Anything that matches the VALUE_REGEX is considered to be a value Token. Anything that matches the operators is considered to be an operator Token. As a special case, '' is converted to an empty string, which is otherwise impossible to represent (write \\'\\' to prevent this). Anything else is reported as an invalid character.\n\nnote: Note\nThe operators regex should only match the start of the string (that is, must start with ^). Also, when using |, you need to list the longer operators first (e.g., ^(?:++|+) as opposed to ^(?:+|++)).\n\n\n\n\n\n","category":"function"},{"location":"tokens.html#Daf.Tokens.SPACE_REGEX","page":"Tokens","title":"Daf.Tokens.SPACE_REGEX","text":"SPACE_REGEX = r\"(?:[\\s\\n\\r]|#[^\\n\\r]*(?:[\\r\\n]|$))+\"sm\n\nOptional white space can separate Token. It is required when there are two consecutive value tokens, but is typically optional around operators. White space includes spaces, tabs, line breaks, and a # ... comment suffix of a line.\n\n\n\n\n\n","category":"constant"},{"location":"tokens.html#Index","page":"Tokens","title":"Index","text":"","category":"section"},{"location":"tokens.html","page":"Tokens","title":"Tokens","text":"Pages = [\"tokens.md\"]","category":"page"},{"location":"chains.html#Chains","page":"Chains","title":"Chains","text":"","category":"section"},{"location":"chains.html","page":"Chains","title":"Chains","text":"Daf.Chains\nDaf.Chains.chain_reader\nDaf.Chains.chain_writer\nDaf.Chains.ReadOnlyChain\nDaf.Chains.WriteChain","category":"page"},{"location":"chains.html#Daf.Chains","page":"Chains","title":"Daf.Chains","text":"View a chain of Daf data as a single data set. 
This allows creating a small Daf data set that contains extra (or overriding) data on top of a larger read-only data set. In particular this allows creating several such incompatible extra data sets (e.g., different groupings of cells to metacells), without having to duplicate the common (read only) data.\n\n\n\n\n\n","category":"module"},{"location":"chains.html#Daf.Chains.chain_reader","page":"Chains","title":"Daf.Chains.chain_reader","text":"chain_reader(dafs::AbstractVector{<:DafReader}; name::Maybe{AbstractString} = nothing)::DafReader\n\nCreate a read-only chain wrapper of DafReaders, presenting them as a single DafReader. When accessing the content, the exposed value is that provided by the last data set that contains the data, that is, later data sets can override earlier data sets. However, if an axis exists in more than one data set in the chain, then its entries must be identical. This isn't typically created manually; instead call chain_reader.\n\nnote: Note\nWhile this verifies the axes are consistent at the time of creating the chain, it's no defense against modifying the chained data after the fact, creating inconsistent axes. Don't do that.\n\n\n\n\n\n","category":"function"},{"location":"chains.html#Daf.Chains.chain_writer","page":"Chains","title":"Daf.Chains.chain_writer","text":"chain_writer(dafs::AbstractVector{<:DafReader}; name::Maybe{AbstractString} = nothing)::DafWriter\n\nCreate a chain wrapper for a chain of DafReader data, presenting them as a single DafWriter. This acts similarly to chain_reader, but requires the final entry in the chain to be a DafWriter. Any modifications or additions to the chain are directed only at this final writer.\n\nnote: Note\nDeletions are only allowed for data that exists only in the final writer. 
That is, it is impossible to delete from a chain something that exists in any of the readers; it is only possible to override it.\n\n\n\n\n\n","category":"function"},{"location":"chains.html#Daf.Chains.ReadOnlyChain","page":"Chains","title":"Daf.Chains.ReadOnlyChain","text":"struct ReadOnlyChain <: DafReadOnly ... end\n\nA wrapper for a chain of DafReader data, presenting them as a single DafReadOnly. When accessing the content, the exposed value is that provided by the last data set that contains the data, that is, later data sets can override earlier data sets. However, if an axis exists in more than one data set in the chain, then its entries must be identical. This isn't typically created manually; instead call chain_reader.\n\n\n\n\n\n","category":"type"},{"location":"chains.html#Daf.Chains.WriteChain","page":"Chains","title":"Daf.Chains.WriteChain","text":"struct WriteChain <: DafWriter ... end\n\nA wrapper for a chain of DafReader data, with a final DafWriter, presenting them as a single DafWriter. When accessing the content, the exposed value is that provided by the last data set that contains the data, that is, later data sets can override earlier data sets (where the writer has the final word). However, if an axis exists in more than one data set in the chain, then its entries must be identical. This isn't typically created manually; instead call chain_reader.\n\nAny modifications or additions to the chain are directed at the final writer. Deletions are only allowed for data that exists only in this writer. 
That is, it is impossible to delete from a chain something that exists in any of the readers; it is only possible to override it.\n\n\n\n\n\n","category":"type"},{"location":"chains.html#Index","page":"Chains","title":"Index","text":"","category":"section"},{"location":"chains.html","page":"Chains","title":"Chains","text":"Pages = [\"chains.md\"]","category":"page"},{"location":"registry.html#Operations-registry","page":"Operations registry","title":"Operations registry","text":"","category":"section"},{"location":"registry.html","page":"Operations registry","title":"Operations registry","text":"Daf.Registry","category":"page"},{"location":"registry.html#Daf.Registry","page":"Operations registry","title":"Daf.Registry","text":"Registering element-wise and reduction operations is required, to allow them to be used in a query.\n\nnote: Note\nWe do not re-export everything from here to the main Daf namespace, as it is only of interest for implementers of new query operations. Most users of Daf just stick with the (fairly comprehensive) list of built-in query operations so there's no need to pollute their namespace with these details.\n\n\n\n\n\n","category":"module"},{"location":"registry.html#Element-wise-operations","page":"Operations registry","title":"Element-wise operations","text":"","category":"section"},{"location":"registry.html","page":"Operations registry","title":"Operations registry","text":"Daf.Registry.EltwiseOperation\nDaf.Registry.compute_eltwise","category":"page"},{"location":"registry.html#Daf.Registry.EltwiseOperation","page":"Operations registry","title":"Daf.Registry.EltwiseOperation","text":"Abstract type for all element-wise operations.\n\nIn a string query, this is specified using the % operator (e.g., % Abs, % Log base 2):\n\nEltwiseOperation := % operation ( parameter value )*\n\nSince each EltwiseOperation isa QueryOperation, you can directly apply it to a query (e.g., Axis(\"cell\") |> Lookup(\"age\") |> Abs()). 
For this there should be other constructor(s) tailored for this usage.\n\nAn element-wise operation may be applied to scalar, vector or matrix data. It will preserve the shape of the data, but change the value(s), and possibly the data type of the elements. For example, Abs will compute the absolute value of each value.\n\nTo implement a new such operation, the type is expected to be of the form:\n\nstruct MyOperation <: EltwiseOperation\n ... optional parameters ...\nend\n@query_operation MyOperation\n\nMyOperation(operation_name::Token, parameter_values::Dict{String, Token})::MyOperation\n\nThe constructor should use parse_parameter for each of the parameters (for example, using parse_number_assignment). In addition you will need to invoke @query_operation to register the operation so it can be used in a query, and implement the functions listed below. See the query operations module for details and examples.\n\n\n\n\n\n","category":"type"},{"location":"registry.html#Daf.Registry.compute_eltwise","page":"Operations registry","title":"Daf.Registry.compute_eltwise","text":"compute_eltwise(operation::EltwiseOperation, input::StorageMatrix)::StorageMatrix\ncompute_eltwise(operation::EltwiseOperation, input::StorageVector)::StorageVector\ncompute_eltwise(operation::EltwiseOperation, input_value::Number)::StorageNumber\n\nCompute an EltwiseOperation operation.\n\n\n\n\n\n","category":"function"},{"location":"registry.html#Reduction-operations","page":"Operations registry","title":"Reduction operations","text":"","category":"section"},{"location":"registry.html","page":"Operations registry","title":"Operations registry","text":"Daf.Registry.ReductionOperation\nDaf.Registry.compute_reduction\nDaf.Registry.reduction_result_type","category":"page"},{"location":"registry.html#Daf.Registry.ReductionOperation","page":"Operations registry","title":"Daf.Registry.ReductionOperation","text":"Abstract type for all reduction operations.\n\nIn a string query, this is specified 
using the %> operator (e.g., %> Sum, %> Quantile fraction 0.05):\n\nReductionOperation := %> operation ( parameter value )*\n\nSince each ReductionOperation isa QueryOperation, you can directly apply it to a query (e.g., Axis(\"cell\") |> Axis(\"gene\") |> Lookup(\"UMIs\") |> Quantile(0.05)). For this there should be other constructor(s) tailored for this usage.\n\nA reduction operation may be applied to matrix or vector data. It will reduce (eliminate) one dimension of the data, and possibly the result will have a different data type than the input. When applied to a vector, the operation will return a scalar. When applied to a matrix, it assumes the matrix is in column-major layout, and will return a vector with one entry per column, containing the result of reducing the column to a scalar.\n\nTo implement a new such operation, the type is expected to be of the form:\n\nstruct MyOperation <: ReductionOperation\n ... optional parameters ...\nend\n\nMyOperation(operation_name::Token, parameter_values::Dict{String, Token})::MyOperation\n\nThe constructor should use parse_parameter for each of the parameters (for example, using typically parse_number_assignment). In addition you will need to invoke @query_operation to register the operation so it can be used in a query, and implement the functions listed below. See the query operations module for details and examples.\n\n\n\n\n\n","category":"type"},{"location":"registry.html#Daf.Registry.compute_reduction","page":"Operations registry","title":"Daf.Registry.compute_reduction","text":"compute_reduction(operation::ReductionOperation, input::StorageMatrix)::StorageVector\ncompute_reduction(operation::ReductionOperation, input::StorageVector)::StorageNumber\n\nSince each ReductionOperation isa QueryOperation, you can directly apply it to a query (e.g., Axis(\"cell\") |> Axis(\"gene\") |> Lookup(\"UMIs\") |> Sum()). 
For this there should be other constructor(s) tailored for this usage.\n\n\n\n\n\n","category":"function"},{"location":"registry.html#Daf.Registry.reduction_result_type","page":"Operations registry","title":"Daf.Registry.reduction_result_type","text":"reduction_result_type(operation::ReductionOperation, eltype::Type)::Type\n\nReturn the data type of the result of the reduction operation if applied to a vector of the specified eltype.\n\n\n\n\n\n","category":"function"},{"location":"registry.html#Registering-operations","page":"Operations registry","title":"Registering operations","text":"","category":"section"},{"location":"registry.html","page":"Operations registry","title":"Operations registry","text":"Daf.Registry.QueryOperation\nDaf.Registry.@query_operation\nDaf.Registry.register_query_operation","category":"page"},{"location":"registry.html#Daf.Registry.QueryOperation","page":"Operations registry","title":"Daf.Registry.QueryOperation","text":"Abstract interface for all query operations. An actual query is a series of these operations which, when applied to Daf data, compute some result.\n\n\n\n\n\n","category":"type"},{"location":"registry.html#Daf.Registry.@query_operation","page":"Operations registry","title":"Daf.Registry.@query_operation","text":"struct MyOperation <: EltwiseOperation # Or <: ReductionOperation\n ...\nend\n@query_operation MyOperation\n\nAutomatically call register_query_operation for MyOperation.\n\nNote this will import Daf.Registry.register_query_operation, so it may only be called from the top level scope of a module.\n\n\n\n\n\n","category":"macro"},{"location":"registry.html#Daf.Registry.register_query_operation","page":"Operations registry","title":"Daf.Registry.register_query_operation","text":"register_query_operation(\n type::Type{<:Union{EltwiseOperation, ReductionOperation}},\n source_file::AbstractString,\n source_line::Integer,\n)::Nothing\n\nRegister a specific operation so it would be available inside queries. 
This is required to be able to parse the operation. This is idempotent (safe to invoke multiple times).\n\nThis isn't usually called directly. Instead, it is typically invoked by using the @query_operation macro.\n\n\n\n\n\n","category":"function"},{"location":"registry.html#Index","page":"Operations registry","title":"Index","text":"","category":"section"},{"location":"registry.html","page":"Operations registry","title":"Operations registry","text":"Pages = [\"registry.md\"]","category":"page"},{"location":"computations.html#Computations","page":"Computations","title":"Computations","text":"","category":"section"},{"location":"computations.html","page":"Computations","title":"Computations","text":"Daf.Computations\nDaf.Computations.@computation\nDaf.Computations.DEFAULT\nDaf.Computations.CONTRACT\nDaf.Computations.CONTRACT1\nDaf.Computations.CONTRACT2","category":"page"},{"location":"computations.html#Daf.Computations","page":"Computations","title":"Daf.Computations","text":"Support writing \"well-behaved\" computations. Such computations declare a Contract describing their inputs and outputs. This is enforced, so that the implementation need not worry about missing inputs, and the caller can rely on the results. It is also self-documenting, so the generated documentation always contains a clear up-to-date description of the contract.\n\n\n\n\n\n","category":"module"},{"location":"computations.html#Daf.Computations.@computation","page":"Computations","title":"Daf.Computations.@computation","text":"@computation function something(...)\n return ...\nend\n\n@computation Contract(...) function something(daf::DafWriter, ...)\n return ...\nend\n\n@computation Contract(...) Contract(...) function something(\n first::DafReader/DafWriter, second::DafReader/DafWriter, ...\n)\n return ...\nend\n\nMark a function as a Daf computation. 
This has the following effects:\n\nIt verifies that the Daf data satisfies the Contract, when the computation is invoked and when it is complete (using verify_input and verify_output).\nIt stashes the contract(s) (if any) in a global variable. This allows expanding CONTRACT in the documentation string (for a single contract case), or CONTRACT1 and CONTRACT2 (for the dual contract case).\nIt stashes the default value of named arguments. This allows expanding DEFAULT in the documentation string, which is especially useful if these defaults are computed, read from global constants, etc.\nIt logs the invocation of the function (using @debug), including the actual values of the named arguments (using depict).\n\nnote: Note\nFor each Contract parameter (if any), there needs to be a DafReader or DafWriter, which the contract(s) will be applied to. These parameters should be the initial positional parameters of the function.\n\n\n\n\n\n","category":"macro"},{"location":"computations.html#Daf.Computations.DEFAULT","page":"Computations","title":"Daf.Computations.DEFAULT","text":"When using @computation:\n\n'''\n something(daf::DafWriter, x::Int = $(DEFAULT.x); y::Bool = $(DEFAULT.y))\n\n...\nIf `x` (default: $(DEFAULT.x)) is even, ...\n...\nIf `y` (default: $(DEFAULT.y)) is set, ...\n...\n'''\n@computation Contract(...)\nfunction something(daf::DafWriter, x::Int = 0; y::Bool = false)\n return ...\nend\n\nThen $(DEFAULT.x) will be expanded with the default value of the parameter x. It is good practice to include a description of the effects of each parameter somewhere in the documentation, and it is polite to also provide its default value. This can be done in either the signature line or in the text, or both. 
Using DEFAULT ensures that the correct value is used in the documentation.\n\n\n\n\n\n","category":"constant"},{"location":"computations.html#Daf.Computations.CONTRACT","page":"Computations","title":"Daf.Computations.CONTRACT","text":"When using @computation:\n\n'''\n...\n# Contract\n...\n$(CONTRACT)\n...\n'''\n@computation Contract(...)\nfunction something(daf::DafWriter, ...)\n return ...\nend\n\nThen $(CONTRACT) will be expanded with a description of the Contract. This is based on DocStringExtensions.\n\nnote: Note\nThe first argument of the function must be a DafWriter, which the contract will be applied to.\n\n\n\n\n\n","category":"constant"},{"location":"computations.html#Daf.Computations.CONTRACT1","page":"Computations","title":"Daf.Computations.CONTRACT1","text":"Same as CONTRACT, but reference the contract for the 1st Daf argument for a @computation with two such arguments.\n\n\n\n\n\n","category":"constant"},{"location":"computations.html#Daf.Computations.CONTRACT2","page":"Computations","title":"Daf.Computations.CONTRACT2","text":"Same as CONTRACT, but reference the contract for the 2nd Daf argument for a @computation with two such arguments.\n\n\n\n\n\n","category":"constant"},{"location":"computations.html#Index","page":"Computations","title":"Index","text":"","category":"section"},{"location":"computations.html","page":"Computations","title":"Computations","text":"Pages = [\"computations.md\"]","category":"page"},{"location":"reconstruction.html#Reconstruction","page":"Reconstruction","title":"Reconstruction","text":"","category":"section"},{"location":"reconstruction.html","page":"Reconstruction","title":"Reconstruction","text":"Daf.Reconstruction\nDaf.Reconstruction.reconstruct_axis!","category":"page"},{"location":"reconstruction.html#Daf.Reconstruction","page":"Reconstruction","title":"Daf.Reconstruction","text":"Reconstruct implicit axes. 
Due to AnnData two-axes limitations, other axes are often represented by storing their expanded data (e.g., a type for each cell, and a color for each cell, where the color is actually per type). When converting such data to Daf, it is useful to reconstruct such axes (e.g., create a type axis, assign a color for each type, and delete the per-cell color property).\n\n\n\n\n\n","category":"module"},{"location":"reconstruction.html#Daf.Reconstruction.reconstruct_axis!","page":"Reconstruction","title":"Daf.Reconstruction.reconstruct_axis!","text":"reconstruct_axis!(\n daf::DafWriter;\n existing_axis::AbstractString,\n implicit_axis::AbstractString,\n [rename_axis::Maybe{AbstractString} = nothing,\n empty_implicit::Maybe{StorageScalar} = nothing,\n implicit_properties::Maybe{AbstractSet{<:AbstractString}} = nothing,\n properties_defaults::Maybe{AbstractDict} = nothing]\n)::AbstractDict{<:AbstractString, Maybe{StorageScalar}}\n\nGiven an existing_axis in daf, which has a property implicit_axis, create a new axis with the same name (or, if specified, call it rename_axis). If empty_implicit is specified, this value of the property is replaced by the empty string (indicating that there is no value associated with the existing_axis entry). For each of the implicit_properties, we collect the mapping between the implicit_axis and the property values, and store it as a property of the newly created axis.\n\nIf the implicit_axis already exists, we verify that all the values provided for it by the existing_axis do, in fact, exist as names of entries in the implicit_axis. 
This allows manually creating the implicit_axis with additional entries that are not currently in use.\n\nnote: Note\nIf the implicit_axis already exists and contains entries that aren't currently in use, you must specify properties_defaults for the values of these entries of the reconstructed properties.Due to Julia's type system limitations, there's just no way for the system to enforce the type of the pairs in this vector. That is, what we'd like to say is:properties_defaults::Maybe{AbstractDict{<:AbstractString, <:StorageScalar}} = nothingBut what we are forced to say is:properties_defaults::Maybe{Dict} = nothingGlory to anyone who figures out an incantation that would force the system to perform more meaningful type inference here.\n\nIf implicit_properties are explicitly specified, then we require the mapping from implicit_axis to be consistent. Otherwise, we look at all the properties of the existing_axis, and check for each one whether the mapping is consistent; if it is, we migrate the property to the new axis. For example, when importing AnnData containing per-cell data, it isn't always clear which property is actually per-batch (e.g., cell age) and which is actually per cell (e.g., doublet score). Not specifying the implicit_properties allows the function to figure it out on its own.\n\nnote: Note\nFor each converted property, the value associated with existing_axis entries which have no implicit_axis value (that is, have an empty string or empty_implicit value) is lost. For example, if each cell type has a color, but some cells do not have a type, then the color of \"cells with no type\" is lost. We still require this value to be consistent, and return a mapping between each migrated property name and the value of such entries (if any exist). When reconstructing the original property, specify this value using IfNot (e.g., / cell : type => color ?? 
magenta).\n\n\n\n\n\n","category":"function"},{"location":"reconstruction.html#Index","page":"Reconstruction","title":"Index","text":"","category":"section"},{"location":"reconstruction.html","page":"Reconstruction","title":"Reconstruction","text":"Pages = [\"reconstruction.md\"]","category":"page"},{"location":"contracts.html#Contracts","page":"Contracts","title":"Contracts","text":"","category":"section"},{"location":"contracts.html","page":"Contracts","title":"Contracts","text":"Daf.Contracts\nDaf.Contracts.Contract\nDaf.Contracts.ContractAxes\nDaf.Contracts.ContractData\nDaf.Contracts.ContractExpectation\nDaf.Contracts.verify_input\nDaf.Contracts.verify_output","category":"page"},{"location":"contracts.html#Daf.Contracts","page":"Contracts","title":"Daf.Contracts","text":"Enforce input and output contracts of computations using Daf data.\n\n\n\n\n\n","category":"module"},{"location":"contracts.html#Daf.Contracts.Contract","page":"Contracts","title":"Daf.Contracts.Contract","text":"Contract(;\n [axes::Maybe{ContractAxes} = nothing,\n data::Maybe{ContractData} = nothing]\n)::Contract\n\nThe contract of a computational tool, specifying the ContractAxes and ContractData.\n\n\n\n\n\n","category":"type"},{"location":"contracts.html#Daf.Contracts.ContractAxes","page":"Contracts","title":"Daf.Contracts.ContractAxes","text":"A vector of pairs where the key is the axis name and the value is a tuple of the ContractExpectation and a description of the axis (for documentation). Axes are listed mainly for documentation; axes of required or guaranteed vectors or matrices are automatically required or guaranteed to match. However it is considered polite to explicitly list the axes with their descriptions so the documentation of the contract will be complete.\n\nnote: Note\nDue to Julia's type system limitations, there's just no way for the system to enforce the type of the pairs in this vector. 
That is, what we'd like to say is:ContractAxes = AbstractVector{Pair{AbstractString, Tuple{ContractExpectation, AbstractString}}}But what we are forced to say is:ContractAxes = AbstractVector{<:Pair}Glory to anyone who figures out an incantation that would force the system to perform more meaningful type inference here.\n\n\n\n\n\n","category":"type"},{"location":"contracts.html#Daf.Contracts.ContractData","page":"Contracts","title":"Daf.Contracts.ContractData","text":"A vector of pairs where the key is a DataKey identifying some data property, and the value is a tuple of the ContractExpectation, the expected data type, and a description (for documentation).\n\nnote: Note\nDue to Julia's type system limitations, there's just no way for the system to enforce the type of the pairs in this vector. That is, what we'd like to say is:ContractData = AbstractVector{Pair{DataKey, Tuple{ContractExpectation, Type, AbstractString}}}But what we are forced to say is:ContractData = AbstractVector{<:Pair}Glory to anyone who figures out an incantation that would force the system to perform more meaningful type inference here.\n\n\n\n\n\n","category":"type"},{"location":"contracts.html#Daf.Contracts.ContractExpectation","page":"Contracts","title":"Daf.Contracts.ContractExpectation","text":"The expectation from a specific property for a computation on Daf data.\n\nInput data:\n\nRequiredInput - data that must exist in the data when invoking the computation, will be used as input.\n\nOptionalInput - data that, if existing in the data when invoking the computation, will be used as an input.\n\nOutput data:\n\nGuaranteedOutput - data that is guaranteed to exist when the computation is done.\n\nOptionalOutput - data that may exist when the computation is done, depending on some condition, which may include the existence of optional input and/or the value of parameters to the computation, and/or the content of the 
data.\n\n\n\n\n\n","category":"type"},{"location":"contracts.html#Daf.Contracts.verify_input","page":"Contracts","title":"Daf.Contracts.verify_input","text":"verify_input(daf::DafReader, contract::Contract, computation::AbstractString)::Nothing\n\nVerify the daf data when a computation is invoked. This verifies that all the required data exists and is of the appropriate type, and that if any of the optional data exists, it has the appropriate type.\n\n\n\n\n\n","category":"function"},{"location":"contracts.html#Daf.Contracts.verify_output","page":"Contracts","title":"Daf.Contracts.verify_output","text":"verify_output(daf::DafReader, contract::Contract, computation::AbstractString)::Nothing\n\nVerify the daf data when a computation is complete. This verifies that all the guaranteed output data exists and is of the appropriate type, and that if any of the optional output data exists, it has the appropriate type.\n\n\n\n\n\n","category":"function"},{"location":"contracts.html#Index","page":"Contracts","title":"Index","text":"","category":"section"},{"location":"contracts.html","page":"Contracts","title":"Contracts","text":"Pages = [\"contracts.md\"]","category":"page"},{"location":"messages.html#Messages","page":"Messages","title":"Messages","text":"","category":"section"},{"location":"messages.html","page":"Messages","title":"Messages","text":"Daf.Messages","category":"page"},{"location":"messages.html#Daf.Messages","page":"Messages","title":"Daf.Messages","text":"Functions for improving the quality of error and log messages.\n\n\n\n\n\n","category":"module"},{"location":"messages.html","page":"Messages","title":"Messages","text":"Daf.Messages.unique_name\nDaf.Messages.depict\nDaf.Messages.depict_percent","category":"page"},{"location":"messages.html#Daf.Messages.unique_name","page":"Messages","title":"Daf.Messages.unique_name","text":"unique_name(prefix::AbstractString)::AbstractString\n\nUsing short, human-readable unique names for things is a great help when 
debugging. Normally one has to choose between using a human-provided short non-unique name, and an opaque object identifier, or a combination thereof. This function replaces the opaque object identifier with a short counter, which gives names that are both unique and short.\n\nThat is, this will return a unique name starting with the prefix and followed by #, the process index (if using multiple processes), and an index (how many times this name was used in the process). For example, unique_name(\"foo\") will return foo for the first usage, foo#2 for the 2nd, etc. If using multiple processes, it will return foo, foo#1.2, etc.\n\nThat is, for code where the names are unique (e.g., a simple script or Jupyter notebook), this doesn't mess up the names. It only appends a suffix to the names if it is needed to disambiguate between multiple uses of the same name.\n\nTo help with tests, if the prefix ends with !, we return it as-is, accepting it may not be unique.\n\n\n\n\n\n","category":"function"},{"location":"messages.html#Daf.MatrixLayouts.depict","page":"Messages","title":"Daf.MatrixLayouts.depict","text":"depict(value::Any)::String\n\nDepict a value in an error message or a log entry. 
Unlike \"$(value)\", this focuses on producing a human-readable indication of the type of the value, so it double-quotes strings, prefixes symbols with :, and reports the type and sizes of arrays rather than showing their content, as well as having specializations for the various Daf data types.\n\n\n\n\n\n","category":"function"},{"location":"messages.html#Daf.Messages.depict_percent","page":"Messages","title":"Daf.Messages.depict_percent","text":"depict_percent(used::Integer, out_of::Integer)::String\n\nDescribe a fraction of used amount out_of some total as a percentage.\n\n\n\n\n\n","category":"function"},{"location":"messages.html#Index","page":"Messages","title":"Index","text":"","category":"section"},{"location":"messages.html","page":"Messages","title":"Messages","text":"Pages = [\"messages.md\"]","category":"page"},{"location":"matrix_layouts.html#Matrix-layouts","page":"Matrix layouts","title":"Matrix layouts","text":"","category":"section"},{"location":"matrix_layouts.html","page":"Matrix layouts","title":"Matrix layouts","text":"Daf.MatrixLayouts","category":"page"},{"location":"matrix_layouts.html#Daf.MatrixLayouts","page":"Matrix layouts","title":"Daf.MatrixLayouts","text":"All stored Daf matrix data has a clear matrix layout, that is, a major_axis, regardless of whether it is dense or sparse.\n\nThat is, for Columns-major data, the values of each column are laid out consecutively in memory (each column is a single contiguous vector), so any operation that works on whole columns will be fast (e.g., summing the value of each column). In contrast, the values of each row are stored far apart from each other, so any operation that works on whole rows will be very slow in comparison (e.g., summing the value of each row).\n\nFor Rows-major data, the values of each row are laid out consecutively in memory (each row is a single contiguous vector). In contrast, the values of each column are stored far apart from each other. 
In this case, summing columns would be slow, and summing rows would be fast.\n\nThis is much simpler than the ArrayLayouts module which attempts to fully describe the layout of N-dimensional arrays, a much more ambitious goal which is an overkill for our needs.\n\nnote: Note\nThe \"default\" layout in Julia is column-major, which inherits this from matlab, which inherits this from FORTRAN, allegedly because this is more efficient for some linear algebra operations. In contrast, Python numpy uses row-major layout by default. In either case, this is just an arbitrary convention, and all systems work just fine with data of either memory layout; the key consideration is to keep track of the layout, and to apply operations \"with the grain\" rather than \"against the grain\" of the data.\n\n\n\n\n\n","category":"module"},{"location":"matrix_layouts.html#Symbolic-names-for-axes","page":"Matrix layouts","title":"Symbolic names for axes","text":"","category":"section"},{"location":"matrix_layouts.html","page":"Matrix layouts","title":"Matrix layouts","text":"Daf.MatrixLayouts.Rows\nDaf.MatrixLayouts.Columns\nDaf.MatrixLayouts.axis_name","category":"page"},{"location":"matrix_layouts.html#Daf.MatrixLayouts.Rows","page":"Matrix layouts","title":"Daf.MatrixLayouts.Rows","text":"A symbolic name for the rows axis. It is much more readable to write, say, size(matrix, Rows), instead of size(matrix, 1).\n\n\n\n\n\n","category":"constant"},{"location":"matrix_layouts.html#Daf.MatrixLayouts.Columns","page":"Matrix layouts","title":"Daf.MatrixLayouts.Columns","text":"A symbolic name for the columns axis. 
It is much more readable to write, say, size(matrix, Columns), instead of size(matrix, 2).\n\n\n\n\n\n","category":"constant"},{"location":"matrix_layouts.html#Daf.MatrixLayouts.axis_name","page":"Matrix layouts","title":"Daf.MatrixLayouts.axis_name","text":"axis_name(axis::Maybe{Integer})::String\n\nReturn the name of the axis (for messages).\n\n\n\n\n\n","category":"function"},{"location":"matrix_layouts.html#Checking-layout","page":"Matrix layouts","title":"Checking layout","text":"","category":"section"},{"location":"matrix_layouts.html","page":"Matrix layouts","title":"Matrix layouts","text":"Daf.MatrixLayouts.major_axis\nDaf.MatrixLayouts.require_major_axis\nDaf.MatrixLayouts.minor_axis\nDaf.MatrixLayouts.require_minor_axis\nDaf.MatrixLayouts.other_axis","category":"page"},{"location":"matrix_layouts.html#Daf.MatrixLayouts.major_axis","page":"Matrix layouts","title":"Daf.MatrixLayouts.major_axis","text":"major_axis(matrix::AbstractMatrix)::Maybe{Int8}\n\nReturn the index of the major axis of a matrix, that is, the axis one should keep fixed for an efficient inner loop accessing the matrix elements. If the matrix doesn't support any efficient access axis, returns nothing.\n\n\n\n\n\n","category":"function"},{"location":"matrix_layouts.html#Daf.MatrixLayouts.require_major_axis","page":"Matrix layouts","title":"Daf.MatrixLayouts.require_major_axis","text":"require_major_axis(matrix::AbstractMatrix)::Int8\n\nSimilar to major_axis but will error if the matrix isn't in either row-major or column-major layout.\n\n\n\n\n\n","category":"function"},{"location":"matrix_layouts.html#Daf.MatrixLayouts.minor_axis","page":"Matrix layouts","title":"Daf.MatrixLayouts.minor_axis","text":"minor_axis(matrix::AbstractMatrix)::Maybe{Int8}\n\nReturn the index of the minor axis of a matrix, that is, the axis one should vary for an efficient inner loop accessing the matrix elements. 
If the matrix doesn't support any efficient access axis, returns nothing.\n\n\n\n\n\n","category":"function"},{"location":"matrix_layouts.html#Daf.MatrixLayouts.require_minor_axis","page":"Matrix layouts","title":"Daf.MatrixLayouts.require_minor_axis","text":"require_minor_axis(matrix::AbstractMatrix)::Int8\n\nSimilar to minor_axis but will error if the matrix isn't in either row-major or column-major layout.\n\n\n\n\n\n","category":"function"},{"location":"matrix_layouts.html#Daf.MatrixLayouts.other_axis","page":"Matrix layouts","title":"Daf.MatrixLayouts.other_axis","text":"other_axis(axis::Maybe{Integer})::Maybe{Int8}\n\nReturn the other matrix axis (that is, convert between Rows and Columns). If given nothing returns nothing.\n\n\n\n\n\n","category":"function"},{"location":"matrix_layouts.html#Changing-layout","page":"Matrix layouts","title":"Changing layout","text":"","category":"section"},{"location":"matrix_layouts.html","page":"Matrix layouts","title":"Matrix layouts","text":"Daf.MatrixLayouts.relayout!\nDaf.MatrixLayouts.transposer!\nDaf.ReadOnly.copy_array","category":"page"},{"location":"matrix_layouts.html#Daf.MatrixLayouts.relayout!","page":"Matrix layouts","title":"Daf.MatrixLayouts.relayout!","text":"relayout!(matrix::AbstractMatrix)::AbstractMatrix\nrelayout!(matrix::NamedMatrix)::NamedMatrix\nrelayout!(destination::AbstractMatrix, source::AbstractMatrix)::AbstractMatrix\nrelayout!(destination::AbstractMatrix, source::NamedMatrix)::NamedMatrix\n\nReturn the same matrix data, but in the other memory layout.\n\nSuppose you have a column-major UMIs matrix, whose rows are cells, and columns are genes. Therefore, summing the UMIs of a gene will be fast, but summing the UMIs of a cell will be slow. A transpose (no !) of a matrix is fast; it creates a zero-copy wrapper of the matrix with flipped axes, so its rows will be genes and columns will be cells, but in row-major layout. 
Therefore, still, summing the UMIs of a gene is fast, and summing the UMIs of a cell is slow.\n\nIn contrast, transpose! (with a !) (or transposer!) is slow; it creates a rearranged copy of the data, also returning a matrix whose rows are genes and columns are cells, but this time, in column-major layout. Therefore, in this case summing the UMIs of a gene will be slow, and summing the UMIs of a cell will be fast.\n\nnote: Note\nIt is almost always worthwhile to relayout! a matrix and then perform operations \"with the grain\" of the data, instead of skipping it and performing operations \"against the grain\" of the data. This is because (in Julia at least) the implementation of transpose! is optimized for the task, while the other operations typically don't provide any specific optimizations for working \"against the grain\" of the data. The benefits of a relayout! become even more significant when performing a series of operations (e.g., summing the gene UMIs in each cell, converting gene UMIs to fractions out of these totals, then computing the log base 2 of this fraction).\n\nIf you transpose (no !) the result of transpose! (with a !), you end up with a matrix that appears to be the same as the original (rows are cells and columns are genes), but behaves differently - summing the UMIs of a gene will be slow, and summing the UMIs of a cell is fast. This transpose of transpose! is a common idiom and is basically what relayout! does for you. In addition, relayout! will work for both sparse and dense matrices, and if destination is not specified, a similar matrix is allocated automatically for it.\n\nnote: Note\nThe caller is responsible for providing a sensible destination matrix (sparse for a sparse source, dense for a non-sparse source). This can be a transposed matrix. If source is a NamedMatrix, then the result will be a NamedMatrix with the same axes. 
If destination is also a NamedMatrix, then its axes must match source.\n\n\n\n\n\n","category":"function"},{"location":"matrix_layouts.html#Daf.MatrixLayouts.transposer!","page":"Matrix layouts","title":"Daf.MatrixLayouts.transposer!","text":"transposer!(matrix::AbstractMatrix)::AbstractMatrix\ntransposer!(matrix::NamedMatrix)::NamedMatrix\n\nThis is a shorthand for LinearAlgebra.transpose!(similar(transpose(m)), m). That is, this will return a transpose of a matrix, but instead of simply using a zero-copy wrapper, it actually rearranges the data. See relayout!.\n\n\n\n\n\n","category":"function"},{"location":"matrix_layouts.html#Daf.MatrixLayouts.copy_array","page":"Matrix layouts","title":"Daf.MatrixLayouts.copy_array","text":"copy_array(array::AbstractArray)::AbstractArray\n\nCreate a mutable copy of an array. This differs from Base.copy in the following:\n\nCopying a read-only array returns a mutable array. In contrast, both Base.copy and Base.deepcopy of a read-only array will return a read-only array, which is technically correct, but is rather pointless for Base.copy.\nCopying will preserve the layout of the data; for example, copying a Transpose array is still a Transpose array. In contrast, while Base.deepcopy will preserve the layout, Base.copy will silently relayout! the matrix, which is both expensive and confusing.\nCopying a sparse vector or matrix gives the same type of sparse vector or matrix. Copying anything else gives a simple dense array regardless of the original type. This is done because a deepcopy of PyArray will still share the underlying buffer. 
Sigh.\nCopying a vector of anything derived from AbstractString returns a vector of AbstractString.\n\n\n\n\n\n","category":"function"},{"location":"matrix_layouts.html#Ensuring-code-efficiency","page":"Matrix layouts","title":"Ensuring code efficiency","text":"","category":"section"},{"location":"matrix_layouts.html","page":"Matrix layouts","title":"Matrix layouts","text":"Daf.MatrixLayouts.check_efficient_action\nDaf.MatrixLayouts.inefficient_action_handler","category":"page"},{"location":"matrix_layouts.html#Daf.MatrixLayouts.check_efficient_action","page":"Matrix layouts","title":"Daf.MatrixLayouts.check_efficient_action","text":"check_efficient_action(\n action::AbstractString,\n axis::Integer,\n operand::AbstractString,\n matrix::AbstractMatrix,\n)::Nothing\n\nThis will check whether the action about to be executed for an operand which is matrix works \"with the grain\" of the data, which requires the matrix to be in axis-major layout. If it isn't, then apply the inefficient_action_handler.\n\nIn general, you really want operations to go \"with the grain\" of the data. Unfortunately, Julia (and Python, and R, and matlab) will silently run operations \"against the grain\", which would be painfully slow. A liberal application of this function in your code will help in detecting such slowdowns, without having to resort to profiling the code to isolate the problem.\n\nnote: Note\nThis will not prevent the code from performing \"against the grain\" operations such as selectdim(matrix, Rows, 1) for a column-major matrix, but if you add this check before performing any (series of) operations on a matrix, then you will have a clear indication of whether (and where) such operations occur. You can then consider whether to invoke relayout! 
on the data, or (for data fetched from Daf), simply query for the other memory layout.\n\n\n\n\n\n","category":"function"},{"location":"matrix_layouts.html#Daf.MatrixLayouts.inefficient_action_handler","page":"Matrix layouts","title":"Daf.MatrixLayouts.inefficient_action_handler","text":"inefficient_action_handler(handler::AbnormalHandler)::AbnormalHandler\n\nSpecify the AbnormalHandler to use when accessing a matrix in an inefficient way (\"against the grain\"). Returns the previous handler. The default handler is WarnHandler.\n\n\n\n\n\n","category":"function"},{"location":"matrix_layouts.html#Index","page":"Matrix layouts","title":"Index","text":"","category":"section"},{"location":"matrix_layouts.html","page":"Matrix layouts","title":"Matrix layouts","text":"Pages = [\"matrix_layouts.md\"]","category":"page"},{"location":"readers.html#Readers","page":"Readers","title":"Readers","text":"","category":"section"},{"location":"readers.html","page":"Readers","title":"Readers","text":"Daf.Readers\nDaf.Readers.description","category":"page"},{"location":"readers.html#Daf.Readers","page":"Readers","title":"Daf.Readers","text":"The DafReader interface specifies a high-level API for reading Daf data. This API is implemented here, on top of the low-level FormatReader API. The high-level API provides thread safety so the low-level API can (mostly) ignore this issue.\n\nEach data set is given a name to use in error messages etc. You can explicitly set this name when creating a Daf object. Otherwise, when opening an existing data set, if it contains a scalar \"name\" property, it is used. Otherwise some reasonable default is used. In all cases, object names are passed through unique_name to avoid ambiguity.\n\nData properties are identified by a unique name given the axes they are based on. 
That is, there is a separate namespace for scalar properties, vector properties for each specific axis, and matrix properties for each unordered pair of axes.\n\nFor matrices, we keep careful track of their MatrixLayouts. Returned matrices are always in column-major layout, using relayout! if necessary. As this is an expensive operation, we'll cache the result in memory. Similarly, we cache the results of applying a query to the data. We allow clearing the cache to reduce memory usage, if necessary.\n\nThe data API is the high-level API intended to be used from outside the package, and is therefore re-exported from the top-level Daf namespace. It provides additional functionality on top of the low-level FormatReader implementation, accepting more general data types, automatically dealing with relayout! when needed. In particular, it enforces single-writer multiple-readers for each data set, so the format code can ignore multi-threading and still be thread-safe.\n\nnote: Note\nIn the APIs below, when getting a value, specifying a default of undef means that it is an error for the value not to exist. In contrast, specifying a default of nothing means it is OK for the value not to exist, returning nothing. Specifying an actual value for default means it is OK for the value not to exist, returning the default instead. This is in spirit with, but not identical to, undef being used as a flag for array construction saying \"there is no initializer\". If you feel this is an abuse of the undef value, take some comfort in that it is the default value for the default, so you almost never have to write it explicitly in your code.\n\n\n\n\n\n","category":"module"},{"location":"readers.html#Daf.Readers.description","page":"Readers","title":"Daf.Readers.description","text":"description(daf::DafReader[; deep::Bool = false])::AbstractString\n\nReturn a (multi-line) description of the contents of daf. This tries to hit a sweet spot between usefulness and terseness. 
If cache, also describes the content of the cache. If deep, also describes any data set nested inside this one (if any).\n\n\n\n\n\n","category":"function"},{"location":"readers.html#Scalar-properties","page":"Readers","title":"Scalar properties","text":"","category":"section"},{"location":"readers.html","page":"Readers","title":"Readers","text":"Daf.Readers.has_scalar\nDaf.Readers.scalars_set\nDaf.Readers.get_scalar","category":"page"},{"location":"readers.html#Daf.Readers.has_scalar","page":"Readers","title":"Daf.Readers.has_scalar","text":"has_scalar(daf::DafReader, name::AbstractString)::Bool\n\nCheck whether a scalar property with some name exists in daf.\n\n\n\n\n\n","category":"function"},{"location":"readers.html#Daf.Readers.scalars_set","page":"Readers","title":"Daf.Readers.scalars_set","text":"scalars_set(daf::DafReader)::AbstractSet{<:AbstractString}\n\nThe names of the scalar properties in daf.\n\nnote: Note\nThere's no immutable set type in Julia for us to return. If you do modify the result set, bad things will happen.\n\n\n\n\n\n","category":"function"},{"location":"readers.html#Daf.Readers.get_scalar","page":"Readers","title":"Daf.Readers.get_scalar","text":"get_scalar(\n daf::DafReader,\n name::AbstractString;\n [default::Union{StorageScalar, Nothing, UndefInitializer} = undef]\n)::Maybe{StorageScalar}\n\nGet the value of a scalar property with some name in daf.\n\nIf default is undef (the default), this first verifies the name scalar property exists in daf. 
Otherwise default will be returned if the property does not exist.\n\n\n\n\n\n","category":"function"},{"location":"readers.html#Readers-axes","page":"Readers","title":"Readers axes","text":"","category":"section"},{"location":"readers.html","page":"Readers","title":"Readers","text":"Daf.Readers.has_axis\nDaf.Readers.axes_set\nDaf.Readers.axis_array\nDaf.Readers.axis_dict\nDaf.Readers.axis_indices\nDaf.Readers.axis_length","category":"page"},{"location":"readers.html#Daf.Readers.has_axis","page":"Readers","title":"Daf.Readers.has_axis","text":"has_axis(daf::DafReader, axis::AbstractString)::Bool\n\nCheck whether some axis exists in daf.\n\n\n\n\n\n","category":"function"},{"location":"readers.html#Daf.Readers.axes_set","page":"Readers","title":"Daf.Readers.axes_set","text":"axes_set(daf::DafReader)::AbstractSet{<:AbstractString}\n\nThe names of the axes of daf.\n\nnote: Note\nThere's no immutable set type in Julia for us to return. If you do modify the result set, bad things will happen.\n\n\n\n\n\n","category":"function"},{"location":"readers.html#Daf.Readers.axis_array","page":"Readers","title":"Daf.Readers.axis_array","text":"axis_array(\n daf::DafReader,\n axis::AbstractString;\n [default::Union{Nothing, UndefInitializer} = undef]\n)::Maybe{AbstractVector{<:AbstractString}}\n\nThe array of unique names of the entries of some axis of daf. This is similar to doing get_vector for the special name property, except that it returns a simple vector (array) of strings instead of a NamedVector.\n\nIf default is undef (the default), this verifies the axis exists in daf. 
Otherwise, the default is nothing, which is returned if the axis does not exist.\n\n\n\n\n\n","category":"function"},{"location":"readers.html#Daf.Readers.axis_dict","page":"Readers","title":"Daf.Readers.axis_dict","text":"function axis_dict(daf::DafReader, axis::AbstractString)::AbstractDict{<:AbstractString, <:Integer}\n\nReturn a dictionary converting axis entry names to their integer index.\n\n\n\n\n\n","category":"function"},{"location":"readers.html#Daf.Readers.axis_indices","page":"Readers","title":"Daf.Readers.axis_indices","text":"function axis_indices(daf::DafReader, axis::AbstractString, entries::AbstractVector{<:AbstractString})::AbstractVector{<:Integer}\n\nReturn a vector of the indices of the entries in the axis.\n\n\n\n\n\n","category":"function"},{"location":"readers.html#Daf.Readers.axis_length","page":"Readers","title":"Daf.Readers.axis_length","text":"axis_length(daf::DafReader, axis::AbstractString)::Int64\n\nThe number of entries along the axis in daf.\n\nThis first verifies the axis exists in daf.\n\n\n\n\n\n","category":"function"},{"location":"readers.html#Vector-properties","page":"Readers","title":"Vector properties","text":"","category":"section"},{"location":"readers.html","page":"Readers","title":"Readers","text":"Daf.Readers.has_vector\nDaf.Readers.vectors_set\nDaf.Readers.get_vector","category":"page"},{"location":"readers.html#Daf.Readers.has_vector","page":"Readers","title":"Daf.Readers.has_vector","text":"has_vector(daf::DafReader, axis::AbstractString, name::AbstractString)::Bool\n\nCheck whether a vector property with some name exists for the axis in daf. 
This is always true for the special name property.\n\nThis first verifies the axis exists in daf.\n\n\n\n\n\n","category":"function"},{"location":"readers.html#Daf.Readers.vectors_set","page":"Readers","title":"Daf.Readers.vectors_set","text":"vectors_set(daf::DafReader, axis::AbstractString)::AbstractSet{<:AbstractString}\n\nThe names of the vector properties for the axis in daf, not including the special name property.\n\nThis first verifies the axis exists in daf.\n\nnote: Note\nThere's no immutable set type in Julia for us to return. If you do modify the result set, bad things will happen.\n\n\n\n\n\n","category":"function"},{"location":"readers.html#Daf.Readers.get_vector","page":"Readers","title":"Daf.Readers.get_vector","text":"get_vector(\n daf::DafReader,\n axis::AbstractString,\n name::AbstractString;\n [default::Union{StorageScalar, StorageVector, Nothing, UndefInitializer} = undef]\n)::Maybe{NamedVector}\n\nGet the vector property with some name for some axis in daf. The names of the result are the names of the vector entries (same as returned by axis_array). The special property name returns an array whose values are also the (read-only) names of the entries of the axis.\n\nThis first verifies the axis exists in daf. If default is undef (the default), this first verifies the name vector exists in daf. Otherwise, if default is nothing, it will be returned. If it is a StorageVector, it has to be of the same size as the axis, and is returned. 
Otherwise, a new Vector is created of the correct size containing the default, and is returned.\n\n\n\n\n\n","category":"function"},{"location":"readers.html#Matrix-properties","page":"Readers","title":"Matrix properties","text":"","category":"section"},{"location":"readers.html","page":"Readers","title":"Readers","text":"Daf.Readers.has_matrix\nDaf.Readers.matrices_set\nDaf.Readers.get_matrix","category":"page"},{"location":"readers.html#Daf.Readers.has_matrix","page":"Readers","title":"Daf.Readers.has_matrix","text":"has_matrix(\n daf::DafReader,\n rows_axis::AbstractString,\n columns_axis::AbstractString,\n name::AbstractString;\n [relayout::Bool = true]\n)::Bool\n\nCheck whether a matrix property with some name exists for the rows_axis and the columns_axis in daf. Since this is Julia, this means a column-major matrix. A daf may contain two copies of the same data, in which case it would report the matrix under both axis orders.\n\nIf relayout (the default), this will also check whether the data exists in the other layout (that is, with flipped axes).\n\nThis first verifies the rows_axis and columns_axis exists in daf.\n\n\n\n\n\n","category":"function"},{"location":"readers.html#Daf.Readers.matrices_set","page":"Readers","title":"Daf.Readers.matrices_set","text":"matrices_set(\n daf::DafReader,\n rows_axis::AbstractString,\n columns_axis::AbstractString;\n [relayout::Bool = true]\n)::AbstractSet{<:AbstractString}\n\nThe names of the matrix properties for the rows_axis and columns_axis in daf.\n\nIf relayout (default), then this will include the names of matrices that exist in the other layout (that is, with flipped axes).\n\nThis first verifies the rows_axis and columns_axis exist in daf.\n\nnote: Note\nThere's no immutable set type in Julia for us to return. 
If you do modify the result set, bad things will happen.\n\n\n\n\n\n","category":"function"},{"location":"readers.html#Daf.Readers.get_matrix","page":"Readers","title":"Daf.Readers.get_matrix","text":"get_matrix(\n daf::DafReader,\n rows_axis::AbstractString,\n columns_axis::AbstractString,\n name::AbstractString;\n [default::Union{StorageNumber, StorageMatrix, Nothing, UndefInitializer} = undef,\n relayout::Bool = true]\n)::Maybe{NamedMatrix}\n\nGet the column-major matrix property with some name for some rows_axis and columns_axis in daf. The names of the result axes are the names of the relevant axes entries (same as returned by axis_array).\n\nIf relayout (the default), then if the matrix is only stored in the other memory layout (that is, with flipped axes), then automatically call relayout! to compute the result. If daf isa DafWriter, then store the result for future use; otherwise, just cache it as MemoryData. This may lock up very large amounts of memory; you can call empty_cache! to release it.\n\nThis first verifies the rows_axis and columns_axis exist in daf. If default is undef (the default), this first verifies the name matrix exists in daf. Otherwise, if default is nothing, it is returned. If default is a StorageMatrix, it has to be of the same size as the rows_axis and columns_axis, and is returned. 
Otherwise, a new Matrix is created of the correct size containing the default, and is returned.\n\n\n\n\n\n","category":"function"},{"location":"readers.html#Utilities","page":"Readers","title":"Utilities","text":"","category":"section"},{"location":"readers.html","page":"Readers","title":"Readers","text":"Daf.Readers.axis_version_counter\nDaf.Readers.vector_version_counter\nDaf.Readers.matrix_version_counter","category":"page"},{"location":"readers.html#Daf.Readers.axis_version_counter","page":"Readers","title":"Daf.Readers.axis_version_counter","text":"axis_version_counter(daf::DafReader, axis::AbstractString)::UInt32\n\nReturn the version number of the axis. This is incremented every time delete_axis! is called. It is used by interfaces to other programming languages to minimize copying data.\n\nnote: Note\nThis is purely in-memory per-instance, and not a global persistent version counter. That is, the version counter starts at zero even if opening a persistent disk daf data set.\n\n\n\n\n\n","category":"function"},{"location":"readers.html#Daf.Readers.vector_version_counter","page":"Readers","title":"Daf.Readers.vector_version_counter","text":"vector_version_counter(daf::DafReader, axis::AbstractString, name::AbstractString)::UInt32\n\nReturn the version number of the vector. This is incremented every time set_vector!, empty_dense_vector! or empty_sparse_vector! are called. It is used by interfaces to other programming languages to minimize copying data.\n\nnote: Note\nThis is purely in-memory per-instance, and not a global persistent version counter. 
That is, the version counter starts at zero even if opening a persistent disk daf data set.\n\n\n\n\n\n","category":"function"},{"location":"readers.html#Daf.Readers.matrix_version_counter","page":"Readers","title":"Daf.Readers.matrix_version_counter","text":"matrix_version_counter(\n daf::DafReader,\n rows_axis::AbstractString,\n columns_axis::AbstractString,\n name::AbstractString\n)::UInt32\n\nReturn the version number of the matrix. The order of the axes does not matter. This is incremented every time set_matrix!, empty_dense_matrix! or empty_sparse_matrix! are called. It is used by interfaces to other programming languages to minimize copying data.\n\nnote: Note\nThis is purely in-memory per-instance, and not a global persistent version counter. That is, the version counter starts at zero even if opening a persistent disk daf data set.\n\n\n\n\n\n","category":"function"},{"location":"readers.html#Index","page":"Readers","title":"Index","text":"","category":"section"},{"location":"readers.html","page":"Readers","title":"Readers","text":"Pages = [\"readers.md\"]","category":"page"},{"location":"copies.html#Copies","page":"Copies","title":"Copies","text":"","category":"section"},{"location":"copies.html","page":"Copies","title":"Copies","text":"Daf.Copies\nDaf.Copies.copy_scalar!\nDaf.Copies.copy_axis!\nDaf.Copies.copy_vector!\nDaf.Copies.copy_matrix!\nDaf.Copies.copy_all!\nDaf.Copies.EmptyData\nDaf.Copies.DataTypes","category":"page"},{"location":"copies.html#Daf.Copies","page":"Copies","title":"Daf.Copies","text":"Copy data between Daf data sets.\n\nnote: Note\nCopying into an in-memory data set does not duplicate the data; instead it just shares a reference to it. This is fast. In contrast, copying into a disk-based data set (e.g. using HDF5 or simple files) will create a duplicate of the data on disk. This is slow. 
However, neither direction will significantly increase the amount of memory allocated by the application.\n\n\n\n\n\n","category":"module"},{"location":"copies.html#Daf.Copies.copy_scalar!","page":"Copies","title":"Daf.Copies.copy_scalar!","text":"copy_scalar!(;\n destination::DafWriter,\n source::DafReader,\n name::AbstractString,\n [rename::Maybe{AbstractString} = nothing,\n dtype::Maybe{Type{<:StorageScalarBase}} = nothing,\n default::Union{StorageScalar, Nothing, UndefInitializer} = undef,\n overwrite::Bool = false]\n)::Nothing\n\nCopy a scalar with some name from some source DafReader into some destination DafWriter.\n\nThe scalar is fetched using the name and the default. If rename is specified, store the scalar using this new name. If dtype is specified, the data is converted to this type. If overwrite (not the default), overwrite an existing scalar in the target.\n\n\n\n\n\n","category":"function"},{"location":"copies.html#Daf.Copies.copy_axis!","page":"Copies","title":"Daf.Copies.copy_axis!","text":"copy_axis!(;\n destination::DafWriter,\n source::DafReader,\n axis::AbstractString,\n [rename::Maybe{AbstractString} = nothing,\n default::Union{Nothing, UndefInitializer} = undef]\n)::Nothing\n\nCopy an axis from some source DafReader into some destination DafWriter.\n\nThe axis is fetched using the name and the default. 
If rename is specified, store the axis using this name.\n\n\n\n\n\n","category":"function"},{"location":"copies.html#Daf.Copies.copy_vector!","page":"Copies","title":"Daf.Copies.copy_vector!","text":"copy_vector!(;\n destination::DafWriter,\n source::DafReader,\n axis::AbstractString,\n name::AbstractString,\n [reaxis::Maybe{AbstractString} = nothing,\n rename::Maybe{AbstractString} = nothing,\n dtype::Maybe{Type{<:StorageScalarBase}} = nothing,\n default::Union{StorageScalar, StorageVector, Nothing, UndefInitializer} = undef,\n empty::Maybe{StorageScalar} = nothing,\n overwrite::Bool = false]\n)::Nothing\n\nCopy a vector from some source DafReader into some destination DafWriter.\n\nThe vector is fetched using the axis, name and the default. If reaxis is specified, store the vector using this axis. If rename is specified, store the vector using this name. If dtype is specified, the data is converted to this type. If overwrite (not the default), overwrite an existing vector in the target.\n\nThis requires that the axis of one data set be the same as, or a superset of, or a subset of, the other. If the target axis contains entries that do not exist in the source, then empty must be specified to fill the missing values. 
If the source axis contains entries that do not exist in the target, they are discarded (not copied).\n\n\n\n\n\n","category":"function"},{"location":"copies.html#Daf.Copies.copy_matrix!","page":"Copies","title":"Daf.Copies.copy_matrix!","text":"copy_matrix!(;\n destination::DafWriter,\n source::DafReader,\n rows_axis::AbstractString,\n columns_axis::AbstractString,\n name::AbstractString,\n [rows_reaxis::Maybe{AbstractString} = nothing,\n columns_reaxis::Maybe{AbstractString} = nothing,\n rename::Maybe{AbstractString} = nothing,\n dtype::Maybe{Type{<:StorageScalarBase}} = nothing,\n default::Union{StorageScalar, StorageVector, Nothing, UndefInitializer} = undef,\n empty::Maybe{StorageScalar} = nothing,\n relayout::Bool = true,\n overwrite::Bool = false]\n)::Nothing\n\nCopy a matrix from some source DafReader into some destination DafWriter.\n\nThe matrix is fetched using the rows_axis, columns_axis, name, relayout and the default. If rows_reaxis and/or columns_reaxis are specified, store the matrix using these axes. If rename is specified, store the matrix using this name. If dtype is specified, the data is converted to this type. If overwrite (not the default), overwrite an existing matrix in the target. The matrix is stored with the same relayout.\n\nThis requires that each axis of one data set be the same as, or a superset of, or a subset of, the other. If a target axis contains entries that do not exist in the source, then empty must be specified to fill the missing values. If a source axis contains entries that do not exist in the target, they are discarded (not copied).\n\nnote: Note\nWhen copying a matrix from a subset to a superset, if the empty value is zero, then we create a sparse matrix in the destination. 
However, currently we create a temporary dense matrix for this; this is inefficient and should be replaced by a more efficient method.\n\n\n\n\n\n","category":"function"},{"location":"copies.html#Daf.Copies.copy_all!","page":"Copies","title":"Daf.Copies.copy_all!","text":"copy_all!(;\n destination::DafWriter,\n source::DafReader,\n [empty::Maybe{EmptyData} = nothing,\n dtypes::Maybe{DataTypes} = nothing,\n overwrite::Bool = false,\n relayout::Bool = true]\n)::Nothing\n\nCopy all the content of a source DafReader into a destination DafWriter. If overwrite, this will overwrite existing data in the target. If relayout, matrices will be stored in the target in both layouts, regardless of how they were stored in the source.\n\nThis will create target axes that exist only in the source, but will not overwrite existing target axes, regardless of the value of overwrite. An axis that exists in the target must be identical to, or be a subset of, the same axis in the source.\n\nIf the source has axes which are a subset of the same axes in the target, then you must specify a dictionary of values for the empty entries that will be created in the target when copying any vector and/or matrix properties. This is specified using an (axis, property) => value entry for specifying an empty value for a vector property and a (rows_axis, columns_axis, property) => value entry for specifying an empty value for a matrix property. The order of the axes for matrix properties doesn't matter (the same empty value is automatically used for both axes orders).\n\nIf dtypes is specified, the copied data of the matching property is converted to the specified data type.\n\n\n\n\n\n","category":"function"},{"location":"copies.html#Daf.Copies.EmptyData","page":"Copies","title":"Daf.Copies.EmptyData","text":"Specify the data to use for missing properties in a Daf data set. 
This is a dictionary mapping a DataKey, which identifies the property, to the value to use.\n\nnote: Note\nDue to Julia's type system limitations, there's just no way for the system to enforce the type of the pairs when initializing this dictionary. That is, what we'd like to say is:\n\nEmptyData = AbstractDict{DataKey, StorageScalar}\n\nBut what we are forced to say is:\n\nEmptyData = AbstractDict\n\nThat's not a mistake. Even EmptyData = AbstractDict{Key, StorageScalar} where {Key} fails to work, as do all the (many) possibilities for expressing \"this is a dictionary where the key or the value can be one of several things\". Sigh. Glory to anyone who figures out an incantation that would force the system to perform any meaningful type inference here.\n\n\n\n\n\n","category":"type"},{"location":"copies.html#Daf.Copies.DataTypes","page":"Copies","title":"Daf.Copies.DataTypes","text":"Specify the data type to use for overriding property types in a Daf data set. This is a dictionary mapping a DataKey, which identifies the property, to the data type to use.\n\nnote: Note\nDue to Julia's type system limitations, there's just no way for the system to enforce the type of the pairs when initializing this dictionary. That is, what we'd like to say is:\n\nDataTypes = AbstractDict{DataKey, Type{<:StorageScalarBase}}\n\nBut what we are forced to say is:\n\nDataTypes = AbstractDict\n\nThat's not a mistake. Even DataTypes = AbstractDict{<:Any, <: StorageScalarBase} fails to work, as do all the (many) possibilities for expressing \"this is a dictionary where the key or the value can be one of several things\". Sigh. 
Glory to anyone who figures out an incantation that would force the system to perform any meaningful type inference here.\n\n\n\n\n\n","category":"type"},{"location":"copies.html#Index","page":"Copies","title":"Index","text":"","category":"section"},{"location":"copies.html","page":"Copies","title":"Copies","text":"Pages = [\"copies.md\"]","category":"page"},{"location":"files_format.html#Files-Format","page":"Files Format","title":"Files Format","text":"","category":"section"},{"location":"files_format.html","page":"Files Format","title":"Files Format","text":"Daf.FilesFormat\nDaf.FilesFormat.MAJOR_VERSION\nDaf.FilesFormat.MINOR_VERSION\nDaf.FilesFormat.FilesDaf","category":"page"},{"location":"files_format.html#Daf.FilesFormat","page":"Files Format","title":"Daf.FilesFormat","text":"A Daf storage format in disk files. This is an efficient way to persist Daf data in a filesystem, and offers a different trade-off compared to storing the data in an HDF5 file.\n\nOn the downside, this being a directory, you need to create a zip or tar or some other form of archive file if you want to publish it. Also, accessing FilesDaf will consume multiple file descriptors as opposed to just one for HDF5, and, of course, HDF5 has libraries to support it in most systems.\n\nOn the upside, the format of the files is so simple that it is trivial to access them from any programming environment, without requiring a complex library like HDF5. In addition, since each scalar, vector or matrix property is stored in a separate file, deleting data automatically frees the storage (unlike in an HDF5 file, where you must manually repack the file to actually release the storage). Also, you can use standard tools to look at the data (e.g. use ls or the Windows file explorer to view the list of properties, how much space each one uses, when it was created, etc.). 
Most importantly, this allows using standard tools like make to create automatic repeatable processing workflows.\n\nWe use multiple files to store Daf data, under some root directory, as follows:\n\nThe directory will contain 4 sub-directories: scalars, axes, vectors, and matrices, and a file called daf.json.\nThe daf.json file signifies that the directory contains Daf data. In this file, there should be a mapping with a version key whose value is an array of two integers. The first is the major version number and the second is the minor version number, using semantic versioning. This makes it easy to test whether a directory does or does not contain Daf data, and which version of the internal structure it is using. Currently the only defined version is [1,0].\nThe scalars directory contains scalar properties, each in its own name.json file, containing a mapping with a type key whose value is the data type of the scalar (one of the StorageScalar types, with String for a string scalar) and a value key whose value is the actual scalar value.\nThe axes directory contains a name.txt file per axis, where each line contains a name of an axis entry.\nThe vectors directory contains a directory per axis, containing the vectors. For every vector, a name.json file will contain a mapping with an eltype key specifying the type of the vector element, and a format key specifying how the data is stored on disk, one of dense and sparse.\nIf the format is dense, then there will be a file containing the vector entries, either name.txt for strings (with a value per line), or name.data for binary data (which we can memory-map for direct access).\nIf the format is sparse, then there will also be an indtype key specifying the data type of the indices of the non-zero values, and two binary data files, name.nzind containing the indices of the non-zero entries, and name.nzval containing the values of the non-zero entries (which we can memory-map for direct access). 
See Julia's SparseVector implementation for details.\nThe matrices directory contains a directory per rows axis, which contains a directory per columns axis, which contains the matrices. For each matrix, a name.json file will contain a mapping with an eltype key specifying the type of the matrix element, and a format key specifying how the data is stored on disk, one of dense and sparse.\nIf the format is dense, then there will be a name.data binary file in column-major layout (which we can memory-map for direct access).\nIf the format is sparse, then there will also be an indtype key specifying the data type of the indices of the non-zero values, and three binary data files, name.colptr, name.rowval containing the indices of the non-zero values, and name.nzval containing the values of the non-zero entries (which we can memory-map for direct access). See Julia's SparseMatrixCSC implementation for details.\n\nExample directory structure:\n\nexample-daf-dataset-root-directory/\n├─ daf.json\n├─ scalars/\n│  └─ version.json\n├─ axes/\n│  ├─ cell.txt\n│  └─ gene.txt\n├─ vectors/\n│  ├─ cell/\n│  │  ├─ batch.json\n│  │  └─ batch.txt\n│  └─ gene/\n│     ├─ is_marker.json\n│     └─ is_marker.data\n└─ matrices/\n   ├─ cell/\n   │  ├─ cell/\n   │  └─ gene/\n   │     ├─ UMIs.json\n   │     ├─ UMIs.colptr\n   │     ├─ UMIs.rowval\n   │     └─ UMIs.nzval\n   └─ gene/\n      ├─ cell/\n      └─ gene/\n\nnote: Note\nAll binary data is stored as a sequence of elements, in little endian byte order (which is the native order for modern CPUs), without any headers or padding. (Dense) matrices are stored in column-major layout (which matches Julia's native matrix layout). All string data is stored in lines, one entry per line, separated by a \\n character (regardless of the OS used). Therefore, you can't have a line break inside an axis entry name or in a vector property value, at least not when storing it in FilesDaf.\n\nWhen creating an HDF5 file to contain `Daf` data, you should specify\n`;fapl=HDF5.FileAccessProperties(;alignment=(1,8))`. 
This ensures all the memory buffers are properly aligned for\nefficient access. Otherwise, memory mapping will be **much** less efficient. A warning is therefore generated\nwhenever you try to access `Daf` data stored in an HDF5 file which does not enforce proper alignment.\n\nThat's all there is to it. The format is intentionally simple and transparent to maximize its accessibility by other (standard) tools. Still, it is easiest to create the data using the Julia Daf package.\n\nnote: Note\nThe code here assumes the files data obeys all the above conventions and restrictions. As long as you only create and access Daf data in files using FilesDaf, then the code will work as expected (assuming no bugs). However, if you do this in some other way (e.g., directly using the filesystem and custom tools), and the result is invalid, then the code here may fail with \"less than friendly\" error messages.\n\n\n\n\n\n","category":"module"},{"location":"files_format.html#Daf.FilesFormat.MAJOR_VERSION","page":"Files Format","title":"Daf.FilesFormat.MAJOR_VERSION","text":"The specific major version of the FilesDaf format that is supported by this code (1). The code will refuse to access data that is stored in a different major format.\n\n\n\n\n\n","category":"constant"},{"location":"files_format.html#Daf.FilesFormat.MINOR_VERSION","page":"Files Format","title":"Daf.FilesFormat.MINOR_VERSION","text":"The maximal minor version of the FilesDaf format that is supported by this code (0). 
The code will refuse to access data that is stored with the expected major version (1), but that uses a higher minor version.\n\nnote: Note\nModifying data that is stored with a lower minor version number may increase its minor version number.\n\n\n\n\n\n","category":"constant"},{"location":"files_format.html#Daf.FilesFormat.FilesDaf","page":"Files Format","title":"Daf.FilesFormat.FilesDaf","text":"FilesDaf(\n path::AbstractString,\n mode::AbstractString = \"r\";\n [name::Maybe{AbstractString} = nothing]\n)\n\nStorage in disk files in some directory.\n\nWhen opening an existing data set, if name is not specified, and there exists a \"name\" scalar property, it is used as the name. Otherwise, the path will be used as the name.\n\nThe valid mode values are as follows (the default mode is r):\n\nMode Allow modifications? Create if does not exist? Truncate if exists? Returned type\nr No No No DafReadOnly\nr+ Yes No No FilesDaf\nw+ Yes Yes No FilesDaf\nw Yes Yes Yes FilesDaf\n\n\n\n\n\n","category":"type"},{"location":"files_format.html#Index","page":"Files Format","title":"Index","text":"","category":"section"},{"location":"files_format.html","page":"Files Format","title":"Files Format","text":"Pages = [\"files_format.md\"]","category":"page"},{"location":"groups.html#Groups","page":"Groups","title":"Groups","text":"","category":"section"},{"location":"groups.html","page":"Groups","title":"Groups","text":"Daf.Groups\nDaf.Groups.group_names\nDaf.Groups.compact_groups!\nDaf.Groups.collect_group_members","category":"page"},{"location":"groups.html#Daf.Groups","page":"Groups","title":"Daf.Groups","text":"Functions for dealing with computing groups of axis entries (typically for creating a new axis).\n\n\n\n\n\n","category":"module"},{"location":"groups.html#Daf.Groups.group_names","page":"Groups","title":"Daf.Groups.group_names","text":"group_names(\n daf::DafReader,\n axis::AbstractString,\n entries_of_groups::AbstractVector{<:AbstractVector{<:Integer}};\n 
prefix::AbstractString,\n)::Vector{String}\n\nGiven an entries_of_groups vector of vectors, one for each group, containing the (sorted) indices of the entries of the group along some axis of some daf data set, return a vector giving a unique name for each group. This name consists of the prefix, followed by the index of the group, followed by a .XX two-digit suffix which is a hash of the names of the axis entries of the group.\n\nThe returned names strike a balance between readability and safety. A name like M123.89 for group #123 is easy to deal with manually, but is also reasonably safe in the common use case that groups are re-computed, and there is per-group metadata lying around associated with the old groups, as the probability of the new group #123 having the same suffix is only 1% (unless it is actually identical).\n\n\n\n\n\n","category":"function"},{"location":"groups.html#Daf.Groups.compact_groups!","page":"Groups","title":"Daf.Groups.compact_groups!","text":"compact_groups!(\n group_indices::AbstractVector{<:Integer},\n)::Int\n\nGiven an array group_indices which assigns each entry of some axis to a non-negative group index (with zero meaning \"no group\"), compact it in-place so that the group indices will be 1...N, and return N.\n\n\n\n\n\n","category":"function"},{"location":"groups.html#Daf.Groups.collect_group_members","page":"Groups","title":"Daf.Groups.collect_group_members","text":"collect_group_members(\n group_indices::AbstractVector{T},\n)::Vector{Vector{T}} where {T <: Integer}\n\nGiven an array group_indices which assigns each entry of some axis to a non-negative group index (with zero meaning \"no group\"), where the group indices are compact (in the range 1...N), return a vector of vectors, one for each group, containing the (sorted) indices of the entries of the 
group.\n\n\n\n\n\n","category":"function"},{"location":"groups.html#Index","page":"Groups","title":"Index","text":"","category":"section"},{"location":"groups.html","page":"Groups","title":"Groups","text":"Pages = [\"groups.md\"]","category":"page"},{"location":"generic_logging.html#Generic-Logging","page":"Generic Logging","title":"Generic Logging","text":"","category":"section"},{"location":"generic_logging.html","page":"Generic Logging","title":"Generic Logging","text":"Daf.GenericLogging\nDaf.GenericLogging.setup_logger\nDaf.GenericLogging.@logged","category":"page"},{"location":"generic_logging.html#Daf.GenericLogging","page":"Generic Logging","title":"Daf.GenericLogging","text":"Generic macros and functions for logging, that arguably should belong in a more general-purpose package.\n\nWe do not re-export the macros and functions defined here from the top-level Daf namespace. That is, even if using Daf, you will not have these generic names polluting your namespace. If you do want to reuse them in your code, explicitly write using Daf.GenericLogging.\n\n\n\n\n\n","category":"module"},{"location":"generic_logging.html#Daf.GenericLogging.setup_logger","page":"Generic Logging","title":"Daf.GenericLogging.setup_logger","text":"setup_logger(\n io::IO = stderr;\n [level::LogLevel = Warn,\n show_time::Bool = true,\n show_module::Bool = true,\n show_location::Bool = false]\n)::Nothing\n\nSetup a global logger that will print into io, printing messages with a timestamp prefix.\n\nBy default, this will only print warnings. Note that increasing the log level will apply to everything. 
An alternative is to set up the environment variable JULIA_DEBUG to a comma-separated list of modules you wish to see the debug messages of.\n\nIf show_time, each message will be prefixed with a yyyy-dd-mm HH:MM:SS.sss timestamp prefix.\n\nIf show_module, each message will be prefixed with the name of the module emitting the message.\n\nIf show_location, each message will be prefixed with the file name and the line number emitting the message.\n\nnote: Note\nWhen multi-processing is used, a P