GitHub is NOT the preferred viewer for this file. Please visit https://flux-framework.rtfd.io/projects/flux-rfc/en/latest/spec_37.html
The File Archive Format defines a JSON representation of a set or list of file system objects.
Name | github.com/flux-framework/rfc/spec_37.rst |
Editor | Jim Garlick <[email protected]> |
State | raw |
The File Archive Format is a container of file system objects envisioned for:
- Batch scripts, configuration files, and user defined input files embedded in RFC 14 jobspec.
- Stage-in and stage-out of job data sets.
The format is designed to meet the following goals:
- Allow an archive to be defined as either a set or a list, as appropriate for the circumstances.
- Represent regular files, directories, and symbolic links.
- Support self-contained representation of file content.
- Support JSON file content without encoding.
- Avoid requiring metadata that is not essential for reconstructing the file system object.
- Enable the distributed associative cache described in RFC 10 to be leveraged for efficient file broadcast.
- Enable efficient representation of sparse files.
An archive SHALL consist of either a JSON array or a JSON dictionary of file system objects. File system objects MAY represent regular files, directories, or symbolic links.
The following keys are REQUIRED in file system objects:
- mode
- (integer) The file type and permissions encoded as defined for the st_mode member of the POSIX stat structure [1].
- path
- (string) The UNIX path to the file system object. Paths SHALL NOT contain . or .. path components and SHALL NOT begin with a /. When the archive container is a dictionary, the path is derived from the dictionary key and SHALL NOT appear in the file system object.
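The path rules and the two container forms can be checked mechanically. The following Python sketch is illustrative only: iter_entries and check_path are hypothetical helper names, not part of any Flux API, and the sketch assumes the rule on . and .. applies to whole path components.

```python
def check_path(path):
    """Raise ValueError if path breaks the rules above (as read by this sketch)."""
    if path.startswith("/"):
        raise ValueError(f"path is absolute: {path!r}")
    for component in path.split("/"):
        if component in (".", ".."):
            raise ValueError(f"path contains {component!r}: {path!r}")
    return path


def iter_entries(archive):
    """Yield (path, object) pairs from a list-style or dictionary-style archive."""
    if isinstance(archive, dict):
        # Dictionary form: the key is the path, and objects carry no path key.
        for path, obj in archive.items():
            if "path" in obj:
                raise ValueError("path key not allowed in a dictionary archive")
            yield check_path(path), obj
    elif isinstance(archive, list):
        # List form: every object names its own path.
        for obj in archive:
            yield check_path(obj["path"]), obj
    else:
        raise TypeError("archive must be a JSON array or dictionary")


as_list = [{"path": "appdata/phase1", "mode": 16893}]
as_dict = {"appdata/phase1": {"mode": 16893}}
paths_from_list = [p for p, _ in iter_entries(as_list)]
paths_from_dict = [p for p, _ in iter_entries(as_dict)]
```

Both forms yield the same path here, so downstream code in this sketch does not need to know which container was used.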
The following keys are OPTIONAL in file system objects:
- mtime
- (integer) The last data modification timestamp in seconds since the Epoch.
- ctime
- (integer) The last file status change timestamp in seconds since the Epoch.
- size
- (integer) The size of the regular file in bytes.
- encoding
- (string) The encoding type used in data for regular files. Possible values are utf-8, base64, blobvec.
- data
- Regular file content (see below) or symbolic link target (string).
A directory object SHALL be represented as empty in the archive, thus the size, data, and encoding fields SHALL NOT be present in a directory object.
Example:
{
"path":"appdata/phase1",
"mode":16893,
"mtime":1677604007,
"ctime":1677604007
}
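The mode value packs the file type and the permission bits into one integer. As a rough illustration, Python's standard stat module can decode it; describe_mode is a hypothetical helper name, and the sketch assumes the platform uses the conventional st_mode bit layout.

```python
import stat


def describe_mode(mode):
    """Return (kind, permission string) for an st_mode-style integer."""
    if stat.S_ISDIR(mode):
        kind = "directory"
    elif stat.S_ISLNK(mode):
        kind = "symlink"
    elif stat.S_ISREG(mode):
        kind = "regular"
    else:
        kind = "unsupported"  # not one of the three object types in this format
    return kind, stat.filemode(mode)


print(oct(16893))            # 0o40775
print(describe_mode(16893))  # ('directory', 'drwxrwxr-x')
```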
A symbolic link object SHALL store the link target in the data field as a UTF-8 string. The size and encoding fields SHALL NOT be present in a symbolic link object.
Example:
{
"path":"src",
"mode":41471,
"data":"/users/fred/work/project"
}
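Unarchiving a symbolic link object amounts to creating a link whose target is the data string. A minimal sketch, assuming a POSIX system; make_symlink is a hypothetical helper name. The permission bits in mode are not applied here, since Linux among other systems ignores them for symbolic links.

```python
import os
import tempfile


def make_symlink(entry, root):
    """Create the symbolic link described by entry under the directory root."""
    link_path = os.path.join(root, entry["path"])
    os.symlink(entry["data"], link_path)  # the target need not exist
    return link_path


entry = {"path": "src", "mode": 41471, "data": "/users/fred/work/project"}
with tempfile.TemporaryDirectory() as root:
    link = make_symlink(entry, root)
    target = os.readlink(link)
```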
Regular files are represented as follows.
An empty regular file (zero length or sparse with no data) SHALL be represented with size set to the file size and no encoding or data fields.
Example:
{
"path":"data/empty",
"mode":33204,
"size":0,
"mtime":1677604909,
"ctime":1677604909
}
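Because an empty regular file carries only its size, it can be recreated without writing any data. The sketch below is one possible approach on a POSIX system, not a prescribed method; make_empty_file is a hypothetical helper name, and the second entry is a made-up sparse file used only for illustration.

```python
import os
import tempfile


def make_empty_file(entry, root):
    """Create a file of entry['size'] bytes without writing any data."""
    path = os.path.join(root, entry["path"])
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        # Extending with truncate leaves a hole on file systems that support it.
        f.truncate(entry["size"])
    os.chmod(path, entry["mode"] & 0o7777)  # keep only the permission bits
    return path


entries = [
    {"path": "data/empty", "mode": 33204, "size": 0},
    {"path": "data/hole", "mode": 33204, "size": 1048576},  # hypothetical sparse file
]
with tempfile.TemporaryDirectory() as root:
    sizes = [os.path.getsize(make_empty_file(e, root)) for e in entries]
```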
A regular file with JSON content MAY be represented without encoding. In this case, size and encoding SHALL NOT be set and data SHALL be set to any JSON value, array, or object. When such a file is unarchived, its content SHALL be a faithful JSON encoding but MAY vary in other ways including file size.
Example:
{
"path":"config.json",
"mode":33204,
"data":{
"resource":{
"exclude":"node42"
}
}
}
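The allowance for the unarchived content to vary can be seen by serializing the same data two ways. This is an illustrative Python sketch; json_file_content is a hypothetical helper name and the choice of serializer settings is arbitrary.

```python
import json

entry = {
    "path": "config.json",
    "mode": 33204,
    "data": {"resource": {"exclude": "node42"}},
}


def json_file_content(entry, **dump_options):
    """Serialize the data of an unencoded entry to the bytes written on extraction."""
    if "encoding" in entry or "size" in entry:
        raise ValueError("unencoded JSON entries carry neither encoding nor size")
    return json.dumps(entry["data"], **dump_options).encode("utf-8")


compact = json_file_content(entry, separators=(",", ":"))
pretty = json_file_content(entry, indent=2)

# Both files decode to the same JSON value even though their sizes differ.
same_value = json.loads(compact) == json.loads(pretty) == entry["data"]
different_size = len(compact) != len(pretty)
```

This is why size is omitted for such entries: the byte count depends on how the unarchiving tool chooses to format the JSON.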
A regular file containing text MAY be represented with UTF-8 encoding. In this case, size SHALL be set to the file size, encoding SHALL be set to utf-8, and data SHALL be set to a UTF-8 string.
Example:
{
"path":"data.csv",
"mode":33204,
"encoding":"utf-8",
"data":"iteration,density\n1,35435.555\n2,356655.332\n3,5454545.500\n",
"size":57
}
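Note that size is the file size in bytes, which can differ from the number of characters in data when the text is not pure ASCII. A small Python sketch of building such an object follows; utf8_entry is a hypothetical helper name.

```python
def utf8_entry(path, mode, text):
    """Build a utf-8 encoded file object; size counts encoded bytes."""
    return {
        "path": path,
        "mode": mode,
        "encoding": "utf-8",
        "data": text,
        "size": len(text.encode("utf-8")),
    }


csv_text = "iteration,density\n1,35435.555\n2,356655.332\n3,5454545.500\n"
entry = utf8_entry("data.csv", 33204, csv_text)
print(entry["size"])  # 57
```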
A regular file that requires a self-contained representation in the archive and whose content is unknown SHALL be represented with base64 encoding. In this case, size SHALL be set to the file size, encoding SHALL be set to base64, and data SHALL be set to a base64 string.
Example:
{
"path":"vectors.dat",
"mode":33204,
"encoding":"base64",
"data":"MzU0MzUuNTU1CjIsMzU2NjU1LjMzMgozLDU0NTQ1NDUuNTAwCg==",
"size":37
}
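For base64 entries, size refers to the decoded file, not to the length of the data string. The sketch below decodes the example and checks that relationship; decode_base64_entry is a hypothetical helper name.

```python
import base64


def decode_base64_entry(entry):
    """Return the file content of a base64 entry, checking it against size."""
    content = base64.b64decode(entry["data"], validate=True)
    if len(content) != entry["size"]:
        raise ValueError(
            f"decoded {len(content)} bytes, but size is {entry['size']}"
        )
    return content


entry = {
    "path": "vectors.dat",
    "mode": 33204,
    "encoding": "base64",
    "data": "MzU0MzUuNTU1CjIsMzU2NjU1LjMzMgozLDU0NTQ1NDUuNTAwCg==",
    "size": 37,
}
content = decode_base64_entry(entry)
```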
A regular file that requires content to be referenced to the associative cache described in RFC 10 SHALL be represented with blobvec encoding. In this case, size is set to the file size, encoding is set to blobvec, and data SHALL be set to an array of 3-tuples representing file regions.
Each region is an array of three REQUIRED values:
- offset
- (integer) region starting byte
- size
- (integer) size of the region in bytes
- blobref
- (string) RFC 10 blobref string
Example:
{
"path": "kernel8.img",
"size": 8194604,
"mtime": 1674520056,
"ctime": 1674520057,
"mode": 33261,
"encoding":"blobvec",
"data": [
[0, 1048576, "sha1-d4a09c5dd5a0d2d570066b6f13e465c73c3f9944"],
[1048576, 1048576, "sha1-3eb8716208bc606a28948e2cf2fcce113e22b202"],
[2097152, 1048576, "sha1-d7cc175e14044e9d9c02d908e4df4bcf80788bc9"],
[3145728, 1048576, "sha1-34ce5050ff615ee4e2712a1f1e5b3d3df5ae6072"],
[4194304, 1048576, "sha1-d79525827b6f326ac3d731764ee2d088bc2e5fec"],
[5242880, 1048576, "sha1-ae1c6b3cb8eba86241fc4a761ee393dd22b833a7"],
[6291456, 1048576, "sha1-289585f4d0c26db7ae98ecb36c04393ff32cabeb"],
[7340032, 854572, "sha1-649d3449aa52ac46e19dc894360409d6abbeb882"]
]
}
Note
Only blobvec encoding is capable of representing non-empty sparse files.
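Reconstruction from a blobvec can be sketched as writing each region at its offset and leaving the gaps between regions unwritten. The Python below is a sketch under stated assumptions, not a reference implementation: load_blob is a hypothetical callback standing in for a lookup in the RFC 10 cache, make_blobref is a hypothetical helper that mimics the sha1-<hex> shape seen in the example, and sparse.dat is made-up test data.

```python
import hashlib
import tempfile


def make_blobref(blob):
    """Hypothetical helper: name a blob in the sha1-<hex> form seen above."""
    return "sha1-" + hashlib.sha1(blob).hexdigest()


def extract_blobvec(entry, load_blob, out):
    """Write each region at its offset; unwritten gaps remain holes."""
    for offset, size, blobref in entry["data"]:
        if offset < 0 or size < 0 or offset + size > entry["size"]:
            raise ValueError(f"region [{offset}, {size}] lies outside the file")
        blob = load_blob(blobref)
        if len(blob) != size:
            raise ValueError(f"{blobref} has {len(blob)} bytes, expected {size}")
        out.seek(offset)
        out.write(blob)
    out.truncate(entry["size"])  # covers a hole at the end of the file


# Build a small in-memory store and a sparse entry with a hole in the middle.
store = {}
regions = []
for offset, blob in [(0, b"head"), (8192, b"tail")]:
    ref = make_blobref(blob)
    store[ref] = blob
    regions.append([offset, len(blob), ref])
entry = {"path": "sparse.dat", "mode": 33204, "size": 8196,
         "encoding": "blobvec", "data": regions}

with tempfile.TemporaryFile() as out:
    extract_blobvec(entry, store.__getitem__, out)
    out.seek(0)
    content = out.read()
```

Reading the hole back returns zero bytes, while a file system that supports sparse files need not allocate storage for it.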
[1] sys/stat.h - data returned by the stat() function; The Open Group Base Specifications Issue 7, 2018 edition, IEEE Std 1003.1-2017 (Revision of IEEE Std 1003.1-2008)