This document describes how to serialize a filesystem and filesystem changes like removed files into a blob called a layer. One or more layers are applied on top of each other to create a complete filesystem. This document will use a concrete example to illustrate how to create and consume these filesystem layers.
This section defines the `application/vnd.oci.image.layer.v1.tar`, `application/vnd.oci.image.layer.v1.tar+gzip`, `application/vnd.oci.image.layer.v1.tar+zstd`, `application/vnd.oci.image.layer.nondistributable.v1.tar`, `application/vnd.oci.image.layer.nondistributable.v1.tar+gzip`, and `application/vnd.oci.image.layer.nondistributable.v1.tar+zstd` media types.
- The media type `application/vnd.oci.image.layer.v1.tar+gzip` represents an `application/vnd.oci.image.layer.v1.tar` payload which has been compressed with gzip.
- The media type `application/vnd.oci.image.layer.nondistributable.v1.tar+gzip` represents an `application/vnd.oci.image.layer.nondistributable.v1.tar` payload (see the deprecation notice) which has been compressed with gzip.
- The media type `application/vnd.oci.image.layer.v1.tar+zstd` represents an `application/vnd.oci.image.layer.v1.tar` payload which has been compressed with zstd.
- The media type `application/vnd.oci.image.layer.nondistributable.v1.tar+zstd` represents an `application/vnd.oci.image.layer.nondistributable.v1.tar` payload (see the deprecation notice) which has been compressed with zstd.
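As an illustration of how the compressed media types relate to the plain tar media type, here is a minimal Python sketch. The function names `split_media_type` and `decompress_layer` are hypothetical and not part of this specification. Only gzip is handled, because zstd may need a library outside the standard library.

```python
import gzip

# The two uncompressed layer media types defined in this section.
LAYER_BASES = (
    "application/vnd.oci.image.layer.v1.tar",
    "application/vnd.oci.image.layer.nondistributable.v1.tar",
)


def split_media_type(media_type: str) -> tuple[str, str]:
    """Split a layer media type into (tar media type, compression name)."""
    base, _, suffix = media_type.partition("+")
    if base not in LAYER_BASES or suffix not in ("", "gzip", "zstd"):
        raise ValueError(f"not a layer media type: {media_type}")
    return base, suffix or "none"


def decompress_layer(blob: bytes, media_type: str) -> bytes:
    """Return the tar payload that a layer blob represents."""
    _, compression = split_media_type(media_type)
    if compression == "none":
        return blob
    if compression == "gzip":
        return gzip.decompress(blob)
    # Assumption of this sketch: zstd support is left to a separate library.
    raise NotImplementedError("zstd decompression is not covered by this sketch")
```

A consumer would typically take the media type from the descriptor that references the blob, not from the blob contents.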
- Layer Changesets for the media type `application/vnd.oci.image.layer.v1.tar` MUST be packaged in a tar archive.
- Layer Changesets for the media type `application/vnd.oci.image.layer.v1.tar` MUST NOT include duplicate entries for file paths in the resulting tar archive.
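The no-duplicates rule can be checked mechanically. The sketch below uses Python's standard `tarfile` module; `find_duplicate_paths` and `build_example_archive` are hypothetical helper names. Treating `./etc/a` and `etc/a` as the same path is an assumption of this sketch, since the text above does not define path normalization.

```python
import io
import posixpath
import tarfile
from collections import Counter


def find_duplicate_paths(layer_bytes: bytes) -> list[str]:
    """Return normalized paths that occur more than once in a tar archive."""
    with tarfile.open(fileobj=io.BytesIO(layer_bytes), mode="r:") as tar:
        # Assumption: "./etc/a" and "etc/a" name the same path.
        names = [posixpath.normpath(member.name) for member in tar.getmembers()]
    counts = Counter(names)
    return sorted(name for name, count in counts.items() if count > 1)


def build_example_archive(paths: list[str]) -> bytes:
    """Build an in-memory tar holding one empty regular file per path."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        for path in paths:
            tar.addfile(tarfile.TarInfo(name=path))
    return buffer.getvalue()
```

A producer could run such a check before publishing a layer and reject the archive if the returned list is not empty.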
Types of changes that can occur in a changeset are:
- Additions
- Modifications
- Removals
Additions and Modifications are represented the same in the changeset tar archive.
Removals are represented using "whiteout" file entries (See Representing Changes).
Throughout this section, the word "files" or "entries" includes the following, where supported:
- regular files
- directories
- sockets
- symbolic links
- block devices
- character devices
- FIFOs
Where supported, the following file attributes MUST be included for Additions and Modifications:
- Modification Time (`mtime`)
- User ID (`uid`)
  - User Name (`uname`) should be ignored on platforms that support User ID (`uid`)
- Group ID (`gid`)
  - Group Name (`gname`) should be ignored on platforms that support Group ID (`gid`)
- Mode (`mode`)
- Extended Attributes (`xattrs`)
- Symlink reference (`linkname` + symbolic link type)
- Hardlink reference (`linkname`)
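To show where these attributes live in a tar entry, here is a small sketch using Python's standard `tarfile` module. The paths, IDs and timestamp are invented for illustration. Storing an extended attribute as a PAX record named `SCHILY.xattr.<name>` follows a convention used by common tar tools and is an assumption here, not something the list above prescribes.

```python
import io
import tarfile


def build_entries() -> bytes:
    """Write a regular file, a symlink and a hardlink with explicit attributes."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w", format=tarfile.PAX_FORMAT) as tar:
        data = b"key=value\n"
        regular = tarfile.TarInfo("etc/my-app-config")
        regular.size = len(data)
        regular.mtime = 1700000000               # modification time
        regular.uid, regular.gid = 1000, 1000    # numeric owner and group
        regular.mode = 0o640
        # Assumption: xattrs are carried as SCHILY.xattr.* PAX records.
        regular.pax_headers = {"SCHILY.xattr.user.comment": "example"}
        tar.addfile(regular, io.BytesIO(data))

        symlink = tarfile.TarInfo("etc/current-config")
        symlink.type = tarfile.SYMTYPE           # symbolic link type
        symlink.linkname = "my-app-config"
        tar.addfile(symlink)

        hardlink = tarfile.TarInfo("etc/my-app-config.link")
        hardlink.type = tarfile.LNKTYPE          # hardlink type
        hardlink.linkname = "etc/my-app-config"
        tar.addfile(hardlink)
    return buffer.getvalue()


def describe(layer: bytes) -> list[tuple]:
    """Read the entries back as (name, type, linkname, mode, uid, mtime, xattrs)."""
    with tarfile.open(fileobj=io.BytesIO(layer), mode="r:") as tar:
        return [
            (m.name, m.type, m.linkname, m.mode, m.uid, m.mtime, dict(m.pax_headers))
            for m in tar
        ]
```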
Sparse files SHOULD NOT be used because they lack consistent support across tar implementations.
- Hardlinks are a POSIX concept for having one or more directory entries for the same file on the same device.
- Not all filesystems support hardlinks (e.g. FAT).
- Hardlinks are possible with all file types except `directories`.
- Non-directory files are considered "hardlinked" when their link count is greater than 1.
- Hardlinked files are on the same device (i.e. comparing Major:Minor pair) and have the same inode.
- The other files that share the link (and so the link count greater than 1) may be outside the directory that the changeset is being produced from, in which case the `linkname` is not recorded in the changeset.
- Union filesystem implementations may have limited or no support for hardlinks, particularly when a change is made to a hardlinked file or a hardlink is created to a file in a lower filesystem. (See the overlay specification for more details.)
- Extracting a layer with hardlink references to files outside of the layer may fail.
- Hardlinks are stored in a tar archive with the type character `1`, per the GNU Basic Tar Format and libarchive tar(5).
- While approaches to deriving new or changed hardlinks may vary, a possible approach is:
    SET LinkMap to map[< Major:Minor String >]map[< inode integer >]< path string >
    SET LinkNames to map[< src path string >]< dest path string >
    FOR each path in root path
      IF path type is directory
        CONTINUE
      ENDIF
      SET filestat to stat(path)
      IF filestat num of links == 1
        CONTINUE
      ENDIF
      IF LinkMap[filestat device][filestat inode] is not empty
        SET LinkNames[path] to LinkMap[filestat device][filestat inode]
      ELSE
        SET LinkMap[filestat device][filestat inode] to path
      ENDIF
    END FOR
With this approach, the link map and link names of a directory could be compared against those of another directory to derive additions and changes to hardlinks.
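The hardlink pseudocode can be sketched in Python roughly as follows. This is one possible rendering, not a reference implementation: `collect_hardlinks` and `demo_hardlinks` are hypothetical names, a `(st_dev, st_ino)` tuple stands in for the nested Major:Minor and inode maps, and entries are visited in sorted order so that the result is deterministic.

```python
import os
import stat
import tempfile


def collect_hardlinks(root: str) -> dict[str, str]:
    """Map each later-seen hardlinked path to the first path seen for its inode."""
    link_map: dict[tuple[int, int], str] = {}  # (device, inode) -> first path
    link_names: dict[str, str] = {}            # later path -> first path
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # keep the traversal order deterministic
        for name in sorted(dirnames + filenames):
            path = os.path.join(dirpath, name)
            filestat = os.lstat(path)
            if stat.S_ISDIR(filestat.st_mode):
                continue  # directories cannot be hardlinked
            if filestat.st_nlink == 1:
                continue  # not hardlinked
            key = (filestat.st_dev, filestat.st_ino)
            rel = os.path.relpath(path, root)
            if key in link_map:
                link_names[rel] = link_map[key]
            else:
                link_map[key] = rel
    return link_names


def demo_hardlinks() -> dict[str, str]:
    """Create a file "a", a hardlink "b" to it, and an unrelated file "c"."""
    with tempfile.TemporaryDirectory() as root:
        first = os.path.join(root, "a")
        with open(first, "w") as handle:
            handle.write("data")
        os.link(first, os.path.join(root, "b"))
        with open(os.path.join(root, "c"), "w") as handle:
            handle.write("single")
        return collect_hardlinks(root)
```

On a filesystem that supports hardlinks, `demo_hardlinks()` returns `{"b": "a"}`: `b` would be written as a hardlink entry whose `linkname` is `a`.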
Implementations on Windows MUST support these additional attributes, encoded in PAX vendor extensions as follows:
- Windows file attributes (`MSWINDOWS.fileattr`)
- Security descriptor (`MSWINDOWS.rawsd`): base64-encoded self-relative binary security descriptor
- Mount points (`MSWINDOWS.mountpoint`): if present on a directory symbolic link, then the link should be created as a directory junction
- Creation time (`LIBARCHIVE.creationtime`)
The initial root filesystem is the base or parent layer.
For this example, an image root filesystem has an initial state as an empty directory. The name of the directory is not relevant to the layer itself, only for the purpose of producing comparisons.
Here is an initial empty directory structure for a changeset, with a unique directory name `rootfs-c9d-v1`.
rootfs-c9d-v1/
Files and directories are then created:
rootfs-c9d-v1/
    etc/
        my-app-config
    bin/
        my-app-binary
        my-app-tools
The `rootfs-c9d-v1` directory is then packaged as a plain tar archive with paths relative to `rootfs-c9d-v1`.
The archive contains entries for the following files:
./
./etc/
./etc/my-app-config
./bin/
./bin/my-app-binary
./bin/my-app-tools
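Creating such an archive can be sketched with Python's standard `tarfile` module. The helper names `create_base_layer`, `list_entries` and `demo_base_layer` are hypothetical. Passing `arcname="."` is what makes every entry path relative to the root directory.

```python
import io
import os
import tarfile
import tempfile


def create_base_layer(rootfs: str) -> bytes:
    """Pack a directory as an uncompressed tar with paths relative to it."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w", format=tarfile.PAX_FORMAT) as tar:
        # arcname="." stores the root as "./" and its children as "./etc", ...
        tar.add(rootfs, arcname=".")
    return buffer.getvalue()


def list_entries(layer: bytes) -> list[str]:
    """List entry names, writing directories with a trailing slash."""
    with tarfile.open(fileobj=io.BytesIO(layer), mode="r:") as tar:
        return sorted(m.name + ("/" if m.isdir() else "") for m in tar.getmembers())


def demo_base_layer() -> list[str]:
    """Build the example root filesystem in a temporary directory and pack it."""
    with tempfile.TemporaryDirectory() as tmp:
        rootfs = os.path.join(tmp, "rootfs-c9d-v1")
        for rel in ("etc/my-app-config", "bin/my-app-binary", "bin/my-app-tools"):
            path = os.path.join(rootfs, rel)
            os.makedirs(os.path.dirname(path), exist_ok=True)
            with open(path, "w") as handle:
                handle.write("example")
        return list_entries(create_base_layer(rootfs))
```

The directory name `rootfs-c9d-v1` does not appear in any entry, which matches the statement that the name is not relevant to the layer itself.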
Create a new directory and initialize it with a copy or snapshot of the prior root filesystem. Example commands that can preserve file attributes to make this copy are:
- cp(1): `cp -a rootfs-c9d-v1/ rootfs-c9d-v1.s1/`
- rsync(1): `rsync -aHAX rootfs-c9d-v1/ rootfs-c9d-v1.s1/`
- tar(1): `mkdir rootfs-c9d-v1.s1 && tar --acls --xattrs -C rootfs-c9d-v1/ -c . | tar -C rootfs-c9d-v1.s1/ --acls --xattrs -x` (including `--selinux` where supported)
Any changes to the snapshot MUST NOT change or affect the directory it was copied from.
For example, `rootfs-c9d-v1.s1` is an identical snapshot of `rootfs-c9d-v1`.
In this way `rootfs-c9d-v1.s1` is prepared for updates and alterations.
Implementor's Note: a copy-on-write or union filesystem can efficiently make directory snapshots.
Initial layout of the snapshot:
rootfs-c9d-v1.s1/
    etc/
        my-app-config
    bin/
        my-app-binary
        my-app-tools
See Change Types for more details on changes.
For example, add a directory at `/etc/my-app.d` containing a default config file, and remove the existing config file.
Also change (in attribute or file content) the `./bin/my-app-tools` binary to handle the config layout change.
Following these changes, the `rootfs-c9d-v1.s1` directory looks like this:
rootfs-c9d-v1.s1/
    etc/
        my-app.d/
            default.cfg
    bin/
        my-app-binary
        my-app-tools
When two directories are compared, the relative root is the top-level directory. The directories are compared, looking for files that have been added, modified, or removed.
For this example, `rootfs-c9d-v1/` and `rootfs-c9d-v1.s1/` are recursively compared, each as the relative root path.
The following changeset is found:
Added: /etc/my-app.d/
Added: /etc/my-app.d/default.cfg
Modified: /bin/my-app-tools
Deleted: /etc/my-app-config
This reflects the removal of `/etc/my-app-config` and creation of a file and directory at `/etc/my-app.d/default.cfg`.
`/bin/my-app-tools` has also been replaced with an updated version.
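One way to derive such a changeset is to scan both trees and compare the entries path by path. The sketch below is a simplified illustration under stated assumptions: the names `scan`, `fingerprint`, `diff_trees` and `demo_changeset` are hypothetical, a file counts as modified when its mode, owner, group, size or modification time differs (a content change that preserves all of these would be missed), extended attributes are not compared, and directories are reported without a trailing slash.

```python
import os
import posixpath
import shutil
import stat
import tempfile


def scan(root: str) -> dict:
    """Map each path under root (written as /relative/path) to its lstat result."""
    entries = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            rel = "/" + os.path.relpath(full, root).replace(os.sep, "/")
            entries[rel] = os.lstat(full)
    return entries


def fingerprint(st: os.stat_result) -> tuple:
    """Attributes compared to decide whether an entry was modified."""
    if stat.S_ISDIR(st.st_mode):
        # A directory's size and mtime change when its children change,
        # so only its own mode and ownership are compared.
        return (st.st_mode, st.st_uid, st.st_gid)
    return (st.st_mode, st.st_uid, st.st_gid, st.st_size, st.st_mtime_ns)


def diff_trees(old_root: str, new_root: str) -> dict[str, list[str]]:
    """Compare two trees, each taken as the relative root path."""
    old, new = scan(old_root), scan(new_root)
    gone = set(old) - set(new)
    common = set(old) & set(new)
    return {
        "added": sorted(set(new) - set(old)),
        "modified": sorted(p for p in common if fingerprint(old[p]) != fingerprint(new[p])),
        # Report only the topmost deleted path; its children are implied.
        "deleted": sorted(p for p in gone if posixpath.dirname(p) not in gone),
    }


def demo_changeset() -> dict[str, list[str]]:
    """Recreate the example: snapshot the base tree, change it, then compare."""
    with tempfile.TemporaryDirectory() as tmp:
        old = os.path.join(tmp, "rootfs-c9d-v1")
        new = os.path.join(tmp, "rootfs-c9d-v1.s1")
        for rel in ("etc/my-app-config", "bin/my-app-binary", "bin/my-app-tools"):
            path = os.path.join(old, rel)
            os.makedirs(os.path.dirname(path), exist_ok=True)
            with open(path, "w") as handle:
                handle.write("v1")
        shutil.copytree(old, new, symlinks=True)  # preserves mode and mtime
        os.remove(os.path.join(new, "etc", "my-app-config"))
        os.makedirs(os.path.join(new, "etc", "my-app.d"))
        with open(os.path.join(new, "etc", "my-app.d", "default.cfg"), "w") as handle:
            handle.write("default")
        with open(os.path.join(new, "bin", "my-app-tools"), "w") as handle:
            handle.write("version 2")
        return diff_trees(old, new)
```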
A tar archive is then created which contains only this changeset:
- Added and modified files and directories in their entirety
- Deleted files or directories marked with a whiteout file
The resulting tar archive for `rootfs-c9d-v1.s1` has the following entries:
./etc/my-app.d/
./etc/my-app.d/default.cfg
./bin/my-app-tools
./etc/.wh.my-app-config
To signify that the resource `./etc/my-app-config` MUST be removed when the changeset is applied, the basename of the entry is prefixed with `.wh.`.
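Turning a changeset into archive entry names can be sketched as below. `whiteout_path` and `changeset_entries` are hypothetical helpers. They only compute names; writing the entries into a tar archive (with each whiteout as an empty regular file) is left out.

```python
import posixpath

WHITEOUT_PREFIX = ".wh."


def whiteout_path(deleted_path: str) -> str:
    """Return the whiteout entry name for a deleted path."""
    parent, base = posixpath.split(deleted_path.rstrip("/"))
    return posixpath.join(parent, WHITEOUT_PREFIX + base)


def changeset_entries(added: list[str], modified: list[str], deleted: list[str]) -> list[str]:
    """List archive entry names: changed paths as-is, deletions as whiteouts."""
    entries = ["." + path for path in added + modified]
    entries += ["." + whiteout_path(path) for path in deleted]
    return entries
```

With the changeset found above, `changeset_entries(["/etc/my-app.d/", "/etc/my-app.d/default.cfg"], ["/bin/my-app-tools"], ["/etc/my-app-config"])` yields the four entries listed for `rootfs-c9d-v1.s1`.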
- Layer Changesets of media type `application/vnd.oci.image.layer.v1.tar` are applied, rather than simply extracted as tar archives.
- Applying a layer changeset requires special consideration for the whiteout files.
- In the absence of any whiteout files in a layer changeset, the archive is extracted like a regular tar archive.
This section specifies applying an entry from a layer changeset if the target path already exists.
If the entry and the existing path are both directories, then the existing path's attributes MUST be replaced by those of the entry in the changeset. In all other cases, the implementation MUST do the semantic equivalent of the following:
- removing the file path (e.g. `unlink(2)` on Linux systems)
- recreating the file path, based on the contents and attributes of the changeset entry
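The rule for existing paths can be sketched as follows. `replace_path` and `demo_replace` are hypothetical names, and the sketch handles only regular files and directories with `mode` as the single attribute; a real implementation would also restore ownership, timestamps, extended attributes and other file types.

```python
import os
import shutil
import tempfile


def replace_path(target: str, is_dir: bool, mode: int, content: bytes = b"") -> None:
    """Apply one changeset entry onto a path that may already exist."""
    exists = os.path.lexists(target)
    existing_is_dir = os.path.isdir(target) and not os.path.islink(target)
    if exists and is_dir and existing_is_dir:
        # Directory over directory: keep the children, replace attributes only.
        os.chmod(target, mode)
        return
    if exists:
        # Every other combination: remove the old path first ...
        if existing_is_dir:
            shutil.rmtree(target)
        else:
            os.unlink(target)
    # ... then recreate it from the changeset entry.
    if is_dir:
        os.mkdir(target)
    else:
        with open(target, "wb") as handle:
            handle.write(content)
    os.chmod(target, mode)


def demo_replace() -> tuple:
    """Exercise directory-over-directory, file-over-file and file-over-directory."""
    with tempfile.TemporaryDirectory() as tmp:
        directory = os.path.join(tmp, "etc")
        os.mkdir(directory)
        child = os.path.join(directory, "keep")
        with open(child, "wb") as handle:
            handle.write(b"old")
        replace_path(directory, is_dir=True, mode=0o700)
        child_kept = os.path.exists(child)
        replace_path(child, is_dir=False, mode=0o600, content=b"new")
        with open(child, "rb") as handle:
            data = handle.read()
        replace_path(directory, is_dir=False, mode=0o644, content=b"x")
        return child_kept, data, os.path.isfile(directory)
```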
- A whiteout file is an empty file with a special filename that signifies a path should be deleted.
- A whiteout filename consists of the prefix `.wh.` plus the basename of the path to be deleted.
- As files prefixed with `.wh.` are special whiteout markers, it is not possible to create a filesystem which has a file or directory with a name beginning with `.wh.`.
- Once a whiteout is applied, the whiteout itself MUST also be hidden.
- Whiteout files MUST only apply to resources in lower/parent layers.
- Files that are present in the same layer as a whiteout file can only be hidden by whiteout files in subsequent layers.
The following is a base layer with several resources:
file1
a/file2
b/
c/file3
If we then delete `file1`, `file2`, and `b/`, while leaving `file3` and adding `file4`, the next layer looks like:
.wh.file1
a/.wh.file2
.wh.b
file4
Note that regardless of the path being deleted, the whiteout file is a regular file in the archive.
Implementations SHOULD generate layers such that the whiteout files appear before sibling directory entries.
- In addition to expressing that a single entry should be removed from a lower layer, layers may remove all of the children using an opaque whiteout entry.
- An opaque whiteout entry is a file with the name `.wh..wh..opq` indicating that all siblings are hidden in the lower layer.
Let's take the following base layer as an example:
etc/
    my-app-config
bin/
    my-app-binary
    my-app-tools
    tools/
        my-app-tool-one
If all children of `bin/` are removed, the next layer would have the following:
bin/
    .wh..wh..opq
This is called opaque whiteout format.
An opaque whiteout file hides all children of `bin/`, including sub-directories and all descendants.
Using explicit whiteout files, this would be equivalent to the following:
bin/
    .wh.my-app-binary
    .wh.my-app-tools
    .wh.tools
In this case, a unique whiteout file is generated for each entry.
If there were more children of `bin/` in the base layer, there would be an entry for each.
Note that this opaque file will apply to all children, including sub-directories, other resources and all descendants.
Implementations SHOULD generate layers using explicit whiteout files, but MUST accept both.
As another example, consider the following base layer:
a/
a/b/
a/b/c/
a/b/c/bar
When the next layer is created, the original `a/b` directory is deleted and recreated with `a/b/c/foo`:
a/
a/.wh..wh..opq
a/b/
a/b/c/
a/b/c/foo
When processing the second layer, `a/.wh..wh..opq` is applied first, before creating the new version of `a/b`, regardless of the ordering in which the whiteout file was encountered.
For example, the following layer is equivalent to the layer above:
a/
a/b/
a/b/c/
a/b/c/foo
a/.wh..wh..opq
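The whiteout rules can be illustrated with a small model in which a filesystem is just a set of paths. This is a sketch under simplifying assumptions: `apply_layer` and `is_under` are hypothetical names, paths carry no contents or attributes, and a layer is given as a list of entry names. All whiteouts are processed in a first pass, which makes the result independent of where the whiteout entries appear in the archive and keeps them from hiding entries of the same layer.

```python
import posixpath

WHITEOUT_PREFIX = ".wh."
OPAQUE_MARKER = ".wh..wh..opq"


def is_under(path: str, directory: str) -> bool:
    """True if path is a descendant of directory ("" stands for the root)."""
    return directory == "" or path.startswith(directory + "/")


def apply_layer(lower: set[str], layer: list[str]) -> set[str]:
    """Apply a layer's entry names on top of the paths from lower layers."""
    names = [entry.rstrip("/") for entry in layer]
    result = set(lower)
    # Pass 1: whiteouts act only on paths inherited from lower layers.
    for name in names:
        parent, base = posixpath.split(name)
        if base == OPAQUE_MARKER:
            # Opaque whiteout: hide every descendant of the parent directory.
            result = {p for p in result if not is_under(p, parent)}
        elif base.startswith(WHITEOUT_PREFIX):
            # Explicit whiteout: hide one path and everything below it.
            target = posixpath.join(parent, base[len(WHITEOUT_PREFIX):])
            result = {p for p in result if p != target and not is_under(p, target)}
    # Pass 2: add this layer's own entries; whiteout files stay hidden.
    for name in names:
        if not posixpath.basename(name).startswith(WHITEOUT_PREFIX):
            result.add(name)
    return result
```

Applied to the base layer `{"a", "a/b", "a/b/c", "a/b/c/bar"}`, both orderings of the second layer shown above give `{"a", "a/b", "a/b/c", "a/b/c/foo"}`.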
Any given image is likely to be composed of several of these Image Filesystem Changeset tar archives.
NOTE: Non-distributable layers are deprecated, and not recommended for future use. Implementations SHOULD NOT produce new non-distributable layers. Implementations are expected to support preexisting images with non-distributable layers.
Due to legal requirements, certain layers may not be regularly distributable. Such "non-distributable" layers are typically downloaded directly from a distributor but never uploaded.
Non-distributable layers SHOULD be tagged with an alternative media type of `application/vnd.oci.image.layer.nondistributable.v1.tar`, `application/vnd.oci.image.layer.nondistributable.v1.tar+gzip`, or `application/vnd.oci.image.layer.nondistributable.v1.tar+zstd`.
Implementations SHOULD NOT upload layers tagged with this media type; however, such a media type SHOULD NOT affect whether an implementation downloads the layer.
Descriptors referencing non-distributable layers MAY include `urls` for downloading these layers directly; however, the presence of the `urls` field SHOULD NOT be used to determine whether or not a layer is non-distributable.
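A minimal sketch of this policy, assuming a descriptor is available as a Python dictionary with a `mediaType` key (the dictionary shape, the example URL and the function names are illustrative assumptions):

```python
NONDISTRIBUTABLE_BASE = "application/vnd.oci.image.layer.nondistributable.v1.tar"


def is_nondistributable(descriptor: dict) -> bool:
    """Decide from the media type alone; the urls field is deliberately ignored."""
    media_type = descriptor.get("mediaType", "")
    return media_type == NONDISTRIBUTABLE_BASE or media_type.startswith(NONDISTRIBUTABLE_BASE + "+")


def should_upload(descriptor: dict) -> bool:
    """Layers tagged as non-distributable are not uploaded."""
    return not is_nondistributable(descriptor)


def should_download(descriptor: dict) -> bool:
    """The media type does not affect whether a layer is downloaded."""
    return True
```

For example, a descriptor with media type `application/vnd.oci.image.layer.v1.tar+gzip` that happens to carry a `urls` list is still treated as distributable.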