Update format documentation in the manual
Make it (mostly) up to date with RPMv4 standards.
Also fix some broken links, and mark old signature tags as deprecated.
dralley committed Jan 6, 2024
1 parent 647af12 commit b891bfc
Showing 3 changed files with 120 additions and 116 deletions.
193 changes: 98 additions & 95 deletions docs/manual/format.md
Expand Up @@ -2,13 +2,13 @@
layout: default
title: rpm.org - RPM Package format
---

# Package format

This document describes the RPM file format version 3.0, which is used
by RPM versions 2.1 and greater. The format is subject to change, and
you should not assume that this document is kept up to date with the
latest RPM code. That said, the 3.0 format should not change for
quite a while, and when it does, it will not be 3.0 anymore :-).
This document describes the RPM file format version 4.0. The format is subject
to change, and you should not assume that this document is kept up to date with
the latest RPM code. With that said, the basic principles have not changed and
are not likely to change significantly over time.

\warning In any case, THE PROPER WAY TO ACCESS THESE STRUCTURES IS THROUGH
THE RPM LIBRARY!!
Expand All @@ -23,17 +23,20 @@ package file is divided in 4 logical sections:
. Payload -- compressed archive of the file(s) in the package (aka "payload")
```

All 2 and 4 byte "integer" quantities (int16 and int32) are stored in
network byte order. When data is presented, the first number is the
byte number, or address, in hex, followed by the byte values in hex,
followed by character "translations" (where appropriate).
All 2 and 4 byte "integer" quantities (int16 and int32) are stored in network
byte order (big-endian). When data is presented, the first number is the byte
number, or address, in hex, followed by the byte values in hex, followed by
character "translations" (where appropriate).
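As a small illustration of the byte-order rule, the following Python sketch decodes big-endian int16 and int32 values with the standard `struct` module. The helper names `read_int16` and `read_int32` are hypothetical, chosen only for this example; the byte values are taken from the hex dumps in this document.

```python
import struct


def read_int16(data: bytes, offset: int = 0) -> int:
    """Decode a 2-byte integer stored in network (big-endian) byte order."""
    return struct.unpack_from(">h", data, offset)[0]


def read_int32(data: bytes, offset: int = 0) -> int:
    """Decode a 4-byte integer stored in network (big-endian) byte order."""
    return struct.unpack_from(">i", data, offset)[0]


# "00 05" is the signature type found in the Lead;
# "00 09 9b 31" is the SIZE value from the header data section.
sig_type = read_int16(bytes.fromhex("0005"))
size = read_int32(bytes.fromhex("00099b31"))
```

The `>` prefix in the format string selects big-endian decoding regardless of the host's native byte order.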

## Lead

The Lead is basically for file(1). All the information contained in
the Lead is duplicated or superseded by information in the Header.
Much of the info in the Lead was used in old versions of RPM but is
now ignored. The Lead is stored as a C structure:
The Lead is basically for file(1). All the information contained in the Lead
is duplicated or superseded by information in the Header. Much of the info in
the Lead was used in old versions of RPM but is now ignored. The details here
are left for historical reasons, but current and future development should
use the Header structure instead.

The Lead is stored as a C structure:

\code
struct rpmlead {
Expand All @@ -48,31 +51,31 @@ struct rpmlead {
};
\endcode

and is illustrated with one pulled from the rpm-2.1.2-1.i386.rpm
package:
and is illustrated with one pulled from the rpm-2.1.2-1.i386.rpm package:

```
00000000: ed ab ee db 03 00 00 00
```

The first 4 bytes (0-3) are "magic" used to uniquely identify an RPM
package. It is used by RPM and file(1). The next two bytes (4, 5)
are int8 quantities denoting the "major" and "minor" RPM file format
version. This package is in 3.0 format. The following 2 bytes (6-7)
form an int16 which indicates the package type. As of this writing
there are only two types: 0 == binary, 1 == source.
The first 4 bytes (0-3) are the "magic" number used to uniquely identify a file
as an RPM package. It is used by RPM and file(1). The next two bytes (4, 5)
are int8 quantities denoting the "major" and "minor" RPM file format version.
For legacy reasons, this version is always "3.0" (major version "3", minor
version "0"), even with packages built by RPM 4.0+ (referred to as RPM v4
packages). The following 2 bytes (6-7) form an int16 which indicates the
package type. As of this writing there are only two types: 0 == binary,
1 == source.

```
00000008: 00 01 72 70 6d 2d 32 2e ..rpm-2.
```

The next two bytes (8-9) form an int16 that indicates the architecture
the package was built for. While this is used by file(1), the true
architecture is stored as a string in the Header. See, lib/misc.c for
a list of architecture->int16 translations. In this case, 1 == i386.
Starting with byte 10 and extending to byte 75, are 65 characters and
a null byte which contain the familiar "name-version-release" of the
package, padded with null (0) bytes.
The next two bytes (8-9) form an int16 that indicates the architecture that the
package was built for. While this is used by file(1), the true architecture
is stored as a string in the Header. In this case, 1 == i386. Bytes 10 through
75 hold the familiar "name-version-release" of the package: up to 65 characters
followed by a null byte, with any unused space padded with null (0) bytes.

```
00000010: 31 2e 32 2d 31 00 00 00 1.2-1...
Expand All @@ -85,88 +88,72 @@ package, padded with null (0) bytes.
00000048: 00 00 00 00 00 01 00 05 ........
```

Bytes 76-77 ("00 01" above) form an int16 that indicates the OS the
package was built for. In this case, 1 == Linux. The next 2 bytes
(78-79) form an int16 that indicates the signature type. This tells
RPM what to expect in the Signature. For version 3.0 packages, this
is 5, which indicates the new "Header-style" signatures.
Bytes 76-77 ("00 01" above) form an int16 that indicates the OS the package was
built for. In this case, 1 == Linux. The next 2 bytes (78-79) form an int16
that indicates the signature type. This tells RPM what to expect in the
Signature. This is generally expected to be 5, which indicates the use of
"Header-style" signatures.

```
00000050: 04 00 00 00 68 e6 ff bf ........
00000058: ab ad 00 08 3c eb ff bf ........
```

The remaining 16 bytes (80-95) are currently unused and are reserved
for future expansion.
The remaining 16 bytes (80-95) are unused.
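The byte-by-byte walkthrough above can be sketched as a single `struct` format string. This is a minimal illustration, not the RPM library's own code: the `Lead` tuple, the `parse_lead` function and the field names are assumptions made for the example, and the 96-byte layout is inferred from the offsets described in the text. As the warning at the top says, real tools should go through the RPM library.

```python
import struct
from typing import NamedTuple

# Assumed layout, following the walkthrough above (all big-endian):
# magic(4) major(1) minor(1) type(2) arch(2) name(66) os(2) sigtype(2) reserved(16)
LEAD_FORMAT = ">4sBBhh66shh16s"
LEAD_MAGIC = bytes.fromhex("edabeedb")


class Lead(NamedTuple):
    major: int
    minor: int
    package_type: int    # 0 == binary, 1 == source
    archnum: int
    name: str
    osnum: int
    signature_type: int  # 5 == "Header-style" signatures


def parse_lead(data: bytes) -> Lead:
    """Decode the 96-byte Lead at the start of a package file."""
    if len(data) < struct.calcsize(LEAD_FORMAT):
        raise ValueError("Lead is truncated")
    (magic, major, minor, ptype, arch,
     raw_name, osnum, sigtype, _reserved) = struct.unpack_from(LEAD_FORMAT, data)
    if magic != LEAD_MAGIC:
        raise ValueError("bad magic: not an RPM package")
    # The name field is null-padded; keep only the part before the first null.
    name = raw_name.split(b"\0", 1)[0].decode("ascii", "replace")
    return Lead(major, minor, ptype, arch, name, osnum, sigtype)


# Rebuild the example Lead from the hex dumps shown above.
example = (
    bytes.fromhex("edabeedb 0300 0000 0001")
    + b"rpm-2.1.2-1".ljust(66, b"\0")
    + bytes.fromhex("0001 0005")
    + bytes.fromhex("0400000068e6ffbfabad00083cebffbf")
)
lead = parse_lead(example)
```

The reserved bytes are read but discarded, since the text describes them as unused.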

## Signature

A 3.0 format signature (denoted by signature type 5 in the Lead), uses
the same structure as the Header. For historical reasons, this
structure is called a "header structure", which can be confusing since
it is used for both the Header and the Signature. The details of the
header structure are given below, and you'll want to read them so the
rest of this makes sense. The tags for the Signature are defined in
lib/signature.h.

The Signature can contain multiple signatures, of different types.
There are currently only three types, each with its own tag in the
header structure:

```
Name  Tag   Header Type
----  ----  -----------
SIZE  1000  INT_32
MD5   1001  BIN
PGP   1002  BIN
```

The MD5 signature is 16 bytes, and the PGP signature varies with
the size of the PGP key used to sign the package.

As of RPM 2.1, all packages carry at least SIZE and MD5 signatures,
and the Signature section is padded to a multiple of 8 bytes.
"Header-style" signatures (denoted by signature type 5 in the Lead), use the
same structure as the Header. For historical reasons, this structure is called
a "header structure", which can be confusing since it is used for both the
Header and the Signature. The details of the header structure are given below,
and you'll want to read them so the rest of this makes sense. The tags for the
Signature are defined in include/rpm/rpmtag.h.

The Signature can contain multiple different types of signatures, stored under
unique tags (just like the Header). Details about these tags and the information
they store can be found [here](signatures_digests.md).

RPM v4 packages are expected to contain at least one of the SHA1HEADER or
SHA256HEADER tags, providing a cryptographic digest of the main header, and may
contain one or both of the PAYLOADDIGEST and PAYLOADDIGESTALT tags, providing a
cryptographic digest of the package payload in the compressed and uncompressed
forms, respectively.

If the package has been cryptographically signed using OpenPGP, an RSAHEADER or
DSAHEADER tag ought to be present, which contains an OpenPGP signature of the
package header. Which tag is present depends on which of the two (supported)
OpenPGP algorithms was used at signing time. Using a key based upon the RSA
algorithm to sign the package will result in the signature being stored in the
RSAHEADER tag, whereas the use of the EdDSA (ed25519) algorithm will use the
DSAHEADER tag instead. The name of the DSAHEADER tag is a historical artifact:
it originally referred to the long-obsolete DSA algorithm but was later reused
for EdDSA (ed25519) signatures.

As the package header itself contains a checksum of the payload (since RPM 4.14),
the header signature is sufficient to establish cryptographic provenance of the
package.

Other signature tags which may be present are considered legacy and their use is
discouraged if a more modern option is available.
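To make the tag choices above concrete, here is a hedged Python sketch of how a reader might decide which header digest and signature tags to look at. The function names are hypothetical, the tag numbers are the ones listed in docs/manual/tags.md, and the preference for SHA256HEADER over SHA1HEADER is an assumption based on SHA1HEADER being marked deprecated, not a description of RPM's actual verification logic.

```python
# Tag numbers as listed in docs/manual/tags.md.
DSAHEADER = 267
RSAHEADER = 268
SHA1HEADER = 269
SHA256HEADER = 273


def pick_header_digest(tags):
    """Return the preferred header digest tag present in `tags`, or None.

    Assumption for this sketch: SHA256HEADER is preferred over the
    deprecated SHA1HEADER when both are present.
    """
    for tag in (SHA256HEADER, SHA1HEADER):
        if tag in tags:
            return tag
    return None


def header_signature_tags(tags):
    """Return the OpenPGP header signature tags present in `tags`, sorted."""
    return sorted(t for t in (RSAHEADER, DSAHEADER) if t in tags)


present = {SHA1HEADER, SHA256HEADER, RSAHEADER}
digest_tag = pick_header_digest(present)
signature_tags = header_signature_tags(present)
```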

## Header

The Header contains all the information about a package: name,
version, file list, etc. It uses the same "header structure" as the
Signature, which is described in detail below. A complete list of the
tags for the Header would take too much space to list here, and the
list grows fairly frequently. For the complete list see lib/rpmlib.h
in the RPM sources.

## Payload

The Payload is currently a cpio archive, gzipped by default. The cpio archive
type used is SVR4 with a CRC checksum.

As cpio is limited to 4 GB (32 bit unsigned) file sizes RPM since
version 4.12 uses a stripped down version of cpio for packages with
files > 4 GB. This format uses `07070X` as magic bytes and the file
header otherwise only contains the index number of the file in the RPM
header as 8 byte hex string. The file metadata that is normally found
in a cpio file header - including the file name - is completely
omitted as it is stored in the RPM header already.

To use a different compression method when building new packages with
`rpmbuild(8)`, define the `%_binary_payload` or `%_source_payload` macros for
the binary or source packages, respectively. These macros accept an
[RPM IO mode string](https://ftp.osuosl.org/pub/rpm/api/4.17.0/group__rpmio.html#example-mode-strings)
(only `w` mode).
The Header contains all the information about a package: name, version, file
list, etc. It uses the same "header structure" as the Signature, which is
described in further detail below. A complete list of the tags for the Header
would take too much space to list here, and the list grows fairly frequently.
For the complete list see include/rpm/rpmtag.h in the RPM sources.

## The Header Structure

The header structure is a little complicated, but actually performs a
very simple function. It acts almost like a small database in that it
allows you to store and retrieve arbitrary data with a key called a
"tag". When a header structure is written to disk, the data is
written in network byte order, and when it is read from disk, it is
converted to host byte order.
The header structure is a little complicated, but actually performs a very
simple function. It acts almost like a small database in that it allows you
to store and retrieve arbitrary data with a key called a "tag". When a header
structure is written to disk, the data is written in network byte order
(big-endian), and when it is read from disk, it is converted to host byte order.

Along with the tag and the data, a data "type" is stored, which indicates,
obviously, the type of the data associated with the tag. There are
currently 9 types:
obviously, the type of the data associated with the tag. There are currently 9 types:

```
Type Number
Expand All @@ -178,7 +165,7 @@ currently 9 types:
INT32 4
INT64 5
STRING 6
BIN 7
BIN 7
STRING_ARRAY 8
I18NSTRING_TYPE 9
```
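The following Python sketch shows how a type number might be looked up while decoding one index entry. It rests on two assumptions: the enum lists only the type rows visible in the table above, and each index entry is taken to be 16 bytes made of four big-endian 32-bit fields (tag, type, offset, count). The names `TagType` and `parse_index_entry` and the sample entry are hypothetical.

```python
import struct
from enum import IntEnum


class TagType(IntEnum):
    # Only the rows visible in the table above; lower-numbered types are omitted.
    INT32 = 4
    INT64 = 5
    STRING = 6
    BIN = 7
    STRING_ARRAY = 8
    I18NSTRING = 9


def parse_index_entry(entry: bytes):
    """Split one 16-byte index entry into (tag, type, offset, count).

    Assumed layout: four big-endian 32-bit integers.
    """
    if len(entry) != 16:
        raise ValueError("an index entry must be exactly 16 bytes")
    tag, type_num, offset, count = struct.unpack(">iIiI", entry)
    return tag, TagType(type_num), offset, count


# Hypothetical entry: tag 1000, type STRING, data at offset 0, one value.
sample = bytes.fromhex("000003e8 00000006 00000000 00000001")
tag, tag_type, offset, count = parse_index_entry(sample)
```

`TagType(type_num)` raises `ValueError` for a number missing from the enum, which in this sketch includes the low-numbered types it leaves out.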
Expand Down Expand Up @@ -229,7 +216,7 @@ In our example there would be 32 such 16-byte index entries, followed
by the data section:

```
00000210: 72 70 6d 00 32 2e 31 2e 32 00 31 00 52 65 64 20 rpm.2.1.2.1.Red
00000210: 72 70 6d 00 32 2e 31 2e 32 00 31 00 52 65 64 20 rpm.2.1.2.1.Red
00000220: 48 61 74 20 50 61 63 6b 61 67 65 20 4d 61 6e 61 Hat Package Mana
00000230: 67 65 72 00 31 e7 cb b4 73 63 68 72 6f 65 64 65 ger.1...schroede
00000240: 72 2e 72 65 64 68 61 74 2e 63 6f 6d 00 00 00 00 r.redhat.com....
Expand Down Expand Up @@ -264,3 +251,19 @@ could start at byte 589, byte that is an improper boundary for an INT32.
As a result, 3 null bytes are inserted and the data for the SIZE actually
starts at byte 592: "00 09 9b 31", which is 629553).
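The padding arithmetic in this example can be checked with a short Python sketch. The function name `align_offset` is hypothetical; it simply rounds an offset up to the next multiple of the value's size, which is the behavior the text describes for an INT32.

```python
def align_offset(offset: int, size: int) -> int:
    """Round `offset` up to the next multiple of `size` (size must be > 0)."""
    return (offset + size - 1) // size * size


# An INT32 needs a 4-byte boundary, so a value that would start at
# byte 589 is pushed to the next multiple of 4.
start = align_offset(589, 4)
padding = start - 589
```

An offset that already sits on a boundary is returned unchanged, so no padding is added in that case.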

## Payload

The Payload is currently a cpio archive, typically compressed using the gzip,
zstandard, or LZMA algorithms. The cpio archive type used is SVR4 with a CRC checksum.

As cpio is limited to 4 GB (32 bit unsigned) file sizes, RPM (since version 4.12)
uses a stripped down variant of cpio for packages with files > 4 GB. This format
uses `07070X` as magic bytes and the file header otherwise only contains the
index number of the file in the RPM header as an 8 byte hex string. The file
metadata that is normally found in a cpio file header - including the file name -
is completely omitted as it is stored in the RPM header already.
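As a rough sketch of the stripped-down entry header described above, the Python function below checks the `07070X` magic and decodes the 8 byte hex string that follows it. The function name and the sample bytes are hypothetical, and nothing beyond the two fields named in the text is assumed about the format.

```python
STRIPPED_MAGIC = b"07070X"


def parse_stripped_cpio_header(data: bytes) -> int:
    """Return the RPM header file index stored in a stripped cpio entry header."""
    if len(data) < 14 or data[:6] != STRIPPED_MAGIC:
        raise ValueError("not a stripped cpio entry header")
    # The 8 bytes after the magic are ASCII hex digits giving the file index.
    return int(data[6:14].decode("ascii"), 16)


# Hypothetical entry header pointing at file index 42 (0x2a).
file_index = parse_stripped_cpio_header(b"07070X0000002a")
```

A reader would then use this index to look up the file name and metadata in the RPM header, since the entry header itself carries none.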

To use a different compression method when building new packages with `rpmbuild(8)`,
define the `%_binary_payload` or `%_source_payload` macros for the binary or source
packages, respectively. These macros accept an [RPM IO mode string](https://ftp.osuosl.org/pub/rpm/api/4.17.0/group__rpmio.html#example-mode-strings)
(only `w` mode).
27 changes: 14 additions & 13 deletions docs/manual/signatures_digests.md
Expand Up @@ -2,24 +2,25 @@
layout: default
title: rpm.org - Signatures and Digests
---

# Signatures and Digests

Table describing signatures and digests which RPM uses to verify package
contents:

| RPMSIGTAG_ | RPMTAG_ | Version | Algorithm | Location | Range |
| :---: | :-------: | :---: | :-----: | :--: | :-----: |
| MD5 | SIGMD5 | 3.0 | MD5 | S | HP |
| PGP | SIGPGP | 3.0 | OpenPGP/RSA | S | HP |
| GPG | SIGGPG | 3.0 | OpenPGP/DSA | S | HP |
| SHA1 | SHA1HEADER | 4.0 | SHA1 | S | H |
| RSA | RSAHEADER | 4.0 | OpenPGP/RSA | S | H |
| DSA | DSAHEADER | 4.0 | OpenPGP/DSA | S | H |
| SHA256 | SHA256HEADER | 4.14 | SHA256 | S | H |
| - | PAYLOADDIGEST | 4.14 | SHA256 (*) | H | Pc |
| - | PAYLOADDIGESTALT | 4.16 | SHA256 (*) | H | P |
| - | FILEMD5 | 3.0 | MD5 | H | F |
| - | FILEDIGESTS | 4.6 | SHA256 (**) | H | F |
| RPMSIGTAG_ | RPMTAG_ | Version | Deprecated | Algorithm | Location | Range |
| :---: | :-------: | :---: | :---: | :-----: | :--: | :-----: |
| MD5 | SIGMD5 | 3.0 | Y | MD5 | S | HP |
| PGP | SIGPGP | 3.0 | Y | OpenPGP/RSA | S | HP |
| GPG | SIGGPG | 3.0 | Y | OpenPGP/EdDSA | S | HP |
| SHA1 | SHA1HEADER | 4.0 | Y | SHA1 | S | H |
| RSA | RSAHEADER | 4.0 | | OpenPGP/RSA | S | H |
| DSA | DSAHEADER | 4.0 | | OpenPGP/EdDSA | S | H |
| SHA256 | SHA256HEADER | 4.14 | | SHA256 | S | H |
| - | FILEMD5 | 3.0 | Y | MD5 | H | F |
| - | FILEDIGESTS | 4.6 | | SHA256 (**) | H | F |
| - | PAYLOADDIGEST | 4.14 | | SHA256 (*) | H | Pc |
| - | PAYLOADDIGESTALT | 4.16 | | SHA256 (*) | H | P |

* S = Signature header
* H = Main header
Expand Down
16 changes: 8 additions & 8 deletions docs/manual/tags.md
Expand Up @@ -298,22 +298,22 @@ Transfiletriggerversion | 5081 | string array

## Signatures and digests

[Signatures](signatures.md) allow to verify the origin of a package.
[Signatures](signatures_digests.md) allow verifying the origin of a package.

Tag Name | Value| Type | Description
------------------|------|--------------|------------
Dsaheader | 267 | bin | OpenPGP DSA signature of the header (if thus signed)
Longsigsize | 270 | int64 | Header+payload size if > 4GB.
Dsaheader | 267 | bin | OpenPGP EdDSA signature of the header (if thus signed)
Longsigsize | 270 | int64 | Deprecated: Header+payload size if > 4GB.
Payloaddigest | 5092 | string array | Cryptographic digest of the compressed payload.
Payloaddigestalgo | 5093 | int32 | ID of the payload digest algorithm.
Payloaddigestalt | 5097 | string array | Cryptographic digest of the uncompressed payload.
Rsaheader | 268 | bin | OpenPGP RSA signature of the header (if thus signed).
Sha1header | 269 | string | SHA1 digest of the header.
Sha1header | 269 | string | Deprecated: SHA1 digest of the header.
Sha256header | 273 | string | SHA256 digest of the header.
Siggpg | 262 | bin | OpenPGP DSA signature of the header+payload (if thus signed).
Sigmd5 | 261 | bin | MD5 digest of the header+payload.
Sigpgp | 259 | bin | OpenPGP RSA signature of the header+payload (if thus signed).
Sigsize | 257 | int32 | Header+payload size.
Siggpg | 262 | bin | Deprecated: OpenPGP DSA signature of the header+payload (if thus signed).
Sigmd5 | 261 | bin | Deprecated: MD5 digest of the header+payload.
Sigpgp | 259 | bin | Deprecated: OpenPGP RSA signature of the header+payload (if thus signed).
Sigsize | 257 | int32 | Deprecated: Header+payload size.

## Installed package headers only

Expand Down
