Merge branch 'tb/pseudo-merge-reachability-bitmap' into seen
The pack-bitmap machinery learned to write pseudo-merge bitmaps,
which act as imaginary octopus merges covering un-bitmapped
reference tips. This enhances bitmap coverage, and thus,
performance, for repositories with many references using bitmaps.

* tb/pseudo-merge-reachability-bitmap: (24 commits)
  t/perf: implement performace tests for pseudo-merge bitmaps
  pseudo-merge: implement support for finding existing merges
  ewah: `bitmap_equals_ewah()`
  pack-bitmap: extra trace2 information
  pack-bitmap.c: use pseudo-merges during traversal
  t/test-lib-functions.sh: support `--date` in `test_commit_bulk()`
  pack-bitmap: implement test helpers for pseudo-merge
  ewah: implement `ewah_bitmap_popcount()`
  pseudo-merge: implement support for reading pseudo-merge commits
  pack-bitmap.c: read pseudo-merge extension
  pseudo-merge: scaffolding for reads
  pack-bitmap: extract `read_bitmap()` function
  pack-bitmap-write.c: write pseudo-merge table
  pack-bitmap-write.c: select pseudo-merge commits
  pseudo-merge: implement support for selecting pseudo-merge commits
  pack-bitmap: make `bitmap_writer_push_bitmapped_commit()` public
  pack-bitmap: implement `bitmap_writer_has_bitmapped_object_id()`
  pack-bitmap-write: support storing pseudo-merge commits
  pseudo-merge.ch: initial commit
  pack-bitmap: move some initialization to `bitmap_writer_init()`
  ...
gitster committed Apr 3, 2024
2 parents 55ad16e + 4cbfcd8 commit 9822723
Showing 19 changed files with 2,401 additions and 61 deletions.
2 changes: 2 additions & 0 deletions Documentation/config.txt
@@ -393,6 +393,8 @@ include::config/apply.txt[]

include::config/attr.txt[]

include::config/bitmap-pseudo-merge.txt[]

include::config/blame.txt[]

include::config/branch.txt[]
75 changes: 75 additions & 0 deletions Documentation/config/bitmap-pseudo-merge.txt
@@ -0,0 +1,75 @@
bitmapPseudoMerge.<name>.pattern::
Regular expression used to match reference names. Commits
pointed to by references matching this pattern (and meeting
the below criteria, like `bitmapPseudoMerge.<name>.sampleRate`
and `bitmapPseudoMerge.<name>.threshold`) will be considered
for inclusion in a pseudo-merge bitmap.
+
Commits are grouped into pseudo-merge groups based on whether or not
any reference(s) that point at a given commit match the pattern, which
is an extended regular expression.
+
Within a pseudo-merge group, commits may be further grouped into
sub-groups based on the capture groups in the pattern. These
sub-groupings are formed from the regular expressions by concatenating
any capture groups from the regular expression, with a '-' dash in
between.
+
For example, if the pattern is `refs/tags/`, then all tags (provided
they meet the below criteria) will be considered candidates for the
same pseudo-merge group. However, if the pattern is instead
`refs/remotes/([0-9]+)/tags/`, then tags from different remotes will
be grouped into separate pseudo-merge groups, based on the remote
number.
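+
As a rough illustration of the sub-grouping rule above, the following
sketch joins the capture groups of a match with '-' using the POSIX
`regcomp()`/`regexec()` interface. The helper name
`pseudo_merge_group_key()`, the `MAX_CAPTURES` limit, and the
fixed-size output buffer are assumptions made for this example; this is
not the code Git uses.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>

#define MAX_CAPTURES 8 /* arbitrary limit chosen for this sketch */

/*
 * Hypothetical helper (not part of Git): match `refname` against the
 * extended regular expression `pattern` and write the sub-group key,
 * i.e. every capture group joined by '-', into `key`.
 *
 * Returns 1 on a match, 0 on no match, -1 if the pattern is invalid.
 */
static int pseudo_merge_group_key(const char *pattern, const char *refname,
				  char *key, size_t keylen)
{
	regex_t re;
	regmatch_t m[MAX_CAPTURES + 1];
	size_t used = 0;
	int appended = 0;
	int matched;

	assert(key != NULL && keylen > 0);
	key[0] = '\0';

	if (regcomp(&re, pattern, REG_EXTENDED) != 0)
		return -1;

	matched = regexec(&re, refname, MAX_CAPTURES + 1, m, 0) == 0;
	if (matched) {
		size_t i;

		/* m[0] is the whole match; capture groups start at m[1]. */
		for (i = 1; i <= re.re_nsub && i <= MAX_CAPTURES; i++) {
			int n;

			if (m[i].rm_so < 0)
				continue; /* group did not participate */
			n = snprintf(key + used, keylen - used, "%s%.*s",
				     appended ? "-" : "",
				     (int)(m[i].rm_eo - m[i].rm_so),
				     refname + m[i].rm_so);
			appended = 1;
			if (n < 0 || (size_t)n >= keylen - used)
				break; /* key truncated; stop appending */
			used += (size_t)n;
		}
	}

	regfree(&re);
	return matched;
}
```

With the pattern `refs/remotes/([0-9]+)/tags/`, a reference under
`refs/remotes/42/tags/` yields the key `42`, while the pattern
`refs/tags/` has no capture groups and therefore yields an empty key
for every tag.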

bitmapPseudoMerge.<name>.decay::
Determines the rate at which consecutive pseudo-merge bitmap
groups decrease in size. Must be non-negative. This parameter
can be thought of as `k` in the function `f(n) = C *
n^(-k/100)`, where `f(n)` is the size of the `n`th group.
+
Setting the decay rate equal to `0` will cause all groups to be the
same size. Setting the decay rate equal to `100` will cause the `n`th
group to be `1/n` the size of the initial group. Higher values of the
decay rate cause consecutive groups to shrink at an increasing rate.
The default is `100`.
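+
To make the effect of the decay rate concrete, the formula can be
evaluated for a few values of `k`. Nothing here goes beyond the
function given above; `C` is left symbolic because the text does not
state how it is chosen.

```latex
f(n) = C \cdot n^{-k/100}

k = 0:   \quad f(n) = C \cdot n^{0} = C
         \quad\text{(all groups the same size)}

k = 100: \quad f(n) = C \cdot n^{-1} = \frac{C}{n},
         \quad f(1) = C,\; f(2) = \frac{C}{2},\; f(4) = \frac{C}{4}

k = 200: \quad f(n) = C \cdot n^{-2} = \frac{C}{n^{2}},
         \quad f(2) = \frac{C}{4},\; f(4) = \frac{C}{16}
```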

bitmapPseudoMerge.<name>.sampleRate::
Determines the proportion of non-bitmapped commits (among
reference tips) which are selected for inclusion in an
unstable pseudo-merge bitmap. Must be between `0` and `100`
(inclusive). The default is `100`.

bitmapPseudoMerge.<name>.threshold::
Determines the minimum age of non-bitmapped commits (among
reference tips, as above) which are candidates for inclusion
in an unstable pseudo-merge bitmap. The default is
`1.week.ago`.

bitmapPseudoMerge.<name>.maxMerges::
Determines the maximum number of pseudo-merge commits among
which commits may be distributed.
+
For pseudo-merge groups whose pattern does not contain any capture
groups, this setting is applied for all commits matching the regular
expression. For patterns that have one or more capture groups, this
setting is applied for each distinct capture group.
+
For example, if your pattern is `refs/tags/`, then this setting
will distribute all tags into a maximum of `maxMerges` pseudo-merge
commits. However, if your pattern is, say,
`refs/remotes/([0-9]+)/tags/`, then this setting will be applied to
each remote's set of tags individually.
+
Must be non-negative. The default value is 64.

bitmapPseudoMerge.<name>.stableThreshold::
Determines the minimum age of commits (among reference tips,
as above; note that stable commits remain candidates even
when they are already covered by a bitmap) which are
candidates for a stable pseudo-merge bitmap. The default
is `1.month.ago`.

bitmapPseudoMerge.<name>.stableSize::
Determines the size (in number of commits) of a stable
pseudo-merge bitmap. The default is `512`.
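+
Putting the settings together, a configuration for a single
pseudo-merge group might look like the fragment below. The group name
`tags` is a hypothetical choice for this example; every value other
than `pattern` is simply the documented default, spelled out
explicitly.

```ini
[bitmapPseudoMerge "tags"]
	pattern = "refs/tags/"
	decay = 100
	sampleRate = 100
	threshold = 1.week.ago
	maxMerges = 64
	stableThreshold = 1.month.ago
	stableSize = 512
```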
205 changes: 205 additions & 0 deletions Documentation/technical/bitmap-format.txt
@@ -255,3 +255,208 @@ triplet is -
xor_row (4 byte integer, network byte order): ::
The position of the triplet whose bitmap is used to compress
this one, or `0xffffffff` if no such bitmap exists.

Pseudo-merge bitmaps
--------------------

If the `BITMAP_OPT_PSEUDO_MERGES` flag is set, a variable number of
bytes (preceding the name-hash cache, commit lookup table, and trailing
checksum) of the `.bitmap` file is used to store pseudo-merge bitmaps.

A "pseudo-merge bitmap" is used to refer to a pair of bitmaps, as
follows:

Commit bitmap::

A bitmap whose set bits describe the set of commits included in the
pseudo-merge's "merge" bitmap (as below).

Merge bitmap::

A bitmap whose set bits describe the reachability closure over the set
of commits in the pseudo-merge's "commits" bitmap (as above). An
identical bitmap would be generated for an octopus merge with the same
set of parents as described in the commits bitmap.

Pseudo-merge bitmaps can accelerate bitmap traversals when all commits
for a given pseudo-merge are listed on either side of the traversal,
either directly (by explicitly asking for them as part of the `HAVES`
or `WANTS`) or indirectly (by encountering them during a fill-in
traversal).

=== Use-cases

For example, suppose there exists a pseudo-merge bitmap with a large
number of commits, all of which are listed in the `WANTS` section of
some bitmap traversal query. When pseudo-merge bitmaps are enabled, the
bitmap machinery can quickly determine there is a pseudo-merge which
satisfies some subset of the wanted objects on either side of the query.
Then, we can inflate the EWAH-compressed bitmap, and `OR` it in to the
resulting bitmap. By contrast, without pseudo-merge bitmaps, we would
have to repeat the decompression and `OR`-ing step over a potentially
large number of individual bitmaps, which can take proportionally more
time.
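
The idea can be sketched in a deliberately simplified model. The code
below assumes uncompressed bitmaps stored as arrays of 64-bit words
(Git stores them EWAH-compressed), and the names
`struct pseudo_merge_sketch`, `bitmap_is_subset()`, and
`apply_pseudo_merge()` are hypothetical. It shows only the core step:
if every commit of a pseudo-merge is already in the result, its merge
bitmap can be `OR`-ed in at once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a pseudo-merge: two uncompressed bitmaps. */
struct pseudo_merge_sketch {
	const uint64_t *commits; /* the commits making up the pseudo-merge */
	const uint64_t *merge;   /* everything reachable from those commits */
};

/* Returns 1 if every bit set in `sub` is also set in `super`. */
static int bitmap_is_subset(const uint64_t *sub, const uint64_t *super,
			    size_t nwords)
{
	size_t i;

	for (i = 0; i < nwords; i++)
		if (sub[i] & ~super[i])
			return 0;
	return 1;
}

/*
 * If all commits of `pm` are already present in `result`, OR the
 * pseudo-merge's reachability closure into `result` and return 1.
 * Otherwise leave `result` untouched and return 0.
 */
static int apply_pseudo_merge(uint64_t *result,
			      const struct pseudo_merge_sketch *pm,
			      size_t nwords)
{
	size_t i;

	assert(result != NULL && pm != NULL);

	if (!bitmap_is_subset(pm->commits, result, nwords))
		return 0;
	for (i = 0; i < nwords; i++)
		result[i] |= pm->merge[i];
	return 1;
}
```

A single subset test plus one `OR` pass replaces what would otherwise
be one decompress-and-`OR` step per individual commit bitmap.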

Another benefit of pseudo-merges arises when there is some combination
of (a) a large number of references, with (b) poor bitmap coverage, and
(c) deep, nested trees, making fill-in traversal relatively expensive.
For example, suppose that there are a large enough number of tags where
bitmapping each of the tags individually is infeasible. Without
pseudo-merge bitmaps, computing the result of, say, `git rev-list
--use-bitmap-index --count --objects --tags` would likely require a
large amount of fill-in traversal. But when a large quantity of those
tags are stored together in a pseudo-merge bitmap, the bitmap machinery
can take advantage of the fact that we only care about the union of
objects reachable from all of those tags, and answer the query much
faster.

=== File format

If enabled, pseudo-merge bitmaps are stored in an optional section at
the end of a `.bitmap` file. The format is as follows:

....
+-------------------------------------------+
| .bitmap File |
+-------------------------------------------+
| |
| Pseudo-merge bitmaps (Variable Length) |
| +---------------------------+ |
| | commits_bitmap (EWAH) | |
| +---------------------------+ |
| | merge_bitmap (EWAH) | |
| +---------------------------+ |
| |
+-------------------------------------------+
| |
| Lookup Table |
| +------------+--------------+ |
| | commit_pos | offset | |
| +------------+--------------+ |
| | 4 bytes | 8 bytes | |
| +------------+--------------+ |
| |
| Offset Cases: |
| ------------- |
| |
| 1. MSB Unset: single pseudo-merge bitmap |
| + offset to pseudo-merge bitmap |
| |
| 2. MSB Set: multiple pseudo-merges |
| + offset to extended lookup table |
| |
+-------------------------------------------+
| |
| Extended Lookup Table (Optional) |
| |
| +----+----------+----------+----------+ |
| | N | Offset 1 | .... | Offset N | |
| +----+----------+----------+----------+ |
| | | 8 bytes | .... | 8 bytes | |
| +----+----------+----------+----------+ |
| |
+-------------------------------------------+
| |
| Pseudo-merge Metadata |
| +------------------+----------------+ |
| | # pseudo-merges | # Commits | |
| +------------------+----------------+ |
| | 4 bytes | 4 bytes | |
| +------------------+----------------+ |
| |
| +------------------+----------------+ |
| | Lookup offset | Extension size | |
| +------------------+----------------+ |
| | 8 bytes | 8 bytes | |
| +------------------+----------------+ |
| |
+-------------------------------------------+
....

* One or more pseudo-merge bitmaps, each containing:

** `commits_bitmap`, an EWAH-compressed bitmap describing the set of
commits included in this pseudo-merge.

** `merge_bitmap`, an EWAH-compressed bitmap describing the union of
the set of objects reachable from all commits listed in the
`commits_bitmap`.

* A lookup table, mapping pseudo-merged commits to the pseudo-merges
they belong to. Entries appear in increasing order of each commit's
bit position. Each entry is 12 bytes wide, and consists of the
following:

** `commit_pos`, a 4-byte unsigned value (in network byte-order)
containing the bit position for this commit.

** `offset`, an 8-byte unsigned value (also in network byte-order)
containing either one of two possible offsets, depending on whether or
not the most-significant bit is set.

*** If unset (i.e. `(offset & ((uint64_t)1<<63)) == 0`), the offset
(relative to the beginning of the `.bitmap` file) at which the
pseudo-merge bitmap for this commit can be read. This indicates
only a single pseudo-merge bitmap contains this commit.

*** If set (i.e. `(offset & ((uint64_t)1<<63)) != 0`), the offset
(again relative to the beginning of the `.bitmap` file) at which
the extended offset table can be located describing the set of
pseudo-merge bitmaps which contain this commit. This indicates
that multiple pseudo-merge bitmaps contain this commit.

* An (optional) extended lookup table (written if and only if there is
at least one commit which appears in more than one pseudo-merge).
There are as many entries as commits which appear in multiple
pseudo-merges. Each entry contains the following:

** `N`, a 4-byte unsigned value equal to the number of pseudo-merges
which contain a given commit.

** An array of `N` 8-byte unsigned values, each of which is
interpreted as an offset (relative to the beginning of the
`.bitmap` file) at which a pseudo-merge bitmap for this commit can
be read. These values occur in no particular order.

* Positions for all pseudo-merges, each stored as an 8-byte unsigned
value (in network byte-order) containing the offset (relative to the
beginning of the `.bitmap` file) of each consecutive pseudo-merge.

* A 4-byte unsigned value (in network byte-order) equal to the number of
pseudo-merges.

* A 4-byte unsigned value (in network byte-order) equal to the number of
unique commits which appear in any pseudo-merge.

* An 8-byte unsigned value (in network byte-order) equal to the number
of bytes between the start of the pseudo-merge section and the
beginning of the lookup table.

* An 8-byte unsigned value (in network byte-order) equal to the number
of bytes in the pseudo-merge section (including this field).
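
The fixed-width fields described in the list above can be decoded with
a few big-endian reads. The following is a minimal sketch, not Git's
reader: the struct and function names are hypothetical, the caller is
assumed to have already located the relevant bytes in memory, and
masking off the most-significant bit to recover the extended-table
offset is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 4-byte unsigned value in network byte order. */
static uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Read an 8-byte unsigned value in network byte order. */
static uint64_t read_be64(const unsigned char *p)
{
	return ((uint64_t)read_be32(p) << 32) | read_be32(p + 4);
}

#define MULTI_FLAG ((uint64_t)1 << 63)

/* One 12-byte lookup table entry. */
struct lookup_entry_sketch {
	uint32_t commit_pos; /* bit position of the commit */
	uint64_t offset;     /* file offset, with the flag bit cleared */
	int multiple;        /* 1: offset points at the extended table */
};

static struct lookup_entry_sketch parse_lookup_entry(const unsigned char *p)
{
	struct lookup_entry_sketch e;
	uint64_t raw;

	assert(p != NULL);
	raw = read_be64(p + 4);
	e.commit_pos = read_be32(p);
	e.multiple = (raw & MULTI_FLAG) != 0;
	e.offset = raw & ~MULTI_FLAG;
	return e;
}

/* The four trailing fields of the pseudo-merge section (24 bytes). */
struct trailer_sketch {
	uint32_t nr_merges;      /* number of pseudo-merges */
	uint32_t nr_commits;     /* unique commits in any pseudo-merge */
	uint64_t lookup_offset;  /* section start to lookup table */
	uint64_t extension_size; /* size of the whole section */
};

/* `end` points one past the last byte of the pseudo-merge section. */
static struct trailer_sketch parse_trailer(const unsigned char *end)
{
	struct trailer_sketch t;
	const unsigned char *p;

	assert(end != NULL);
	p = end - 24;
	t.nr_merges = read_be32(p);
	t.nr_commits = read_be32(p + 4);
	t.lookup_offset = read_be64(p + 8);
	t.extension_size = read_be64(p + 16);
	return t;
}
```

Because the extension size is the last field and includes itself, a
reader that knows where the section ends can find its start by
subtracting `extension_size` from that end position.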

=== Pseudo-merge selection

Pseudo-merge commits are selected among non-bitmapped commits at the
tip of one or more reference(s). In addition, there are a handful of
constraints to further refine this selection:

`bitmapPseudoMerge.<name>.decay`:: Defines the "decay rate", which
corresponds to how quickly (or not) consecutive pseudo-merges decrease
in size relative to one another.

`bitmapPseudoMerge.<name>.maxMerges`:: Defines the maximum number of
pseudo-merge groups.

`bitmapPseudoMerge.<name>.sampleRate`:: Defines the percentage of commits
(matching the above criteria) which are selected.

`bitmapPseudoMerge.<name>.threshold`:: Defines the minimum age of a commit
in order to be considered for inclusion within one or more pseudo-merge
bitmaps.

The size of consecutive pseudo-merge groups decays according to a
power-law decay function, where the size of the `n`-th group is `f(n) =
C*n^-k`. The value of `C` is chosen to match the number of
desired groups, and `k` is 1/100th of the value of
`bitmapPseudoMerge.<name>.decay`.
1 change: 1 addition & 0 deletions Makefile
@@ -1127,6 +1127,7 @@ LIB_OBJS += prompt.o
LIB_OBJS += protocol.o
LIB_OBJS += protocol-caps.o
LIB_OBJS += prune-packed.o
LIB_OBJS += pseudo-merge.o
LIB_OBJS += quote.o
LIB_OBJS += range-diff.o
LIB_OBJS += reachable.o
3 changes: 2 additions & 1 deletion builtin/pack-objects.c
@@ -1339,6 +1339,7 @@ static void write_pack_file(void)
hash_to_hex(hash));

if (write_bitmap_index) {
bitmap_writer_init(the_repository);
bitmap_writer_set_checksum(hash);
bitmap_writer_build_type_index(
&to_pack, written_list, nr_written);
@@ -1359,7 +1360,7 @@ static void write_pack_file(void)
stop_progress(&progress_state);

bitmap_writer_show_progress(progress);
-bitmap_writer_select_commits(indexed_commits, indexed_commits_nr, -1);
+bitmap_writer_select_commits(indexed_commits, indexed_commits_nr);
if (bitmap_writer_build(&to_pack) < 0)
die(_("failed to write bitmap index"));
bitmap_writer_finish(written_list, nr_written,
18 changes: 18 additions & 0 deletions config.c
@@ -2652,6 +2652,24 @@ int repo_config_get_pathname(struct repository *repo,
return ret;
}

int repo_config_get_expiry(struct repository *repo,
const char *key, const char **dest)
{
int ret;

git_config_check_init(repo);

ret = repo_config_get_string(repo, key, (char **)dest);
if (ret)
return ret;
if (strcmp(*dest, "now")) {
timestamp_t now = approxidate("now");
if (approxidate(*dest) >= now)
git_die_config(key, _("Invalid %s: '%s'"), key, *dest);
}
return ret;
}

/* Read values into protected_config. */
static void read_protected_config(void)
{
2 changes: 2 additions & 0 deletions config.h
@@ -578,6 +578,8 @@ int repo_config_get_maybe_bool(struct repository *repo,
const char *key, int *dest);
int repo_config_get_pathname(struct repository *repo,
const char *key, const char **dest);
int repo_config_get_expiry(struct repository *repo,
const char *key, const char **dest);

/*
* Functions for reading protected config. By definition, protected