Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pack-objects: create new name-hash algorithm #1785

Open
wants to merge 6 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion Documentation/git-pack-objects.txt
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,8 @@ SYNOPSIS
[--revs [--unpacked | --all]] [--keep-pack=<pack-name>]
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Derrick Stolee via GitGitGadget" <[email protected]> writes:

> This is not meant to be cryptographic at all, but uniformly distributed
> across the possible hash values. This creates a hash that appears
> pseudorandom. There is no ability to consider similar file types as
> being close to each other.

Another consideration we had when designing the current mechanism,
which is more important than "compare .c files with each other", is
to handle the case where a file is moved across directory boundary
without changing its name.  These "hash collissions" are meant to be
a part of obtaining _good_ paring of blobs that ought to be similar
to each other.  In other words, we wanted them to collide so that we
do not have to be negatively affected by moves.

I am not saying that we should not update the pack name hash; I am
just saying that "consider similar file types" as if that is the
most important aspect of the current hash, is misleading.

Thanks.


Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 9/9/24 1:45 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <[email protected]> writes:
> >> This is not meant to be cryptographic at all, but uniformly distributed
>> across the possible hash values. This creates a hash that appears
>> pseudorandom. There is no ability to consider similar file types as
>> being close to each other.
> > Another consideration we had when designing the current mechanism,
> which is more important than "compare .c files with each other", is
> to handle the case where a file is moved across directory boundary
> without changing its name.  These "hash collissions" are meant to be
> a part of obtaining _good_ paring of blobs that ought to be similar
> to each other.  In other words, we wanted them to collide so that we
> do not have to be negatively affected by moves.
> > I am not saying that we should not update the pack name hash; I am
> just saying that "consider similar file types" as if that is the
> most important aspect of the current hash, is misleading.

Thank you for this extra aspect, which has clarified some of my
thinking and I will use in future versions.

Thanks,
-Stolee

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Derrick Stolee via GitGitGadget" <[email protected]> writes:

> +static inline uint32_t pack_full_name_hash(const char *name)
> +{
> +	const uint32_t bigp = 1234572167U;
> +	uint32_t c, hash = bigp;
> +
> +	if (!name)
> +		return 0;
> +
> +	/*
> +	 * Do the simplest thing that will resemble pseduo-randomness: add

"pseduo" -> "pseudo"

> +	 * random multiples of a large prime number with a binary shift.
> +	 * The goal is not to be cryptographic, but to be generally
> +	 * uniformly distributed.
> +	 */

Other than that, there is no substantial difference since the
previous iteration; this step looks good.

Thanks.

[--cruft] [--cruft-expiration=<time>]
[--stdout [--filter=<filter-spec>] | <base-name>]
[--shallow] [--keep-true-parents] [--[no-]sparse] < <object-list>
[--shallow] [--keep-true-parents] [--[no-]sparse]
[--full-name-hash] < <object-list>


DESCRIPTION
Expand Down
4 changes: 3 additions & 1 deletion Documentation/git-repack.txt
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,9 @@ git-repack - Pack unpacked objects in a repository
SYNOPSIS
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Derrick Stolee via GitGitGadget" <[email protected]> writes:

> diff --git a/t/t0450/txt-help-mismatches b/t/t0450/txt-help-mismatches
> index 28003f18c92..c4a15fd0cb8 100644
> --- a/t/t0450/txt-help-mismatches
> +++ b/t/t0450/txt-help-mismatches
> @@ -45,7 +45,6 @@ rebase
>  remote
>  remote-ext
>  remote-fd
> -repack
>  reset
>  restore
>  rev-parse

This is very much appreciated ;-)

--------
[verse]
'git repack' [-a] [-A] [-d] [-f] [-F] [-l] [-n] [-q] [-b] [-m] [--window=<n>] [--depth=<n>] [--threads=<n>] [--keep-pack=<pack-name>] [--write-midx]
'git repack' [-a] [-A] [-d] [-f] [-F] [-l] [-n] [-q] [-b] [-m]
[--window=<n>] [--depth=<n>] [--threads=<n>] [--keep-pack=<pack-name>]
[--write-midx] [--full-name-hash]

DESCRIPTION
-----------
Expand Down
1 change: 1 addition & 0 deletions Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -812,6 +812,7 @@ TEST_BUILTINS_OBJS += test-lazy-init-name-hash.o
TEST_BUILTINS_OBJS += test-match-trees.o
TEST_BUILTINS_OBJS += test-mergesort.o
TEST_BUILTINS_OBJS += test-mktemp.o
TEST_BUILTINS_OBJS += test-name-hash.o
TEST_BUILTINS_OBJS += test-online-cpus.o
TEST_BUILTINS_OBJS += test-pack-mtimes.o
TEST_BUILTINS_OBJS += test-parse-options.o
Expand Down
25 changes: 20 additions & 5 deletions builtin/pack-objects.c
Original file line number Diff line number Diff line change
Expand Up @@ -266,6 +266,14 @@ struct configured_exclusion {
static struct oidmap configured_exclusions;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Derrick Stolee via GitGitGadget" <[email protected]> writes:

> From: Derrick Stolee <[email protected]>
>
> Add a new environment variable to opt-in to the --full-name-hash option
> in 'git pack-objects'. This allows for extra testing of the feature
> without repeating all of the test scenarios.

This also allows the programmer on the C implementation side to be a
bit lazy, as the --full-name-hash option does not have to be plumbed
through from the end-user facing commands (like "bundle") down to
the underlying "pack-objects" command ;-).

As an end-user facing tweak mechanism, an environment variable is
the most clunky, followed by a configuration variable (which can be
used via "git -c" and exhibits the same clunkiness as an environment
variable), and a command line parameter is the most versatile in
allowing users to customize the behaviour per-invocation of the
commands.  So in the longer term, we probably want to plumb through
the option, like you did for "repack -> pack-objects" call chain,
for all end-user visible commands that call into pack-objects.

But for testing purposes, the solution presented here is of course
good enough.

> Second, there are two tests in t5616-partial-clone.sh that I believe are
> actually broken scenarios. While the client is set up to clone the
> 'promisor-server' repo via a treeless partial clone filter (tree:0),
> that filter does not translate to the 'server' repo. Thus, fetching from
> these repos causes the server to think that the client has all reachable
> trees and blobs from the commits advertised as 'haves'. This leads the
> server to providing a thin pack assuming those objects as delta bases.

In short, the tests are based on broken assumption and checking
bogus outcome?  Somebody familiar with the partial clone area should
probably take a look into it and fix the tests if that is the case.

> -	if (write_bitmap_index && use_full_name_hash)
> +	if (write_bitmap_index && use_full_name_hash > 0)
>  		die(_("currently, the --full-name-hash option is incompatible with --write-bitmap-index"));
> +	if (use_full_name_hash < 0)
> +		use_full_name_hash = git_env_bool("GIT_TEST_FULL_NAME_HASH", 0);

OK.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 9/19/24 6:22 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <[email protected]> writes:

>> Second, there are two tests in t5616-partial-clone.sh that I believe are
>> actually broken scenarios. While the client is set up to clone the
>> 'promisor-server' repo via a treeless partial clone filter (tree:0),
>> that filter does not translate to the 'server' repo. Thus, fetching from
>> these repos causes the server to think that the client has all reachable
>> trees and blobs from the commits advertised as 'haves'. This leads the
>> server to providing a thin pack assuming those objects as delta bases.
>
> In short, the tests are based on broken assumption and checking
> bogus outcome?  Somebody familiar with the partial clone area should
> probably take a look into it and fix the tests if that is the case.

That is my understanding, yes. It is a common issue that I've had when
using partial clones with multiple remotes and forgetting to modify the
filter for each, so is also a usability wart that could use mending
(say, by adding the existing filter when adding a new remote).

Thanks,
-Stolee


static struct oidset excluded_by_config;
static int use_full_name_hash = -1;

static inline uint32_t pack_name_hash_fn(const char *name)
{
if (use_full_name_hash)
return pack_full_name_hash(name);
return pack_name_hash(name);
}

/*
* stats
Expand Down Expand Up @@ -1698,7 +1706,7 @@ static int add_object_entry(const struct object_id *oid, enum object_type type,
return 0;
}

create_object_entry(oid, type, pack_name_hash(name),
create_object_entry(oid, type, pack_name_hash_fn(name),
exclude, name && no_try_delta(name),
found_pack, found_offset);
return 1;
Expand Down Expand Up @@ -1912,7 +1920,7 @@ static void add_preferred_base_object(const char *name)
{
struct pbase_tree *it;
size_t cmplen;
unsigned hash = pack_name_hash(name);
unsigned hash = pack_name_hash_fn(name);

if (!num_preferred_base || check_pbase_path(hash))
return;
Expand Down Expand Up @@ -3422,7 +3430,7 @@ static void show_object_pack_hint(struct object *object, const char *name,
* here using a now in order to perhaps improve the delta selection
* process.
*/
oe->hash = pack_name_hash(name);
oe->hash = pack_name_hash_fn(name);
oe->no_try_delta = name && no_try_delta(name);

stdin_packs_hints_nr++;
Expand Down Expand Up @@ -3572,7 +3580,7 @@ static void add_cruft_object_entry(const struct object_id *oid, enum object_type
entry = packlist_find(&to_pack, oid);
if (entry) {
if (name) {
entry->hash = pack_name_hash(name);
entry->hash = pack_name_hash_fn(name);
entry->no_try_delta = no_try_delta(name);
}
} else {
Expand All @@ -3595,7 +3603,7 @@ static void add_cruft_object_entry(const struct object_id *oid, enum object_type
return;
}

entry = create_object_entry(oid, type, pack_name_hash(name),
entry = create_object_entry(oid, type, pack_name_hash_fn(name),
0, name && no_try_delta(name),
pack, offset);
}
Expand Down Expand Up @@ -4426,6 +4434,8 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
OPT_STRING_LIST(0, "uri-protocol", &uri_protocols,
N_("protocol"),
N_("exclude any configured uploadpack.blobpackfileuri with this protocol")),
OPT_BOOL(0, "full-name-hash", &use_full_name_hash,
N_("optimize delta compression across identical path names over time")),
OPT_END(),
};

Expand Down Expand Up @@ -4573,6 +4583,11 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
if (pack_to_stdout || !rev_list_all)
write_bitmap_index = 0;

if (write_bitmap_index && use_full_name_hash > 0)
die(_("currently, the --full-name-hash option is incompatible with --write-bitmap-index"));
if (use_full_name_hash < 0)
use_full_name_hash = git_env_bool("GIT_TEST_FULL_NAME_HASH", 0);

if (use_delta_islands)
strvec_push(&rp, "--topo-order");

Expand Down
9 changes: 8 additions & 1 deletion builtin/repack.c
Original file line number Diff line number Diff line change
Expand Up @@ -38,7 +38,9 @@ static int run_update_server_info = 1;
static char *packdir, *packtmp_name, *packtmp;

static const char *const git_repack_usage[] = {
N_("git repack [<options>]"),
N_("git repack [-a] [-A] [-d] [-f] [-F] [-l] [-n] [-q] [-b] [-m]\n"
"[--window=<n>] [--depth=<n>] [--threads=<n>] [--keep-pack=<pack-name>]\n"
"[--write-midx] [--full-name-hash]"),
NULL
};

Expand All @@ -57,6 +59,7 @@ struct pack_objects_args {
int no_reuse_object;
int quiet;
int local;
int full_name_hash;
struct list_objects_filter_options filter_options;
};

Expand Down Expand Up @@ -288,6 +291,8 @@ static void prepare_pack_objects(struct child_process *cmd,
strvec_pushf(&cmd->args, "--no-reuse-delta");
if (args->no_reuse_object)
strvec_pushf(&cmd->args, "--no-reuse-object");
if (args->full_name_hash)
strvec_pushf(&cmd->args, "--full-name-hash");
if (args->local)
strvec_push(&cmd->args, "--local");
if (args->quiet)
Expand Down Expand Up @@ -1176,6 +1181,8 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
N_("pass --no-reuse-delta to git-pack-objects")),
OPT_BOOL('F', NULL, &po_args.no_reuse_object,
N_("pass --no-reuse-object to git-pack-objects")),
OPT_BOOL(0, "full-name-hash", &po_args.full_name_hash,
N_("pass --full-name-hash to git-pack-objects")),
OPT_NEGBIT('n', NULL, &run_update_server_info,
N_("do not run git-update-server-info"), 1),
OPT__QUIET(&po_args.quiet, N_("be quiet")),
Expand Down
1 change: 1 addition & 0 deletions ci/run-build-and-tests.sh
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,7 @@ linux-TEST-vars)
export GIT_TEST_NO_WRITE_REV_INDEX=1
export GIT_TEST_CHECKOUT_WORKERS=2
export GIT_TEST_PACK_USE_BITMAP_BOUNDARY_TRAVERSAL=1
export GIT_TEST_FULL_NAME_HASH=1
;;
linux-clang)
export GIT_TEST_DEFAULT_HASH=sha1
Expand Down
21 changes: 21 additions & 0 deletions pack-objects.h
Original file line number Diff line number Diff line change
Expand Up @@ -207,6 +207,27 @@ static inline uint32_t pack_name_hash(const char *name)
return hash;
}

static inline uint32_t pack_full_name_hash(const char *name)
{
const uint32_t bigp = 1234572167U;
uint32_t c, hash = bigp;

if (!name)
return 0;

/*
* Do the simplest thing that will resemble pseudo-randomness: add
* random multiples of a large prime number with a binary shift.
* The goal is not to be cryptographic, but to be generally
* uniformly distributed.
*/
while ((c = *name++) != 0) {
hash += c * bigp;
hash = (hash >> 5) | (hash << 27);
}
return hash;
}

static inline enum object_type oe_type(const struct object_entry *e)
{
return e->type_valid ? e->type_ : OBJ_BAD;
Expand Down
4 changes: 4 additions & 0 deletions t/README
Original file line number Diff line number Diff line change
Expand Up @@ -488,6 +488,10 @@ a test and then fails then the whole test run will abort. This can help to make
sure the expected tests are executed and not silently skipped when their
dependency breaks or is simply not present in a new environment.

GIT_TEST_FULL_NAME_HASH=<boolean>, when true, sets the default name-hash
function in 'git pack-objects' to be the one used by the --full-name-hash
option.

Naming Tests
------------

Expand Down
24 changes: 24 additions & 0 deletions t/helper/test-name-hash.c
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
/*
* test-name-hash.c: Read a list of paths over stdin and report on their
* name-hash and full name-hash.
*/

#include "test-tool.h"
#include "git-compat-util.h"
#include "pack-objects.h"
#include "strbuf.h"

int cmd__name_hash(int argc UNUSED, const char **argv UNUSED)
{
struct strbuf line = STRBUF_INIT;

while (!strbuf_getline(&line, stdin)) {
uint32_t name_hash = pack_name_hash(line.buf);
uint32_t full_hash = pack_full_name_hash(line.buf);

printf("%10"PRIu32"\t%10"PRIu32"\t%s\n", name_hash, full_hash, line.buf);
}

strbuf_release(&line);
return 0;
}
1 change: 1 addition & 0 deletions t/helper/test-tool.c
Original file line number Diff line number Diff line change
Expand Up @@ -44,6 +44,7 @@ static struct test_cmd cmds[] = {
{ "match-trees", cmd__match_trees },
{ "mergesort", cmd__mergesort },
{ "mktemp", cmd__mktemp },
{ "name-hash", cmd__name_hash },
{ "online-cpus", cmd__online_cpus },
{ "pack-mtimes", cmd__pack_mtimes },
{ "parse-options", cmd__parse_options },
Expand Down
1 change: 1 addition & 0 deletions t/helper/test-tool.h
Original file line number Diff line number Diff line change
Expand Up @@ -37,6 +37,7 @@ int cmd__lazy_init_name_hash(int argc, const char **argv);
int cmd__match_trees(int argc, const char **argv);
int cmd__mergesort(int argc, const char **argv);
int cmd__mktemp(int argc, const char **argv);
int cmd__name_hash(int argc, const char **argv);
int cmd__online_cpus(int argc, const char **argv);
int cmd__pack_mtimes(int argc, const char **argv);
int cmd__parse_options(int argc, const char **argv);
Expand Down
73 changes: 73 additions & 0 deletions t/perf/p5313-pack-objects.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,73 @@
#!/bin/sh
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Derrick Stolee via GitGitGadget" <[email protected]> writes:

> From: Derrick Stolee <[email protected]>
>
> As custom options are added to 'git pack-objects' and 'git repack' to
> adjust how compression is done, use this new performance test script to
> demonstrate their effectiveness in performance and size.
>
> The recently-added --full-name-hash option swaps the default name-hash
> algorithm with one that attempts to uniformly distribute the hashes
> based on the full path name instead of the last 16 characters.
>
> This has a dramatic effect on full repacks for repositories with many
> versions of most paths. It can have a negative impact on cases such as
> pushing a single change.
>
> This can be seen by running pt5313 on the open source fluentui
> repository [1]. Most commits will have this kind of output for the thin
> and big pack cases, though certain commits (such as [2]) will have
> problematic thin pack size for other reasons.
>
> [1] https://github.com/microsoft/fluentui
> [2] a637a06df05360ce5ff21420803f64608226a875
>
> Checked out at the parent of [2], I see the following statistics:
>
> Test                                           this tree
> ------------------------------------------------------------------
> 5313.2: thin pack                              0.02(0.01+0.01)
> 5313.3: thin pack size                                    1.1K
> 5313.4: thin pack with --full-name-hash        0.02(0.01+0.00)
> 5313.5: thin pack size with --full-name-hash              3.0K
> 5313.6: big pack                               1.65(3.35+0.24)
> 5313.7: big pack size                                    58.0M
> 5313.8: big pack with --full-name-hash         1.53(2.52+0.18)
> 5313.9: big pack size with --full-name-hash              57.6M
> 5313.10: repack                                176.52(706.60+3.53)
> 5313.11: repack size                                    446.7K
> 5313.12: repack with --full-name-hash          37.47(134.18+3.06)
> 5313.13: repack size with --full-name-hash              183.1K
>
> Note that this demonstrates a 3x size _increase_ in the case that
> simulates a small "git push". The size change is neutral on the case of
> pushing the difference between HEAD and HEAD~1000.
>
> However, the full repack case is both faster and more efficient.

Nice.

> Signed-off-by: Derrick Stolee <[email protected]>
> ---
>  t/perf/p5313-pack-objects.sh | 71 ++++++++++++++++++++++++++++++++++++
>  1 file changed, 71 insertions(+)
>  create mode 100755 t/perf/p5313-pack-objects.sh

"wc -c" -> "test_file_size" or "test-tool path-utils file-size"?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 9/19/24 5:58 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <[email protected]> writes:

>>   t/perf/p5313-pack-objects.sh | 71 ++++++++++++++++++++++++++++++++++++
>>   1 file changed, 71 insertions(+)
>>   create mode 100755 t/perf/p5313-pack-objects.sh
> > "wc -c" -> "test_file_size" or "test-tool path-utils file-size"?

Thanks. I was unfamiliar with those options.

I will squash this change into the next version. The use of a
variable for the packfile name was important at some point while
I was testing this with various repos on different platforms.

--- >8 ---

diff --git a/t/perf/p5313-pack-objects.sh b/t/perf/p5313-pack-objects.sh
index 5dcf52acb0..bf6f0d69e4 100755
--- a/t/perf/p5313-pack-objects.sh
+++ b/t/perf/p5313-pack-objects.sh
@@ -25,7 +25,7 @@ test_perf 'thin pack' '
 '

 test_size 'thin pack size' '
-	wc -c <out
+	test_file_size out
 '

 test_perf 'thin pack with --full-name-hash' '
@@ -33,7 +33,7 @@ test_perf 'thin pack with --full-name-hash' '
 '

 test_size 'thin pack size with --full-name-hash' '
-	wc -c <out
+	test_file_size out
 '

 test_perf 'big pack' '
@@ -41,7 +41,7 @@ test_perf 'big pack' '
 '

 test_size 'big pack size' '
-	wc -c <out
+	test_file_size out
 '

 test_perf 'big pack with --full-name-hash' '
@@ -49,7 +49,7 @@ test_perf 'big pack with --full-name-hash' '
 '

 test_size 'big pack size with --full-name-hash' '
-	wc -c <out
+	test_file_size out
 '

 test_perf 'repack' '
@@ -57,7 +57,8 @@ test_perf 'repack' '
 '

 test_size 'repack size' '
-	wc -c <.git/objects/pack/pack-*.pack
+	pack=$(ls .git/objects/pack/pack-*.pack) &&
+	test_file_size "$pack"
 '

 test_perf 'repack with --full-name-hash' '
@@ -65,7 +66,8 @@ test_perf 'repack with --full-name-hash' '
 '

 test_size 'repack size with --full-name-hash' '
-	wc -c <.git/objects/pack/pack-*.pack
+	pack=$(ls .git/objects/pack/pack-*.pack) &&
+	test_file_size "$pack"
 '

 test_done

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

Derrick Stolee <[email protected]> writes:

> On 9/19/24 5:58 PM, Junio C Hamano wrote:
>> "Derrick Stolee via GitGitGadget" <[email protected]> writes:
>
>>>   t/perf/p5313-pack-objects.sh | 71 ++++++++++++++++++++++++++++++++++++
>>>   1 file changed, 71 insertions(+)
>>>   create mode 100755 t/perf/p5313-pack-objects.sh
>> "wc -c" -> "test_file_size" or "test-tool path-utils file-size"?
>
> Thanks. I was unfamiliar with those options.

I was, too ;-).  There are a few other test pieces that use "wc -c"
to count bytes, but it made me a bit nervous to use it to count
bytes in a binary file.

>  test_size 'repack size' '
> -	wc -c <.git/objects/pack/pack-*.pack
> +	pack=$(ls .git/objects/pack/pack-*.pack) &&
> +	test_file_size "$pack"
>  '

I am perfectly fine (and actually even prefer) to pay the cost to
fork "ls" here, but if you really wanted to avoid it, we could do
something like

	pack=$(
		set x .git/objects/pack/pack-*.pack &&
		test $# = 2 && echo "$2" || exit 1
	) &&
	test_file_size "$pack"

;-)

>  test_perf 'repack with --full-name-hash' '
> @@ -65,7 +66,8 @@ test_perf 'repack with --full-name-hash' '
>  '
>
>  test_size 'repack size with --full-name-hash' '
> -	wc -c <.git/objects/pack/pack-*.pack
> +	pack=$(ls .git/objects/pack/pack-*.pack) &&
> +	test_file_size "$pack"
>  '
>
>  test_done


test_description='Tests pack performance using bitmaps'
. ./perf-lib.sh

GIT_TEST_PASSING_SANITIZE_LEAK=0
export GIT_TEST_PASSING_SANITIZE_LEAK

test_perf_large_repo

test_expect_success 'create rev input' '
cat >in-thin <<-EOF &&
$(git rev-parse HEAD)
^$(git rev-parse HEAD~1)
EOF

cat >in-big <<-EOF
$(git rev-parse HEAD)
^$(git rev-parse HEAD~1000)
EOF
'

test_perf 'thin pack' '
git pack-objects --thin --stdout --revs --sparse <in-thin >out
'

test_size 'thin pack size' '
test_file_size out
'

test_perf 'thin pack with --full-name-hash' '
git pack-objects --thin --stdout --revs --sparse --full-name-hash <in-thin >out
'

test_size 'thin pack size with --full-name-hash' '
test_file_size out
'

test_perf 'big pack' '
git pack-objects --stdout --revs --sparse <in-big >out
'

test_size 'big pack size' '
test_file_size out
'

test_perf 'big pack with --full-name-hash' '
git pack-objects --stdout --revs --sparse --full-name-hash <in-big >out
'

test_size 'big pack size with --full-name-hash' '
test_file_size out
'

test_perf 'repack' '
git repack -adf
'

test_size 'repack size' '
pack=$(ls .git/objects/pack/pack-*.pack) &&
test_file_size "$pack"
'

test_perf 'repack with --full-name-hash' '
git repack -adf --full-name-hash
'

test_size 'repack size with --full-name-hash' '
pack=$(ls .git/objects/pack/pack-*.pack) &&
test_file_size "$pack"
'

test_done
41 changes: 41 additions & 0 deletions t/perf/p5314-name-hash.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,41 @@
#!/bin/sh

test_description='Tests pack performance using bitmaps'
. ./perf-lib.sh

GIT_TEST_PASSING_SANITIZE_LEAK=0
export GIT_TEST_PASSING_SANITIZE_LEAK

test_perf_large_repo

test_size 'paths at head' '
git ls-tree -r --name-only HEAD >path-list &&
wc -l <path-list
'

test_size 'number of distinct name-hashes' '
cat path-list | test-tool name-hash >name-hashes &&
cat name-hashes | awk "{ print \$1; }" | sort -n | uniq -c >name-hash-count &&
wc -l <name-hash-count
'

test_size 'number of distinct full-name-hashes' '
cat name-hashes | awk "{ print \$2; }" | sort -n | uniq -c >full-name-hash-count &&
wc -l <full-name-hash-count
'

test_size 'maximum multiplicity of name-hashes' '
cat name-hash-count | \
sort -nr | \
head -n 1 | \
awk "{ print \$1; }"
'

test_size 'maximum multiplicity of fullname-hashes' '
cat full-name-hash-count | \
sort -nr | \
head -n 1 | \
awk "{ print \$1; }"
'

test_done
1 change: 0 additions & 1 deletion t/t0450/txt-help-mismatches
Original file line number Diff line number Diff line change
Expand Up @@ -45,7 +45,6 @@ rebase
remote
remote-ext
remote-fd
repack
reset
restore
rev-parse
Expand Down
15 changes: 15 additions & 0 deletions t/t5300-pack-object.sh
Original file line number Diff line number Diff line change
Expand Up @@ -674,4 +674,19 @@ do
'
done

# The following test is not necessarily a permanent choice, but since we do not
# have a "name hash version" bit in the .bitmap file format, we cannot write the
# full-name hash values into the .bitmap file without risking breakage later.
#
# TODO: Make these compatible in the future and replace this test with the
# expected behavior when both are specified.
test_expect_success '--full-name-hash and --write-bitmap-index are incompatible' '
test_must_fail git pack-objects base --all \
--full-name-hash --write-bitmap-index 2>err &&
grep incompatible err &&

# --stdout option silently removes --write-bitmap-index
git pack-objects --stdout --all --full-name-hash --write-bitmap-index >out
'

test_done
Loading
Loading