Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libfetchers/git: Support export-ignore #9480

Merged
merged 17 commits into from
Jan 16, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
18 changes: 18 additions & 0 deletions doc/manual/rl-next/git-fetcher.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,18 @@
---
synopsis: "Nix now uses `libgit2` for Git fetching"
prs:
- 9240
- 9241
- 9258
- 9480
issues:
- 5313
---

Nix has built-in support for fetching sources from Git, during evaluation and locking; outside the sandbox.
The existing implementation based on the Git CLI had issues regarding reproducibility and performance.

Most of the original `fetchGit` behavior has been implemented using the `libgit2` library, which gives the fetcher fine-grained control.

Known issues:
- The `export-subst` behavior has not been reimplemented. [Partial](https://github.com/NixOS/nix/pull/9391#issuecomment-1872503447) support for this Git feature is feasible, but it did not make the release window.
17 changes: 16 additions & 1 deletion src/libexpr/primops/fetchTree.cc
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@
#include "libfetchers/attrs.hh"
#include "primops.hh"
#include "eval-inline.hh"
#include "eval-settings.hh"
Expand Down Expand Up @@ -25,7 +26,7 @@ void emitTreeAttrs(
{
assert(input.isLocked());

auto attrs = state.buildBindings(10);
auto attrs = state.buildBindings(100);

state.mkStorePathString(storePath, attrs.alloc(state.sOutPath));

Expand Down Expand Up @@ -135,6 +136,10 @@ static void fetchTree(
state.symbols[attr.name], showType(*attr.value)));
}

if (params.isFetchGit && !attrs.contains("exportIgnore") && (!attrs.contains("submodules") || !*fetchers::maybeGetBoolAttr(attrs, "submodules"))) {
attrs.emplace("exportIgnore", Explicit<bool>{true});
}

if (!params.allowNameArgument)
if (auto nameIter = attrs.find("name"); nameIter != attrs.end())
state.debugThrowLastTrace(EvalError({
Expand All @@ -152,6 +157,9 @@ static void fetchTree(
fetchers::Attrs attrs;
attrs.emplace("type", "git");
attrs.emplace("url", fixGitURL(url));
if (!attrs.contains("exportIgnore") && (!attrs.contains("submodules") || !*fetchers::maybeGetBoolAttr(attrs, "submodules"))) {
attrs.emplace("exportIgnore", Explicit<bool>{true});
}
input = fetchers::Input::fromAttrs(std::move(attrs));
} else {
if (!experimentalFeatureSettings.isEnabled(Xp::Flakes))
Expand Down Expand Up @@ -593,6 +601,13 @@ static RegisterPrimOp primop_fetchGit({

A Boolean parameter that specifies whether submodules should be checked out.

- `exportIgnore` (default: `true`)

A Boolean parameter that specifies whether `export-ignore` from `.gitattributes` should be applied.
This approximates part of the `git archive` behavior.

Enabling this option is not recommended because it is unknown whether the Git developers commit to the reproducibility of `export-ignore` in newer Git versions.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit odd to say that enabling this option is not recommended, and then having the default as "enabled". Probably better to say "We recommend disabling this option because bla bla".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not enabled for fetchTree though; just for fetchGit.

(And we can't unrecommend fetchGit until this is stable.)


- `shallow` (default: `false`)

A Boolean parameter that specifies whether fetching from a shallow remote repository is allowed.
Expand Down
7 changes: 7 additions & 0 deletions src/libfetchers/fetchers.hh
Original file line number Diff line number Diff line change
Expand Up @@ -187,6 +187,13 @@ struct InputScheme
virtual bool isDirect(const Input & input) const
{ return true; }

/**
* A sufficiently unique string that can be used as a cache key to identify the `input`.
*
* Only known-equivalent inputs should return the same fingerprint.
*
* This is not a stable identifier between Nix versions, but not guaranteed to change either.
*/
virtual std::optional<std::string> getFingerprint(ref<Store> store, const Input & input) const
{ return std::nullopt; }
};
Expand Down
9 changes: 9 additions & 0 deletions src/libfetchers/filtering-input-accessor.cc
Original file line number Diff line number Diff line change
Expand Up @@ -80,4 +80,13 @@ ref<AllowListInputAccessor> AllowListInputAccessor::create(
return make_ref<AllowListInputAccessorImpl>(next, std::move(allowedPaths), std::move(makeNotAllowedError));
}

bool CachingFilteringInputAccessor::isAllowed(const CanonPath & path)
{
auto i = cache.find(path);
if (i != cache.end()) return i->second;
auto res = isAllowedUncached(path);
cache.emplace(path, res);
return res;
}

}
16 changes: 15 additions & 1 deletion src/libfetchers/filtering-input-accessor.hh
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@
namespace nix {

/**
* A function that should throw an exception of type
* A function that returns an exception of type
* `RestrictedPathError` explaining that access to `path` is
* forbidden.
*/
Expand Down Expand Up @@ -71,4 +71,18 @@ struct AllowListInputAccessor : public FilteringInputAccessor
using FilteringInputAccessor::FilteringInputAccessor;
};

/**
* A wrapping `InputAccessor` mix-in where `isAllowed()` caches the result of virtual `isAllowedUncached()`.
*/
struct CachingFilteringInputAccessor : FilteringInputAccessor
{
std::map<CanonPath, bool> cache;

using FilteringInputAccessor::FilteringInputAccessor;

bool isAllowed(const CanonPath & path) override;

virtual bool isAllowedUncached(const CanonPath & path) = 0;
};

}
129 changes: 122 additions & 7 deletions src/libfetchers/git-utils.cc
Original file line number Diff line number Diff line change
@@ -1,12 +1,15 @@
#include "git-utils.hh"
#include "fs-input-accessor.hh"
#include "input-accessor.hh"
#include "filtering-input-accessor.hh"
#include "cache.hh"
#include "finally.hh"
#include "processes.hh"
#include "signals.hh"

#include <boost/core/span.hpp>

#include <git2/attr.h>
#include <git2/blob.h>
#include <git2/commit.h>
#include <git2/config.h>
Expand All @@ -21,6 +24,7 @@
#include <git2/submodule.h>
#include <git2/tree.h>

#include <iostream>
#include <unordered_set>
#include <queue>
#include <regex>
Expand Down Expand Up @@ -50,6 +54,8 @@ bool operator == (const git_oid & oid1, const git_oid & oid2)

namespace nix {

struct GitInputAccessor;

// Some wrapper types that ensure that the git_*_free functions get called.
template<auto del>
struct Deleter
Expand Down Expand Up @@ -307,7 +313,7 @@ struct GitRepoImpl : GitRepo, std::enable_shared_from_this<GitRepoImpl>
return std::nullopt;
}

std::vector<std::tuple<Submodule, Hash>> getSubmodules(const Hash & rev) override;
std::vector<std::tuple<Submodule, Hash>> getSubmodules(const Hash & rev, bool exportIgnore) override;

std::string resolveSubmoduleUrl(
const std::string & url,
Expand Down Expand Up @@ -340,7 +346,14 @@ struct GitRepoImpl : GitRepo, std::enable_shared_from_this<GitRepoImpl>
return true;
}

ref<InputAccessor> getAccessor(const Hash & rev) override;
/**
* A 'GitInputAccessor' with no regard for export-ignore or any other transformations.
*/
ref<GitInputAccessor> getRawAccessor(const Hash & rev);

ref<InputAccessor> getAccessor(const Hash & rev, bool exportIgnore) override;

ref<InputAccessor> getAccessor(const WorkdirInfo & wd, bool exportIgnore, MakeNotAllowedError e) override;

static int sidebandProgressCallback(const char * str, int len, void * payload)
{
Expand Down Expand Up @@ -456,6 +469,9 @@ ref<GitRepo> GitRepo::openRepo(const CanonPath & path, bool create, bool bare)
return make_ref<GitRepoImpl>(path, create, bare);
}

/**
* Raw git tree input accessor.
*/
struct GitInputAccessor : InputAccessor
{
ref<GitRepoImpl> repo;
Expand Down Expand Up @@ -644,17 +660,114 @@ struct GitInputAccessor : InputAccessor
}
};

ref<InputAccessor> GitRepoImpl::getAccessor(const Hash & rev)
struct GitExportIgnoreInputAccessor : CachingFilteringInputAccessor {
ref<GitRepoImpl> repo;
std::optional<Hash> rev;

GitExportIgnoreInputAccessor(ref<GitRepoImpl> repo, ref<InputAccessor> next, std::optional<Hash> rev)
: CachingFilteringInputAccessor(next, [&](const CanonPath & path) {
return RestrictedPathError(fmt("'%s' does not exist because it was fetched with exportIgnore enabled", path));
})
, repo(repo)
, rev(rev)
{ }

bool gitAttrGet(const CanonPath & path, const char * attrName, const char * & valueOut)
{
const char * pathCStr = path.rel_c_str();

if (rev) {
git_attr_options opts = GIT_ATTR_OPTIONS_INIT;
opts.attr_commit_id = hashToOID(*rev);
// TODO: test that gitattributes from global and system are not used
// (ie more or less: home and etc - both of them!)
opts.flags = GIT_ATTR_CHECK_INCLUDE_COMMIT | GIT_ATTR_CHECK_NO_SYSTEM;
return git_attr_get_ext(
&valueOut,
*repo,
&opts,
pathCStr,
attrName
);
}
else {
return git_attr_get(
&valueOut,
*repo,
GIT_ATTR_CHECK_INDEX_ONLY | GIT_ATTR_CHECK_NO_SYSTEM,
pathCStr,
attrName);
}
}

bool isExportIgnored(const CanonPath & path)
{
const char *exportIgnoreEntry = nullptr;

// GIT_ATTR_CHECK_INDEX_ONLY:
// > It will use index only for creating archives or for a bare repo
// > (if an index has been specified for the bare repo).
// -- https://github.com/libgit2/libgit2/blob/HEAD/include/git2/attr.h#L113C62-L115C48
if (gitAttrGet(path, "export-ignore", exportIgnoreEntry)) {
if (git_error_last()->klass == GIT_ENOTFOUND)
return false;
else
throw Error("looking up '%s': %s", showPath(path), git_error_last()->message);
}
else {
// Official git will silently reject export-ignore lines that have
// values. We do the same.
return GIT_ATTR_IS_TRUE(exportIgnoreEntry);
}
}

bool isAllowedUncached(const CanonPath & path) override
{
return !isExportIgnored(path);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the performance penalty for export-ignore lookups? Should this be cached? The lazy-trees branch has a CachingFilteringInputAccessor that caches isAllowed(), which might be useful here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It takes about 2× as long on my local nixpkgs clone. Not great.
Caching could reduce the overhead by a factor three, so we may expect no better than 1.3× from that.
Using the batch variation of the libgit2 call could bring down the overhead some more though. That makes the lookups more eager, which in turns means that it's not a great match with the CachingFilteringInputAccessor interface ("protected" interface part as it were).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • I've picked CachingFilteringInputAccessor from the diff, and that brought it down to a 15% penalty.

  • batch variation

    I misremembered while libgit2.org was down. It retrieves multiple attrs, not multiple files, so this is not a solution.

  • New idea: turn off filter when export-ignore is not present at all

    I've tried this, but it seems to make fetching Nixpkgs with fetchGit slower (local repo, no fetcher cache, probably low single digit %) That's probably because I did an extra traversal of the whole repo before returning the accessor.

    It's possible that a per-directory, on-the-fly approach does go below that 15%, but I don't have a lot of confidence right now.

  • Another possible strategy:
    Cache it in a table like (parent_dir, filename, allow_bool), which could be queried quite efficiently for either the whole repo or particular directories when lazy. Nonetheless, there's a risk that it's not much better, and this is a lot more work to pull of, so again I'd say not for this release.

Conclusion: will stop optimizing now.

}

};

ref<GitInputAccessor> GitRepoImpl::getRawAccessor(const Hash & rev)
{
return make_ref<GitInputAccessor>(ref<GitRepoImpl>(shared_from_this()), rev);
auto self = ref<GitRepoImpl>(shared_from_this());
return make_ref<GitInputAccessor>(self, rev);
}

std::vector<std::tuple<GitRepoImpl::Submodule, Hash>> GitRepoImpl::getSubmodules(const Hash & rev)
ref<InputAccessor> GitRepoImpl::getAccessor(const Hash & rev, bool exportIgnore)
{
auto self = ref<GitRepoImpl>(shared_from_this());
ref<GitInputAccessor> rawGitAccessor = getRawAccessor(rev);
if (exportIgnore) {
return make_ref<GitExportIgnoreInputAccessor>(self, rawGitAccessor, rev);
}
else {
return rawGitAccessor;
}
}

ref<InputAccessor> GitRepoImpl::getAccessor(const WorkdirInfo & wd, bool exportIgnore, MakeNotAllowedError makeNotAllowedError)
{
auto self = ref<GitRepoImpl>(shared_from_this());
ref<InputAccessor> fileAccessor =
AllowListInputAccessor::create(
makeFSInputAccessor(path),
std::set<CanonPath> { wd.files },
std::move(makeNotAllowedError));
if (exportIgnore) {
return make_ref<GitExportIgnoreInputAccessor>(self, fileAccessor, std::nullopt);
}
else {
return fileAccessor;
}
}

std::vector<std::tuple<GitRepoImpl::Submodule, Hash>> GitRepoImpl::getSubmodules(const Hash & rev, bool exportIgnore)
{
/* Read the .gitmodules files from this revision. */
CanonPath modulesFile(".gitmodules");

auto accessor = getAccessor(rev);
auto accessor = getAccessor(rev, exportIgnore);
if (!accessor->pathExists(modulesFile)) return {};

/* Parse it and get the revision of each submodule. */
Expand All @@ -665,8 +778,10 @@ std::vector<std::tuple<GitRepoImpl::Submodule, Hash>> GitRepoImpl::getSubmodules

std::vector<std::tuple<Submodule, Hash>> result;

auto rawAccessor = getRawAccessor(rev);

for (auto & submodule : parseSubmodules(CanonPath(pathTemp))) {
auto rev = accessor.dynamic_pointer_cast<GitInputAccessor>()->getSubmoduleRev(submodule.path);
auto rev = rawAccessor->getSubmoduleRev(submodule.path);
result.push_back({std::move(submodule), rev});
}

Expand Down
7 changes: 5 additions & 2 deletions src/libfetchers/git-utils.hh
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
#pragma once

#include "filtering-input-accessor.hh"
#include "input-accessor.hh"

namespace nix {
Expand Down Expand Up @@ -57,7 +58,7 @@ struct GitRepo
* Return the submodules of this repo at the indicated revision,
* along with the revision of each submodule.
*/
virtual std::vector<std::tuple<Submodule, Hash>> getSubmodules(const Hash & rev) = 0;
virtual std::vector<std::tuple<Submodule, Hash>> getSubmodules(const Hash & rev, bool exportIgnore) = 0;

virtual std::string resolveSubmoduleUrl(
const std::string & url,
Expand All @@ -71,7 +72,9 @@ struct GitRepo

virtual bool hasObject(const Hash & oid) = 0;

virtual ref<InputAccessor> getAccessor(const Hash & rev) = 0;
virtual ref<InputAccessor> getAccessor(const Hash & rev, bool exportIgnore) = 0;

virtual ref<InputAccessor> getAccessor(const WorkdirInfo & wd, bool exportIgnore, MakeNotAllowedError makeNotAllowedError) = 0;

virtual void fetch(
const std::string & url,
Expand Down
Loading
Loading