Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define how to reference remote objects #52

Closed
psss opened this issue Apr 16, 2019 · 22 comments
Closed

Define how to reference remote objects #52

psss opened this issue Apr 16, 2019 · 22 comments
Assignees
Labels
docs enhancement New feature or request

Comments

@psss
Copy link
Collaborator

psss commented Apr 16, 2019

Currently each node in the metadata tree structure has a unique identifier called name so it is possible to reference individual objects from other attributes. This can be useful for example for mapping test coverage where tests are referenced from requirements.

This works nicely within a single metadata tree. However, we need to define a clear way how to reference objects which are outside of the current metadata tree. For this we could combine repository url with object name using a defined separator.

Also, in order to prevent too long identifiers and repeating urls it would be good to provide a way how to define a common prefix or remote repository shortcut. Perhaps it could work in a similar way how remotes like origin are defined in git.

@psss psss added the enhancement New feature or request label Apr 16, 2019
@psss
Copy link
Collaborator Author

psss commented Apr 16, 2019

What about using url#path+name schema (similar to gerrit) where url would be fetch-able url of the git repository, path would be path from the repository root to the metadata tree root and name would be fmf object identifier? Path could be omitted if git root is the same directory as fmf root. A couple of examples:

  • https://src.fedoraproject.org/tests/selinux/#/libselinux/getsebool
  • https://src.fedoraproject.org/tests/selinux/#/some/path/+/libselinux/getsebool

@kkaarreell
Copy link
Contributor

Don't know details of fmf format, however it would be nice to have some aliases defined within the fmf file of those repo prefixes so one can use them to shorten the text. So that something like this would work
fmf_aliases:
selinux_repo: https://src.fedoraproject.org/tests/selinux/
/requirement
coverage: selinux_repo#/libselinux/getsebool
I think that the fact itself that the test path doesn't start with / could suggest it may be an alias.
Just braninstorming.

@psss
Copy link
Collaborator Author

psss commented Apr 16, 2019

Make sense. We could also store these aliases under the special .fmf directory so that they could be used across all metadata files.

@AloisMahdal
Copy link
Contributor

AloisMahdal commented Apr 17, 2019

https://src.fedoraproject.org/tests/selinux/#/libselinux/getsebool -- I woiuld avoid on schema that relies on delimiters after URL, unless the delimiter is safe to be never included in the url. Considering praxis above specification, I'm not sure if there are even such characters nowadays

What about this syntax (I think CSS/XPATH have something like this): url('https://foo.bar', '/some/path', '/some/name')?

Also, I'm not sure but can FMF reference files within the structure? If not, then the same syntax could be reused for that, eg. file('version.txt').

@jscotka
Copy link
Collaborator

jscotka commented Apr 29, 2019

Otherwise my plugin solves it: #54

  • references are based on regexp of name (path)
  • it could use implicit tree for referencing, or explicitly say where you should look for items by plugin option (eg. --plugin=reference:examples/plugin_resolver/:examples/wget)
  • you have to have everything dowloaded explicitly, so that you don't need to to solve some magic

@psss psss self-assigned this May 16, 2019
@psss
Copy link
Collaborator Author

psss commented May 16, 2019

Thanks to all for sharing your thoughts. To summarize, we need the following fragments in order to be able to address arbitrary nodes:

  • url — git repository url (in any form supported by git clone)
  • path — relative path from the git root to metadata tree root
  • revision — branch, tag or hash (defaults to master branch)
  • name — tree node name (unique fmf object identifier)

In order to make references concise and prevent unnecessary duplication we would also like to introduce aliases which would be stored under .fmf directory and would serve similarly as remotes in git. We could introduce a similar command to add a new alias:

fmf remote add alias full-address

Each remote has to define the url fragment, optionally it can define path and revision as well. Now about the syntax. There some examples of using git url with revision specification. For example pip with git:

git+https://myvcs.com/some_dependency@sometag#egg=SomeDependency

We could use that as a base and extend with additional fields using extra delimiters but it is true as Alois mentioned that introducing more special characters is not the best approach. He suggests this option:

url('https://foo.bar', 'branch', '/tree/path', '/node/name')

Another option could be to name individual fragments, something like providing a dictionary including key names:

(url='https://foo.bar', rev='branch', path='/tree/path', node='/node/name')

Something similar could be also possible using url and parameters:

https://git.repo?revision=branch&path=/tree/path&name=/node/name

I would be also good to provide a distinct way how to do relative references within a single git repo, something like:

/node/name@revision
/node/name@revision:/tree/path

Perhaps all references (even local within the same metadata tree) should be prefixed with a distinct string or special character to make it clear at the first sight that this is a reference and also to give a clear hint to the parser that it should (could) handle it in a special way. Not sure about the most suitable character.

@/node/name
@/node/name@remote

End of today's brainstorm :)

@AloisMahdal
Copy link
Contributor

When I suggested

url('https://foo.bar', 'branch', '/tree/path', '/node/name')

I actually only meant to suggest the general approach: have a small set of builtin functions that would usable to reference (or re-create) content. I didn't use a good example (I tried to incorporate all elements from your OP even though I did not really understand them.)

I'll try to better illustrate:

url('https://foo.bar')

...just returns body of whatever document there is. If you want to have git-specific reference (shouldn't just URL's let you do that, though?), then you can have another function for that, say:

git('https://foo.bar/some.git', 'branch', '/tree/path', '/node/name')

Maybe you could have other kinds of functions; one that feels like would be immediately useful is:

file('somewhere/in/fmf/tree')

or

fmf('some/other/node')

One nice property this approach has is that it's infinitely extensible. well, you probably don't want to have infinite functions, but the immediate benefit is that you can start with implementing just the simple ones (file(), url()) and then see if there are use cases for more, and think about what arguments should they have.

Maybe it could make sense to make it extensible on some kind of local versions of these functions, eg. you could have project or org specific libraries of extra functions that would help with facts about the project (rhel()...). (OTOH you don't want to turn fmf into a platform, do you...)


I'm not sure about the aliases. While it could save typing, it would introduce extra complexity, and even inconsistency: one project could have bz for their bugzilla but another project could use rhbz instead, etc.

That said, you could actually solve it by local functions:

rhbz(12345)
gh('psss/fmf', 52)
jira('OSCI-99')
selinux_repo('libselinux/getsebool')

which could work even when the reference is not a single argument, and even when the identifier has to be somewhere in the middle of URL.


About "relative references within a single git repo", I'm not sure what you mean but it sounds scary when you add revisions. You don't want to make it possible to reference ... to the past, do you? (Please say no and forgive me for asking.)

@psss
Copy link
Collaborator Author

psss commented May 21, 2019

@AloisMahdal, thanks much for your detailed thoughts. It seems the purpose of this issue is not completely clear. The intention was not to introduce new functions which would implement fetching/accessing objects but a generic specification how to reference (and possibly access) an fmf node which is not present in the local fmf tree. We will need this for example when test coverage for a single component is scattered across multiple repositories. When we have the spec defined we can discuss further details about handy functions. Last thought: Your idea of being able to reference to the past actually seems very interesting! But this is something very natural for all git repositories: Be able to reference different branches with different content.

@AloisMahdal
Copy link
Contributor

AloisMahdal commented May 22, 2019

Well all of my examples were about referencing (and possibly accessing) stuff. The "functions" I'm talking about here really are the references.

Eg, main.fmf:

jats:
    domain: my.example.org
    ns: foodept
    sut_pname: barpkg
    format: 0.0
    version: file('.jats/version')

would end up showing as

{jats: {domain:... ... version: 0.1.0}}

..if .jats/version contained that text. See? I'm just referencing a resource: content of a local text file. bz(1234) would be referencing bug 1234. And I could do a similar thing with the target being other node in another fmf repo.

--

Anyway, I might have misunderstood the use case, but

We will need this for example when test coverage for a single component is scattered across multiple repositories

does not illustrate the use case at all. How does the fact that coverage for single component is scattered, warrant the need for special kind of reference? What do you even need to reference where?

--

(It's also not clear to me what you said about the git references, although it's probably getting OT so I'll let it be.)

@psss
Copy link
Collaborator Author

psss commented Jan 8, 2020

Today we've discussed the format with @lukaszachy, @hegerj, @pvalena and @t184256 with the following proposal: Similar to what @AloisMahdal suggested with the (url='https://foo.bar'...) syntax but using yaml format for this. It could be one line like this:

{url: 'https://git.repo', revision: 'branch', path: '/tree/path', name: '/node/name'}

Revision would default to master and path to / so the short version would be:

{url: 'https://git.repo', name: '/node/name'}

This is just a dictionary so multi line could be used as well where suitable:

url: 'https://git.repo'
revision: 'branch'
path: '/tree/path'
name: '/node/name'

As fmf is already using yaml for storing all data this would be a consistent choice. Thoughts?

@kkaarreell
Copy link
Contributor

@psss, could you please provide wider context in the example? Is this supposed to be a part of a remote object reference or a part of a definition of an alias?

@psss
Copy link
Collaborator Author

psss commented Jan 10, 2020

The examples above show how the complete unique fmf identifier would look like. For aliases I imagine definition could look like this:

tmt remote add <alias> <url>
tmt remote add <alias> <url> --path <directory> --revision <refspec>

Let's add now a remote tmt metadata tree with tests:

tmt remote add tests https://src.fedoraproject.org/tests/wget --revision devel

Brainstorming possible options how to reference remote objects inside fmf files:

/user/story:
    summary: As a user I want to safely download file.
    tested: {name: /tests/download/https, remote: tests}

We could have some special characaters introduced to make is shorter (but then those characters cannot be used in path):

/user/story:
    summary: As a user I want to safely download file.
    tested: /tests/download/https@tests

Here's a full variant without using the alias:

/user/story:
    summary: As a user I want to safely download file.
    tested: {url: 'https://src.fedoraproject.org/tests/wget', revision: 'devel', name: '/node/name'}

Of course it is still possible to use a multiline dictionary as well:

/user/story:
    summary: As a user I want to safely download file.
    tested:
        name: /node/name
        url: https://git.repo
        revision: branch

Finally, referencing local test would be simple as this:

/user/story:
    summary: As a user I want to safely download file.
    tested: /tests/download/https

Thoughts?

@psss
Copy link
Collaborator Author

psss commented Jan 20, 2020

OK, so it seems we have a concensus here. @kkaarreell, @AloisMahdal, @jscotka, @t184256, @thrix, @hegerj, can you confirm the proposal above is fine with you as well?

Before I'm going to prepare a pull request with the specification last question about the key naming: Shall we make it aligned / consistent with some existing config / tooling?

For example ansible git module is using repo for the url and version for the revision part. Also in standard-test-roles repo is used for repository url and version is used for branch/tag selection of the fmf repository. Finally in the tmt discover step example we have repository and revision (no spec has been created for that yet).

  • repository location brainstorm: url, repo, repository
  • branch/tag/commit name brainstorm: revision, version, reference, rev, ver, ref

Shall we stay consistent with ansible/str and use repo and version as well? Thoughts? Preferences?

@t184256
Copy link

t184256 commented Jan 20, 2020

Fine by me.

'Repository' and 'revision' are explicit and the meaning is clear, while 'reference' is, IIRC, a term for some internal git magic, which is different from revisions.

@comps
Copy link

comps commented Jan 20, 2020

As far as git terminology goes, "refspec" is specific to git-push/pull/remote and usually has the <src>:<dst> format, so I wouldn't use that. A simple "ref" or "reference" (see ref in gitglossary(7)) is about as accurate as you can get. Note that it can refer to more than just branches (ie. tags) and can have multiple specifications (like ie. refs/heads/master and master). On github, you can even reference pull requests as refs (ie. refs/pulls/123/head). See also gitrevisions(7), section SPECIFYING REVISIONS, subsection <refname>.

One thing I would definitely not do is try to merge it into the URL (like ie. Beaker Project did) as, like a filename on a filesystem, a git ref can have special characters in its name.

@psss
Copy link
Collaborator Author

psss commented Jan 20, 2020

Thanks for clarification, @comps. We would like to allow specifying individual commits too. Do you think that would fit under the ref umbrella as well? The description in gitglossary suggests that probably not. But ref or reference sounds to me quite natural for branches, tags and commits as well.

@comps
Copy link

comps commented Jan 20, 2020

Thanks for clarification, @comps. We would like to allow specifying individual commits too. Do you think that would fit under the ref umbrella as well? The description in gitglossary suggests that probably not. But ref or reference sounds to me quite natural for branches, tags and commits as well.

The git protocol (well, git-upload-pack protocol) does not support that by default, for security (DoS) reasons. The protocol works by the client giving the server a ref name it would like to have (ie. refs/heads/master), as checked/demonstrated by git-ls-remote, along with the latest commit hash the client has. The server then looks at the ref and sends any missing objects to the client. The client thus never actually "requests" objects.
If the client gave the server a hash ID directly, the server would have to check that it's reachable from one of the refs (to prevent security issues accessing a dangling object), which is a resource intensive operation.

Git v2.5.0 introduced this feature,

 * "git upload-pack" that serves "git fetch" can be told to serve
   commits that are not at the tip of any ref, as long as they are
   reachable from a ref, with uploadpack.allowReachableSHA1InWant
   configuration variable.

but virtually all major hosts (github, gitlab, etc.) have it disabled.

You can however support it transparently just by giving git-fetch the SHA hash in place of the ref.

Alternatively, if you always fully clone the remote repository (possibly with --mirror), you can access individual objects by ID. Similarly, if you give the user a way to specify both ref and a commit hash that is (supposedly) reachable via that ref, you can just fetch the ref to get access to the commit.

@pvalena
Copy link

pvalena commented Jan 20, 2020

Well, currently we simply clone it, right?

But, in case we want to optimise it, there should be no change from user-perspective, right?
As an optimization we could try to detect if ref is not a commit and checkout/fetch only that, or can git fetch do that for us transparently?

Could you possibly provide some example working with github.com?

@comps
Copy link

comps commented Jan 21, 2020

The problem is that refs and objects are very different things on the git internals level. Only some higher-level tools like git-log, git-show, etc. allow you to specify a commit hash in place of a ref. So there's really no easy way to check if "ref is not a commit" because a ref is never a commit. The higher-level tools work only on the local repo, so they can easily (and fast) check that ie. 1234abc is not a ref name by looking up the local ref database (accessible to users as git-show-ref) and if 1234abc is not there, treat it as an commit object.
You can do this remotely via git-ls-remote as well, but it's slow as you need to contact the remote, but that's a possible optimization compared to just fully cloning the repo with all branches, tags, etc.

I don't know how much you can change your current implementation w.r.t. user expectations or if you have some caches for the cloned repositories, but I would recommend unifying both ref and commit hash as:

url: 'https://git.repo'
ref: 'master'
commit: '123abcd'
path: '/tree/path'
name: '/node/name'

With the commit being optional (the tip of ref would be used by default). If commit is specified, it must be reachable via ref, or you get an error. If the use case is using an older version of some branch, this specification seems perfectly fine.

$ git init --bare delme; cd delme
Initialized empty Git repository in /home/user/delme/

$ git fetch -q https://github.com/psss/fmf.git devel

$ git cat-file blob FETCH_HEAD:docs/examples.rst > file1

$ git cat-file blob 6d3f6e971:docs/examples.rst > file2

$ git cat-file blob 123abc:docs/examples.rst
fatal: Not a valid object name 123abc:docs/examples.rst

$ diff -u file2 file1
--- file2       2020-01-21 12:29:56.469002947 +0100
+++ file1       2020-01-21 12:29:39.378840700 +0100
@@ -173,6 +173,7 @@
             /block_map_cache_size

 __ https://github.com/jkrysl/storage_setup
+
 You can find here not only how to use FMF for setup/cleanup
 and group tests based on that, but also installing requirements,
 passing values from metadata to tests themself and much more.

(You can re-use the same local repo for multiple fetches of unrelated remotes.)

@psss
Copy link
Collaborator Author

psss commented Jan 21, 2020

Thanks much for detailed feedback, @comps! Your proposal seems fine to me. For completeness I'm attaching description of ansible version field from the git module:

What version of the repository to check out. This can be the literal string HEAD, a branch name, a tag name. It can also be a SHA-1 hash, in which case refspec needs to be specified if the given revision is not already available.

Interesting that they are mentioning refspec in this context. I thought that one is used for "mapping between remote ref and local ref" as gitglossary says :)

@comps
Copy link

comps commented Jan 21, 2020

Interesting that they are mentioning refspec in this context. I thought that one is used for "mapping between remote ref and local ref" as gitglossary says :)

That's probably because refspec is what git-fetch(1) calls it. Note, however, that even for git-fetch, it can mean multiple things - it can be a ref, but also a refspec like src:dst.
If you look into your .git/config, you'll find something like

[remote "origin"]
        url = https://github.com/psss/fmf/
        fetch = +refs/heads/*:refs/remotes/origin/*

The fetch contains the default refspec for the origin remote. It means that whenever you git fetch origin (or pull), any refs contained in refs/heads/* on the remote end will get mirrored to refs/remotes/origin/* in your local repo.

$ ls .git/refs/remotes/origin/
HEAD

$ cat .git/refs/remotes/origin/HEAD 
ref: refs/remotes/origin/master

$ grep 'refs/remotes/origin/master' .git/packed-refs 
f24ef3f85195e1e6afdc8e290d95b1bddc12bfa4 refs/remotes/origin/master

(packed-refs is just a single-file representation of .git/refs/, created because of slow (NTFS) filesystems)

I really would just call it ref unless you want the user to manage src:dst pairs.

@psss
Copy link
Collaborator Author

psss commented Feb 6, 2020

Definition (for now without the commit key) ready for review in #71. Please, have a look. Thanks.

psss added a commit that referenced this issue Feb 7, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

7 participants