RFC: Start documenting updateinfo.xml #1523

stewartsmith
Contributor

The yum/dnf repository format isn't as well documented as one would like.

updateinfo.xml specifically is currently parsed by many third-party tools to gather information about updates applicable to Linux distributions that use yum/dnf repositories. Because the format of updateinfo.xml has never been well defined, this presents challenges to the authors of these tools.

With clear documentation, all authors of tools that produce or parse updateinfo.xml can refer to one canonical place for what exists in the wild and how to process it.


To get to this schema, I've gathered as many samples of updateinfo.xml as I can find. That's about 189 files and 1.8GB, which likely should not sit in the dnf5 repository but should probably sit somewhere.

That data set should also yield various smaller snippets of updateinfo.xml that tools can use for testing.

This is the list of the files I've used for validating the schema:
updateinfo-xml-urls.txt

If anyone can think of any more, especially ones that may be different in some interesting way, please let me know!

I intend to create a Strict variant of the schema, with a view towards using it for code that produces updateinfo.xml. I'm putting this PR up first for initial feedback.


One question I have is whether the dnf5 repository is the right location for this. @Conan-Kudo assembled https://pagure.io/rpm-metadata/ from existing sources, but I did not find complete documentation anywhere on the updateinfo.xml schema and the variations that exist in the wild.

@Conan-Kudo
Member

If we want to have a canonical rpm-metadata repository, we could move my archive of files to this org and start figuring out proper documentation and evolution of the repodata format.

@jcpunk

jcpunk commented Jun 5, 2024

I offer up https://github.com/fermitools/python-Updateinfo/blob/main/docs/updateinfo.xsd as I've been using it for years.

@stewartsmith
Contributor Author

> I offer up https://github.com/fermitools/python-Updateinfo/blob/main/docs/updateinfo.xsd as I've been using it for years.

Having a look at that, I notice a few differences from my XSD that would prevent validation of a number of updateinfo.xml files I have found in the wild. There are also at least a couple of things you have that I missed, so I'm interested in where they've been observed so I can add them to the corpus of real-world examples.

The unique constraint on an update ID doesn't hold for all updateinfo.xml in the wild. I tried that in mine and found that Fedora 17 (at least) has duplicates.
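A rough sketch of how one might check a sample for duplicate IDs before deciding whether a uniqueness constraint is safe. It assumes the usual `<updates>`/`<update>`/`<id>` layout with no XML namespace; the function name and the sample IDs are made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def duplicate_update_ids(xml_text):
    """Return the update IDs that occur more than once, sorted."""
    root = ET.fromstring(xml_text)
    # Assumption: each <update> carries its identifier in a child <id> element.
    ids = [(update.findtext("id") or "").strip() for update in root.iter("update")]
    counts = Counter(ids)
    return sorted(update_id for update_id, n in counts.items() if n > 1)


# Hypothetical sample, not taken from any real repository.
SAMPLE = """<updates>
  <update type="security"><id>EXAMPLE-2012-0001</id></update>
  <update type="bugfix"><id>EXAMPLE-2012-0002</id></update>
  <update type="security"><id>EXAMPLE-2012-0001</id></update>
</updates>"""

print(duplicate_update_ids(SAMPLE))
```

Reporting duplicates rather than rejecting the file matches what a lenient schema would need to tolerate.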

I've found the from attribute missing in the wild, so the required constraint in your XSD wouldn't pass all of those files (I forget which one I found this in, though).

There's also author which is in at least some of the Amazon Linux produced updateinfo.xml samples I've looked at, I don't see that in your XSD.

The version attribute is a fun one. I had thought the same as what you have, that there was a limited number of values there. I found that the story was way weirder. It's effectively meaningless given Alma, OpenSuSE, Oracle, and Rocky.

For the update type, I've seen a couple more in the wild than you cover: unspecified and update. You also have errata, which I haven't seen, so either I've missed it somewhere or it's not present anywhere. Any idea where that came from?

I found that Rocky has the <description> twice, so the use of xs:all won't allow those to pass.

I've also seen <pushcount> in Rocky, <message> in OpenSuSE.

For reboot_suggested and friends, I've seen both True and 1 used, and I'm not certain that the xs:boolean you use accepts both?
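For context, the XSD `xs:boolean` lexical forms are `true`, `false`, `1` and `0`, so a capitalized `True` would not validate. A parser that wants to accept what is seen in the wild could be more lenient. This is only a sketch: `parse_flag` is a hypothetical helper, and treating a missing or empty value as false is an assumption.

```python
def parse_flag(text):
    """Leniently interpret a flag element such as reboot_suggested."""
    if text is None:
        # Assumption: an absent element means the flag is not set.
        return False
    value = text.strip().lower()
    if value in {"true", "1"}:
        return True
    if value in {"false", "0", ""}:
        return False
    raise ValueError(f"unrecognised boolean value: {text!r}")


for raw in ("True", "1", "false", None):
    print(repr(raw), parse_flag(raw))
```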

It looks like you have a more complete definition / list of the various values of status, as I just left it as a string.

The severity field is also fun. You cover the capitalization, but I've found that there's also Medium (equivalent to Moderate), and an empty string.
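A consumer could fold these variants together along the following lines. This is a sketch under assumptions: `normalize_severity` is a hypothetical name, and mapping an empty string to `None` is one possible choice rather than established behaviour.

```python
def normalize_severity(raw):
    """Fold capitalization variants and map Medium onto Moderate."""
    value = (raw or "").strip().lower()
    if not value:
        # Assumption: an empty or missing severity means "not stated".
        return None
    if value == "medium":
        return "Moderate"
    return value.capitalize()


for raw in ("Medium", "moderate", "IMPORTANT", ""):
    print(repr(raw), "->", normalize_severity(raw))
```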

I also just noticed that I missed documenting the fact that seconds since the epoch has been part of the OpenSuSE metadata as a date format, i.e. there are 5 formats of this rather than 4. You do mention it here though.
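As an illustration of handling the seconds-since-the-epoch form next to textual dates, a parser might try the numeric form first and then fall back to a list of `strptime` patterns. The patterns below are placeholders, since this thread does not list the other formats, and treating timestamps without a zone as UTC is also an assumption.

```python
from datetime import datetime, timezone

# Hypothetical patterns; substitute the formats actually observed in the corpus.
ASSUMED_FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d")


def parse_date(raw, formats=ASSUMED_FORMATS):
    """Parse a date attribute given as epoch seconds or as a formatted string."""
    value = raw.strip()
    if value.isdigit():
        # Seconds since the epoch, as mentioned for the OpenSuSE metadata.
        return datetime.fromtimestamp(int(value), tz=timezone.utc)
    for fmt in formats:
        try:
            # Assumption: values without a zone are interpreted as UTC.
            return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue
    raise ValueError(f"unrecognised date value: {raw!r}")


print(parse_date("1717545600"))
print(parse_date("2024-06-05 12:00:00"))
```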

For the reference type, I've seen some extra values in the wild. Your schema covers commit (which I missed, so I would be interested in where this has been observed), bugzilla, cve, self, and other. In addition, I've seen jira, fate, github, launchpad, sourceforge, rhsa and redhat.

For the Epoch, I agree on it being messy :) I have seen the value of None in the Fedora 10 metadata.
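A tolerant reader might treat such values the same way RPM treats an absent epoch, which is as 0. The helper below is a sketch; `parse_epoch` is a hypothetical name, and mapping the literal string `None` to 0 is an assumption about what the producer meant.

```python
def parse_epoch(raw):
    """Convert a package epoch attribute to an int, tolerating junk values."""
    if raw is None:
        return 0
    value = raw.strip()
    if value in {"", "None"}:
        # Assumption: the producer serialized a missing epoch as "None".
        return 0
    return int(value)


for raw in ("0", "2", "None", None):
    print(repr(raw), "->", parse_epoch(raw))
```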

@jcpunk

jcpunk commented Jun 7, 2024

As for the unique constraint, yum spits out warning messages when you've got duplicate IDs.

I found most of those fields back in 2017. I'm not sure I've still got the samples I drew from. It's been so long since I looked at these that my memory is a bit fuzzy. I'd say if you can't corroborate something today, the sample I used has probably bit-rotted away.

Given https://github.com/PackageKit/PackageKit/blob/main/backends/zypp/pk-backend-zypp.cpp#L2496, might it be worth adding distupgrade?

@Conan-Kudo
Member

There's now an rpm-metadata repo that this kind of documentation would be appropriate to target.

@stewartsmith
Contributor Author

Going to close this off in favor of rpm-software-management/rpm-metadata#2 as that's certainly the better place to have it.
