From 62d1245ec1eae275edb996229585637035c02538 Mon Sep 17 00:00:00 2001
From: Anderson Torres
Date: Mon, 8 Jul 2024 12:44:36 -0300
Subject: [PATCH] [RFC 0146] Meta.Categories, not Filesystem Directory Trees (#146)

* Meta.Categories, not Filesystem Directory Trees

* Whitespace cleanup

* Add a short answer to the bikeshedding problem

* Add a short line on "Do nothing" alternative

* Extend an answer for the "Ignore/nuke" alternative

* Add "update ci" to future work

* Add a repl-like interaction example

* Add more arguments for categorization and against its nuking

* small rewording

* Add an option of category data structure

* reorder arguments against nuking

* add argument for usefulness of categorization

* add drawback

* rework nuke argument

* update metainfo - shepherd team and leader

* Add prior art section

Also, add extra links

* typo

* Categorization Team

* Remove the optional data structure

* typo

* reword the creation of a team

* Debtags FAQ

* Update rules and duties of categorization team

* The team shall have authority to carry out their duties

* A section for the team

* Appstream as prior art

* Section for code implementation

* Move categorization team to implementation

* Update future work

* A hybrid approach to be considered by the future team

* extra duties for the team

* reword duties from team

* typo

* A semantic detail: treat the first element of meta.categories as most important

* Move hybrid approach to alternatives section

* identify AndersonTorres' tag

* Suggestions from FCP

* remove infinisil from shepherd team

* add an extra reference to the categorization team
---
 rfcs/0146-meta-categories.md | 356 +++++++++++++++++++++++++++++++++++
 1 file changed, 356 insertions(+)
 create mode 100755 rfcs/0146-meta-categories.md

diff --git a/rfcs/0146-meta-categories.md b/rfcs/0146-meta-categories.md
new file mode 100755
index 000000000..589ce7220
--- /dev/null
+++ b/rfcs/0146-meta-categories.md
@@ -0,0 +1,356 @@
+---
+feature: Decouple filesystem from
categorization
+start-date: 2023-04-23
+author: Anderson Torres (@AndersonTorres)
+co-authors:
+shepherd-team: @7c6f434c @natsukium @fgaz
+shepherd-leader: @7c6f434c
+related-issues: (will contain links to implementation PRs)
+---
+
+# Summary
+[summary]: #summary
+
+Deploy a new method of categorization for the packages maintained by Nixpkgs,
+not relying on filesystem idiosyncrasies.
+
+# Motivation
+[motivation]: #motivation
+
+Currently, Nixpkgs uses the filesystem, or more accurately, the directory tree
+layout, to informally categorize the software it packages, as described
+in the [Hierarchy](https://nixos.org/manual/nixpkgs/stable/#sec-hierarchy)
+section of the Nixpkgs manual.
+
+This is a simple, easy to understand and consecrated-by-use method of
+categorization, partially employed by many other package managers like GNU Guix
+and NetBSD pkgsrc.
+
+However, this system of categorization has serious problems:
+
+1. It is bounded by the constraints imposed by the filesystem.
+
+   - Restrictions on filenames, subdirectory tree depth, permissions, inodes,
+     quotas, and many other things.
+     - Some of these restrictions are not well documented and are found simply
+       by "bumping" into them.
+   - The restrictions can vary on an implementation basis.
+     - Some filesystems have more restrictions or fewer features than others,
+       forcing an uncomfortable lowest common denominator.
+     - Some operating systems can impose additional constraints over otherwise
+       full-featured filesystems because of backwards compatibility (8 dot
+       3, anyone?).
+
+2. It requires a local checkout of the tree.
+
+   Certainly this checkout can be "cached" using some form of
+   `find . > /tmp/pkgs-listing.txt`, or more sophisticated solutions like
+   `locate + updatedb`. Nonetheless, such solutions still require access to a
+   fresh, updated copy of the Nixpkgs tree.
+
+3.
The creation of a new category - and more generally the manipulation of
+   categories - requires an unpleasant task of renaming and eventually patching
+   many seemingly unrelated files.
+
+   - Moving files around the Nixpkgs codebase requires updating their forward
+     and backward references.
+     - Especially in some auxiliary tools like editor plugins, testing suites,
+       autoupdate scripts and so on.
+   - Rewriting `all-packages.nix` can be error-prone (even using Metapad) and it
+     can generate huge, noisy patches.
+
+4. There is no convenient way to use multivalued categorization.
+
+   A piece of software can fulfill many categories; e.g.
+   - an educational game
+   - a console emulator (vs. a PC emulator)
+   - and a special-purpose programming language (say, a smart-contracts one).
+
+   The current one-size-fits-all restriction is artificial, imposes unreasonable
+   limitations and results in incomplete and confusing information.
+
+   - No, symlinks or hardlinks are not convenient for this purpose; not all
+     environments support them (the lowest-common-denominator problem
+     described in item 1) and they convey nothing besides confusion - just
+     think about writing the corresponding entry in `all-packages.nix`.
+
+5. It burdens the (possibly human) package writer with deciding where to
+   put the files in the filesystem hierarchy, distracting them from the job of
+   actually writing the package.
+
+   - Or they just take the shortest path and throw it into a folder under
+     `misc`.
+
+6. It "locks" the filesystem, preventing its usage for other, more sensible
+   purposes.
+
+7. The most important: the categorization is not discoverable via Nix language
+   infrastructure.
+
+   Indeed, there is no higher-level way to query such categories besides
+   the one described in item 2 above.
+
+In light of such a bunch of problems, this RFC proposes a novel alternative to
+the above mess: new `meta` attributes.
+
+# Detailed design
+[design]: #detailed-design
+
+## Code Implementation
+[code-implementation]: #code-implementation
+
+A new attribute, `meta.categories`, will be included for every Nix expression
+living inside Nixpkgs.
+
+This attribute will be a list whose elements are drawn from the
+`lib.categories` set.
+
+A typical snippet of `lib.categories` will be similar to:
+
+```nix
+{
+  assembler = {
+    name = "Assembler";
+    description = ''
+      A program that converts text written in assembly language to binary code.
+    '';
+  };
+
+  compiler = {
+    name = "Compiler";
+    description = ''
+      A program that converts a source from a language to another, usually from
+      a higher, human-readable level to a lower, machine level.
+    '';
+  };
+
+  font = {
+    name = "Font";
+    description = ''
+      A set of files that defines a set of graphically-related glyphs.
+    '';
+  };
+
+  game = {
+    name = "Game";
+    description = ''
+      A program developed with entertainment in mind.
+    '';
+  };
+
+  interpreter = {
+    name = "Interpreter";
+    description = ''
+      A program that directly executes instructions written in a programming
+      language, without requiring compilation into the native machine language.
+    '';
+  };
+
+  # Further categories follow the same shape.
+}
+```
+
+### Semantic Details
+[semantic-details]: #semantic-details
+
+Given that `meta.categories` is implemented as a list, it is useful to
+treat the first element of this list as the "most important" categorization, the
+one that best identifies the software being classified.
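+
+The "first element is the most important" convention can be sketched as a small
+helper. This is only an illustration under stated assumptions: `primaryCategory`
+is a hypothetical name that does not exist in Nixpkgs, and the two categories
+and their descriptions below are placeholders that merely follow the
+`{ name, description }` shape of the `lib.categories` snippet.
+
+```nix
+let
+  # Stand-in for lib.categories (placeholder entries, not the real set).
+  categories = {
+    emulator = { name = "Emulator"; description = "placeholder"; };
+    debugger = { name = "Debugger"; description = "placeholder"; };
+  };
+
+  # Stand-in for a package; the most important category is listed first.
+  bochs = {
+    meta.categories = with categories; [ emulator debugger ];
+  };
+
+  # Hypothetical helper: return the first category, or null if none is set.
+  primaryCategory = pkg:
+    let cats = pkg.meta.categories or [ ];
+    in if cats == [ ] then null else builtins.head cats;
+in
+(primaryCategory bochs).name
+```
+
+Evaluating this expression (e.g. with `nix-instantiate --eval`) yields
+`"Emulator"`. Returning `null` for packages without categories is a design
+choice of this sketch, not something the RFC prescribes.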
+
+## Categorization Team
+[categorization-team]: #categorization-team
+
+Given the typical complexities that arise from categorization, and given
+that regular maintainers are not expected to understand its minutiae
+(according to the experience of the [Debtags
+Team](https://wiki.debian.org/Debtags/FAQ#Why_don.27t_you_just_ask_the_maintainers_to_tag_their_own_packages.3F)),
+the creation of a team entrusted with the authority to manage issues related to
+categorization and carry out the corresponding duties is strongly recommended.
+
+# Examples and Interactions
+[examples-and-interactions]: #examples-and-interactions
+
+In file `bochs/default.nix`:
+
+```nix
+stdenv.mkDerivation {
+  # ...
+
+  meta = {
+    # ...
+    categories = with lib.categories; [ emulator debugger ];
+    # ...
+  };
+}
+```
+
+In a `nix repl`:
+
+```
+nix-repl> :l <nixpkgs>
+Added XXXXXX variables.
+
+nix-repl> pkgs.bochs.meta.categories
+[ { ... } { ... } ]
+
+nix-repl> map (z: z.name) pkgs.bochs.meta.categories
+[ "Emulator" "Debugger" ]
+```
+
+# Drawbacks
+[drawbacks]: #drawbacks
+
+The most immediate drawbacks are:
+
+1. A huge treewide edit of Nixpkgs
+
+   On the other hand, this is easily sprintable and amenable to automation.
+
+2. Bikeshedding
+
+   How many categories should we create, and which ones? Can we expand them
+   later?
+
+   To start, we can follow or take inspiration from many of the already
+   existing category sets and add extra ones when the need arises. Indeed, it
+   is way easier to create such categories using the Nix language when compared
+   to other software collections.
+
+   Further, the creation of a categorization team can resolve those disputes.
+
+3. Superfluous
+
+   It can be argued that there are other ways to discover similar or related
+   package sets, like Repology.
+
+   However, this argument is a bit circular, because e.g. the classification
+   shown by Repology effectively replicates the classification done by the many
+   software collections in its catalog.
Therefore, relying on Repology merely
+   transfers the question to external sources.
+
+   Further, this circularity becomes more pronounced when we take into account
+   the fact that Nixpkgs ranks first in most Repology statistics. The expected
+   outcome, therefore, should be precisely the opposite: Nixpkgs being _the_
+   source of structured metainfo for other software collections.
+
+# Alternatives
+[alternatives]: #alternatives
+
+1. Do nothing
+
+   This will exacerbate the problems already listed.
+
+2. Ignore/nuke the categorization completely
+
+   This is an alternative worthy of some consideration. After all,
+   categorization is not without its problems, as shown above. Removing or
+   ignoring classification removes all problems.
+
+   However, there are good reasons to keep the categorization:
+
+   - The complete removal of categorization is too harsh. A solution that keeps
+     and enhances the categorization is far preferable to one that nukes
+     it completely.
+
+   - As said before, the categorization is already present; this RFC proposes to
+     expose it at a higher level, in a structured, more discoverable format.
+
+   - Categorization is very traditional among software collections. Many of them
+     have been doing this just fine for years on end, and Nixpkgs can imitate
+     them easily - and even surpass them, given the benefits of Nix language
+     machinery.
+
+   - Categorization is useful in many scenarios and use cases - indeed it
+     is ubiquitous in the software world:
+     - specialized search engines (from Repology to MELPA)
+     - code forges, from Sourceforge to Gitlab
+     - as said above, software collections from pkgsrc to slackbuilds
+     - organization and preservation (as Software Heritage)
+
+3.
Debtags/Appstream hybrid approach
+
+A hybrid approach for code implementation would be to implement two meta
+attributes, namely
+
+- `meta.categories` for Appstream-based categories
+  - the corresponding `lib.categories` should follow Appstream closely, with
+    little room for custom or extra categories
+- `meta.tags` for Debtags-like tags
+  - while being inspired by the venerable Debtags work, the corresponding
+    `lib.tags` is completely free to modify and even diverge from Debtags,
+    following its own way
+- generally speaking, `lib.tags` should be less bureaucratic than
+  `lib.categories`
+
+However, this approach arguably raises the complexity of the whole work, and
+adds too much redundancy.
+
+# Prior art
+[prior-art]: #prior-art
+
+As said above, categorization is very traditional among software collections. It
+is not hard to cite examples in this arena; the most interesting ones I have
+found are listed below (linked in the [references section](#references)):
+
+- FreeBSD Ports;
+- Debtags;
+- Appstream Project.
+
+# Unresolved questions
+[unresolved]: #unresolved-questions
+
+There are remaining issues to be solved by the categorization team:
+
+- What data structure is suitable to represent a category?
+  - For now we stick to the most natural: a set `{ name, description }`.
+
+- Should we have a set of primary, "most important" categories with mandatory
+  status, in the sense that each package should set at least one of them?
+  - The answer is most certainly positive.
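+
+If a mandatory set of primary categories is adopted, a check along these lines
+could verify that a package sets at least one of them. This is a hedged sketch,
+not a decided design: `primaryCategories` and `hasPrimaryCategory` are
+hypothetical names, the choice of which categories count as primary is made up
+for illustration, and the category entries are placeholders in the
+`{ name, description }` shape.
+
+```nix
+let
+  # Stand-in for lib.categories (placeholder entries, not the real set).
+  categories = {
+    game = { name = "Game"; description = "placeholder"; };
+    font = { name = "Font"; description = "placeholder"; };
+    emulator = { name = "Emulator"; description = "placeholder"; };
+  };
+
+  # Assumption: the categorization team designates these as primary.
+  primaryCategories = with categories; [ game font ];
+
+  # True when at least one of the package's categories is primary.
+  hasPrimaryCategory = pkg:
+    builtins.any
+      (c: builtins.elem c primaryCategories)
+      (pkg.meta.categories or [ ]);
+
+  # Stand-in packages used only to exercise the check.
+  withPrimary = { meta.categories = with categories; [ game emulator ]; };
+  withoutPrimary = { meta.categories = with categories; [ emulator ]; };
+in
+{
+  withPrimary = hasPrimaryCategory withPrimary;
+  withoutPrimary = hasPrimaryCategory withoutPrimary;
+}
+```
+
+Under these assumptions the result has `withPrimary = true` and
+`withoutPrimary = false`. Keeping the primary set as a separate list leaves the
+`{ name, description }` structure untouched; a flag inside each category would
+be an equally plausible alternative for the team to weigh.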
+
+# Future work
+[future]: #future-work
+
+- Create the [categorization team](#categorization-team)
+- Carry out the duties related to categorization, including but not limited
+  to:
+
+  - Decide between possibilities of implementation;
+  - Documentation updates;
+  - Category curation, integration and updates;
+  - Continuous Integration updates and adaptations;
+  - Coordination of efforts to import, integrate and update categorization of
+    packages;
+  - Litigations and disputations:
+    - Solve them, especially in corner cases;
+  - Enforce implementation issues
+    - Decide when a CI check should be made blocking
+    - Grace periods
+
+# References
+[references]: #references
+
+- [Desktop Menu Specification](https://specifications.freedesktop.org/menu-spec/latest/);
+  specifically,
+  - [Main categories](https://specifications.freedesktop.org/menu-spec/latest/apa.html)
+  - [Additional categories](https://specifications.freedesktop.org/menu-spec/latest/apas02.html)
+  - [Reserved categories](https://specifications.freedesktop.org/menu-spec/latest/apas03.html)
+
+- [Appstream](https://www.freedesktop.org/wiki/Distributions/AppStream/)
+
+- [Debtags](https://wiki.debian.org/Debtags)
+
+  - [Debtags FAQ](https://wiki.debian.org/Debtags/FAQ)
+
+- [NetBSD pkgsrc guide](https://www.netbsd.org/docs/pkgsrc/)
+  - Especially, [Chapter 12, Section 1](https://www.netbsd.org/docs/pkgsrc/components.html#components.Makefile)
+    contains a short list of CATEGORIES.
+
+- [FreeBSD Porters Handbook](https://docs.freebsd.org/en/books/porters-handbook/)
+  - Especially
+    [Categories](https://docs.freebsd.org/en/books/porters-handbook/makefiles/#porting-categories)