01-main: + ripgrep #1026

Merged (1 commit) on Mar 2, 2024

mralusw (Contributor) commented Feb 21, 2024

Compared to my other package contributions, this one differs as follows:

  • DRY, so other packages can reuse the template with as few changes as possible: everything is inferred from GH_OWNER_REPO, SUMMARY and DEB_PKGNAME. SUMMARY is not available from the cached JSON, and DEB_PKGNAME could presumably differ from the repo name (a repo can also release multiple packages, resulting in several .debs for the same host architecture, e.g. pkg-common, pkg-full, pkg-tiny etc.)
  • custom variables are local to a pkg_<NAME> function so as not to pollute the global namespace and to avoid collisions with other packages or future DEB_GET mechanisms
  • VERSION_PUBLISHED is inferred from the *.deb filename, as release versions / tags in the URL (which I've used previously) don't necessarily match the .deb version
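A minimal bash sketch of the template idea, assuming the framework reads globals named URL, VERSION_PUBLISHED and SUMMARY after calling a package function. The function name pkg_ripgrep_sketch, the hard-coded tag and the asset URL are hypothetical illustrations; a real package would take the URL from the cached release JSON. It shows the three points above: a few package-specific inputs, custom variables kept local, and VERSION_PUBLISHED taken from the .deb filename rather than from the tag.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a DRY package template; not deb-get's actual code.

pkg_ripgrep_sketch() {
    # Package-specific inputs are local, so they cannot collide with other
    # packages' definitions or with future globals.
    local GH_OWNER_REPO="BurntSushi/ripgrep"
    local DEB_PKGNAME="ripgrep"
    # Illustrative tag and asset; a real template would find them in the cached JSON.
    local tag="14.1.0"
    local asset="${DEB_PKGNAME}_${tag}-1_amd64.deb"

    # Outputs stay global, assuming the framework reads these names.
    URL="https://github.com/${GH_OWNER_REPO}/releases/download/${tag}/${asset}"
    SUMMARY="Recursively search directories for a regex pattern."

    # Infer VERSION_PUBLISHED from <pkgname>_<version>_<arch>.deb instead of
    # the tag, using parameter expansion only (no extra processes).
    local file=${URL##*/}                # ripgrep_14.1.0-1_amd64.deb
    file=${file#"${DEB_PKGNAME}"_}       # 14.1.0-1_amd64.deb
    VERSION_PUBLISHED=${file%_*.deb}     # 14.1.0-1
}

pkg_ripgrep_sketch
echo "tag 14.1.0 -> VERSION_PUBLISHED=${VERSION_PUBLISHED}"
```

Note how the tag (14.1.0) and the Debian version (14.1.0-1) differ, which is exactly the mismatch that makes the filename the safer source.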

Discussion

At some point I'd like to propose standardizing get_github_url(), a function that sets the URL variable. The most commonly used mechanism, a grep ... head ... cut pipeline, is somewhat haphazard and also causes at least two unnecessary fork()/exec() calls per package. These don't matter compared to network access, but a lot of deb-get operations work on cached files without any network updates.
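One possible shape for the proposed get_github_url(), sketched in bash. The argument order, the suffix pattern and the JSON layout (one browser_download_url key per asset, as in the GitHub releases API) are assumptions, and the cache file and example.com URLs are fabricated. It replaces grep | head | cut with one grep -m1 -o plus parameter expansion, so each lookup costs one command substitution running a single grep.

```bash
#!/usr/bin/env bash
# Hypothetical get_github_url: the name, arguments and JSON layout are
# assumptions for illustration, not an existing deb-get function.
# Sets the global URL from the first browser_download_url in a cached
# release JSON file whose value ends in suffix_re (a grep BRE).
get_github_url() {
    local cache_file=$1 suffix_re=$2 match
    # -m1 stops at the first matching line; -o prints only the matched text.
    match=$(grep -m1 -o "\"browser_download_url\": *\"[^\"]*${suffix_re}\"" "$cache_file") || return 1
    match=${match%%$'\n'*}   # minified JSON: -o may print several matches; keep the first
    match=${match%\"}        # drop the closing quote
    URL=${match##*\"}        # keep what follows the opening quote of the value
}

# Usage with a small fabricated cache file.
cache=$(mktemp)
cat > "$cache" <<'EOF'
{
  "tag_name": "1.2.3",
  "assets": [
    { "browser_download_url": "https://example.com/foo/1.2.3/foo-1.2.3.tar.gz" },
    { "browser_download_url": "https://example.com/foo/1.2.3/foo_1.2.3-1_amd64.deb" }
  ]
}
EOF
get_github_url "$cache" '_amd64\.deb' && echo "$URL"
rm -f "$cache"
```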

I'm still using a single grep -m1 -o, but I think it's actually possible to read the cached file into a string and search it with bash regexes, without any external utilities at all and without subshells. Such an optimization would only change the proposed get_github_url function.
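The pure-bash idea can be sketched as follows, assuming the same hypothetical interface (a function that sets URL); get_github_url_pure is an illustrative name. The function body uses only builtins: read with -d '' loads the whole cached file into a variable, and [[ =~ ]] with BASH_REMATCH extracts the first matching URL, so no external utility or subshell is involved. The mktemp and cat in the usage part only build a fabricated test file.

```bash
#!/usr/bin/env bash
# Hypothetical pure-bash variant of get_github_url (name, arguments and JSON
# layout are assumptions). No external utility and no command substitution:
# the file is read by the read builtin and matched with [[ =~ ]].
get_github_url_pure() {
    local cache_file=$1 suffix_re=$2 json re
    # read -d '' reads up to a NUL byte, i.e. the whole file, and returns
    # non-zero at end of file, so its status is ignored here.
    IFS= read -r -d '' json < "$cache_file" || true
    # ERE with one capture group for the URL; [^"]* cannot cross a quote, so
    # the leftmost match is the first asset URL ending in suffix_re.
    re="\"browser_download_url\": *\"([^\"]*${suffix_re})\""
    [[ $json =~ $re ]] || return 1
    URL=${BASH_REMATCH[1]}
}

# Usage with a small fabricated cache file.
cache=$(mktemp)
cat > "$cache" <<'EOF'
{
  "assets": [
    { "browser_download_url": "https://example.com/foo/1.2.3/foo-1.2.3.tar.gz" },
    { "browser_download_url": "https://example.com/foo/1.2.3/foo_1.2.3-1_amd64.deb" }
  ]
}
EOF
get_github_url_pure "$cache" '_amd64\.deb' && echo "$URL"
rm -f "$cache"
```

Whether this beats a single grep -m1 -o depends on the cache size: it avoids process creation, but it reads the whole file into memory and matches against it as one string, whereas grep stops at the first matching line.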

philclifford merged commit 4b8a855 into wimpysworld:main on Mar 2, 2024 (1 check passed)
philclifford (Member) commented:

I'd like to see get_github_url() provide an efficient standard approach, but I would also add some words of caution. Standardisation across all the GitHub repos is challenging. A variety of asset naming conventions and release behaviours are in use, and these change over time even within a single repository. In particular, while using "latest" in the API to get the current release mostly works, for some repos a .deb asset may take a while to appear after a release is made, and some repos publish incomplete releases (for example, just source without builds, or fixes for only one architecture). This can make it necessary to fetch more than just the latest release to find the latest available download for our target architecture. Because we cache the whole JSON response, that can become quite a large "string". For me, a single grep -m1 -o is the sweet spot that works in most cases.

```shell
grep -rh VERSION_PUBLISHED 01-main/packages/ | sort | uniq -c | sort -n
```

The output of that command shows a lot of potential for standardisation and efficiency, but also some of the potential pain.
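A small illustration of the point about incomplete releases, assuming the cache holds the array returned by GitHub's list-releases endpoint and that it is ordered newest first (an assumption worth checking). The fixture, the foo package and the example.com URLs are fabricated. Because the first matching line belongs to the newest release that actually ships an amd64 .deb, a single grep -m1 -o falls back past an incomplete release without any loop.

```bash
#!/usr/bin/env bash
# Sketch: first-match lookup over several cached releases (newest first, as
# assumed here). Fixture data and URLs are fabricated for illustration.
releases_json=$(mktemp)
cat > "$releases_json" <<'EOF'
[
  {
    "tag_name": "2.0.0",
    "assets": [
      { "browser_download_url": "https://example.com/foo/2.0.0/foo_2.0.0_arm64.deb" }
    ]
  },
  {
    "tag_name": "1.9.0",
    "assets": [
      { "browser_download_url": "https://example.com/foo/1.9.0/foo_1.9.0_arm64.deb" },
      { "browser_download_url": "https://example.com/foo/1.9.0/foo_1.9.0_amd64.deb" }
    ]
  }
]
EOF

arch=amd64
# 2.0.0 has no amd64 .deb, so the first match comes from 1.9.0.
match=$(grep -m1 -o "\"browser_download_url\": *\"[^\"]*_${arch}\\.deb\"" "$releases_json")
match=${match%%$'\n'*}   # keep only the first match if several share a line
match=${match%\"}        # drop the closing quote
URL=${match##*\"}        # keep what follows the opening quote of the value
echo "$URL"
rm -f "$releases_json"
```

Since grep -m1 stops at the first matching line, older releases further down the cache are only scanned when the newer ones lack a suitable asset.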

mralusw deleted the add-ripgrep branch on March 13, 2024 02:06