
Storing list of artifacts in database #195

Open
KAction opened this issue Jun 20, 2023 · 5 comments

Comments

@KAction
Contributor

KAction commented Jun 20, 2023

Currently, Laminar lists the $ARCHIVE directory of a particular run every time it renders the artifact links at the top of the job page:

    KJ_IF_MAYBE(dir, fsHome->tryOpenSubdir("archive"/runArchive)) {
        for(kj::StringPtr file : (*dir)->listNames()) {
            kj::FsNode::Metadata meta = (*dir)->lstat(kj::Path{file});
            // ... (rest of the loop elided: emits one link per entry)
        }
    }

This means I can't move the files in $ARCHIVE elsewhere without losing these links, and I want to move them to cheaper storage.

So I suggest that Laminar saves the list of artifacts right after the job finishes, and it is up to the reverse proxy to figure out where to find
laminar.example.com/archive/foo-job/10/debug.txt. What do you think?

@ohwgiles
Owner

The original design does support moving archived artefacts to cheaper storage, but it assumes this is achieved by mounting or symlinking the archive directory appropriately. Iterating the folder dynamically is simpler, but it is more expensive, especially on slow storage or when there are many artefacts. I'm not opposed to your suggestion; I just wanted to check why mounting/symlinking would not work for you, since this proposal adds some (admittedly low) complexity compared to the current situation.

@KAction
Contributor Author

KAction commented Jul 23, 2023

If I want to archive artifacts on S3, mounting them so that Laminar can find them means FUSE, which is already extra complexity. Furthermore, scanning S3 to render a job page (the list of artifacts at the top) is both slow and costly.

Technically, I could keep empty files in /laminar/archive to tell Laminar which artifacts are associated with the job, and configure the reverse proxy to fetch them from S3 instead, but that means using the filesystem as a database. It's a huge pain to back up, and readdir(3) won't be happy.

@ohwgiles
Owner

Fair enough

@ohwgiles
Owner

@mitya57, are you still offering a PR for this? I'm happy with the justification.

@KAction
Contributor Author

KAction commented Dec 7, 2023

Sorry for the late response.

@mitya57 ended up with a Postgres-only fork: https://github.com/mitya57/laminar/tree/wip/postgres

That was necessary to speed things up by taking advantage of Postgres materialized views and other features. The patch that keeps the artifact list in the database ended up tightly coupled to the other changes, so I guess we can close the issue.
