Need better solution for storing large/many binary assets that aren't hosted on main ALCF site #533

Open
felker opened this issue Nov 8, 2024 · 2 comments

@felker
Member

felker commented Nov 8, 2024

Came up in discussion over #528, where @kaushikvelusamy wanted to upload a ~10 MB PDF slide deck to link to in the docs. These exact slides were never presented at an ALCF tech talk or workshop, so they are not already hosted on https://www.alcf.anl.gov/ and this is a bit of an edge case.

As a rule of thumb, I am hesitant to add binary files to version control if 1) we don't really care about versioning the files, 2) they are modified frequently, and/or 3) they are larger than a few MB. In some of those cases, or if you have too many such files in the repo history, they can bloat the .git/ size and make cloning/checkout/push/pull slower over time.
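To check how much binary content is already in the history (not just the current checkout), something like the following sketch could help. It lists the largest blobs reachable from any ref, using only standard `git rev-list` and `git cat-file` plumbing. The function name `largest_blobs` and its arguments are made up for illustration.

```shell
# Hypothetical helper: print the N largest blobs in a repo's history
# as "<size in bytes><TAB><path>", largest first.
# Usage: largest_blobs [repo_dir] [count]
largest_blobs() {
  repo="${1:-.}"
  count="${2:-10}"
  # rev-list emits "<sha> <path>" for every reachable object;
  # cat-file adds the object type and size, keeping the path in %(rest).
  git -C "$repo" rev-list --objects --all \
    | git -C "$repo" cat-file \
        --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' \
    | awk '$1 == "blob" {
        size = $3
        # Drop the first three fields so paths containing spaces survive.
        sub(/^[^ ]+ [^ ]+ [^ ]+ /, "")
        print size "\t" $0
      }' \
    | sort -rn \
    | head -n "$count"
}
```

Running `largest_blobs . 20` from the repo root would show whether old, since-deleted binaries are still contributing to clone size.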

Currently in docs/, we are doing OK with only a few such files larger than a MB:

$ find . -type f -not -name "*.md" -exec du -hs {} \; | sort -h
...
1020K	./aurora/performance-tools/images/GPU-offload-03.png
1.3M	./aurora/performance-tools/images/FireFox-VTune05.png
1.3M	./services/files/docker_hub_repo_build.gif
1.6M	./ai-testbed/files/home-cerebras-sambanova.png
1.7M	./aurora/images/Argonne_wireframe_white_transparent.eps
1.7M	./images/Argonne_wireframe_white_transparent.eps
1.8M	./policies/accounts/IT_Access_Agreement_for_ALCF.pdf
2.0M	./services/files/singularity_build.gif

We are already storing 204 PNGs, 2 PDFs, 7 GIFs, 18 JPGs, 3 Microsoft Word documents, and 2 EPS files (ANL wireframe logos).
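For reference, a per-extension tally like the one above can be produced with a short pipeline. This is only a sketch, not necessarily how the numbers here were obtained; the function name `count_by_ext` is hypothetical, and extensions are lowercased so `.PNG` and `.png` are counted together.

```shell
# Hypothetical helper: count non-Markdown files by (lowercased) extension.
# Usage: count_by_ext [dir]
count_by_ext() {
  find "${1:-.}" -type f -not -name "*.md" \
    | sed -n 's/.*\.\([^./]*\)$/\1/p' \
    | tr '[:upper:]' '[:lower:]' \
    | sort \
    | uniq -c \
    | sort -rn
}
```

Files without an extension are skipped by the `sed` filter, so the totals cover only files with a recognizable suffix.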

Binary files like images are fine for now, since they are directly included/used in the Markdown source, and you can preview the Markdown rendering with the images locally without running mkdocs. But any file that is simply linked to, like the PDFs and .docx, should be removed from this repo, for example:

### Templates for INCITE and ALCC:
- [Quarterly Report Template](files/PINAME_ALLOCATION_YEAR_QX.docx)
- [End of Project Report Template](files/PINAME_ALLOCATION_YEARS_EOP.docx)
- [End of Year Report Template](files/PINAME_ALLOCATION_YEAR_EOY.docx)
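To find the other files that are only linked to, one option is to scan the Markdown sources for link targets ending in document extensions. The sketch below assumes that such files always appear as inline `[text](path)` links and that `.pdf`, `.doc(x)`, and `.ppt(x)` are the extensions of interest. The function name `list_linked_docs` is hypothetical.

```shell
# Hypothetical helper: list unique link targets in *.md files that point
# at PDF/Word/PowerPoint documents.
# Usage: list_linked_docs [docs_dir]
list_linked_docs() {
  grep -rhoE --include='*.md' '\]\([^)]*\.(pdf|docx?|pptx?)\)' "${1:-.}" \
    | sed -E 's/^\]\((.*)\)$/\1/' \
    | sort -u
}
```

The printed paths are relative to whichever Markdown file contained the link, so they may need resolving against that file's directory before moving anything.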

Some ideas for alternatives:

  • Store binary files in a Box folder where (any?) ALCF staff can edit and upload files and create public read-only links to them
  • Use a separate GitHub repo to statically store PDF slides. Note that GitHub blocks files larger than 100 MiB
  • Provide an easier method for uploading slides etc. to the main ALCF site storage
@kevin-harms
Contributor

kevin-harms commented Nov 8, 2024 via email

@felker
Member Author

felker commented Nov 8, 2024

yeah that is likely, but will be tabled until after SC24
