Need a better solution for storing large/many binary assets that aren't hosted on the main ALCF site #533
Comments
You could ask Beth to host it on the site with the other presentation materials.
kevin
On Thu, Nov 7, 2024 at 6:32 PM Kyle Gerard Felker wrote:
Came up in discussion over #528, where @kaushikvelusamy wanted to upload a ~10 MB PDF slide deck to link to in the docs. These exact slides were never presented at an ALCF tech talk or workshop, so they are not already hosted on https://www.alcf.anl.gov/, and this is a bit of an edge case.
As a rule of thumb, I am hesitant to add binary files to version control if 1) we don't really care about versioning the files, 2) they are modified frequently, and/or 3) they are larger than a few MB. In some of those cases, or if too many such files accumulate in the repo history, they can bloat the size of `.git/` and make cloning/checkout/push/pull slower over time.
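Because this bloat lives in the repository history rather than in the current checkout, one way to see which files contribute most to `.git/` is to list the largest blobs reachable from any ref. The sketch below wraps a common `git rev-list` / `git cat-file` recipe in a hypothetical `largest_blobs` helper (the name and the default of 10 results are illustrative, not an existing project script). Sizes are uncompressed bytes, so the packed size on disk may be smaller.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the N largest file blobs anywhere in the repo
# history (default 10) as "size-in-bytes path", largest first.
# Sizes are uncompressed; packed objects in .git/ may take less space.
largest_blobs() {
  local n="${1:-10}"
  # rev-list --objects prints "<sha> <path>" for every reachable object;
  # cat-file --batch-check adds each object's type and size.
  # sed keeps only blobs and strips the type and sha, leaving "size path".
  git rev-list --objects --all \
    | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' \
    | sed -n 's/^blob [0-9a-f]* //p' \
    | sort -rn \
    | head -n "$n"
}
```

Running `largest_blobs 20` anywhere inside a clone would show whether a handful of PDFs or GIFs already dominate the history.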
Currently in `docs/`, we are doing OK, with only a few such files larger than 1 MB:

```
$ find . -type f -not -name "*.md" -exec du -hs {} \; | sort -h
...
1020K ./aurora/performance-tools/images/GPU-offload-03.png
1.3M ./aurora/performance-tools/images/FireFox-VTune05.png
1.3M ./services/files/docker_hub_repo_build.gif
1.6M ./ai-testbed/files/home-cerebras-sambanova.png
1.7M ./aurora/images/Argonne_wireframe_white_transparent.eps
1.7M ./images/Argonne_wireframe_white_transparent.eps
1.8M ./policies/accounts/IT_Access_Agreement_for_ALCF.pdf
2.0M ./services/files/singularity_build.gif
```
We are already storing 204 PNGs, 2 PDFs, 7 GIFs, 18 JPGs, 3 Microsoft Word documents, and 2 EPS files (ANL wireframe logos).
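A per-type tally like the one above can be regenerated with standard tools. The hypothetical `count_by_extension` helper below (its name and the default `docs` directory are assumptions) counts non-Markdown files by lowercased extension, most common first. Files without an extension are skipped.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count non-Markdown files under a directory
# (default: docs) by file extension, most common first.
count_by_extension() {
  local dir="${1:-docs}"
  # sed extracts the text after the last dot in the basename (no match,
  # and so no output, for files without an extension); tr folds PNG/png.
  find "$dir" -type f -not -name '*.md' \
    | sed -n 's/.*\.\([^./]*\)$/\1/p' \
    | tr '[:upper:]' '[:lower:]' \
    | sort \
    | uniq -c \
    | sort -rn
}
```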
Binary files like images are fine for now, since they are directly included in the Markdown source, and you can preview the rendered Markdown, images included, locally without running `mkdocs`. But any file that is simply linked to, like the PDFs and `.docx` files, should be removed from this repo, for example:
https://github.com/argonne-lcf/user-guides/blob/48f2566540a1469db79ed05edefa1931d5fc80a3/docs/account-project-management/project-management/project-reports.md?plain=1#L45-L48
Some ideas for alternatives:
- Store binary files in a Box folder where (any?) ALCF staff can upload and edit files and create public read links to them
- A separate GitHub repo to statically store PDF slides. Note that GitHub will [block files larger than 100 MiB](https://docs.github.com/en/repositories/working-with-files/managing-large-files/about-large-files-on-github)
- Provide an easier method for uploading slides, etc. to the main ALCF site storage
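Whichever storage option is chosen, a local guard can keep large binaries from being committed by accident. Below is a sketch of a pre-commit check, assuming a 5 MiB limit picked to match the "a few MB" rule of thumb (well under GitHub's hard 100 MiB block). The `check_staged_sizes` name and the threshold are hypothetical, not an existing project policy.

```shell
#!/usr/bin/env bash
# Hypothetical pre-commit check: fail if any added or modified file in the
# index is larger than a byte limit (default 5 MiB, an assumed threshold).
check_staged_sizes() {
  local limit="${1:-$((5 * 1024 * 1024))}"
  # core.quotePath=false keeps non-ASCII names unquoted; names containing
  # control characters, double quotes, or backslashes are not handled here.
  git -c core.quotePath=false diff --cached --name-only --diff-filter=AM \
    | (
        status=0
        while IFS= read -r file; do
          size=$(git cat-file -s ":$file")  # size of the staged blob
          if [ "$size" -gt "$limit" ]; then
            echo "error: $file is $size bytes (limit: $limit bytes)" >&2
            status=1
          fi
        done
        exit "$status"
      )
}
```

To use it as a hook, save it as `.git/hooks/pre-commit`, make it executable, and end the file with `check_staged_sizes || exit 1`. Files under `.git/hooks/` are not versioned, so each contributor would need to install it themselves.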
Yeah, that is likely, but it will be tabled until after SC24.