Git LFS bandwidth issue #60

Open
bendnorman opened this issue Jul 19, 2022 · 4 comments
@bendnorman
Member

bendnorman commented Jul 19, 2022

On July 15th, 2022, we saw our Git LFS bandwidth quota consumed within a couple of hours: usage went from 100% to 127% of the quota. This isn't a huge deal because an additional 50 GB of bandwidth costs $5 a month. However, we should figure out why this unexpected increase in bandwidth occurred.

image

(Screenshot from July 15th, 2022)
@zaneselvans
Member

Which repositories have we been using Git LFS in? It's something that has to be enabled per repo, isn't it? I don't even have Git LFS installed locally.

@bendnorman
Member Author

Git LFS bandwidth is consumed when downloads occur for users that have Git LFS enabled:

When you commit and push a change to a file tracked with Git LFS, a new version of the entire file is pushed and the total file size is counted against the repository owner's storage limit. When you download a file tracked with Git LFS, the total file size is counted against the repository owner's bandwidth limit. Git LFS uploads do not count against the bandwidth limit.

If collaborators on your repository don't have Git LFS installed, they won't have access to the original large file. If they attempt to clone your repository, they will only fetch the pointer files, and won't have access to any of the actual data.

As of July 19th, 2022, we only have 0.02 GB of data stored on Git LFS in the pudl-usage-metrics repo, and 1.37 GB of bandwidth has been used. I enabled Git LFS on July 13th, 2022. That means the data had to have been downloaded roughly 68 times (1.37 GB / 0.02 GB).
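
As a rough check on that arithmetic, here is a minimal sketch. The function name is hypothetical, and because the 0.02 GB storage figure is rounded, the result is only an approximation:

```python
def implied_full_downloads(bandwidth_gb: float, stored_gb: float) -> float:
    """Estimate how many times the full LFS payload was downloaded.

    Assumes every download fetched all stored LFS data exactly once.
    """
    if stored_gb <= 0:
        raise ValueError("stored_gb must be positive")
    return bandwidth_gb / stored_gb


# Figures quoted in this issue (as of July 19th, 2022).
estimate = implied_full_downloads(bandwidth_gb=1.37, stored_gb=0.02)
print(f"Implied full downloads: about {estimate:.1f}")
```

Since 0.02 GB could stand for anything from about 0.015 GB to 0.025 GB, the true number of downloads could plausibly be anywhere from the mid-50s to the low 90s.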

Since July 13th, about 18 actions have been run on the branch with git lfs. However, the actions/checkout@v3 LFS option is disabled so the actions are not downloading the LFS files.

The more likely culprits are bots with Git LFS enabled cloning the repo. The repo was cloned 48 times between July 13th and July 19th:
image

I don't think this accounts for all the bandwidth, and it assumes every cloner has Git LFS enabled.
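
To put a number on that gap, here is a rough sketch using the figures from this issue. It assumes, as an upper bound, that every one of the 48 clones pulled the full 0.02 GB of LFS data; the constant and function names are made up for illustration:

```python
# Figures quoted in this issue; the names are illustrative only.
STORED_GB = 0.02          # LFS data stored in pudl-usage-metrics
BANDWIDTH_USED_GB = 1.37  # LFS bandwidth consumed as of July 19th
CLONES = 48               # clones reported for July 13th to July 19th


def max_bandwidth_from_clones(clones: int, stored_gb: float) -> float:
    """Upper bound: assume each clone downloaded all LFS data once."""
    return clones * stored_gb


explained_gb = max_bandwidth_from_clones(CLONES, STORED_GB)
unexplained_gb = BANDWIDTH_USED_GB - explained_gb

print(f"Explained by clones (at most): {explained_gb:.2f} GB")
print(f"Left unexplained (at least):   {unexplained_gb:.2f} GB")
```

Even under that generous assumption, the clones cover at most about 0.96 GB, leaving roughly 0.41 GB of the 1.37 GB unaccounted for.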

Solutions

I enabled Git LFS on the pudl-usage-metrics repo so I could store some old audit logs that weren't captured by the log sink. Given we aren't using the audit logs anymore, I could just remove the old logs from this repo and disable Git LFS. If we want to keep the logs around, I could store them in a GCS bucket or Google Drive.

If Git LFS is going to produce egress bandwidth issues in the future, it looks like we might be able to disable it at the organization level.

@zaneselvans
Member

If the additional user data that we'd hoped was available from the audit logs isn't actually there, and we aren't going to use them going forward for usage metrics, it seems like removing them and disabling LFS is probably the way to go. What do you think?

@bendnorman
Member Author

bendnorman commented Jul 20, 2022

Yeah, that sounds good to me. I removed the Git LFS files from the clean-intake-logs branch. Helpful thread on disabling Git LFS.

@jdangerx jdangerx moved this to 🆕 New in Catalyst Megaproject Feb 7, 2023
@jdangerx jdangerx moved this from 🆕 New to 📋 Backlog in Catalyst Megaproject Feb 7, 2023
Projects
Status: Icebox