The new Amazon Linux 2023 AMI has an issue that shows up during Docker builds. Please refer to this PR: pytorch/pytorch#136544
The problem is that the Docker package ships a systemd service with LimitNOFILE=infinity (which sets --ulimit).
The patch needs to be applied via: https://github.com/pytorch/test-infra/blob/main/terraform-aws-github-runner/modules/runners-instances/templates/user-data.sh#L89
This is the patch:
sudo sed -i s/LimitNOFILE=infinity/LimitNOFILE=1048576/ /usr/lib/systemd/system/docker.service
sudo systemctl daemon-reload
sudo systemctl restart docker
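For anyone wiring this into user-data.sh, here is a minimal sketch of the same patch written so it can be re-run safely. The function name `patch_nofile_limit` and its output strings are hypothetical, not part of test-infra, and it assumes GNU sed (as on Amazon Linux 2023) and that the unit file contains the line `LimitNOFILE=infinity` verbatim.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not part of test-infra): replace LimitNOFILE=infinity
# in the given systemd unit file with a finite limit. Prints "patched" if the
# file was changed and "unchanged" if the infinity setting was not found.
patch_nofile_limit() {
  local unit_file="$1"
  local limit="${2:-1048576}"   # value taken from the patch in this issue

  if grep -q '^LimitNOFILE=infinity' "$unit_file"; then
    # Assumes GNU sed, where -i takes no backup-suffix argument.
    sed -i "s/^LimitNOFILE=infinity/LimitNOFILE=${limit}/" "$unit_file"
    echo "patched"
  else
    echo "unchanged"
  fi
}

# Example use on a runner (requires root; left commented out on purpose):
#   if [ "$(patch_nofile_limit /usr/lib/systemd/system/docker.service)" = "patched" ]; then
#     systemctl daemon-reload
#     systemctl restart docker
#   fi
# To check the effective limit afterwards:
#   systemctl show docker --property=LimitNOFILE
```

Restarting Docker only when the file actually changed avoids bouncing the daemon if the script runs a second time on an already patched instance.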
Use amazon linux 2023 runners for Docker builds (#136544)
11c5f9a
Migrate these builds to linux 2023. We want to build and test the Docker images in CD.
Looks like we are hitting this issue: docker/buildx#379 when trying to build Docker on Amazon Linux 2023. Conda Docker build is timing out, while Manywheel is executing but failing because BUILDKIT is turned off: https://github.com/pytorch/pytorch/actions/runs/11036043157/job/30653543264?pr=136544
Proposed solution is to fix it in user_data. Please see: pytorch/test-infra#5712
I see docker builds are executed successfully here: https://github.com/pytorch/pytorch/actions/runs/11040149229/job/30667448668?pr=136544
Work around the timeout problem (reported in https://bugzilla.redhat.com/show_bug.cgi?id=1537564) by configuring the number of open files per container to 1048576.
Pull Request resolved: #136544
Approved by: https://github.com/ZainRizvi
Co-authored-by: Nikita Shulga <[email protected]>