I cannot schedule my GPU on my EKS cluster #791
Comments
Is the hami-device-plugin running successfully? Please show the result of 'curl {scheduler node ip}:31993/metrics'.
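For anyone who prefers to script this check, here is a minimal sketch. It assumes the endpoint on port 31993 serves the Prometheus text exposition format without trailing timestamps. `fetch_metrics` and `parse_metrics` are hypothetical helper names, not part of HAMi.

```python
import urllib.request


def fetch_metrics(node_ip, port=31993, timeout=5.0):
    """Return the raw text served at http://<node_ip>:<port>/metrics."""
    url = f"http://{node_ip}:{port}/metrics"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def parse_metrics(text):
    """Split Prometheus-style text into (series, value) pairs.

    Comment lines (starting with '#') and blank lines are skipped.
    The series string keeps its label set, e.g. 'name{label="x"}'.
    Assumption: samples carry no trailing timestamp.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        series, _, value = line.rpartition(" ")
        if not series:
            continue
        try:
            samples.append((series, float(value)))
        except ValueError:
            # Not a "<series> <number>" line; ignore it.
            continue
    return samples
```

Calling `parse_metrics(fetch_metrics("<scheduler node ip>"))` from a machine that can reach the node returns the same data as the curl command, in a form that is easier to filter before pasting into the issue.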
You only have one GPU, so 'nvidia.com/gpu' cannot exceed 1 per task. See the FAQ for more details: #646
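To make the limit concrete, here is a minimal sketch of the check, with the pod spec written as a plain Python dict. `gpu_request_fits` and `make_pod` are hypothetical helpers, the image and pod names are placeholders, and treating each container as one "task" is an assumption.

```python
GPU_RESOURCE = "nvidia.com/gpu"


def gpu_request_fits(pod, physical_gpus):
    """Return True if no container asks for more GPUs than the node has."""
    for container in pod["spec"]["containers"]:
        limits = container.get("resources", {}).get("limits", {})
        if int(limits.get(GPU_RESOURCE, 0)) > physical_gpus:
            return False
    return True


def make_pod(gpu_count):
    """Build a minimal pod manifest; image and names are placeholders."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "gpu-test"},
        "spec": {
            "containers": [{
                "name": "main",
                "image": "example/cuda-image",
                "resources": {"limits": {GPU_RESOURCE: gpu_count}},
            }]
        },
    }
```

On a node with one physical GPU, `gpu_request_fits(make_pod(1), 1)` holds while `gpu_request_fits(make_pod(2), 1)` does not, which matches the situation described in the comment above.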
EKS version: 1.24.17
hami version: 2.4.1
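1. I deployed by running helm pull, modifying the chart files locally, and then installing,
because installing directly with helm as described in the guide fails with an error that there is no version for EKS.
2. The job webhook patch also reported an error at startup; it only succeeded after I manually modified the configuration.
3. The scheduler log shows no errors, and neither does the apiserver.
4. The test GPU pod has no events and has not been assigned to a node; the node has no taints configured.
5. The GPU is visible on the node.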
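My scheduler and device plugin are both running normally.
Scheduler log
After the pod is created, the scheduler log keeps repeating the output shown in this screenshot.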