Error when running on a compute cluster #62

Open
zfx947393091 opened this issue May 13, 2024 · 1 comment

Comments

@zfx947393091

The error output is as follows:
[jupiter:24941][coll_ml_mca.c:126:get_default_hca] COLL-ML Unable to get list of available IB devices (ibv_get_device_list failed)
[jupiter:24941][coll_ml_mca.c:191:set_hcoll_device] COLL-ML You must specify a valid HCA device by setting:
-x HCOLL_MAIN_IB=<dev_name:port> or -x UCX_NET_DEVICES=<dev_name:port>.
If no device was specified for HCOLL (or the calling library), automatic device detection will be run.
In case of unfounded HCA device please contact your system administrator.
Invalid error code (-1) (error ring index 127 invalid)
INTERNAL ERROR: invalid error code ffffffff (Ring Index out of range) in hcoll_initialize:102
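With three features, all three iterations show 3, and in the log file each of the three iterations only reaches 20% instead of running to 100%. The out file does print "have a nice day", but the corresponding fitted function does not appear. Could you tell me where the problem is? Thank you.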
SISSO-error.zip

@rouyang2017
Owner

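Some of the parameters in your SISSO.in file come from an older version. Please use the latest SISSO.in template (some parameters differ from the old version).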
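
One rough way to spot outdated parameters is to compare the keyword names in your own SISSO.in against those in the latest template. The sketch below is not part of SISSO. It assumes each setting is written as `keyword=value`, optionally followed by a `!` comment, and the keyword names in the demo (`shared_param`, `old_param`, `new_param`) are hypothetical placeholders, not real SISSO keywords.

```python
def read_keywords(text):
    """Collect keyword names from SISSO.in-style text.

    Assumption: each setting looks like `keyword=value  ! comment`.
    """
    keys = set()
    for line in text.splitlines():
        line = line.split("!", 1)[0].strip()  # drop any trailing comment
        if "=" not in line:
            continue  # blank or comment-only line
        keys.add(line.split("=", 1)[0].strip())
    return keys


def compare_inputs(mine_text, template_text):
    """Return (keywords only in my file, keywords only in the template)."""
    mine = read_keywords(mine_text)
    template = read_keywords(template_text)
    return sorted(mine - template), sorted(template - mine)


if __name__ == "__main__":
    # Hypothetical placeholder contents; replace with the text of real files.
    mine = "shared_param=1 ! kept in both\nold_param=3\n"
    template = "shared_param=1\nnew_param=3 ! only in the template\n"
    only_mine, only_template = compare_inputs(mine, template)
    print("Only in my SISSO.in (possibly outdated):", only_mine)
    print("Only in the template (possibly missing):", only_template)
```

To use it on real files, read both with `open(path).read()` and pass the two strings to `compare_inputs`. Keywords that appear only in your file are candidates for having been renamed or removed in the newer version.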
