GPU issues for DPGEN #1512
Replies: 1 comment
When I am not running DP, nvidia-smi works normally, as follows:
(deepmd) zzf@ZZF: +---------------------------------------------------------------------------------------+ |
When I am not running the DP program, nvidia-smi reports the GPU information normally. Once I start the DP program, however, nvidia-smi shows the following:
NVIDIA-SMI has failed because it couldn't
communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is
installed and running.
Failed to properly shut down NVML: Driver Not Loaded
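The state described by this error can also be checked from a script. The following is a minimal sketch, assuming a Linux host: it looks for `/proc/driver/nvidia/version`, which the NVIDIA kernel module exposes when it is loaded, and tries to run `nvidia-smi`. The function name `check_nvidia_driver` and the keys of the returned dictionary are hypothetical names chosen for illustration; they are not part of DeePMD-kit or DP-GEN.

```python
import os
import shutil
import subprocess


def check_nvidia_driver(proc_path="/proc/driver/nvidia/version"):
    """Collect basic facts about the NVIDIA driver state on a Linux host.

    Hypothetical helper for illustration only.
    """
    report = {
        # The kernel module creates this file when it is loaded.
        "driver_loaded": os.path.exists(proc_path),
        # Whether the nvidia-smi executable is on PATH at all.
        "nvidia_smi_found": shutil.which("nvidia-smi") is not None,
        # Whether nvidia-smi ran and exited with status 0.
        "nvidia_smi_ok": False,
    }
    if report["nvidia_smi_found"]:
        try:
            result = subprocess.run(
                ["nvidia-smi"], capture_output=True, text=True, timeout=30
            )
            report["nvidia_smi_ok"] = result.returncode == 0
        except (OSError, subprocess.TimeoutExpired):
            # Leave nvidia_smi_ok as False if the command cannot be run.
            pass
    return report


if __name__ == "__main__":
    for key, value in check_nvidia_driver().items():
        print(f"{key}: {value}")
```

Running it once before starting the DP program and once while it is running would show whether the driver state really changes between the two, or whether the two commands are being run in different environments.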
During the DP run, only the CPU is used, not the GPU. The relevant log output is:
2024-04-10 20:00:06.119918: E tensorflow/stream_executor/cuda/cuda_driver.cc:271] failed call to cuInit: UNKNOWN ERROR (100)
2024-04-10 20:00:06.119954: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (ZZF): /proc/driver/nvidia/version does not exist
2024-04-10 20:00:06.120025: I tensorflow/core/common_runtime/process_util.cc:146] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
2024-04-10 20:00:21.315141: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:354] MLIR V1 optimization pass is not enabled
INVALID_ARGUMENT: Tensor spin_attr/ntypes_spin:0, specified in either feed_devices or fetch_devices was not found in the Graph
DeePMD-kit WARNING: Environmental variable TF_INTRA_OP_PARALLELISM_THREADS is not set. Tune TF_INTRA_OP_PARALLELISM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
DeePMD-kit WARNING: Environmental variable TF_INTER_OP_PARALLELISM_THREADS is not set. Tune TF_INTER_OP_PARALLELISM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
DeePMD-kit WARNING: Environmental variable OMP_NUM_THREADS is not set. Tune OMP_NUM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
INVALID_ARGUMENT: Tensor spin_attr/ntypes_spin:0, specified in either feed_devices or fetch_devices was not found in the Graph
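The three DeePMD-kit warnings in this log name environment variables that are not set. They read as performance-tuning hints and are probably unrelated to the `cuInit` failure, but they can be silenced by setting the variables before the DP process starts. The following is a minimal sketch: the helper name `apply_thread_settings` is hypothetical, and the thread counts are placeholder values that are assumptions, not recommendations. Suitable values depend on the machine; see the parallelism page linked in the warnings.

```python
import os

# Placeholder values (assumptions); tune them for the actual machine.
THREAD_SETTINGS = {
    "OMP_NUM_THREADS": "4",
    "TF_INTRA_OP_PARALLELISM_THREADS": "4",
    "TF_INTER_OP_PARALLELISM_THREADS": "2",
}


def apply_thread_settings(settings=THREAD_SETTINGS, environ=os.environ):
    """Set each variable only if it is not already defined.

    Hypothetical helper for illustration only. Returns the values that
    are in effect after the call.
    """
    for name, value in settings.items():
        # setdefault keeps any value the user has already exported.
        environ.setdefault(name, value)
    return {name: environ[name] for name in settings}


if __name__ == "__main__":
    for name, value in apply_thread_settings().items():
        print(f"{name}={value}")
```

Since these are ordinary environment variables, exporting them in the shell or job script that launches DP should have the same effect.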
Thank you very much for your help