When running bonito with a custom-trained modified base model and specifying the --modified_device option, it fails at runtime with the following error:
> reading pod5
> outputting aligned bam
> loading model [email protected]
> loading modified base model
> loaded modified base model to call (alt to T): T=XXXX
> loading reference
> calling: 0%| | 1/5420253 [00:14<22525:53:46, 14.96s/ reads]/opt/bonito/lib/python3.9/site-packages/remora/data_chunks.py:515: UserWarning: FALLBACK path has been taken inside: runCudaFusionGroup. This is an indication that codegen Failed for some reason.
To debug try disable codegen fallback path via setting the env variable `export PYTORCH_NVFUSER_DISABLE=fallback`
(Triggered internally at ../third_party/nvfuser/csrc/manager.cpp:335.)
model.forward(
Exception in thread Thread-6:
Traceback (most recent call last):
File "/usr/lib/python3.9/threading.py", line 954, in _bootstrap_inner
self.run()
File "/opt/bonito/lib/python3.9/site-packages/bonito/multiprocessing.py", line 261, in run
for i, (k, v) in enumerate(self.iterator):
File "/opt/bonito/lib/python3.9/site-packages/bonito/cli/basecaller.py", line 137, in <genexpr>
results = ((k, call_mods(mods_model, k, v)) for k, v in results)
File "/opt/bonito/lib/python3.9/site-packages/bonito/mod_util.py", line 91, in call_mods
call_read_mods(
File "/opt/bonito/lib/python3.9/site-packages/remora/inference.py", line 84, in call_read_mods
nn_out, labels, pos = read.run_model(model)
File "/opt/bonito/lib/python3.9/site-packages/remora/data_chunks.py", line 515, in run_model
model.forward(
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
RuntimeError: shape '[2, 0, 1]' is invalid for input of size 1474560
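One way to get a more informative error is the one the UserWarning in the log itself suggests: disabling the nvFuser codegen fallback path before re-running, so the underlying failure surfaces instead of the fallback being taken. A minimal sketch (the bonito invocation below is a placeholder; substitute your actual command, model paths, and flags):

```shell
# Disable the nvFuser fallback so the original codegen error is raised
# directly, as suggested by the warning in the log above.
export PYTORCH_NVFUSER_DISABLE=fallback

# Then re-run the failing command, e.g. (placeholder paths/arguments):
# bonito basecaller [email protected] reads/ \
#     --modified-base-model custom_mod_model/ --modified-device cuda:0 \
#     --reference ref.mmi > calls.bam
```

This does not fix the shape error, but the resulting traceback should point more precisely at where the TorchScript model fails on the GPU.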
Omitting the --modified_device parameter does work, but only at a very slow speed (~40 reads/s).
Is there anything I am missing to move the modified base prediction from the CPU to the GPU?
Thank you in advance
It seems that the model architecture does not match the model design expected in bonito. Can you please post your full command and which models you are trying to use?