Compiling YOLOX model error #976

mvinci12 · 2024-08-28T23:22:47Z

Errors:

2024-08-28 23:10:02.000380: 248385 ERROR ||NEURON_CC_WRAPPER||: Failed compilation with ['neuronx-cc', 'compile', '--target=trn1', '--framework=XLA', '/tmp/ubuntu/neuroncc_compile_workdir/cb45b894-636b-4830-ace8-3a10faa6cd74/model.MODULE_8858416890750383945+ade7b014.hlo_module.pb', '--output', '/tmp/ubuntu/neuroncc_compile_workdir/cb45b894-636b-4830-ace8-3a10faa6cd74/model.MODULE_8858416890750383945+ade7b014.neff', '--model-type=cnn-training', '--verbose=35']: 2024-08-28T23:10:02Z [TEN404] Internal tensorizer error: TensorInitialization:Incorrect IR by <class 'neuronxcc.starfish.penguin.targets.transforms.TensorInitialization.TensorInitialization'> - Please open a support ticket at https://github.com/aws-neuron/aws-neuron-sdk/issues/new. You may also be able to obtain more information using the 'XLA_IR_DEBUG' and 'XLA_HLO_DEBUG' environment variables.

RuntimeError: Bad StatusOr access: INTERNAL: RunNeuronCCImpl: error condition error != 0: <class 'subprocess.CalledProcessError'>: Command '['neuronx-cc', 'compile', '--target=trn1', '--framework=XLA', '/tmp/ubuntu/neuroncc_compile_workdir/cb45b894-636b-4830-ace8-3a10faa6cd74/model.MODULE_8858416890750383945+ade7b014.hlo_module.pb', '--output', '/tmp/ubuntu/neuroncc_compile_workdir/cb45b894-636b-4830-ace8-3a10faa6cd74/model.MODULE_8858416890750383945+ade7b014.neff', '--model-type=cnn-training', '--verbose=35']' returned non-zero exit status 70.

File "/home/ubuntu/aws-neuron-samples/torch-neuronx/training/neuron-adoption/yolox/core/trainer.py", line 97, in train_in_iter
xm.mark_step() # Ensure TPU operations are synchronized
│ └ <function mark_step at 0x7fcf8a8be170>
└ <module 'torch_xla.core.xla_model' from '/home/ubuntu/aws-neuron-samples/aws_neuron_venv_pytorch/lib/python3.10/site-packages...

File "/home/ubuntu/aws-neuron-samples/aws_neuron_venv_pytorch/lib/python3.10/site-packages/torch_xla/core/xla_model.py", line 969, in mark_step
torch_xla._XLAC._xla_step_marker(
│ │ └ <built-in method _xla_step_marker of PyCapsule object at 0x7fcf9a0b6d00>
│ └ <module '_XLAC' from '/home/ubuntu/aws-neuron-samples/aws_neuron_venv_pytorch/lib/python3.10/site-packages/_XLAC.cpython-310-...
└ <module 'torch_xla' from '/home/ubuntu/aws-neuron-samples/aws_neuron_venv_pytorch/lib/python3.10/site-packages/torch_xla/__in...
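
The compiler message above suggests the `XLA_IR_DEBUG` and `XLA_HLO_DEBUG` environment variables as a way to get more information. A minimal sketch of turning them on from Python follows. It assumes the variables must be set before `torch_xla` is imported and before the failing graph is traced. The helper name `enable_xla_debug` is hypothetical and is not part of the YOLOX sample.

```python
import os


def enable_xla_debug(env=os.environ):
    """Set the two debug variables named in the compiler error message.

    `env` defaults to the process environment; a plain dict can be passed
    instead to inspect the result without touching the real environment.
    """
    env["XLA_IR_DEBUG"] = "1"
    env["XLA_HLO_DEBUG"] = "1"
    return {name: env[name] for name in ("XLA_IR_DEBUG", "XLA_HLO_DEBUG")}


# Assumption: call this at the very top of the training entry point,
# before any `import torch_xla`, so the values are in place when the
# graph is traced and handed to neuronx-cc.
enable_xla_debug()
```

Exporting the same two variables in the shell before launching the training script should be equivalent. With them set, the HLO passed to `neuronx-cc` is expected to carry Python source-location metadata, which may help narrow down which operation triggers the `TensorInitialization` failure.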

jyang-aws added compiler training Trn1 labels Sep 23, 2024
