Hi team! We observed an issue when compiling the ESM model with Neuron SDK 2.21.0:
Reported stdout:
DEBUG: needsModular? No. macCnt 494817280
INFO: Switching to single-module compile. PrePartitionPipe skipped.
INFO: Found memory bound graph
2025-01-06 11:49:42.255433: F hilo/hlo_passes/VerifySupportedHloOps.cc:160] ERROR: Unsupported operator: erf
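Our working guess, not confirmed by the compiler output, is that the erf comes from ESM's exact GELU (transformers implements it as x * 0.5 * (1.0 + torch.erf(x / sqrt(2)))). Below is a rough, untested sketch of a workaround for Python-driven exports, assuming the operator really originates from transformers.models.esm.modeling_esm.gelu: swap in the tanh-approximated GELU before tracing so that no erf op reaches the Neuron compiler.

```python
# Untested workaround sketch: replace ESM's erf-based GELU with the tanh
# approximation so the exported HLO graph contains no erf op.
# Assumption: the unsupported erf originates from transformers.models.esm.modeling_esm.gelu.
import torch
import torch.nn.functional as F
import transformers.models.esm.modeling_esm as modeling_esm

def gelu_tanh(x: torch.Tensor) -> torch.Tensor:
    # Numerically close to the exact erf-based GELU used by ESM.
    return F.gelu(x, approximate="tanh")

modeling_esm.gelu = gelu_tanh  # must run before the model is built and traced
```

This only helps when the export is driven from Python; going through optimum-cli would need the same patch applied inside the export process.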
Detailed log
(aws_neuronx_venv_pytorch_2_5) ubuntu@ip-xx-xx-x-xx:~/optimum-neuron$ optimum-cli export neuron --model hf-internal-testing/tiny-random-EsmModel --batch_size 1 --sequence_length 16 --auto_cast matmul --auto_cast_type bf16 tiny_esm/
WARNING:root:MASTER_ADDR environment variable is not set, defaulting to localhost
WARNING:root:Found libneuronpjrt.so. Setting PJRT_DEVICE=NEURON.
/opt/aws_neuronx_venv_pytorch_2_5/lib/python3.10/site-packages/neuronx_distributed/modules/moe/expert_mlps.py:11: DeprecationWarning: torch_neuronx.nki_jit is deprecated, use nki.jit instead.
from neuronx_distributed.modules.moe.blockwise import (
/opt/aws_neuronx_venv_pytorch_2_5/lib/python3.10/site-packages/neuronx_distributed/modules/moe/expert_mlps.py:11: DeprecationWarning: torch_neuronx.nki_jit is deprecated, use nki.jit instead.
from neuronx_distributed.modules.moe.blockwise import (
pytorch_model.bin: 100%|████████████████████████████████████████████████████████████████████████| 241k/241k [00:00<00:00, 55.4MB/s]
Some weights of EsmModel were not initialized from the model checkpoint at hf-internal-testing/tiny-random-EsmModel and are newly initialized: ['contact_head.regression.bias', 'contact_head.regression.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
tokenizer_config.json: 100%|██████████████████████████████████████████████████████████████████████| 277/277 [00:00<00:00, 3.67MB/s]
vocab.txt: 100%|████████████████████████████████████████████████████████████████████████████████| 93.0/93.0 [00:00<00:00, 1.27MB/s]
special_tokens_map.json: 100%|████████████████████████████████████████████████████████████████████| 125/125 [00:00<00:00, 1.79MB/s]
***** Compiling tiny-random-EsmModel *****
Using Neuron: --auto-cast matmul
Using Neuron: --auto-cast-type bf16
Using Neuron: --optlevel 2
model.safetensors: 100%|████████████████████████████████████████████████████████████████████████| 223k/223k [00:00<00:00, 42.3MB/s]
.
Process Process-1:
Traceback (most recent call last):
File "neuronxcc/driver/CommandDriver.py", line 345, in neuronxcc.driver.CommandDriver.CommandDriver.run_subcommand
File "neuronxcc/driver/commands/CompileCommand.py", line 1353, in neuronxcc.driver.commands.CompileCommand.CompileCommand.run
File "neuronxcc/driver/commands/CompileCommand.py", line 1304, in neuronxcc.driver.commands.CompileCommand.CompileCommand.runPipeline
File "neuronxcc/driver/commands/CompileCommand.py", line 1324, in neuronxcc.driver.commands.CompileCommand.CompileCommand.runPipeline
File "neuronxcc/driver/commands/CompileCommand.py", line 1327, in neuronxcc.driver.commands.CompileCommand.CompileCommand.runPipeline
File "neuronxcc/driver/Job.py", line 344, in neuronxcc.driver.Job.SingleInputJob.run
File "neuronxcc/driver/Job.py", line 370, in neuronxcc.driver.Job.SingleInputJob.runOnState
File "neuronxcc/driver/Pipeline.py", line 30, in neuronxcc.driver.Pipeline.Pipeline.runSingleInput
File "neuronxcc/driver/Job.py", line 344, in neuronxcc.driver.Job.SingleInputJob.run
File "neuronxcc/driver/Job.py", line 370, in neuronxcc.driver.Job.SingleInputJob.runOnState
File "neuronxcc/driver/jobs/Frontend.py", line 454, in neuronxcc.driver.jobs.Frontend.Frontend.runSingleInput
File "neuronxcc/driver/jobs/Frontend.py", line 218, in neuronxcc.driver.jobs.Frontend.Frontend.runXLAFrontend
File "neuronxcc/driver/jobs/Frontend.py", line 190, in neuronxcc.driver.jobs.Frontend.Frontend.runHlo2Tensorizer
neuronxcc.driver.Exceptions.CompilerInvalidInputException: ERROR: Failed command /opt/aws_neuronx_venv_pytorch_2_5/lib/python3.10/site-packages/neuronxcc/starfish/bin/hlo2penguin --input /tmp/tmpp3mv4kkw/model/graph.hlo --out-dir ./ --output penguin.py --layers-per-module=1 --emit-tensor-level-dropout-ops --emit-tensor-level-rng-ops
------------
Reported stdout:
DEBUG: needsModular? No. macCnt 600064
INFO: Switching to single-module compile. PrePartitionPipe skipped.
INFO: Found memory bound graph
2025-01-06 11:46:54.763940: F hilo/hlo_passes/VerifySupportedHloOps.cc:160] ERROR: Unsupported operator: erf
------------
Reported stderr:
None
------------
Import of the HLO graph into the Neuron Compiler has failed.
This may be caused by unsupported operators or an internal compiler error.
More details can be found in the error message(s) above.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "neuronxcc/driver/CommandDriver.py", line 352, in neuronxcc.driver.CommandDriver.CommandDriver.run_subcommand_in_process
File "neuronxcc/driver/CommandDriver.py", line 347, in neuronxcc.driver.CommandDriver.CommandDriver.run_subcommand
File "neuronxcc/driver/CommandDriver.py", line 111, in neuronxcc.driver.CommandDriver.handleError
File "neuronxcc/driver/GlobalState.py", line 102, in neuronxcc.driver.GlobalState.FinalizeGlobalState
File "neuronxcc/driver/GlobalState.py", line 82, in neuronxcc.driver.GlobalState._GlobalStateImpl.shutdown
File "/usr/lib/python3.10/shutil.py", line 715, in rmtree
onerror(os.lstat, path, sys.exc_info())
File "/usr/lib/python3.10/shutil.py", line 713, in rmtree
orig_st = os.lstat(path)
FileNotFoundError: [Errno 2] No such file or directory: '/home/ubuntu/optimum-neuron/neuronxcc-u6vnj9h_'
Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/opt/aws_neuronx_venv_pytorch_2_5/lib/python3.10/site-packages/optimum/exporters/neuron/__main__.py", line 781, in <module>
main()
File "/opt/aws_neuronx_venv_pytorch_2_5/lib/python3.10/site-packages/optimum/exporters/neuron/__main__.py", line 751, in main
main_export(
File "/opt/aws_neuronx_venv_pytorch_2_5/lib/python3.10/site-packages/optimum/exporters/neuron/__main__.py", line 634, in main_export
_, neuron_outputs = export_models(
File "/opt/aws_neuronx_venv_pytorch_2_5/lib/python3.10/site-packages/optimum/exporters/neuron/convert.py", line 372, in export_models
neuron_inputs, neuron_outputs = export(
File "/opt/aws_neuronx_venv_pytorch_2_5/lib/python3.10/site-packages/optimum/exporters/neuron/convert.py", line 455, in export
return export_neuronx(
File "/opt/aws_neuronx_venv_pytorch_2_5/lib/python3.10/site-packages/optimum/exporters/neuron/convert.py", line 585, in export_neuronx
neuron_model = neuronx.trace(
File "/opt/aws_neuronx_venv_pytorch_2_5/lib/python3.10/site-packages/torch_neuronx/xla_impl/trace.py", line 589, in trace
neff_filename, metaneff, flattener, packer, weights = _trace(
File "/opt/aws_neuronx_venv_pytorch_2_5/lib/python3.10/site-packages/torch_neuronx/xla_impl/trace.py", line 654, in _trace
neff_artifacts = generate_neff(
File "/opt/aws_neuronx_venv_pytorch_2_5/lib/python3.10/site-packages/torch_neuronx/xla_impl/trace.py", line 506, in generate_neff
neff_filename = hlo_compile(
File "/opt/aws_neuronx_venv_pytorch_2_5/lib/python3.10/site-packages/torch_neuronx/xla_impl/trace.py", line 396, in hlo_compile
raise RuntimeError(f"neuronx-cc failed with {status}")
RuntimeError: neuronx-cc failed with 1
Traceback (most recent call last):
File "/opt/aws_neuronx_venv_pytorch_2_5/bin/optimum-cli", line 8, in <module>
sys.exit(main())
File "/opt/aws_neuronx_venv_pytorch_2_5/lib/python3.10/site-packages/optimum/commands/optimum_cli.py", line 208, in main
service.run()
File "/opt/aws_neuronx_venv_pytorch_2_5/lib/python3.10/site-packages/optimum/commands/export/neuronx.py", line 305, in run
subprocess.run(full_command, shell=True, check=True)
File "/usr/lib/python3.10/subprocess.py", line 526, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'python3 -m optimum.exporters.neuron --model hf-internal-testing/tiny-random-EsmModel --batch_size 1 --sequence_length 16 --auto_cast matmul --auto_cast_type bf16 tiny_esm/' returned non-zero exit status 1.
To reproduce:

optimum-cli export neuron --model hf-internal-testing/tiny-random-EsmModel --batch_size 1 --sequence_length 16 --auto_cast matmul --auto_cast_type bf16 tiny_esm/
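For completeness, here is a hypothetical minimal Python reproduction that calls torch_neuronx.trace directly, the same call the exporter fails in according to the traceback above (the inputs and sequence are our own choices; we have not verified it produces the identical hlo2penguin error):

```python
# Hypothetical minimal reproduction: trace the tiny ESM model directly with
# torch_neuronx, mirroring --batch_size 1 --sequence_length 16 from the CLI run.
import torch
import torch_neuronx
from transformers import AutoModel, AutoTokenizer

model_id = "hf-internal-testing/tiny-random-EsmModel"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id).eval()

encoded = tokenizer("MKV", padding="max_length", max_length=16, return_tensors="pt")
example_inputs = (encoded["input_ids"], encoded["attention_mask"])

# Expected to fail during NEFF generation with "Unsupported operator: erf".
traced = torch_neuronx.trace(model, example_inputs)
```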