arcee-ai/Virtuoso-Lite is a LlamaForCausalLM model, which I expected to be supported.
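For reference, the architecture can be confirmed from the model's config.json; a minimal check, assuming a local transformers install, is:

# Sketch only: print the architectures declared in the model's config.json
# (expected to list LlamaForCausalLM)
python -c "from transformers import AutoConfig; print(AutoConfig.from_pretrained('arcee-ai/Virtuoso-Lite').architectures)"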
Compilation with LMI 0.30 fails:
2025-01-30 10:43:02.000490: 410 ERROR ||NEURON_CC_WRAPPER||: Failed compilation with ['neuronx-cc', 'compile', '--framework=XLA', '/tmp/no-user/neuroncc_compile_workdir/ff7ea03f-6b4c-4174-83e5-c9e847ff5520/model.MODULE_cc3503ed4b78fe12d0fa+54293761.hlo_module.pb', '--output', '/tmp/no-user/neuroncc_compile_workdir/ff7ea03f-6b4c-4174-83e5-c9e847ff5520/model.MODULE_cc3503ed4b78fe12d0fa+54293761.neff', '--target=trn1', '--logfile', '/tmp/compile.log', '--temp-dir=/tmp', '--model-type=transformer', '--auto-cast=none', '--execute-repetition=1', '--verbose=35']: 2025-01-30T10:43:02Z [F134] neuronx-cc terminated abnormally - Please open a support ticket at https://github.com/aws-neuron/aws-neuron-sdk/issues/new
How to reproduce:
DJL_IMAGE="763104351884.dkr.ecr.us-west-2.amazonaws.com/djl-inference:0.30.0-neuronx-sdk2.20.1"

docker run -t --rm --network=host \
    -v $PWD/model:/opt/ml/input/data/training \
    $DEVICES \
    $DJL_IMAGE \
    partition --model-dir /opt/ml/input/data/training --skip-copy
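Here $DEVICES holds the Neuron device mounts; a minimal sketch of how it can be built, assuming the usual /dev/neuron* device files on a trn1 host:

# Sketch only: expose every Neuron device present on the host to the container
# (device paths assumed; adjust to your instance)
DEVICES=""
for dev in /dev/neuron*; do
  DEVICES="$DEVICES --device=$dev"
done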
serving.properties:
engine=Python
option.dtype=bf16
option.entryPoint=djl_python.transformers_neuronx
option.tensor_parallel_degree=24
option.n_positions=32768
option.max_rolling_batch_size=1
option.model_loading_timeout=3600
option.save_mp_checkpoint_path=/opt/ml/input/data/training/partition-test
Hi @juliensimon,
Thanks for filing the issue. We will take a look and get back to you.