
I can't set params optlevel to 1 with torch_neuronx.trace #67

Open
Suprhimp opened this issue Feb 20, 2024 · 4 comments

Comments

@Suprhimp

My environment is an AWS inf2.8xlarge instance:

python: 3.8.10
torch-neuronx: 2.1.1.2.0.1b0
neuronx-cc: 2.12.68.0+4480452af

I'm trying to compile an ESRGAN torch model to Neuron, but I've run into an issue.

from PIL import Image
import requests

import torch
import torch_neuronx
from torchvision import models
from torchvision.transforms import functional

from modules.esrgan_upscale import upscale_model_loader
import os
os.environ["NEURON_CC_FLAGS"] = "-O1"
# load the model
model = upscale_model_loader('modules/weight/4x-Ultrasharp.pth')
model.eval()

# Get an example input
image = Image.open('/home/ubuntu/diffusers-ultimate-upscale/testIm.png')
image = image.convert('RGB')
image = functional.to_tensor(image)
image = torch.unsqueeze(image, 0)

# Run inference on CPU
output_cpu = model(image)

# Compile the model
model_neuron = torch_neuronx.trace(model, image, compiler_args=['--optlevel', '1'])

# Save the TorchScript for inference deployment
filename = 'model.pt'
torch.jit.save(model_neuron, filename)

When I run this code, it first gives me this log:

2024-02-20T13:36:54Z Compilation is optimized for best performance and compilation time. For faster compilation time please use -O1

I want to compile with -O1 because of this error log (yes, the compile failed):

[XTP002] Too many instructions after unroll for function sg0000! - Compiling under --optlevel=1 may result in smaller graphs. If you are using a transformer model, try using a smaller context_length_estimate value.

I can't set the optlevel flag to 1 ... even when I change the module code internally, like this:

    command = [
        neuron_cc,
        "compile",
        filename,
        "--framework",
        "XLA",
        "--target",
        "trn1",
        "--output",
        neff_filename,
        "--optlevel",
        "1"
    ]
    command.extend(compiler_args)

What should I do if I want to compile with --optlevel=1 using torch_neuronx.trace?

@aws-donkrets

Hi Suprhimp, I took a quick look at your code and it seems to be correct.
The torch_neuronx.trace call can pass compiler options, and the way you have done it looks correct, as does your command definition. Note that you don't need the os.environ["NEURON_CC_FLAGS"] = "-O1" line, so that can be removed. One suggestion is to move the neff_filename parameter to the end of the command so that all the cmd-line flags appear before the filename. The cmd-line would then look like:
neuronx_cc compile input_file_name --framework XLA --target trn1 --optlevel 1 --output neff_filename
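
In code, that reordering of the command construction would look roughly like this (a sketch using the same variables as your snippet):

    command = [
        neuron_cc,
        "compile",
        filename,
        "--framework",
        "XLA",
        "--target",
        "trn1",
    ]
    # user-supplied flags (e.g. ["--optlevel", "1"]) go before the output file
    command.extend(compiler_args)
    command.extend(["--output", neff_filename])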

Another suggestion would be to run the above command by hand to see if you get the same result.

@Suprhimp (Author)

Hi, thanks for checking my issue @aws-donkrets :)

Even if I change the code in trace.py, in the hlo_compile function, like this:

    if neuron_cc is None:
        raise RuntimeError("neuronx-cc compiler binary does not exist")
    command = [
        neuron_cc,
        "compile",
        filename,
        "--framework",
        "XLA",
        "--target",
        "trn1",
        "--optlevel",
        "1",
        "--output",
        neff_filename,
    ]
    command.extend(compiler_args)

it still gives me this log:

2024-02-29T02:01:25Z Compilation is optimized for best performance and compilation time. For faster compilation time please use -O1

and my compile still failed ;)

@Suprhimp (Author)

@aws-donkrets hello, let me add a question: is there any way to compile a .pth file so I can run my torch model on an inf2 instance?

The faster-compile flag still doesn't work.

Can you check it, please?

@aws-taylor

Hello @Suprhimp,

We do not directly support compiling .pth files; you would need to load the weights first, perhaps using load_state_dict(), and then trace the loaded model to trigger compilation.
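
As a rough sketch of that flow (EsrganModel is a placeholder here; substitute your actual model class and input shape):

    import torch
    import torch_neuronx

    from modules.esrgan_upscale import EsrganModel  # placeholder for your model class

    # Instantiate the architecture, then load the .pth weights into it
    model = EsrganModel()
    state_dict = torch.load('modules/weight/4x-Ultrasharp.pth', map_location='cpu')
    model.load_state_dict(state_dict)
    model.eval()

    # Tracing the loaded model is what triggers Neuron compilation
    example = torch.rand(1, 3, 64, 64)
    model_neuron = torch_neuronx.trace(model, example)
    torch.jit.save(model_neuron, 'model.pt')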

Could you share your model or more of the failure logs from the compiler (usually log-neuronx-cc.txt)? That will give us a better idea of why the failure is occurring.
