
I can't set params optlevel to 1 with torch_neuronx.trace #67

Open
Suprhimp opened this issue Feb 20, 2024 · 4 comments

Comments

@Suprhimp

My environment is an AWS inf2.8xlarge instance:

python: 3.8.10
torch-neuronx: 2.1.1.2.0.1b0
neuronx-cc: 2.12.68.0+4480452af

I'm trying to compile an ESRGAN torch model to Neuron, but I've run into an issue.

from PIL import Image
import requests

import torch
import torch_neuronx
from torchvision import models
from torchvision.transforms import functional

from modules.esrgan_upscale import upscale_model_loader
import os
os.environ["NEURON_CC_FLAGS"] = "-O1"
# load the model
model = upscale_model_loader('modules/weight/4x-Ultrasharp.pth')
model.eval()

# Get an example input
image = Image.open('/home/ubuntu/diffusers-ultimate-upscale/testIm.png')
image = image.convert('RGB')
image = functional.to_tensor(image)
image = torch.unsqueeze(image, 0)

# Run inference on CPU
output_cpu = model(image)

# Compile the model
model_neuron = torch_neuronx.trace(model, image, compiler_args=['--optlevel', '1'])

# Save the TorchScript for inference deployment
filename = 'model.pt'
torch.jit.save(model_neuron, filename)

When I run this code, it first gives me this log:

2024-02-20T13:36:54Z Compilation is optimized for best performance and compilation time. For faster compilation time please use -O1

I want to compile with -O1 because of this error log (yes, the compile failed):

[XTP002] Too many instructions after unroll for function sg0000! - Compiling under --optlevel=1 may result in smaller graphs. If you are using a transformer model, try using a smaller context_length_estimate value.

I can't set the optlevel flag to 1 ... even when I change the module code internally, like this:

    command = [
        neuron_cc,
        "compile",
        filename,
        "--framework",
        "XLA",
        "--target",
        "trn1",
        "--output",
        neff_filename,
        "--optlevel",
        "1"
    ]
    command.extend(compiler_args)

What should I do if I want to compile with --optlevel=1 using torch_neuronx.trace?

@aws-donkrets

Hi Suprhimp, I took a quick look at your code and it seems to be correct.
The torch_neuronx.trace call can pass compiler options, and the way you have done it looks correct, as does your command definition. Note that you don't need the os.environ["NEURON_CC_FLAGS"] = "-O1" line, so that can be removed. One suggestion is to move the neff_filename parameter to the end of the command so that all the cmd-line flags appear before the filename. The cmd-line would then look like:
neuronx_cc compile input_file_name --framework XLA --target trn1 --optlevel 1 --output neff_filename
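
In code, that reordering of the command construction would look roughly like this (a sketch using the same variables as your snippet):

    command = [
        neuron_cc,
        "compile",
        filename,
        "--framework",
        "XLA",
        "--target",
        "trn1",
    ]
    # user-supplied flags (e.g. ["--optlevel", "1"]) go before the output file
    command.extend(compiler_args)
    command.extend(["--output", neff_filename])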

Another suggestion would be to run the above command by hand to see if you get the same result.

@Suprhimp (Author)

Hi, thanks for checking my issue @aws-donkrets :)

Even if I change the code in trace.py, in the hlo_compile function, like this:

    if neuron_cc is None:
        raise RuntimeError("neuronx-cc compiler binary does not exist")
    command = [
        neuron_cc,
        "compile",
        filename,
        "--framework",
        "XLA",
        "--target",
        "trn1",
        "--optlevel",
        "1",
        "--output",
        neff_filename,
    ]
    command.extend(compiler_args)

it still gives me this log:

2024-02-29T02:01:25Z Compilation is optimized for best performance and compilation time. For faster compilation time please use -O1

and my compile still failed ;)

@Suprhimp (Author)

@aws-donkrets hello, let me add a question: is there any way to compile a .pth file so I can run my torch model on an inf2 instance?

The faster-compile flag still doesn't work.

Can you check it, please?

@aws-taylor

Hello @Suprhimp,

We do not directly support compiling .pth files; you would need to load the weights first, perhaps using load_state_dict(), and then trace the loaded model to trigger compilation.
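
As a rough sketch of that flow (EsrganModel is a placeholder here; substitute your actual model class and input shape):

    import torch
    import torch_neuronx

    from modules.esrgan_upscale import EsrganModel  # placeholder for your model class

    # Instantiate the architecture, then load the .pth weights into it
    model = EsrganModel()
    state_dict = torch.load('modules/weight/4x-Ultrasharp.pth', map_location='cpu')
    model.load_state_dict(state_dict)
    model.eval()

    # Tracing the loaded model is what triggers Neuron compilation
    example = torch.rand(1, 3, 64, 64)
    model_neuron = torch_neuronx.trace(model, example)
    torch.jit.save(model_neuron, 'model.pt')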

Could you share your model or more of the failure logs from the compiler (usually log-neuronx-cc.txt)? That will give us a better idea of why the failure is occurring.
