Introduce LayerNorm IR ? #14447

Open · seanshpark opened this issue Dec 13, 2024 · 11 comments

@seanshpark (Contributor) commented Dec 13, 2024:

Transformer models (including ViT) have a LayerNorm Op in the graph.
A Circle model converted from ONNX currently gets the decomposed subgraph instead; would it be better to process it as a single node?


Test code to generate the ONNX models:

import onnx
import torch
import torch.nn as nn


class LayerNormNet(nn.Module):
    def __init__(self):
        super().__init__()
        # normalized_shape = (3, 16): normalize over the last two dimensions
        self.ln = nn.LayerNorm((3, 16))

    def forward(self, x):
        out = self.ln(x)
        return out


net = LayerNormNet()
inp = torch.randn(1, 3, 16)

# opset 11 has no LayerNormalization Op, so the exporter emits the decomposed subgraph
torch.onnx.export(net, inp, "ln11.onnx", opset_version=11)
onnx.shape_inference.infer_shapes_path('ln11.onnx', 'ln11-si.onnx')

# opset 17 introduces LayerNormalization as a single Op
torch.onnx.export(net, inp, "ln17.onnx", opset_version=17)
onnx.shape_inference.infer_shapes_path('ln17.onnx', 'ln17-si.onnx')

ONNX graphs:

[images: exported graph with opset=11 (decomposed subgraph) vs. opset=17 (single LayerNormalization node)]
@seanshpark (Contributor, Author) commented:

@Samsung/one_compiler, @Samsung/one_onert, comments?

@glistening (Contributor) commented:

At least for this year, we haven't needed LayerNorm; RMSNorm was used in our target model instead. So from the runtime perspective it is low priority at the moment. However, if it is necessary for the front-end or for some model I am not aware of, please feel free to add it.

@jinevening (Contributor) commented:

@seanshpark Could you check whether the parameters of the LayerNorm in your example code match the real model? normalized_shape is usually the last dimension in language models. I'm asking because the characteristics of the LN operation differ significantly depending on normalized_shape.


If LN is given as a single Op, our backend device may convert it to InstanceNorm. For example:

Before
Input [N, L, D] -> LayerNorm [N, L, D]

After
Input [N, L, D] -> Transpose [N, D, L] -> Reshape [N, 1, D, L] -> InstanceNorm [N, 1, D, L] -> Reshape [N, D, L] -> Transpose [N, L, D]

I'm not against introducing LN (maybe for backends other than the npu), but for now it would also be possible to just use InstanceNorm.
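
For reference, here is a minimal numerical check (my own sketch, not ONE code) that LayerNorm over the last dimension can indeed be expressed through InstanceNorm statistics. PyTorch is NCHW, so the reshaping below differs from the NHWC Transpose/Reshape sequence above, and LN's affine (gamma/beta over D) must be re-applied separately because InstanceNorm's affine is per-channel:

import torch
import torch.nn.functional as F

N, L, D = 1, 4, 8
x = torch.randn(N, L, D)
gamma = torch.randn(D)  # LayerNorm weight
beta = torch.randn(D)   # LayerNorm bias

# reference: LayerNorm with normalized_shape = (D,)
ref = F.layer_norm(x, (D,), weight=gamma, bias=beta)

# InstanceNorm route: treat each of the N*L positions as one "instance"
# whose spatial extent is the D elements to normalize over
inst = F.instance_norm(x.reshape(N * L, 1, D, 1))  # [N*L, C=1, H=D, W=1]
out = inst.reshape(N, L, D) * gamma + beta         # re-apply LN's affine part

print(torch.allclose(ref, out, atol=1e-5))  # True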

@seanshpark (Contributor, Author) commented:

Could you check whether the parameters of the LayerNorm in your example code match the real model?

Not sure exactly what match means, but from our customer, the input shape is [1, 16384, 128].
The ReduceMean nodes have different attributes.

[image: ONNX graph showing the ReduceMean attributes]

@seanshpark (Contributor, Author) commented:

Got the same attributes with this:

import onnx
import torch
import torch.nn as nn


class LayerNormNet(nn.Module):
    def __init__(self):
        super().__init__()
        # normalized_shape = 128: normalize over the last dimension only
        self.ln = nn.LayerNorm(128)

    def forward(self, x):
        out = self.ln(x)
        return out


net = LayerNormNet()
inp = torch.randn(1, 16384, 128)

torch.onnx.export(net, inp, "ln11.onnx", opset_version=11)
onnx.shape_inference.infer_shapes_path('ln11.onnx', 'ln11-si.onnx')

torch.onnx.export(net, inp, "ln17.onnx", opset_version=17)
onnx.shape_inference.infer_shapes_path('ln17.onnx', 'ln17-si.onnx')
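
For anyone reproducing this: one way to check the ReduceMean attributes of the decomposed opset-11 export is the inspection snippet below (my own snippet, not from the thread; 'ln11.onnx' comes from the script above):

import onnx
from onnx import helper

# print every ReduceMean node with its attributes (axes, keepdims)
m = onnx.load('ln11.onnx')
for node in m.graph.node:
    if node.op_type == 'ReduceMean':
        print(node.name, [helper.printable_attribute(a) for a in node.attribute])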

@jinevening (Contributor) commented:

Got the same attributes with this:

I expected this :)

@jinevening (Contributor) commented:

Introducing LN may lead to a lot of work (including in the npu compiler) with no visible benefit as of now. If onnx2circle generates the sequence in #14447 (comment), no additional work would be required.

@seanshpark (Contributor, Author) commented Dec 13, 2024:

If onnx2circle generates the sequence in #14447 (comment), no additional work would be required.

If I understand this correctly, it would be one of these two ways:
(1) ONNX model has LayerNorm
(2) onnx2circle converts to Transpose -> Reshape -> InstanceNorm -> Reshape -> Transpose

or

(1) ONNX model has LayerNorm
(2) onnx2circle converts to LayerNorm (of Circle IR)
(3) circle2circle converts to Transpose -> Reshape -> InstanceNorm -> Reshape -> Transpose

something like this?

Addendum:

  • if the ONNX model has a decomposed LayerNorm subgraph, we first have to fuse it to LayerNorm (Circle)
  • or support opset_version=17

@jinevening (Contributor) commented:

something like this?

Yes. I imagined the first approach, which does not need a modification of the circle schema.

if the ONNX model has a decomposed LayerNorm subgraph, we first have to fuse it to LayerNorm (Circle)

Even if the ONNX model has a decomposed LN (due to a low opset version), the fusion can be done inside onnx2circle, which does not affect the circle schema.

Please note that this is just my opinion, aimed at minimizing our workload for the current generation of npu. For the next-generation npu, we may need further discussion.
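
For reference, the decomposed pattern such a fuser inside onnx2circle would have to match can be listed from the opset-11 export above (my own snippet; the exact op sequence varies with the torch exporter version):

import onnx

# list the node sequence of the decomposed LayerNorm (opset-11 export);
# typically something like:
#   ReduceMean -> Sub -> ... -> ReduceMean -> Add(eps) -> Sqrt -> Div -> Mul -> Add
m = onnx.load('ln11.onnx')
print([node.op_type for node in m.graph.node])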

@seanshpark (Contributor, Author) commented:

As there is no particular benefit in adding a CircleLayerNorm IR as of now,
I'll close this issue after a day or two if there are no other opinions.
We may open a new issue to implement the #14447 (comment) suggestion.
