[Bug]: Model inference on Windows LNL NPU for openai/clip-vit-large-patch14 is not working #28171

azhuvath · 2024-12-20T13:38:50Z

OpenVINO Version

2024.6

Operating System

Ubuntu 20.04 (LTS)

Device used for inference

NPU

Framework

None

Model used

openai/clip-vit-large-patch14

Issue description

Model inference on a Windows LNL (Lunar Lake) NPU fails for openai/clip-vit-large-patch14. The compiler reports the following error.

[ERROR] 05:26:28.301 [vpux-compiler] Got Diagnostic at loc(fused<{name = "__module.vision_model.embeddings.patch_embedding/aten::_convolution/Convolution", type = "Convolution"}>["__module.vision_model.embeddings.patch_embedding/aten::_convolution/Convolution"]) : Channels count of input tensor shape and filter shape must be the same: -9223372036854775808 != 3

loc(fused<{name = "__module.vision_model.embeddings.patch_embedding/aten::_convolution/Convolution", type = "Convolution"}>["__module.vision_model.embeddings.patch_embedding/aten::_convolution/Convolution"]): error: Channels count of input tensor shape and filter shape must be the same: -9223372036854775808 != 3
LLVM ERROR: Failed to infer result type(s).

Step-by-step reproduction

Create Environment

python -m venv npu_env
./npu_env/Scripts/activate
python -m pip install --upgrade pip
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
pip install pillow scikit-learn requests transformers openvino

Code to execute. The failure appears once the device is changed from CPU to NPU (the script below already sets device = 'NPU').

import requests
import numpy as np
import openvino as ov
from scipy.special import softmax
from PIL import Image
from pathlib import Path
from transformers import CLIPProcessor, CLIPModel

model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

classes = ["a photo of a cat", "a photo of a dog"]
inputs = processor(text=classes, images=image, return_tensors="pt", padding=True)

outputs = model(**inputs)
logits_per_image = outputs.logits_per_image # this is the image-text similarity score
probs = logits_per_image.softmax(dim=1) # we can take the softmax to get the label probabilities
predicted_idx = probs.argmax().item()
print(classes[predicted_idx])

ov_model_path = "clip-vit-large-patch14-fp32.xml"
fp32_model_path = Path(ov_model_path)
model.config.torchscript = True

ov_model = ov.convert_model(model, example_input=dict(inputs))
ov.save_model(ov_model, fp32_model_path, compress_to_fp16=False)

device = 'NPU'
core = ov.Core()
compiled_model = core.compile_model(ov_model_path, device)
inputs = dict(inputs)
outputs = compiled_model(inputs)[0]
probs = softmax(outputs, axis=1)
[predicted_idx] = np.argmax(probs, axis=1)
print(classes[predicted_idx])
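
One thing that may be worth ruling out with the script above: `ov.convert_model` with an `example_input` can leave input dimensions dynamic, and the NPU plugin is documented as expecting static shapes. The thread does not confirm this as the cause, so the following is only a minimal sketch under that assumption. The helper names `static_shapes_from_inputs` and `compile_with_static_shapes` are hypothetical, and the sketch assumes the converted model's input names match the keys of `dict(inputs)`.

```python
from typing import Any, Dict, List, Mapping


def static_shapes_from_inputs(inputs: Mapping[str, Any]) -> Dict[str, List[int]]:
    """Map each input name to the concrete shape of its example tensor."""
    return {name: [int(dim) for dim in tensor.shape] for name, tensor in inputs.items()}


def compile_with_static_shapes(xml_path: str, shapes: Dict[str, List[int]], device: str = "NPU"):
    """Read the saved IR, pin every input to a fixed shape, then compile it.

    Assumption: the keys of `shapes` are the input names of the converted model.
    """
    import openvino as ov  # imported lazily so the helper above works without OpenVINO

    core = ov.Core()
    model = core.read_model(xml_path)
    model.reshape(shapes)  # replaces dynamic dimensions with the given static ones
    return core.compile_model(model, device)
```

In the reproducer this would be called as `compile_with_static_shapes(ov_model_path, static_shapes_from_inputs(dict(inputs)))`. The compiled model then accepts only that exact batch size and token length, so inputs with a different shape would need padding or a new reshape.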

Relevant log output

[ERROR] 05:26:28.301 [vpux-compiler] Got Diagnostic at loc(fused<{name = "__module.vision_model.embeddings.patch_embedding/aten::_convolution/Convolution", type = "Convolution"}>["__module.vision_model.embeddings.patch_embedding/aten::_convolution/Convolution"]) : Channels count of input tensor shape and filter shape must be the same: -9223372036854775808 != 3
loc(fused<{name = "__module.vision_model.embeddings.patch_embedding/aten::_convolution/Convolution", type = "Convolution"}>["__module.vision_model.embeddings.patch_embedding/aten::_convolution/Convolution"]): error: Channels count of input tensor shape and filter shape must be the same: -9223372036854775808 != 3
LLVM ERROR: Failed to infer result type(s).
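
The value -9223372036854775808 in the log above is the smallest signed 64-bit integer. Here is a minimal sketch for reading the diagnostic, assuming the compiler uses that value as a placeholder for an unknown (dynamic) dimension. That convention is common in MLIR-based toolchains but is not confirmed in this thread. The helper names are hypothetical.

```python
import re
from typing import Optional, Tuple

INT64_MIN = -(2 ** 63)  # -9223372036854775808

# Matches the tail of the "Channels count ... must be the same: A != B" diagnostic.
_MISMATCH = re.compile(r"must be the same: (-?\d+) != (-?\d+)")


def parse_channel_mismatch(log_line: str) -> Optional[Tuple[int, int]]:
    """Return (input_channels, filter_channels) from the diagnostic, or None."""
    match = _MISMATCH.search(log_line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def is_dynamic_sentinel(value: int) -> bool:
    """True if the value equals INT64_MIN, assumed here to mark a dynamic dimension."""
    return value == INT64_MIN


line = ("error: Channels count of input tensor shape and filter shape "
        "must be the same: -9223372036854775808 != 3")
input_channels, filter_channels = parse_channel_mismatch(line)
print(is_dynamic_sentinel(input_channels), filter_channels)
```

If the check is true, the convolution's input channel dimension was probably not known at compile time, which points at the model's input shapes rather than at the filter weights.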

Issue submission checklist

I'm reporting an issue. It's not a question.
I checked the problem with the documentation, FAQ, open issues, Stack Overflow, etc., and have not found a solution.
There is reproducer code and related data files such as images, videos, models, etc.

mlyashko · 2024-12-20T16:17:29Z

A newer version of the Linux NPU driver is available. Please use it: https://github.com/intel/linux-npu-driver/releases/tag/v1.10.1

azhuvath added the bug and support_request labels Dec 20, 2024

ilya-lavrenov assigned mlyashko and PatrikStepan Dec 20, 2024

ilya-lavrenov added the category: NPU label Dec 20, 2024
