Using CLIP ViT-B-32, getting errors about invalid input dimensions #154
Hi! I'm busy trying to use the CLIP ViT-B-32 ONNX model, which I found in https://github.com/Lednik7/CLIP-ONNX. I can get the model to work, but only if I provide it exactly 77 tokens. I'm hoping someone can help me figure out how to get it to work with an arbitrary number of tokens. Here's the code that works, but I've had to make the input string exactly 77 tokens long:

```rust
use instant_clip_tokenizer::{Token, Tokenizer};
use ndarray::{Array1, Array2, Axis};
use ort::{inputs, GraphOptimizationLevel, Session};

pub fn load_text_model() -> ort::Result<()> {
    // The ONNX file has already been saved to models/textual.onnx
    let text_model = Session::builder()?
        .with_optimization_level(GraphOptimizationLevel::Level3)?
        .with_intra_threads(1)?
        .with_model_from_file("models/textual.onnx")?;

    // The tokenizer comes from
    // https://docs.rs/instant-clip-tokenizer/0.1.0/instant_clip_tokenizer
    let tokenizer = Tokenizer::new();

    // See `tokenize(...)` below. The string I give here is just a dummy piece of text that
    // ends up being 77 tokens long.
    let tokens = tokenize(tokenizer, "Hi there my name is john and I like to walk in the park with my son and daughter. when we go walking in the sun I like to feel it warm my neck and I like to hold their hands as they tell me about their day. sometimes they have had a poor day and it makes me sad to hear about their poor day but other times I hear about")
        .iter()
        .map(|tk| *tk as i64)
        .collect::<Vec<_>>();
    let mut tokens = Array1::from_iter(tokens);

    // Preprocess the tokens into the right shape
    let array = tokens.view().insert_axis(Axis(0));
    let inputs = inputs!["input" => array]?;

    // Pass the inputs through the model
    let model_output = text_model.run(inputs)?;

    // Extract the embedding from the model
    let outputs = model_output["output"].extract_tensor::<f32>()?;

    // This tensor is correct, I've verified it with a Python CLIP model
    println!("Output Tensor: {:?}", outputs);
    Ok(())
}

fn tokenize(tokenizer: Tokenizer, text: &str) -> Vec<u16> {
    let mut tokens = vec![tokenizer.start_of_text()];
    tokenizer.encode(text, &mut tokens);
    tokens.push(tokenizer.end_of_text());
    tokens.into_iter().map(Token::to_u16).collect()
}
```

If I change the string to be a bit shorter or a bit longer:

```rust
...
let tokens = tokenize(tokenizer, "short string")
...
```

Then I get this error:
I know this isn't quite the right place to ask, because this seems like an ONNX issue, but I'm hoping you can point me in the right direction.
Replies: 1 comment 5 replies
Use `Tokenizer` from the `tokenizers` crate, which supports padding. See here for an example: https://github.com/pykeio/diffusers/blob/67158927fb847ebfb63986c14b31fdbb6a2569e7/src/clip.rs#L40-L54
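For what it's worth, a minimal sketch of that approach (the `models/tokenizer.json` path is hypothetical; it assumes a CLIP tokenizer exported from Hugging Face, and that the ONNX export expects zero-padding out to 77 tokens, as the original CLIP tokenizer produces):

```rust
use tokenizers::{PaddingParams, PaddingStrategy, Tokenizer};

/// Encode `text` and pad the ids out to the 77 tokens the CLIP text model expects.
fn tokenize_padded(text: &str) -> tokenizers::Result<Vec<i64>> {
    // Hypothetical path: a CLIP tokenizer.json exported from Hugging Face
    let mut tokenizer = Tokenizer::from_file("models/tokenizer.json")?;
    // Pad every encoding to a fixed length of 77; pad_id defaults to 0,
    // which is what the original CLIP tokenizer uses after the end-of-text token
    tokenizer.with_padding(Some(PaddingParams {
        strategy: PaddingStrategy::Fixed(77),
        ..Default::default()
    }));
    let encoding = tokenizer.encode(text, true)?;
    Ok(encoding.get_ids().iter().map(|&id| id as i64).collect())
}
```

You'd still want truncation for inputs longer than 77 tokens (the `tokenizers` crate supports that as well), and the resulting `Vec<i64>` can be fed into the same `Array1`/`insert_axis` preprocessing as in the question. Note the pad id has to match whatever the ONNX export was made with.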