-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CLIP4STR] Integrate CLIP4STR #20
base: main
Are you sure you want to change the base?
Conversation
Bin-NV
commented
Jan 2, 2025
- Support CLIP4STR inference with visual branch trt engine and text branch trt engine simultaneously
- Support upper and lower output
bool ocrnet_only_alnum = false; | ||
bool ocrnet_only_lowercase = false; | ||
char* ocrnet_vocab_file; | ||
int ocrnet_vocab_size = 32000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious, why can't we use a dict_file or vocabulary file to determine those parameters:
ocrnet_only_alnum
ocrnet_only_lowercase
ocrnet_vocab_file
ocrnet_vocab_size
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the raw output from CLIP4STR includes0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'()*+,-./:;<=>?@[\]^_
{|}~`
so ocrnet_only_alnum and ocrnet_only_lowercase flags are used to control if we should filter the upper and symbol chars
ocrnet_vocab_file includs 26k words, we only need 32000 words here
char* ocrnet_vocab_file; | ||
int ocrnet_vocab_size = 32000; | ||
// char* charset_train = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~"; | ||
// char* charset_test = "0123456789abcdefghijklmnopqrstuvwxyz"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove the unused code/
}; | ||
|
||
|
||
class SimpleTokenizer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you move the tokenizer to a separate file?
initTokenizer(bpe_path, vocab_size); | ||
} | ||
|
||
void initTokenizer(const std::string& bpe_path, int vocab_size) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you separate the declaration and definition in two files?
private: | ||
std::unique_ptr<TRTEngine> mEngine; | ||
std::vector<std::string> mDict; | ||
bool mUDFlag; | ||
DecodeMode mDecodeMode; | ||
|
||
// int mDecodeOutputBufferIndex; | ||
// CLIP4STR | ||
std::unique_ptr<TRTEngine> mTextEngine; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically, as you already broke the notion of "one model, one engine", I would suggest to create a new class inherited from OCRNetEngine for the CLIP4STR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, will do