Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CLIP4STR] Integrate CLIP4STR #20

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

[CLIP4STR] Integrate CLIP4STR #20

wants to merge 1 commit into from

Conversation

Bin-NV
Copy link
Collaborator

@Bin-NV Bin-NV commented Jan 2, 2025

  • Support CLIP4STR inference with visual branch trt engine and text branch trt engine simultaneously
  • Support upper and lower output

@Bin-NV Bin-NV requested a review from Tyler-D January 2, 2025 09:09
bool ocrnet_only_alnum = false;
bool ocrnet_only_lowercase = false;
char* ocrnet_vocab_file;
int ocrnet_vocab_size = 32000;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious, why can't we use a dict_file or vocabulary file to determine those parameters:

ocrnet_only_alnum
ocrnet_only_lowercase
ocrnet_vocab_file
ocrnet_vocab_size

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the raw output from CLIP4STR includes0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'()*+,-./:;<=>?@[\]^_{|}~`
so ocrnet_only_alnum and ocrnet_only_lowercase flags are used to control if we should filter the upper and symbol chars

ocrnet_vocab_file includs 26k words, we only need 32000 words here

char* ocrnet_vocab_file;
int ocrnet_vocab_size = 32000;
// char* charset_train = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~";
// char* charset_test = "0123456789abcdefghijklmnopqrstuvwxyz";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the unused code/

};


class SimpleTokenizer
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you move the tokenizer to a separate file?

initTokenizer(bpe_path, vocab_size);
}

void initTokenizer(const std::string& bpe_path, int vocab_size)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you separate the declaration and definition in two files?

private:
std::unique_ptr<TRTEngine> mEngine;
std::vector<std::string> mDict;
bool mUDFlag;
DecodeMode mDecodeMode;

// int mDecodeOutputBufferIndex;
// CLIP4STR
std::unique_ptr<TRTEngine> mTextEngine;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically, as you already broke the notion of "one model, one engine", I would suggest to create a new class inherited from OCRNetEngine for the CLIP4STR

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, will do

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants