Question about the Tokenizer #37

Open
magnificent1208 opened this issue Apr 10, 2023 · 1 comment

Comments

@magnificent1208

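In my custom jsonl file, symbols like () are not recognized.
My understanding is that this repo follows the BERT tokenization format. Could you explain the specific logic?
Thanks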

@HarderThenHarder
Owner

Hi, if you need to add extra special tokens, you can try the following approach:

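# Assumes `tokenizer` and `model` have already been loaded with transformers.
special_tokens = ['(', ')']  # symbols that are currently not recognized
tokenizer.add_tokens(special_tokens, special_tokens=True)  # add them to the vocabulary as special tokens
model.resize_token_embeddings(len(tokenizer))  # give the new token ids rows in the embedding matrix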
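To illustrate why a symbol can come out as [UNK] and what the two calls above change, here is a minimal pure-Python sketch of BERT-style tokenization: whitespace and punctuation splitting, then greedy longest-match-first WordPiece against a fixed vocabulary. The toy vocabulary and the helpers `basic_split`, `wordpiece`, `encode`, `add_tokens` and `resize_embeddings` are hypothetical stand-ins, not this repo's code or the transformers API. The sketch ignores lowercasing, CJK character splitting, and the extra handling that `special_tokens=True` gives in transformers.

```python
import unicodedata
from typing import Dict, List

UNK = "[UNK]"


def basic_split(text: str) -> List[str]:
    """Whitespace split, then isolate each punctuation char (roughly as BERT's basic step does)."""
    out: List[str] = []
    for word in text.split():
        buf = ""
        for ch in word:
            if unicodedata.category(ch).startswith("P"):
                if buf:
                    out.append(buf)
                    buf = ""
                out.append(ch)
            else:
                buf += ch
        if buf:
            out.append(buf)
    return out


def wordpiece(word: str, vocab: Dict[str, int]) -> List[str]:
    """Greedy longest-match-first split; the whole word becomes [UNK] if any part fails."""
    pieces: List[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        while start < end:
            # Continuation pieces carry the "##" prefix, as in BERT vocabularies.
            sub = word[start:end] if start == 0 else "##" + word[start:end]
            if sub in vocab:
                match = sub
                break
            end -= 1
        if match is None:
            return [UNK]
        pieces.append(match)
        start = end
    return pieces


def encode(text: str, vocab: Dict[str, int]) -> List[int]:
    """Map text to ids; anything the vocab cannot cover falls back to the [UNK] id."""
    ids: List[int] = []
    for token in basic_split(text):
        ids.extend(vocab[p] for p in wordpiece(token, vocab))
    return ids


def add_tokens(vocab: Dict[str, int], new_tokens: List[str]) -> int:
    """Append unseen tokens with fresh ids starting at the old vocab size; return how many were added."""
    added = 0
    for tok in new_tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)
            added += 1
    return added


def resize_embeddings(rows: List[List[float]], new_size: int, dim: int) -> None:
    """Grow the embedding table so every id < new_size has a row (zeros here for simplicity)."""
    while len(rows) < new_size:
        rows.append([0.0] * dim)


if __name__ == "__main__":
    vocab = {"[UNK]": 0, "i": 1, "like": 2, "cat": 3, "##s": 4}
    print(encode("i like (cats)", vocab))  # [1, 2, 0, 3, 4, 0]: both brackets are [UNK]
    add_tokens(vocab, ["(", ")"])
    print(encode("i like (cats)", vocab))  # [1, 2, 5, 3, 4, 6]
```

The new tokens receive ids starting at the old vocabulary size, which is why the embedding matrix has to be resized to `len(tokenizer)` afterwards: without it, the new ids would index past the end of the table. The real `resize_token_embeddings` uses its own initialization for the new rows rather than zeros.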
