
training_special_tokens skip loss #95

Open
Hanzhang-lang opened this issue Nov 24, 2024 · 0 comments

Comments


Hanzhang-lang commented Nov 24, 2024

if context_markups is not None:
    for i, (label_id_list, source_len) in enumerate(zip(labels, sources_tokenized["input_ids_lens"])):
        context_start = False
        for j, orig_token in enumerate(label_id_list[source_len:]):
            if context_start is False and orig_token == context_markups[0]:
                context_start = True
                start_idx = j + source_len
                for k, orig_token_2 in enumerate(label_id_list[start_idx:]):
                    if orig_token_2 == context_markups[1]:
                        end_idx = start_idx + k
                labels[i][start_idx+1:end_idx] = IGNORE_INDEX
                context_start = False

If multiple paragraph spans (<paragraph> ... </paragraph>) appear in one sequence, the inner loop never breaks, so end_idx ends up pointing at the last closing markup. Doesn't that mean everything from the first opening markup to the last closing markup is set to IGNORE_INDEX, including the tokens between paragraphs? Is this incorrect?
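A runnable sketch of the loop above that reproduces the behavior in question. The token ids (100 for the opening markup, 101 for the closing markup, small integers for ordinary tokens) and the two-span label sequence are made up for illustration; since plain Python lists are used here instead of tensors, the slice assignment is written with an explicit list rather than broadcasting IGNORE_INDEX.

```python
IGNORE_INDEX = -100
context_markups = (100, 101)  # hypothetical ids for <paragraph> / </paragraph>

# One sequence: a source prompt of length 2, then two paragraph spans,
# with an ordinary token (3) between them and one (6) after them.
labels = [[7, 8, 100, 1, 2, 101, 3, 100, 4, 5, 101, 6]]
source_lens = [2]

for i, (label_id_list, source_len) in enumerate(zip(labels, source_lens)):
    context_start = False
    for j, orig_token in enumerate(label_id_list[source_len:]):
        if context_start is False and orig_token == context_markups[0]:
            context_start = True
            start_idx = j + source_len
            for k, orig_token_2 in enumerate(label_id_list[start_idx:]):
                if orig_token_2 == context_markups[1]:
                    end_idx = start_idx + k
                    # No break here, so end_idx keeps advancing to the
                    # LAST closing markup found in the rest of the sequence.
            labels[i][start_idx + 1:end_idx] = [IGNORE_INDEX] * (end_idx - start_idx - 1)
            context_start = False

print(labels[0])
```

Running this masks every position between the first opening markup and the last closing markup, so the first span's closing markup, the in-between token 3, and the second span's opening markup are all set to IGNORE_INDEX as well. With a `break` after `end_idx = start_idx + k`, each span would instead be masked independently.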
