
training_special_tokens skip loss #95

Open
Hanzhang-lang opened this issue Nov 24, 2024 · 0 comments

Comments


Hanzhang-lang commented Nov 24, 2024

if context_markups is not None:
    for i, (label_id_list, source_len) in enumerate(zip(labels, sources_tokenized["input_ids_lens"])):
        context_start = False
        for j, orig_token in enumerate(label_id_list[source_len:]):
            if context_start is False and orig_token == context_markups[0]:
                context_start = True
                start_idx = j + source_len
                for k, orig_token_2 in enumerate(label_id_list[start_idx:]):
                    if orig_token_2 == context_markups[1]:
                        end_idx = start_idx + k
                labels[i][start_idx+1:end_idx] = IGNORE_INDEX
                context_start = False

If multiple paragraph spans (<paragraph> ... </paragraph>) appear in one sequence, the inner loop never breaks, so end_idx ends up pointing at the last closing markup. Doesn't that mean everything from the first opening markup to the last closing markup is set to IGNORE_INDEX, including the tokens between paragraphs? Is this incorrect?
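A runnable sketch of the loop above that reproduces the behavior in question. The token ids (100 for the opening markup, 101 for the closing markup, small integers for ordinary tokens) and the two-span label sequence are made up for illustration; since plain Python lists are used here instead of tensors, the slice assignment is written with an explicit list rather than broadcasting IGNORE_INDEX.

```python
IGNORE_INDEX = -100
context_markups = (100, 101)  # hypothetical ids for <paragraph> / </paragraph>

# One sequence: a source prompt of length 2, then two paragraph spans,
# with an ordinary token (3) between them and one (6) after them.
labels = [[7, 8, 100, 1, 2, 101, 3, 100, 4, 5, 101, 6]]
source_lens = [2]

for i, (label_id_list, source_len) in enumerate(zip(labels, source_lens)):
    context_start = False
    for j, orig_token in enumerate(label_id_list[source_len:]):
        if context_start is False and orig_token == context_markups[0]:
            context_start = True
            start_idx = j + source_len
            for k, orig_token_2 in enumerate(label_id_list[start_idx:]):
                if orig_token_2 == context_markups[1]:
                    end_idx = start_idx + k
                    # No break here, so end_idx keeps advancing to the
                    # LAST closing markup found in the rest of the sequence.
            labels[i][start_idx + 1:end_idx] = [IGNORE_INDEX] * (end_idx - start_idx - 1)
            context_start = False

print(labels[0])
```

Running this masks every position between the first opening markup and the last closing markup, so the first span's closing markup, the in-between token 3, and the second span's opening markup are all set to IGNORE_INDEX as well. With a `break` after `end_idx = start_idx + k`, each span would instead be masked independently.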
