Confused by the code #26

2004learner · 2025-01-09T09:30:59Z

I have two questions about the code.
In single_inference_7b_13b.py, the SupervisedDataset class contains the following lines:

self.input_ids = data_dict["input_ids"] + data_dict["input_ids"][-100:]
self.labels = data_dict["labels"] + data_dict["labels"][-100:]

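I don't understand why data_dict["input_ids"][-100:] (and likewise data_dict["labels"][-100:]) is appended. What is the purpose of repeating the last 100 entries?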
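To check that I am reading it correctly: if data_dict["input_ids"] is a plain Python list (this is my assumption, I have not verified the type), then the expression appends a second copy of the last 100 examples to the end of the dataset. A toy sketch with made-up data, which shows the effect but not the reason:

```python
# Toy stand-in for data_dict["input_ids"]; assumed to be a Python list of examples.
data = list(range(250))

# List concatenation: the original list followed by a copy of its last 100 items.
extended = data + data[-100:]

print(len(data), len(extended))
```

So the dataset grows by 100 entries, and those extra entries duplicate examples that are already present.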

My second question is about the padding function, which is implemented as follows:

def padding(inputs, padding_token, cutoff = None):
    num_elems = len(inputs)
    if cutoff is None:
        cutoff = max([len(item) for item in inputs])
    else:
        cutoff = min(max([len(item) for item in inputs]), cutoff)

    tokens = torch.ones(num_elems, cutoff).long().to(inputs[0].device) * padding_token
    for i in range(num_elems):
        toks = inputs[i]
        length = min(cutoff, len(toks))
        tokens[i, -length:] = toks[-length:]
    return tokens

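Because each sequence is written into the last `length` positions of its row, the padding tokens end up on the left (and sequences longer than cutoff keep only their last cutoff tokens). Why is left padding used here instead of right padding?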
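To illustrate what I mean, here is a small pure-Python sketch of what I believe the function does. The name left_pad and the use of lists instead of tensors are my own choices for illustration and do not come from the repository:

```python
def left_pad(sequences, pad_token, cutoff=None):
    """Pad token lists on the left to a common width (hypothetical helper)."""
    longest = max(len(seq) for seq in sequences)
    # Same rule as the original: never wider than the longest sequence.
    width = longest if cutoff is None else min(longest, cutoff)
    padded = []
    for seq in sequences:
        # Keep only the last `width` tokens, so truncation drops the oldest ones.
        kept = seq[-width:] if width > 0 else []
        # Padding goes in front, so the real tokens are right-aligned.
        padded.append([pad_token] * (width - len(kept)) + kept)
    return padded


batch = left_pad([[5, 6, 7], [8, 9]], pad_token=0)
print(batch)
```

With these inputs the shorter sequence comes out as [0, 8, 9], so every row ends with its last real token. My guess is that this is intended for batched generation with a decoder-only model, where new tokens are appended right after the final position, but I could not confirm that from the code.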
