Problem about value range of valenc in func get_c4() #59

88099981 · 2024-11-13T07:17:58Z

When I tried to run GPTQ with the C4 dataset, I got ValueError("empty range for randrange() (%d, %d, %d)" % (istart, istop, width)). While debugging, I found what looks like a logic error in the code below: when tmp.input_ids.shape[1] is equal to seqlen, the second call becomes random.randint(0, -1).

    import random
    random.seed(0)
    valenc = []
    for _ in range(256):
        while True:
            i = random.randint(0, len(valdata) - 1)
            tmp = tokenizer(valdata[i]['text'], return_tensors='pt')
            if tmp.input_ids.shape[1] >= seqlen:
                break
        i = random.randint(0, tmp.input_ids.shape[1] - seqlen - 1)
        j = i + seqlen
        valenc.append(tmp.input_ids[:, i:j])
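The failure can be reproduced without the dataset or tokenizer. The sketch below is only an illustration: `num_tokens` is a hypothetical stand-in for tmp.input_ids.shape[1], and the value 2048 for `seqlen` is an assumption, not a value taken from the repository.

```python
import random

seqlen = 2048          # assumed sequence length, for illustration only
num_tokens = 2048      # hypothetical document length, exactly equal to seqlen

random.seed(0)
try:
    # Same expression as in the loop above: evaluates to randint(0, -1)
    random.randint(0, num_tokens - seqlen - 1)
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```

The document passes the `>= seqlen` check, so the loop exits, but the offset range that follows is empty.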

Please find attached my debugging record.


88099981 · 2024-11-13T07:30:27Z

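It works after I changed the line to i = random.randint(0, tmp.input_ids.shape[1] - seqlen). random.randint includes both endpoints, so the extra - 1 is not needed: the largest valid start offset is exactly shape[1] - seqlen.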
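A minimal sketch of the corrected offset logic, pulled out into a standalone helper so the boundary case can be checked in isolation. The name `sample_window` and its parameters are hypothetical and not part of the repository; `num_tokens` stands in for tmp.input_ids.shape[1].

```python
import random


def sample_window(num_tokens, seqlen, rng=random):
    """Pick a random [i, j) window of length seqlen inside num_tokens tokens.

    Hypothetical helper for illustration; not the repository's API.
    """
    if num_tokens < seqlen:
        raise ValueError("document is shorter than seqlen")
    # randint is inclusive on both ends, so the upper bound is the last
    # start offset that still leaves room for a full window.
    i = rng.randint(0, num_tokens - seqlen)
    return i, i + seqlen


rng = random.Random(0)

# Boundary case from the report: the document is exactly seqlen tokens long.
edge = sample_window(2048, 2048, rng)

# General case: every sampled window must stay inside the document.
windows = [sample_window(3000, 2048, rng) for _ in range(1000)]
```

With this bound, a document of exactly seqlen tokens yields the single window starting at offset 0, and longer documents can also produce the window that ends at the last token, which the original `- 1` excluded.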
