Cannot import LLMDataset from dataloader.bigquery_pypi in pl_data.py (line 11) #2

jswjc555 · 2024-05-31T08:23:26Z

Hello, I am stuck at Step 2: running training.

Could you please let me know whether LLMDataset, imported via "from dataloader.bigquery_pypi import LLMDataset", comes from a third-party dataloader library or is implemented within this project? My locally installed dataloader package is version 2.0, and the import fails with it. Moreover, the project contains no dataloader module, so pl_data.py cannot be imported or run. Am I missing the implementation of LLMDataset?

Looking forward to a response from the author. Thanks!
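A quick way to answer the first part of this question is to ask Python which dataloader it would actually import. The sketch below uses the standard importlib.util.find_spec; describe_module is a hypothetical helper name, and the script assumes it is run from the project root, next to pl_data.py.

```python
import importlib.util


def describe_module(name: str) -> str:
    """Hypothetical helper: report where a top-level module/package would be imported from."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return f"{name}: not found on sys.path"
    if spec.origin is None or spec.origin == "namespace":
        # Namespace packages (directories without __init__.py) have no origin file:
        # origin is None on Python 3.7+, the string "namespace" on older versions.
        locations = list(spec.submodule_search_locations or [])
        return f"{name}: namespace package at {locations}"
    # Regular modules/packages report the file they would be loaded from.
    return f"{name}: {spec.origin}"


if __name__ == "__main__":
    print(describe_module("dataloader"))
```

If this points at the pip-installed dataloader 2.0 package, keep in mind that under PEP 420 a directory without an __init__.py is only a namespace-package candidate, and a regular package found anywhere on sys.path takes precedence over it. So if a local dataloader directory exists without an __init__.py, adding an empty dataloader/__init__.py is one way to make the project's package win over a third-party one of the same name.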


xiaowu0162 · 2024-05-31T08:29:14Z

Hi,

Thank you for raising the issue. We missed a file when preparing the code. I will submit a PR to fix that. In the meantime, you can create a file dataloader/bigquery_pypi.py with the following content:

import torch
from torch.utils.data import Dataset


class LLMDataset(Dataset):
    """Wraps a list of pre-chunked examples, each a dict with a 'token_ids' sequence."""

    def __init__(self, data):
        super().__init__()
        self.data = data

    def __len__(self):
        return len(self.data)

    def __getitem__(self, ind):
        # indexing the chunked data directly
        source_tokens = torch.tensor(self.data[ind]['token_ids'])
        return {"input_ids": source_tokens}
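To see how this stop-gap class behaves when batched, here is a minimal sketch that wraps it in a standard torch.utils.data.DataLoader. The toy_data list is hypothetical; it assumes every chunk has the same number of tokens, because the default collate function stacks the per-item "input_ids" tensors and would raise an error on ragged sequences.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class LLMDataset(Dataset):
    """Same as the stop-gap class in the reply: wraps a list of pre-chunked examples."""

    def __init__(self, data):
        super().__init__()
        self.data = data

    def __len__(self):
        return len(self.data)

    def __getitem__(self, ind):
        source_tokens = torch.tensor(self.data[ind]["token_ids"])
        return {"input_ids": source_tokens}


# Hypothetical toy data: three chunks of equal length (4 tokens each).
toy_data = [
    {"token_ids": [1, 2, 3, 4]},
    {"token_ids": [5, 6, 7, 8]},
    {"token_ids": [9, 10, 11, 12]},
]

# The default collate_fn turns a list of {"input_ids": tensor} dicts into
# one dict whose "input_ids" entry is the stacked batch tensor.
loader = DataLoader(LLMDataset(toy_data), batch_size=2, shuffle=False)
shapes = [tuple(batch["input_ids"].shape) for batch in loader]
print(shapes)  # [(2, 4), (1, 4)]
```

The last batch is smaller because three items do not divide evenly into batches of two; passing drop_last=True to DataLoader would discard it instead.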

xiaowu0162 mentioned this issue May 31, 2024

Add missing file #3

Merged
