Hello, I am stuck at Step 2 (running training). Could you please tell me whether the import "from dataloader.bigquery_pypi import LLMDataset" refers to a third-party dataloader library or to an implementation specific to this project? My locally installed dataloader is version 2.0, and the import fails with it. There is also no dataloader file in the project, so pl_data cannot be imported or run. Am I missing the implementation of LLMDataset?
Looking forward to a response from the author. Thanks.
Thank you for raising the issue. We missed a file when preparing the code. I will submit a PR to fix that. For now, you can create the file dataloader/bigquery_pypi.py with the following content:
import torch  # needed for torch.tensor below
from torch.utils.data import Dataset


class LLMDataset(Dataset):
    """Wraps pre-chunked token data so it can be fed to a DataLoader."""

    def __init__(self, data):
        super().__init__()
        self.data = data

    def __len__(self):
        return len(self.data)

    def __getitem__(self, ind):
        # Index the chunked data directly; each item holds a 'token_ids' list.
        source_tokens = torch.tensor(self.data[ind]['token_ids'])
        return {"input_ids": source_tokens}
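Since a pip-installed package that is also called dataloader (such as the 2.0 version mentioned in the question) might take precedence over the project's own dataloader/ directory, it may help to check which location Python actually resolves the name to. The snippet below is only a diagnostic sketch: the helper name locate_module is hypothetical, and it assumes you run it from the project root after creating the file above.

```python
import importlib.util


def locate_module(name):
    """Return where `import name` would load from, or None if it is not found.

    `locate_module` is a hypothetical helper for this thread, not project code.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    if spec.origin is not None:
        # Regular module or package: path to the .py file or __init__.py.
        return spec.origin
    # Namespace package (a directory without __init__.py): list its folders.
    return list(spec.submodule_search_locations or [])


if __name__ == "__main__":
    # If this prints a site-packages path, the third-party package is
    # shadowing the project directory.
    print(locate_module("dataloader"))
```

If the printed path points into site-packages rather than into the project, uninstalling the third-party dataloader package or adjusting the import path is one possible way to let the project's dataloader/bigquery_pypi.py be found.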