irc_disentangle - Issue with splitting data #6906
Comments
Thank you, I will try this out!
On Tue, Jun 11, 2024 at 3:55 AM Vincent Lau wrote:
I added `streaming=True` after the name of the dataset, and it works... hope it can help you.
Also, if you install datasets==2.15.0, this bug does not happen.
I don't know why, but both of these work.
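A minimal sketch of the streaming workaround from the quoted reply, assuming an installed `datasets` version whose `load_dataset` accepts `streaming=True`. The helper name `load_irc_disentangle_streaming` is hypothetical. Streaming skips building the local Arrow cache, which is consistent with (though not proof of) the cache-building step being what fails here.

```python
def load_irc_disentangle_streaming(name: str = "irc_disentangle"):
    """Load the dataset in streaming mode, the workaround suggested in the thread.

    Hypothetical helper. Requires the `datasets` package and network access to
    the Hugging Face Hub. Returns an IterableDatasetDict whose splits are
    iterated over, not indexed.
    """
    # Imported lazily so that defining this helper needs no third-party packages.
    import datasets

    # streaming=True yields examples on the fly instead of first writing a
    # local *.arrow cache, the step the thread suspects is failing.
    return datasets.load_dataset(name, streaming=True)
```

Usage would be along the lines of `ds = load_irc_disentangle_streaming()` followed by `next(iter(ds["train"]))` to read the first example. The other workaround mentioned, pinning the library, would be `pip install datasets==2.15.0`; a later comment in this thread reports remaining problems with 2.15.0, so treat either option as a stopgap.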
I still found some strange bugs in v2.15.0 of datasets. It seems that the `*.arrow` file cannot be created; it may be an index of the subsets. I'm still trying to debug it. But one of the most efficient workarounds may be to use Google Colab to build this index in `~/huggingface/datasets`, and then download the files to replace the local ones... lol... it works!
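The Colab workaround amounts to copying an already-prepared cache folder over the local one. A minimal sketch, assuming the Colab cache folder has already been downloaded and unpacked locally; `install_prebuilt_cache` and `default_cache_dir` are hypothetical helper names. Rather than hard-coding a path, the destination can be taken from `datasets.config.HF_DATASETS_CACHE`, which holds the cache path the installed library actually uses.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def install_prebuilt_cache(src: Path, dest: Path) -> list[Path]:
    """Copy a prebuilt datasets cache folder (e.g. downloaded from Colab) into `dest`.

    Hypothetical helper. Files with the same relative path are overwritten;
    anything else already in `dest` is left alone. Returns the *.arrow files
    found under `dest` afterwards.
    """
    src, dest = Path(src), Path(dest)
    if not src.is_dir():
        raise FileNotFoundError(f"prebuilt cache folder not found: {src}")
    # dirs_exist_ok=True (Python 3.8+) merges into an existing cache directory.
    shutil.copytree(src, dest, dirs_exist_ok=True)
    return sorted(dest.rglob("*.arrow"))


def default_cache_dir() -> Path:
    """Return the cache directory the installed `datasets` library uses."""
    import datasets.config  # lazy import: only needed when this helper is called

    return Path(datasets.config.HF_DATASETS_CACHE)
```

This is only likely to work when both machines run compatible `datasets` versions, since the cache layout may change between releases. Passing the copied folder to `load_dataset(..., cache_dir=...)` is an alternative to overwriting the default cache.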
Yeah, I did try what you suggested and it didn't work. I was able to get a local copy from someone who had accessed the dataset in the past. Let me know when you end up fixing this bug.
Could you please provide more information, as required by the bug template: https://github.com/huggingface/datasets/issues/new?assignees=&labels=&projects=&template=bug-report.yml
Without all that information, it is very difficult for us to understand the underlying issue and to give a pertinent answer. Which versions of the libraries are you using (datasets, pyarrow, fsspec, ...)?
What output do you get after executing these lines?
import datasets
ds = datasets.load_dataset('irc_disentangle')
ds
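The versions requested above can be collected with the standard library alone. A small sketch; `environment_report` is a hypothetical helper name, and including `huggingface_hub` goes beyond the packages the comment lists.

```python
import platform
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable

# Packages named in the request, plus huggingface_hub (an addition).
PACKAGES = ("datasets", "pyarrow", "fsspec", "huggingface_hub")


def environment_report(packages: Iterable[str] = PACKAGES) -> Dict[str, str]:
    """Map the Python version and each package name to its installed version."""
    report = {"python": platform.python_version()}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    # Print one "name: version" line per entry, ready to paste into the issue.
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```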
Describe the bug
I am trying to access your dataset from Python using datasets.load_dataset("irc_disentangle"), and I am getting this error message:
ValueError: Instruction "train" corresponds to no data!
Steps to reproduce the bug
import datasets
ds = datasets.load_dataset('irc_disentangle')
ds
Expected behavior
The data should load into ds and be accessible like this:
ds['train'][1050], ds['train'][1055]
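If the dataset is loaded with `streaming=True` (the workaround suggested in the comments above), the splits are iterable rather than indexable, so `ds['train'][1050]` will not work as written. A minimal sketch that fetches an element by position from any iterable using `itertools.islice`; `nth_example` and `load_examples` are hypothetical helper names, and `load_examples` needs the `datasets` package plus network access to the Hub.

```python
from itertools import islice
from typing import Any, Iterable, List, Sequence


def nth_example(stream: Iterable[Any], index: int) -> Any:
    """Return the element at position `index` of an iterable.

    Works on a streaming split, which cannot be indexed with ds["train"][i],
    by walking the stream from the start and stopping at `index`.
    """
    if index < 0:
        raise IndexError("negative indices are not supported for streams")
    try:
        return next(islice(stream, index, None))
    except StopIteration:
        raise IndexError(f"stream has fewer than {index + 1} elements") from None


def load_examples(indices: Sequence[int] = (1050, 1055)) -> List[Any]:
    """Hypothetical helper: fetch the examples from the issue via streaming."""
    import datasets  # lazy import; requires network access to the Hub

    ds = datasets.load_dataset("irc_disentangle", streaming=True)
    return [nth_example(ds["train"], i) for i in indices]
```

Each call walks the stream from the beginning, so fetching index 1050 reads the first 1051 examples; for many lookups, a regular (non-streaming) load is cheaper once it works.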
Environment info
I tried Python 3.12 and 3.10.