cache directory bug #7

Open
dvolgyes opened this issue Nov 4, 2022 · 1 comment

dvolgyes commented Nov 4, 2022

User home directories are often not a good place to store data.
The Hugging Face datasets interface lets you choose the cache location, for example:

import datasets
datasets.load_dataset("sjyhne/mapai_training_data", cache_dir="/mnt/experiment-3/huggingface")

However, there seems to be a bug in mapai_training_data.py: it cannot properly handle a non-default cache directory.
I would suggest adding an optional cache_dir argument to the create_dataset function that is passed on
to load_dataset, and also fixing the bug in the Hugging Face dataset itself.
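A minimal sketch of the suggested change, assuming create_dataset is a thin wrapper around load_dataset. The real signature of create_dataset is not shown in this issue, so the parameters below are hypothetical; the loader argument in particular exists only so the forwarding logic can be checked without downloading anything.

```python
from typing import Any, Callable, Optional

# Dataset identifier taken from the snippet above.
DATASET_NAME = "sjyhne/mapai_training_data"


def create_dataset(
    cache_dir: Optional[str] = None,
    loader: Optional[Callable[..., Any]] = None,  # hypothetical hook, for testing only
    **kwargs: Any,
) -> Any:
    """Load the dataset, forwarding an optional cache_dir to load_dataset.

    Assumption: the real create_dataset may take other arguments; only the
    cache_dir propagation is illustrated here.
    """
    if loader is None:
        # Imported lazily so the sketch can be imported without `datasets` installed.
        from datasets import load_dataset

        loader = load_dataset

    if cache_dir is not None:
        # Only override the library default when the caller asked for it.
        kwargs["cache_dir"] = cache_dir

    return loader(DATASET_NAME, **kwargs)
```

With this shape, create_dataset() keeps the current default behaviour, while create_dataset(cache_dir="/mnt/experiment-3/huggingface") would send the cache to the requested location once the dataset script itself handles it correctly.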

It can be worked around, but in the long term, especially after the competition, it would be nice to have a conformant dataset
that can easily be used in the future.

Reproduction:

  • run the two lines above (load_dataset with a non-default cache_dir)

Possible cause:

  • very likely, in the _split_generators function the current working directory is not the one that is assumed,
    so the os.makedirs calls refer to the wrong location.
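The suspected failure mode can be illustrated with the standard library alone. This is only a sketch of the general problem, not the actual code in mapai_training_data.py: make_output_dir and the directory names are hypothetical. A relative path given to os.makedirs is resolved against the current working directory, so it ends up outside the cache unless it is joined to an explicit base directory first.

```python
import os
import tempfile


def make_output_dir(base_dir: str, relative_name: str) -> str:
    """Create relative_name under base_dir, independent of the working directory."""
    target = os.path.join(os.path.abspath(base_dir), relative_name)
    os.makedirs(target, exist_ok=True)
    return target


def demo() -> tuple:
    """Return (relative_landed_in_cwd, relative_landed_in_cache, anchored_landed_in_cache)."""
    original_cwd = os.getcwd()
    subdir = os.path.join("data", "train")
    with tempfile.TemporaryDirectory() as cache_dir, tempfile.TemporaryDirectory() as other_cwd:
        try:
            # Simulate a process whose working directory is not the cache directory.
            os.chdir(other_cwd)

            # Relative path: resolved against the working directory, not the cache.
            os.makedirs(subdir, exist_ok=True)
            relative_in_cwd = os.path.isdir(os.path.join(other_cwd, subdir))
            relative_in_cache = os.path.isdir(os.path.join(cache_dir, subdir))

            # Anchored path: always created under the intended base directory.
            make_output_dir(cache_dir, subdir)
            anchored_in_cache = os.path.isdir(os.path.join(cache_dir, subdir))
        finally:
            # Restore the working directory before the temporary directories are removed.
            os.chdir(original_cwd)
    return relative_in_cwd, relative_in_cache, anchored_in_cache
```

If the dataset script does something similar with relative paths, building every path from the directory it was actually given would make it independent of where the process happens to run.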

Sjyhne commented Nov 7, 2022

Yeah, I 100% agree with you. This will be fixed in the final version of the dataset released after the competition. 😄
