error when loading dataset with date format "yyyy" #32

Closed
kbrueckmann opened this issue Oct 15, 2024 · 8 comments

Comments

@kbrueckmann

I'm updating files in a dataset without touching anything else. The dataset has a set "time period" in its metadata with these values:

Start Date: 1594
End Date: 1636

When loading the dataset they apparently lead to a ValidationError (I assume because only a year is given):

      File "venv/lib/python3.12/site-packages/easyDataverse/dataverse.py", line 315, in load_dataset
        self._construct_block_classes(blocks, dataset)
      File "venv/lib/python3.12/site-packages/easyDataverse/dataverse.py", line 416, in _construct_block_classes
        dataset.metadatablocks[name] = metadatablock.__class__.model_validate(
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "venv/lib/python3.12/site-packages/pydantic/main.py", line 596, in model_validate
        return cls.__pydantic_validator__.validate_python(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    pydantic_core._pydantic_core.ValidationError: 2 validation errors for Citation
    time_period_covered.0.start
      Datetimes provided to dates should have zero time - e.g. be exact dates [type=date_from_datetime_inexact, input_value='1594', input_type=str]
        For further information visit https://errors.pydantic.dev/2.9/v/date_from_datetime_inexact
    time_period_covered.0.end
      Datetimes provided to dates should have zero time - e.g. be exact dates [type=date_from_datetime_inexact, input_value='1636', input_type=str]
        For further information visit https://errors.pydantic.dev/2.9/v/date_from_datetime_inexact

Is there any way to change that behavior?
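
A note on the error text, as far as I can tell from the message: the year strings are not rejected for being unparseable. The error type `date_from_datetime_inexact` suggests that `'1594'` was first read as a datetime, most plausibly as a Unix timestamp in seconds, and then refused as a date because its time part is not midnight. The sketch below reproduces only that arithmetic with the standard library. `as_unix_timestamp` is a hypothetical helper, and this is my interpretation of the message, not pydantic's actual code.

```python
from datetime import datetime, time, timezone


def as_unix_timestamp(value: str) -> datetime:
    """Interpret a numeric string as seconds since 1970-01-01 UTC (assumed reading)."""
    return datetime.fromtimestamp(int(value), tz=timezone.utc)


for raw in ("1594", "1636"):
    moment = as_unix_timestamp(raw)
    # A datetime only maps cleanly onto a date when its time part is exactly midnight.
    is_exact_date = moment.time() == time(0, 0)
    print(raw, "->", moment.isoformat(), "| exact date:", is_exact_date)
```

Under that reading, both values land a few minutes after midnight on 1970-01-01, which would explain why the message talks about "zero time" rather than about an unknown format.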

@kbrueckmann
Author

Forgot to mention: I cannot change the date to something like "01.01.1594" because Dataverse won't accept that in that field. Otherwise I get this error message: Time Period Start Date is not a valid date. "yyyy" is a supported format.

@pdurbin
Member

pdurbin commented Oct 15, 2024

Interesting. Indeed I can enter these values fine at https://demo.dataverse.org/dataset.xhtml?persistentId=doi:10.70122/FK2/RN55IT

Screenshot 2024-10-15 at 4 25 53 PM

@kbrueckmann
Author

Yes, entering them is no problem. What happens if you now try to load this dataset via the load_dataset() function?

@JR-1991
Member

JR-1991 commented Oct 16, 2024

@kbrueckmann, thank you for bringing up this issue! It’s a known limitation of Python’s date type (from the datetime module) when used with pydantic: it requires a full date and doesn’t support year-only entries.

There’s an open PR (#27) that resolves this by reverting to a str input. Due to the variety of date formats in Dataverse, using the date module has become impractical. I’ll be reviewing and merging the open PRs over the next two weeks for the upcoming release, which will include all the new features as well as fixes.
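
To illustrate why a plain string is the more forgiving choice, here is a minimal sketch using only the standard library. It is not the code from the PR. `is_supported_date` and `parses_as_full_date` are hypothetical helpers, and the accepted patterns (`yyyy`, `yyyy-MM`, `yyyy-MM-dd`) are an assumption; only `yyyy` is confirmed by the Dataverse error message quoted above.

```python
import re
from datetime import date

# Assumed set of accepted shapes: yyyy, yyyy-MM, yyyy-MM-dd
_DATE_PATTERN = re.compile(r"[0-9]{4}(-[0-9]{2}(-[0-9]{2})?)?")


def is_supported_date(value: str) -> bool:
    """Check the string shape only, so reduced-precision dates are allowed."""
    return _DATE_PATTERN.fullmatch(value) is not None


def parses_as_full_date(value: str) -> bool:
    """Return True only when the standard library can build a complete date."""
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True


for candidate in ("1594", "1594-01-01", "01.01.1594"):
    print(candidate, is_supported_date(candidate), parses_as_full_date(candidate))
```

The year-only value passes the string check but cannot become a `date`, which is the gap a `str` field avoids.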

@JR-1991
Member

JR-1991 commented Oct 16, 2024

I have merged the PR and the fix is now available on the main branch. You can install the updated version now with the following command:

    pip install git+https://github.com/gdcc/easyDataverse.git

Here is a colab notebook that uses the current version and assigns the time period via strings. Loading the dataset now also works:

image

@kbrueckmann
Author

Thanks for your quick replies and help, @pdurbin and @JR-1991! I just tested the fix (after the pip install, of course), but I'm still having difficulties. The rich.print() in my code below is never reached, because loading the dataset fails with the same ValidationError as before. I think the difference from the shared colab might be that I'm not setting the time period values but just loading a dataset in which they were previously entered via the GUI (or the update to the fix somehow didn't work, but I got no error messages indicating that).

Here is what I'm doing:

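    import rich
    from easyDataverse import Dataverse

    # api_token and pid are defined elsewhere (see the values below)
    dataverse = Dataverse(
        server_url="https://heidata.uni-heidelberg.de/",
        api_token=api_token,
    )

    # Load only the metadata; skip downloading the files
    dataset = dataverse.load_dataset(
        pid=pid,
        download_files=False,
    )

    rich.print(dataset.citation)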

The pid is the string "https://doi.org/10.11588/data/DVU14P". I can't share my API token, but at least for fetching data this one should work: 637c97c7-042e-4f00-b597-3736f07fe8a4 .

@JR-1991
Member

JR-1991 commented Oct 18, 2024

@kbrueckmann thanks for sharing! I have tested your case and the issue stems from the pid format. Dataverse expects the DOI in the form shown on your dataset page rather than as a link. You can find it within the Citation metadata block:

image

When using doi:10.11588/data/DVU14P the code does not fail anymore and the dataset is printed as expected. I have also tested it with your API Token and it worked as well. I would suggest recreating your token to prevent any malicious use.
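
If the identifier usually arrives as a link, a small conversion step can help. The following is only a sketch: `to_dataverse_pid` is a hypothetical helper, not part of easyDataverse, and it assumes the link points at `doi.org` or `dx.doi.org`.

```python
from urllib.parse import urlparse


def to_dataverse_pid(value: str) -> str:
    """Turn a DOI link such as https://doi.org/10.x/y into the doi:10.x/y form."""
    value = value.strip()
    if value.lower().startswith("doi:"):
        return value  # already in the expected form
    parsed = urlparse(value)
    if parsed.netloc.lower() in {"doi.org", "dx.doi.org"} and parsed.path.strip("/"):
        return "doi:" + parsed.path.lstrip("/")
    raise ValueError(f"Not a recognizable DOI link: {value!r}")


pid = to_dataverse_pid("https://doi.org/10.11588/data/DVU14P")
print(pid)
```

The result can then be passed as the `pid` argument of `load_dataset()`.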

image

Hope that helped. Please let me know if there are any other issues, happy to help 🙌

@kbrueckmann
Author

After changing the pid to the required format, I still had the same problem. To make sure it wasn't caused by a problem with the update, I set up a new venv and did a fresh install of the necessary packages, and now it's working perfectly. Thank you so much, @JR-1991!
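
For anyone in the same situation, one way to check which version the active environment actually sees, before rebuilding the venv, is to ask the standard library. This is a rough sketch: `installed_version` is a hypothetical helper, and a pip install from git may report the same version string as the last release, so treat the output as a hint only.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints None when the package is not installed in the active environment
print(installed_version("easyDataverse"))
print(installed_version("pydantic"))
```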
