
add disclaimer when downloading a dataset #1075


Open
1 of 8 tasks
SiQube opened this issue Apr 1, 2025 · 3 comments
Assignees
Labels
enhancement New feature or request essential important

Comments

@SiQube
Member

SiQube commented Apr 1, 2025

Description of the problem

Currently, when a user uses a public pymovements dataset, we assume that they will do the proper research themselves to attribute/cite the dataset correctly.

A disclaimer should be integrated.

Description of a solution

  • integrate a general disclaimer about dataset attribution if the dataset is used in their project.
  • add citation key(s) from pymovements bibliography to each dataset
  • add a dataset description property (similar to the docstring)

Minimum acceptance criteria

  • disclaimer automatically appears when downloading a dataset
>>> import pymovements as pm
>>> pm.Dataset('BSC', 'data').download()
You are downloading the BSC dataset. Please be aware that pymovements does not
host or distribute any dataset resources and only provides a convenient interface to
download the public dataset resources that were published by their respective authors.
  • property showing general dataset information
>>> import pymovements as pm
>>> pm.Dataset('BSC', 'data').description
Some general description of the BSC dataset.
  • add citation_keys property
>>> import pymovements as pm
>>> pm.Dataset('BSC', 'data').citation_keys
BSC
  • add test checking that citation keys are not the empty string for public datasets
  • add citation (key) to disclaimer
>>> import pymovements as pm
>>> pm.Dataset('BSC', 'data').download()
You are downloading the BSC dataset. Please be aware that pymovements does not
host or distribute any dataset resources and only provides a convenient interface to
download the public dataset resources that were published by their respective authors.

You can look up the citation in our bibliography using ...

Sample Code

To implement the disclaimer, add to dataset.py:

def disclaimer(self) -> str:
    _disclaimer_text = f"""\
You are downloading the {self.name} dataset. Please be aware that pymovements does not
host or distribute any dataset resources and only provides a convenient interface to
download the public dataset resources that were published by their respective authors.
"""
    return _disclaimer_text

add to download (the returned text has to be printed, since calling the method alone discards the string):

print(self.disclaimer())

add to dataset_definition.py

citation_keys: Sequence[str] = ''

This can also be a list or tuple, in case one dataset spans multiple citations, without changing the type annotation, since str, list, and tuple all satisfy Sequence[str]:

# a tuple is the safer default: if this is a dataclass field, a list default raises ValueError
citation_keys: Sequence[str] = ('citation1', 'citation2', 'citation3')
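One thing to watch with a `Sequence[str]` annotation: a bare `str` is itself a sequence of one-character strings, so joining or iterating over it splits the key into characters. A minimal sketch of normalizing the value first, assuming a helper named `normalize_citation_keys` (hypothetical, not part of pymovements):

```python
from __future__ import annotations

from collections.abc import Sequence


def normalize_citation_keys(keys: str | Sequence[str]) -> tuple[str, ...]:
    """Return citation keys as a tuple, treating a bare string as a single key.

    Hypothetical helper for illustration; not part of pymovements.
    """
    if isinstance(keys, str):
        # An empty string means "no citation key"; otherwise it is one key.
        return (keys,) if keys else ()
    return tuple(keys)


# Joining a bare string iterates over its characters:
print(' '.join('BSC'))                           # B S C
# Normalizing first keeps the key intact:
print(' '.join(normalize_citation_keys('BSC')))  # BSC
print(normalize_citation_keys(['citation1', 'citation2']))
```

With this in place, downstream code can always assume a tuple of whole keys, regardless of which form a dataset definition uses.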

add an assert to tests/unit/datasets/datasets_test.py checking that citation_keys are not the empty string:

assert definition_from_library['citation_keys'] != ''
assert python_definition['citation_keys'] != ''
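Note that a `!= ''` comparison also passes for an empty list or for a list that contains an empty string. A stricter check could look roughly like the sketch below; `has_citation_keys` is a hypothetical helper name, and how the test accesses the definitions is left as in the snippet above.

```python
from __future__ import annotations

from collections.abc import Sequence


def has_citation_keys(keys: str | Sequence[str]) -> bool:
    """Return True if there is at least one key and no key is blank.

    Hypothetical helper for illustration; not part of pymovements.
    """
    if isinstance(keys, str):
        # Treat a bare string as a single key.
        keys = [keys]
    return len(keys) > 0 and all(key.strip() for key in keys)


assert has_citation_keys('BSC')
assert has_citation_keys(['citation1', 'citation2'])
assert not has_citation_keys('')                 # empty string
assert not has_citation_keys([])                 # empty list
assert not has_citation_keys(['citation1', ''])  # blank entry in a list
```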

extend disclaimer to include citation

_disclaimer_text = f"""\
...code from above

Check the pymovements bibliography for {' '.join(citation_keys)} for citation.
"""

@dkrako I've self-assigned this. Please edit this issue for anything you want done differently, then you can delete this tag and I'll implement it asap.

@SiQube added the enhancement (New feature or request) and essential (important) labels Apr 1, 2025
@SiQube self-assigned this Apr 1, 2025
@dkrako
Contributor

dkrako commented Apr 1, 2025

I guess this is on the one hand a bit too complicated, and on the other hand too simple, as the disclaimer doesn't provide the citation and just points to our bibliography.

Also, the citation key is not useful for a user, as it is not shown in the bibliography: https://pymovements.readthedocs.io/en/stable/bibliography.html (just like BibTeX item keys are never shown in a rendered document).

For the disclaimer we need the fields name and citation (which should be in a human-readable citation format).

Also I wouldn't add anything to Dataset except for Dataset.download(disclaimer: bool = True).
There's not much use in explicitly calling Dataset.disclaimer(), because the download either has not happened, or already has finished, so then the disclaimer doesn't make sense anymore.
Instead, I would like to keep all logic in dataset/dataset_download.py.
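A rough sketch of the shape described here, under stated assumptions: the function names `format_disclaimer` and `download_dataset`, the `fetch` callable that stands in for the real download logic, and the exact wording of the citation sentence are all hypothetical and not taken from pymovements. Only the `disclaimer: bool = True` flag and the `name` and `citation` fields come from the comment above.

```python
from __future__ import annotations

from collections.abc import Callable


def format_disclaimer(name: str, citation: str) -> str:
    """Format the disclaimer from the dataset name and a human-readable citation."""
    return (
        f'You are downloading the {name} dataset. Please be aware that pymovements does not\n'
        'host or distribute any dataset resources and only provides a convenient interface to\n'
        'download the public dataset resources that were published by their respective authors.\n'
        '\n'
        # Assumed wording for the citation hint.
        f'If you use this dataset in your work, please cite:\n{citation}\n'
    )


def download_dataset(
    name: str,
    citation: str,
    fetch: Callable[[], None],
    disclaimer: bool = True,
) -> None:
    """Show the disclaimer (unless opted out), then run the download.

    `fetch` is a stand-in for the actual download step.
    """
    if disclaimer:
        # Print before downloading, so the text is seen while it is still relevant.
        print(format_disclaimer(name, citation))
    fetch()


# Usage with a stubbed download step and a placeholder citation string:
download_dataset('BSC', '<human-readable citation>', fetch=lambda: None)
download_dataset('BSC', '<human-readable citation>', fetch=lambda: None, disclaimer=False)
```

Keeping the text in a module-level function means `Dataset.download()` would only need to forward the flag, and no public `Dataset.disclaimer()` method is required.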

This issue depends on #1057 (or at least the citation field), so don't worry about this issue here for now, and finish #1057 (or create a new PR just for the citation field).

@SiQube
Member Author

SiQube commented Apr 1, 2025

should the disclaimer be optional?

@dkrako
Contributor

dkrako commented Apr 1, 2025

Honestly, I'm perfectly fine with having to explicitly opt out (most users won't even notice that this is possible), but I also wouldn't mind much not having the option.

2 participants