Dataset info #1057
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

@@            Coverage Diff            @@
##              main     #1057   +/-   ##
=========================================
  Coverage   100.00%   100.00%
=========================================
  Files           80        84    +4
  Lines         3602      3679   +77
  Branches       646       646
=========================================
+ Hits          3602      3679   +77
def info(self) -> None:
    """The information about the dataset.

    Print dataset information and citation key.
The property should return a string instead of printing it. A user can easily call print(dataset.info) if necessary.
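A minimal sketch of the suggestion above, assuming a stand-in class: `DatasetSketch` and its `description` field are hypothetical and not the project's actual API. The property only builds and returns the string, so printing is left to the caller.

```python
from dataclasses import dataclass


@dataclass
class DatasetSketch:
    """Hypothetical stand-in for a dataset class (assumption, not the real API)."""

    name: str = ''
    description: str = ''

    @property
    def info(self) -> str:
        """Return the dataset information as a string instead of printing it."""
        return f'{self.name}\n{self.description}'


# Placeholder description text, used only for illustration.
dataset = DatasetSketch(name='BSCII', description='Monocular eye tracking data.')
text = dataset.info  # no side effects; the caller decides what to do with it
print(text)          # printing stays an explicit choice of the user
```

Returning the string also makes the property straightforward to test and to reuse, for example in generated documentation.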
Great! Let's discuss this PR tomorrow. I guess some of the formatting should be edited into a more human-readable format. Currently the publications are cited in two non-human-readable formats: first as the sphinx-citation and then a bibtex citation. Furthermore, I would like to split the
This way we can simply store the main part of the disclaimer somewhere else and just fill in the name and the citation of the dataset. I would be in favor of a human-readable citation format instead of bibtex, because something like this is just hell to parse visually:
Moreover, not everyone uses bibtex and we don't want to force our citation preferences on users. If you want to keep the bibtex in some way, then please add a Of course we could additionally link to our documentation page in the disclaimer, but I don't know if this is necessary. Planning further ahead, the
Maybe instead of We already have the I like |
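The idea raised above of storing the main part of the disclaimer in one place and only filling in the dataset name and citation could be sketched roughly as follows. The template wording, the `DISCLAIMER` constant, and `render_disclaimer` are all assumptions made for illustration; the citation string is a placeholder, not a real reference.

```python
from string import Template

# Hypothetical shared disclaimer; the wording is an assumption, not the project's text.
DISCLAIMER = Template(
    'You are using the $name dataset.\n'
    'If you use it in your work, please cite:\n'
    '$citation'
)


def render_disclaimer(name: str, citation: str) -> str:
    """Fill the shared template with one dataset's name and citation."""
    return DISCLAIMER.substitute(name=name, citation=citation)


# Placeholder citation text, not a real reference.
message = render_disclaimer('BSCII', '<human-readable citation goes here>')
print(message)
```

With this layout, each dataset definition would only need to carry its name and a human-readable citation, while the surrounding text lives in a single template.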
@@ -47,6 +47,9 @@ class DatasetDefinition:
    ----------
    name: str
        The name of the dataset. (default: '.')
    info: str
        Information about the dataset including but not limited to original citation,
        general information. (default: '.')
The default is an empty string, isn't it?
@@ -105,6 +109,31 @@ class BSCII(DatasetDefinition):

    name: str = 'BSCII'

    info: str = """\
BSCII dataset :cite:p:`BSCII`.
I would be in favor of removing the first line from all of the description strings, as the name of the dataset is already known to the user and the Sphinx cite directive is not very useful when calling the property.
Moreover, if we use the string as a basis for autogenerating dataset doc pages, the first line can easily be recreated by something like f'{dataset.name} dataset :cite:p:`{_get_bibtex_id(dataset.bibtex)}`' (within the autogenerator script, and not included in the description string or any definition file).
Nevertheless, one thing that we could add to the description is the verbose name of the dataset.
For example instead of writing:
This dataset includes monocular eye tracking data from several ...
It would be nicer to write:
The Beijing Sentence Corpus II (BSCII) includes monocular eye tracking data from several ...
regarding the |
Add information about the dataset as a property.
Additionally, resolves #987.
A final version of the info property could be integrated into the download process, to make the user aware of the underlying information and citation.