
Dataset info #1057

Draft: wants to merge 2 commits into base: main
8 changes: 8 additions & 0 deletions src/pymovements/dataset/dataset.py
@@ -1057,6 +1057,14 @@ def extract(
)
return self

@property
def info(self) -> None:
"""The information about the dataset.

Print dataset information and citation key.
Review comment (Contributor): the property should return a string instead of printing it. A user can easily call print(dataset.info) if necessary.

"""
print(self.definition.info)

@property
def path(self) -> Path:
"""The path to the dataset directory.
6 changes: 6 additions & 0 deletions src/pymovements/dataset/dataset_definition.py
@@ -47,6 +47,9 @@ class DatasetDefinition:
----------
name: str
The name of the dataset. (default: '.')
info: str
Information about the dataset, including but not limited to the original citation and
general information. (default: '.')
Review comment (Contributor): the default is an empty string, isn't it?

has_files: dict[str, bool]
Indicate whether the dataset contains 'gaze', 'precomputed_events', and
'precomputed_reading_measures'.
@@ -142,6 +145,9 @@ class DatasetDefinition:

# pylint: disable=too-many-instance-attributes
name: str = '.'

info: str = ''

has_files: dict[str, bool] = field(default_factory=dict)

mirrors: dict[str, list[str]] | dict[str, tuple[str, ...]] = field(default_factory=dict)
34 changes: 31 additions & 3 deletions src/pymovements/datasets/bsc.py
@@ -31,7 +31,7 @@
class BSC(DatasetDefinition):
"""BSC dataset :cite:p:`BSC`.

-This dataset includes monocular eye tracking data from a single participant in a single
+This dataset includes monocular eye tracking data from several participants in a single
session. Eye movements are recorded at a sampling frequency of 1,000 Hz using an EyeLink 1000
eye tracker, and precomputed events are reported at the AOI level.

@@ -44,6 +44,10 @@ class BSC(DatasetDefinition):
name: str
The name of the dataset.

info: str
Information about the dataset, including but not limited to the original citation and
general information.

has_files: dict[str, bool]
Indicate whether the dataset contains 'gaze', 'precomputed_events', and
'precomputed_reading_measures'.
@@ -84,11 +88,11 @@ class BSC(DatasetDefinition):
Examples
--------
Initialize your :py:class:`~pymovements.dataset.Dataset` object with the
-:py:class:`~pymovements.datasets.SBSAT` definition:
+:py:class:`~pymovements.datasets.BSC` definition:

>>> import pymovements as pm
>>>
->>> dataset = pm.Dataset("SBSAT", path='data/SBSAT')
+>>> dataset = pm.Dataset("BSC", path='data/BSC')

Download the dataset resources:

@@ -104,6 +108,30 @@ class BSC(DatasetDefinition):

name: str = 'BSC'

info: str = """\
BSC dataset :cite:p:`BSC`.

This dataset includes monocular eye tracking data from several participants in a single
session. Eye movements are recorded at a sampling frequency of 1,000 Hz using an EyeLink 1000
eye tracker, and precomputed events are reported at the AOI level.

Participants are instructed to read texts and answer questions.

Check the respective paper for details :cite:p:`BSC`.

If you use the dataset, please cite:

@article{BSC,
author={Pan, Jinger and Yan, Ming and Richter, Eike M. and Shu, Hua and Kliegl, Reinhold},
title={The {B}eijing {S}entence {C}orpus: A {C}hinese sentence corpus
with eye movement data and predictability norms},
journal={Behavior Research Methods},
year={2022},
volume={54},
issue={4},
}
"""

has_files: dict[str, bool] = field(
default_factory=lambda: {
'gaze': False,
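The `info` value opens with a backslash directly after the triple quote (`"""\`). In a Python string literal, a backslash at the end of a line is a line continuation, so the string starts with the first line of text rather than with a newline. A minimal sketch with a shortened, illustrative text:

```python
with_backslash = """\
BSC dataset.
"""

without_backslash = """
BSC dataset.
"""

# The backslash-newline pair is dropped from the first literal.
print(repr(with_backslash))     # 'BSC dataset.\n'
print(repr(without_backslash))  # '\nBSC dataset.\n'
```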
23 changes: 23 additions & 0 deletions src/pymovements/datasets/bsc.yaml
@@ -1,5 +1,28 @@
name: BSC

info: |
BSC dataset :cite:p:`BSC`.

This dataset includes monocular eye tracking data from several participants in a single
session. Eye movements are recorded at a sampling frequency of 1,000 Hz using an EyeLink 1000
eye tracker, and precomputed events are reported at the AOI level.

Participants are instructed to read texts and answer questions.

Check the respective paper for details :cite:p:`BSC`.

If you use the dataset, please cite:

@article{BSC,
author={Pan, Jinger and Yan, Ming and Richter, Eike M. and Shu, Hua and Kliegl, Reinhold},
title={The {B}eijing {S}entence {C}orpus: A {C}hinese sentence corpus
with eye movement data and predictability norms},
journal={Behavior Research Methods},
year={2022},
volume={54},
issue={4},
}

has_files:
gaze: false
precomputed_events: true
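The `has_files` mapping is declared in `bsc.py` through `field(default_factory=lambda: {...})`, while `bsc.yaml` writes it as a plain nested mapping. A minimal sketch of why the Python side needs `default_factory`: each instance gets its own dict, and a plain mutable default is rejected by `dataclasses`. `_HasFilesMirror` is a hypothetical mirror that includes only the two keys visible in this diff.

```python
from dataclasses import dataclass, field


@dataclass
class _HasFilesMirror:
    """Hypothetical mirror of the has_files field, limited to the keys visible here."""

    has_files: dict[str, bool] = field(
        default_factory=lambda: {
            'gaze': False,
            'precomputed_events': True,
        },
    )


a = _HasFilesMirror()
b = _HasFilesMirror()
a.has_files['gaze'] = True  # mutate the first instance only
print(b.has_files)  # {'gaze': False, 'precomputed_events': True}

# A shared mutable default is rejected when the dataclass is created.
rejected = False
try:
    @dataclass
    class _SharedDefault:
        has_files: dict[str, bool] = {}
except ValueError:
    rejected = True
print(rejected)  # True
```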
31 changes: 30 additions & 1 deletion src/pymovements/datasets/bsc2.py
@@ -31,7 +31,7 @@
class BSCII(DatasetDefinition):
"""BSCII dataset :cite:p:`BSCII`.

-This dataset includes monocular eye tracking data from a single participant in a single
+This dataset includes monocular eye tracking data from several participants in a single
session. Eye movements are recorded at a sampling frequency of 1,000 Hz using an EyeLink 1000
eye tracker, and precomputed events are reported at the AOI level.

@@ -45,6 +45,10 @@ class BSCII(DatasetDefinition):
name: str
The name of the dataset.

info: str
Information about the dataset, including but not limited to the original citation and
general information.

has_files: dict[str, bool]
Indicate whether the dataset contains 'gaze', 'precomputed_events', and
'precomputed_reading_measures'.
@@ -105,6 +109,31 @@ class BSCII(DatasetDefinition):

name: str = 'BSCII'

info: str = """\
BSCII dataset :cite:p:`BSCII`.
Review comment (Contributor, dkrako, Mar 25, 2025):

I would be in favor of removing the first line from all of the description strings, as the name of the dataset is already known to the user and the Sphinx cite directive is not very useful when calling the property.

Moreover, if we use the string as a basis for autogenerating dataset docpages, the first line can easily be recreated by something like f'{dataset.name} dataset :cite:p:`{_get_bibtex_id(dataset.bibtex)}`' (within the autogenerator script, and not included in the description string or any definition file).

Nevertheless, one thing that we could add to the description is the verbose name of the dataset.
For example, instead of writing:

This dataset includes monocular eye tracking data from several ...

it would be nicer to write:

The Beijing Sentence Corpus II (BSCII) includes monocular eye tracking data from several ...

This dataset includes monocular eye tracking data from several participants in a single
session. Eye movements are recorded at a sampling frequency of 1,000 Hz using an EyeLink 1000
eye tracker, and precomputed events are reported at the AOI level.

Participants are instructed to read texts and answer questions. The original purpose was to
investigate processing differences between reading simplified and traditional Chinese.

Check the respective paper for details :cite:p:`BSCII`.

If you use the dataset, please cite:

@article{BSCII,
author={Yan, Ming and Pan, Jinger and Kliegl, Reinhold},
title={The {B}eijing {S}entence {C}orpus {II}: A cross-script comparison
between traditional and simplified Chinese sentence reading},
journal={Behavior Research Methods},
year={2025},
volume={57},
issue={2},
}
"""

has_files: dict[str, bool] = field(
default_factory=lambda: {
'gaze': False,
24 changes: 24 additions & 0 deletions src/pymovements/datasets/bsc2.yaml
@@ -1,5 +1,29 @@
name: "BSCII"

info: |
BSCII dataset :cite:p:`BSCII`.

This dataset includes monocular eye tracking data from several participants in a single
session. Eye movements are recorded at a sampling frequency of 1,000 Hz using an EyeLink 1000
eye tracker, and precomputed events are reported at the AOI level.

Participants are instructed to read texts and answer questions. The original purpose was to
investigate processing differences between reading simplified and traditional Chinese.

Check the respective paper for details :cite:p:`BSCII`.

If you use the dataset, please cite:

@article{BSCII,
author={Yan, Ming and Pan, Jinger and Kliegl, Reinhold},
title={The {B}eijing {S}entence {C}orpus {II}: A cross-script comparison
between traditional and simplified Chinese sentence reading},
journal={Behavior Research Methods},
year={2025},
volume={57},
issue={2},
}

has_files:
gaze: false
precomputed_events: true
34 changes: 34 additions & 0 deletions src/pymovements/datasets/codecomprehension.py
@@ -44,6 +44,10 @@ class CodeComprehension(DatasetDefinition):
name: str
The name of the dataset.

info: str
Information about the dataset, including but not limited to the original citation and
general information.

has_files: dict[str, bool]
Indicate whether the dataset contains 'gaze', 'precomputed_events', and
'precomputed_reading_measures'.
@@ -98,6 +102,36 @@ class CodeComprehension(DatasetDefinition):

name: str = 'CodeComprehension'

info: str = """\
CodeComprehension dataset :cite:p:`CodeComprehension`.

This dataset includes eye-tracking-while-code-reading data from participants in a single
session. Eye movements are recorded at a sampling frequency of 1,000 Hz using an
EyeLink 1000 eye tracker and are provided as pixel coordinates.

Each participant is instructed to read a code snippet and answer a code comprehension question.

If you use the dataset, please cite:

@article{CodeComprehension,
author = {Alakmeh, Tarek and Reich, David and J\\"{a}ger, Lena and Fritz, Thomas},
title = {Predicting Code Comprehension: A Novel Approach to
Align Human Gaze with Code using Deep Neural Networks},
year = {2024},
issue_date = {July 2024},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
volume = {1},
number = {FSE},
url = {https://doi.org/10.1145/3660795},
doi = {10.1145/3660795},
journal = {Proc. ACM Softw. Eng.},
month = {jul},
articleno = {88},
numpages = {23},
}
"""

has_files: dict[str, bool] = field(
default_factory=lambda: {
'gaze': False,
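The author field is escaped differently in `codecomprehension.py` (`J\\"{a}ger`) and `codecomprehension.yaml` (`J\"{a}ger`), yet both yield the same text. In a regular Python string, `\\` is the escape sequence for one backslash; in a YAML literal block scalar (`info: |`), backslashes are kept verbatim. Both therefore produce the BibTeX umlaut command `\"{a}`. A minimal check with the value shortened for illustration:

```python
# As written in codecomprehension.py: '\\' collapses to one backslash.
py_value = """J\\"{a}ger"""

# As written under `info: |` in codecomprehension.yaml, where backslashes
# are literal; a raw string reproduces that verbatim text in Python.
yaml_text = r'J\"{a}ger'

print(py_value)               # J\"{a}ger
print(py_value == yaml_text)  # True

# With a single backslash, Python reads \" as an escaped quote and drops the backslash.
print("""J\"{a}ger""")        # J"{a}ger
```

A raw triple-quoted string (`r"""..."""`) would be an alternative way to keep the Python text identical to the YAML text without doubling backslashes.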
29 changes: 29 additions & 0 deletions src/pymovements/datasets/codecomprehension.yaml
@@ -1,5 +1,34 @@
name: CodeComprehension

info: |
CodeComprehension dataset :cite:p:`CodeComprehension`.

This dataset includes eye-tracking-while-code-reading data from participants in a single
session. Eye movements are recorded at a sampling frequency of 1,000 Hz using an
EyeLink 1000 eye tracker and are provided as pixel coordinates.

Each participant is instructed to read a code snippet and answer a code comprehension question.

If you use the dataset, please cite:

@article{CodeComprehension,
author = {Alakmeh, Tarek and Reich, David and J\"{a}ger, Lena and Fritz, Thomas},
title = {Predicting Code Comprehension: A Novel Approach to
Align Human Gaze with Code using Deep Neural Networks},
year = {2024},
issue_date = {July 2024},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
volume = {1},
number = {FSE},
url = {https://doi.org/10.1145/3660795},
doi = {10.1145/3660795},
journal = {Proc. ACM Softw. Eng.},
month = {jul},
articleno = {88},
numpages = {23},
}

has_files:
gaze: false
precomputed_events: true