dataset: add MECO datasets #1054

SiQube · 2025-03-22T20:54:59Z

Add MECO datasets. Some precomputed_reading_measures are still missing, but we can add those later.

resolves #947

Type of change

New feature (non-breaking change which adds functionality)
New dataset

✨ Enhancements

add feature to read precomputed event files in R format (e.g. .rda)
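The R-file support can be pictured with a minimal sketch. It assumes pyreadr's read_r(), which returns a dict-like mapping from R object names to pandas DataFrames, and converts the selected object with polars' from_pandas(). The function name load_precomputed_table, the extension sets and the error messages are hypothetical, not the PR's actual code. Requiring an r_dataframe_key mirrors the 'joint.fix' key mentioned in the tests below, since one .rda file can hold several objects.

```python
from __future__ import annotations

from pathlib import Path
from typing import Any

# Hypothetical extension sets; the PR's actual r_extensions may differ.
CSV_EXTENSIONS = {'.csv'}
R_EXTENSIONS = {'.rda', '.rdata'}


def load_precomputed_table(data_path: Path, custom_read_kwargs: dict[str, Any] | None = None):
    """Load a precomputed event table from a CSV or R data file (sketch only)."""
    kwargs = dict(custom_read_kwargs or {})
    suffix = data_path.suffix.lower()

    if suffix in CSV_EXTENSIONS:
        import polars as pl  # imported lazily so the error paths below run without polars
        return pl.read_csv(data_path, **kwargs)

    if suffix in R_EXTENSIONS:
        # An .rda file can store several R objects, so the caller must name the one to load.
        key = kwargs.pop('r_dataframe_key', None)
        if key is None:
            raise ValueError(
                f"custom_read_kwargs must define 'r_dataframe_key' to load {data_path.name}",
            )
        import polars as pl
        import pyreadr  # optional dependency that reads R files without an R installation

        r_objects = pyreadr.read_r(str(data_path))  # maps object names to pandas DataFrames
        if key not in r_objects:
            raise ValueError(f'{key!r} not found in {data_path.name}; available: {list(r_objects)}')
        return pl.from_pandas(r_objects[key])

    raise ValueError(f'unsupported file format: {data_path.suffix}')
```

For MECO's R files, a call in the spirit of the tests below would pass custom_read_kwargs={'r_dataframe_key': 'joint.fix'}.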

📀 Datasets

add MECO first wave native reader (MECOL1W1)
add MECO first wave second language reader (MECOL2W1)
add MECO second wave second language reader (MECOL2W2)

How Has This Been Tested?

passing tests/unit/datasets/datasets_test.py::test_public_dataset_registered for MECOL1W1
passing tests/unit/datasets/datasets_test.py::test_public_dataset_registered for MECOL2W1
passing tests/unit/datasets/datasets_test.py::test_public_dataset_registered for MECOL2W2
passing adjusted test for loading precomputed event files: tests/unit/dataset/dataset_files_test.py::test_load_precomputed_file_unsupported_file_format
added and passing test tests/unit/dataset/dataset_files_test.py::test_load_precomputed_file_rda_raise_value_error, which checks that loading fails with a ValueError when 'r_dataframe_key': 'joint.fix' is not defined
test for loading precomputed events from an R file: tests/unit/dataset/dataset_files_test.py::test_load_precomputed_file_rda

codecov · 2025-03-22T21:01:23Z

Codecov Report

Attention: Patch coverage is 93.15068% with 5 lines in your changes missing coverage. Please review.

Project coverage is 99.87%. Comparing base (8768ff9) to head (6eab3d2).
Report is 4 commits behind head on main.

Files with missing lines	Patch %	Lines
src/pymovements/dataset/dataset_files.py	73.68%	4 Missing and 1 partial ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##              main    #1054      +/-   ##
===========================================
- Coverage   100.00%   99.87%   -0.13%     
===========================================
  Files           87       91       +4     
  Lines         3818     3908      +90     
  Branches       679      683       +4     
===========================================
+ Hits          3818     3903      +85     
- Misses           0        4       +4     
- Partials         0        1       +1
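The percentages in this report can be reproduced from the line counts. Codecov documents its coverage ratio as hits / (hits + partials + misses), so partially covered lines count against coverage. The sketch applies that formula; the patch totals (68 hits out of 73 lines) and the dataset_files.py totals (14 of 19) are inferred from the reported percentages and missing-line counts, not stated in the report.

```python
def coverage_percent(hits: int, misses: int, partials: int) -> float:
    """Codecov-style coverage: partial lines count as not covered."""
    total = hits + misses + partials
    return 100 * hits / total


# Project totals from the diff above (head of the PR).
project = coverage_percent(hits=3903, misses=4, partials=1)
# Patch totals inferred from "93.15068% with 5 lines missing" (4 missing, 1 partial).
patch = coverage_percent(hits=68, misses=4, partials=1)
# dataset_files.py totals inferred from "73.68%" with 4 missing and 1 partial.
dataset_files = coverage_percent(hits=14, misses=4, partials=1)

print(f'{project:.2f} {patch:.5f} {dataset_files:.2f}')  # prints: 99.87 93.15068 73.68
```

The same bookkeeping explains the line deltas: the 90 added lines split into 85 hits, 4 misses and 1 partial.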

SiQube · 2025-03-22T21:15:54Z

@dkrako I don't know what is going on here... maybe you have an idea?

SiQube · 2025-03-24T08:41:24Z

possibly no wheels for pyreadr on windows and python 3.9?
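Whether a wheel exists for a given interpreter and platform is encoded in the wheel's filename (PEP 427 naming with PEP 425 python-abi-platform tags). The sketch is a deliberately simplified check that ignores ABI rules and abi3 wheels; real installers use packaging.tags for this. The filenames in the test are illustrative, not pyreadr's actual PyPI file list.

```python
def parse_wheel_filename(filename: str) -> tuple[str, str, set[str], set[str], set[str]]:
    """Split a wheel filename into name, version and its python/abi/platform tag sets."""
    if not filename.endswith('.whl'):
        raise ValueError(f'not a wheel: {filename}')
    parts = filename[:-len('.whl')].split('-')
    # Layout: name-version[-build]-python-abi-platform
    if len(parts) not in (5, 6):
        raise ValueError(f'unexpected wheel filename: {filename}')
    name, version = parts[0], parts[1]
    # Compressed tag sets use dots, e.g. "manylinux_2_17_x86_64.manylinux2014_x86_64".
    python_tags, abi_tags, platform_tags = (set(p.split('.')) for p in parts[-3:])
    return name, version, python_tags, abi_tags, platform_tags


def wheel_matches(filename: str, python_tag: str = 'cp39', platform_tag: str = 'win_amd64') -> bool:
    """Simplified check: exact CPython tag or generic py3, and matching or 'any' platform."""
    _, _, python_tags, _, platform_tags = parse_wheel_filename(filename)
    python_ok = python_tag in python_tags or 'py3' in python_tags
    platform_ok = platform_tag in platform_tags or 'any' in platform_tags
    return python_ok and platform_ok
```

If no file for a release passes this kind of check, pip falls back to the source distribution and has to build it.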

dkrako · 2025-03-25T17:45:17Z

possibly no wheels for pyreadr on windows and python 3.9?

bummer. how do we want to proceed?

dkrako

The description and the citations of the datasets are missing. Please add the Python class implementations of the dataset definitions; otherwise the datasets won't be visible in the documentation.
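Defining each dataset as a registered class is what lets both the registration test above and the documentation pick it up, because the class docstring can carry the description and citation. The following is a self-contained sketch of that pattern only; DatasetDefinitionSketch, register and DATASET_REGISTRY are hypothetical names, not pymovements' actual DatasetDefinition or registration API, and the docstrings are placeholders.

```python
from dataclasses import dataclass

# Hypothetical registry keyed by dataset name.
DATASET_REGISTRY: dict[str, type] = {}


def register(cls: type) -> type:
    """Class decorator that records a dataset definition under its name."""
    DATASET_REGISTRY[cls.name] = cls
    return cls


@dataclass
class DatasetDefinitionSketch:
    """Minimal stand-in for a dataset definition base class."""

    name: str = ''
    has_files: bool = True


@register
@dataclass
class MECOL1W1(DatasetDefinitionSketch):
    """MECO first wave, native (L1) readers.

    Placeholder: the real docstring would hold the dataset description and citation,
    which documentation tools such as Sphinx autodoc render from the class.
    """

    name: str = 'MECOL1W1'


@register
@dataclass
class MECOL2W1(DatasetDefinitionSketch):
    """MECO first wave, second-language (L2) readers (placeholder docstring)."""

    name: str = 'MECOL2W1'


@register
@dataclass
class MECOL2W2(DatasetDefinitionSketch):
    """MECO second wave, second-language (L2) readers (placeholder docstring)."""

    name: str = 'MECOL2W2'
```

@dataclass runs first and register second (decorators apply bottom-up), so the registry sees the finished class with its name default.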

dkrako · 2025-03-25T20:03:23Z

This error is very strange. You specified the pyreadr dependency as:

pymovements/pyproject.toml

Line 43 in fc7ca57

"pyreadr>=0.5.2,<0.6",

There is a windows wheel for v0.5.2: https://pypi.org/project/pyreadr/0.5.2/#files
but there is none for v0.5.3: https://pypi.org/project/pyreadr/0.5.3/#files

Why does pip try to build v0.5.3 instead of downloading v0.5.2?

Maybe there's a command-line option for pip to resolve this in the github workflow?

dkrako · 2025-03-25T20:06:02Z

Found it: --prefer-binary

Maybe it has to be combined with --no-binary pymovements to make sure that pymovements is built from source.
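The behaviour in this thread follows from how pip ranks candidate files: by default the newest version that satisfies the specifier wins even if only a source distribution exists for the platform, while --prefer-binary ranks compatible wheels above source distributions, even newer ones. The model is a simplification (real pip also weighs tags, yanked releases and hashes); the candidate list encodes only what the thread states: a Windows wheel for 0.5.2 and, since pip tried to build it, an sdist but no wheel for 0.5.3.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    version: tuple[int, ...]
    is_wheel: bool  # True for a wheel compatible with this platform, False for an sdist


def satisfies(version: tuple[int, ...]) -> bool:
    """Simplified check for the specifier 'pyreadr>=0.5.2,<0.6'."""
    return (0, 5, 2) <= version < (0, 6)


def choose(candidates: list[Candidate], prefer_binary: bool = False) -> Candidate:
    allowed = [c for c in candidates if satisfies(c.version)]
    if not allowed:
        raise LookupError('no candidate satisfies the specifier')
    if prefer_binary:
        # Any compatible wheel outranks any sdist; newest wins within each group.
        return max(allowed, key=lambda c: (c.is_wheel, c.version))
    # Default: newest version wins; a wheel only breaks ties within the same version.
    return max(allowed, key=lambda c: (c.version, c.is_wheel))


# Windows + Python 3.9 as described in the thread.
windows_py39 = [
    Candidate((0, 5, 2), is_wheel=True),
    Candidate((0, 5, 3), is_wheel=False),
]
```

Here choose(windows_py39) picks the 0.5.3 sdist, which then fails to build, while choose(windows_py39, prefer_binary=True) picks the 0.5.2 wheel.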

Here's the line to update in the github workflow:

pymovements/.github/workflows/tests.yml

Line 61 in f8152b3

run: tox -vv --notest -e ${{ matrix.tox_env }}

SiQube · 2025-03-30T07:16:21Z

this seems to work now. I'm pretty unsure about the line I've added to tox.ini. @dkrako can you quadruple check it? I'll add the py files later

dkrako · 2025-03-30T11:51:04Z

this seems to work now. I'm pretty unsure about the line I've added to tox.ini. @dkrako can you quadruple check it? I'll add the py files later

Ah, great! Even better to include it in tox.ini than in the GitHub workflow.

SiQube · 2025-04-17T04:00:01Z

@dkrako this PR makes two contributions: first, the three datasets; second, the logic to read R files, e.g. .rda.

dkrako

Please add a full documentation on the new functionality.

SiQube · 2025-04-28T12:39:11Z

Please review the rest. I want to merge this at some point without having to account for the 200 PRs that happened in the meantime.

dkrako · 2025-04-28T13:41:38Z

Alright, then first please incorporate the changes from:

SiQube · 2025-04-28T14:31:22Z

Done. I force-pushed to retrigger the Read the Docs build, which had failed.

SiQube · 2025-04-28T14:31:34Z

added changes as well

dkrako · 2025-05-12T15:51:00Z

src/pymovements/dataset/dataset_files.py

        precomputed_reading_measure_df = pl.read_csv(data_path, **custom_read_kwargs)
+    elif data_path.suffix in r_extensions:


please document this functionality in Dataset.load_precomputed_events()

SiQube · 2025-05-20

will add it after #1099 is merged due to merge conflicts

dkrako · 2025-05-12T15:51:39Z

src/pymovements/dataset/dataset_files.py

        precomputed_event_df = pl.read_csv(data_path, **custom_read_kwargs)
+    elif data_path.suffix in r_extensions:


please document this functionality in Dataset.load_precomputed_reading_measures()

SiQube · 2025-05-20

will add it after #1099 is merged due to merge conflicts

dkrako · 2025-05-12T15:53:47Z

tests/files/rda_test_file.rda

If this is a dataset-specific test file, that should be visible in the filename.

Please also make sure not to use any data from the datasets; fill it with your own data instead.
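One way to follow this request is to generate the test file from synthetic rows. The sketch builds a small, seeded fixation table in pure Python and writes it with pyreadr.write_rdata(), which stores a pandas DataFrame in an R data file under the given object name. The column names, value ranges and the 'joint.fix' object name (borrowed from the key used in this PR's tests) are assumptions for illustration, not the MECO schema.

```python
import random


def make_synthetic_fixations(n_rows: int = 10, seed: int = 0) -> list[dict]:
    """Create made-up fixation rows; no values are taken from any real dataset."""
    rng = random.Random(seed)  # seeded so the test file is reproducible
    rows = []
    onset = 0
    for index in range(n_rows):
        duration = rng.randint(80, 400)  # plausible fixation durations in ms
        rows.append({
            'subid': 'sub01',  # hypothetical column names
            'trialid': 1,
            'fixid': index + 1,
            'start': onset,
            'stop': onset + duration,
            'dur': duration,
            'xs': round(rng.uniform(100.0, 1100.0), 1),
            'ys': round(rng.uniform(300.0, 400.0), 1),
        })
        onset += duration + rng.randint(20, 60)  # gap standing in for a saccade
    return rows


def write_rda(path: str, rows: list[dict], object_name: str = 'joint.fix') -> None:
    """Write rows to an R data file; requires pandas and pyreadr."""
    import pandas as pd
    import pyreadr

    pyreadr.write_rdata(path, pd.DataFrame(rows), df_name=object_name)
```

For example, write_rda('tests/files/rda_test_file.rda', make_synthetic_fixations()) would regenerate a file like the one under review from fully synthetic values.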

SiQube requested review from dkrako and prassepaul as code owners March 22, 2025 20:55

dkrako reviewed Mar 25, 2025

src/pymovements/dataset/dataset_files.py

dkrako requested changes Mar 25, 2025

dkrako marked this pull request as draft March 26, 2025 13:35

dkrako changed the title from "add meco data" to "dataset: add MECO datasets" Mar 26, 2025

dkrako added the dataset label Mar 26, 2025

add meco data

384ee62

SiQube force-pushed the dataset-mecol1w1 branch 2 times, most recently from 0367c54 to a735986 March 30, 2025 06:35

try prefer-binary

0c6d876

SiQube force-pushed the dataset-mecol1w1 branch from a735986 to 0c6d876 March 30, 2025 07:07

SiQube marked this pull request as ready for review March 30, 2025 07:15

SiQube and others added 9 commits April 17, 2025 05:02

Merge remote-tracking branch 'origin/main' into dataset-mecol1w1

35d7d82

add python scripts

b866a59

add tests

7f9d4f3

add documentation

adc1b87

[pre-commit.ci] auto fixes from pre-commit.com hooks

fc24e39

for more information, see https://pre-commit.ci

add datasets to all in __init__

684c816

[pre-commit.ci] auto fixes from pre-commit.com hooks

9615731

for more information, see https://pre-commit.ci

add docstring topublic module

9d7733a

Merge branch 'dataset-mecol1w1' of github.com:aeye-lab/pymovements into dataset-mecol1w1

6381079

SiQube enabled auto-merge (squash) April 17, 2025 03:58

SiQube disabled auto-merge April 28, 2025 06:38

dkrako requested changes Apr 28, 2025

SiQube and others added 2 commits April 28, 2025 15:54

Merge branch 'main' into dataset-mecol1w1

7bcc17c

incorporate non-breaking changes

80e51db

SiQube force-pushed the dataset-mecol1w1 branch from 5a6ba35 to 80e51db April 28, 2025 14:30

SiQube requested a review from dkrako April 28, 2025 15:31

SiQube added 2 commits May 6, 2025 14:55

add reading measures for L2

29440e8

Merge branch 'dataset-mecol1w1' of github.com:aeye-lab/pymovements into dataset-mecol1w1

6eab3d2

SiQube force-pushed the dataset-mecol1w1 branch from d51de2a to 6eab3d2 May 6, 2025 13:09

dkrako reviewed May 12, 2025

src/pymovements/dataset/dataset_files.py
