[BUG] clarify/re-consider use of "n/a" in numeric columns #1938

Open
yarikoptic opened this issue Oct 3, 2024 · 3 comments · May be fixed by #1940
Labels
bug Something isn't working

Comments

@yarikoptic
Collaborator

Describe your problem in detail.

At the moment, the specification for tabular files (https://bids-specification.readthedocs.io/en/stable/common-principles.html#tabular-files) states:

String values containing tabs MUST be escaped using double quotes. Missing and non-applicable values MUST be coded as n/a. Numerical values MUST employ the dot (.) as decimal separator and MAY be specified in scientific notation, using e or E to separate the significand from the exponent. TSV files MUST be in UTF-8 encoding.

So, on the most reasonable reading, it mandates an explicit n/a for a missing value in any column, not only in "String values" columns.
Because n/a is not a standard placeholder, this unnecessarily complicates loading such files with anything that expects numeric values in a column (e.g. onset).
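
To make the complication concrete, here is a minimal sketch in Python using only the standard library. The sample table and the `parse_numeric` helper are hypothetical, made up for illustration and not taken from any BIDS tool. It shows that a plain `float()` conversion rejects the placeholder, so the reader has to map it explicitly.

```python
import csv
import io
import math

# Hypothetical events table for illustration; the second row has a missing duration.
TSV_TEXT = "onset\tduration\ttrial_type\n0.5\t1.2\tgo\n2.0\tn/a\tstop\n"


def parse_numeric(value):
    """Convert a TSV cell to float, mapping the placeholder n/a to NaN."""
    if value == "n/a":
        return math.nan
    return float(value)


rows = list(csv.DictReader(io.StringIO(TSV_TEXT), delimiter="\t"))
durations = [parse_numeric(row["duration"]) for row in rows]

# A naive conversion fails on the placeholder, whereas float("nan") would parse.
try:
    float("n/a")
    naive_ok = True
except ValueError:
    naive_ok = False
```

Readers such as pandas `read_csv` take an `na_values` argument for the same purpose, so the extra step is small, but it is still a step that every consumer of a numeric column has to remember.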

Describe what you expected.

I have not investigated this further yet and do not have a specific recommendation at the moment (e.g. I have not looked at how pandas expects a float NaN to be written in a TSV). I am just raising a possible discussion point.

At the very least, we might want to reorder the sentences to remove a possible misassociation with string-only columns, i.e. to have:

Missing and non-applicable values MUST be coded as n/a. String values containing tabs MUST be escaped using double quotes. Numerical values MUST employ the dot (.) as decimal separator and MAY be specified in scientific notation, using e or E to separate the significand from the exponent. TSV files MUST be in UTF-8 encoding.
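
For the writing side, the rules in that paragraph can be sketched as follows. This is only an illustration under stated assumptions: the rows and the `format_cell` helper are hypothetical, and it relies on the standard `csv` module's default quoting to wrap tab-containing strings in double quotes.

```python
import csv
import io
import math


def format_cell(value):
    """Render one cell: n/a for missing values, plain text otherwise."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return "n/a"
    return str(value)  # str() of a float uses "." as the decimal separator


# Hypothetical rows for illustration.
columns = ("onset", "duration", "note")
rows = [
    {"onset": 0.5, "duration": 1.2, "note": "plain"},
    {"onset": 2.0, "duration": None, "note": "has\ta tab"},
]

buffer = io.StringIO()
# Default (minimal) quoting puts double quotes around fields that contain a tab.
writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
writer.writerow(columns)
for row in rows:
    writer.writerow([format_cell(row[key]) for key in columns])

tsv_text = buffer.getvalue()
tsv_bytes = tsv_text.encode("utf-8")  # TSV files are UTF-8 encoded
```

The missing duration comes out as n/a regardless of the column's type, which is the reading that the reordered paragraph is meant to make obvious.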

BIDS specification section

https://bids-specification.readthedocs.io/en/latest/...

@effigies
Collaborator

effigies commented Oct 3, 2024

Yes, n/a applies to all columns, and that is how the validator has handled it the whole time. Proposing nan or another alternative for numeric columns would not change the need for tools to work with n/a in historical datasets. I'm okay with the suggested reordering, if that clarifies things.

yarikoptic added a commit to yarikoptic/bids-specification that referenced this issue Oct 3, 2024
@VisLab
Member

VisLab commented Oct 4, 2024

I think tools have adapted to n/a in all columns and such a change would trigger a lot of changes.

@yarikoptic
Collaborator Author

yarikoptic commented Oct 4, 2024

Cool, then let's plan for #1940 to fix this issue with just a minute tune-up to the wording.
