synchronize tests across langs, add helper makefile #767

schristley · 2024-02-26T22:30:52Z

Start building the base so we only need to edit the OpenAPI v3 spec, and the v2 spec can be generated from v3. Also clean up the test suites so there is one common set of test data files used by all languages.

Restarted as I messed up the master merge with the old PR #758

schristley · 2024-02-26T22:32:00Z

$ make

Helper commands for AIRR Standards repository

make gen-v2       -- Generate OpenAPI V2 spec from the V3 spec
make build-docs   -- Build documentation
make spec-copy    -- Copy spec files to language directories
make data-copy    -- Copy test data files to language directories
make checks       -- Run consistency checks on spec files
make tests        -- Run all language test suites
make python-tests -- Run Python test suite
make r-tests      -- Run R test suite
make js-tests     -- Run Javascript test suite
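A consistency check on the copied test data could be sketched roughly as follows. This is a minimal illustration, not the contents of the actual Makefile targets: the helper names `file_digest` and `find_mismatches`, and the throwaway directory layout in the demo, are assumptions made for the example.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def find_mismatches(source_dir, copy_dirs):
    """List (copy_path, reason) for copies that are missing or differ from the source."""
    mismatches = []
    for source in sorted(Path(source_dir).iterdir()):
        if not source.is_file():
            continue
        for copy_dir in copy_dirs:
            copy = Path(copy_dir) / source.name
            if not copy.exists():
                mismatches.append((str(copy), "missing"))
            elif file_digest(copy) != file_digest(source):
                mismatches.append((str(copy), "differs"))
    return mismatches


# Demo with throwaway directories standing in for the real layout (hypothetical names).
with tempfile.TemporaryDirectory() as root:
    source_dir = Path(root, "tests")
    py_dir = Path(root, "python")
    r_dir = Path(root, "r")
    for directory in (source_dir, py_dir, r_dir):
        directory.mkdir()
    (source_dir / "good_rearrangement.tsv").write_text("sequence_id\nseq1\n")
    (py_dir / "good_rearrangement.tsv").write_text("sequence_id\nseq1\n")
    (r_dir / "good_rearrangement.tsv").write_text("sequence_id\nNA\n")
    found = find_mismatches(source_dir, [py_dir, r_dir])
    demo_result = [(Path(path).parent.name, reason) for path, reason in found]
    print(demo_result)
```

Comparing digests rather than parsed contents keeps the check strict: any byte-level drift between a language directory and the common source is reported.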

schristley · 2024-02-26T22:57:44Z

@javh @bussec Are we allowing NA to be in the rearrangement TSV? I'm reconciling the test data and for the bad_rearrangement.tsv file, the R version has an NA while the python version does not. If I try to use the R version with the NA then python crashes:

Traceback (most recent call last):
  File "/work/lang/python/tests/test_interface.py", line 59, in test_load_rearrangement
    result = airr.load_rearrangement(self.rearrangement_bad)
  File "/work/lang/python/airr/interface.py", line 103, in load_rearrangement
    df = pd.read_csv(filename, sep='\t', header=0, index_col=None,
  File "/usr/local/lib/python3.8/dist-packages/pandas/io/parsers/readers.py", line 912, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/usr/local/lib/python3.8/dist-packages/pandas/io/parsers/readers.py", line 583, in _read
    return parser.read(nrows)
  File "/usr/local/lib/python3.8/dist-packages/pandas/io/parsers/readers.py", line 1704, in read
    ) = self._engine.read(  # type: ignore[attr-defined]
  File "/usr/local/lib/python3.8/dist-packages/pandas/io/parsers/c_parser_wrapper.py", line 234, in read
    chunks = self._reader.read_low_memory(nrows)
  File "pandas/_libs/parsers.pyx", line 814, in pandas._libs.parsers.TextReader.read_low_memory
  File "pandas/_libs/parsers.pyx", line 891, in pandas._libs.parsers.TextReader._read_rows
  File "pandas/_libs/parsers.pyx", line 1036, in pandas._libs.parsers.TextReader._convert_column_data
  File "pandas/_libs/parsers.pyx", line 1075, in pandas._libs.parsers.TextReader._convert_tokens
  File "pandas/_libs/parsers.pyx", line 1220, in pandas._libs.parsers.TextReader._convert_with_dtype
ValueError: Bool column has NA values in column 4

It seems that even if we don't accept NA, this may be a python bug?

schristley · 2024-02-26T23:21:04Z

Catching the exception prevents the crash, but I'm not sure if that is what we want, or if we want pandas to allow NA and transform to None.

diff --git a/lang/python/airr/interface.py b/lang/python/airr/interface.py
index 590c3a7..07b194b 100644
--- a/lang/python/airr/interface.py
+++ b/lang/python/airr/interface.py
@@ -100,10 +100,15 @@ def load_rearrangement(filename, validate=False, debug=False):
     # TODO: test pandas.DataFrame.read_csv with converters argument as an alterative
     schema = RearrangementSchema
 
-    df = pd.read_csv(filename, sep='\t', header=0, index_col=None,
-                     dtype=schema.pandas_types(), true_values=schema.true_values,
-                     false_values=schema.false_values)
-    # added to use RearrangementReader without modifying it:
+    try:
+        df = pd.read_csv(filename, sep='\t', header=0, index_col=None,
+                         dtype=schema.pandas_types(), true_values=schema.true_values,
+                         false_values=schema.false_values)
+        # added to use RearrangementReader without modifying it:
+    except Exception as e:
+        sys.stderr.write('Error occurred while loading AIRR rearrangement file: %s\n' % e)
+        return None
+
     buffer = StringIO()  # create an empty buffer
     df.to_csv(buffer, sep='\t', index=False)  # fill buffer
     buffer.seek(0)  # set to the start of the stream
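The alternative to catching the exception, letting null-like values through and transforming them to None, could look something like the sketch below. It uses only the standard library so it runs on its own; a function like `parse_bool` could in principle be supplied per column through the `converters` argument of `pd.read_csv` that the TODO comment mentions. The value sets and the names `parse_bool` and `read_bool_column` are assumptions for illustration, not the library's actual schema settings.

```python
import csv
import io

# Assumed value sets for illustration; not the library's actual schema settings.
TRUE_VALUES = {"T", "TRUE", "True", "true"}
FALSE_VALUES = {"F", "FALSE", "False", "false"}
NULL_VALUES = {"", "NA", "None"}


def parse_bool(value):
    """Map a TSV cell to True, False, or None; raise on anything else."""
    if value in NULL_VALUES:
        return None
    if value in TRUE_VALUES:
        return True
    if value in FALSE_VALUES:
        return False
    raise ValueError("not a boolean or null-like value: %r" % value)


def read_bool_column(text, column):
    """Parse one boolean column from tab-separated text."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [parse_bool(row[column]) for row in reader]


# Sample data: one true, one NA, one empty, one false.
sample = "sequence_id\tproductive\nseq1\tT\nseq2\tNA\nseq3\t\nseq4\tF\n"
print(read_bool_column(sample, "productive"))
```

With this approach a genuinely malformed value still raises an error, while NA and the empty string both become None.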

javh · 2024-03-06T00:40:02Z

The R library will accept "" (empty string), NA, or None for null values. Though, the spec officially only recognizes an empty string as a null value.

I think allowing NA to equate to None in python would be fine (though NA is a valid amino acid sequence), but I think it's less of a python bug and more of an invalid bad_rearrangement.tsv file... It is supposed to be "bad", I guess. But, how bad?

schristley · 2024-03-06T05:06:47Z

> It is supposed to be "bad", I guess. But, how bad?

bad, but not so bad it causes a crash!

It isn't so much about what to test in the "bad" file. I'm more worried that in a "good" file there is an NA and R accepts it, but python doesn't, and/or we get incompatibility where an R output file cannot be fed into python because it has NAs.

bcorrie · 2024-03-06T18:33:28Z

> The R library will accept "" (empty string), NA, or None for null values. Though, the spec officially only recognizes an empty string as a null value.

I would suggest that on input from an AIRR file, NA/None should not be interpreted as null and should be rejected. Data that uses NA/None for null is non-compliant, no?

Also, on output, the libraries should never write NA/None for null; they should always write an empty string.
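A strict reading of that rule might be sketched as below: flag NA/None on input, and always emit an empty string for null on output. The function names and the token list are hypothetical, and the check is restricted to explicitly named columns because, as noted earlier in the thread, NA can be a valid amino acid sequence.

```python
import csv
import io

# Null-like tokens that other TSV writers commonly emit; assumed list for illustration.
NONCOMPLIANT_NULLS = {"NA", "None"}


def find_noncompliant_nulls(text, columns):
    """Return (row_number, column, value) for each NA/None cell in the given columns."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    problems = []
    for row_number, row in enumerate(reader, start=1):
        for column in columns:
            if row[column] in NONCOMPLIANT_NULLS:
                problems.append((row_number, column, row[column]))
    return problems


def write_rows(rows, fieldnames):
    """Write rows as TSV text, emitting None as an empty string."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow({key: "" if row[key] is None else row[key]
                         for key in fieldnames})
    return buffer.getvalue()


# The junction_aa value "NA" is left alone; only the boolean column is checked.
sample = "sequence_id\tjunction_aa\tproductive\nseq1\tNA\tNA\nseq2\tCARW\tT\n"
problems = find_noncompliant_nulls(sample, ["productive"])
written = write_rows([{"sequence_id": "seq1", "productive": None}],
                     ["sequence_id", "productive"])
print(problems)
print(repr(written))
```

Whether to reject such files outright or only warn is the open design question in this thread; the sketch just reports the offending cells and leaves that decision to the caller.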

javh · 2024-03-06T21:58:17Z

Yeah, it's true that the files are non-compliant if they include NA/None. And the R and python libraries do output empty string for NA/None values.

But, NA/None tend to be the default outputs from TSV writers outside the airr reference libraries. So, it's a compromise to deal with typical TSV output.

schristley · 2024-03-11T21:24:12Z

Sounds like there are two things here. 1) Change the test so it works for both R and python. That should be easy. 2) Python and R need more support for null-like values, for cross-language interoperability of AIRR TSV. That should probably be its own issue, as it's new code to write, along with additional tests to handle the null-like values.

javh · 2024-08-12T18:32:48Z

@schristley Can you remind me, did we decide to back out of the V2 conversion script, just make the V3 spec the default, and manually maintain the V2 spec?

schristley · 2024-08-12T21:17:54Z

> @schristley Can you remind me, did we decide to back out of the V2 conversion script, just make the V3 spec the default, and manually maintain the V2 spec?

yes, I believe so

schristley · 2024-10-19T23:05:19Z

@javh I changed this PR to focus primarily on synchronizing the tests and adding the Makefile.

schristley mentioned this pull request Feb 26, 2024

Initial moves to OpenAPI v3 #758

Closed

schristley mentioned this pull request Mar 4, 2024

Add consistency checks to make sure unit test data are identical between R and python #775

Closed

make targets to copy specs and test data, centralize test data

ac5e1e1

javh force-pushed the issue-739-openapi3 branch from 38234fd to ac5e1e1 Compare April 8, 2024 17:07

javh added this to the AIRR 2.0 milestone Aug 12, 2024

schristley added 4 commits October 17, 2024 12:32

Merge branch 'master' into issue-739-openapi3

1eb5c77

Merge branch 'master' into issue-739-openapi3

dbbec46

update ubuntu

74810a4

sync tests for langs

a7e8334

schristley changed the title ~~Initial moves to OpenAPI v3~~ synchronize tests across langs, add helper makefile Oct 19, 2024

remove old test file

0bad8ca

schristley merged commit 5c6f228 into master Oct 19, 2024
8 checks passed
