Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

698: library API for non-path input, accept markdown and dict input #712

Merged
merged 2 commits into from
Jul 31, 2024

Conversation

lindsay-stevens
Copy link
Contributor

@lindsay-stevens lindsay-stevens commented Jun 25, 2024

Closes #698
Closes #599

  • add: xls2xform.convert for library users to call pyxform without needing to use files (accepts bytes/bufferedreader (from open())/strings)
    • accepts markdown input since this is widely used by pyxform
    • accepts dict to avoid needing to use internal funcs that may change (also you could more easily go from pandas/database -> dict -> pyxform -> xform; the dict use case could be improved by adding a schema or something).
  • chg: avoid writing to files unless validate=True (for ODK Validate)
    • also avoid assuming any files were written, e.g. missing_ok=True
  • chg: move xls/x_sheet_to_csv, sheet_to_csv from utils.py to xls2json_backends.py because they are backends for csv input.
  • chg: move md_to_dict from test directory into xls2json_backends.py
  • chg: refactor pyxform_test_case.py to use xls2xform.convert only, instead of internal funcs associated with md_to_dict, so that the existing tests check API stability e.g. file types, dict input, etc.
  • chg: survey.py print_xform_to_file returns xml, so now survey.to_xml() doesn't create xml twice (once itself, once via print_xform_to_file).
    • similarly, xls2xform would read the input XLSForm a 2nd time if there were external choices. Now it uses the data from the first reading.

Why is this the best possible solution? Were any other approaches considered?

Differences to #698: as shown in the new tests in test_xls2xform.py, there are a few reasonable call patterns that don't return BytesIO; and adding a new function allows more easily tailoring it to the library use case.

Was not initially planning on the markdown/dict changes but considering a) #599 has been open for a while, b) the recent review of pyxform-related projects showed there is a need for alternative input formats, or at least more flexibility, c) I figured that it would be best to support markdown/dict now so that the API was designed for that, rather than needing to potentially refactor for it later. I think the main value in pyxform is in workbook_to_json -> survey.py -> xform, so by adding the option of providing a dict, library users can do whatever deserialisation is necessary and pass the data to pyxform.

What are the regression risks?

There are some functions moved around as mentioned above, so at upgrade time, library users would need to change import locations or switch to using xls2xform.convert. Functionality should be the same.

Does this change require updates to documentation? If so, please file an issue here and include the link below.

I think there's a good amount of info in the docstrings and tests, but a readme section could be added to make it easier to take in, if required.

Before submitting this PR, please make sure you have:

  • included test cases for core behavior and edge cases in tests
  • run python -m unittest and verified all tests pass
  • run ruff format pyxform tests and ruff check pyxform tests to lint code
  • verified that any code or assets from external sources are properly credited in comments

- add: xls2xform.convert for library users to call pyxform without
  needing to use files (accepts bytes/file handles/strings)
  - accepts markdown input since this is widely used by pyxform
  - accepts dict to avoid needing to use internal funcs that may change
- chg: avoid writing to files unless validate=True (for ODK Validate)
  - also avoid assuming any files were written, e.g. missing_ok=True
- chg: move xls/x_sheet_to_csv, sheet_to_csv from utils.py to
  xls2json_backends.py because they are backends for csv input.
- chg: move md_to_dict from test directory into xls2json_backends.py
- chg: refactor pyxform_test_case.py to use xls2xform.convert only,
  instead of internal funcs associated with md_to_dict, so that the
  existing tests check API stability e.g. file types, dict input, etc.
@lindsay-stevens
Copy link
Contributor Author

FYI @spwoodcock

@lindsay-stevens
Copy link
Contributor Author

lindsay-stevens commented Jun 26, 2024

Forum thread for community input: https://forum.getodk.org/t/handling-xlsform-xform-conversion-in-memory/45830/1

Copy link
Contributor

@lognaturel lognaturel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a bit of trouble getting through this one! Maybe I should have ignored all the test changes initially. convert looks good. I almost recommended some parameter name changes like enketo -> enketo_validate but matching the existing xls2xform_convert feels right after all.

Couple of tiny questions/suggestions inline but I don't think they need to block merge.

pyxform/constants.py Show resolved Hide resolved
tests/test_form_name.py Show resolved Hide resolved
tests/xform_test_case/test_bugs.py Outdated Show resolved Hide resolved
@lindsay-stevens lindsay-stevens merged commit f6285f9 into XLSForm:master Jul 31, 2024
10 checks passed
@lindsay-stevens lindsay-stevens deleted the pyxform-698 branch July 31, 2024 08:01
@lognaturel
Copy link
Contributor

@lindsay-stevens pyxform-http used sheet_to_csv from utils: https://github.com/getodk/xlsform-online/pull/81/files Was that intended to be a public API or not?

@lindsay-stevens
Copy link
Contributor Author

@lognaturel it's still public, just moved to be with the other backend functions.

chg: move xls/x_sheet_to_csv, sheet_to_csv from utils.py to xls2json_backends.py because they are backends for csv input.

This PR set the scene for making parts of pyxform non-public (by offering a library API that does what most users would want to do), but didn't make changes in that regard.

@lognaturel
Copy link
Contributor

@lognaturel it's still public, just moved to be with the other backend functions.

That's a breaking change, then, and should be released as a major version bump, right? I believe that otherwise the changes are non-breaking so maybe it's worth preserving the prior APIs (calling the new functions) so it's non-breaking? Or we bend the semver rules because we think usage as a library is rare?

@lindsay-stevens
Copy link
Contributor Author

Agree it's a breaking change, and agree it's probably not going to affect many people (possibly no-one?) - the solution is to update the import path (the function bodies are unchanged). Equally, incrementing the version is OK.

@jnm jnm mentioned this pull request Nov 5, 2024
7 tasks
jnm added a commit to kobotoolbox/kpi that referenced this pull request Nov 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Move markdown to pyxform functionality out of tests and into main package
2 participants