Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(pyclient): add support for GraphQL API to 'get' method #4558

Open
wants to merge 31 commits into
base: master
Choose a base branch
from

Conversation

YpeZ
Copy link
Contributor

@YpeZ YpeZ commented Dec 11, 2024

What are the main changes you did

This PR adds functionality to the Pyclients get method to be able to retrieve data through the GraphQL API.

  • create parsing function to convert a table's metadata to a GraphQL query
  • implement the distinction between CSV and GraphQL API call in get
  • add columns parameter to get, specifying which columns need to be returned
  • add expand parameter to expand the query on reference columns to gather all columns from the referenced table
  • implement query filter on GraphQL query

In addition I added/fixed the following

How to test

  • explain here what to do to test this (or point to unit tests)

Checklist

  • updated docs in case of new feature
  • added/updated tests

@YpeZ YpeZ changed the title feat(Pyclient): add support for GraphQL API to 'get' method feat(pyclient): add support for GraphQL API to 'get' method Dec 11, 2024
@YpeZ YpeZ linked an issue Dec 11, 2024 that may be closed by this pull request
@YpeZ YpeZ marked this pull request as draft December 11, 2024 15:52
@YpeZ YpeZ self-assigned this Dec 12, 2024
YpeZ added 3 commits December 17, 2024 17:12
* added column types to new constants.py
* started creating 'get_pkeys' for metadata.py
* finished '_parse_get_table_query' method
@YpeZ
Copy link
Contributor Author

YpeZ commented Dec 18, 2024

In my latest commit I implemented a filter on the columns of the output of get, i.e. get me the values of only these column of the table.
This leads to discrepancies between the results of the GraphQL and CSV API, as the columns of a table from the GraphQL API and the CSV API may differ. In my GraphQL query building I include REFBACKs whereas the CSV API does not.

This and my expectation that the implementation of the query_filter for GraphQL will differ significantly from the CSV one makes me consider separating these methods into a method for each API, as for the user the arguments they fill in for either format will differ significantly.

@YpeZ YpeZ requested a review from dtroelofsprins January 14, 2025 15:17
@YpeZ YpeZ requested a review from davidruvolo51 January 14, 2025 15:17
@YpeZ YpeZ marked this pull request as ready for review January 14, 2025 15:17
Copy link
Contributor

@dtroelofsprins dtroelofsprins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review:

  • Looks good and makes sense

Functional tests:

Truncate table

  • Truncate table works fine
  • Reference from other table => Referenced by Exception

Get as_df = False or not included NOT APPROVED:

  • ref/ontology column values are not returned as 'country': 'US', but as 'country': {'name': 'US'}. This is very inconvenient and for sure will break existing scripts.
  • Get from an empty table returns a NoneType, where I would expect an empty list with len = 0
  • Tested with biobank_directory template (with demo-data), not able to fetch from Collections and CollectionFacts: Table with id 'DiseaseTypes' not found in schema 'DieuwkesTest' => DiseaseTypes is in another schema, but not referenced as an ontology, but as a ref. F.e. Biobanks has also refs to other schema, but these are all real ontologies and this table can be fetched.
  • Datatypes: bools go fine
  • string type columns with decimal value with zero at the end go fine
  • Empty columns are not returned (which is OK)

Get as_df = True:

  • Retrieve data from an empty tables returns and empty dataframe with len=0
  • dataTypes => NaN in bool type columns are filled with True
  • string type columns with decimal value with zero at the end still zero removed if there are no missing values in the column

Get limited amount of columns

  • as_df = True => nice filter on duplicates
  • as_df = False => doesn't filter on duplicates, which is fine I guess

@YpeZ
Copy link
Contributor Author

YpeZ commented Jan 23, 2025

@dtroelofsprins

ref/ontology column values are not returned as 'country': 'US', but as 'country': {'name': 'US'}. This is very inconvenient and for sure will break existing scripts.

This has been fixed, but for ontology/ontology_array only as ontologies are only queried by their name property. It's not possible to do this for reference columns, as the primary keys to the referenced tables differ.

Get from an empty table returns a NoneType, where I would expect an empty list with len = 0

This has been fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants