Value validation of Field classes needs to be improved #1601

Open
Ariana-B opened this issue Jun 19, 2024 · 0 comments

@Ariana-B (Contributor):

The ability to validate a field value is inconsistent between Field classes, as is the moment at which a field's value is evaluated (and by extension, the source of the resultant error).
`PgDocField` and its subclasses have a `parse_value` method. Non-string fields (e.g. `IntDocField`, `DateDocField`, etc., but not `SimpleDocField`) override it to cast the value to a specific type, so an invalid value raises a `ValueError` or similar, depending on the library used. `NativeField`, however, has no such method.
Unless external logic explicitly calls `parse_value` (as datacube-explorer does when handling URL queries), it is not invoked until `extract` or `evaluate` is called. For all Fields, this means that an invalid value often surfaces as an SQLAlchemy error.
The time field is usually the exception, as it is more often passed to some kind of datetime function before it can be extracted or evaluated via `DateDocField`.
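A minimal sketch of the pattern described above, assuming simplified stand-in classes (all names below are hypothetical and are not datacube's real `PgDocField` hierarchy): the base field passes values through, non-string fields override `parse_value` to cast, the native field has no `parse_value` at all, and a hypothetical `validate_terms` helper calls `parse_value` eagerly so a bad value fails with a `ValueError` before any SQL is built.

```python
from datetime import datetime
from typing import Any, Dict


class DocFieldSketch:
    """Hypothetical stand-in for a string document field: parse_value is a pass-through."""

    def __init__(self, name: str) -> None:
        self.name = name

    def parse_value(self, value: Any) -> Any:
        return value


class IntDocFieldSketch(DocFieldSketch):
    def parse_value(self, value: Any) -> int:
        # int() raises ValueError for strings such as "asdf"
        return int(value)


class DateDocFieldSketch(DocFieldSketch):
    def parse_value(self, value: Any) -> datetime:
        # fromisoformat() raises ValueError for non-ISO strings, TypeError for non-strings
        return datetime.fromisoformat(value)


class NativeFieldSketch:
    """Hypothetical stand-in for a native field: it has no parse_value method."""

    def __init__(self, name: str) -> None:
        self.name = name


def validate_terms(fields: Dict[str, Any], terms: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical eager validation: parse every search term before a query is built."""
    parsed = {}
    for key, value in terms.items():
        field = fields[key]
        parser = getattr(field, "parse_value", None)
        if parser is None:
            # Native fields pass through unvalidated, mirroring the gap described above
            parsed[key] = value
            continue
        try:
            parsed[key] = parser(value)
        except (TypeError, ValueError) as err:
            # Re-raise as one uniform error type, whatever the underlying parser raised
            raise ValueError(f"Invalid value for field {key!r}: {value!r}") from err
    return parsed
```

With this approach, `validate_terms(fields, {"cloud": "asdf"})` fails in Python with a message naming the field, instead of the value travelling on to the database; the native field shows where such a check currently has nothing to call.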

These discrepancies can be highlighted by comparing results when calling `dc.find_datasets` and `dc.index.datasets.search` with invalid values for fields of different types.

```python
import datacube

dc = datacube.Datacube()
dc.find_datasets(product="ga_ls_wo_3", limit=1, time="asdf")
# ParserError: Unknown string format: asdf present at position 0
```

(raised via `pandas_to_datetime(t, utc=True, infer_datetime_format=True).to_pydatetime()` in `Query`)
Compare this with the same query made by calling `search` directly:

```python
list(dc.index.datasets.search(product="ga_ls_wo_3", limit=1, time="asdf"))
# DataError: (psycopg2.errors.InvalidDatetimeFormat) invalid input syntax for type timestamp: "asdf"
```

We see similar results with spatial fields such as `lat`. For most other fields, though, both `find_datasets` and `search` raise the same error:

```python
dc.find_datasets(product="ga_ls_wo_3", limit=1, dataset_maturity=1)
# ProgrammingError: (psycopg2.errors.UndefinedFunction) operator does not exist: text = integer
```
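Because string fields do not override `parse_value`, the integer above reaches PostgreSQL unchanged. A minimal sketch of a stricter check for string-typed fields, assuming rejection (rather than silently coercing with `str()`) is the desired behaviour; `parse_text_value` is a hypothetical helper, not part of datacube:

```python
from typing import Any


def parse_text_value(field_name: str, value: Any) -> str:
    """Hypothetical parse_value for a string-typed field such as dataset_maturity.

    Rejects non-string values in Python instead of letting PostgreSQL report
    "operator does not exist: text = integer" once the query runs.
    """
    if not isinstance(value, str):
        raise ValueError(
            f"Field {field_name!r} expects a string, got {type(value).__name__}: {value!r}"
        )
    return value
```

Coercing with `str(value)` would also avoid the database error, but would turn an obvious mistake such as `dataset_maturity=1` into a query that silently matches nothing.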

Providing a range value for a `SimpleDocField` also has differing outcomes:

```python
dc.find_datasets(product="ga_ls_wo_3", limit=1, dataset_maturity=["asdf", "asdf"])
# NotImplementedError: Simple field between expression

list(dc.index.datasets.search(product="ga_ls_wo_3", limit=1, dataset_maturity=["asdf", "asdf"]))
# []
```
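One way to make both entry points agree is to route every search term through a single parsing step. A minimal sketch, assuming (as in the calls above) that a two-element list is interpreted as a range; `RangeSketch`, `parse_search_value` and the `supports_range` flag are hypothetical names, not datacube's API:

```python
from typing import Any, Callable, NamedTuple


class RangeSketch(NamedTuple):
    """Hypothetical stand-in for a (begin, end) range term."""

    begin: Any
    end: Any


def parse_search_value(
    field_name: str,
    value: Any,
    parser: Callable[[Any], Any],
    supports_range: bool,
) -> Any:
    """Hypothetical single entry point shared by every search path.

    A two-element list or tuple is treated as a range. Fields that cannot express
    a range (e.g. simple string fields) reject it up front, so every caller sees
    the same error instead of NotImplementedError in one path and [] in another.
    """
    if isinstance(value, (list, tuple)) and len(value) == 2:
        if not supports_range:
            raise ValueError(f"Field {field_name!r} does not support range queries: {value!r}")
        # Parse both endpoints with the field's own parser
        return RangeSketch(parser(value[0]), parser(value[1]))
    return parser(value)
```

Here `parse_search_value("dataset_maturity", ["asdf", "asdf"], str, supports_range=False)` raises the same `ValueError` regardless of whether it is reached from `find_datasets` or from `search`.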