Get data type for value #241

Open
m-mohr opened this issue Apr 27, 2021 · 4 comments

@m-mohr
Member

m-mohr commented Apr 27, 2021

It seems useful to have a type function that returns the data type of a given value.

A first use case is given here: Open-EO/openeo-python-driver#64 (comment)

The question for me is whether to let the function work on types or subtypes.

  • Types would just be: object, array, string, integer, number, boolean, null (basically JSON Schema, with no way to distinguish between, for example, vector-cubes and raster-cubes).
  • Subtypes would be all types defined in meta/subtype-schemas.json, but that also seems pretty hard to achieve: for some values it's hard to determine which subtype applies, especially for scalars.

Maybe the solution is to go with one of those options:

  1. Detect only types, but allow detecting subtypes for objects (and arrays?) in the same process
  2. Detect only types, but add an additional function to detect what kind of object something is
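
To make the "types only" option concrete, here is a minimal Python sketch that maps a parsed JSON value to its JSON Schema type name. The function name `json_type` is hypothetical and not part of any openEO specification; subtype detection (e.g. raster-cube vs. vector-cube) is deliberately left out.

```python
def json_type(value):
    """Return the JSON Schema type name for a parsed JSON value (hypothetical helper)."""
    if value is None:
        return "null"
    # bool must be checked before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, (list, tuple)):
        return "array"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"Not a JSON value: {type(value).__name__}")
```

Note that a GeoJSON geometry and a bounding box would both come back as "object" here, which is exactly the limitation of the types-only option.
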
@soxofaan
Member

(I'm not sure if I should respond here or under Open-EO/openeo-python-driver#64 (comment))

I agree with @jdries's comment in Open-EO/openeo-python-driver#64 (comment): a get_type process is not that useful on its own (in the context of a process graph), and it requires a lot more (boilerplate) logic to do something useful.

Maybe it is more useful to cover use cases like the one we had in Open-EO/openeo-python-driver#64 with "cast" or "coerce" processes that transform "any" input in a best-effort fashion to a desired target type. That way the user can directly express what data type they actually want (which is the most important use case for type checking, I would guess). For example:

  • get_geometry (like we are experimenting with in Open-EO/openeo-python-driver#64): construct a GeoJSON object from a file path, a URL, a bounding box object, an existing GeoJSON object, ...
  • get_array: convert given input to an array, e.g. if you pass a single int: wrap it in array
  • get_labeled_array: convert to labeled array, e.g. if it is a normal array, label with enumeration, if it is a JSON object, use field names as labels, ...
  • ...

(I'm using get_ prefix here, but that's just an example, you could also standardize on cast_ or something like that)
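
As a rough illustration of the "cast" idea, a get_array-style conversion could look like the following Python sketch. The name `get_array` comes from the list above; the exact rules (wrap scalars, pass arrays through, take the values of an object, turn null into an empty array) are assumptions made for this example, not an agreed specification.

```python
def get_array(value):
    """Best-effort conversion of `value` to an array (assumed rules, see comments)."""
    if value is None:
        return []                      # assumption: null becomes an empty array
    if isinstance(value, (list, tuple)):
        return list(value)             # arrays pass through unchanged
    if isinstance(value, dict):
        return list(value.values())    # assumption: objects yield their values
    return [value]                     # scalars are wrapped in a one-element array
```

Each of the commented assumptions is a decision that would have to be written down in the process description before two back-ends could be expected to behave the same way.
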

@m-mohr
Member Author

m-mohr commented Apr 27, 2021

I think this issue is a better place to spur the discussion than a merged PR.

> I agree with @jdries's comment in Open-EO/openeo-python-driver#64 (comment): a get_type process is not that useful on its own (in the context of a process graph), and it requires a lot more (boilerplate) logic to do something useful.

Indeed, you'd usually need if and eq / neq to do A or B depending on the type. I think it's likely that such things will come up more and more once UDPs are more established and people implement their own.
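
For illustration, such a branch might look roughly like this in a process graph, written here as a Python dict. `get_type` is the hypothetical process proposed in this issue; the node names and the parameter name `data` are made up for this sketch, and the `reject` branch is left as a placeholder.

```python
process_graph = {
    "type": {
        "process_id": "get_type",  # hypothetical process, not part of openEO
        "arguments": {"value": {"from_parameter": "data"}},
    },
    "is_array": {
        "process_id": "eq",
        "arguments": {"x": {"from_node": "type"}, "y": "array"},
    },
    "branch": {
        "process_id": "if",
        "arguments": {
            "value": {"from_node": "is_array"},
            "accept": {"from_parameter": "data"},
            "reject": None,  # placeholder for whatever the non-array case should do
        },
        "result": True,
    },
}
```

Three nodes are needed for a single type check, which gives a sense of the boilerplate involved.
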

> Maybe it is more useful to cover use cases [...] with "cast" or "coerce" processes that transform "any" input in a best-effort fashion to a desired target type.

Of course this "best effort" needs to be documented clearly and fully implemented, but then it's not "best effort" anymore. Without clear documentation, you'll not be able to write proper tests and/or fail to pass the tests (see #204). This is an issue for interoperability.

> That way the user can directly express what data type they actually want (which is the most important use case for type checking, I would guess).

I'm not sure this is really correct. It is true that after a certain number of steps you usually end up with one specific outcome (return value), but I'm not sure we can solve all cases with cast processes. I'd assume there are cases where you do different things depending on whether you work on, for example, raster or vector, number or string, ... But we need to find a lot of additional use cases to confirm this.

> • get_geometry (like we are experimenting with in Open-EO/openeo-python-driver#64): construct a GeoJSON object from a file path, a URL, a bounding box object, an existing GeoJSON object, ...

Such a process sounds pretty heavy-weight and is basically a replacement for a number of other processes. The question is also where the scope ends. You already mention file path (i.e. load_uploaded_file), URL (not available yet), bounding box (not available yet) and GeoJSON (direct use). One could add loading from a batch job result as well (load_results). That's a lot of things to document and do in a single process, for which we also have other processes. Would the return value be a vector data cube? GeoJSON is obviously not the answer (see #235).

> • get_array: convert given input to an array, e.g. if you pass a single int: wrap it in array
> • get_labeled_array: convert to labeled array, e.g. if it is a normal array, label with enumeration, if it is a JSON object, use field names as labels, ...
> • ...

There would be so many potential conversions that I'm not sure we want to go down this route.

@soxofaan
Member

> Indeed, you'd usually need if and eq / neq to do A or B depending on the type. I think it's likely that such things will come up more and more once UDPs are more established and people implement their own.

That doesn't seem to align with openEO's promise to make things easy for the end user. You give them a low-level get_type construct, but they have to add a lot of boilerplate to make it useful. They could indeed use UDPs to hide that, but UDPs are not trivial to define, manage and use, especially for novices. The fact that you expect users to put it in UDPs illustrates that get_type is too low-level for general usage.

Apart from the usability issues, which are a bit subjective of course, there is also a technical challenge: "if" in openEO is currently a process (aka function), which in this case is subtly different from a traditional if-else construct. After rewriting this paragraph over and over, I decided to fork it off to a different issue: #246
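
One way this difference can show up, sketched in plain Python under the assumption that a process's arguments are evaluated before the process itself runs: a function-style "if" computes both branches, whereas an if-else construct only computes the one it takes. The names `if_process`, `expensive_a` and `expensive_b` are made up for this example; see #246 for the actual discussion.

```python
calls = []

def expensive_a():
    calls.append("a")
    return "result A"

def expensive_b():
    calls.append("b")
    return "result B"

def if_process(value, accept, reject=None):
    """Function-style `if`: both branch values already exist when it is called."""
    return accept if value else reject

# Function-style: both branches are evaluated before `if_process` runs.
out_fn = if_process(True, expensive_a(), expensive_b())
calls_fn = list(calls)

# Construct-style: only the taken branch is evaluated.
calls.clear()
out_stmt = expensive_a() if True else expensive_b()
calls_stmt = list(calls)
```

If the branch that is not taken would fail for the given input type (which is the whole point of checking the type first), the function-style version would still try to compute it.
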

@m-mohr
Member Author

m-mohr commented Apr 28, 2021

I agree that in the best case get_type would not be required, but I don't see a good solution yet.

@m-mohr m-mohr modified the milestones: 1.1.0, 1.2.0 May 18, 2021
@m-mohr m-mohr removed their assignment Jun 28, 2021
@m-mohr m-mohr modified the milestones: 1.2.0, 1.3.0 Oct 25, 2021
@m-mohr m-mohr modified the milestones: 1.3.0, 2.0.0, 2.1.0 Feb 1, 2023
@m-mohr m-mohr modified the milestones: 2.1.0, future Mar 8, 2023