Get data type for value #241

m-mohr · 2021-04-27T09:09:25Z

It seems useful to have a type function that returns the data type of a given value.

A first use case is given here: Open-EO/openeo-python-driver#64 (comment)

The question for me is whether to let the function work on types or subtypes.

Types would just be: object, array, string, integer, number, boolean, null (basically the JSON Schema types, with no way to distinguish between, for example, vector-cubes and raster-cubes)
Subtypes would be all types defined in meta/subtype-schemas.json, but that also seems pretty hard to achieve, as for some types it's hard to determine what they are, especially subtypes for scalars.
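A minimal Python sketch of the "types only" variant, assuming the value has already been parsed from JSON into plain Python objects. The names `get_type` and `candidate_subtypes` and the simplified patterns are hypothetical, not openEO definitions. `candidate_subtypes` only illustrates why scalar subtypes are hard: a date string is just as valid a band name, and any positive integer could be an EPSG code.

```python
import re
from typing import Any, List


def get_type(value: Any) -> str:
    """Return the JSON Schema type name of a value parsed from JSON."""
    if value is None:
        return "null"
    # bool is a subclass of int in Python, so it must be checked before int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        # JSON Schema (draft 6 and later) counts numbers with a zero
        # fractional part, such as 1.0, as integers.
        return "integer" if value.is_integer() else "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"not a JSON value: {type(value).__name__}")


# Simplified, hypothetical patterns for a few scalar subtypes.
_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")
_DATE_TIME = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})")


def candidate_subtypes(value: Any) -> List[str]:
    """List every scalar subtype a value could be; often more than one."""
    candidates: List[str] = []
    json_type = get_type(value)
    if json_type == "string":
        if _DATE.fullmatch(value):
            candidates.append("date")
        if _DATE_TIME.fullmatch(value):
            candidates.append("date-time")
        # Nothing in the string itself rules out other string subtypes.
        candidates.append("band-name")
    elif json_type == "integer" and value > 0:
        # Any positive integer is at least syntactically an EPSG code.
        candidates.append("epsg-code")
    return candidates
```

Note that a data cube never reaches such a function as a JSON value, so telling raster from vector cubes apart would require back-end knowledge rather than inspection of the value.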

Maybe the solution is to go with one of these options:

Detect only types, but allow detecting subtypes for objects (and arrays?) in the same process
Detect only types, but add an additional function to detect what kind of object something is
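The second option could look like a separate function that only classifies JSON objects based on structural hints. This is a hypothetical Python sketch: the function name, the heuristics, and the returned labels (assumed to follow subtype names such as `bounding-box` and `geojson`) are illustrative. An object carrying both bounding-box keys and a GeoJSON `type` is resolved purely by rule order, which is the kind of ambiguity such a process would have to document.

```python
from typing import Any, Optional

# GeoJSON "type" values as defined in RFC 7946.
GEOJSON_TYPES = {
    "Point", "MultiPoint", "LineString", "MultiLineString",
    "Polygon", "MultiPolygon", "GeometryCollection",
    "Feature", "FeatureCollection",
}

BBOX_KEYS = ("west", "south", "east", "north")


def get_object_kind(value: Any) -> Optional[str]:
    """Guess which kind of JSON object a value is; None if not an object or unknown."""
    if not isinstance(value, dict):
        return None
    if all(key in value for key in BBOX_KEYS):
        return "bounding-box"
    if value.get("type") in GEOJSON_TYPES:
        return "geojson"
    # Unrecognised objects fall through; a real process would need a rule for this.
    return None
```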


soxofaan · 2021-04-27T14:30:38Z

(I'm not sure if I should respond here or under Open-EO/openeo-python-driver#64 (comment))

I agree with @jdries comment in Open-EO/openeo-python-driver#64 (comment) : a get_type process is not that useful on its own (in the context of a process graph) and it requires a lot more (boilerplate) logic to do something useful.

Maybe it is more useful to cover use cases like we had in Open-EO/openeo-python-driver#64 with "cast" or "coerce" processes that transform "any" input in a best-effort fashion to a desired target type. That way the user can directly express what data type they actually want (which is the most important use case for type checking, I would guess). For example:

get_geometry (like we are experimenting with in Open-EO/openeo-python-driver#64): construct a geojson object from a file path, url, a bounding box object, an existing geojson object, ...
get_array: convert given input to an array, e.g. if you pass a single int: wrap it in array
get_labeled_array: convert to labeled array, e.g. if it is a normal array, label with enumeration, if it is a JSON object, use field names as labels, ...
...

(I'm using get_ prefix here, but that's just an example, you could also standardize on cast_ or something like that)
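A rough Python sketch of the two array casts described above, assuming a labeled array can be modelled as a list of (label, value) pairs. Both function names come from the comment, but their behavior here is a guess at what "best effort" might mean, not a specification.

```python
from typing import Any, List, Tuple


def get_array(value: Any) -> List[Any]:
    """Best-effort cast to an array: arrays pass through, anything else is wrapped."""
    if isinstance(value, list):
        return value
    # Design decision: None, strings and objects are each wrapped as one element.
    return [value]


def get_labeled_array(value: Any) -> List[Tuple[Any, Any]]:
    """Best-effort cast to a labeled array, modelled here as (label, value) pairs."""
    if isinstance(value, dict):
        # JSON object: use the field names as labels.
        return list(value.items())
    # Anything else: cast to an array first, then label by position (0, 1, ...).
    return list(enumerate(get_array(value)))
```

Even this small sketch makes choices that would have to be specified: should `None` become `[None]` or `[]`, should a string be wrapped or split into characters, should enumeration labels start at 0 or 1? Leaving them implicit is exactly the interoperability concern about undocumented "best effort" behavior.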

m-mohr · 2021-04-27T15:00:52Z

I think this issue is a better place for the discussion than a merged PR.

I agree with @jdries comment in Open-EO/openeo-python-driver#64 (comment) : a get_type process is not that useful on its own (in the context of a process graph) and it requires a lot more (boilerplate) logic to do something useful.

Indeed, you'd usually need if and eq / neq to do A or B depending on the type. I think such cases will come up more and more once UDPs become more established and people implement their own UDPs.
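To make this boilerplate concrete, here is a sketch of such a branch as an openEO process graph, written as a Python dict and serialised to JSON. The `get_type` process and its `data` argument are hypothetical (that is what this issue proposes), the `geometries` parameter is made up, and the `GeoJSON` format name is an assumption; `eq`, `if` and `load_uploaded_file` are existing processes. Three extra nodes are needed just to choose between two inputs.

```python
import json

# Hypothetical process graph: if the parameter "geometries" is a string, treat it
# as the path of an uploaded file; otherwise pass it through unchanged.
process_graph = {
    "gettype": {
        # get_type is the process proposed in this issue; its argument name is assumed.
        "process_id": "get_type",
        "arguments": {"data": {"from_parameter": "geometries"}},
    },
    "isstring": {
        "process_id": "eq",
        "arguments": {"x": {"from_node": "gettype"}, "y": "string"},
    },
    "load": {
        "process_id": "load_uploaded_file",
        # Supported file format names depend on the back-end; "GeoJSON" is an assumption.
        "arguments": {"path": {"from_parameter": "geometries"}, "format": "GeoJSON"},
    },
    "choose": {
        "process_id": "if",
        "arguments": {
            "value": {"from_node": "isstring"},
            "accept": {"from_node": "load"},
            "reject": {"from_parameter": "geometries"},
        },
        "result": True,
    },
}

print(json.dumps(process_graph, indent=2))
```

Whether the `load` node is evaluated even when `geometries` is not a string depends on how a back-end evaluates `if`, which is the question later split off into #246.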

Maybe it is more useful to cover use cases [...] with "cast" or "coerce" processes that transform "any" input in best effort fashion to a desired target type.

Of course, this "best effort" needs to be documented clearly and implemented fully, but then it's not "best effort" anymore. Without clear documentation, you can't write proper tests, and implementations may fail the tests (see #204). This is an issue for interoperability.

That way the user can directly express what data type they actually want (which is the most important use case for type checking, I would guess).

I'm not sure this is really correct. It is true that after a certain number of steps you usually end up with one specific outcome (return value), but I'm not sure we can solve all cases with cast processes. I'd assume there are cases where you do different things depending on whether you work on, for example, raster or vector, number or string, ... But we need to find many more use cases to confirm this.

get_geometry (like we are experimenting with in Open-EO/openeo-python-driver#64): construct a geojson object from a file path, url, a bounding box object, an existing geojson object, ...

Such a process sounds pretty heavyweight and is basically a replacement for a number of other processes. The question is also where its scope ends. You already mention file path (i.e. load_uploaded_file), URL (not available yet), bounding box (not available yet) and GeoJSON (direct use). One could add loading from batch job results as well (load_results). That's a lot to document and implement in a single process when other processes already cover these cases. Would the return value be a vector data cube? GeoJSON is obviously not the answer (see #235).
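A rough Python sketch of `get_geometry` handling just two of the listed inputs shows how quickly such a process grows. It returns GeoJSON purely for simplicity, not as an answer to the return-type question. The function, the WGS 84 assumption and the rejection of other inputs are hypothetical; every remaining input kind is left as a stub.

```python
from typing import Any, Dict

# GeoJSON "type" values as defined in RFC 7946.
GEOJSON_TYPES = {
    "Point", "MultiPoint", "LineString", "MultiLineString",
    "Polygon", "MultiPolygon", "GeometryCollection",
    "Feature", "FeatureCollection",
}


def get_geometry(value: Any) -> Dict[str, Any]:
    """Hypothetical best-effort cast to GeoJSON, covering only two input kinds."""
    if isinstance(value, dict):
        if all(k in value for k in ("west", "south", "east", "north")):
            w, s, e, n = (value[k] for k in ("west", "south", "east", "north"))
            # Assumes WGS 84 coordinates: any "crs" field is ignored, and boxes
            # crossing the antimeridian (west > east) are not handled.
            # Closed exterior ring, counterclockwise as RFC 7946 recommends.
            ring = [[w, s], [e, s], [e, n], [w, n], [w, s]]
            return {"type": "Polygon", "coordinates": [ring]}
        if value.get("type") in GEOJSON_TYPES:
            return value  # already GeoJSON: pass through unchanged
    if isinstance(value, str):
        # A file path (load_uploaded_file), a URL or a batch job result
        # (load_results) would each need its own loading and parsing branch.
        raise NotImplementedError(f"cannot load a geometry from {value!r} in this sketch")
    raise TypeError(f"unsupported input for get_geometry: {type(value).__name__}")
```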

get_array: convert given input to an array, e.g. if you pass a single int: wrap it in array

get_labeled_array: convert to labeled array, e.g. if it is a normal array, label with enumeration, if it is a JSON object, use field names as labels, ...

...

There would be so many potential conversions that I'm not sure we want to go down this route.

soxofaan · 2021-04-28T12:04:16Z

Indeed, you'd usually need if and eq / neq to do A or B depending on the type. I think such cases will come up more and more once UDPs become more established and people implement their own UDPs.

That doesn't seem to align with openEO's promise to make things easy for the end user. You give them a low-level get_type construct, but they have to add a lot of boilerplate to make it useful. They could indeed use UDPs to hide that, but UDPs are not trivial to define, manage and use, especially for novices. The fact that you expect users to put it in UDPs illustrates that get_type is too low-level for general usage.

Apart from the usability issues, which are a bit subjective of course, there is also a technical challenge: "if" in openEO is currently a process (aka function), which in this case is subtly different from a traditional if-else construct. After rewriting this paragraph over and over, I decided to spin it off into a separate issue: #246
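The difference can be sketched in Python, assuming a back-end evaluates all arguments of a process before calling it, as an ordinary function call does. `if_eager`, `if_lazy` and the helper functions are hypothetical; how openEO back-ends actually evaluate `if` is what #246 asks. Under eager evaluation, a type check cannot protect a branch that is only valid for the other type.

```python
from typing import Any, Callable


def if_eager(value: bool, accept: Any, reject: Any = None) -> Any:
    """`if` as an ordinary function: both branches were already evaluated by the caller."""
    return accept if value else reject


def if_lazy(value: bool, accept: Callable[[], Any],
            reject: Callable[[], Any] = lambda: None) -> Any:
    """`if` as a control-flow construct: only the selected branch is evaluated."""
    return accept() if value else reject()


def first_or_self_eager(x: Any) -> Any:
    # x[0] is evaluated before if_eager is even called, so a non-list x fails.
    return if_eager(isinstance(x, list), x[0], x)


def first_or_self_lazy(x: Any) -> Any:
    # Branches are wrapped in lambdas, so x[0] only runs when x is a list.
    return if_lazy(isinstance(x, list), lambda: x[0], lambda: x)
```

With the lazy variant, `first_or_self_lazy(5)` simply returns `5`, while `first_or_self_eager(5)` raises a `TypeError` even though the type check would have selected the other branch.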

m-mohr · 2021-04-28T14:25:14Z

I agree that in the best case get_type would not be required, but I don't see a good solution yet.

m-mohr added the new process label Apr 27, 2021

m-mohr added this to the 1.1.0 milestone Apr 27, 2021

m-mohr self-assigned this Apr 27, 2021

m-mohr mentioned this issue Apr 27, 2021

Support feature collection in read vector Open-EO/openeo-python-driver#64

Merged

m-mohr mentioned this issue Apr 27, 2021

load_result: Align with load_collection #220

Closed

soxofaan mentioned this issue Apr 28, 2021

if: branching behavior? #246

Closed

m-mohr modified the milestones: 1.1.0, 1.2.0 May 18, 2021

soxofaan mentioned this issue May 19, 2021

define aggregate_bbox method Open-EO/openeo-python-client#203

Open

m-mohr removed their assignment Jun 28, 2021

m-mohr modified the milestones: 1.2.0, 1.3.0 Oct 25, 2021

m-mohr modified the milestones: 1.3.0, 2.0.0, 2.1.0 Feb 1, 2023

m-mohr modified the milestones: 2.1.0, future Mar 8, 2023
