Hi @tenthe, I think handling nested events is one of the biggest pain points for end users in StreamPipes. I would love to see us find a good general solution to this issue. If we manage to handle this cleanly, I think there are not many downsides, and even if there are, the benefits may outweigh them. Interested to hear the others' thoughts.
-
Hi @tenthe, thanks for starting the discussion. In general, I think that would ease things a lot, assuming there is an easy way to flatten events so that users can integrate arbitrary events from MQTT and other brokers. This would be important for users who have already defined an internal data format based on nested structures. Also, I'm wondering whether there are any ideas on how to better support nested lists, e.g., as we see in the OI4 standard?
-
Hi all, thanks for the constructive feedback. I think there are two topics that we should clarify here.
I think Topic 1 is basically quite clear and we only have to make a few minor API decisions. I'm still quite unsure about Topic 2, as there are many things that need to be taken into account.

Topic 1: Flattening Nested Event Structures

The primary function of the flattening feature would be to convert nested event structures into a flat format, simplifying downstream processing. When a user selects this feature, the event data would be automatically flattened within the adapter before further processing. Here's a simple example of how the flattening process might work.

Example: Given a nested event structure like this:

```json
{
  "sensor_id": "sensor_1",
  "timestamp": "2024-08-20T12:34:56Z",
  "reading": {
    "temperature": 25.3,
    "humidity": 60.2
  },
  "location": {
    "latitude": 40.7128,
    "longitude": -74.0060
  }
}
```

The flattening process would transform this into:

```json
{
  "sensor_id": "sensor_1",
  "timestamp": "2024-08-20T12:34:56Z",
  "reading_temperature": 25.3,
  "reading_humidity": 60.2,
  "location_latitude": 40.7128,
  "location_longitude": -74.0060
}
```

This flattened structure removes the complexity of nested objects. It would become the new event schema, and all events would be transformed in the same way at runtime.

Open Question:
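For Topic 1, the flattening step described above can be sketched in a few lines of Python. This is only an illustration, not StreamPipes code: the function name `flatten` and the `sep` parameter are hypothetical, the `_` separator is taken from the example, and arrays are deliberately left untouched because they are the subject of Topic 2.

```python
from typing import Any, Dict


def flatten(event: Dict[str, Any], parent_key: str = "", sep: str = "_") -> Dict[str, Any]:
    """Recursively flatten nested objects, joining key paths with `sep` (hypothetical helper)."""
    flat: Dict[str, Any] = {}
    for key, value in event.items():
        # Prefix the key with the path of its parent object, if any.
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Nested object: recurse and merge its flattened keys.
            flat.update(flatten(value, new_key, sep))
        else:
            # Primitive value (lists are left as-is in this sketch).
            flat[new_key] = value
    return flat


event = {
    "sensor_id": "sensor_1",
    "timestamp": "2024-08-20T12:34:56Z",
    "reading": {"temperature": 25.3, "humidity": 60.2},
    "location": {"latitude": 40.7128, "longitude": -74.0060},
}
print(flatten(event))
```

Because the output keys depend only on the nesting structure, every event with the same nested shape yields the same flat schema, which is what allows the transformation to be fixed once and applied at runtime.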
Topic 2: Handling Arrays in Nested Structures

Arrays within nested structures pose a unique challenge, especially when they contain objects and we don't know how many elements will be present. One approach could be to iterate over the array and flatten each object, appending its index to the property names. However, this raises the question of whether such a transformation always makes sense, especially when the array length varies or the elements themselves contain nested lists.

Example: Consider an event with an array of objects:

```json
{
  "sensor_id": "sensor_1",
  "measurements": [
    {"type": "temperature", "value": 25.3},
    {"type": "humidity", "value": 60.2}
  ]
}
```

Flattening this could result in:

```json
{
  "sensor_id": "sensor_1",
  "measurements_0_type": "temperature",
  "measurements_0_value": 25.3,
  "measurements_1_type": "humidity",
  "measurements_1_value": 60.2
}
```

This approach preserves the data, but it leads to different structures depending on the number of array elements. We cannot cover this behavior in StreamPipes, especially when the data is stored, because we always expect a fixed event schema.

Open Question:
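To make the schema problem of Topic 2 concrete, here is a minimal sketch of the index-based approach in Python. The `flatten` function and the `_` separator are illustrative assumptions, not an existing StreamPipes API. The sketch flattens objects and arrays alike, then compares two events from the same sensor whose `measurements` arrays have different lengths:

```python
from typing import Any, Dict


def flatten(value: Any, parent_key: str = "", sep: str = "_") -> Dict[str, Any]:
    """Flatten nested objects and arrays; array elements use their index as a key part."""
    if isinstance(value, dict):
        children = list(value.items())
    elif isinstance(value, list):
        # Enumerate array elements so that index 0, 1, ... becomes part of the key.
        children = [(str(i), v) for i, v in enumerate(value)]
    else:
        # Primitive value: store it under the accumulated key path.
        return {parent_key: value}
    flat: Dict[str, Any] = {}
    for key, child in children:
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        flat.update(flatten(child, new_key, sep))
    return flat


event_a = {
    "sensor_id": "sensor_1",
    "measurements": [
        {"type": "temperature", "value": 25.3},
        {"type": "humidity", "value": 60.2},
    ],
}
# Same sensor, but this time only one measurement arrives.
event_b = {
    "sensor_id": "sensor_1",
    "measurements": [{"type": "temperature", "value": 24.9}],
}

keys_a = set(flatten(event_a))
keys_b = set(flatten(event_b))
# Fields that exist in event_a's flat schema but not in event_b's.
print(sorted(keys_a - keys_b))  # → ['measurements_1_type', 'measurements_1_value']
```

The two events carry the same kind of data, yet their flat schemas differ, so a storage layer that expects one fixed schema per stream would see `event_b` as missing fields (or, with a longer array, as having unknown ones).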
Looking forward to further feedback.
-
Hi everyone,
I'd like to propose a shift towards only supporting flat event structures within StreamPipes.
Currently, users have the flexibility to process both flat and nested event structures. While this flexibility is valuable, it also increases the complexity of downstream components, such as pipelines, time-series storage, and dashboards. To simplify this, I suggest removing support for nested event properties.
Furthermore, I think nested structures are mainly relevant for broker data sources such as MQTT and Kafka, whereas for other IIoT data sources, such as OPC UA and PLCs, they are not relevant at all.
Since many components don’t fully support nested structures anyway, I believe this change won’t cause significant issues. Moreover, representing nested structures in time-series databases is particularly challenging, adding to the complexity.
To ensure a smooth transition, adapters should be designed to flatten incoming data before events are created. We could also introduce a flatten function that automates this process for users. This way, users would still be able to connect nested event structures, but adapters would only create flat events for further processing.
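A rough sketch of how such an adapter-side step might behave, assuming the flat schema is derived once from a sample event when the adapter is created. The class `FlatteningAdapter`, its `schema` attribute and `process` method are hypothetical and do not correspond to an existing StreamPipes API; the policy of filling missing fields with `None` and dropping unknown fields is likewise just one possible choice:

```python
from typing import Any, Dict, List, Optional


def flatten(event: Dict[str, Any], parent_key: str = "", sep: str = "_") -> Dict[str, Any]:
    """Recursively flatten nested objects (arrays are left as-is in this sketch)."""
    flat: Dict[str, Any] = {}
    for key, value in event.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, new_key, sep))
        else:
            flat[new_key] = value
    return flat


class FlatteningAdapter:
    """Hypothetical adapter step: fix a flat schema once, then emit only flat events."""

    def __init__(self, sample_event: Dict[str, Any]) -> None:
        # At adapter creation, derive the flat event schema from a sample event.
        self.schema: List[str] = list(flatten(sample_event))

    def process(self, event: Dict[str, Any]) -> Dict[str, Optional[Any]]:
        flat = flatten(event)
        # Conform to the fixed schema: missing fields become None, unknown fields are dropped.
        return {key: flat.get(key) for key in self.schema}


adapter = FlatteningAdapter({"sensor_id": "s1", "reading": {"temperature": 25.3, "humidity": 60.2}})
print(adapter.process({"sensor_id": "s1", "reading": {"temperature": 26.0}, "extra": 1}))
# → {'sensor_id': 's1', 'reading_temperature': 26.0, 'reading_humidity': None}
```

Downstream components (pipelines, time-series storage, dashboards) would then only ever see the flat schema fixed at creation time; whether a mismatch should instead raise an error is an open design decision.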
Are there any potential drawbacks to this change? Does anyone have use cases where nested structures are necessary?
Looking forward to your feedback.