
Fix: generator chained flows #2440

Merged · 12 commits into master · Oct 1, 2024
Conversation

@chrismclarke (Member) commented on Sep 25, 2024

PR Checklist

  • PR title descriptive (can be used in release notes)

Description

Fix an issue where outputs generated from data_pipe or generator operations aren't processed further if the output is itself a data_pipe or generator flow.

Review Notes

As the changes impact the main parser, updates should ideally be tested against a content repo that has recently been synced, to check for any content diffs. The only expected change is that data_pipe flows should no longer be stored within the generated content (these have already been used to generate the populated data_lists, so they aren't useful to the app runtime).

I created a content sync for the latest sheets from the master branch on the debug repo at:
IDEMSInternational/app-debug-content#88

Then, using this branch, see the updated content sync at:
IDEMSInternational/app-debug-content#89

Should also confirm it fixes the issues seen in #2434

Author Notes

Now that data_pipe and generator flows are no longer included in the final output or git content repo, I'm wondering if we may still want to keep some reference to them - maybe just keeping a _generated_from property on the list itself as a reference to the data_pipe or generator flow that created it?

We could also consider whether we want to keep generated outputs from data_pipe operations in the generated data_list subfolder (I'm not sure if the purpose it serves could be better handled by the proposed _generated_from property).
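
For illustration only, a generated data list carrying such a reference might look something like the sketch below. The property name and shape are just the proposal above, not an existing schema in the repo:

```ts
// Hypothetical sketch of the proposed _generated_from metadata - names and shape are illustrative only
interface GeneratedDataList {
  flow_type: "data_list";
  flow_name: string;
  rows: Record<string, any>[];
  // Reference back to the data_pipe or generator flow that produced this list,
  // instead of storing the full parsed flow alongside it
  _generated_from?: { flow_type: "data_pipe" | "generator"; flow_name: string };
}

const exampleList: GeneratedDataList = {
  flow_type: "data_list",
  flow_name: "example_generated_list",
  rows: [{ id: "row_1" }],
  _generated_from: { flow_type: "data_pipe", flow_name: "example_pipe" },
};
```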

Dev Notes

The core of the issue is the way that _generated flows were saved to the local flow and then re-populated to the main list at the end of processing. This failed to trigger any subsequent processing if the generated flow was itself a generator or data_pipe type.

Now, instead of storing lists of _generated flows, the child data_pipe and generator processors take the outputs and send them back to the main flow processor to process as if they were part of the original list.

The largest part of the code change is actually the testing setup, which now enables checking the main parent FlowProcessor output from within an individual processor (e.g. data_pipe). This wasn't well set up for testing, as the parent FlowProcessor is tightly coupled with a caching system that writes files to disk, so a new mock class was created to substitute for the jsonFile cache system (using an in-memory cache instead). In the future it would be best to decouple as much of the processing logic as possible from disk I/O, to make it easier to test and to separate concerns a bit more.
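
As a rough illustration of the testing approach described above, an in-memory substitute for a disk-backed JSON cache could look like the sketch below. The interface and class names are assumptions for illustration; the repo's actual cache API may differ:

```ts
// Hypothetical sketch of an in-memory stand-in for a file-based JSON cache,
// so processing logic can be exercised in tests without disk I/O.
interface IJsonCache {
  get<T>(key: string): T | undefined;
  set<T>(key: string, value: T): void;
  clear(): void;
}

class MockJsonFileCache implements IJsonCache {
  private store = new Map<string, unknown>();

  get<T>(key: string): T | undefined {
    return this.store.get(key) as T | undefined;
  }
  set<T>(key: string, value: T): void {
    this.store.set(key, value);
  }
  clear(): void {
    this.store.clear();
  }
}

// In a test, the mock would be injected wherever the jsonFile cache is normally used
// (the exact constructor/injection shape in the repo is not shown here).
```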

Git Issues

Closes #2434

Screenshots/Videos

If useful, provide screenshot or capture to highlight main changes


// Pass all generated flows to the back of the current processing queue so that they can be
// populated to processed hashmap and referenced from other processes as required
this.flowProcessor.deferInputProcess(flow, deferId);
@chrismclarke (Member, Author) commented:

This is the main change made in both the data_pipe and generator parsers. By pushing the outputs back into the deferred input processor, they are then processed in the same way as synced flows.
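
A minimal sketch of how a deferred-input queue like this might behave is shown below. Only the `deferInputProcess(flow, deferId)` call site above comes from the diff; the internals here are assumptions for illustration, not the actual FlowProcessor implementation:

```ts
// Illustrative-only sketch: child parsers push generated flows back to the main
// processor so they re-enter the same pipeline as originally synced flows.
type Flow = { flow_type: string; flow_name: string; [key: string]: unknown };

class FlowProcessorSketch {
  private deferredQueue: { flow: Flow; deferId: string }[] = [];
  private processedFlows = new Map<string, Flow>();

  /** Queue a generated flow to be processed after the current batch */
  public deferInputProcess(flow: Flow, deferId: string) {
    this.deferredQueue.push({ flow, deferId });
  }

  public process(flows: Flow[]) {
    for (const flow of flows) {
      this.processFlow(flow);
    }
    // Keep draining the queue: a deferred data_pipe/generator may itself defer
    // further outputs (the "chained flows" case this PR fixes)
    while (this.deferredQueue.length > 0) {
      const { flow } = this.deferredQueue.shift()!;
      this.processFlow(flow);
    }
  }

  private processFlow(flow: Flow) {
    // ...delegate to the parser for flow.flow_type (data_pipe, generator, data_list, ...)
    this.processedFlows.set(flow.flow_name, flow);
  }
}
```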

@@ -77,18 +73,18 @@ export class TemplatedData {
* Main data conversion method
* Iterate over data, parse string values and nested objects and arrays recursively
*/
-  public parse<T>(value: T = this.initialValue) {
+  public parse<T>(value: T) {
@chrismclarke (Member, Author) commented:

Additional bug fix: because TemplatedData would parse using an initial value by default, it could hit an infinite loop if the initial value contained an undefined child entry (recursively parsing it would re-populate the initial value each time).

So minor code modifications have been made so that the initialValue is no longer used as a default.
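
A minimal standalone illustration of the failure mode described above (this is a sketch only, not the actual TemplatedData code):

```ts
// Sketch: why defaulting `value` to an initial object that contains an
// undefined child entry can recurse forever.
const initialValue: Record<string, unknown> = { title: "hello", missing: undefined };

function parse(value: unknown = initialValue): unknown {
  if (value && typeof value === "object") {
    return Object.fromEntries(
      // `missing` is undefined, so the recursive call falls back to the default
      // parameter and re-parses initialValue again... and again
      Object.entries(value as Record<string, unknown>).map(([k, v]) => [k, parse(v)])
    );
  }
  return value;
}

// parse() never returns (stack overflow). With the default removed - parse<T>(value: T) -
// an undefined child simply stays undefined instead of re-entering the initial value.
```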

@chrismclarke chrismclarke marked this pull request as ready for review September 25, 2024 23:17
@github-actions github-actions bot removed the fix label Sep 25, 2024
@jfmcquade (Collaborator) left a comment:

Looks good, I think. Good idea to include those content PRs as references.

I can see how a flow doesn't necessarily need to store the history of how it was created (generated and manually authored flows will be treated the same way at app runtime). But at the same time, I can see how having access to the parsed versions of the generator/data pipe sheets could be useful for debugging, and how having version control on these intermediary sheets (i.e. storing them in the git repo) might be useful. @esmeetewinkel have you found yourself debugging pipes/generators by looking at the parsed JSON version at all?

I can see how a _generated_from metadata field (or similar) could be sensible for debugging, and may be sufficient versus storing the intermediary templates. As I haven't used the system much, I'm not sure what would be necessary/useful.

@@ -23,13 +20,23 @@ export class GeneratorParser extends DefaultParser<FlowTypes.GeneratorFlow> {
return;
}
try {
this.flow._generated = this.generateFlows(flow, dataListRows);
// this.handleOutputs(generated);
const generated = this.generateFlows(flow, dataListRows);
@jfmcquade (Collaborator) commented:

Am I right in thinking that the author could have specified a name for the output flow that is identical to the name of another output flow (or pre-existing flow) and that it wouldn't have been picked up until this point? In that case, the most recently processed flow with that name would overwrite the previous one without throwing any errors, so it could be worth handling here

@chrismclarke (Member, Author) replied:

Yes, that's correct. I had originally thought that this would be caught in a previous stage (there's a comment in flowParser.ts that says `// NOTE - duplicate flows are identified up during main converter`), however this (vague) comment only applies as flows are converted from the xlsxParser.

So it does mean that generated flows can directly overwrite a flow with the same name; however, I'm not sure whether we should necessarily treat this as an error. I can imagine a use case for a child deployment using a generator to just overwrite specific files (rather than relying on the slightly more advanced but potentially confusing system for overrides). Although I guess at least in that case you could use the fact that the output target likely has the same name as one of the input sources.

I think addressing all of this properly would probably require a fair amount more work: specifically, adding placeholders for all parsed flows before running the generator, to allow checking whether generating will create a duplicate, and then a secondary level of generator options to specify whether this should be allowed or not.

I'm guessing the best thing to do for now will be to merge as-is and write a low-priority follow-up issue to at least document the potential issues and their handling.
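
For reference, one possible shape for such a duplicate-name guard is sketched below. This is purely an illustration of the idea discussed above, not code from this PR, and the option field is hypothetical:

```ts
// Hypothetical duplicate-name check for generated flows; names are illustrative only.
type Flow = { flow_type: string; flow_name: string };

function registerGeneratedFlow(
  processed: Map<string, Flow>,
  generated: Flow,
  options: { allowOverwrite?: boolean } = {}
) {
  const existing = processed.get(generated.flow_name);
  if (existing && !options.allowOverwrite) {
    // Could alternatively warn and skip, depending on whether intentional
    // overrides (e.g. by child deployments) should be supported
    throw new Error(
      `Generated flow "${generated.flow_name}" would overwrite an existing ${existing.flow_type} flow`
    );
  }
  processed.set(generated.flow_name, generated);
}
```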

@chrismclarke (Member, Author) commented:

Created #2451 for future follow-up

@jfmcquade (Collaborator) replied:

Great, thanks for making the issue. I hadn't considered the case of intentionally overwriting a flow with a generated one, so good to have that possibility captured

@esmeetewinkel (Collaborator) left a comment:

Functional test passed - can confirm this fixes the issues seen in #2434

Indeed, when performing a content sync on this branch all data_pipe and generator sheet JSONs disappear. These (in particular the data pipe sheets) have occasionally been helpful for debugging, so it would be good if these JSONs were still available to inspect somewhere.

> maybe just keeping a _generated_from property on the list itself as a reference to the data_pipe or generator flow that created it?

I don't like the duplication that would happen if we stored the parsed data pipe JSON within every data list it generates. Storing just the data pipe ID as a _generated_from property does make sense to me (along with making the parsed data pipe JSON accessible some other way).

> We could also consider whether we want to keep generated outputs from data_pipe operations in the generated data_list subfolder (not sure if the purpose it serves could be better handled by the proposed _generated_from property)

I am quite happy for this to sit in a generated subfolder; I've found it handy for debugging: since the git changes on a generated data list are always caused by changes on a different sheet / layer, I don't inspect them in detail when running a content sync. Having these data lists sit in a generated subfolder means I can just spot-check a few files and then accept the changes for the entire subfolder.

@esmeetewinkel (Collaborator) commented on Oct 1, 2024

> @esmeetewinkel have you found yourself debugging pipes/generators by looking at the parsed JSON version at all?

Here's an example of a situation where looking at the parsed data_pipe and generator flows helps debugging:

I was facing this console error: [screenshot]

From inspecting the data list JSON it was clear that some @row references had not been populated correctly: [screenshot]

The data pipe / generator JSONs were showing as follows: [screenshot]

The issue was then resolved by adding a third set of brackets around @row.id in this data pipe generator: [screenshot]

@chrismclarke (Member, Author) commented on Oct 1, 2024

> Functional test passed - can confirm this fixes the issues seen in #2434
>
> Indeed, when performing a content sync on this branch all data_pipe and generator sheet JSONs disappear. These (in particular the data pipe sheets) have occasionally been helpful for debugging, so it would be good if these JSONs were still available to inspect somewhere.
>
> > maybe just keeping a _generated_from property on the list itself as a reference to the data_pipe or generator flow that created it?
>
> I don't like the duplication that would happen if we stored the parsed data pipe JSON within every data list it generates. Storing just the data pipe ID as a _generated_from property does make sense to me (along with making the parsed data pipe JSON accessible some other way).
>
> > We could also consider whether we want to keep generated outputs from data_pipe operations in the generated data_list subfolder (not sure if the purpose it serves could be better handled by the proposed _generated_from property)
>
> I am quite happy for this to sit in a generated subfolder; I've found it handy for debugging: since the git changes on a generated data list are always caused by changes on a different sheet / layer, I don't inspect them in detail when running a content sync. Having these data lists sit in a generated subfolder means I can just spot-check a few files and then accept the changes for the entire subfolder.

Makes sense; in that case I'll keep the generator and data_pipe flows within the repo and just update the code that copies content to the app to exclude them (as they're not needed at runtime). I'll also try to add a reference to both input/output flows to keep track of relationships (avoiding duplicating content, just referencing by flow type/name).

I think to make things clearer I'll merge here for now and then address that as a follow-up (created #2452)
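
A rough sketch of the kind of exclusion mentioned above is shown below. The flow type names come from this thread; the function name and the structure of the actual copy step in the repo are assumptions:

```ts
// Illustrative filter: keep data_pipe/generator flows in the content repo,
// but exclude them when copying content for the app runtime.
type Flow = { flow_type: string; flow_name: string };

const RUNTIME_EXCLUDED_FLOW_TYPES = ["data_pipe", "generator"];

function filterFlowsForAppCopy(flows: Flow[]): Flow[] {
  return flows.filter((f) => !RUNTIME_EXCLUDED_FLOW_TYPES.includes(f.flow_type));
}
```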

@chrismclarke chrismclarke merged commit c0cdce2 into master Oct 1, 2024
6 checks passed
@chrismclarke chrismclarke deleted the fix/generator-chained-flows branch October 1, 2024 21:20
Labels
fix · scripts (Work on backend scripts and devops)
Development
Successfully merging this pull request may close these issues:
[BUG] Can't use output of generated data pipe as input of another
3 participants