[issue-368] Knative integration with DataIndex and JobService #467
Conversation
Thank you for taking care of this implementation. I'm sorry for the number of comments, but I guess you're getting familiar with the code base.
Let's try not to change the base API for object management. The client reference is cached for use across the code base.
To pass the file generation check, make sure to run:
Force-pushed from 23e87a9 to 55dcb24.
Thanks!
Force-pushed from 3c95aac to e6b4601.
Hi @jianrongzhang89, this starts to look good! Some results here:

Case 1: DI, JS, and workflows take the eventing configuration from the platform.

```
NAME   BROKER   SINK   AGE   CONDITIONS   READY   REASON
callbackstatetimeouts-sb   SinkBinding   sinkbindings.sources.knative.dev   broker:default   True
```

@jianrongzhang89 to be continued tomorrow.
@jianrongzhang89 I'll do another review round tomorrow.
Force-pushed from 9832452 to 5153141.
Force-pushed from ba31991 to 1600377.
Hi @jianrongzhang89, I was executing the following use case in OpenShift (before this I had worked with Minikube only), and I can see the following weird behaviour.
But if I wait some time, I start to see the workflow and the JS restarting forever. See some cases here:
@wmedvede this issue is now fixed. Please check again.
Overall, looks good! I think we can turn a few knobs here and there and it should be ready to go for QE to verify.
Force-pushed from 35bffcc to 13b1bbb.
I think we can move on now to validate this PR. @domhanak can you take a look?
Force-pushed from 13b1bbb to 54c472d.
Hi @jianrongzhang89, here are some more test results with the latest changes:

Case 2) is still failing. After deploying the DI, JS, and the workflow, we get the same outcome as before.

Case 5) is now working, with the following considerations:

Trigger name collisions: When we use the SFCP with the cluster-wide instance of the DI, JS, and Broker, all the triggers must be created in the namespace of the SFP that defines these objects (in the example, case5-kn-eventing). This is necessary from the point of view of Knative Eventing. So, if we deploy a workflow "callbackstatetimeouts" in namespace1, a trigger with the name callbackstatetimeouts-callbackevent-trigger will be created in "case5-kn-eventing". Now, if we deploy "callbackstatetimeouts" in namespace2, the operator will try to create callbackstatetimeouts-callbackevent-trigger in "case5-kn-eventing" again and will produce an error. The error only happens, of course, when we use that cluster-wide configuration. Considering that we are working in parallel to support deploying the same workflow name in different namespaces, I think this trigger issue must be fixed as part of this PR; it shouldn't be much work. Note: I think we should include the namespace in the trigger name only for workflows that are deployed in a different namespace.

Trigger name length: It looks like Trigger names (I don't know why) must be no longer than 63 characters. With a longer name, the trigger appears to be created, but it doesn't work, and by querying triggers with "kn trigger list" we can see the following output.

Considering that we are not that far from reaching that limit, I think we must:

Maybe the rationale for that limit is this: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#:~:text=in%20RFC%201035.-,This%20means%20the%20name%20must%3A,start%20with%20an%20alphabetic%20character

On the other hand, SinkBinding names look to support 253 characters, like other k8s objects.
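For context on the 63-character ceiling: it matches the Kubernetes RFC 1035/1123 label rules linked above. A minimal sketch of checking a candidate namespace-qualified trigger name (as suggested in the note above) against those rules using the upstream apimachinery validator; the name value is just an example, not what the operator produces:

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/util/validation"
)

func main() {
	// A namespace-qualified trigger name, as suggested above for workflows
	// deployed outside the platform namespace (example value only).
	name := "namespace1-callbackstatetimeouts-callbackevent-trigger"

	// IsDNS1123Label enforces the 63-character limit (plus charset rules)
	// that Trigger names appear to be subject to.
	if errs := validation.IsDNS1123Label(name); len(errs) > 0 {
		fmt.Printf("invalid trigger name %q: %v\n", name, errs)
		return
	}
	fmt.Printf("%q is a valid 63-character DNS label\n", name)
}
```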
@wmedvede @jianrongzhang89 For the naming problem, let's use
For case 2, there is no broker defined in the platform's spec.eventing field and the workflow does not have a sink defined. In this case, based on our previous discussions, the test is intended to fail. Please see use case #1.1 in the ADR: https://docs.google.com/document/d/1UIfprTNr1fKNhc7ngvbPDzSFdePg-_GPtxxvqsLi2NA/edit?usp=sharing
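As an aside for readers cross-referencing the ADR: the expected failure described above boils down to a sink-resolution decision. A minimal sketch with hypothetical, simplified types (the real operator API differs) could look like:

```go
package example

import "errors"

// Hypothetical, simplified stand-ins for the real operator types.
type Platform struct{ Broker string } // platform-level spec.eventing broker
type Workflow struct{ Sink string }   // workflow-level sink override

// resolveSink mirrors ADR use case #1.1: with no sink on the workflow and
// no broker on the platform, the deployment is expected to fail.
func resolveSink(p Platform, w Workflow) (string, error) {
	if w.Sink != "" {
		return w.Sink, nil
	}
	if p.Broker != "" {
		return p.Broker, nil
	}
	return "", errors.New("no sink on the workflow and no broker on the platform")
}
```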
@ricardozanini @wmedvede this is a great idea. This limitation is indeed required by Knative Eventing itself, as it uses a label with the trigger name in the subscription objects. I updated the PR with a slightly different implementation that uses Knative's ChildName() function to generate the trigger name. It is based on the parent object's (data index, job service, or workflow) UID and makes sure that the generated name won't be longer than 63 characters. Here is an example of the trigger names:

Since these triggers already have labels or owner references that indicate which workflow (or data index or job service) they belong to, it does not seem to me that we need to add that information to the workflow's status. Please let me know if I missed something and this is really required. Thank you.
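For reference, ChildName() lives in knative.dev/pkg/kmeta. A minimal sketch of how it behaves (the parent and suffix values below are illustrative, not the operator's actual inputs):

```go
package main

import (
	"fmt"

	"knative.dev/pkg/kmeta"
)

func main() {
	// kmeta.ChildName concatenates parent+suffix; when the combined name
	// would exceed the Kubernetes 63-character limit, it truncates and
	// appends a hash so the result stays unique and valid.
	parent := "callbackstatetimeouts-with-a-rather-long-workflow-name" // illustrative
	fmt.Println(kmeta.ChildName(parent, "-callbackevent-trigger"))
}
```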
@jianrongzhang89 agreed with the naming approach, many thanks for looking into this! Regarding the test failure, please see my changes here: https://github.com/apache/incubator-kie-kogito-serverless-operator/pull/487/files#diff-4782f3846c145011deb7edc727cc7dd3016d1683615d1f426ce246e57908daf2 It should help fix this bug and have access to
It is, but we have to add an event or a clear status to the object alerting users about this behavior.
@ricardozanini the SonataFlow status already has an error message in the condition: Is there anything additional needed?
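Not claiming this is what the operator does today; purely as an illustration of the kind of signal being asked for above, a controller can surface the problem both as a status condition and as a Kubernetes Event. In this sketch the helper name, condition type, and reason are hypothetical:

```go
package example

import (
	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/client-go/tools/record"
)

// reportEventingMisconfiguration is a hypothetical helper: it records the
// failure as a status condition (visible in the object's status) and as a
// Warning event (visible in `kubectl describe`).
func reportEventingMisconfiguration(obj runtime.Object, conditions *[]metav1.Condition, rec record.EventRecorder, msg string) {
	meta.SetStatusCondition(conditions, metav1.Condition{
		Type:    "Ready", // assumed condition type
		Status:  metav1.ConditionFalse,
		Reason:  "EventingMisconfigured", // hypothetical reason
		Message: msg,
	})
	rec.Event(obj, "Warning", "EventingMisconfigured", msg)
}
```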
Force-pushed from 829321e to 1946ca7.
@jianrongzhang89 please squash these commits so we have only one to merge. That will also make your rebase way easier.
Force-pushed from 6e3ec27 to 2350a95.
Force-pushed from 2d9d955 to e0ca64a.
Hi @jakubschwan, @domhanak, I think this is ready for you guys to try from the execution perspective. Thanks!
Guys, @jianrongzhang89 @ricardozanini I have executed most of the use cases in https://github.com/flows-examples/techpreview2/tree/0ac3f5285fac169bafa0ae84661e6c608e88283c/platforms/data_index_and_jobservice_as_platform_service_postgresql_persistence_knative_eventing, and in general I'd say that most of the issues are fixed and working. Great job @jianrongzhang89! However, I have identified the following situation that does not look good, especially considering that this is probably one of the most frequent setups for Parodos. Full executable use case here:

Summarized use case:

After some time, we try to delete the WFs created before.
We can query the current workflows and keep seeing them there all the time:

```
kubectl get workflows -n case5-kn-eventing-workflows
NAME   PROFILE   VERSION   URL   READY   REASON
```

@jianrongzhang89 Would you mind trying to reproduce?
@jakubschwan @domhanak Would you guys mind double-checking this execution to see if you get the same results?
The code changes in the PR look good to me. Now reviewing the execution of the use cases.
@wmedvede @ricardozanini the workflow deletion hanging issue is now fixed with a minor code change. Could you please test again? Thanks.
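The minor change itself is not shown in the thread, but a deletion that hangs like this is the classic symptom of a finalizer that never gets removed. A generic controller-runtime sketch of the pattern (the finalizer name is hypothetical, not this operator's actual one):

```go
package example

import (
	"context"

	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
)

const finalizer = "example.org/cleanup" // hypothetical finalizer name

// handleDeletion runs cleanup and then removes the finalizer; if the
// finalizer is never removed, the object stays in Terminating forever,
// which is exactly the hang described above.
func handleDeletion(ctx context.Context, c client.Client, obj client.Object) error {
	if obj.GetDeletionTimestamp().IsZero() {
		return nil // object is not being deleted
	}
	if controllerutil.ContainsFinalizer(obj, finalizer) {
		// ... clean up owned resources (e.g., the workflow's triggers) here ...
		controllerutil.RemoveFinalizer(obj, finalizer)
		return c.Update(ctx, obj)
	}
	return nil
}
```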
Force-pushed from db0c4bd to 5b0adea.
Force-pushed from 5b0adea to 0d6c3d1.
…rkflow deletion hanging issue
Force-pushed from 0d6c3d1 to 728b8ea.
Implements the Knative integration for DataIndex and JobService, as well as for workflows, as outlined in the ADR:
https://docs.google.com/document/d/1UIfprTNr1fKNhc7ngvbPDzSFdePg-_GPtxxvqsLi2NA/edit#heading=h.ds8q4xtkmu64
Unit test cases are updated. See the ADR for the common test cases:
https://docs.google.com/document/d/1UIfprTNr1fKNhc7ngvbPDzSFdePg-_GPtxxvqsLi2NA/edit?usp=sharing
Motivation for the change:
Knative integration
Fix #368
Checklist
How to backport a pull request to a different branch?
If something goes wrong, the author will be notified, and at that point a manual backport is needed.