
[KOGITO-9940] Add E2E test cases for platform configured with Job Service and Data Index in a combination of scenarios with ephemeral and postgreSQL persistence in dev and production profiles #337

Merged
9 commits merged into apache:main from kogito_9940_e2e_tests on Jan 24, 2024

Conversation

jordigilh (Contributor)

Extends the E2E tests to include coverage for the scenarios where Job Service and Data Index are deployed:

  • Enabled field set to false for both services, with the workflow in the dev profile:
    • With ephemeral persistence.
    • With PostgreSQL persistence.
  • Enabled field set to true for both services, with the workflow in the prod profile:
    • With ephemeral persistence.
    • With PostgreSQL persistence.

Each test case takes between 4 and 5 minutes to run, so I have limited the coverage to these four cases.
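
For illustration, the four combinations can be sketched as a table-driven spec (assuming Ginkgo v2; the entry layout and the runWorkflowTest helper below are hypothetical, not the actual test code in this PR):

package e2e

import (
    . "github.com/onsi/ginkgo/v2"
)

// runWorkflowTest is a hypothetical helper standing in for the real assertions:
// deploy the SonataFlowPlatform and SonataFlow CRs for the given combination,
// wait for the service and workflow pods, then probe their health endpoints.
func runWorkflowTest(profile, persistence string) {
    // deployment and readiness checks elided in this sketch
}

var _ = DescribeTable("platform with Job Service and Data Index",
    runWorkflowTest,
    Entry("dev profile with ephemeral persistence", "dev", "ephemeral"),
    Entry("dev profile with PostgreSQL persistence", "dev", "postgresql"),
    Entry("prod profile with ephemeral persistence", "prod", "ephemeral"),
    Entry("prod profile with PostgreSQL persistence", "prod", "postgresql"),
)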

@ricardozanini (Member) left a comment

Really nice! Just a few minor comments. Thank you!

// Requests/Limits grouping reconstructed; only the values below appeared in the review fragment.
Requests: corev1.ResourceList{
    corev1.ResourceCPU:    resource.MustParse("100m"),
    corev1.ResourceMemory: resource.MustParse("256Mi"),
},
Limits: corev1.ResourceList{
    corev1.ResourceCPU:    resource.MustParse("500m"),
    corev1.ResourceMemory: resource.MustParse("1Gi"),
},
Member

Is there any justification for this change? Have you run a benchmark? @wmedvede, do you have any idea about the DI/JS resource consumption? Can we have a follow-up task to get a more accurate number? I feel this could be too much for a default setup, especially when running locally.

Contributor Author

I got an OOMKill with 256Mi for the DI. I increased it to 512Mi and also raised the CPU limits to try to speed up the deployment, since it takes around 2 minutes 40 seconds for the container to reach ready status, which is significantly more than the Job Service (90 seconds).

It didn't help much on either account. I can reduce it to 512Mi and the CPU limit to 100m if you think that aligns better with your expectations.

Has there been any previous testing of the resource limits for the DI container, or were these values selected on a best-effort basis?

Member

Don't worry about changing these numbers now; we can run a benchmark later and get closer numbers and a good approximation for users depending on their environment.

Contributor Author

Ack. I'll set the memory request and limit to 512Mi to avoid random OOMKills. I wonder if the startup time is caused by the JVM resizing its memory capacity as it runs the code...
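
For reference, a minimal sketch of what a 512Mi request and limit on the Data Index container's ResourceRequirements could look like (the CPU values here are placeholders, not necessarily what ends up in the PR):

package profiles // hypothetical package name, for illustration only

import (
    corev1 "k8s.io/api/core/v1"
    "k8s.io/apimachinery/pkg/api/resource"
)

// Sketch only: the memory request equals the limit (512Mi) so the container is
// scheduled with the memory it needs and random OOMKills are avoided.
// CPU values are placeholders for this example.
var dataIndexResources = corev1.ResourceRequirements{
    Requests: corev1.ResourceList{
        corev1.ResourceCPU:    resource.MustParse("100m"),
        corev1.ResourceMemory: resource.MustParse("512Mi"),
    },
    Limits: corev1.ResourceList{
        corev1.ResourceCPU:    resource.MustParse("500m"),
        corev1.ResourceMemory: resource.MustParse("512Mi"),
    },
}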

Contributor

I can confirm that on an OCP cluster the data index service failed to start with the default values set by the operator, and I had to set the resource limits explicitly in the platform CR to make it start:

  services:
    dataIndex:
      enabled: true
      podTemplate:
        container:
          image: "quay.io/kiegroup/kogito-data-index-postgresql-nightly:latest" # To be removed when stable version is released
          resources:
            limits:
              cpu: 500m
              memory: 512Mi
      persistence:

test/e2e/workflow_test.go (outdated review thread, resolved)
@ricardozanini (Member)

Please don't merge until @domhanak reviews it, as I'm on PTO.

@ricardozanini (Member)

@jordigilh were you able to run the tests locally? It seems that we have a build problem.

@jordigilh (Contributor Author)

@jordigilh were you able to run the tests locally? It seems that we have a build problem.

Yes, but with Go 1.20. I see that one of the functions I used is not supported in 1.19. I'll fix that.

@jordigilh (Contributor Author)

@ricardozanini Running the e2e test suite locally, I'm getting an error for the existing e2e tests:

$> kubectl logs -f greeting-748857df-6tztv -n sonataflow-operator-system
Starting the Java application using /opt/jboss/container/java/run/run-java.sh ...
INFO exec -a "java" java -Dquarkus.http.host=0.0.0.0 -Djava.util.logging.manager=org.jboss.logmanager.LogManager -cp "." -jar /deployments/quarkus-run.jar
INFO running in /deployments
Exception in thread "main" java.lang.reflect.InvocationTargetException
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at io.quarkus.bootstrap.runner.QuarkusEntryPoint.doRun(QuarkusEntryPoint.java:61)
	at io.quarkus.bootstrap.runner.QuarkusEntryPoint.main(QuarkusEntryPoint.java:32)
Caused by: java.lang.UnsupportedClassVersionError: org/kie/kogito/addons/quarkus/k8s/config/KubernetesAddonConfigSource has been compiled by a more recent version of the Java Runtime (class file version 61.0), this version of the Java Runtime only recognizes class file versions up to 55.0
	at java.base/java.lang.ClassLoader.defineClass1(Native Method)
	at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1022)
	at io.quarkus.bootstrap.runner.RunnerClassLoader.loadClass(RunnerClassLoader.java:105)
	at io.quarkus.bootstrap.runner.RunnerClassLoader.loadClass(RunnerClassLoader.java:65)
	at io.quarkus.runtime.configuration.RuntimeConfigSource.getConfigSources(RuntimeConfigSource.java:19)

It seems the greeting container has been rebuilt for a newer version of Java (class file version 61 corresponds to Java 17, while the runtime in the image only supports up to 55, i.e. Java 11).

@wmedvede (Contributor) left a comment

LGTM

Just one comment from my side: I rebased this locally with main, executed the e2e tests, and got the error below. It looks like one test is not passing, but maybe it's my local environment.

SonataFlow Operator Validate that Platform services and flows are running successfully when creating a simple workflow [It] with both Job Service and Data Index and postgreSQL persistence and the workflow in a production profile
/home/wmedvede/development/projects/kogito/kogito-serverless-operator/test/e2e/workflow_test.go:357

[FAILED] Timed out after 300.002s.
Expected success, but got an error:
<*errors.errorString | 0xc00037cab0>:
kubectl wait pod -n test-459 -l sonataflow.org/workflow-app --for condition=Ready --timeout=30s failed with error: (exit status 1) error: no matching resources found

  {
      s: "kubectl wait pod -n test-459 -l sonataflow.org/workflow-app --for condition=Ready --timeout=30s failed with error: (exit status 1) error: no matching resources found\n",
  }

In [It] at: /home/wmedvede/development/go/go1.20.4/src/reflect/value.go:586 @ 01/12/24 16:20:22.891

Summarizing 1 Failure:
[FAIL] SonataFlow Operator Validate that Platform services and flows are running successfully when creating a simple workflow [It] with both Job Service and Data Index and postgreSQL persistence and the workflow in a production profile
/home/wmedvede/development/go/go1.20.4/src/reflect/value.go:586

Ran 7 of 7 Specs in 1476.462 seconds
FAIL! -- 6 Passed | 1 Failed | 0 Pending | 0 Skipped
--- FAIL: TestE2E (1476.46s)
FAIL
FAIL command-line-arguments 1476.476s
FAIL
make: *** [Makefile:349: test-e2e] Error 1

@jordigilh jordigilh force-pushed the kogito_9940_e2e_tests branch 2 times, most recently from 656c722 to ebb4dc3 on January 13, 2024 04:23
@domhanak (Contributor)

The PR check also complains about missing headers in some files.

@jordigilh (Contributor Author)

The PR check also complains about missing headers in some files.

Fixed 😄

@jordigilh (Contributor Author)

LGTM

Just one comment from my side: I rebased this locally with main, executed the e2e tests, and got the error below. It looks like one test is not passing, but maybe it's my local environment.

SonataFlow Operator Validate that Platform services and flows are running successfully when creating a simple workflow [It] with both Job Service and Data Index and postgreSQL persistence and the workflow in a production profile /home/wmedvede/development/projects/kogito/kogito-serverless-operator/test/e2e/workflow_test.go:357

[FAILED] Timed out after 300.002s. Expected success, but got an error: <*errors.errorString | 0xc00037cab0>: kubectl wait pod -n test-459 -l sonataflow.org/workflow-app --for condition=Ready --timeout=30s failed with error: (exit status 1) error: no matching resources found

  {
      s: "kubectl wait pod -n test-459 -l sonataflow.org/workflow-app --for condition=Ready --timeout=30s failed with error: (exit status 1) error: no matching resources found\n",
  }

In [It] at: /home/wmedvede/development/go/go1.20.4/src/reflect/value.go:586 @ 01/12/24 16:20:22.891

Summarizing 1 Failure: [FAIL] SonataFlow Operator Validate that Platform services and flows are running successfully when creating a simple workflow [It] with both Job Service and Data Index and postgreSQL persistence and the workflow in a production profile /home/wmedvede/development/go/go1.20.4/src/reflect/value.go:586

Ran 7 of 7 Specs in 1476.462 seconds FAIL! -- 6 Passed | 1 Failed | 0 Pending | 0 Skipped --- FAIL: TestE2E (1476.46s) FAIL FAIL command-line-arguments 1476.476s FAIL make: *** [Makefile:349: test-e2e] Error 1

Re-ran them one more time with success...

Ran 7 of 7 Specs in 1033.880 seconds
SUCCESS! -- 7 Passed | 0 Failed | 0 Pending | 0 Skipped
--- PASS: TestE2E (1033.88s)
PASS
ok  	command-line-arguments	1034.430s

I can only guess that the problem you found was that the data index or job service pods failed to deploy. If you see this again, please capture the operator logs and the platform CR so I can troubleshoot it.

@domhanak (Contributor)

domhanak commented Jan 16, 2024

So it looks like there is a consistent failure on the PR check (3 out of 3 reruns):

SonataFlow Operator Validate that Platform services and flows are running successfully when creating a simple workflow [It] with both Job Service and Data Index and ephemeral persistence and the workflow in a dev profile:
  [FAILED] No container was found that could respond to the health endpoint failed to execute curl command against health endpoint in container data-index-service:invalid character 'I' looking for beginning of value; %!!(MISSING)w(<nil>)
  Unexpected error:
      <*errors.errorString | 0xc00051a2d0>: 
      failed to execute curl command against health endpoint in container data-index-service:invalid character 'I' looking for beginning of value; %!w(<nil>)
      {
          s: "failed to execute curl command against health endpoint in container data-index-service:invalid character 'I' looking for beginning of value; %!w(<nil>)",
      }
  occurred

I am currently not sure why this is happening; locally it passes. It should be investigated to keep the CI stable.

@domhanak (Contributor) left a comment

LGTM. Thank you @jordigilh! Please rebase so the CI executes these tests after the kind migration.

@jordigilh (Contributor Author)

So it looks like there is a consistent failure on the PR check (3 out of 3 reruns):

SonataFlow Operator Validate that Platform services and flows are running successfully when creating a simple workflow [It] with both Job Service and Data Index and ephemeral persistence and the workflow in a dev profile:
  [FAILED] No container was found that could respond to the health endpoint failed to execute curl command against health endpoint in container data-index-service:invalid character 'I' looking for beginning of value; %!!(MISSING)w(<nil>)
  Unexpected error:
      <*errors.errorString | 0xc00051a2d0>: 
      failed to execute curl command against health endpoint in container data-index-service:invalid character 'I' looking for beginning of value; %!w(<nil>)
      {
          s: "failed to execute curl command against health endpoint in container data-index-service:invalid character 'I' looking for beginning of value; %!w(<nil>)",
      }
  occurred

I am currently not sure why this is happening; locally it passes. It should be investigated to keep the CI stable.

It's because the e2e job sets DEBUG=true as an environment variable, which makes kubectl add log entries to the output of the kubectl exec -it command and causes parsing issues. This is an example of what is returned from the command in a test run:

  running: kubectl --v=0 exec -t callbackstatetimeouts-7bf6f9f7f6-hg48m -n test-511 -c workflow -- curl -s localhost:8080/q/health
  I0118 18:50:33.207120   46502 log.go:194] (0x140000e6420) (0x140004a4e60) Create stream
  I0118 18:50:33.207262   46502 log.go:194] (0x140000e6420) (0x140004a4e60) Stream added, broadcasting: 1
  I0118 18:50:33.208525   46502 log.go:194] (0x140000e6420) Reply frame received for 1
  I0118 18:50:33.208535   46502 log.go:194] (0x140000e6420) (0x1400072e000) Create stream
  I0118 18:50:33.208538   46502 log.go:194] (0x140000e6420) (0x1400072e000) Stream added, broadcasting: 3
  I0118 18:50:33.209173   46502 log.go:194] (0x140000e6420) Reply frame received for 3
  I0118 18:50:33.209181   46502 log.go:194] (0x140000e6420) (0x140004f6460) Create stream
  I0118 18:50:33.209183   46502 log.go:194] (0x140000e6420) (0x140004f6460) Stream added, broadcasting: 5
  I0118 18:50:33.209632   46502 log.go:194] (0x140000e6420) Reply frame received for 5
  I0118 18:50:33.245994   46502 log.go:194] (0x140000e6420) Data frame received for 3
  I0118 18:50:33.246002   46502 log.go:194] (0x1400072e000) (3) Data frame handling
  I0118 18:50:33.246006   46502 log.go:194] (0x1400072e000) (3) Data frame sent
  {
      "status": "UP",
      "checks": [
          {
              "name": "SmallRye Reactive Messaging - liveness check",
              "status": "UP"
          },
          {
              "name": "alive",
              "status": "UP"
          },
          {
              "name": "Database connections health check",
              "status": "UP",
              "data": {
                  "<default>": "UP"
              }
          },
          {
              "name": "SmallRye Reactive Messaging - readiness check",
              "status": "UP"
          },
          {
              "name": "SmallRye Reactive Messaging - startup check",
              "status": "UP"
          }
      ]
  }I0118 18:50:33.246382   46502 log.go:194] (0x140000e6420) Data frame received for 3
  I0118 18:50:33.246389   46502 log.go:194] (0x1400072e000) (3) Data frame handling
  I0118 18:50:33.246398   46502 log.go:194] (0x140000e6420) Data frame received for 5
  I0118 18:50:33.246400   46502 log.go:194] (0x140004f6460) (5) Data frame handling
  I0118 18:50:33.247459   46502 log.go:194] (0x140000e6420) Data frame received for 1
  I0118 18:50:33.247468   46502 log.go:194] (0x140004a4e60) (1) Data frame handling
  I0118 18:50:33.247471   46502 log.go:194] (0x140004a4e60) (1) Data frame sent
  I0118 18:50:33.247475   46502 log.go:194] (0x140000e6420) (0x140004a4e60) Stream removed, broadcasting: 1
  I0118 18:50:33.247478   46502 log.go:194] (0x140000e6420) Go away received
  I0118 18:50:33.247571   46502 log.go:194] (0x140000e6420) (0x140004a4e60) Stream removed, broadcasting: 1
  I0118 18:50:33.247579   46502 log.go:194] (0x140000e6420) (0x1400072e000) Stream removed, broadcasting: 3
  I0118 18:50:33.247583   46502 log.go:194] (0x140000e6420) (0x140004f6460) Stream removed, broadcasting: 5

I noticed this while troubleshooting #322 . I removed the env variable in the job and the problem disappeared. If that variable is required for other reasons, I can resort to setting it to false before calling the make test-e2e target. Let me know what you think.
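
For illustration only (the fix here was simply dropping DEBUG=true), the sketch below shows why the klog-prefixed output breaks json.Unmarshal with the "invalid character 'I'" error, and one hypothetical way a parser could strip those lines and keep just the JSON payload:

package main

import (
    "encoding/json"
    "fmt"
    "strings"
)

// extractJSON returns the substring from the first '{' to the last '}',
// discarding the "I0118 ..." klog lines that kubectl prints around the
// payload when verbose logging is enabled. Hypothetical helper, not the
// operator's actual parsing code.
func extractJSON(raw string) (string, error) {
    start := strings.Index(raw, "{")
    end := strings.LastIndex(raw, "}")
    if start == -1 || end == -1 || end < start {
        return "", fmt.Errorf("no JSON object found in output")
    }
    return raw[start : end+1], nil
}

func main() {
    raw := "I0118 18:50:33.207120 log.go:194] Create stream\n" +
        "{\"status\": \"UP\"}\n" +
        "I0118 18:50:33.246382 log.go:194] Data frame received"

    var health map[string]interface{}
    // Unmarshalling the raw output fails: the first byte is 'I', hence
    // "invalid character 'I' looking for beginning of value".
    fmt.Println(json.Unmarshal([]byte(raw), &health))

    // Stripping the log lines first makes the payload parseable again.
    body, _ := extractJSON(raw)
    fmt.Println(json.Unmarshal([]byte(body), &health), health["status"]) // <nil> UP
}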

@ricardozanini (Member)

We should remove this DEBUG=true var for now; once we migrate to BDD we will have a much better debugging/logging approach. Thank you, @jordigilh!

@jordigilh jordigilh force-pushed the kogito_9940_e2e_tests branch 2 times, most recently from 07accbf to ed83260 on January 24, 2024 03:56
@ricardozanini (Member)

@jordigilh just one last generation check and we should be good!

@jordigilh jordigilh force-pushed the kogito_9940_e2e_tests branch 4 times, most recently from 888df95 to d6993c1 on January 24, 2024 18:56
…vice and Data Index in a combination of scenarios with ephemeral and postgreSQL persistence in dev and production profiles

Signed-off-by: Jordi Gil <[email protected]>
…hen running the ephemeral postgres

Signed-off-by: Jordi Gil <[email protected]>
…lth status since some finish quicker than the time it takes for the logic to evaluate the health endpoint and causes a test failure

Signed-off-by: Jordi Gil <[email protected]>
@jordigilh (Contributor Author)

@ricardozanini can we merge this PR? It's green.

@ricardozanini ricardozanini merged commit cccb7b2 into apache:main Jan 24, 2024
4 checks passed
rgdoliveira pushed a commit to rgdoliveira/kogito-serverless-operator that referenced this pull request Jan 29, 2024
…vice and Data Index in a combination of scenarios with ephemeral and postgreSQL persistence in dev and production profiles (apache#337)