OCM dag failure: environment_new.txt file missing #253

Open
venkataanil opened this issue Sep 6, 2022 · 2 comments
Labels: bug Something isn't working

venkataanil commented Sep 6, 2022

The environment_new.txt file is missing even though it was copied (DAG log: http://airflow.apps.sailplane.perf.lab.eng.rdu2.redhat.com/log?dag_id=ocm&task_id=api-load&execution_date=2022-08-26T00%3A00%3A00%2B00%3A00 )

[2022-09-01, 20:00:25 EDT] {subprocess.py:92} INFO - cat: /tmp/environment_new.txt: No such file or directory
[2022-09-01, 20:00:25 EDT] {subprocess.py:92} INFO - cat /tmp/environment_new.txt
[2022-09-01, 20:00:25 EDT] {subprocess.py:92} INFO - cat: /tmp/environment_new.txt: No such file or directory
[2022-09-01, 20:00:25 EDT] {subprocess.py:92} INFO - Creating aws key with admin user for OCM testing

It looks like https://github.com/cloud-bulldozer/airflow-kubernetes/blob/master/dags/nocp/scripts/run_ocm_benchmark.sh#L27 is removing all the files in the /tmp folder.
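
I have not verified what that line actually does, but a blanket /tmp cleanup of the following shape would explain the missing file. This is a hypothetical sketch, not the real script; the UUID variable name is an assumption:

# Hypothetical sketch only, not the actual run_ocm_benchmark.sh contents.
# A blanket cleanup like this deletes /tmp/environment_new.txt along with
# everything else under /tmp:
rm -rf /tmp/*

# Scoping the cleanup to the per-run directory would leave the shared
# environment file in place (UUID here is an assumed variable name):
rm -rf "/tmp/${UUID}"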

This happened earlier when the DAG was run through the auto schedule (
http://airflow.apps.sailplane.perf.lab.eng.rdu2.redhat.com/log?dag_id=ocm&task_id=api-load&execution_date=2022-08-19T00%3A00%3A00%2B00%3A00 ).

However, it did not happen when the DAG was triggered manually (http://airflow.apps.sailplane.perf.lab.eng.rdu2.redhat.com/log?dag_id=ocm&task_id=api-load&execution_date=2022-08-26T05%3A45%3A17.720087%2B00%3A00 ).

venkataanil added a commit to venkataanil/airflow-kubernetes that referenced this issue Nov 8, 2022
Dag run is successful when triggered manually and failing only
when auto scheduled. I got a successful run in my playground with
these changes, so pushing a PR to run on the airflow playground.

More details about this issue -
cloud-bulldozer#253
dry923 pushed a commit that referenced this issue Nov 8, 2022

venkataanil (Collaborator, Author) commented:
PR #269 got merged.

Still, today's CI run failed.

Errors in the log:

[2022-11-16, 05:00:20 EST] {subprocess.py:92} INFO - Cloning into '/tmp/c01bb686-api-load-20221116'...
[2022-11-16, 05:00:20 EST] {subprocess.py:92} INFO - error: could not lock config file /tmp/c01bb686-api-load-20221116/.git/config: No such file or directory
[2022-11-16, 05:00:20 EST] {subprocess.py:92} INFO - fatal: could not set 'core.repositoryformatversion' to '0'
[2022-11-16, 05:00:20 EST] {subprocess.py:92} INFO - -bash: line 6: cd: /tmp/c01bb686-api-load-20221116: No such file or directory
[2022-11-16, 05:00:20 EST] {subprocess.py:92} INFO - Building the binary
[2022-11-16, 05:00:20 EST] {subprocess.py:92} INFO - go mod download: no modules specified (see 'go help mod download')
[2022-11-16, 05:00:20 EST] {subprocess.py:92} INFO - make: *** No targets specified and no makefile found. Stop.

A very strange scenario is happening here.

It looks like "mkdir /tmp/{UUID}", i.e. "mkdir /tmp/c01bb686-api-load-20221116", failed, which then caused the cloning of ocm-api-load to fail.

The strange part is that the code inside run_ocm_benchmark shows "c01bb686-api-load-20221116" as the UUID, whereas "cat /tmp/environment_new.txt" shows "303e39cf-api-load-20221116".

I can log in to the system and see /tmp/303e39cf-api-load-20221116 with the ocm-api-load code, and I can even see a built binary (so the make command was issued by CI there). The dates on these files match this run (so it is not stale old code):

[root@airflow-ocm-jumphost 303e39cf-api-load-20221116]# ls -al --full-time build/
total 30260
drwxr-xr-x.  2 root root       27 2022-11-16 10:00:23.335548381 +0000 .
drwxr-xr-x. 10 root root     4096 2022-11-16 10:00:23.373548300 +0000 ..
-rwxr-xr-x.  1 root root 30978423 2022-11-16 10:00:23.335548381 +0000 ocm-load-test

[root@airflow-ocm-jumphost 303e39cf-api-load-20221116]# ls -al --full-time 
total 152
drwxr-xr-x. 10 root root  4096 2022-11-16 10:00:23.373548300 +0000 .
drwxrwxrwt.  5 root root   133 2022-11-16 11:20:00.055189395 +0000 ..
-rwxr-xr-x.  1 root root 27457 2022-11-16 10:00:20.588554213 +0000 automation.py
drwxr-xr-x.  2 root root    27 2022-11-16 10:00:23.335548381 +0000 build
drwxr-xr-x.  3 root root    23 2022-11-16 10:00:20.588554213 +0000 ci
drwxr-xr-x.  2 root root    59 2022-11-16 10:00:20.589554211 +0000 cmd
-rw-r--r--.  1 root root  1800 2022-11-16 10:00:20.589554211 +0000 config.example.yaml
-rw-r--r--.  1 root root   965 2022-11-16 10:00:20.588554213 +0000 Dockerfile
drwxr-xr-x.  8 root root   163 2022-11-16 10:00:20.592554204 +0000 .git
drwxr-xr-x.  3 root root    45 2022-11-16 10:00:20.588554213 +0000 .github
-rw-r--r--.  1 root root   135 2022-11-16 10:00:20.588554213 +0000 .gitignore
-rw-r--r--.  1 root root  6590 2022-11-16 10:00:20.589554211 +0000 go.mod
-rw-r--r--.  1 root root 69130 2022-11-16 10:00:20.590554209 +0000 go.sum
drwxr-xr-x.  2 root root    78 2022-11-16 10:00:20.590554209 +0000 hack
drwxr-xr-x.  2 root root    56 2022-11-16 10:00:20.590554209 +0000 image_resources
-rw-r--r--.  1 root root 11358 2022-11-16 10:00:20.588554213 +0000 LICENSE
-rw-r--r--.  1 root root  2430 2022-11-16 10:00:20.588554213 +0000 Makefile
drwxr-xr-x. 10 root root   117 2022-11-16 10:00:20.592554204 +0000 pkg
-rw-r--r--.  1 root root 10417 2022-11-16 10:00:20.588554213 +0000 README.md
-rw-r--r--.  1 root root    68 2022-11-16 10:00:20.592554204 +0000 requirements.txt

But the last run didn't have this strange issue (i.e. the UUID was the same while cloning and also inside /tmp/environment_new.txt):

[2022-11-09, 05:00:20 EST] {subprocess.py:92} INFO - Cloning into '/tmp/a08b9445-api-load-20221109'...
[2022-11-09, 05:00:20 EST] {subprocess.py:92} INFO - Building the binary
[2022-11-09, 05:00:21 EST] {subprocess.py:92} INFO - go build -o build/ocm-load-test -ldflags "-X github.com/cloud-bulldozer/ocm-api-load/pkg/cmd.BuildVersion=v0.4.0 -X github.com/cloud-bulldozer/ocm-api-load/pkg/cmd.BuildCommit=5fb7304 -X github.com/cloud-bulldozer/ocm-api-load/pkg/cmd.BuildDate=20221109.100021" cmd/ocm-load-test.go
[2022-11-09, 05:00:23 EST] {subprocess.py:92} INFO - cat /tmp/environment_new.txt

[2022-11-09, 05:00:23 EST] {subprocess.py:92} INFO - UUID=a08b9445-api-load-20221109
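
One way to end up with two different UUIDs like in the failing run is if a later bash block regenerates its own UUID instead of re-reading the value recorded in /tmp/environment_new.txt. The following is only an illustrative sketch of that failure mode, not the actual DAG scripts; the UUID format and file usage are assumptions based on the logs above:

# Illustrative sketch only, not the real automation.
# A first block generates a UUID, records it, and prepares the working dir:
UUID="$(uuidgen | cut -c1-8)-api-load-$(date +%Y%m%d)"
echo "UUID=${UUID}" > /tmp/environment_new.txt
mkdir -p "/tmp/${UUID}"              # e.g. /tmp/303e39cf-api-load-20221116

# A later block that regenerates the UUID instead of re-reading it ends up
# pointing at a directory that was never created:
UUID="$(uuidgen | cut -c1-8)-api-load-$(date +%Y%m%d)"
cd "/tmp/${UUID}"                    # e.g. /tmp/c01bb686-api-load-20221116 -> "No such file or directory"

# Re-reading the recorded value keeps every block in the same directory:
source /tmp/environment_new.txt      # restores the originally recorded UUID
cd "/tmp/${UUID}"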

rsevilla87 pushed a commit to rsevilla87/airflow-kubernetes that referenced this issue Jan 12, 2023
afcollins added the bug Something isn't working label Feb 15, 2023
rsevilla87 pushed a commit to rsevilla87/airflow-kubernetes that referenced this issue Apr 26, 2023
rsevilla87 pushed a commit to rsevilla87/airflow-kubernetes that referenced this issue May 10, 2023
rsevilla87 pushed a commit to rsevilla87/airflow-kubernetes that referenced this issue Jul 10, 2023