
fix: broker forces connection closure #89

Closed
JaeAeich opened this issue Jan 10, 2024 · 8 comments

Comments

@JaeAeich
Contributor

Describe the bug
While looking at the docker compose logs, I noticed the error below repeated throughout.

prowes-rabbitmq-1       | 2024-01-10 11:09:53.312203+00:00 [error] <0.1519.0> Error on AMQP connection <0.1519.0> (172.18.0.6:33766 -> 172.18.0.3:5672, vhost: '/', user: 'guest', state: running), channel 0:
prowes-rabbitmq-1       | 2024-01-10 11:09:53.312203+00:00 [error] <0.1519.0>  operation none caused a connection exception connection_forced: "broker forced connection closure with reason 'shutdown'"
prowes-rabbitmq-1       | 2024-01-10 11:09:53.312226+00:00 [error] <0.1440.0> Error on AMQP connection <0.1440.0> (172.18.0.6:33694 -> 172.18.0.3:5672, vhost: '/', user: 'guest', state: running), channel 0:
prowes-rabbitmq-1       | 2024-01-10 11:09:53.312226+00:00 [error] <0.1440.0>  operation none caused a connection exception connection_forced: "broker forced connection closure with reason 'shutdown'"
prowes-rabbitmq-1       | 2024-01-10 11:09:53.312253+00:00 [error] <0.1635.0> Error on AMQP connection <0.1635.0> (172.18.0.5:51722 -> 172.18.0.3:5672, vhost: '/', user: 'guest', state: running), channel 0:
prowes-rabbitmq-1       | 2024-01-10 11:09:53.312253+00:00 [error] <0.1635.0>  operation none caused a connection exception connection_forced: "broker forced connection closure with reason 'shutdown'"
prowes-rabbitmq-1       | 2024-01-10 11:09:53.312182+00:00 [error] <0.1643.0> Error on AMQP connection <0.1643.0> (172.18.0.5:51732 -> 172.18.0.3:5672, vhost: '/', user: 'guest', state: running), channel 0:
prowes-rabbitmq-1       | 2024-01-10 11:09:53.312182+00:00 [error] <0.1643.0>  operation none caused a connection exception connection_forced: "broker forced connection closure with reason 'shutdown'"
prowes-rabbitmq-1       | 2024-01-10 11:09:53.312238+00:00 [error] <0.1532.0> Error on AMQP connection <0.1532.0> (172.18.0.6:33792 -> 172.18.0.3:5672, vhost: '/', user: 'guest', state: running), channel 0:
prowes-rabbitmq-1       | 2024-01-10 11:09:53.312238+00:00 [error] <0.1532.0>  operation none caused a connection exception connection_forced: "broker forced connection closure with reason 'shutdown'"
prowes-rabbitmq-1       | 2024-01-10 11:09:53.312209+00:00 [error] <0.1472.0> Error on AMQP connection <0.1472.0> (172.18.0.6:33738 -> 172.18.0.3:5672, vhost: '/', user: 'guest', state: running), channel 0:
prowes-rabbitmq-1       | 2024-01-10 11:09:53.312209+00:00 [error] <0.1472.0>  operation none caused a connection exception connection_forced: "broker forced connection closure with reason 'shutdown'"
prowes-rabbitmq-1       | 2024-01-10 11:09:53.312128+00:00 [error] <0.1451.0> Error on AMQP connection <0.1451.0> (172.18.0.6:33706 -> 172.18.0.3:5672, vhost: '/', user: 'guest', state: running), channel 0:
prowes-rabbitmq-1       | 2024-01-10 11:09:53.312128+00:00 [error] <0.1451.0>  operation none caused a connection exception connection_forced: "broker forced connection closure with reason 'shutdown'"
prowes-rabbitmq-1       | 2024-01-10 11:09:53.312263+00:00 [error] <0.1475.0> Error on AMQP connection <0.1475.0> (172.18.0.6:33750 -> 172.18.0.3:5672, vhost: '/', user: 'guest', state: running), channel 0:
prowes-rabbitmq-1       | 2024-01-10 11:09:53.312263+00:00 [error] <0.1475.0>  operation none caused a connection exception connection_forced: "broker forced connection closure with reason 'shutdown'"
prowes-rabbitmq-1       | 2024-01-10 11:09:53.312308+00:00 [error] <0.1569.0> Error on AMQP connection <0.1569.0> (172.18.0.6:33822 -> 172.18.0.3:5672, vhost: '/', user: 'guest', state: running), channel 0:
prowes-rabbitmq-1       | 2024-01-10 11:09:53.312308+00:00 [error] <0.1569.0>  operation none caused a connection exception connection_forced: "broker forced connection closure with reason 'shutdown'"
prowes-rabbitmq-1       | 2024-01-10 11:09:53.312287+00:00 [error] <0.1469.0> Error on AMQP connection <0.1469.0> (172.18.0.6:33722 -> 172.18.0.3:5672, vhost: '/', user: 'guest', state: running), channel 0:
prowes-rabbitmq-1       | 2024-01-10 11:09:53.312287+00:00 [error] <0.1469.0>  operation none caused a connection exception connection_forced: "broker forced connection closure with reason 'shutdown'"
prowes-rabbitmq-1       | 2024-01-10 11:09:53.312311+00:00 [error] <0.1660.0> Error on AMQP connection <0.1660.0> (172.18.0.5:51740 -> 172.18.0.3:5672, vhost: '/', user: 'guest', state: running), channel 0:
prowes-rabbitmq-1       | 2024-01-10 11:09:53.312311+00:00 [error] <0.1660.0>  operation none caused a connection exception connection_forced: "broker forced connection closure with reason 'shutdown'"
prowes-rabbitmq-1       | 2024-01-10 11:09:53.312314+00:00 [error] <0.1523.0> Error on AMQP connection <0.1523.0> (172.18.0.6:33780 -> 172.18.0.3:5672, vhost: '/', user: 'guest', state: running), channel 0:
prowes-rabbitmq-1       | 2024-01-10 11:09:53.312314+00:00 [error] <0.1523.0>  operation none caused a connection exception connection_forced: "broker forced connection closure with reason 'shutdown'"
prowes-rabbitmq-1       | 2024-01-10 11:09:53.312308+00:00 [error] <0.1543.0> Error on AMQP connection <0.1543.0> (172.18.0.6:33796 -> 172.18.0.3:5672, vhost: '/', user: 'guest', state: running), channel 0:
prowes-rabbitmq-1       | 2024-01-10 11:09:53.312308+00:00 [error] <0.1543.0>  operation none caused a connection exception connection_forced: "broker forced connection closure with reason 'shutdown'"
prowes-rabbitmq-1       | 2024-01-10 11:09:53.312350+00:00 [error] <0.1488.0> Error on AMQP connection <0.1488.0> (172.18.0.6:33762 -> 172.18.0.3:5672, vhost: '/', user: 'guest', state: running), channel 0:
prowes-rabbitmq-1       | 2024-01-10 11:09:53.312350+00:00 [error] <0.1488.0>  operation none caused a connection exception connection_forced: "broker forced connection closure with reason 'shutdown'"
prowes-rabbitmq-1       | 2024-01-10 11:09:53.312295+00:00 [error] <0.1556.0> Error on AMQP connection <0.1556.0> (172.18.0.6:33806 -> 172.18.0.3:5672, vhost: '/', user: 'guest', state: running), channel 0:
prowes-rabbitmq-1       | 2024-01-10 11:09:53.312295+00:00 [error] <0.1556.0>  operation none caused a connection exception connection_forced: "broker forced connection closure with reason 'shutdown'"

To Reproduce
Steps to reproduce the behavior:

  1. docker compose up --build -d
  2. docker compose logs

Additional context
Below is my compose log.
dev.txt
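For anyone skimming the attachment: the recurring errors are easy to pull out with grep. A sketch (the sample lines below stand in for the attached dev.txt; the file name is just an example):

```shell
# Stand-in for the attached compose log (dev.txt)
cat > dev.txt <<'EOF'
prowes-rabbitmq-1 | [error] <0.1519.0> Error on AMQP connection <0.1519.0> (172.18.0.6:33766 -> 172.18.0.3:5672, vhost: '/', user: 'guest', state: running), channel 0:
prowes-rabbitmq-1 | [error] <0.1519.0>  operation none caused a connection exception connection_forced: "broker forced connection closure with reason 'shutdown'"
EOF

# How often did the broker force a connection closure?
grep -c "connection_forced" dev.txt

# Which client addresses had their connections dropped?
grep -oE "[0-9]+(\.[0-9]+){3}:[0-9]+ ->" dev.txt | sort | uniq -c
```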

@uniqueg
Member

uniqueg commented Jan 11, 2024

Hmmm, I can't reproduce this error. The same docker compose up --build -d command on the latest state of the dev branch works perfectly fine for me. All services fire up, no errors. Maybe it was just a temporary problem?

docker compose logs

EDIT: Wrong logs uploaded. See comment below for real logs.

@JaeAeich
Contributor Author

latest commit on dev
errors

I don't think it is causing any errors, or none that I can see (services fire up without errors), and the same errors are in your docker compose logs. Maybe it's part of the process to force closure, I don't know. I saw it while looking at #88, so I thought I should open an issue.

I don't think it matters, but here are my versions:

Docker Compose version v2.21.0
Client: Docker Engine - Community
 Version:           24.0.7
 API version:       1.43
 Go version:        go1.20.10
 Git commit:        afdd53b
 Built:             Thu Oct 26 09:07:58 2023
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          24.0.7
  API version:      1.43 (minimum version 1.12)
  Go version:       go1.20.10
  Git commit:       311b9ff
  Built:            Thu Oct 26 09:07:58 2023
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.26
  GitCommit:        3dd1e886e55dd695541fdcd67420c2888645a495
 runc:
  Version:          1.1.10
  GitCommit:        v1.1.10-0-g18a0cb0
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=23.10
DISTRIB_CODENAME=mantic
DISTRIB_DESCRIPTION="Ubuntu 23.10"
PRETTY_NAME="Ubuntu 23.10"
NAME="Ubuntu"
VERSION_ID="23.10"
VERSION="23.10 (Mantic Minotaur)"
VERSION_CODENAME=mantic
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=mantic
LOGO=ubuntu-logo

@uniqueg
Member

uniqueg commented Jan 11, 2024

Oops, I had uploaded your logs 😆

Mine don't have these errors (trying again): docker compose logs

My Docker versions are almost exactly the same (I have containerd 1.6.25 instead of 1.6.26), and I'm on Ubuntu 20.04, but I really don't think that should matter...

Can you post the output of docker ps | grep prowes? And can you confirm that you are running docker compose up --build -d from branch dev with HEAD commit 25f0e59?

@JaeAeich
Contributor Author

JaeAeich commented Jan 11, 2024

Here are my prowes containers; I spun them up after destroying everything, just to be sure.

5635ee26dec7   mher/flower:0.9.7       "flower flower --bro…"   56 seconds ago   Up 53 seconds   0.0.0.0:5565->5555/tcp, :::5565->5555/tcp                                                                    prowes-flower-1
44b3d1b475ec   prowes:latest           "bash -c 'cd /app/pr…"   56 seconds ago   Up 53 seconds                                                                                                                prowes-prowes-worker-1
1aa77ec22a64   prowes:latest           "bash -c 'cd /app/pr…"   56 seconds ago   Up 54 seconds   0.0.0.0:8090->8080/tcp, :::8090->8080/tcp                                                                    prowes-prowes-1
0deecc13824b   mongo:3.2               "docker-entrypoint.s…"   2 days ago       Up 54 seconds   0.0.0.0:27027->27017/tcp, :::27027->27017/tcp                                                                prowes-mongodb-1
f32fc571dc2e   rabbitmq:3-management   "docker-entrypoint.s…"   4 days ago       Up 54 seconds   4369/tcp, 5671/tcp, 15671-15672/tcp, 15691-15692/tcp, 25672/tcp, 0.0.0.0:5682->5672/tcp, :::5682->5672/tcp   prowes-rabbitmq-1

And yes, I am using the latest commit on the dev branch; below is the output of git log -n 1:

25f0e59 fix: create storage dir on startup (#83)
commit 25f0e593c956fcd450f0f112c06c62ac768c77fe
Author: Alex Kanitz <[email protected]>
Date:   Sun Nov 12 13:53:22 2023 +0100

    fix: create storage dir on startup (#83)

docker-compose-logs-dev-25f0.txt

PS: The containers aren't stopping; see below.

CONTAINER ID   IMAGE                   COMMAND                  CREATED         STATUS         PORTS                                                                                                        NAMES
5635ee26dec7   mher/flower:0.9.7       "flower flower --bro…"   8 minutes ago   Up 8 minutes   0.0.0.0:5565->5555/tcp, :::5565->5555/tcp                                                                    prowes-flower-1
44b3d1b475ec   prowes:latest           "bash -c 'cd /app/pr…"   8 minutes ago   Up 8 minutes                                                                                                                prowes-prowes-worker-1
1aa77ec22a64   prowes:latest           "bash -c 'cd /app/pr…"   8 minutes ago   Up 8 minutes   0.0.0.0:8090->8080/tcp, :::8090->8080/tcp                                                                    prowes-prowes-1
0deecc13824b   mongo:3.2               "docker-entrypoint.s…"   2 days ago      Up 8 minutes   0.0.0.0:27027->27017/tcp, :::27027->27017/tcp                                                                prowes-mongodb-1
f32fc571dc2e   rabbitmq:3-management   "docker-entrypoint.s…"   4 days ago      Up 8 minutes   4369/tcp, 5671/tcp, 15671-15672/tcp, 15691-15692/tcp, 25672/tcp, 0.0.0.0:5682->5672/tcp, :::5682->5672/tcp   prowes-rabbitmq-1

@uniqueg
Member

uniqueg commented Jan 11, 2024

Really strange...

If I interpret these errors correctly, the connections to the workers drop. So if you were to send a workflow run request, it would likely not be processed. Can you confirm that?

If, instead, everything works as expected, I guess I will close this issue for now.

@JaeAeich
Contributor Author

I am closing this. If by the above you meant that the GET endpoints are working, then yes, they return runs. I am attaching today's logs again, in case someone wants to look at them later.
logs-dev-jan14.txt

@uniqueg
Member

uniqueg commented Jan 15, 2024

I meant the POST endpoint. It's the only one that relies on the async operations handled by RabbitMQ and the Celery workers, and it is the connection between the two that caused errors in your case, so I'm pretty sure that wouldn't work.

@JaeAeich
Contributor Author

@uniqueg

Oh yeah 🤦‍♂️, I checked my POST for a run with:

curl -X POST "http://localhost:8090/ga4gh/wes/v1/runs" \
  -H "accept: application/json" \
  -H "Content-Type: multipart/form-data" \
  -F "tags=" \
  -F "workflow_attachment=" \
  -F "workflow_engine_parameters=" \
  -F 'workflow_params={"input":{"class":"File","path":"https://is.muni.cz/pics/design/r/znak_MU.png"}}' \
  -F "workflow_type=CWL" \
  -F "workflow_type_version=v1.0" \
  -F "workflow_url=https://github.com/uniqueg/cwl-example-workflows/blob/master/hashsplitter-workflow.cwl"

and I seem to get an error:

{"message": "No suitable workflow engine known.", "code": "400"}

I thought this might be an issue with my system or my configs, so I spun up a VM and tried the same thing there.
In the VM (Ubuntu 23.10), I got docker compose logs similar to yours, but they still contain some errors:
cat vm-logs.txt | grep ERROR
vm-logs.txt

Even though there are fewer errors, I still got the same response to the curl command (No suitable workflow engine known). Apart from that, something unusual I noticed: in the Swagger UI, I got a different response for GET /runs on my host machine than on the VM.

 jaeaeich@JaeAeich  ~  curl -X GET "http://localhost:8090/ga4gh/wes/v1/runs" -H "accept: application/json"
{
  "next_page_token": "654e0b7fed03b261bc7751f2",
  "runs": [
    {
      "run_id": "PSMBPK",
      "state": "RUNNING"
    },
    {
      "run_id": "ASN905",
      "state": "RUNNING"
    },
    {
      "run_id": "887BSR",
      "state": "RUNNING"
    },
    {
      "run_id": "JHAG1Q",
      "state": "SYSTEM_ERROR"
    },
    {
      "run_id": "9H8ZYP",
      "state": "SYSTEM_ERROR"
    }
  ]
}
jae@jae:~/proWES$ curl -X GET "http://localhost:8090/ga4gh/wes/v1/runs" -H  "accept: application/json"
{
  "next_page_token": "65a4d132af4b5f013526b300",
  "runs": [
    {
      "run_id": "WG0KAH",
      "state": "SYSTEM_ERROR"
    },
    {
      "run_id": "EW6XYQ",
      "state": "SYSTEM_ERROR"
    }
  ]
}

These are the responses for the host and the VM, respectively. I don't understand why this is happening, so I thought it should be documented. 👍
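Side note: a quick way to compare two /runs responses is to tally the run states with jq. A sketch (the sample JSON stands in for a saved copy of the response from the GET command above; jq is assumed to be installed, and runs.json is just an example file name):

```shell
# Stand-in for a saved /runs response, e.g. from:
#   curl -s "http://localhost:8090/ga4gh/wes/v1/runs" -H "accept: application/json" -o runs.json
cat > runs.json <<'EOF'
{"next_page_token": "654e0b7fed03b261bc7751f2",
 "runs": [{"run_id": "PSMBPK", "state": "RUNNING"},
          {"run_id": "JHAG1Q", "state": "SYSTEM_ERROR"},
          {"run_id": "9H8ZYP", "state": "SYSTEM_ERROR"}]}
EOF

# Count how many runs are in each state
jq -r '.runs[].state' runs.json | sort | uniq -c
```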
