Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

When uploading: error doing PATCH request to data gateway #10156

Closed
yuri-becker opened this issue Sep 25, 2024 · 14 comments
Closed

When uploading: error doing PATCH request to data gateway #10156

yuri-becker opened this issue Sep 25, 2024 · 14 comments

Comments

@yuri-becker
Copy link

Describe the bug

File uploads via the web ui fail and the server has a high CPU usage without any user activity.

Steps to reproduce

Try to upload a file via the web ui.

Expected behavior

The upload succeedes and the file is available.

Actual behavior

Upload fails with Unknown Error. For other users, the upload seemes to have succeded, but the file is grayed out and not available.
The log the contains entries like this:

{"level":"error","service":"ocdav","name":"com.owncloud.web.ocdav","traceid":"8ff9c83be64eddf7f24fc1f19a3c2769","request-id":"6dde09b8-4611-4743-a38e-bd8605657e00","path":"/public/POcnTlUcNrxpUXb/<redacted>","error":"Patch \"https://cloud.catboy.house/data/<redacted>\": context canceled","time":"2024-09-25T06:49:43Z","message":"error doing PATCH request to data gateway"}

Also, this log keeps popping up

{"level":"error","service":"postprocessing","uploadID":"da362552-3dfb-4952-a535-89226a610a0f","error":"Failed to delete data: nats: timeout","time":"2024-09-25T07:27:18Z","message":"cannot delete upload"}

Setup

The server is managed via Ansible.
The container setup is this:

- name: 🚀 Starts container
  community.docker.docker_container:
    name: "{{ owncloud_container_name }}"
    image: owncloud/ocis
    state: started
    restart_policy: unless-stopped
    ports:
      - "{{ docker_gateway }}:{{ owncloud_service_port }}:9200"
    entrypoint:
      - /bin/sh
    command: ["-c", "ocis init || true; ocis server"]
    env:
      OCIS_INSECURE: "false"
      OCIS_URL: "https://{{ owncloud_domain }}"
      PROXY_HTTP_ADDR: "0.0.0.0:9200"
      PROXY_TLS: "false"
      OCIS_LOG_LEVEL: "warn"
      OCIS_ADD_RUN_SERVICES: "notifications"
      NOTIFICATIONS_SMTP_HOST: "{{ smtp_server }}"
      NOTIFICATIONS_SMTP_PORT: "{{ smtp_port }}"
      NOTIFICATIONS_SMTP_SENDER: "{{ owncloud_smtp_sender }}"
      NOTIFICATIONS_SMTP_USERNAME: "{{ owncloud_smtp_user }}"
      NOTIFICATIONS_SMTP_PASSWORD: "{{ owncloud_smtp_password }}"
      NOTIFICATIONS_SMTP_INSECURE: "false"
    user: "{{ owncloud_uid }}:{{ app_gid }}"
    volumes:
      - "{{ owncloud_volume_config }}:/etc/ocis"
      - "{{ owncloud_volume_data }}:/var/lib/ocis"

The reverse proxy (caddy) has this setup for owncloud:

cloud.catboy.house {
    reverse_proxy 172.17.0.1:28946 {
        request_buffers 256k
        response_buffers 256k
    }
}

The complete setup is here: https://github.com/yuri-becker/catboy-house/tree/main/roles/owncloud

Additional context

I assume this is some kind of permission issue. Is OCIS having issues with changing the user?
Uploading via the client seems to work, or seemed to have worked.

@rhafer
Copy link
Contributor

rhafer commented Sep 25, 2024

@yuri-becker I guess we need more information

  • What ocis release are running
  • Where are you uploading to? Is that a public link share? Or is that a user uploading to the home space? (the logs you pasted look like it's uploading to a public link)
  • Please set OCIS_LOG_LEVEL=debug and attach the logs if possible

I assume this is some kind of permission issue. Is OCIS having issues with changing the user?

If that user has write permissions on the volumes/directories it shouldn't.

@yuri-becker
Copy link
Author

@rhafer Thanks for your reply.

  • I'm running 5.0.7 and the web client says it is on 8.0.5.
  • In this case, it's uploading to a space with a public link, but it also happens when uploading to my personal files.
  • My current logs are attached owncloud_logs.txt
  • The user has write permissions and other read/write operations work fine.

@rhafer
Copy link
Contributor

rhafer commented Sep 25, 2024

Hm, there's tons of nats related errors in the logs (timeout, warning about dropped messages). Something seems to be really weird there. Is that server under constant high CPU load or just when you're trying to upload?

Also, can you check what ocis storage-users uploads sessions says? Are the any sessions stuck? Maybe it helps to clean them up. (See https://doc.owncloud.com/ocis/5.0/deployment/services/s-list/storage-users.html#sessions-command for details).

Could you by chance try if a clean setup of ocis 6.4.0 (using separate volumes) shows the same behavior on your system?

BTW, was the behavior broken from the beginning? Or could you sucessfully upload files at some point before?

@yuri-becker
Copy link
Author

yuri-becker commented Sep 25, 2024

Yea, OCIS is producing a high CPU load constantly without user interaction, possibly because of re-trying to delete uploads over and over. ocis storage-users uploads sessions outputs me a ton of sessions. Clearing them didn't help.
On a fresh install, the upload worked after a few tries.

@rhafer
Copy link
Contributor

rhafer commented Sep 25, 2024

Yea, OCIS is producing a high CPU load constantly without user interaction, possibly because of re-trying to delete uploads over and over. ocis storage-users uploads sessions outputs me a ton of sessions. Clearing them didn't help.

Does that mean the sessions are still visible after ocis storage-users uploads sessions --clean?

To get the nats servcie back into a sane state you could try to clear the nats folder in you data volume while ocis is stopped. And then restart ocis, that will re-initialize the nats directory.

On a fresh install, the upload worked after a few tries.

What does "after a few tries" mean here? Which release is that fresh install using?

Also what kind of machine is this running on? How much RAM what kind of CPU, storage, etc.

@yuri-becker
Copy link
Author

yuri-becker commented Sep 25, 2024

Does that mean the sessions are still visible after ocis storage-users uploads sessions --clean?

The table was empty afterwards.

To get the nats servcie back into a sane state you could try to clear the nats folder in you data volume while ocis is stopped. And then restart ocis, that will re-initialize the nats directory.

That seemed to have fixed it temporarily. Once i try to upload a "larger" file (19MB), it goes back to being broken again.

What does "after a few tries" mean here? Which release is that fresh install using?

Fails twice, then upload succeeds on the third try. Then fails twice again and succeeds on the third try once more. That was only a <1MB file tho.

Also what kind of machine is this running on? How much RAM what kind of CPU, storage, etc.

I have 43 TB free storage, 126GB of RAM (of which 14GB is used) and a Xeon E5-1650 v3 (6 cores at 3.8 Ghz).

EDIT: The fresh install was using the same release, I'm running OCIS in Docker and it should pull the latest version daily.

@rhafer
Copy link
Contributor

rhafer commented Sep 25, 2024

EDIT: The fresh install was using the same release, I'm running OCIS in Docker and it should pull the latest version daily.

We've recently introduced the so-called rolling releases which we publish roughly every 3 weeks (see https://owncloud.dev/ocis/release_roadmap/#release-types) . They go to a different docker repo: https://hub.docker.com/r/owncloud/ocis-rolling it might be helpful to check if the issue still appears with that latest release from there (which is 6.4.0)

@yuri-becker
Copy link
Author

The issue still persists on 6.4.0.

@rhafer
Copy link
Contributor

rhafer commented Sep 25, 2024

Ok, thanks for trying.

Could you please set OCIS_LOG_LEVEL=debug, start a clean ocis (ideally 6.4.0) instance, reproduce the upload issue and then attach the full log of that?

@yuri-becker
Copy link
Author

Done, funnily enough, on this install, I wasn't able to upload even small files.
owncloud_logs_fresh_install.txt

@rhafer
Copy link
Contributor

rhafer commented Sep 26, 2024

Ok. I think that gets us a bit further I think. I can see a couple of successful uploads. But there is also at least one upload failing:

2024-09-25T15:16:38Z INF access-log | service=proxy proto=HTTP/1.1 request-id=a1c6abd7-29be-454b-9a52-3d26fac34430 traceid=b48f7713426e8f72d7e28fb5f67ef253 remote-addr=93.208.197.202 method=POST status=502 path=/remote.php/dav/spaces/c6d8a02e-77f9-409c-93e2-12b72659fa41$b69dbc42-e8bd-4e24-a626-4d6aae354d64 duration=56.566584 bytes=0 line=github.com/owncloud/ocis/v2/services/proxy/pkg/middleware/accesslog.go:34 
2024-09-25T15:16:38Z DBG Decomposedfs: built session info | service=storage-users pkg=rgrpc traceid=b48f7713426e8f72d7e28fb5f67ef253 session={} line=github.com/cs3org/reva/[email protected]/pkg/storage/utils/decomposedfs/upload.go:305 
2024-09-25T15:16:38Z INF file upload | service=storage-users pkg=rgrpc traceid=b48f7713426e8f72d7e28fb5f67ef253 data-server=http://localhost:9158/data/simple/26ae9dd0-ea47-4ef3-bbeb-0538c052aa56 fn=./small_file.stl xs=map[md5:100 unset:1000] line=github.com/cs3org/reva/[email protected]/internal/grpc/services/storageprovider/storageprovider.go:470 
2024-09-25T15:16:38Z INF file upload | service=storage-users pkg=rgrpc traceid=b48f7713426e8f72d7e28fb5f67ef253 data-server=http://localhost:9158/data/tus/26ae9dd0-ea47-4ef3-bbeb-0538c052aa56 fn=./small_file.stl xs=map[md5:100 unset:1000] line=github.com/cs3org/reva/[email protected]/internal/grpc/services/storageprovider/storageprovider.go:470 
2024-09-25T15:16:38Z DBG unary | service=storage-users pkg=rgrpc traceid=b48f7713426e8f72d7e28fb5f67ef253 user-agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/128.0.0.0 Safari/537.36 from=tcp://127.0.0.1:49880 uri=/cs3.storage.provider.v1beta1.ProviderAPI/InitiateFileUpload start=25/Sep/2024:15:16:38 +0000 end=25/Sep/2024:15:16:38 +0000 time_ns=61016562 code=OK line=github.com/cs3org/reva/[email protected]/internal/grpc/interceptors/log/log.go:69 
2024-09-25T15:16:38Z DBG unary | service=gateway pkg=rgrpc traceid=b48f7713426e8f72d7e28fb5f67ef253 user-agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/128.0.0.0 Safari/537.36 from=tcp://127.0.0.1:51788 uri=/cs3.gateway.v1beta1.GatewayAPI/InitiateFileUpload start=25/Sep/2024:15:16:38 +0000 end=25/Sep/2024:15:16:38 +0000 time_ns=62587837 code=OK line=github.com/cs3org/reva/[email protected]/internal/grpc/interceptors/log/log.go:69 
2024-09-25T15:16:38Z ERR error doing PATCH request to data gateway | service=ocdav name=com.owncloud.web.ocdav traceid=b48f7713426e8f72d7e28fb5f67ef253 request-id=a1c6abd7-29be-454b-9a52-3d26fac34430 spaceid=c6d8a02e-77f9-409c-93e2-12b72659fa41$b69dbc42-e8bd-4e24-a626-4d6aae354d64 path=/ filename=small_file.stl error=Patch "https://cloud.catboy.house/data/eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJhdWQiOiJyZXZhIiwiZXhwIjoxNzI3MzYzNzk4LCJpYXQiOjE3MjcyNzczOTgsInRhcmdldCI6Imh0dHA6Ly9sb2NhbGhvc3Q6OTE1OC9kYXRhL3R1cy8yNmFlOWRkMC1lYTQ3LTRlZjMtYmJlYi0wNTM4YzA1MmFhNTYifQ.25taQcy58gE_SF8KzQzInVeAeaptFimE2u_gWLHoJy8": context canceled line=github.com/cs3org/reva/[email protected]/internal/http/services/owncloud/ocdav/tus.go:298

As part of the TUS upload processing ocis needs to send a PATCH request to the so-called data gateway. It's sending that to the public ocis URL currently. So that request will go through your reverse proxy (caddy, I believe). For some (yet unknown) reason. The request is never received by ocis for this specific upload, at least there is no correlating access-log message for the PATCH request with a matching request-id. So it somehow got lost between ocis and the reverse proxy.

Do you have any logs from your reverse proxy from around that time? If not, you might wanna increase the log level of the reverse proxy and reproduce the issue once more.

@yuri-becker
Copy link
Author

I havn't been logging but enabled logging and re-tried uploads. Here you go. Thanks again for all your help.
cloud.catboy.house-access.log

@rhafer
Copy link
Contributor

rhafer commented Sep 30, 2024

@yuri-becker Which caddy release are you using? And why are you setting:

        request_buffers 256k
        response_buffers 256k

Does it work without those, by chance? (The docs say those should only be used if really really needed)

@yuri-becker
Copy link
Author

@yuri-becker Which caddy release are you using? And why are you setting:

        request_buffers 256k
        response_buffers 256k

Does it work without those, by chance? (The docs say those should only be used if really really needed)

@rhafer Without these, everything seems to work as it should. Thank you for your help. I would be curious if the same issue exists on nginx, but I might also have not set something in Caddy that I should have with this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants