
Max retries exceeded with url ... too many 503 error responses #1899

Closed
amir-bialek opened this issue Jul 8, 2024 · 6 comments
Comments


amir-bialek commented Jul 8, 2024

Is this a request for help?:


Version of Helm and Kubernetes:
1.29

Which chart:
artifactory-cpp-ce 107.77.8

Which product license (Enterprise/Pro/oss):
Community

JFrog support reference (if already raised with support team):

What happened:
Deployed the artifactory-cpp-ce Helm chart on an on-prem k8s cluster with the following values:

artifactory:

  nginx:
    enabled: false

  ingress:
    enabled: true
    className: "my-ingress-class"
    hosts:
      - my-host-address
    annotations:
      nginx.ingress.kubernetes.io/proxy-body-size: "0"

    tls: 
    - secretName: my-cert
      hosts:
        - my-host-address


  nameOverride: artifactory
  fullnameOverride: artifactory
  artifactory:
    persistence:
      size: 50Gi
  postgresql:
    enabled: false

postgresql:
  enabled: false

Services are accessing Conan directly via:
artifactory.default.svc.cluster.local

And I am getting this error too often:

conans.errors.ConanException: HTTPConnectionPool(host='artifactory.default.svc.cluster.local', port=8082): Max retries exceeded with url: /artifactory/api/conan/myartifactory/v1/ping (Caused by ResponseError('too many 503 error responses')).
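For context, the clients register the Conan remote roughly like this (the repository name is taken from the URL in the error above; the exact path may differ):

  conan remote add myartifactory http://artifactory.default.svc.cluster.local:8082/artifactory/api/conan/myartifactory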

Recently it has been happening too often to ignore.
In the logs I see:

2024/07/07 11:47:04 httputil: ReverseProxy read error during body copy: stream error: stream ID 673843; CANCEL; received from peer

I can try to add:

  autoscaling:
    enabled: true
    minReplicas: 2
    maxReplicas: 2

(by default it is set to false)

And / or add postgresql:

postgresql:
  enabled: true

And / or add nginx:

  nginx:
    enabled: true

And / or update the chart to version 107.84.16 (currently it is on 107.77.8).

The thing is, the whole software team is using this DB, so every change will block them.

Also, can someone advise what the use of PostgreSQL is in this chart?

If anyone can help, I would appreciate it.

@gitta-jfrog
Collaborator

Hi @amir-bialek,
You raised several questions in this issue; if I didn't cover all of them, please let me know.

  1. Disabling PostgreSQL - You should use PostgreSQL when running Artifactory on k8s. We do not support k8s deployments with the Derby database (the default database configured); a minimal values sketch for enabling it is below, after this list.

  2. Autoscaling - You should not use it, as it only works when you have a valid Enterprise/Enterprise Plus license, which supports High Availability deployments.

  3. If you decide to use the ingress method and disable Nginx, you should install an nginx-ingress controller in your cluster. https://jfrog.com/help/r/jfrog-installation-setup-documentation/run-ingress-behind-another-load-balancer
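For reference, switching the bundled PostgreSQL back on is roughly this in the values (a minimal sketch, mirroring the keys already shown in the values above; sizing and credentials are left to the chart defaults):

  artifactory:
    postgresql:
      enabled: true

  postgresql:
    enabled: true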

@amir-bialek
Author

amir-bialek commented Jul 10, 2024

Hey @gitta-jfrog
Thank you for the reply.

  1. Understood, thank you. I am now deploying a new Artifactory with the new chart + PostgreSQL, and will try to import the data from the 'live' Artifactory into the 'new' Artifactory, then make the switch (system backup and restore).
    Can you advise why the default PostgreSQL PVC is 200GB but the Artifactory PVC is only 20GB? Shouldn't it be the opposite?

  2. Understood, thank you.

  3. An nginx ingress controller is installed on the cluster and the ingress to the Conan svc is working well.
    Note that the specific call happens without the ingress -> it comes from another svc in k8s, so it calls Conan directly via artifactory.default.svc.cluster.local:8082.

@gitta-jfrog
Collaborator

  1. Indeed the defaults here should be tuned. You can change them according to your needs. Assuming you are storing your binaries on the PVC itself, the Artifactory filestore will definitely be bigger than the DB.

  2. I understand. So your client is reaching the Artifactory SVC directly. I think the 503 errors you are seeing might be related to the resources allocated to the Artifactory service. What is the size of the node running the Artifactory pod? Can you see pod restarts? Is there anything in artifactory-service.log (/opt/jfrog/artifactory/var/log) that indicates resource pressure or crashing of the JVM? (See the commands sketched after this list.)
    How many incoming requests are you running in parallel?
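A rough way to check both from outside the pod (the namespace, pod, and container names are placeholders and may differ in your deployment):

  # check for restarts on the Artifactory pod
  kubectl get pods -n <namespace> | grep artifactory

  # tail the service log inside the pod (path as above)
  kubectl exec -n <namespace> <artifactory-pod> -c artifactory -- tail -n 200 /opt/jfrog/artifactory/var/log/artifactory-service.log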

@amir-bialek
Author

Hey, Artifactory is running on worker1, which has plenty of resources. kubectl top node gives me:

NAME      CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
master    828m         6%     5952Mi          37%
worker1   2286m        6%     17547Mi         30%

There are no pod restarts or any special errors, other than:

2024/07/07 11:47:04 httputil: ReverseProxy read error during body copy: stream error: stream ID 673843; CANCEL; received from peer

Which I do see a lot.

Looking at the Grafana dashboard for pod resources, the CPU and memory are steady; I do not see any jump in the past 5 days.
At the moment the pod has no resource requests and limits (default settings).
I did try to add:

  artifactory:
    resources: 
      requests:
        memory: "1Gi"
        cpu: "500m"
      limits:
        memory: "6Gi"
        cpu: "1"
    javaOpts: 
      xms: "1g"
      xmx: "5g"

And I verified in the logs that it received the new Xmx, but it still reproduces the 503.
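(A quick way to confirm the heap flags landed; pod and container names are placeholders:)

  kubectl logs <artifactory-pod> -c artifactory | grep -i xmx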

I do see a jump in Bandwidth and Packets:
[screenshot: Grafana bandwidth and packets graphs]

@amir-bialek
Author

amir-bialek commented Jul 10, 2024

Hey, after bringing up the new Artifactory with the following:

  
artifactory:
  nginx:
    enabled: false
  ingress:
    enabled: true
    className: "my-class"
    hosts:
      - my-host1
      - my-host2
    annotations:
      nginx.ingress.kubernetes.io/proxy-body-size: "0"
    tls: 
    - secretName: dev-cert
      hosts:
        - my-host1
        - my-host2
        
  nameOverride: artifactory
  fullnameOverride: artifactory

  artifactory:
    persistence:
      accessMode: ReadWriteOnce
      size: 200Gi


  postgresql:
    persistence:
      enabled: true
      size: 20Gi

I do not see the error, but please leave this case open for another 2-3 days so that I can verify the problem is solved by using PostgreSQL.
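(Roughly, the corresponding install/upgrade command looks like this; the release name, namespace, and jfrog repo alias here are placeholders, and the version is the newer chart mentioned above:)

  helm repo update
  helm upgrade --install artifactory jfrog/artifactory-cpp-ce --version 107.84.16 -f values.yaml -n <namespace>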

@gitta-jfrog
Collaborator

Great, I'm glad you managed to move to PostgreSQL. Running Artifactory with PostgreSQL allows multiple connections to the DB (compared to the single connection allowed when using Derby), and that should improve the system behavior. I'll keep this open for the next few days.

Thanks
