Upserts partially committed during node restart #4626

Closed
tekumara opened this issue Jul 6, 2024 · 5 comments · Fixed by qdrant/landing_page#1013
Labels
bug Something isn't working

Comments

tekumara commented Jul 6, 2024

In a 3-node cluster with replication factor 3, during upserts (via batch updates with ordering weak or strong):

  1. One node restarts.
  2. Upserts fail because the write consistency factor exceeds the number of active replicas, i.e. 3 != 2 (this is as expected).
  3. The other active nodes commit the upsert.
  4. The restarted node rejoins but doesn't sync the missed upsert.

Result: data inconsistency across nodes, i.e. missing points on the restarted node.
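
For reference, here is a minimal sketch of the kind of write involved: a batch update (upsert) sent with an explicit write ordering via the Qdrant REST API. The node URL and vector size are illustrative assumptions, not taken from the demo; the collection name comes from the logs below.

    # Minimal sketch of the failing write path: a batch update (upsert) with
    # explicit write ordering, sent to one node of the cluster via the REST API.
    # Node URL and vector size are illustrative.
    import requests

    QDRANT_URL = "http://localhost:6333"
    COLLECTION = "k6-perf-test"

    def batch_upsert(points, ordering="strong"):
        """Apply an upsert through the batch update endpoint with the given ordering."""
        resp = requests.post(
            f"{QDRANT_URL}/collections/{COLLECTION}/points/batch",
            params={"wait": "true", "ordering": ordering},
            json={"operations": [{"upsert": {"points": points}}]},
            timeout=10,
        )
        resp.raise_for_status()  # raises on the 500 seen in step 2 above
        return resp.json()

    batch_upsert([{"id": 1, "vector": [0.1] * 128, "payload": {"n": 1}}])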

Steps to Reproduce

  1. Check out tekumara/qdrant-demo and install the prerequisites
  2. Install using make all
  3. Run make restart-with-upserts
  4. Run make healthcheck and observe different counts across the nodes, e.g.:
     ❯ make restart-with-upserts
     infra/perf/perf-restart-check.sh upsert
     Start upsert workload
     
               /\      |‾‾| /‾‾/   /‾‾/   
          /\  /  \     |  |/  /   /  /    
         /  \/    \    |     (   /   ‾‾\  
        /          \   |  |\  \ |  (‾)  | 
       / __________ \  |__| \__\ \_____/ .io
     
          execution: local
             script: infra/perf/k6.js
             output: -
     
          scenarios: (100.00%) 1 scenario, 1 max VUs, 1m30s max duration (incl. graceful stop):
                   * default: 1 looping VUs for 1m0s (gracefulStop: 30s)
     
     INFO[0001] deleting existing collection k6-perf-test     source=console
     INFO[0001] delete existing collection k6-perf-test: {"result":true,"status":"ok","time":0.079671625}  source=console
     pod "qdrant-1" deleted
     ERRO[0014] GoError: http status: 500
     {"status":{"error":"Service internal error: 1 out of 3 shards failed to apply operation. First error captured: Service internal error: Failed to apply update with Strong ordering via leader peer 4681542363249938: Timeout error: Deadline Exceeded: status: DeadlineExceeded, message: \"Timeout: Timeout error: Deadline Exceeded: status: DeadlineExceeded, message: \\\"Healthcheck timeout 2000ms exceeded\\\", details: [], metadata: MetadataMap { headers: {} }\", details: [], metadata: MetadataMap { headers: {\"content-type\": \"application/grpc\", \"date\": \"Sat, 06 Jul 2024 12:28:04 GMT\", \"content-length\": \"0\"} }"},"time":6.789591462}
             at go.k6.io/k6/js/modules/k6.(*K6).Fail-fm (native)
             at file:///Users/oliver.mannion/code/qdrant-demo/infra/perf/k6.js:175:7(124)  executor=constant-vus scenario=default source=stacktrace
     stamina.retry_scheduled
     stamina.retry_scheduled
     count0=3840
     count1=554622.1s), 1/1 VUs, 455 complete and 0 interrupted iterations
     count2=5558===========>--------------------------] 1 VUs  0m19.5s/1m0s
     Counts not equal
     count0=6060
     count1=605723.5s), 1/1 VUs, 504 complete and 0 interrupted iterations
     count2=6070============>-------------------------] 1 VUs  0m20.9s/1m0s
     Counts not equal
     
          ✗ batch update points - status is 200
           ↳  99% — ✓ 512 / ✗ 1
          ✗ batch update points - is OK
           ↳  99% — ✓ 512 / ✗ 1
          ✗ batch update points - completed
           ↳  99% — ✓ 512 / ✗ 1
     
          █ setup
     
            ✓ create collection - status is 200
            ✓ create collection - is OK
            ✓ add points - status is 200
            ✓ add points - is OK
            ✓ add points - completed
     
          checks.........................: 99.81% ✓ 1658      ✗ 3  
          data_received..................: 172 kB 7.2 kB/s
          data_sent......................: 182 MB 7.7 MB/s
          http_req_blocked...............: avg=5.93µs   min=2µs     med=4µs     max=688µs   p(90)=5µs     p(95)=6µs     
          http_req_connecting............: avg=1.25µs   min=0s      med=0s      max=373µs   p(90)=0s      p(95)=0s      
          http_req_duration..............: avg=25.05ms  min=7.4ms   med=9.68ms  max=6.8s    p(90)=17.27ms p(95)=23.99ms 
            { expected_response:true }...: avg=12.83ms  min=7.4ms   med=9.68ms  max=445.6ms p(90)=17.16ms p(95)=23.8ms  
          http_req_failed................: 0.17%  ✓ 1         ✗ 555
          http_req_receiving.............: avg=81.88µs  min=36µs    med=57µs    max=4.91ms  p(90)=89µs    p(95)=112µs   
          http_req_sending...............: avg=145.47µs min=13µs    med=106µs   max=3.81ms  p(90)=167.5µs p(95)=196.24µs
          http_req_tls_handshaking.......: avg=0s       min=0s      med=0s      max=0s      p(90)=0s      p(95)=0s      
          http_req_waiting...............: avg=24.82ms  min=7.25ms  med=9.49ms  max=6.8s    p(90)=17.1ms  p(95)=23.81ms 
          http_reqs......................: 556    23.425145/s
          iteration_duration.............: avg=46.13ms  min=21.56ms med=24.77ms max=6.82s   p(90)=35.24ms p(95)=42.42ms 
          iterations.....................: 513    21.613488/s
          vus............................: 1      min=0       max=1
          vus_max........................: 1      min=1       max=1
     
     
     running (0m23.7s), 0/1 VUs, 513 complete and 1 interrupted iterations
     default ✗ [============>-------------------------] 1 VUs  0m21.1s/1m0s
     ERRO[0025] test run was aborted because k6 received a 'terminated' signal 
     k6 stopped
     make: *** [restart-with-upserts] Error 1
     ❯ make healthcheck
     .venv/bin/python -m src.demo.healthcheck
     count0=6140
     count1=6136
     count2=6140
     empty0=[]
     empty1=[]
     empty2=[]
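
A rough Python equivalent of what the healthcheck observes: query each node's HTTP API directly for an exact point count and compare the results. The node URLs are illustrative assumptions for a local 3-node setup.

    # Ask every node for an exact point count and compare; differing counts
    # reproduce the "Counts not equal" output above.
    import requests

    NODES = [
        "http://localhost:6333",
        "http://localhost:6334",
        "http://localhost:6335",
    ]
    COLLECTION = "k6-perf-test"

    counts = []
    for url in NODES:
        resp = requests.post(
            f"{url}/collections/{COLLECTION}/points/count",
            json={"exact": True},
            timeout=10,
        )
        resp.raise_for_status()
        counts.append(resp.json()["result"]["count"])

    for i, count in enumerate(counts):
        print(f"count{i}={count}")
    if len(set(counts)) > 1:
        print("Counts not equal")  # the inconsistency reported in this issue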
    

Expected Behavior

A node can be restarted during upserts, and when it rejoins the qdrant cluster it should be consistent with the rest of the cluster.

Context (Environment)

Observed in production: Qdrant node restarts are not uncommon on Kubernetes and may occur during a rolling upgrade of the pods, when pods are OOMKilled, or during general Kubernetes rescheduling/maintenance.

qdrant 1.10.0

tekumara added the bug label Jul 6, 2024
generall (Member) commented Jul 6, 2024

Hey @tekumara, since

Upserts fail because write consistency

Qdrant doesn't guarantee consistency of this specific operation. Qdrant expects an external process to re-apply the operation and fix the inconsistency without performing a potentially expensive data sync.

Internally, consistency is only guaranteed if the operation was accepted.

If the write_consistency_factor is low enough, the failed nodes will be marked as dead and re-synced on restart. So I believe you can achieve your desired behavior by using a lower write_consistency_factor.
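
A minimal sketch of this suggestion, using the REST API directly: keep replication_factor at 3 but lower write_consistency_factor to 2, and have the client stand in for the "external process" that re-applies a failed write. The node URL, collection name and vector params are illustrative assumptions.

    import time
    import requests

    QDRANT_URL = "http://localhost:6333"
    COLLECTION = "k6-perf-test"

    # replication_factor stays at 3, but write_consistency_factor is lowered to 2,
    # so a write is still accepted while one replica is down; the lagging replica
    # is marked dead and re-synced when it rejoins.
    requests.put(
        f"{QDRANT_URL}/collections/{COLLECTION}",
        json={
            "vectors": {"size": 128, "distance": "Cosine"},
            "replication_factor": 3,
            "write_consistency_factor": 2,
        },
        timeout=10,
    ).raise_for_status()

    def upsert_with_retry(points, attempts=5, delay=1.0):
        """Re-apply the upsert until it is accepted, standing in for the
        external process that fixes a failed operation."""
        for _ in range(attempts):
            resp = requests.put(
                f"{QDRANT_URL}/collections/{COLLECTION}/points",
                params={"wait": "true"},
                json={"points": points},
                timeout=10,
            )
            if resp.ok:
                return resp.json()
            time.sleep(delay)  # transient failure (e.g. a node restarting): retry
        resp.raise_for_status()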

generall closed this as completed Jul 6, 2024
tekumara (Author) commented Jul 6, 2024

Oh I see. What I expected here, but didn't make clear, is that when the write fails to apply to write_consistency_factor replicas and returns a failure to the client, it shouldn't be committed to any replica (rather than leaving a subset of nodes with the write).

Write operations will fail if the number of active replicas is less than the write_consistency_factor

generall (Member) commented Jul 6, 2024

then it shouldn't be committed to any replica (rather than leave a subset of nodes with the write)

This would require either a two-phase commit scheme or sequential writes. Both options would likely hurt performance, so we decided against it.

tekumara (Author) commented Jul 7, 2024

I can confirm that with write_consistency_factor = 2 the operation is accepted when 2 of 3 nodes are available (as mentioned here) and, more importantly, the restarted node rejoins the cluster with a consistent set of points, thank you!

Internally, consistency is only guaranteed if the operation was accepted.

As a suggestion, could the docs be updated with something along these lines (unless I've missed it somewhere)?

tekumara added a commit to tekumara/qdrant-demo that referenced this issue Jul 7, 2024
to ensure consistency when 1 node restarts see
qdrant/qdrant#4626 (comment)
timvisee (Member) commented Jul 8, 2024

As a suggestion, could the docs be updated with something along these lines (unless I've missed it somewhere)?

@tekumara I've added the following: qdrant/landing_page#1013

Please feel free to leave a review.
