-
A rough calculation shows that 65 million vectors with a dimensionality of 1536 require about 580GB of RAM. That is without any form of quantization and with a replication factor of 1 (calculator). A replication factor of 2 means you'd need double that: 1.16TB of RAM. Based on your configuration above, I have two comments:
Regarding the formula, it's explained here: https://qdrant.tech/documentation/cloud/capacity-sizing/#basic-configuration
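Plugging this thread's numbers into the rule of thumb from that page (memory = number_of_vectors × vector_dimension × 4 bytes × 1.5), a minimal sketch of the arithmetic:

```python
# Rule of thumb from the linked capacity-sizing page:
#   memory_size = number_of_vectors * vector_dimension * 4 bytes * 1.5
vectors = 65_000_000
dims = 1536
raw_bytes = vectors * dims * 4           # float32 originals: ~399 GB
with_overhead = raw_bytes * 1.5          # index/metadata overhead
print(f"~{with_overhead / 1e9:.0f} GB")  # ~599 GB, same ballpark as the ~580GB above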
-
@timvisee OK, I am officially very confused about payload storage. I got rid of the high-cardinality UUID payload, opting to store those UUIDs as point IDs instead. But our payloads should be very small!
We are storing an integer and a boolean. For 64-bit integers, we should only need about 500MB of memory for all 66M records, and even less for the boolean. Finding it very hard to capacity plan, we keep running experiments and being surprised.
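A quick sanity check on that expectation (plain arithmetic; actual usage will be higher once serialization and index overhead are added):

```python
# One 64-bit integer per record, ignoring per-point serialization
# and index overhead.
records = 66_000_000
int_payload_bytes = records * 8
print(f"~{int_payload_bytes / 1e6:.0f} MB")  # ~528 MB, i.e. roughly 500MB
```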
-
We are running on Kubernetes. Each node has 4 CPUs and 12.97GB of allocatable memory.
We are running an 8 replica cluster.
We have tried many different memory requests and limits, and always experience OOMKills at some point during ingestion.
Looking at the charts, memory ramps up to our requested amount, and then we experience OOMKills.
Here are the messages from `dmesg`:
Our full collection is 65 million 1536-dim vectors.
This is very frustrating. My napkin math says we have plenty of RAM for binary quantization; I feel in my gut this has to do with memmap.
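To spell out that napkin math (my sketch of it; binary quantization keeps one bit per dimension in RAM, with the float32 originals left memmapped on disk):

```python
vectors = 65_000_000
dims = 1536
bq_bytes = vectors * dims // 8      # 1 bit per dimension
print(f"~{bq_bytes / 1e9:.1f} GB")  # ~12.5 GB of quantized vectors in total
```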
When the cluster goes down, it has problems booting from disk again, as it seems to use tons of virtual memory and gets OOMKilled.
At a loss for what to do.
Here is our collection config:
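(A minimal qdrant-client sketch of a collection along these lines, assuming binary quantization with `always_ram` and on-disk originals; the collection name and every parameter value here are illustrative assumptions, not the actual settings:)

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

client.create_collection(
    collection_name="documents",  # assumed name
    vectors_config=models.VectorParams(
        size=1536,
        distance=models.Distance.COSINE,
        on_disk=True,  # memmap the original float32 vectors
    ),
    quantization_config=models.BinaryQuantization(
        binary=models.BinaryQuantizationConfig(always_ram=True),
    ),
    hnsw_config=models.HnswConfigDiff(on_disk=True),
)
```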
Here is our `StatefulSet`:
There is also an injected `istio-proxy` sidecar with memory requests and limits of `128Mi` and `1Gi`.
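For what it's worth, a rough cluster-wide budget from the numbers above (my sketch; whether this is enough depends on how the quantized index, page cache, and ingestion buffers land per node):

```python
# Assumptions: 8 nodes, one Qdrant pod per node, and the istio-proxy
# limit (1Gi) counted against the same allocatable memory.
nodes = 8
allocatable_gb = 12.97
istio_limit_gb = 1.0
usable = nodes * (allocatable_gb - istio_limit_gb)
print(f"~{usable:.0f} GB usable across the cluster")  # ~96 GB
```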