mount-s3-1.13.0.service Failed with result 'oom-kill' #1206

Open
nikitadom opened this issue Dec 23, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@nikitadom

nikitadom commented Dec 23, 2024

Mountpoint for Amazon S3 version

mount-s3 1.13.0

AWS Region

Describe the running environment

Running on an OpenStack managed Kubernetes service, with the CSI driver installed from Helm chart v1.11.0: https://github.com/awslabs/mountpoint-s3-csi-driver/tree/main/charts/aws-mountpoint-s3-csi-driver.
OS: linux amd64
OS Image: Ubuntu 22.04.5 LTS
Kernel version: 5.15.0-124-generic
Container runtime: containerd://1.7.22
Kubelet version: v1.29.9
AWS credentials of IAM User provided via k8s secrets.

Mountpoint options

Mount Options
allow-delete, allow-other, region eu-west-2, prefix static/

What happened?

The pod cannot start because the volume mount setup fails:

MountVolume.SetUp failed for volume "s3-pv" : rpc error: code = Internal desc = Could not mount "k8s-s3-static-files" at "/var/lib/kubelet/pods/d72649b9-a67b-45fd-ab49-9ffe7764ca89/volumes/kubernetes.io~csi/s3-pv/mount": Mount failed: Failed to start systemd unit, context cancelled output:

Relevant log output

Dec 23 21:21:21 systemd[1]: Starting Mountpoint for Amazon S3 CSI driver FUSE daemon...
Dec 23 21:21:28 kernel: mount-s3 invoked oom-killer: gfp_mask=0x1100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
Dec 23 21:21:28 kernel: CPU: 2 PID: 2110387 Comm: mount-s3 Not tainted 5.15.0-124-generic #134-Ubuntu
Dec 23 21:21:28 kernel: Hardware name: OpenStack Foundation OpenStack Nova, BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
Dec 23 21:21:28 kernel: Call Trace:
Dec 23 21:21:28 kernel:  <TASK>
Dec 23 21:21:28 kernel:  show_stack+0x52/0x5c
Dec 23 21:21:28 kernel:  dump_stack_lvl+0x4a/0x63
Dec 23 21:21:28 kernel:  dump_stack+0x10/0x16
Dec 23 21:21:28 kernel:  dump_header+0x53/0x228
Dec 23 21:21:28 kernel:  oom_kill_process.cold+0xb/0x10
Dec 23 21:21:28 kernel:  out_of_memory+0x106/0x2e0
Dec 23 21:21:28 kernel:  ? srso_alias_return_thunk+0x5/0x7f
Dec 23 21:21:28 kernel:  mem_cgroup_out_of_memory+0x13f/0x160
Dec 23 21:21:28 kernel:  try_charge_memcg+0x687/0x740
Dec 23 21:21:28 kernel:  ? srso_alias_return_thunk+0x5/0x7f
Dec 23 21:21:28 kernel:  ? kernel_init_free_pages.part.0+0x4a/0x70
Dec 23 21:21:28 kernel:  ? srso_alias_return_thunk+0x5/0x7f
Dec 23 21:21:28 kernel:  ? get_page_from_freelist+0x353/0x540
Dec 23 21:21:28 kernel:  charge_memcg+0x45/0xb0
Dec 23 21:21:28 kernel:  __mem_cgroup_charge+0x2d/0x90
Dec 23 21:21:28 kernel:  __add_to_page_cache_locked+0x2d8/0x350
Dec 23 21:21:28 kernel:  ? scan_shadow_nodes+0x40/0x40
Dec 23 21:21:28 kernel:  add_to_page_cache_lru+0x4d/0xd0
Dec 23 21:21:28 kernel:  pagecache_get_page+0x192/0x590
Dec 23 21:21:28 kernel:  ? srso_alias_return_thunk+0x5/0x7f
Dec 23 21:21:28 kernel:  ? page_cache_ra_unbounded+0x163/0x210
Dec 23 21:21:28 kernel:  filemap_fault+0x488/0xab0
Dec 23 21:21:28 kernel:  ? srso_alias_return_thunk+0x5/0x7f
Dec 23 21:21:28 kernel:  ? filemap_map_pages+0x309/0x400
Dec 23 21:21:28 kernel:  __do_fault+0x3c/0x120
Dec 23 21:21:28 kernel:  do_read_fault+0xeb/0x160
Dec 23 21:21:28 kernel:  do_fault+0xa0/0x2e0
Dec 23 21:21:28 kernel:  handle_pte_fault+0x1cd/0x240
Dec 23 21:21:28 kernel:  __handle_mm_fault+0x405/0x6f0
Dec 23 21:21:28 kernel:  handle_mm_fault+0xd8/0x2c0
Dec 23 21:21:28 kernel:  do_user_addr_fault+0x1c9/0x640
Dec 23 21:21:28 kernel:  exc_page_fault+0x77/0x170
Dec 23 21:21:28 kernel:  asm_exc_page_fault+0x27/0x30
Dec 23 21:21:28 kernel: RIP: 0033:0x55f2fc0ea4ed
Dec 23 21:21:28 kernel: Code: Unable to access opcode bytes at RIP 0x55f2fc0ea4c3.
Dec 23 21:21:28 kernel: RSP: 002b:00007f299e9d5790 EFLAGS: 00010246
Dec 23 21:21:28 kernel: RAX: 000055f2fc2cbf34 RBX: 00007f299e9d57f0 RCX: 00007f2998000030
Dec 23 21:21:28 kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000594
Dec 23 21:21:28 kernel: RBP: 00007f299e9d5790 R08: 0000000000000000 R09: 00007f2998000d50
Dec 23 21:21:28 kernel: R10: 0000000000000077 R11: 00007f2998000090 R12: 0000000000000001
Dec 23 21:21:28 kernel: R13: 00007f299e9d5ad8 R14: 00000000000040fb R15: 00007f299e9d57e8
Dec 23 21:21:28 kernel:  </TASK>
Dec 23 21:21:28 kernel: memory: usage 393216kB, limit 393216kB, failcnt 65040923
Dec 23 21:21:28 kernel: swap: usage 0kB, limit 9007199254740988kB, failcnt 0
Dec 23 21:21:28 kernel: Memory cgroup stats for /system.slice:
Dec 23 21:21:28 kernel: anon 375468032
                                                   file 5414912
                                                   kernel_stack 1343488
                                                   pagetables 3022848
                                                   percpu 1196608
                                                   sock 0
                                                   shmem 1507328
                                                   file_mapped 3399680
                                                   file_dirty 0
                                                   file_writeback 0
                                                   swapcached 0
                                                   anon_thp 0
                                                   file_thp 0
                                                   shmem_thp 0
                                                   inactive_anon 356020224
                                                   active_anon 1437696
                                                   inactive_file 372736
                                                   active_file 139264
                                                   unevictable 22913024
                                                   slab_reclaimable 5072048
                                                   slab_unreclaimable 9662144
                                                   slab 14734192
                                                   workingset_refault_anon 0
                                                   workingset_refault_file 68014397
                                                   workingset_activate_anon 0
                                                   workingset_activate_file 1696685
                                                   workingset_restore_anon 0
                                                   workingset_restore_file 366767
                                                   workingset_nodereclaim 105839
                                                   pgfault 78387338
                                                   pgmajfault 2230983
                                                   pgrefill 389283562
                                                   pgscan 3739026161
                                                   pgsteal 76549324
                                                   pgactivate 388064783
                                                   pgdeactivate 388780914
                                                   pglazyfree 0
                                                   pglazyfreed 0
                                                   thp_fault_alloc 0
                                                   thp_collapse_alloc 0
Dec 23 21:21:28 kernel: Tasks state (memory values in pages):
Dec 23 21:21:28 kernel: [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
Dec 23 21:21:28 kernel: [    864]     0   864     1555      221    49152        0             0 agetty
Dec 23 21:21:28 kernel: [    449]     0   449    19290     3956   176128        0          -250 systemd-journal
Dec 23 21:21:28 kernel: [    480]     0   480    72338     6905   114688        0         -1000 multipathd
Dec 23 21:21:28 kernel: [    487]     0   487     6405     1127    69632        0         -1000 systemd-udevd
Dec 23 21:21:28 kernel: [    673]   113   673     2026      857    53248        0             0 rpcbind
Dec 23 21:21:28 kernel: [    701]   101   701     6450     1925    86016        0             0 systemd-resolve
Dec 23 21:21:28 kernel: [    792]     0   792     1822      585    53248        0             0 cron
Dec 23 21:21:28 kernel: [    797]   102   797     2276      945    57344        0          -900 dbus-daemon
Dec 23 21:21:28 kernel: [    803]     0   803    20713      779    61440        0             0 irqbalance
Dec 23 21:21:28 kernel: [    806]     0   806     8273     3073   102400        0             0 networkd-dispat
Dec 23 21:21:28 kernel: [    807]   104   807    55601      651    77824        0             0 rsyslogd
Dec 23 21:21:28 kernel: [    838]     0   838     3859     1079    73728        0         -1000 sshd
Dec 23 21:21:28 kernel: [    824]     0   824     7855      804    69632        0             0 systemd-logind
Dec 23 21:21:28 kernel: [    872]     0   872     1544      217    45056        0             0 agetty
Dec 23 21:21:28 kernel: [    906]     0   906    27527     2975   114688        0             0 unattended-upgr
Dec 23 21:21:28 kernel: [   1105]   103  1105    22341     1131    77824        0             0 systemd-timesyn
Dec 23 21:21:28 kernel: [   1237]   100  1237     4225     1414    73728        0         -1000 systemd-network
Dec 23 21:21:28 kernel: [   1353]     0  1353    58865      371    90112        0             0 polkitd
Dec 23 21:21:28 kernel: [   1895]     0  1895      621       10    45056        0            10 sh
Dec 23 21:21:28 kernel: [   1921]     0  1921   306742      117    90112        0            10 go-runner
Dec 23 21:21:28 kernel: [   1926]     0  1926   315588     2281   180224        0            10 cinder-csi-plug
Dec 23 21:21:28 kernel: [   2627]     0  2627   311473     1303   143360        0            10 csi-node-driver
Dec 23 21:21:28 kernel: [ 158439]     0 158439    74068     1472   159744        0             0 packagekitd
Dec 23 21:21:28 kernel: [1352765]     0 1352765   441273     3188   319488        0          -900 snapd
Dec 23 21:21:28 kernel: [2110384]     0 2110384    22195      809   102400        0             0 mount-s3
Dec 23 21:21:28 kernel: [2110385]     0 2110385   325873    69389   827392        0             0 mount-s3
Dec 23 21:21:28 kernel: oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=mount-s3-1.13.0-257205cc-195f-42e2-85bb-f325cdefc6e0.service,mems_allowed=0,oom_memcg=/system.slice,task_mem>
Dec 23 21:21:28 kernel: Memory cgroup out of memory: Killed process 2110385 (mount-s3) total-vm:1303492kB, anon-rss:277556kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:808kB oom_score_adj:0
Dec 23 21:21:28 systemd[1]: mount-s3-1.13.0-257205cc-195f-42e2-85bb-f325cdefc6e0.service: A process of this unit has been killed by the OOM killer.
Dec 23 21:21:28 systemd[1]: mount-s3-1.13.0-257205cc-195f-42e2-85bb-f325cdefc6e0.service: Failed with result 'oom-kill'.
Dec 23 21:21:28 systemd[1]: Failed to start Mountpoint for Amazon S3 CSI driver FUSE daemon.
Dec 23 21:21:28 systemd[1]: mount-s3-1.13.0-257205cc-195f-42e2-85bb-f325cdefc6e0.service: Consumed 15.479s CPU time.
nikitadom added the bug (Something isn't working) label Dec 23, 2024
@unexge
Contributor

unexge commented Dec 27, 2024

Hey @nikitadom, Mountpoint currently tries to use 512 MiB of memory by default. Looking at the logs, it seems to have less than that minimum available, which might be causing the OOM kill. Would you be able to raise Mountpoint's memory limit?

You might also be running into awslabs/mountpoint-s3-csi-driver#82: the CSI Driver currently spawns Mountpoint in a systemd context, so it consumes systemd resources rather than Kubernetes/container resources.
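
As a rough check of the numbers above, the cgroup limit from the reported log can be converted to MiB and compared with the 512 MiB default. This is only a sketch: the log line is copied from the issue, the variable names are illustrative, and it assumes the kernel's `kB` unit means 1024 bytes.

```shell
#!/bin/sh
# Log line copied from the kernel output in the issue.
line='memory: usage 393216kB, limit 393216kB, failcnt 65040923'

# Extract the number that follows "limit ".
limit_kb=$(printf '%s\n' "$line" | sed -n 's/.*limit \([0-9][0-9]*\)kB.*/\1/p')

# Convert to MiB (assumption: the kernel reports these values in KiB).
limit_mib=$((limit_kb / 1024))

# Default memory target mentioned in this comment.
default_mib=512

echo "cgroup limit: ${limit_mib} MiB"
if [ "$limit_mib" -lt "$default_mib" ]; then
    echo "limit is below the ${default_mib} MiB default"
fi
```

With the logged value this gives 384 MiB, which is below 512 MiB. The same log attributes the limit to the /system.slice cgroup (oom_memcg=/system.slice), which fits the point that Mountpoint is charged against systemd resources.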

@nikitadom
Author

> Hey @nikitadom, Mountpoint currently tries to use 512 MiB memory by default, by looking at the logs it seems like it has less than minimum memory available – which might be causing OOM. Would you be able to increase Mountpoint's memory to a higher limit?
>
> You might be also running into awslabs/mountpoint-s3-csi-driver#82, the CSI Driver currently spawns Mountpoint in systemd context and consumes systemd resources rather than Kubernetes/container resources.

Where should I increase the memory?

@unexge
Contributor

unexge commented Dec 30, 2024

The CSI Driver spawns systemd units named in the format mount-s3-<mp-version>-<uuid>.service. I think you can use drop-in files for the Mountpoint units to tweak their configuration.

For example, you can create /etc/systemd/system/mount-s3-.service.d/50-memory.conf with the content:

$ cat /etc/systemd/system/mount-s3-.service.d/50-memory.conf
[Service]
MemoryHigh=2G

Then reload the systemd daemon to apply the changes:

$ systemctl daemon-reload
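
The two steps can also be scripted. The sketch below writes the same drop-in content shown in this comment. `SYSTEMD_DIR` is a hypothetical variable introduced here so the script can be tried in a scratch directory without root; on a real node it would be set to /etc/systemd/system.

```shell
#!/bin/sh
set -eu

# SYSTEMD_DIR is an illustrative override, not something systemd reads.
# It defaults to a temporary directory so nothing on the host is touched.
SYSTEMD_DIR="${SYSTEMD_DIR:-$(mktemp -d)}"
dropin_dir="$SYSTEMD_DIR/mount-s3-.service.d"
dropin_file="$dropin_dir/50-memory.conf"

mkdir -p "$dropin_dir"

# Same content as the drop-in file shown in this comment.
cat > "$dropin_file" <<'EOF'
[Service]
MemoryHigh=2G
EOF

echo "wrote $dropin_file"

# Reload systemd only when writing to the real configuration directory.
if [ "$SYSTEMD_DIR" = "/etc/systemd/system" ]; then
    systemctl daemon-reload
fi
```

Running it as root with `SYSTEMD_DIR=/etc/systemd/system` should be equivalent to creating the file by hand and reloading.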

After that, existing or newly created systemd units for Mountpoint will have MemoryHigh=2G:

$ systemctl status mount-s3-1.13.0-a3bb5010-9341-49c6-9806-dfbb84aba93b.service | grep Memory
     Memory: 16.4M (high: 2.0G available: 1.9G)

See the systemd documentation and this Stack Overflow answer on configuring memory limits for systemd units.
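
The drop-in directory above is named mount-s3-.service.d rather than after one full unit name. As far as I understand the systemd.unit documentation, for unit names containing dashes systemd also searches the directories obtained by truncating the name after each dash, which is why a single drop-in can cover every mount-s3-<mp-version>-<uuid>.service unit. The sketch below lists those candidate directory names for the unit from the logs; it only illustrates the naming rule and is not how systemd itself is implemented.

```shell
#!/bin/sh
# Unit name taken from the logs in this issue.
unit='mount-s3-1.13.0-257205cc-195f-42e2-85bb-f325cdefc6e0.service'

name="${unit%.service}"   # strip the unit type suffix
prefix=''
rest="$name"
dirs=''

# Consume one dash-separated component per iteration and record a
# "<prefix>-.service.d" candidate for every dash in the name.
while [ "${rest#*-}" != "$rest" ]; do
    prefix="${prefix}${rest%%-*}-"
    rest="${rest#*-}"
    dirs="${dirs}${prefix}.service.d
"
done

# Print the truncated candidates, shortest prefix first.
printf '%s' "$dirs"
```

The list includes mount-s3-.service.d, the directory used in the example, alongside more specific names such as mount-s3-1.13.0-.service.d that would apply to a single Mountpoint version.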
