Periodically "touch" the IPC socket files to prevent inadvertent cleanup by systemd-tmpfiles #1895

Open
achimnol opened this issue Feb 7, 2024 · 0 comments
Labels
comp:common Related to Common component type:enhance Enhance component, behavior, internals without user-facing features urgency:2 With time limit, it should be finished within it; otherwise, resolve it when no other chores.
achimnol commented Feb 7, 2024

We need to periodically refresh the atime/mtime timestamps of the IPC socket files so that they do not get deleted as stale files.
Unlike regular files, the atime/mtime of an IPC socket file is not updated by reading or writing through it unless the file descriptor is re-opened.

For example, we could update the atime/mtime once every hour.

How to use os.utime():

import os
import time

socket_file_path = '/path/to/your/socket.file'

# Get the current time
current_time = time.time()

# Update the access time and modification time to now
os.utime(socket_file_path, (current_time, current_time))

Places to update

  • We could write a timer that scans and touches the files in the IPC base path as an ai.backend.common module.
  • Activate the timer during startup of the manager, agent, storage-proxy, and webserver.
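
The two points above could be sketched roughly as follows. This is only an illustration, not an existing ai.backend.common API: the function names touch_ipc_sockets and touch_ipc_sockets_periodically, the default one-hour interval, and the choice to touch only socket files (rather than every file under the IPC base path) are all assumptions.

```python
import asyncio
import os
import stat
from pathlib import Path


def touch_ipc_sockets(ipc_base_path: Path) -> int:
    """Refresh atime/mtime of every UNIX socket file under ipc_base_path.

    Hypothetical helper; returns the number of socket files touched.
    """
    touched = 0
    for dirpath, _dirnames, filenames in os.walk(ipc_base_path):
        # os.walk() lists socket files among the non-directory entries.
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if stat.S_ISSOCK(os.lstat(path).st_mode):
                    os.utime(path)  # omitting times sets both to "now"
                    touched += 1
            except FileNotFoundError:
                # The socket was removed between the scan and the touch.
                continue
    return touched


async def touch_ipc_sockets_periodically(
    ipc_base_path: Path,
    interval: float = 3600.0,  # assumed default: once every hour
) -> None:
    """Run touch_ipc_sockets() forever, sleeping `interval` seconds between runs."""
    while True:
        # Do the filesystem scan in a worker thread to keep the event loop free.
        await asyncio.to_thread(touch_ipc_sockets, ipc_base_path)
        await asyncio.sleep(interval)
```

Each service could then start the coroutine with asyncio.create_task() during startup and cancel the returned task on shutdown.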

Workaround

Until we add this feature, we can work around the issue either by changing the ipc-base-path configuration to a directory outside /tmp or by adding additional systemd-tmpfiles rules that exempt the Backend.AI IPC directories.

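How to update /etc/tmpfiles.d/tmp.conf:

Copy the stock configuration to /etc/tmpfiles.d/, where it overrides the one in /usr/lib/tmpfiles.d/:

cp /usr/lib/tmpfiles.d/tmp.conf /etc/tmpfiles.d/

Append the following "x" (exclude from cleanup) rules to /etc/tmpfiles.d/tmp.conf:

x /tmp/backend.ai
x /tmp/backend.ai/*
x /tmp/backend.ai/ipc
x /tmp/backend.ai/ipc/*
x /tmp/backend.ai/ipc/container
x /tmp/backend.ai/ipc/container/*
x /tmp/backend.ai/manager
x /tmp/backend.ai/manager/ipc

Then trigger a cleanup run and follow its log to check the result:

systemctl start systemd-tmpfiles-clean && journalctl -u systemd-tmpfiles-clean -xe -f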
@achimnol achimnol added comp:common Related to Common component type:enhance Enhance component, behavior, internals without user-facing features labels Feb 7, 2024
@achimnol achimnol added this to the 23.09 milestone Feb 7, 2024
@achimnol achimnol added the urgency:2 With time limit, it should be finished within it; otherwise, resolve it when no other chores. label Feb 7, 2024