job script mkdir tweak #6000

Open
hjoliver opened this issue Feb 27, 2024 · 3 comments
Labels: could be better (not exactly a bug, but not ideal); investigation; question (flag this as a question for the next Cylc project meeting)

@hjoliver
Member

hjoliver commented Feb 27, 2024

Part of the job script boilerplate in cylc/flow/etc/job.sh:

    # Create share and work directories
    mkdir -p "${CYLC_WORKFLOW_SHARE_DIR}" || true
    mkdir -p "$(dirname "${CYLC_TASK_WORK_DIR}")" || true

I've traced the origin of this scripting to PR #17 🤯

It would be good if we could remove the || true fail-safes, to make the task fail if those "directories" are actually dangling symlinks (e.g. the symlinked data dirs got whacked by a disk failover).

We've speculated that the || true was meant to protect against multiple tasks trying to create these directories at the same time, but mkdir -p should have that covered.
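To make the dangling-symlink concern concrete, here is a minimal sketch, assuming a throwaway temp directory (the `demo_root` path and its `share` link are hypothetical stand-ins, not real Cylc paths). It compares the exit status of a bare `mkdir -p` with the `|| true` form when the target is a symlink to nothing:

```shell
#!/usr/bin/env bash
# Hypothetical demo; nothing here touches a real workflow directory.
demo_root="$(mktemp -d)"

# A symlink whose target does not exist, standing in for a share dir
# whose symlinked data disk has gone away.
ln -s "${demo_root}/missing-target" "${demo_root}/share"

# Bare mkdir -p: capture the exit status instead of discarding it.
if mkdir -p "${demo_root}/share" 2>/dev/null; then
    strict_status=0
else
    strict_status=$?
fi

# With the fail-safe: whatever mkdir returned, the line as a whole succeeds.
mkdir -p "${demo_root}/share" 2>/dev/null || true
masked_status=$?

echo "strict=${strict_status} masked=${masked_status}"
rm -rf "${demo_root}"
```

With GNU coreutils I would expect the bare form to report a non-zero status (mkdir sees an existing path that is not a directory), while the `|| true` form always reports zero, which is why the job carries on as if the directory were there.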

However, @oliver-sanders correctly pointed out that changing anything this fundamental is risky.

Ping @matthewrmshin - if you're listening, as the author of that PR, do you recall your thought process from late September 2011? If so, you deserve a prize, but maybe it's worth asking!

@hjoliver added the question and could be better labels Feb 27, 2024
@hjoliver added this to the some-day milestone Feb 27, 2024
@matthewrmshin
Contributor

Your speculation is most likely correct. It was to avoid multiple tasks (i.e., different jobs/processes on different nodes on an HPC/cluster) creating the same directories at the same time on a networked file system.

mkdir -p is definitely good enough when you are on a local file system. However, I don't really know how networked file systems will behave these days when you have multiple processes on multiple nodes trying to create the same directories.
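The local-filesystem case is easy to check by hand. The sketch below is an assumption-laden test, not evidence about networked file systems: it launches 20 concurrent `mkdir -p` calls for the same nested path under a temp directory (`race_root` and `race_target` are hypothetical names) and counts how many exit non-zero:

```shell
#!/usr/bin/env bash
# Hypothetical local-only race test; says nothing about NFS, Lustre, etc.
race_root="$(mktemp -d)"
race_target="${race_root}/share/a/b/c"

# Start 20 background mkdir -p processes for the same path.
pids=""
i=0
while [ "${i}" -lt 20 ]; do
    mkdir -p "${race_target}" &
    pids="${pids} $!"
    i=$((i + 1))
done

# Collect each exit status; count the failures.
failures=0
for pid in ${pids}; do
    wait "${pid}" || failures=$((failures + 1))
done

if [ -d "${race_target}" ]; then created=yes; else created=no; fi
echo "failures=${failures} created=${created}"
rm -rf "${race_root}"
```

On a local file system I would expect no failures. To learn anything about the networked case, the same test would need to run from several nodes against a shared mount.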

Also bear in mind that we had to handle both ksh and bash running on machines that are not GNU/Linux in the early days of Cylc. These days we only have to handle modern GNU/Linux systems, so at least you can rely on a more uniformly behaving mkdir -p.

(An alternate way to implement this is to use a while [[ ! -d "${dir}" ]]; do ... done loop with logic in the block to handle dangling symlinks.)
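One possible shape for that loop, as a sketch only: `make_dir_checked` is a hypothetical helper, not something in cylc/flow/etc/job.sh, and the retry count and one-second sleep are arbitrary assumptions. It uses `[ ... ]` rather than `[[ ... ]]` purely for portability.

```shell
#!/usr/bin/env bash
# Hypothetical helper: retry directory creation, but fail fast if the
# path itself is a dangling symlink.
make_dir_checked() {
    local dir="$1"
    local max_attempts="${2:-5}"   # assumed default, tune as needed
    local attempt=0

    while [ ! -d "${dir}" ]; do
        # A symlink (-L) whose target does not exist (! -e) is dangling.
        if [ -L "${dir}" ] && [ ! -e "${dir}" ]; then
            echo "ERROR: ${dir} is a dangling symlink" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        if [ "${attempt}" -gt "${max_attempts}" ]; then
            echo "ERROR: could not create ${dir}" >&2
            return 1
        fi
        # Tolerate a transient failure (e.g. another job creating the
        # same path); pause briefly, then let the loop re-check.
        mkdir -p "${dir}" 2>/dev/null || sleep 1
    done
    return 0
}
```

Usage would look like `make_dir_checked "${CYLC_WORKFLOW_SHARE_DIR}" || exit 1`. Note that this only detects a dangling link at the final path component. A broken link higher up the path would exhaust the retries and then fail.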

@hjoliver
Member Author

Thanks for responding, @matthewrmshin! Good point on networked filesystems, makes sense. We might have to investigate a bit...

@dpmatthews
Contributor

I suggest we leave in the || true fail-safes. There are other ways to handle the dangling symlinks if necessary (see #5978).
