unlink: optional sleep after calling client-to-server unlink rpc #745

Merged 2 commits on Jan 24, 2023

@adammoody (Collaborator) commented Dec 5, 2022

In testing PnetCDF, some tests fail when creating a file after deleting a file by the same name (see #744). As a workaround, this PR adds an optional sleep immediately after a client calls the client-to-server unlink RPC, giving the unlink operation more time to complete before the client returns from its call to unlink().

To enable this option, one can set a new config parameter:

export UNIFYFS_CLIENT_UNLINK_USECS=1000000

For the first test case that was failing, which was a serial program (single-process MPI job), a value of 1000000 (1 second) was sufficient. Higher sleep times may be required for parallel jobs.
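
For illustration only, the behavior described above could look roughly like the following C sketch. It is not the code from this PR: the helper names (`unlink_sleep_usecs_from_env`, `sleep_usecs`, `invoke_unlink_rpc_stub`, `client_unlink_with_delay`) are hypothetical, the RPC is replaced by a stub, and reading the value straight from the environment is an assumption (UnifyFS has its own config layer). Only the variable name `UNIFYFS_CLIENT_UNLINK_USECS` comes from the description.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper: read a microsecond count from an environment
 * variable. Returns 0 (no sleep) when the variable is unset, empty,
 * negative, or not a plain decimal number. */
static long unlink_sleep_usecs_from_env(const char *name)
{
    const char *val = getenv(name);
    if (val == NULL || *val == '\0') {
        return 0;
    }

    errno = 0;
    char *end = NULL;
    long usecs = strtol(val, &end, 10);
    if (errno != 0 || end == val || *end != '\0' || usecs < 0) {
        return 0;
    }
    return usecs;
}

/* Hypothetical helper: sleep for the given number of microseconds,
 * resuming with the remaining time if a signal interrupts the sleep. */
static void sleep_usecs(long usecs)
{
    if (usecs <= 0) {
        return;
    }

    struct timespec ts;
    ts.tv_sec  = usecs / 1000000L;
    ts.tv_nsec = (usecs % 1000000L) * 1000L;
    while (nanosleep(&ts, &ts) == -1 && errno == EINTR) {
        /* ts now holds the remaining time; keep sleeping */
    }
}

/* Stand-in for the client-to-server unlink RPC (assumption: the real
 * call returns 0 on success). */
static int invoke_unlink_rpc_stub(const char *path)
{
    (void)path;
    return 0;
}

/* Sketch of the workaround: issue the unlink RPC, then optionally
 * sleep before returning to the caller of unlink(). */
static int client_unlink_with_delay(const char *path)
{
    assert(path != NULL);

    int rc = invoke_unlink_rpc_stub(path);

    /* With the variable unset, this is a no-op and behavior is unchanged. */
    sleep_usecs(unlink_sleep_usecs_from_env("UNIFYFS_CLIENT_UNLINK_USECS"));

    return rc;
}
```

In this sketch the delay defaults to zero, so the sleep stays opt-in, and the fixed delay gives no guarantee that the servers have finished the unlink; it only makes that outcome more likely.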

This is a hack, but it helps for now.

A better fix would be to implement a mode where the unlink() wrapper blocks at the calling client until all servers have indicated that the unlink operation has completed. That may require a round trip between each server and each of its clients, since each client has to do some work to support unlink. That change is a more substantial effort, so it is saved for future work. Once it is added, this workaround could be removed.


@adammoody force-pushed the unlink_sleep branch 3 times, most recently from 8c3ef64 to abb0cc1, on December 5, 2022 17:18

@CamStan (Member) left a comment

Thanks @adammoody!

@adammoody (Collaborator, Author) commented

Thanks for taking a look, @CamStan!

@adammoody merged commit 81f7426 into LLNL:dev on Jan 24, 2023
@adammoody deleted the unlink_sleep branch on January 24, 2023 23:59