Fix non-blocking option #354
base: main
@golaz updates on diagnosing the issue: Where is the
| Option | Expected behavior | Current behavior | Behaviors match? |
|---|---|---|---|
| `--non-blocking` on | do not wait | do not wait | ✅ |
| `--non-blocking` off (i.e., `not non_blocking`) | wait | do not wait | ❌ |
So... the code is suggesting we're always waiting, but #290 tells us that we're never waiting.
What are the call paths to
@TonyB9000 Thanks for agreeing to look into this more. See my comments above for what I've found so far.
@forsyth2 OK. I'll probably conduct a series of tests between acme1 and chrysalis to determine whether and when Globus blocking can be effected (a "simplest driver"). If Globus blocking CAN be effected, it should be possible to do a code-wise side-by-side comparison. Note: Globus blocking/non-blocking should be independent of HPSS, right? I want to ensure that the HPSS tape drive does not send a weird "we're all done - nothing to see here" signal to the calling apparatus the moment it gets invoked.
@forsyth2 I cannot seem to git-clone the repo:
I believe so, but I'd need to go through the code to really check.
I think you need to delete a line in your ssh config file. I remember needing to do something like that, but I don't recall the exact file name.
# This is the `--non-blocking` behavior, even though we did not specify it.
# Now, with changes in this PR:
The changes in this PR mainly involve propagating the `if not non_blocking:` check to the remaining `globus_wait` not yet wrapped in a conditional. But I don't think that really matters, given the summary table in #354 (comment).
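For illustration, the guard discussed above can be sketched as follows. This is a minimal sketch, not zstash's actual code: `transfer_tarball`, `submit_transfer`, and `wait_for_task` are hypothetical stand-ins, and the fake callables only record calls rather than talking to Globus.

```python
from typing import Callable, List


def transfer_tarball(
    tar_name: str,
    submit_transfer: Callable[[str], str],
    wait_for_task: Callable[[str], None],
    non_blocking: bool,
) -> str:
    """Submit one transfer; wait for it only when blocking is requested."""
    task_id = submit_transfer(tar_name)
    if not non_blocking:
        # Blocking mode: do not return until the transfer has finished.
        wait_for_task(task_id)
    return task_id


# Hypothetical stand-ins that just record what happened (no real Globus calls).
submitted: List[str] = []
waited_on: List[str] = []


def fake_submit(tar_name: str) -> str:
    submitted.append(tar_name)
    return f"task-{tar_name}"


def fake_wait(task_id: str) -> None:
    waited_on.append(task_id)


transfer_tarball("000000.tar", fake_submit, fake_wait, non_blocking=True)
transfer_tarball("000001.tar", fake_submit, fake_wait, non_blocking=False)
print(waited_on)  # ['task-000001.tar']
```

Only the call made with `non_blocking=False` reaches the wait function, which is the "wait" row of the summary table. If every wait call site is guarded this way and the tool still never waits, the flag value reaching the guard would be the next thing to check.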
# In a different window:
ls /lcrc/group/e3sm/ac.forsyth2/E3SMv2_test/zstash_extractions/v2.NARRM.historical_0151/tests/zstash
# 000000.tar 000001.tar 000002.tar 000003.tar 000004.tar 000005.tar index.db
Are these tar files complete, or are they empty?
# From https://docs.e3sm.org/zstash/_build/html/main/usage.html:
# `--maxsize MAXSIZE` specifies the maximum size (in GB) for tar files. The default is 256 GB. Zstash will create tar files that are smaller than MAXSIZE except when individual input files exceed MAXSIZE (as individual files are never split up between different tar files).
# `--non-blocking` Zstash will submit a Globus transfer and immediately create a subsequent tarball. That is, Zstash will not wait until the transfer completes to start creating a subsequent tarball. On machines where it takes more time to create a tarball than transfer it, each Globus transfer will have one file. On machines where it takes less time to create a tarball than transfer it, the first transfer will have one file, but the number of tarballs in subsequent transfers will grow finding dynamically the most optimal number of tarballs per transfer. NOTE: zstash is currently always non-blocking.
Typically, you want transfers to go as fast as possible, so `--non-blocking` makes sense.
BUT there are situations where the disk is nearly full, so you want to block on the transfers (and delete the local cache) to avoid filling up the disk.
=> So we want to hold off on generating a new tarball until the previous tarball has been transferred (and its local copy deleted).
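The disk-space argument above can be sketched with a toy model. This is a simplified illustration under stated assumptions, not zstash's implementation: `peak_local_tarballs` is a hypothetical helper, the "cache" is an in-memory list, and the non-blocking case models the worst case in which no transfer finishes before the last tarball is created.

```python
from typing import List


def peak_local_tarballs(tar_names: List[str], non_blocking: bool) -> int:
    """Return the largest number of tarballs held locally at any one time."""
    local_cache: List[str] = []
    in_flight: List[str] = []
    peak = 0
    for name in tar_names:
        local_cache.append(name)  # create the tarball on local disk
        peak = max(peak, len(local_cache))
        in_flight.append(name)    # submit its transfer
        if not non_blocking:
            # Blocking: wait for the transfer to finish, then delete the
            # local copy before the next tarball is created.
            for done in in_flight:
                local_cache.remove(done)
            in_flight.clear()
        # Non-blocking (worst case assumed here): nothing has finished yet,
        # so every tarball created so far still occupies local disk.
    return peak


names = [f"{i:06d}.tar" for i in range(6)]
print(peak_local_tarballs(names, non_blocking=False))  # 1
print(peak_local_tarballs(names, non_blocking=True))   # 6
```

In this model, blocking caps local usage at one tarball (at most roughly `--maxsize`), while the non-blocking worst case grows with the number of tarballs. That is the trade-off between transfer speed and disk headroom described above.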
Issue resolution
`zstash` is always non-blocking #290
Select one: This pull request is...
1. Does this do what we want it to do?
Objectives:
Required:
If applicable:
2. Are the implementation details accurate & efficient?
Required:
If applicable:
`zstash/conda`, not just an `import` statement.
3. Is this well documented?
Required:
4. Is this code clean?
Required:
If applicable: