Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix symlink behavior #343

Merged
merged 1 commit into from
Aug 16, 2024
Merged

Fix symlink behavior #343

merged 1 commit into from
Aug 16, 2024

Conversation

forsyth2
Copy link
Collaborator

@forsyth2 forsyth2 commented Aug 2, 2024

Fix symlink behavior. Resolves #341.

@forsyth2
Copy link
Collaborator Author

forsyth2 commented Aug 2, 2024

Current behavior, for reference:

symlinks1.sh:

# Test symlinks
# Adjusted from https://github.com/E3SM-Project/zstash/issues/341

mkdir workdir
mkdir workdir2
cd workdir
mkdir -p src/d1 src/d2
touch src/d1/large_file.txt

# This creates a symlink in d2 that links to a file in d1
# Notice absolute path is used for source
ln -s /home/ac.forsyth2/ez/zstash/tests/scripts/workdir/src/d1/large_file.txt src/d2/large_file.txt

ls -l  src/d2
# lrwxrwxrwx 1 ac.forsyth2 cels 71 Aug  2 15:35 large_file.txt -> /home/ac.forsyth2/ez/zstash/tests/scripts/workdir/src/d1/large_file.txt

zstash create --hpss=none --follow-symlinks --cache /home/ac.forsyth2/ez/zstash/tests/scripts/workdir2 src/d2
# For help, please see https://e3sm-project.github.io/zstash. Ask questions at https://github.com/E3SM-Project/zstash/discussions/categories/q-a.
# INFO: Gathering list of files to archive
# INFO: Creating new tar archive 000000.tar
# INFO: Archiving large_file.txt
# INFO: tar name=000000.tar, tar size=10240, tar md5=6d058418bf7d409e46fd5a1f02da9440
# INFO: put: HPSS is unavailable
# INFO: put: Keeping tar files locally and removing write permissions
# INFO: '/home/ac.forsyth2/ez/zstash/tests/scripts/workdir2/000000.tar' original mode=b"'644'"
# INFO: '/home/ac.forsyth2/ez/zstash/tests/scripts/workdir2/000000.tar' new mode=b"'444'"
# INFO: put: HPSS is unavailable
# total 0

ls -l  src/d2
# -rw-r--r-- 1 ac.forsyth2 cels 0 Aug  2 15:35 large_file.txt
# Notice the file is no longer a symlink

cd ../workdir3
zstash extract --hpss=none --cache /home/ac.forsyth2/ez/zstash/tests/scripts/workdir2
# For help, please see https://e3sm-project.github.io/zstash. Ask questions at https://github.com/E3SM-Project/zstash/discussions/categories/q-a.
# INFO: /home/ac.forsyth2/ez/zstash/tests/scripts/workdir2/000000.tar exists. Checking expected size matches actual size.
# INFO: Opening tar archive /home/ac.forsyth2/ez/zstash/tests/scripts/workdir2/000000.tar
# INFO: Extracting large_file.txt
# INFO: No failures detected when extracting the files. If you have a log file, run "grep -i Exception <log-file>" to double check.

$ ls
large_file.txt

symlinks2.sh:

# Test symlinks
# Adjusted from https://github.com/E3SM-Project/zstash/issues/341

mkdir workdir
mkdir workdir2
cd workdir
mkdir -p src/d1 src/d2
touch src/d1/large_file.txt

# This creates a symlink in d2 that links to a file in d1
# Notice absolute path is used for source
ln -s /home/ac.forsyth2/ez/zstash/tests/scripts/workdir/src/d1/large_file.txt src/d2/large_file.txt

ls -l  src/d2
# lrwxrwxrwx 1 ac.forsyth2 cels 71 Aug  2 15:42 large_file.txt -> /home/ac.forsyth2/ez/zstash/tests/scripts/workdir/src/d1/large_file.txt

# Difference from symlinks1.sh -- not using follow-symlinks
zstash create --hpss=none --cache /home/ac.forsyth2/ez/zstash/tests/scripts/workdir2 src/d2
# For help, please see https://e3sm-project.github.io/zstash. Ask questions at https://github.com/E3SM-Project/zstash/discussions/categories/q-a.
# INFO: Gathering list of files to archive
# INFO: Creating new tar archive 000000.tar
# INFO: Archiving large_file.txt
# INFO: tar name=000000.tar, tar size=10240, tar md5=113bc8544f6d8ad3301be8e86d15396f
# INFO: put: HPSS is unavailable
# INFO: put: Keeping tar files locally and removing write permissions
# INFO: '/home/ac.forsyth2/ez/zstash/tests/scripts/workdir2/000000.tar' original mode=b"'644'"
# INFO: '/home/ac.forsyth2/ez/zstash/tests/scripts/workdir2/000000.tar' new mode=b"'444'"
# INFO: put: HPSS is unavailable
# total 1

ls -l  src/d2
# lrwxrwxrwx 1 ac.forsyth2 cels 71 Aug  2 15:42 large_file.txt -> /home/ac.forsyth2/ez/zstash/tests/scripts/workdir/src/d1/large_file.txt
# Notice the file is still a symlink

@forsyth2
Copy link
Collaborator Author

forsyth2 commented Aug 2, 2024

Change to:

    if follow_symlinks and os.path.islink(file_name):
        linked_file_name = os.path.realpath(file_name)
        #os.remove(file_name)  # Remove symbolic link and create a hard copy
        #shutil.copy(linked_file_name, file_name)
        file_name = linked_file_name

symlinks1.sh now:

# Test symlinks
# Adjusted from https://github.com/E3SM-Project/zstash/issues/341

mkdir workdir workdir2 workdir3
cd workdir
mkdir -p src/d1 src/d2
touch src/d1/large_file.txt

# This creates a symlink in d2 that links to a file in d1
# Notice absolute path is used for source
ln -s /home/ac.forsyth2/ez/zstash/tests/scripts/workdir/src/d1/large_file.txt src/d2/large_file.txt
ls -l  src/d2
# lrwxrwxrwx 1 ac.forsyth2 cels 71 Aug  2 16:22 large_file.txt -> /home/ac.forsyth2/ez/zstash/tests/scripts/workdir/src/d1/large_file.txt

zstash create --hpss=none --follow-symlinks --cache /home/ac.forsyth2/ez/zstash/tests/scripts/workdir2 src/d2
# For help, please see https://e3sm-project.github.io/zstash. Ask questions at https://github.com/E3SM-Project/zstash/discussions/categories/q-a.
# INFO: Gathering list of files to archive
# INFO: Creating new tar archive 000000.tar
# INFO: Archiving large_file.txt
# INFO: tar name=000000.tar, tar size=10240, tar md5=fea1fddc08b1ad33447f0f7f6b62cb9c
# INFO: put: HPSS is unavailable
# INFO: put: Keeping tar files locally and removing write permissions
# INFO: '/home/ac.forsyth2/ez/zstash/tests/scripts/workdir2/000000.tar' original mode=b"'644'"
# INFO: '/home/ac.forsyth2/ez/zstash/tests/scripts/workdir2/000000.tar' new mode=b"'444'"
# INFO: put: HPSS is unavailable
# total 0

ls -l  src/d2
# -rw-r--r-- 1 ac.forsyth2 cels 0 Aug  2 16:22 large_file.txt
# Notice src is unaffected

cd ../workdir3
zstash extract --hpss=none --cache /home/ac.forsyth2/ez/zstash/tests/scripts/workdir2
# For help, please see https://e3sm-project.github.io/zstash. Ask questions at https://github.com/E3SM-Project/zstash/discussions/categories/q-a.
# INFO: /home/ac.forsyth2/ez/zstash/tests/scripts/workdir2/000000.tar exists. Checking expected size matches actual size.
# INFO: Opening tar archive /home/ac.forsyth2/ez/zstash/tests/scripts/workdir2/000000.tar
# INFO: Extracting large_file.txt
# INFO: No failures detected when extracting the files. If you have a log file, run "grep -i Exception <log-file>" to double check.

ls gpfs/fs1/home/ac.forsyth2/ez/zstash/tests/scripts/workdir/src
# d1
# notice no d2 directory

ls gpfs/fs1/home/ac.forsyth2/ez/zstash/tests/scripts/workdir/src/d1/large_file.txt 
# gpfs/fs1/home/ac.forsyth2/ez/zstash/tests/scripts/workdir/src/d1/large_file.txt

So, with that change large_file.txt is unaffected in src, however it also ends up extracted as 1) many subdirectories deep in d1, and 2) missing the symlink in d2.

EDIT: actually, we're only archiving src/d2 on the zstash create step, so really it's just placing the one actual file (not symlink) way far down a directory path once extracted.

@forsyth2
Copy link
Collaborator Author

forsyth2 commented Aug 2, 2024

With this diff:

--- a/zstash/hpss_utils.py
+++ b/zstash/hpss_utils.py
@@ -98,7 +98,7 @@ def add_files(
                 do_hash = False
             tarFileObject = HashIO(os.path.join(cache, tfname), "wb", do_hash)
             # FIXME: error: Argument "fileobj" to "open" has incompatible type "HashIO"; expected "Optional[IO[bytes]]"
-            tar = tarfile.open(mode="w", fileobj=tarFileObject)  # type: ignore
+            tar = tarfile.open(mode="w", fileobj=tarFileObject, dereference=follow_symlinks)  # type: ignore
 
         # Add current file to tar archive
         current_file: str = files[i]
@@ -179,10 +179,11 @@ def add_file(
 
     # FIXME: error: "TarFile" has no attribute "offset"
     offset: int = tar.offset  # type: ignore
-    if follow_symlinks and os.path.islink(file_name):
-        linked_file_name = os.path.realpath(file_name)
-        os.remove(file_name)  # Remove symbolic link and create a hard copy
-        shutil.copy(linked_file_name, file_name)
+    # if follow_symlinks and os.path.islink(file_name):
+    #     linked_file_name = os.path.realpath(file_name)
+    #     os.remove(file_name)  # Remove symbolic link and create a hard copy
+    #     shutil.copy(linked_file_name, file_name)
+    #     #file_name = linked_file_name
     tarinfo: tarfile.TarInfo = tar.gettarinfo(file_name)
     # Change the size of any hardlinks from 0 to the size of the actual file
     if tarinfo.islnk():

We get:

# Test symlinks
# Adjusted from https://github.com/E3SM-Project/zstash/issues/341

mkdir workdir workdir2 workdir3
cd workdir
mkdir -p src/d1 src/d2
touch src/d1/large_file.txt

# This creates a symlink in d2 that links to a file in d1
# Notice absolute path is used for source
ln -s /home/ac.forsyth2/ez/zstash/tests/scripts/workdir/src/d1/large_file.txt src/d2/large_file.txt
ls -l  src/d2
# lrwxrwxrwx 1 ac.forsyth2 cels 71 Aug  2 16:50 large_file.txt -> /home/ac.forsyth2/ez/zstash/tests/scripts/workdir/src/d1/large_file.txt

zstash create --hpss=none --follow-symlinks --cache /home/ac.forsyth2/ez/zstash/tests/scripts/workdir2 src/d2
# For help, please see https://e3sm-project.github.io/zstash. Ask questions at https://github.com/E3SM-Project/zstash/discussions/categories/q-a.
# INFO: Gathering list of files to archive
# INFO: Creating new tar archive 000000.tar
# INFO: Archiving large_file.txt
# INFO: tar name=000000.tar, tar size=10240, tar md5=b51fab8640dc26029eb01b46c2bcf04f
# INFO: put: HPSS is unavailable
# INFO: put: Keeping tar files locally and removing write permissions
# INFO: '/home/ac.forsyth2/ez/zstash/tests/scripts/workdir2/000000.tar' original mode=b"'644'"
# INFO: '/home/ac.forsyth2/ez/zstash/tests/scripts/workdir2/000000.tar' new mode=b"'444'"
# INFO: put: HPSS is unavailable
# total 1

ls -l  src/d2
# lrwxrwxrwx 1 ac.forsyth2 cels 71 Aug  2 16:50 large_file.txt -> /home/ac.forsyth2/ez/zstash/tests/scripts/workdir/src/d1/large_file.txt
# Notice src is unaffected, still has a symlink

cd ../workdir3
zstash extract --hpss=none --cache /home/ac.forsyth2/ez/zstash/tests/scripts/workdir2
# For help, please see https://e3sm-project.github.io/zstash. Ask questions at https://github.com/E3SM-Project/zstash/discussions/categories/q-a.
# INFO: /home/ac.forsyth2/ez/zstash/tests/scripts/workdir2/000000.tar exists. Checking expected size matches actual size.
# INFO: Opening tar archive /home/ac.forsyth2/ez/zstash/tests/scripts/workdir2/000000.tar
# INFO: Extracting large_file.txt
# INFO: No failures detected when extracting the files. If you have a log file, run "grep -i Exception <log-file>" to double check.

ls workdir3
# large_file.txt

@TonyB9000
Copy link
Collaborator

That looks perfect! I can test it over my test directories, however you want to make it available for testing.

@forsyth2 forsyth2 force-pushed the issue-341-symlinks branch from 0441071 to 30dbb16 Compare August 2, 2024 22:28
@forsyth2
Copy link
Collaborator Author

forsyth2 commented Aug 2, 2024

@TonyB9000 Great, let me know if everything works as expected/desired in your testing. If that goes well, I'll run the zstash tests and make sure it's good to merge.

I just committed the changes on this branch. To try the new code out yourself:

# You might already have a zstash directory. If not, run:
git clone [email protected]:E3SM-Project/zstash.git

# Fetch my changes
git fetch origin issue-341-symlinks

# Create the conda environment
# I'm assuming you have some way of using conda/mamba
mamba clean --all
mamba env create -f conda-env/dev.yml -n zstash_dev_symlinks
conda activate zstash_dev_symlinks

# Install the code changes in that environment
pip install .

# Now, use zstash as you normally would.

@TonyB9000
Copy link
Collaborator

Most Excellent Ryan! Yes, I will be able to do these and retest! What I needed was the git fetch origin issue-341-symlinks command. Cheers!

@TonyB9000
Copy link
Collaborator

@forsyth2 Minor issue? I renamed my old zstash repo so I could "git clone" the latest, but got a pile of "Warning - MITM possible, etc.) It would not "clone". So I went back to my previous zstash repo and did the git fetch origin issue-341-symlinks and created a new environment. It worked, but says:

Installing collected packages: zstash
  Attempting uninstall: zstash
    Found existing installation: zstash 1.4.3
    Uninstalling zstash-1.4.3:
      Successfully uninstalled zstash-1.4.3
Successfully installed zstash-1.4.1

So I'm thinking the "fetch" pulled down into an old version. I may wait until Monday to see if I can straighten this out.

@forsyth2
Copy link
Collaborator Author

forsyth2 commented Aug 5, 2024

Hmm I get this from pip install .

Installing collected packages: zstash
  Attempting uninstall: zstash
    Found existing installation: zstash 1.4.3
    Uninstalling zstash-1.4.3:
      Successfully uninstalled zstash-1.4.3
Successfully installed zstash-1.4.3

I think it just says installed whatever the last released version was. Nevertheless, that should be 1.4.3, not 1.4.1. I might suggest just doing git fetch origin. I've noticed for whatever reason that sometimes specifying/not specifying the branch may change if it gets fetched...

In any case run git log to confirm the last commit. For me:

$ git log
commit 30dbb16566ce5775f86526b9a3a5e84b3290af01 (HEAD -> issue-341-symlinks, upstream/issue-341-symlinks)
Author: Ryan Forsyth <[email protected]>
Date:   Fri Aug 2 15:53:53 2024 -0500

    Fix symlink behavior

commit 92447d2b5438cd1c59b37bbbde5e21e4ba300db8 (upstream/main, main)
Author: forsyth2 <[email protected]>
Date:   Thu Apr 11 10:04:21 2024 -0700

    Bump to 1.4.3 (#334)

@TonyB9000
Copy link
Collaborator

TonyB9000 commented Aug 6, 2024

@forsyth2 I am having trouble accessing your changed version. I did a fresh git clone, obtained version 1.4.3, and then issued

git fetch origin issue-341-symlinks

and the response was

From https://github.com/E3SM-Project/zstash
 * branch            issue-341-symlinks -> FETCH_HEAD

I then did the

mamba clean --all
mamba env create -f conda/dev.yml -n zstash_dev_symlinks
mamba activate zstash_dev_symlinks
and
pip install .

But in testing with
zstash create --hpss=none --cache /home/bartoletti1/test/zstash/MyNewTestArchive --follow-symlinks SRC/dirA/dirAA/

I am getting the old behavior (links in SRC clobbered with copied files). Moreover, when I issue "git log", the latest entries read

commit ef87eafa274bc76dd8d383dcdad50660622eae0c (HEAD -> main, origin/main, origin/HEAD)
Author: forsyth2 <[email protected]>
Date:   Fri Aug 2 12:55:31 2024 -0700

    Add DOI badge (#342)

commit 92447d2b5438cd1c59b37bbbde5e21e4ba300db8 (tag: v1.4.3)
Author: forsyth2 <[email protected]>
Date:   Thu Apr 11 10:04:21 2024 -0700

    Bump to 1.4.3 (#334)

I must be doing it wrong ...

When I issue "git branch", all I see is "main".

@forsyth2
Copy link
Collaborator Author

forsyth2 commented Aug 6, 2024

@TonyB9000 Oh I see the problem, I left out the all-important step of actually checking out the branch:

git fetch origin issue-341-symlinks
git checkout -b your-own-branch origin/issue-341-symlinks

@TonyB9000
Copy link
Collaborator

@forsyth2 That works much better... (I've never checked out a remote branch before).

It seems to run correctly, created the (--cache) archive

/home/bartoletti1/test/zstash/MyNewTestArchive/

contains

-r--r--r--. 1 bartoletti1 publishers 1996800 Aug  6 15:18 000000.tar
-rw-r--r--. 1 bartoletti1 publishers   20480 Aug  6 15:18 index.db

But (in the /home/bartoletti1/test/zstash/ directory) if I issue

zstash extract --hpss=none --cache /home/bartoletti1/test/zstash/MyNewTestArchive *

I get

INFO: No matches for DST
INFO: No matches for MyNewTestArchive
INFO: No matches for theSRC.tar
Traceback (most recent call last):
  File "/home/bartoletti1/mambaforge/envs/zstash_dev_symlinks/bin/zstash", line 8, in <module>
    sys.exit(main())
  File "/home/bartoletti1/mambaforge/envs/zstash_dev_symlinks/lib/python3.9/site-packages/zstash/main.py", line 69, in main
    extract()
  File "/home/bartoletti1/mambaforge/envs/zstash_dev_symlinks/lib/python3.9/site-packages/zstash/extract.py", line 45, in extract
    failures: List[FilesRow] = extract_database(args, cache, keep_files)
  File "/home/bartoletti1/mambaforge/envs/zstash_dev_symlinks/lib/python3.9/site-packages/zstash/extract.py", line 245, in extract_database
    raise FileNotFoundError("There was nothing to extract.")
FileNotFoundError: There was nothing to extract.

Not sure what to try next.

@TonyB9000
Copy link
Collaborator

It works if you don't use just a wildcard. zstash ls -l gives

(zstash_dev_symlinks) -bash-4.2$ zstash ls -l --hpss=none *f*
For help, please see https://e3sm-project.github.io/zstash. Ask questions at https://github.com/E3SM-Project/zstash/discussions/categories/q-a.
id      name    size    mtime   md5     tar     offset
1       v1/f1   647094  2024-08-01 21:51:31     a49ea75d33b261d3cd69e26cdcc1d108        000000.tar      0
2       v1/f2   4464    2024-08-01 21:52:23     3285ca7ca93fc10f0b63747fb126e23e        000000.tar      648704
3       v1/f3   5230    2024-08-01 21:52:49     2b1a1e638467b75efc080227f8909675        000000.tar      654848
4       v2/f1   647094  2024-08-01 21:51:31     a49ea75d33b261d3cd69e26cdcc1d108        000000.tar      662016
5       v2/f2   4464    2024-08-01 21:52:23     3285ca7ca93fc10f0b63747fb126e23e        000000.tar      1310720
6       v2/f3   5241    2024-08-02 01:32:17     9157c4303ddf42870356bdcb95cd0612        000000.tar      1316864
7       v3/f1   647094  2024-08-01 21:51:31     a49ea75d33b261d3cd69e26cdcc1d108        000000.tar      1324032
8       v3/f2   4475    2024-08-02 20:02:15     59302555d12f34174972ce6046613e71        000000.tar      1972736
9       v3/f3   5247    2024-08-02 01:32:57     04288f42fcf4b1bc5487bbccf23a44d4        000000.tar      1978880


@TonyB9000
Copy link
Collaborator

@forsyth2 How do you "extract everything" (or "ls -l everything")? It crashed when given just *

@TonyB9000
Copy link
Collaborator

TonyB9000 commented Aug 6, 2024

@forsyth2 Everything works perfectly. The source directories are untouched, and the extracted archive contains the correct file content in each directory.

I just wish "extract *" (or ls -l *) would work. Even "*/* fails as a file pattern. Is that intentional?

@forsyth2
Copy link
Collaborator Author

forsyth2 commented Aug 6, 2024

Using my latest iteration of test script:

# These work
zstash extract --hpss=none --cache /home/ac.forsyth2/ez/zstash/tests/scripts/workdir2
zstash extract --hpss=none --cache /home/ac.forsyth2/ez/zstash/tests/scripts/workdir2 *
zstash extract --hpss=none --cache /home/ac.forsyth2/ez/zstash/tests/scripts/workdir2 "*"

# Does not work, since there are no subdirectories in my test
zstash extract --hpss=none --cache /home/ac.forsyth2/ez/zstash/tests/scripts/workdir2 */*

Note https://docs.e3sm.org/zstash/_build/html/main/usage.html#extract specifies for [files]:

  • Leave empty to extract all the files.
  • List of files with support for wildcards. Please note that any expression containing wildcards should be enclosed in double quotes (“…”) to avoid shell substitution.
  • Names of specific tar archives to extract all files within these tar archives.

@TonyB9000
Copy link
Collaborator

TonyB9000 commented Aug 7, 2024

@forsyth2 Excellent! I should have used quotes on my wildcard expression. Did not know "empty" = "ALL".

Let me know when I can employ a new release version, rather than "pip install" the test branch. BTW I like your practice of putting the issue# in the branch name - I often neglect to create "issues".

@forsyth2
Copy link
Collaborator Author

forsyth2 commented Aug 9, 2024

Let me know when I can employ a new release version

@TonyB9000 That will probably not be for a while, unless we do a patch release before Unified is released in the winter. Is waiting until the next Unified ok or should we do a patch release?

In any case, I still need to double check the unit tests on these changes before merging. Then at least it will be on main.

@forsyth2 forsyth2 force-pushed the issue-341-symlinks branch from 8f06327 to c927973 Compare August 16, 2024 00:43
@forsyth2 forsyth2 marked this pull request as ready for review August 16, 2024 00:44
@forsyth2 forsyth2 merged commit 6894798 into main Aug 16, 2024
3 checks passed
@forsyth2
Copy link
Collaborator Author

@TonyB9000 I've merged this change into main, but you'll still only have access to it in a development environment until the next zstash/Unified release.

@TonyB9000
Copy link
Collaborator

Awesome! SO I can "git (re)clone" the repo, and have everything up-to date. I just need to remember to "pip install" in any environment where I need it. Thanks!

@forsyth2 forsyth2 deleted the issue-341-symlinks branch October 31, 2024 16:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug]: zstash with --follow-symlinks clobbers the source directory
2 participants