Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(sync): avoid symlink loops #2

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

aleclarson
Copy link

@aleclarson aleclarson commented Oct 21, 2020

Check the chain of directories that led to the current directory before reading its contents. Avoid reading the same directory twice when crawling deeper.

The realpathSync result is never passed to the callback, so all file paths are still inside the root directory.

Note: Only the sync version is fixed by this PR. The async version is not used by sirv, so I can't justify helping there.

Check the chain of directories that led to the current directory
before reading its contents. Avoid reading the same directory twice
when crawling deeper.

The `realpathSync` result is never passed to the callback, so all
file paths are still inside the root directory.
@lukeed
Copy link
Owner

lukeed commented Oct 21, 2020

Thank you(!) but this is breaking and cannot be accepted as is. This is because sync has Node 6.x support, and realPathSync.native was introduced in Node 9.x.

In general cases, it won't matter, but there are (unfortunately) still a huge number of popular npm packages that force/require Node 6 support.

I'll come up with another way to solve this within this PR

@lukeed
Copy link
Owner

lukeed commented Oct 21, 2020

Hey @aleclarson if you could check out the changes, would appreciate it!

This module has to have identical behavior between sync & async, so I added support there as well.

@aleclarson
Copy link
Author

Looks good, though there's one difference from my original PR. Your changes will skip a directory that has already been crawled even if crawling it again would not lead to infinite recursion.

For example, if both a/b and a/c are symlinks to the same directory, the callback will only get descendants of a/b even though crawling a/c too would not lead to infinite recursion.

@aleclarson
Copy link
Author

Actually, three more issues.

  1. The realpath result is used to check the cache, but is never cached itself.

  2. When a symlink points to a crawled directory, the callback is called with the symlink path.

  3. Your changes don't account for a linked directory being crawled before the real directory. For example, if a/b points to a/c (a real directory), the callback will get descendants for both a/b and a/c.

@aleclarson
Copy link
Author

aleclarson commented Oct 22, 2020

The realpath result is used to check the cache, but is never cached itself.

Gonna need to call realpath at the top of the function.

Your changes don't account for a linked directory being crawled before the real directory. For example, if a/b points to a/c (a real directory), the callback will get descendants for both a/b and a/c.

The !stats.isSymbolicLink() check causes this.

@lukeed
Copy link
Owner

lukeed commented Oct 24, 2020

Thanks @aleclarson – I applied your fixes, which means that it's basically back to your initial state 😅

My only hesitation with the PR right now is that it's now twice as slow as it was previously. This is because it has to "double-check" what its absolute path is on every invocation, which is something I was trying to avoid in my first pass.

@aleclarson
Copy link
Author

I guess it's time for you to learn Rust. :P

@lukeed
Copy link
Owner

lukeed commented Oct 24, 2020

Haha, I've done a couple things with it :D

When you get a chance, do you think you could add some test cases for the points you raised earlier? In case you couldn't already tell, I'm not very familiar with symlinks. Am currently reworking this (in a separate branch) and it'd be good to make sure all your concerns/points are still respected.

@lukeed
Copy link
Owner

lukeed commented Oct 24, 2020

Also, you invite me to your fork? I want to push a branch and PR into fix/symlink so I don't go messing this up again w/o your oversight 😅

@aleclarson
Copy link
Author

invite me to your fork

Done 👍

When you get a chance, do you think you could add some test cases for the points you raised earlier?

Don't think I can, as I'm busy with getting a product ready for launch. But I'll point out if you missed anything in the PR.

@lukeed
Copy link
Owner

lukeed commented Oct 25, 2020

Thank you! PR opened on your fork.

And totally understand, good luck! No rush either way – just appreciate your help on this :)

@vigneshshanmugam
Copy link

vigneshshanmugam commented Nov 4, 2020

Thanks for the awesome package @lukeed, Wouldnt it suffice to just change the fs.stat check with fs.lstat which should handle the symlink issue by itself? I have tested totalist with a large project and it seems to work correctly. Just curious on why do we need separate caching behaviour?

@aleclarson
Copy link
Author

I agree with @vigneshshanmugam. This library shouldn't be following symlinks at all.

@lukeed
Copy link
Owner

lukeed commented Mar 15, 2021

Ok, does that mean this PR is replaced by another that just does stat -> lstat?

This would be a breaking change (which I think I'm okay with) since it would mean that any symlink'd directories are no longer followed at all. Currently they are, which exposes the possible infinite loop this PR addresses.

@vigneshshanmugam
Copy link

Ok, does that mean this PR is replaced by another that just does stat -> lstat?

That was my proposal. Not sure if caching would be useful, but yeah if we can also add have that on top, sound good.

@aleclarson
Copy link
Author

I can't use uvu with libraries that depend on local dependencies via yarn link until this issue is fixed. 😢

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants