
sanitize_filepath does not produce valid length filepaths #49

Open
7x11x13 opened this issue Jun 18, 2024 · 2 comments

7x11x13 (Contributor) commented Jun 18, 2024

Code to reproduce:

from pathvalidate import sanitize_filepath, validate_filepath

path = "/".join("a"*200 for _ in range(30))
print("length:", len(path))
sanitized = sanitize_filepath(path)
print("sanitized length:", len(sanitized))
validate_filepath(sanitized)

Output:

length: 6029
sanitized length: 6029
Traceback (most recent call last):
  ...
pathvalidate.error.ValidationError: [PV1101] found an invalid string length: file path is too long: expected<=260 bytes, actual=6029 bytes, platform=universal, fs_encoding=utf-8, byte_count=6,029

To be honest, I'm not sure what the expected behavior should be, but I think at the very least it should raise an error if it can't produce a valid file path.
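For reference, the length check that fails above can be approximated with the standard library alone. This is a minimal sketch, not pathvalidate's implementation: check_path_length is a hypothetical helper, and its 260-byte limit and UTF-8 encoding are taken from the error message (platform=universal, fs_encoding=utf-8).

```python
def check_path_length(path: str, max_len: int = 260, encoding: str = "utf-8") -> None:
    """Hypothetical helper: raise ValueError if the encoded path exceeds max_len bytes."""
    byte_count = len(path.encode(encoding))  # the limit is on bytes, not characters
    if byte_count > max_len:
        raise ValueError(
            f"file path is too long: expected<={max_len} bytes, actual={byte_count} bytes"
        )


# Same path as the reproduction above: 30 components of 200 bytes plus 29 separators.
path = "/".join("a" * 200 for _ in range(30))
try:
    check_path_length(path)
except ValueError as exc:
    print(exc)  # file path is too long: expected<=260 bytes, actual=6029 bytes
```

Raising after sanitizing, rather than silently returning an over-long path, is the minimum behavior asked for here.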

7x11x13 (Contributor, Author) commented Jun 19, 2024

After some thought, I think the default behavior should be to truncate the last component of the path so that the entire path length falls within the accepted range, and, if this is not possible, to raise an error.
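A minimal sketch of the proposed behavior, assuming POSIX-style "/" separators (as in the reproduction) and a limit measured in encoded bytes. truncate_last_component is a hypothetical helper, not part of pathvalidate; it leaves the directory part untouched and only shortens the final component.

```python
import posixpath


def truncate_last_component(path: str, max_len: int = 260, encoding: str = "utf-8") -> str:
    """Hypothetical helper (not part of pathvalidate): shorten only the final
    path component so the whole path fits in max_len encoded bytes."""
    if len(path.encode(encoding)) <= max_len:
        return path  # already short enough; leave it untouched

    head, tail = posixpath.split(path)
    # Keep the directory part exactly as-is, including its trailing separator.
    prefix = head if not head or head.endswith("/") else head + "/"
    budget = max_len - len(prefix.encode(encoding))

    # Drop characters (not bytes) from the end so a multi-byte character is never split.
    truncated = tail
    while truncated and len(truncated.encode(encoding)) > budget:
        truncated = truncated[:-1]

    if not truncated:
        # The directory part alone leaves no room for even one character of the name.
        raise ValueError(
            f"cannot fit path into {max_len} bytes: directory part is "
            f"{len(prefix.encode(encoding))} bytes"
        )
    return prefix + truncated


print(len(truncate_last_component("dir/" + "b" * 300)))  # 260
```

With the reproduction path this raises ValueError: the 29 directory components alone take 5,829 bytes (including the trailing separator), so no file name fits. That is the "not possible" case.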

thombashi (Owner) commented Jul 27, 2024

Thank you for your feedback.
I have considered this in the past.
My conclusion at the time was that sanitizing the length of a file path is difficult to do well, so I added the validate_after_sanitize argument instead.

For example, if the maximum file path length is 260 bytes and the directory path is 259 bytes long, a 200-byte filename would have to be truncated to 1 byte.
Most of the filename would be lost, which in many cases is not what users would expect.
That is why sanitize_filepath does not truncate the file path length.
However, when the validate_after_sanitize argument is True, an exception is raised in such cases.
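The arithmetic in this example, as a short sketch. It assumes the 259-byte directory path already includes its trailing separator; the variable names and placeholder strings are illustrative only.

```python
MAX_LEN = 260  # maximum file path length in bytes, as in the example

dir_path = "d" * 258 + "/"  # a 259-byte directory path, trailing separator included
filename = "f" * 200  # a 200-byte (ASCII) file name

budget = MAX_LEN - len(dir_path.encode("utf-8"))  # bytes left for the file name
kept = filename[:budget]  # ASCII, so slicing characters is the same as slicing bytes

print(budget)  # 1
print(len(filename) - len(kept))  # 199
```

Only one byte of the 200-byte name survives, which is the loss described above.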

The default value of validate_after_sanitize is False for now, to preserve backward compatibility. It would be good to change the default to True in pathvalidate v4.
