encoding problems with directory listing #3

Open
ewerybody opened this issue Feb 19, 2024 · 0 comments

Oh my. Back in early 2022, when I built this, I thought I had issues with some non-ASCII file names.
So now I added a test that puts 🤗 into a name and voilà, it worked! I thought it was all good until I ran a scan on an FTP space other than the test area.
Turns out you can set the encoding on the ftplib FTP object, and then these emojis and stuff work no problem!

But when you have an é in a name:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xeb in position 868: invalid continuation byte

But all good! These can be remedied by changing the encoding to latin-1. Yes? :D
Yes. But then emojis no longer work! 😫

https://stackoverflow.com/q/77089678/469322
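
To make the conflict concrete, here is a minimal sketch that reproduces both failure modes on plain bytes, without any server. The helpers decode_name and recover_name are hypothetical names, not part of this project or of ftplib. The idea they illustrate is only one possible workaround: let ftplib decode listings as latin-1 (which never fails), then re-encode each name to get the original bytes back and try UTF-8 first.

```python
def decode_name(raw: bytes) -> str:
    """Decode a raw file name: try UTF-8 first, fall back to Latin-1."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a character, so this cannot fail.
        return raw.decode("latin-1")


def recover_name(listed: str) -> str:
    """Repair a name that was decoded as Latin-1 (e.g. with ftp.encoding = "latin-1")."""
    # Encoding back to Latin-1 restores the exact bytes the server sent.
    return decode_name(listed.encode("latin-1"))


latin1_bytes = "tëstfilé.txt".encode("latin-1")
utf8_bytes = "tëstfilé🤗.txt".encode("utf-8")

# Failure 1: Latin-1 bytes are not valid UTF-8.
try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError as error:
    print("utf-8 failed:", error.reason)

# Failure 2: Latin-1 never raises, but it garbles UTF-8 names.
print("latin-1 result:", utf8_bytes.decode("latin-1"))

# The fallback handles both inputs.
print(recover_name(latin1_bytes.decode("latin-1")))
print(recover_name(utf8_bytes.decode("latin-1")))
```

This is a heuristic, not a complete solution: a Latin-1 name whose bytes happen to form valid UTF-8 would be misread, and any name sent back to the server would still have to be the string exactly as listed so that the bytes match.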

solution

Honestly I don't know how we would solve this completely now.

A 1st step: Less directory listing!
I already modified our mkdirs function so that it no longer lists the parent dir of each part of the directory to be created to check whether that part already exists.
Now it just fires the mkd and catches ftplib.error_perm with error code 550. This code is not specifically for "already exists" but close enough. And it's faster as well!
Listing for each part of the path is rather expensive.
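
For reference, the mkd-and-catch approach described above could look roughly like the sketch below. It is not the project's actual mkdirs; the signature, the assumption of an absolute path, and the way the reply code is read from the exception text are all illustrative.

```python
import ftplib


def mkdirs(ftp: ftplib.FTP, path: str) -> None:
    """Create every directory along an absolute path without listing anything."""
    current = ""
    # Skip empty parts so "/a//b/" behaves like "/a/b".
    for part in (p for p in path.split("/") if p):
        current = f"{current}/{part}"
        try:
            ftp.mkd(current)
        except ftplib.error_perm as error:
            # ftplib puts the server reply ("550 ...") into the exception text.
            # 550 is a generic "action not taken" reply, treated here as
            # "directory already exists"; anything else is re-raised.
            if not str(error).startswith("550"):
                raise
```

Because 550 also covers cases such as missing permissions, a real failure would be swallowed at this point and only surface later, for example when the upload into the directory fails.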

So under normal circumstances we no longer do directory listing at all on the FTP! 🙌
But we still can! Maybe we should drop the option entirely. But then we'd still need a unit test that verifies that update still works with weird file names.

For kicks I just created a file named tëstfilé🤗.txt to trip up ANY encoding :D
When uploading it with WinSCP it turned into tëstfilé??.txt on the server, and when copying it back it became tëstfilé%3F%3F.txt,
so not even THEY have it solved!
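
A guess at where those two names come from, as a small sketch: 🤗 lies outside the Basic Multilingual Plane, so it takes two UTF-16 code units, and %3F is simply the percent-encoded form of "?". Whether WinSCP really replaces each code unit with "?" and then percent-encodes on download is an assumption; the code only shows that the numbers line up.

```python
from urllib.parse import quote

emoji = "🤗"

# Count UTF-16 code units: characters outside the BMP need a surrogate pair.
utf16_units = len(emoji.encode("utf-16-le")) // 2

# Assumed behavior: one "?" per code unit that the target encoding cannot represent.
on_server = "tëstfilé" + "?" * utf16_units + ".txt"

# Assumed behavior: "?" is percent-encoded when the file is copied back.
copied_back = on_server.replace("?", quote("?"))

print(utf16_units)
print(on_server)
print(copied_back)
```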
