Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Web UI: Explicitly strip newlines from chunks when uploading #1267

Closed
wants to merge 4 commits into from

Conversation

rdmark
Copy link
Member

@rdmark rdmark commented Oct 25, 2023

No description provided.

@rdmark rdmark marked this pull request as draft October 25, 2023 11:08
@rdmark rdmark force-pushed the rdmark-issue-1157 branch 2 times, most recently from fbd9951 to ac6dd8e Compare October 25, 2023 11:58
@rdmark rdmark marked this pull request as ready for review October 25, 2023 12:00
@uweseimet
Copy link
Contributor

uweseimet commented Oct 25, 2023

@rdmark Just a question on this. Please bear with me if this question does not apply for this ticket, because I have not digged deeper into this.
In general, issues caused by newlines are typical for cases where the upload protocol/format is not a binary format. A good example is ftp, where you can choose between text/ASCII and binary transfer format. If you transfer binaries with text format enabled you run into very similar issues with ftp like the one this ticket is addressing. Can it be that there is a general problem with uploading binaries in the web UI, e.g. because the code or the libraries used are trying to deal with text instead of binary format? Where do the newlines at the end of a chunk come from?

One more thing: When you strip a newline, how do you know that the newline (the binary representation of it) does not belong there? Depending on how you define a newline (depending on the platform it can be CR, LF or CR/LF), there can be chunks that happen to have these characters at their end. Stripping them in this case would corrupt the uploaded data.

@nucleogenic nucleogenic linked an issue Oct 26, 2023 that may be closed by this pull request
Copy link
Member

@nucleogenic nucleogenic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at what's been discussed in the ticket, the change seems reasonable enough. It would be better if we could understand why the issue isn't reproducible between users though. There's seemingly a component to this problem we don't understand 🤔

@uweseimet
Copy link
Contributor

uweseimet commented Oct 26, 2023

@nucleogenic I would have appreciated if somebody had explained why my concerns are not justified and a sequence of line feed characters that is actually a valid part of the binary cannot just be stripped. As long as one does not fully understand a problem one cannot resolve it.

@rdmark
Copy link
Member Author

rdmark commented Oct 26, 2023

@uweseimet Yes, It's a mystery where the CRLF comes from. Your theory sounds believable, that somewhere in the software stack the data is treated as text and not binary.

In the python code we create the file object in binary append mode, as per the documentation: https://docs.python.org/3/tutorial/inputoutput.html#reading-and-writing-files

The one thing I can imagine is that a corner case bug with dropzone.js and the particular version of Chrome that the reporter is using.

I'm hoping that we can get some more feedback and test data from the user and then decide what to do next.

@uweseimet
Copy link
Contributor

uweseimet commented Oct 26, 2023

@rdmark In order to reproduce this problem, what about creating an image file that only contains sequences of CR, CR/LF and LF and that is big enough to require several chunks? For testing the upload there is no need for an image with valid filesystems.
If there is an issue with binary/ASCII format, it might also depend on the charsets/locales used on the client and the server. If this reproduce the problem it helps resolving it. If there is no problem with this approach (and without changing anything in piscsi) there may not be a problem at all, or it is, just like you say, something related to issues with a user's setup. At least as long as only a single user has this problem.

Edit: In case you run such a test, it's best to use three images. Only only with CR characters, the second only with LF, and the third only with CR/LF. This way you will definitely have chunks of the type that might be causing issues.
Creating such a file should work with dd, e.g. similar to

dd if=/dev/zero bs=1024 count=1 | tr '\000' '1' > test.img

With bs the size of the file to create, and the second argument of tr the character the file shall consist of.
Such a file also shows the problem I tried to point out: How do you know which of the characters to strip and which not? Just saying "the ones at the end of each chunk" is IMHO not correct. Any resolution for this ticket would have to ensure that such files are uploaded without being modified.

@rdmark rdmark marked this pull request as draft October 28, 2023 11:40
@rdmark
Copy link
Member Author

rdmark commented Oct 29, 2023

@uweseimet A file consisting of nothing but CRLF does indeed get corrupted with the code in this PR. (And works before the change.)

One idea I had, is to add a check for the size of a transferred chunk, and strip trailing line endings only when the actual size exceeds the expected size. Thoughts?

@uweseimet
Copy link
Contributor

uweseimet commented Oct 29, 2023 via email

@rdmark
Copy link
Member Author

rdmark commented Oct 29, 2023

I added some debug output and will ask the original reporter to test once they respond. Logging should at least tell us if the chunk size and offsets are correct.

@uweseimet
Copy link
Contributor

uweseimet commented Oct 29, 2023 via email

@rdmark rdmark force-pushed the rdmark-issue-1157 branch 2 times, most recently from d8aee1d to 698524b Compare November 11, 2023 02:16
Copy link

sonarcloud bot commented Nov 11, 2023

SonarCloud Quality Gate failed.    Quality Gate failed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

0.0% 0.0% Coverage
5.7% 5.7% Duplication

idea Catch issues before they fail your Quality Gate with our IDE extension sonarlint SonarLint

@uweseimet
Copy link
Contributor

@rdmark Correct me if I am wrong: The latest comments in #1157 say that there is no problem at all, don't they?

@rdmark
Copy link
Member Author

rdmark commented Nov 14, 2023

I want to ask the original reporter to try one more thing: the current develop code... just to make sure that the adding of the logging didn't have an effect on their corner case.

Edit: never mind, I see now that they already tested that!

@rdmark rdmark closed this Nov 14, 2023
@rdmark rdmark deleted the rdmark-issue-1157 branch November 14, 2023 13:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

File uploads via webui are corrupted
3 participants