Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crash with buffered stream reads #184

Closed
jeremy-visionaid opened this issue Oct 7, 2024 · 13 comments
Closed

Crash with buffered stream reads #184

jeremy-visionaid opened this issue Oct 7, 2024 · 13 comments
Milestone

Comments

@jeremy-visionaid
Copy link
Collaborator

jeremy-visionaid commented Oct 7, 2024

Reading from a StreamDecorator using a BufferedStream may throw an ArgumentOutOfRangeException:

            using CompoundFile cf = new(CFSVersion.Ver_4, CFSConfiguration.Default);

            CFStream cfStream = cf.RootStorage.AddStream("MyStream");
            using Stream stream = cfStream.AsIOStream();
            const int BufferLength = 4096 * 21;
            var buffer = Helpers.GetBuffer(BufferLength);
            stream.Write(buffer, 0, buffer.Length);
            stream.Position = 0;

            Assert.AreEqual(BufferLength, cfStream.Size);

            using MemoryStream memoryStream = new();
            using BufferedStream bufferedStream = new(stream);
            bufferedStream.CopyTo(memoryStream);
    ThrowHelper.ThrowArgumentOutOfRangeException(ExceptionArgument argument, ExceptionResource resource)
    List`1.get_Item(Int32 index)
    StreamView.Read(Byte[] buffer, Int32 offset, Int32 count) line 119
    CompoundFile.ReadData(IDirectoryEntry de, Int64 position, Byte[] buffer, Int32 offset, Int32 count) line 2239
    CFStream.Read(Byte[] buffer, Int64 position, Int32 offset, Int32 count) line 201
    StreamDecorator.Read(Byte[] buffer, Int32 offset, Int32 count) line 69
    BufferedStream.Read(Byte[] array, Int32 offset, Int32 count)
@jeremy-visionaid
Copy link
Collaborator Author

@ironfede I've got some various PRs to start working towards fixing this. A minimal repro is available on this branch for now:

https://github.com/Visionaid-International-Ltd/openmcdf/tree/buffered-read-crash

@jeremy-visionaid
Copy link
Collaborator Author

jeremy-visionaid commented Oct 8, 2024

@ironfede Oh dear, this turns out like it might quite a big/tough one... If I fix StreamView.Read so that it behaves correctly (i.e. truncate reads that would go past the end of the stream), then a whole bunch of the unit test fail. I'm suspicious that the root cause of this may be because some of the test files are corrupt, and unfortunately, reading/modifying them may be dependent on allowing reads past the end of a sector chain.

Experimenting with some of the tests, I can get them to pass by creating roughly equivalent files using the reference implementation from #166 (though I need to verify that the equivalence is fair, e.g. mini sector streams are still mini sector streams). I might push some changes for the StructuredStorageExplorer to help validate this.

Some thoughts:

  1. Some test files may need to be recreated (preferably using the reference implementation)
  2. The files that OpenMcdf creates could be checked against the reference implementation
  3. StreamRW could check the reads results to make sure the end of stream isn't being reached (like BinaryReader does)
  4. The existing code could be made more robust to corrupt files (i.e. raise CFCorruptedFileException rather than ArgumentOutOfRangeException)

I'll try and push a few PRs to try and improve validation etc. before moving to fix the core problem.

This was referenced Oct 8, 2024
@ironfede
Copy link
Owner

ironfede commented Oct 8, 2024

@ironfede Oh dear, this turns out like it might quite a big/tough one... If I fix StreamView.Read so that it behaves correctly (i.e. truncate reads that would go past the end of the stream), then a whole bunch of the unit test fail. I'm suspicious that the root cause of this may be because some of the test files are corrupt, and unfortunately, reading/modifying them may be dependent on allowing reads past the end of a sector chain.

Experimenting with some of the tests, I can get them to pass by creating roughly equivalent files using the reference implementation from #166 (though I need to verify that the equivalence is fair, e.g. mini sector streams are still mini sector streams). I might push some changes for the StructuredStorageExplorer to help validate this.

Some thoughts:

  1. Some test files may need to be recreated (preferably using the reference implementation)
  2. The files that OpenMcdf creates could be checked against the reference implementation
  3. StreamRW could check the reads results to make sure the end of stream isn't being reached (like BinaryReader does)
  4. The existing code could be made more robust to corrupt files (i.e. raise CFCorruptedFileException rather than ArgumentOutOfRangeException)

I'll try and push a few PRs to try and improve validation etc. before moving to fix the core problem.

Super Thank you @jeremy-visionaid. Could you please (if and when possible) to better explain me the issue pointing to the code portion with the wrong behaviour? It would help me to understand. There's another important point. Is it possible to make a retro-compatible fix in order not to require regeneration of files but only future-in-line correction of them not forcing to re-elaborate them? Many thanks.
Federico

@jeremy-visionaid
Copy link
Collaborator Author

@ironfede I'm still puzzling over it at the moment - I'm not 100% certain that they're corrupt, it could still just be a bug in reading, or possibly both.

Essentially, if a you write a stream just shy of MinSizeStandardStream (e.g. 4095 bytes), then GetMiniSectorChain will try and read past the end of the mini-FAT view (and that currently succeeds since StreamView doesn't prevent it). #185 at least ensures that the end of stream is detected early if/when StreamView correctly truncates the reads.

If you can take a look at the other pull requests and merge them, then I could push some bits tomorrow which would allow you to more easily reproduce the error.

@ironfede
Copy link
Owner

ironfede commented Oct 8, 2024

@ironfede I'm still puzzling over it at the moment - I'm not 100% certain that they're corrupt, it could still just be a bug in reading, or possibly both.

Essentially, if a you write a stream just shy of MinSizeStandardStream (e.g. 4095 bytes), then GetMiniSectorChain will try and read past the end of the mini-FAT view (and that currently succeeds since StreamView doesn't prevent it). #185 at least ensures that the end of stream is detected early if/when StreamView correctly truncates the reads.

If you can take a look at the other pull requests and merge them, then I could push some bits tomorrow which would allow you to more easily reproduce the error.

I'm going to merge them this afternoon. Only one thing: I've noticed you used a ver.4 file in the test with buffered reader. Is that specific to the issue? Many thanks

@jeremy-visionaid
Copy link
Collaborator Author

I'm going to merge them this afternoon. Only one thing: I've noticed you used a ver.4 file in the test with buffered reader. Is that specific to the issue? Many thanks

Thanks! I don't think it's specific to v4, but our app uses v4 so I just made the smallest equivalent thing to show the failure. If I fix the particular problem with the buffered reads crashing, then it causes it to fail somewhere else. Then I can repro that failure like so with v3:

        [TestMethod]
        [DataRow(4095)]
        public void Test_INCREMENTAL_SIZE_MULTIPLE_WRITE_AND_READ_CFS(int size)
        {
            SingleWriteReadMatching(size);
        }
__Error.EndOfFile()
BinaryReader.FillBuffer(Int32 numBytes)
BinaryReader.ReadInt32()
CompoundFile.GetMiniSectorChain(Int32 secID) line 1419
CompoundFile.GetSectorChain(Int32 secID, SectorType chainType) line 1443
CompoundFile.WriteData(CFItem cfItem, Byte[] buffer, Int64 position, Int32 offset, Int32 count) line 2123
CompoundFile.WriteData(CFItem cfItem, Int64 position, Byte[] buffer) line 2094
CompoundFile.WriteData(CFItem cfItem, Byte[] buffer) line 2138
CFStream.SetData(Byte[] data) line 48
CFStreamTest.SingleWriteReadMatching(Int32 size) line 588
CFStreamTest.Test_INCREMENTAL_SIZE_MULTIPLE_WRITE_AND_READ_CFS(Int32 size) line 501

@Numpsy
Copy link
Contributor

Numpsy commented Oct 8, 2024

I'm suspicious that the root cause of this may be because some of the test files are corrupt,

There are some corupt test files, which were added specifically to test the behavior with bad files as there have been bugs with things like OutOfMemoryExceptions before - e.g #30

@jeremy-visionaid
Copy link
Collaborator Author

@Numpsy Ah, yup, thanks, I have seen those. I already submitted some correctness fixes for the tests for those things as part of #167. This one is because #88 wasn't quite fixed correctly. However, if I "fix" the fix, then it causes faults elsewhere to manifest as actual failures. I've done some preparatory work in #185 so that we can be reasonably assured we find out early about attempts to read past the end of a stream, I'll push a branch shortly so we can repro both problems.

@jeremy-visionaid
Copy link
Collaborator Author

@Numpsy BTW, I also fixed the fix for #30 in #181. It was only actually validating part of the tree previously. Though there is still room for improvement 😄

@ironfede
Copy link
Owner

ironfede commented Oct 8, 2024

Ok, I've found a lot of cascading issues higlighted by the strict validation. It's honestly a little bit complicated to explain but one of the root causes is a problem in updating the value in header.MiniFATSectorsNumber. This was not so important since correctness of loaded structures prevents "real" over-read of streams and ministreams. But since validation is based on stream-size values we have a lot of issues arising in a spot way. I have also found an issue with read request with larger-than-size count values that was not handled correctly... I'm trying to get out from this beautiful scenario...

@jeremy-visionaid
Copy link
Collaborator Author

@ironfede @Numpsy OK, I've pushed a few bits to a branch here:
https://github.com/Visionaid-International-Ltd/openmcdf/tree/buffered-copy

There are a few parts:

First is a minimal repro of the failiure I was observing. We don't want to merge this as is, the copy buffer sizes are dependent on various factors, so it's not really deterministic and might give different results on different platforms. I'll write a more targeted test to replace it and the test from #89
Visionaid-International-Ltd@9da5033

Second is an improved test for CFStream reads/writes that limit the buffer sizes to possible problematic boundaries. That could be cherry-picked since it speeds up testing and allows easy and reproducible debugging for particular values in Test Explorer:
Visionaid-International-Ltd@560ca59

Last is a somewhat minimal change to truncate the reads from StreamView. While I believe the change is reasonable on its own, it does cause a whole bunch of other failures:
Visionaid-International-Ltd@b7cf828

I haven't yet been able to show conclusievly whether the CFB files are actually written correctly. I have a local branch with a new test project that uses the reference implementation from #166 but I haven't done much with it yet.

@jeremy-visionaid
Copy link
Collaborator Author

@ironfede A bit of an aside, but while looking at this I found the StreamView.Position property isn't quite being handled/validated correctly... #187

@jeremy-visionaid
Copy link
Collaborator Author

It just struck me that there's a really simple workaround for this... Just truncate the reads from the StreamDecorator and leave the StreamView alone. I just tested it and it works OK for my purposes. I'll clean up the tests and put in a PR for it. It is sufficient to be able to close this ticket - we could address the other underlying problems as part of another ticket. It looks to me like it would be easier to fix the underlying problems as part of v3.

jeremy-visionaid added a commit to Visionaid-International-Ltd/openmcdf that referenced this issue Oct 10, 2024
@ironfede ironfede added this to the 2.4.0 milestone Oct 10, 2024
ironfede added a commit that referenced this issue Oct 10, 2024
Fix #184 Fix Size calculations Fix Stream Size validations
jeremy-visionaid added a commit to Visionaid-International-Ltd/openmcdf that referenced this issue Oct 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants