Allow `Unstructured` to be backed by an iterator of bytes instead of a byte slice?
#103
Comments
It is more straightforward to derive the relationship between data passed in as a buffer and its use in code than when that data has to go through the iterator machinery. The predecessor to … It would be an interesting challenge/exercise to implement a virtually infinite buffer of virtual memory that is filled with data as accesses from …
To be clear, I was thinking of something as simple as … I guess what I don't understand is what relationship needs to be understood between the data and its use in code if the unstructured data is truly arbitrary. This may be where my lack of knowledge of fuzzing methodology is holding me back.
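The … I think you should be able to get your ultimately desired functionality in a performant manner by using the …, for example:

```rust
use arbitrary::{Arbitrary, Unstructured};
use rand::Rng;

// `MyType` is your own type implementing `Arbitrary` (assumed here to own
// its data rather than borrow from the buffer).
let mut rng = rand::thread_rng();

// Size the initial buffer from the type's size hint. `.max(1)` keeps a
// zero lower bound from producing an empty buffer that doubling could
// never grow.
let (min, max) = MyType::size_hint(0);
let capacity = max.unwrap_or(min * 2).max(1);
let mut data = vec![0u8; capacity];
loop {
    // `Rng::fill` is implemented for slices, not for `Vec` itself.
    rng.fill(&mut data[..]);
    let u = Unstructured::new(&data);
    let x = match MyType::arbitrary_take_rest(u) {
        Ok(x) => x,
        Err(arbitrary::Error::NotEnoughData) => {
            // Double the buffer's size. Optionally have a max
            // buffer size.
            let new_len = data.len() * 2;
            data.resize(new_len, 0);
            continue;
        }
        Err(_) => {
            // Just try again with new data. Optionally have a
            // max number of retries.
            continue;
        }
    };
    // Do stuff with `x`...
}
```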
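The two optional limits mentioned in the comments of that loop (a maximum buffer size and a maximum number of retries) can be sketched without the `arbitrary` and `rand` crates. This is a hypothetical, std-only sketch: `GenError` stands in for `arbitrary::Error`, `XorShift64` is a toy PRNG standing in for `rand`, and the `build` closure stands in for `MyType::arbitrary_take_rest`; none of these names come from either crate.

```rust
/// Hypothetical stand-in for `arbitrary::Error`.
#[derive(Debug)]
enum GenError {
    NotEnoughData,
    Other,
}

/// Tiny xorshift64 PRNG so the sketch needs no external crates.
/// Seed must be nonzero; fine for test fixtures, nothing more.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Fill `buf` with pseudo-random bytes, 8 at a time.
    fn fill(&mut self, buf: &mut [u8]) {
        for chunk in buf.chunks_mut(8) {
            let bytes = self.next_u64().to_le_bytes();
            chunk.copy_from_slice(&bytes[..chunk.len()]);
        }
    }
}

/// Grow-and-retry loop with both optional caps: the buffer doubles on
/// `NotEnoughData` but never beyond `max_len`, and the function gives up
/// (returns `None`) after `max_retries` failed attempts of any kind.
fn generate_with_caps<T>(
    rng: &mut XorShift64,
    initial_len: usize,
    max_len: usize,
    max_retries: usize,
    mut build: impl FnMut(&[u8]) -> Result<T, GenError>,
) -> Option<T> {
    let mut data = vec![0u8; initial_len.max(1)];
    for _ in 0..max_retries {
        rng.fill(&mut data[..]);
        match build(&data[..]) {
            Ok(x) => return Some(x),
            Err(GenError::NotEnoughData) => {
                let new_len = (data.len() * 2).min(max_len);
                data.resize(new_len, 0);
            }
            // Any other error: retry with fresh bytes of the same size.
            Err(GenError::Other) => {}
        }
    }
    None
}
```

With a builder that needs at least 100 bytes, starting from 8 bytes the buffer grows 8, 16, 32, 64, 128 and succeeds on the fifth attempt; with `max_len = 64` it never gets there and the retry cap ends the loop.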
Interesting, this could work. I'm thinking it could be made ergonomic by creating a trait parallel to `Arbitrary`, with a similar API but which, instead of taking …
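A trait parallel to `Arbitrary` that pulls bytes from any iterator might look roughly like the following. This is a hypothetical sketch: `FromByteStream` and `from_byte_stream` are invented names that are not part of the `arbitrary` crate, and decoding `u32` as little-endian is an arbitrary choice.

```rust
/// Hypothetical trait parallel to `Arbitrary`: instead of reading from an
/// `Unstructured` byte slice, it pulls bytes from any `u8` iterator.
/// `None` means the iterator ran dry, which only a finite one can do.
trait FromByteStream: Sized {
    fn from_byte_stream<I: Iterator<Item = u8>>(bytes: &mut I) -> Option<Self>;
}

impl FromByteStream for u8 {
    fn from_byte_stream<I: Iterator<Item = u8>>(bytes: &mut I) -> Option<Self> {
        bytes.next()
    }
}

impl FromByteStream for u32 {
    fn from_byte_stream<I: Iterator<Item = u8>>(bytes: &mut I) -> Option<Self> {
        // Take exactly four bytes; bail out with `None` if they run out.
        let mut buf = [0u8; 4];
        for b in buf.iter_mut() {
            *b = bytes.next()?;
        }
        Some(u32::from_le_bytes(buf))
    }
}

impl FromByteStream for bool {
    fn from_byte_stream<I: Iterator<Item = u8>>(bytes: &mut I) -> Option<Self> {
        // Use the lowest bit of one byte.
        Some((bytes.next()? & 1) == 1)
    }
}

// Composite types just build their parts in order from the same stream.
impl<A: FromByteStream, B: FromByteStream> FromByteStream for (A, B) {
    fn from_byte_stream<I: Iterator<Item = u8>>(bytes: &mut I) -> Option<Self> {
        let a = A::from_byte_stream(bytes)?;
        let b = B::from_byte_stream(bytes)?;
        Some((a, b))
    }
}
```

Because the source is generic over `Iterator<Item = u8>`, the same impls work for a finite `Vec<u8>` (where `None` signals exhaustion) and for an endless generator such as `std::iter::repeat` or an RNG-backed iterator (where `None` never occurs).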
In our use case, we are not using `Arbitrary` for fuzzing but simply for creating arbitrary fixture values in tests. Currently we create a 10MB static `Vec<u8>` of random noise and use that as our `Unstructured` data. However, this is annoying, since it requires 10MB of memory overhead, and sometimes even then we run out of bytes.

I am wondering if it would be valid to have two flavors of `Unstructured`: one backed by a byte slice, presumably for fuzzing, and one backed by an infinite iterator of bytes that can never be exhausted. I experimented with a PR doing this and got some basic tests passing, but I don't know how valid it is in the grand scheme. I did find that some functionality is indeed dependent on the fixed byte slice, so at the very least some extra UX effort would be needed to provide slightly different interfaces for the different `Unstructured` flavors.
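One piece of functionality that depends on the fixed byte slice is choosing collection lengths: the slice-backed `Unstructured` knows how many bytes it has left (its `len`, `is_empty`, and the length logic behind `arbitrary_len` all rely on this), whereas an endless iterator cannot report how much remains. The hypothetical sketch below, which is not the `arbitrary` crate's implementation, puts both flavors behind an invented `ByteSource` trait and shows that the iterator flavor needs an explicit `cap` instead.

```rust
/// Hypothetical abstraction over the two flavors: a finite slice
/// (as in fuzzing) and an endless iterator (as for test fixtures).
trait ByteSource {
    fn next_byte(&mut self) -> Option<u8>;
    /// Bytes left, if known. An endless iterator cannot answer this.
    fn remaining(&self) -> Option<usize>;
}

struct SliceSource<'a> {
    data: &'a [u8],
}

impl<'a> ByteSource for SliceSource<'a> {
    fn next_byte(&mut self) -> Option<u8> {
        let data = self.data;
        let (&first, rest) = data.split_first()?;
        self.data = rest;
        Some(first)
    }
    fn remaining(&self) -> Option<usize> {
        Some(self.data.len())
    }
}

struct IterSource<I> {
    iter: I,
}

impl<I: Iterator<Item = u8>> ByteSource for IterSource<I> {
    fn next_byte(&mut self) -> Option<u8> {
        self.iter.next()
    }
    fn remaining(&self) -> Option<usize> {
        None
    }
}

/// Pick a collection length in `0..=bound`. With a slice, `bound` is how
/// many `elem_size`-byte elements the leftover bytes (after this length
/// byte) could build, capped at `cap`; with an endless iterator, `cap` is
/// the only bound available.
fn choose_len(src: &mut impl ByteSource, elem_size: usize, cap: usize) -> usize {
    let bound = match src.remaining() {
        Some(n) => (n.saturating_sub(1) / elem_size.max(1)).min(cap),
        None => cap,
    };
    match src.next_byte() {
        Some(b) if bound > 0 => b as usize % (bound + 1),
        _ => 0,
    }
}
```

Note that with a slice an exhausted source simply yields length 0, while with an endless iterator forgetting the cap would let a single byte request an arbitrarily large collection.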
I understand if this wouldn't be worth the effort, but I am primarily wondering, from a motivational standpoint, why `Unstructured` is backed by a byte slice instead of an iterator, and only secondarily asking for feedback on the feasibility of using infinite iterators. Note: I know next to nothing about fuzzing.