
fallible collection allocation 1.0 #2116

Merged · 2 commits · Feb 7, 2018

Conversation

@Gankra (Contributor) commented Aug 18, 2017

Add minimal support for fallible allocations to the standard collection APIs. This is done in two ways:

  • For users with unwinding, an oom=panic configuration is added to make global allocators panic on oom.
  • For users without unwinding, a try_reserve() -> Result<(), CollectionAllocErr> method is added.

The former is sufficient for unwinding users, but the latter is insufficient for the others (although it is a decent 80/20 solution). Completing the no-unwinding story is left for future work.

Rendered


Updated link:

Rendered
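For context on how this API eventually landed: the try_reserve half of the RFC stabilized in Rust 1.57, with CollectionAllocErr reaching std as std::collections::TryReserveError. A minimal sketch of the usage pattern the RFC aims at (the helper name push_checked is hypothetical):

```rust
use std::collections::TryReserveError;

// Fallible push: report allocation failure as an Err instead of aborting.
fn push_checked<T>(v: &mut Vec<T>, item: T) -> Result<(), TryReserveError> {
    v.try_reserve(1)?; // may fail with Err(TryReserveError) on OOM
    v.push(item);      // cannot allocate: capacity was just reserved
    Ok(())
}

fn main() {
    let mut v = Vec::new();
    push_checked(&mut v, 42).expect("allocation failed");
    assert_eq!(v, [42]);
}
```

The key property is that the only fallible step is the reserve; the subsequent push is guaranteed not to allocate.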

@Gankra Gankra changed the title fallible allocation 1.0 fallible collection allocation 1.0 Aug 18, 2017
@Gankra (Contributor, Author) commented Aug 18, 2017

I'm really sorry this isn't perfect. I am just deeply exhausted with working on this problem right now, and need to push out what I have just to get it out there and focus on something else for a bit.

I'm not 100% convinced with all my "don't rock the boat" rationales for CollectionAllocErr, and could probably be very easily convinced to change that. It's just that my default stance on this kinda stuff is "don't touch anything, because the Servo team probably is relying on it in 17 different ways that will make me sad".


This strategy is used on many *nix variants/descendants, including Android, iOS, MacOS, and Ubuntu.

Some developers will try to use this as an argument for never *trying* to handle allocation failure. This RFC does not consider this to be a reasonable stance. First and foremost: Windows doesn't do it. So anything that's used a lot on Windows (e.g. Firefox) can reasonably try to handle allocation failure there. Similarly, overcommit can be disabled completely or partially on many OSes. For instance the default for Linux is to actually fail on allocations that are "obviously" too large to handle.


Worth mentioning that allocations can fail for reasons other than running out of physical memory, e.g. running out of address space, or running into a ulimit/setrlimit.

@jethrogb (Contributor) commented Aug 18, 2017

I think the plan for unwinding is terrible. I could support the try_reserve part of this RFC separately as a stop-gap measure on the way to full support.

Similar to the embedded case, handling allocation failure at the granularity of tasks is ideal for quality-of-implementation purposes. However, unlike embedded development, it isn't considered practical (in terms of cost) to properly take control of everything and ensure allocation failure is handled robustly.

There is no evidence of this not being considered practical.

More in general, the server use case seems a little thin. I've already mentioned in the internals thread that there's a lot more to be considered there: servers come in different shapes and sizes. Considering one of Rust's 2017 goals is “Rust should be well-equipped for writing robust, high-scale servers” I think this use case (or, I'd like to argue, use cases) should be explored in more detail.

Depending on unwinding for error handling is a terrible idea and entirely contrary to Rust best practices. This by itself should be listed under the “drawbacks” section. Besides being counteridiomatic, recovering from unwinding doesn't work well in at least three cases, two of which are not currently considered by the RFC:

  1. Platforms without unwinding support. Not really any need to go into detail here, as the RFC describes it pretty well.
  2. FFI. Unwinding is not supported across FFI boundaries. Allocation errors currently result in a relatively clean abort. With this RFC, unwinding from allocation errors through FFI can result in weird/non-deterministic/undefined behavior.
  3. Synchronization primitives. You can't use any of the standard synchronization primitives such as Once, Mutex, and RwLock if you expect your code to unwind, because of the possibility of lock poisoning. This was also already mentioned in the internals thread.

@rpjohnst

Using unwinding to contain errors at task granularity is completely idiomatic. It's why Rust bothers to have unwinding at all. Allowing OOMs to panic in addition to their current behavior is totally in line with this. It's not a full solution, but it is a necessary part of one.
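rpjohnst's task-granularity argument maps onto a familiar pattern: a panic in a task thread surfaces to its supervisor as an Err from join() rather than killing the process. A sketch (the task body here is hypothetical; under an oom=panic configuration, an allocation failure inside the task would unwind and be contained the same way):

```rust
use std::thread;

fn main() {
    // Task-granularity error containment: each task runs on its own thread,
    // and a panic in the task surfaces to the supervisor as Err from join()
    // instead of taking down the whole process.
    let handle = thread::spawn(|| {
        // Hypothetical task body; under oom=panic an allocation failure
        // here would unwind and be observed by join() just like a panic.
        let data = vec![1u32, 2, 3];
        data.iter().sum::<u32>()
    });
    match handle.join() {
        Ok(sum) => println!("task finished: {sum}"),
        Err(_) => println!("task panicked; the rest of the process continues"),
    }
}
```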

@pnkfelix (Member) commented Aug 18, 2017

Update: The suggestion was followed. No need to read rest of this comment (which I have left below the line)


I suggest that the filename for this RFC be changed to something that isn't quite so subtle.

(The current filename, "alloc-me-like-one-of-your-french-girls.md", is a meme/quote from the movie "Titanic"; I infer that reference is meant to bring to mind "fallibility", but I needed some help along the way.)

@Gankra Gankra force-pushed the if-at-first-you-dont-alloc branch from 0687133 to c1da9a1 Compare August 18, 2017 06:11

This strategy is used on many *nix variants/descendants, including Android, iOS, MacOS, and Ubuntu.

Some developers will try to use this as an argument for never *trying* to handle allocation failure. This RFC does not consider this to be a reasonable stance. First and foremost: Windows doesn't do it. So anything that's used a lot on Windows (e.g. Firefox) can reasonably try to handle allocation failure there. Similarly, overcommit can be disabled completely or partially on many OSes. For instance the default for Linux is to actually fail on allocations that are "obviously" too large to handle.
Member

make sure that this previous comment on the earlier filename does not get lost in the shuffle; quoting here for completeness:

Worth mentioning that allocations can fail for reasons other than running out of physical memory, e.g. running out of address space, or running into a ulimit/setrlimit.


Here unwinding is available, and seems to be the preferred solution, as it maximizes the chances of allocation failures bubbling out of whatever libraries are used. This is unlikely to be totally robust, but that's ok.

With unwinding there isn't any apparent use for an infallible allocation checker.
Member

I didn't know what "infallible allocation checker" meant when I first read through this. The term does not occur, at least not as written, elsewhere in the document.

Is it meant to be a hypothetical tool listed above with "User Profile: Embedded", namely:

some system to prevent infallible allocations from ever being used

If so, maybe just add "we'll call this an 'infallible allocation checker'" when the idea is first introduced, just to define local terminology?

Contributor Author

Yeah good catch (it's what I end up proposing in the first Future Work section)


## User Profile: Runtime

A garbage-collected runtime (such as SpiderMonkey or the Microsoft CLR) is generally expected to avoid crashing due to out-of-memory conditions. Different strategies and allocators are used for different situations here. Most notably, there are allocations on the GC heap for the running script, and allocations on the global heap for the actual runtime's own processing (e.g. performing a JIT compilation).
Member

Are you using the word "crash" here in a narrow sense that refers just to undefined behavior and low-level errors like segmentation faults, etc.? Or is it meant to include unchecked exceptions, which Java's OutOfMemoryError qualifies as? (I understand that you did not include the JVM in your list of example garbage-collected runtimes, but I think most reasonable people would include it as an example of one...)

But maybe I misunderstand the real point being made in this sentence, since you draw a distinction between allocations made for the script versus allocations for the internals of the runtime. I.e. is your point that even Java avoids crashing from out-of-memory conditions that arise from the runtime internals (like JIT compilation) ?

Member

(I guess my previous comment is implicitly suggesting that you use a more specific word than "crash" in the first sentence. That, or add text elsewhere to the document that specifies what "crash" denotes in the context of this RFC.)

Contributor Author

Yes, I should be more clear. I mostly meant crashing due to runtime internals, but script stuff should also try to recover (e.g. triggering a GC when a script allocation fails and retrying). I ended up cutting all focus from the script side because, as I note just below, the script allocations aren't actually relevant to this RFC (AFAICT).

I didn't include the JVM because I had only found the time to interview SM and CLR people.

@jethrogb (Contributor)

Using unwinding to contain errors at task granularity is completely idiomatic.

Only as a last resort, such that one assertion failure doesn't take down your whole process accidentally. Unwinding should not be used for errors that are more or less expected and that you know how to deal with. https://doc.rust-lang.org/stable/book/second-edition/ch09-03-to-panic-or-not-to-panic.html

@rpjohnst

Yes, precisely. In many situations, allocation failure is unexpected and has no meaningful response at a granularity smaller than a task. This is a reason to support oom=panic rather than just abort.


## try_reserve

`try_reserve` and `try_reserve_exact` would be added to `HashMap`, `Vec`, `String`, and `VecDeque`. These would have the exact same APIs as their infallible counterparts, except that OOM would be exposed as an error case, rather than a call to `Alloc::oom()`. They would have the following signatures:
Member

Clarification request: You'll only be adding fn try_reserve_exact to the types that already supply fn reserve_exact, right?

Contributor Author

I'll be honest: I was working off memory and thought reserve and reserve_exact always went together. If not, then yeah, what you said.

```
/// Tries to reserve capacity for at least `additional` more elements to be inserted
/// in the given `Vec<T>`. The collection may reserve more space to avoid
/// frequent reallocations. After calling `reserve`, capacity will be
```
Member

The doc-comment should be further revised since this is the doc for fn try_reserve, not fn reserve. In particular, instead of saying "After calling reserve, capacity will ..." (which is not relevant to this function), you could instead say:

If try_reserve returns Ok, capacity will be greater than or equal to self.len() + additional. If capacity is already sufficient, then returns Ok (with no side-effects to this collection).


## Eliminate the CapacityOverflow distinction

Collections could potentially just create an `AllocErr::Unsupported("capacity overflow")` and feed it to their allocator. Presumably this wouldn't do something bad to the allocator? Then the oom=abort flag could be used to completely control whether allocation failure is a panic or abort (for participating allocators).
Member

clever! (don't know if I like it, but had to give it credit nonetheless)
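For reference, in the API as later stabilized the distinction survives only as an opaque error type, but the capacity-overflow path is easy to observe without a real OOM: requests larger than isize::MAX bytes are rejected up front, without touching the allocator. A quick sketch (assuming the post-1.57 std API):

```rust
fn main() {
    let mut v: Vec<u8> = Vec::new();
    // usize::MAX bytes exceeds the isize::MAX limit on allocation size, so
    // this fails up front ("capacity overflow") without calling the allocator.
    assert!(v.try_reserve(usize::MAX).is_err());
    // A reasonable request on a healthy system succeeds.
    assert!(v.try_reserve(16).is_ok());
    println!("capacity overflow detected without aborting");
}
```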

@withoutboats withoutboats added T-libs-api Relevant to the library API team, which will review and decide on the RFC. T-lang Relevant to the language team, which will review and decide on the RFC. labels Aug 18, 2017
@mark-i-m (Member)

@gankro Thanks for all the hard work!

However, I also don't like unwinding as error handling. Unwinding should only happen when I have signaled that I don't have anything to do to help; IMHO, it is damage control, rather than error handling.

Have there been any proposals to add some sort of set_oom_handler to the allocator interface? I am imagining something like the following:

```
enum OOMOutcome {
    Resolved(*const u8), // OOM was resolved and here is the allocation

    // Could not resolve the OOM, so do what you have to
    #[cfg(oom = "panic")]
    Panic,
    #[cfg(oom = "abort")]
    Abort,
}

fn set_oom_handler<H>(handler: H)
    where H: Fn(/* failed operation args... */) -> OOMOutcome;
```

You would then call set_oom_handler with your handler at the beginning of your program. Your handler can then choose what it wants to happen, including triggering a GC or whatever...

The benefit of this approach is that the existing interface doesn't have to change at all, and applications don't have to choose between try_reserve and the existing methods.

An alternate approach would be to make the oom_handler a language item or something (but that seems more suitable for the global allocator than collection allocators).

Yet another approach would be to make the OOM handler a type that implements a trait. Allocators would then be generic over their OOMHandler type.

@elahn commented Aug 19, 2017

Great idea, @mark-i-m. In a server app, I'd grab a chunk of memory on startup, then in my OOM handler:

  • set max_inflight_requests = inflight_requests - 1
  • send back-pressure/notify the monitoring, traffic and instance manager
  • resolve the allocation using some of the chunk, so the request can succeed

On operating systems that error on allocation as opposed to access, this would remove the need to manually tune the app for request load as a function of memory consumption.

Thanks @gankro for your work on this RFC.

@whitequark (Member) commented Aug 19, 2017

@gankro I have mixed feelings about this RFC. On one hand, it is a workable design: it provides me with the ability to write a VecAllocExt trait that gives me try_push and friends, which would be ergonomic enough to provide a short path to eventual stability, and those are the parts I like. It's useful! But it's also very imperfect, and (without going into details) I would like to see most of the internals completely redone before it's stabilized.

Still, it's much better than the current situation, and it doesn't have any drawbacks I can see as long as the guts stay unstable (taking into account that application/firmware code will be written against this), so I'm in favor of merging this even as-is.

Thanks for writing this!

@mark-i-m (Member)

Perhaps this could be an experimental RFC until we get experience with what doesn't work so well? That's seemed to work well in the past for the allocator interfaces...

@gnzlbg (Contributor) commented Aug 21, 2017

Given the constraints, I'd say this is good enough.

I have only one question: what are the semantics of try_reserve on a HashMap? That is, if I want to insert an element into a HashMap and I do something like hash_map.try_reserve(hash_map.len() + 1), am I guaranteed that on insertion the HashMap won't try to allocate?

@aturon self-assigned this Aug 22, 2017
@arthurprs commented Aug 23, 2017

@gnzlbg that's an implementation detail. The stdlib version, for example, is able to guarantee room for any combination of items in advance. Some variants, though, can't 100% guarantee room without knowing the dataset; hopscotch and cuckoo hash tables come to mind as examples.

@gnzlbg (Contributor) commented Aug 23, 2017

@arthurprs IIUC the whole point of try_reserve is to let you use std::collections without panic!ing on OOM, relying on the guarantee that "if try_reserve(N) succeeds, the collection can grow up to size N without allocating new memory", so that OOM cannot happen. Avoiding panics that way is neither easy, nor ergonomic, nor reliable, but if you are careful, it is doable.

Without this guarantee, one cannot avoid OOM at all, so why would I actually call try_reserve instead of reserve if I can get an OOM panic! anyway? E.g. try_reserve(N) succeeds; ok, now what? I can still get an OOM panic. Is there anything useful I can do with this information if I cannot use it to avoid OOM panics?

So... I must be missing something, because as I understand it, this guarantee is not an implementation detail of try_reserve, but its reason for existing. Without this guarantee, I really don't see how the method can be used to do anything useful (*).

(*) unless we add try_... variants of other insertion methods.
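For std's HashMap, the guarantee gnzlbg is asking about does hold in the API as later stabilized: try_reserve(additional) reserves room for at least len() + additional entries, so a subsequent insert cannot allocate. A sketch (note the argument is the additional count, not the target length; the helper name insert_checked is hypothetical):

```rust
use std::collections::{HashMap, TryReserveError};

// Fallibly make room for one more entry, then insert without allocating.
fn insert_checked(
    map: &mut HashMap<String, u32>,
    key: String,
    value: u32,
) -> Result<Option<u32>, TryReserveError> {
    map.try_reserve(1)?; // room for one more entry, or a clean error
    Ok(map.insert(key, value)) // will not allocate after a successful reserve
}

fn main() {
    let mut map = HashMap::new();
    insert_checked(&mut map, "a".to_string(), 1).expect("OOM");
    assert_eq!(map["a"], 1);
}
```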

@arthurprs commented Aug 23, 2017

I don't disagree with your overall idea, I'm just saying that it can't be done in ALL cases.

@gnzlbg (Contributor) commented Aug 23, 2017

I don't disagree with your overall idea, I'm just saying that it can't be done for ALL data structures.

@arthurprs This is why I was asking (and why I chose HashMap as the example). I agree with you on this. This cannot be guaranteed for all collections.

@briansmith commented Jan 5, 2022

I was looking into changing some libraries to be oom=panic compatible and I expect it will be very error-prone to do.

Already a lot of people think that catching allocation errors in C++ and trying to recover is too error-prone and a security disaster, and that's with the language always supporting that kind of recovery.

In the case of Rust libraries and programs, they have been written with their security analysis assuming oom=abort. Thus, either an allocation succeeds or it's game over, so it's currently valid to create an inconsistent state in a program (e.g. leave a memory-safety invariant unsatisfied), do an allocation, and then restore the state (e.g. restore the memory-safety invariant) after the allocation. Any such code will be broken, and potentially memory-unsafe, with oom=panic, where it was safe under the current semantics. Thus one can't assume any crate or any chunk of copy-pasta code is compatible with oom=panic without doing an analysis similar to analyzing unsafe { } code.

In other words, the oom=panic aspect of this will make some currently safe Rust code unsafe. I believe for this reason at a minimum that crates would have to indicate that they actually intend to be compatible with oom=panic and if a crate hasn't explicitly indicated that it supports oom=panic then it should be assumed that the entire crate is unsafe when the configuration is oom=panic. A crate could specify this with something like unsafe-oom-panic-compatible = true in its Cargo.toml; I suggest using unsafe in the name because this assertion is an unchecked claim regarding memory safety.

@Ericson2314 (Contributor)

@briansmith I basically agree: as we get more try_ methods (or equivalent), the use cases for oom=panic will quickly go away. I even expect unwinding itself to go away as we get more tools to audit panics and thus ensure recovery paths exist by other means.

@Aaron1011 (Member)

In other words, the oom=panic aspect of this will make some currently safe Rust code unsafe.

To make sure that I understand correctly, you mean that this will make existing sound unsafe code (i.e. code inside an unsafe block) unsound, right? If I have a crate consisting entirely of safe code (including expanded macros), then oom=panic could lead to logic errors (my code doesn't restore some logical invariant due to a panic from an allocation). However, I don't believe that it can lead to unsoundness, as injecting a panic (which is safe) into arbitrary purely-safe code can't lead to unsoundness.

@briansmith

To make sure that I understand correctly, you mean that this will make existing sound unsafe code (i.e. code inside an unsafe block) unsound, right? If I have a crate consisting of entirely safe code (including expanded macros),

Very few crates meet that criterion. For example, most use libstd, and libstd has a lot of unsafe code within it. Is libstd itself memory-safe with oom=panic?

then oom=panic could lead to logic errors (my code doesn't restore some logical invariant due to a panic from an allocation). However, I don't believe that it can lead to unsoundness, as injecting a panic (which is safe) into arbitrary purely-safe code can't lead to unsoundness.

```
struct S<T> {
    ptr: *mut T,
}

impl<T> S<T> {
    fn foo(&mut self) {
        let saved = self.ptr;
        self.ptr = 12345678 as *mut _; // Invalidate invariant
        let _ = vec![0u8; 1_000_000];  // may panic here under oom=panic
        self.ptr = saved; // Restore invariant
    }
}
```

[Pretend there are 10,000 lines of code here]

```
impl<T> S<T> where T: Copy {
    fn bar(&self) -> T {
        unsafe { *self.ptr }
    }
}
```

The above code is safe for oom=abort, but unsafe for oom=panic. Note that there is no unsafe block in the block of code directly affected by oom=panic; the unsafe code that is affected is far, far away. This makes it very difficult to find all the places that are potentially affected.

At a minimum, we should audit libstd to make sure there are no such patterns within it. I think that will be a difficult project.
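No one in the thread proposes a fix here, but the standard mitigation for this pattern is an RAII drop guard that restores the invariant on unwind as well as on the normal path. A sketch against the example above (PtrGuard is a hypothetical name):

```rust
struct S<T> {
    ptr: *mut T,
}

// Restores the saved pointer when dropped, including during unwinding.
struct PtrGuard<'a, T> {
    slot: &'a mut *mut T,
    saved: *mut T,
}

impl<T> Drop for PtrGuard<'_, T> {
    fn drop(&mut self) {
        *self.slot = self.saved;
    }
}

impl<T> S<T> {
    fn foo(&mut self) {
        let saved = self.ptr;
        let guard = PtrGuard { slot: &mut self.ptr, saved };
        *guard.slot = 12345678 as *mut T; // invariant broken only while guard lives
        let _ = vec![0u8; 1_000_000]; // may panic under oom=panic...
        drop(guard); // ...but Drop restores `ptr` either way
    }
}

fn main() {
    let mut s = S { ptr: Box::into_raw(Box::new(7u32)) };
    let before = s.ptr;
    s.foo();
    assert_eq!(s.ptr, before); // invariant restored
    unsafe { drop(Box::from_raw(s.ptr)) }; // clean up
}
```

This only helps code that is rewritten to use it, of course, which is exactly the auditing problem being discussed.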

@Aaron1011 (Member)

I agree that finding these kinds of issues is going to be very difficult. I just wanted to clarify that any memory safety issue will ultimately have to involve unsafe code in some way, just like with any other form of undefined behavior.

@Lokathor (Contributor) commented Jan 6, 2022

Once again the lack of "unsafe to touch" fields bites us.

@briansmith commented Jan 6, 2022

I agree that finding these kinds of issues is going to be very difficult. I just wanted to clarify that any memory safety issue will ultimately have to involve unsafe code in some way, just like with any other form of undefined behavior.

That is probably true for memory safety, but it isn't true for all security invariants.

Consider an application that accepts requests with nonces. It appends each nonce it sees to a HashSet so that it never accepts the same nonce twice. Further, assume that it already uses catch_unwind to "gracefully" handle panics. With oom=abort, this would be safe until the application runs out of memory and dies. With oom=panic, any attempt to append to the HashSet becomes lossy, and the application will silently continue on without (perfectly) enforcing its unique-nonce security guarantee.

This makes me think that the kind of panic that OOM would issue must be distinct from the kinds of panics that catch_unwind would catch:

  • catch_unwind would ignore OOM panics
  • A new unsafe fn catch_unwind_oom<R>(f: impl Fn() -> R + UnwindSafe) -> Result<R> would need to be used to catch OOM (and only OOM?).
  • Library crates should have to opt in to supporting oom=panic, e.g. unsafe-oom-panic=true in Cargo.toml. (This is an unverifiable assertion, thus why I suggest using "unsafe" in the name.) The build system should help enforce that only crates claiming to be oom=panic-safe should be used.
  • libstd shouldn't be considered oom=panic safe unless/until it is somehow verified.
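The nonce scenario above can be simulated without a real OOM by injecting a panic at the insertion point. The panic here is a stand-in for a hypothetical oom=panic allocation failure; accept_nonce and the simulate_oom flag are illustrative names, not proposed API:

```rust
use std::collections::HashSet;
use std::panic::{catch_unwind, AssertUnwindSafe};

// Returns true if the nonce is fresh (never seen), false if it is a replay.
fn accept_nonce(seen: &mut HashSet<u64>, nonce: u64, simulate_oom: bool) -> bool {
    let inserted = catch_unwind(AssertUnwindSafe(|| {
        if simulate_oom {
            panic!("simulated OOM"); // stand-in for an allocation failure
        }
        seen.insert(nonce)
    }));
    // "Graceful" recovery: treat a panicked insert as a fresh nonce.
    // The nonce was never recorded, so a later replay will also be accepted.
    inserted.unwrap_or(true)
}

fn main() {
    std::panic::set_hook(Box::new(|_| {})); // silence the panic message
    let mut seen = HashSet::new();
    assert!(accept_nonce(&mut seen, 1, false));  // fresh
    assert!(!accept_nonce(&mut seen, 1, false)); // replay rejected
    assert!(accept_nonce(&mut seen, 2, true));   // "accepted" but never recorded
    assert!(accept_nonce(&mut seen, 2, false));  // replay sneaks through!
}
```

The last assertion is the security failure: after the swallowed panic, the replayed nonce is accepted as fresh.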

@briansmith

@briansmith I basically agree: as we get more try_ methods (or equivalent), the use cases for oom=panic will quickly go away. I even expect unwinding itself to go away as we get more tools to audit panics and thus ensure recovery paths exist by other means.

I disagree, because I think the ecosystem shouldn't switch to using the try_ allocation mechanisms in general. The problem I am trying to solve is "How do I use Rust libraries that do memory allocation in a C application that doesn't want oom=abort semantics?" I don't think the right answer to that is "Rewrite all the crates to use only fallible memory allocation", as that creates a huge hindrance for all users to accommodate a small minority's need. oom=panic looks attractive as a way to accommodate them, if the safety issues can be addressed, as one would put catch_unwind/catch_unwind_oom at every FFI boundary and throw away the Rust library state (and potentially build new Rust library state) on OOM panics.

@Diggsey (Contributor) commented Jan 6, 2022

This makes me think that the kind of panic that OOM would issue must be distinct from the kinds of panics that catch_unwind would catch:

  • With oom=panic, some operations (in this case allocations) which used to be infallible can now panic.
  • Code may exist which relies on such operations to be infallible to maintain some program invariants.
  • Such invariants may be safety invariants, security invariants, or really anything.

In other words, there's nothing "special" about OOM panics, other than the fact that OOM didn't use to be a panic. For that reason, I don't think they should be a different kind of panic, since it would make a historical accident into a permanent distinction.

One could easily imagine that in the future there might be other operations that we'd want to make fallible, and that would have the same implications for program invariants, but we wouldn't want to keep introducing new kinds of panics.

@Diggsey (Contributor) commented Jan 6, 2022

I also think that, since oom=panic is an application-level decision, an initial solution doesn't need to be perfectly automated. It would not be unreasonable to initially expect application authors to do some vetting of their dependencies before using such an option in a security-critical context, and the oom=panic option could have some barriers/warnings to ensure that users are aware of this.

There is a huge space to explore here, but a new field in Cargo.toml that declares a crate "oom=panic" safe, and a badge on crates.io could be useful.

@briansmith

One could easily imagine that in the future there might be other operations that we'd want to make fallible, and that would have the same implications for program invariants, but we wouldn't want to keep introducing new kinds of panics.

I don't think the expectation is that Rust developers must guard against any standard library function being able to panic, now or in the future. In any case, people don't program that way. In fact, because the standard library documentation has "Panics" sections, I think one may assume that outside the circumstances described in a "Panics" section, a standard library function won't panic. So I don't think it is reasonable to make existing APIs panic when they didn't before, unless a crate has opted into that breaking change.

@Diggsey (Contributor) commented Jan 6, 2022

unless a crate has opted into that breaking change.

By using oom=panic, an application crate has opted into that breaking change. For a library crate, if it doesn't declare that it supports oom=panic, then there's no expectation that it should work in such a configuration.

@Ericson2314 (Contributor)

@briansmith

I disagree, because...

In the narrow case where C calls out to Rust in just a few places, sure. But for more invasive "rewrite it in Rust" scenarios, that breaks down. The same goes for certain freestanding/embedded projects; e.g. for Rust in the Linux kernel, catching panics would never have been accepted by upstream. I am not saying we all need to go rewrite the world with try_ right now. I am just predicting that will be the trend in the long term for important libraries, so we only need stop-gap solutions in the meantime.

This isn't supposed to be a doom prediction; rather, if the process goes slowly enough, the cost will be amortized and won't hurt :)

@briansmith

I am not saying we all need to go rewrite the world with try_ right now. I am just predicting that will be the trend in the long-long term for important libraries, so we only need stop-gap solutions in the meantime.

The reason I became interested in this issue is because there was a request to do exactly that for Rustls to get it to work according to Apache Web Server's coding rules, and I'm trying to find a solution that doesn't involve adding a giant number of new (untested) branches to Rustls. I did sketch what this would look like in the Rustls codebase and it got ugly really quickly so I'm trying to find alternatives.

@Diggsey (Contributor) commented Jan 7, 2022

I did sketch what this would look like in the Rustls codebase and it got ugly really quickly so I'm trying to find alternatives.

Did the issues stem from the fact that many functions would have to return a Result? Or was it more to do with the use of indirect allocations that you couldn't easily convert to a try_... form?

One possibility to address the former would be to translate reserve() into try_reserve().custom_unwrap() (where custom_unwrap panics with some custom panic payload indicating OOM), effectively giving you a crate-local oom=panic.
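Diggsey's custom_unwrap idea can be sketched as an extension trait that converts the Result back into a panic carrying a typed payload, which catch_unwind at the crate boundary can then downcast. All names here (OomUnwrap, OomPayload, oom_unwrap) are hypothetical, not proposed API:

```rust
use std::collections::TryReserveError;
use std::panic::{catch_unwind, panic_any};

struct OomPayload; // hypothetical marker payload for OOM panics

trait OomUnwrap<T> {
    fn oom_unwrap(self) -> T;
}

impl<T> OomUnwrap<T> for Result<T, TryReserveError> {
    fn oom_unwrap(self) -> T {
        match self {
            Ok(v) => v,
            // panic_any attaches a typed payload that downcast can recognize
            Err(_) => panic_any(OomPayload),
        }
    }
}

fn main() {
    std::panic::set_hook(Box::new(|_| {})); // silence the panic message
    // try_reserve(usize::MAX) always fails (capacity overflow), giving us an
    // Err to unwrap without needing a real out-of-memory condition.
    let caught = catch_unwind(|| {
        let mut v: Vec<u8> = Vec::new();
        v.try_reserve(usize::MAX).oom_unwrap();
    });
    // The boundary can distinguish OOM panics from all other panics.
    assert!(caught.unwrap_err().downcast_ref::<OomPayload>().is_some());
}
```

This gives the caller the ergonomics of infallible reserve() while keeping OOM distinguishable at the catch_unwind boundary.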

@Stargateur

I did sketch what this would look like in the Rustls codebase and it got ugly really quickly so I'm trying to find alternatives.

How does having a new error make your code a mess? snafu or thiserror make it easy to handle errors. Can you expand on what problems you encountered?

@briansmith

How does having a new error make your code a mess?

Every time we do an allocation, we have to branch on whether the allocation succeeds or fails, usually using ?. This becomes a problem in the many cases where the function previously couldn't fail, as now we have to bubble up Results everywhere there wasn't one before. Further, either we resign ourselves to not testing most of these new branches (i.e. have low confidence in the correctness of the result) and assume everything is OK, or we implement fuzzing with an allocator that randomly fails and ensure it hits all these new branches. Experience with C shows that untested error paths are some of the most dangerous in terms of latent security issues, so leaving the new branches untested seems irresponsible. Further, all of these branches make it much harder to reason abstractly about the correctness of the code. (One of the things I've been working towards in Rustls is eliminating as much branching as I can by using functional patterns.)

@Stargateur

I don't have a good solution for your case; since the allocator is global state, it is very hard to control. I don't think it's impossible, but it would require a lot of work. Even with fuzzing you can't test every possibility: covering a branch is not the same as testing every path through that branch.

For the rustls case, I suggest that anything that triggers an OOM put the connection into an irrecoverable error state and make it impossible to use again. Say you are a server that doesn't want to abort on OOM, and a client somehow triggers an OOM: you want to kick that client. So on any OOM error from a connection, you want to stop it. Having a "state broken, abandon ship" error that the server receives when it tries to use rustls again would prevent any inconsistent state while still allowing the server not to abort on OOM.

eliminating as much branching as I can by using functional patterns.

I don't see how functional patterns remove any branches; functional patterns remove side effects, not branches.

Experience with C shows that untested error paths are some of the most dangerous in terms of latent security issues

True, but this is Rust. It's not a 1-to-1 equivalent; Rust has much better tools to prevent bugs.

But I think we are getting a little off topic for this feature. Your warning about oom=panic is valid, but the rest seems somewhat off topic.

@Ericson2314 (Contributor) commented Jan 8, 2022

@briansmith I get the pain of refactoring libs, but I don't understand the latest reason for being wary of try_ at all. Whether implemented as ? or as panicking/unwinding, the error control-flow path still exists! And either way, it's untested without allocation-failure fuzzing!

Now, I understand the following is a matter of opinion, but I would much prefer auditing explicit control flow to auditing panic-based control flow. As we all know, Rust does this kind of error handling much better than C, by virtue of having sum types, ?, and must_use. This makes Result-based error handling as "regular" as unwinding. Furthermore, being explicit, the type checker is there to help: little tooling exists for the complicated matters of "exception safety", whereas Results (or the lack thereof) are tracked exhaustively; the type system won't allow otherwise.

@Lokathor (Contributor) commented Jan 8, 2022

Brian seems to want all allocation failures to only abort, if I read him properly. That is certainly one way to maintain security, but maybe a poor user experience.

@Ericson2314 (Contributor)

OK aborting is simpler, yes.

Labels
A-allocation Proposals relating to allocation. A-collections Proposals about collection APIs. final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substantial objections are raised. T-libs-api Relevant to the library API team, which will review and decide on the RFC.