Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SUGGESTION] Prevent aliasing when using inout / move with in parameters #1273

Open
bluetarpmedia opened this issue Sep 5, 2024 · 13 comments

Comments

@bluetarpmedia
Copy link
Contributor

bluetarpmedia commented Sep 5, 2024

Suggestion

cppfront could detect at runtime (and maybe even compile time?) and prevent aliasing of inout or move function parameters when the function also takes an in parameter of the same type.

When the size of a type exceeds 2 pointers, the in parameter is passed by T const& in C++, which opens up the possibility of aliasing with another reference parameter of the same type.

Motivating example 1

//  Appends the read-only string `other` twice
//  to the end of the mutable `str`.
append_twice: (inout str: std::string, in other: std::string) = {
    str += " (other)$";
    str += " (other)$";
}

main: () -> int = {
    s1: std::string = ("hello");
    s2: std::string = ("world");

    append_twice(s1, s2);     // OK: hello world world
    std::println("{}", s1);

    append_twice(s2, s2);     // Expected: world world world
    std::println("{}", s2);   // Actual:   world world world world

    return 0;
}

The program prints the following to stdout:

hello world world
world world world world

Repro on Godbolt.

Motivating example 2

main: () -> int = {
    s1: std::string = ("hello");

    some_forwarding_function(s1, s1);

    return 0;
}

some_forwarding_function: (forward x: std::string, forward y: std::string) = {
    sink(move x, y);
}

sink: (move x: std::string, in y: std::string) = {
    z: std::string = (move x);

    std::println("x: {}", x);  // OK: Moved-from
    std::println("y: {}", y);  // Expect this to have a value; `y` was passed as `in`
    std::println("z: {}", z);  // OK: Moved `x` into `z`
}

The program prints the following to stdout:

x: 
y: 
z: hello

Repro on Godbolt.

Runtime

For Cpp2 functions having inout/move and in parameters of the same type, cppfront could insert code in the lowered C++ that asserts that the relevant arguments have different addresses.

E.g. it would be as if the user had written this:

append_twice: (inout str: std::string, in other: std::string) = {
    assert(str& != other&);
    ...
}
sink: (move x: std::string, in y: std::string) = {
    assert(x& != y&);
    ...
}

Compile-time

If possible, it would be even better if cppfront could produce a diagnostic if it detects the same argument is passed to a Cpp2 function's inout/move and in parameters.

Will your feature suggestion eliminate X% of security vulnerabilities of a given kind in current C++ code?

No, I can't find any CVEs related to aliasing with function parameter const and non-const references.

Will your feature suggestion automate or eliminate X% of current C++ guidance literature?

No, I can't find this guidance written anywhere.

@DyXel
Copy link
Contributor

DyXel commented Sep 6, 2024

I understand the motivation, I am not sure if I agree with this. Mostly because you would be giving users a false sense of safety; an in parameter doesn't represent exclusive read ownership, it just means that within the function body you are not allowed to modify the value, so only the caller has the guarantee, not the callee. The argument could still be modified by someone else, for example, by a different thread outside the function, or as shown in your examples, through a mutable reference to it.

@dutkalex
Copy link
Contributor

dutkalex commented Sep 6, 2024

I completely agree with @bluetarpmedia on this one:
https://youtu.be/5lecIqUhEl4?si=2JurjtmCxKpHFtTd&t=1187
The whole talk is worth viewing btw, but about this particular issue, the whole point of eliminating references from the language and switching to a more expressive parameter passing model is to solve this kinds of problems IMO!

@jcanizales
Copy link

Thanks for that video, super interesting! Because the obvious question about how to implement this in the compiler is how to prevent aliasing when one of the arguments points to a part of the other. Stopping only the case where the two arguments are exactly the same would lead IMO to a false sense of security, whereas the language is sometimes saving you from the problem but not always. Well, their solution is simple and works for all the cases.

Remains to be seen, though, if that can be easily implemented in a transpiler that lowers to C++ (and one that has to be able to interact natively with existing C++ code).

@bluetarpmedia
Copy link
Contributor Author

https://youtu.be/5lecIqUhEl4?si=2JurjtmCxKpHFtTd&t=1187

Oops, I forgot to mention that video, that was what prompted this suggestion!

@ivo-doko
Copy link

an in parameter (...) just means that within the function body you are not allowed to modify the value

Currently, it clearly doesn't quite mean exactly that, because in both examples provided in the OP the value of the in parameter is modified within the function body.

That said, I do not know how this could be resolved while remaining within the land of C++, other than making in parameters always be passed by value, which I doubt is what we want to do. But if the meaning of in remains "the value of this parameter won't be changed within the body of this function with fingers secretly crossed behind my back", this is a problem. Any secretly crossed fingers behind backs are a problem.

@jcanizales
Copy link

I think the crux is:

  1. Can the mechanism explained in the video above (starting at 41:46) be implemented in C++?
  2. Can it be made compatible with calling existing C++ code?

For (1), if we just follow the way it was done in the video, it requires zero-cost/lightweight coroutines/fibers and yield (I don't know if C++20 coroutines would be up to the task, maybe Boost's fibers?). I guess it's worth a shot, but seems like a good amount of work.

For (2) I'm more skeptic, based solely on the fact that the video says Swift couldn't go all the way there on account of maintaining compatibility with Objective-C.

@bluetarpmedia
Copy link
Contributor Author

I don't think the problem comes up that often, though I have no data to back that up! In my experience I think I've only seen it once or twice and can't even remember the actual functions and call sites. But IIRC it's a real pain to debug!

So for my needs, I'd be happy with cppfront inserting the equivalent of:

assert(x& != y&);

at the beginning of the function to ensure, at runtime, that the caller doesn't provide the same variable to both ref and const-ref parameters, or perhaps adding a precondition:

append_twice: (inout str: std::string, in other: std::string)
    pre( str& != other&, "'str' and 'other' should not alias each other" )
= {
    ...
}

But, like I said, if it's possible for cppfront to diagnose this at compile-time then that'd be even better.

@ivo-doko
Copy link

So for my needs, I'd be happy with cppfront inserting the equivalent of:

assert(x& != y&);

at the beginning of the function to ensure, at runtime, that the caller doesn't provide the same variable to both ref and const-ref parameters

This would give users a false sense of safety, precisely as @DyXel said. Consider for example (excuse the Syntax 1 C++ code, I'm not well versed with Syntax 2 yet):

template<typename T>
class Counter {
public:
  T value;
};

template<typename T>
void add_twice(Counter<T>& counter, const T& value) {
  counter.value += value;
  counter.value += value;
}

A dummy (and tragically conceptless) example, but it demonstrates the problem - the inout (reference) and in (const reference) parameters are different types, yet the same pitfall exists:

add_twice(counter, counter.value);

@DyXel
Copy link
Contributor

DyXel commented Sep 11, 2024

Guys, there's a reason why in order to implement these kind of guarantees properly, language designers have to go and reinvent the entire language from the ground up, not once, but twice in this case (Swift, then Val/Hylo).

My first comment is incomplete, I apologize for that, but yes, what I mean is:

  • An in parameter doesn't represent exclusive read ownership, it just means that within the function body you are not allowed to modify the value through that reference.
  • The argument could still be modified by someone else, ..., by a different thread outside the function, or assuming this is implemented as suggested, through a indirect mutable reference to it.

In practice, I have never found this to be an issue, and it doesn't invoke UB either afaik, so it is not "unsafe" in the practical sense we usually refer to. It is unexpected however if you apply just local reasoning, which is the motivation I feel from this suggestion and from the linked talk. Thanks for linking it btw, very informative!

@SidneyCogdill
Copy link

SidneyCogdill commented Nov 18, 2024

I'm just going to drop a code snippet in different languages to demonstrate the usefulness of exclusivity (no aliasing) in preventing dangling references:

The in / out in cppfront today already closely matches the semantics in Hylo. It should be possible to have a similar guarantee as Hylo does, at least when calling cpp2 syntax functions.

@DyXel
Copy link
Contributor

DyXel commented Nov 18, 2024

To be clear, I am not arguing about exclusivity being a bad thing. I am arguing the fact that as it stands right now, exclusivity cannot be properly implemented, and that IMO getting half across the bridge is not a good enough solution, and could in fact, cause more subtle issues.

@SidneyCogdill
Copy link

Yes, to properly implement it, the compiler needs to at least be able to track the pointer/reference provenance in order to reason about exclusivity, which just isn't very fesible with solely a transpiler.

The code snippets are just there to demostrate that there is a real benefit adopting exclusivity in parameter passing, it's not just something that "sounds nice in theory", it is a strategy that prevents real lifetime bugs from occurring. The suggestion should not be dismissed for "Not preventing XX% of security vulnerabilities", unless there is a better strategy for preventing invalidation across function boundary.

FYI The invalidation issue shown in the godbolt link (store reference -> push_back() -> read through the stored reference) is prevented today with the lifetime safety checker implementation of Visual Studio and cppx (experiment fork) when the invalidation and subsequent reads happen within the same function scope. It's just not nice that you can then trivially defeat the checker by splitting them into separate functions.

@jcanizales
Copy link

Saying it as a fan of the whole thing:

it is a strategy that prevents real lifetime bugs from occurring. The suggestion should not be dismissed for "Not preventing XX% of security vulnerabilities",

No one denies those snippets have a bug that the C++ compiler doesn't catch and Rust and Hylo's do. "Does it prevent XX% of security vulnerabilities" is not asking "does it prevent a bug from happening". It's asking if the bug is actually present in practice in people's code, and is prominent enough to cause some fraction of security vulnerabilities. Nothing in those snippets suggest "these are real life bugs in practice" nor "they're common enough to be a security concern". The snippets you wrote are a good didactic way of demonstrating the problem and how other languages don't have it. But they're not an answer to Herb's threshold for language suggestions - for that one needs to look at CVEs. Big tech companies might have already done so.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants