Allow 'c_ptrTo' be allowed for remote data? #26750

Open
mppf opened this issue Feb 20, 2025 · 9 comments

Comments

@mppf
Member

mppf commented Feb 20, 2025

Summary of Problem

Description:
When working with the Communication module (or doing other low-level programming), one might wish to work with a c_ptr to memory on another locale. However, that is currently disallowed by a runtime check (so it is allowed with --fast, which disables that check).

Is this issue currently blocking your progress?
No

Steps to Reproduce

Source Code:

use CTypes;
use Communication;

proc main() {
  var A:[0..100] int;

  on Locales[numLocales-1] {
    var temp: int = 42;

    var dstPtr = c_ptrTo(A[0]);
    var srcPtr = c_ptrTo(temp);

    put(dstPtr, srcPtr, 0, 8);
  }

  writeln(A[0]); // expect 42
}

Compile command:
chpl bb.chpl in a multilocale config

Execution command:
./bb -nl 2

bb.chpl:10: error: references to remote data cannot be passed to external routines like 'c_pointer_return'

Associated Future Test(s):
TODO

Configuration Information

gasnet+quickstart

@mppf
Member Author

mppf commented Feb 20, 2025

If we want to allow it, this patch will do it:

diff --git a/compiler/passes/insertWideReferences.cpp b/compiler/passes/insertWideReferences.cpp
index e0119e3d08f..f031153e390 100644
--- a/compiler/passes/insertWideReferences.cpp
+++ b/compiler/passes/insertWideReferences.cpp
@@ -1475,6 +1475,9 @@ static void insertStringLiteralTemps()
 
 static void narrowWideClassesThroughCalls()
 {
+  const char* c_pointer_return = astr("c_pointer_return");
+  const char* c_pointer_return_const = astr("c_pointer_return_const");
+
   //
   // Turn calls to functions with local arguments (e.g. extern or export
   // functions) involving wide classes
@@ -1531,7 +1534,11 @@ static void narrowWideClassesThroughCalls()
 
             // Insert a local check because we cannot pass narrow references to
             // remote data to external routines
-            if (!fNoLocalChecks) {
+            if (!fNoLocalChecks &&
+                // but allow this for c_pointer_return/c_pointer_return_const
+                // as these are used for c_ptrTo
+                fn->name != c_pointer_return &&
+                fn->name != c_pointer_return_const) {
               if (fn->hasFlag(FLAG_EXTERN))
                 stmt->insertBefore(new CallExpr(PRIM_LOCAL_CHECK, sym->copy(), buildCStringLiteral(astr("references to remote data cannot be passed to external routines like '", fn->name, "'"))));
               else if (fn->hasFlag(FLAG_EXPORT))

@bradcray
Member

This is really intriguing… I'm open to it and find the motivation motivating.

This seems thematically somewhat related to:

though, somewhat ironically, my openness to this issue's proposal somewhat contradicts my thinking about #21220 and #26710, in which I was thinking that we should disallow c_ptr dereferences on locales other than the one on which the c_ptr lives / was created. Here, you're not dereferencing the pointer, nor wanting to, but logically the pointer lives on the current locale even though it points to a remote locale's memory. Under that rule, someone could dereference it here, but not there.

Some other approaches we could consider if this hesitation resonates, or others aren't as wild about loosening the current behavior:

  • prevent c_addrOf() from getting a remote address, but permit c_ptrTo() to (which is technically all you've proposed here, but is there a reason we'd not do both or neither?)
  • add a c_ptrToRemote() that could be used in cases like these, but would not be as advisable for most users to use in most cases
  • add a proper Chapel pointer type (that would presumably be something like a c_ptr and a locale ID in the common CHPL_LOCALE_MODEL=flat case) and use that instead for cases like this

@mppf
Member Author

mppf commented Feb 20, 2025

The workaround that I'm aware of is ugly: define and use these instead of c_addrOf / c_ptrTo, etc.:

  proc addrOf(const ref p): c_ptr(p.type) {
    return __primitive("_wide_get_addr", p): c_ptr(p.type);
  } 
  proc addrOfConst(const ref p): c_ptrConst(p.type) {
    return __primitive("_wide_get_addr", p): c_ptrConst(void) : c_ptrConst(p.type); 
  } 
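To make the workaround concrete, here is how the program from the original report might be rewritten to use `addrOf`. This is an untested sketch. It assumes the workaround behaves as described in a multilocale build, and it depends on the internal `_wide_get_addr` primitive, which is not a documented or stable interface.

```chapel
use CTypes;
use Communication;

// Workaround from the comment above: read the address out of a (possibly
// wide) reference directly, bypassing the local check that c_ptrTo performs.
// Relies on an internal, undocumented primitive.
proc addrOf(const ref p): c_ptr(p.type) {
  return __primitive("_wide_get_addr", p): c_ptr(p.type);
}

var A: [0..100] int;

on Locales[numLocales-1] {
  var temp: int = 42;

  var dstPtr = addrOf(A[0]);   // A[0] lives on locale 0; no local check here
  var srcPtr = c_ptrTo(temp);  // temp is local, so c_ptrTo is fine

  // Copy one int's worth of bytes from srcPtr to dstPtr on locale 0.
  put(dstPtr, srcPtr, 0, c_sizeof(int));
}

writeln(A[0]);
```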

@mppf
Member Author

mppf commented Feb 21, 2025

An additional tidbit, which might not matter but might not be obvious: the reason we are checking this at all today is that c_ptrTo is implemented with an extern proc that converts the ref into a pointer. That's an implementation detail, but the checking was inherited as a result. IMO such checking is reasonable for extern procs (generally speaking), but there should be an easy way to opt out of it. And c_ptrTo would be the obvious way to opt out, since you could use it to pass a remote pointer to an extern proc.

in which I was thinking that we should disallow c_ptr dereferences on locales other than the one on which the c_ptr lives / was created

Haven't re-read those issues, but IMO it's OK if C pointers are low-level and can cause core dumps. They already are that way, so why should we be trying to protect people from them in this regard?

prevent c_addrOf() from getting a remote address, but permit c_ptrTo() to (which is technically all you've proposed here, but is there a reason we'd not do both or neither?)

IMO the difference between these has nothing to do with whether or not it's reasonable to allow a remote pointer. So I think they should be the same in this regard.

add a c_ptrToRemote() that could be used in cases like these, but would not be as advisable for most users to use in most cases

Sounds plausible, if a bit wordy. Personally, I'm more inclined to stop checking for c_ptrTo being a remote pointer. But I agree c_ptrToRemote could work.

add a proper Chapel pointer type (that would presumably be something like a c_ptr and a locale ID in the common CHPL_LOCALE_MODEL=flat case) and use that instead for cases like this

I think we absolutely need a Chapel wide pointer type. I'm sure there is an issue about it. Today we use _ddata, and the sticking point is: what do we call the user-facing wide pointer type?

Note that it would only solve the motivating case in this issue if we also significantly changed the API for Communication put and get.

Also, I think it's theoretically possible that somebody would want to pass a C pointer to remote data to an extern proc. It'd be nice if there was a way to at least opt-out of the checking in such a case, without resorting to turning off all the checks. That said, I don't have a specific plausible example of doing this at hand.

@bradcray
Member

but might not be obvious: the reason we are checking this at all today is that c_ptrTo is implemented with an extern proc that converts the ref into a pointer.

Definitely not obvious to me, thanks for pointing that out.

it's OK if C pointers are low-level and can cause core dumps. They already are that way, so why should we be trying to protect people from them in this regard?

On one hand, I completely agree with you. On the other hand, I feel like when users with a high-level, non-C/SPMD/HPC profile hit this type of issue, they are very confused and somewhat frustrated that we neither magically made their pointer work remotely nor protected them from it somehow. All that said, if we had the Chapel pointer type and encouraged most users to use that in most cases, and it provided the safety and/or transparency that such users wanted, I'd have no qualms about proceeding with this proposal. Which suggests to me that we probably should proceed (especially since we don't have any plans to implement safety features around c_ptr currently anyway).

So I think they should be the same in this regard.

👍 That was my thinking as well.

I think we absolutely need a Chapel wide pointer type. I'm sure there is an issue about it.

There's at least #8680 but I know we talked about it more recently when stabilizing the CTypes module, though I'm not finding an issue from that era. Anyway, I didn't mean to imply that I thought I was proposing something new if it came across that way.

what do we call the user-facing wide pointer type?

I'd pick pointer or ptr (not sure which offhand).

Note that it would only solve the motivating case in this issue if we also significantly changed the API for Communication put and get.

I was imagining the Chapel pointer type would have the ability to query the c_ptr and locale fields such that with the existing API, one could feed those two fields in as arguments to the Communication calls. But I also agree that we'd want new, additional overloads of the Communication routines that took Chapel pointers in such a future anyway (and at that time, I'd probably move the current routines either to the CTypes module since they rely on a type from that module, or to a sub-module of Communication).
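Feeding a Chapel pointer's two fields into today's Communication calls could look roughly like the following. This is a minimal sketch, not a design proposal. The record `chplPtr`, its fields `addr` and `locId`, and the helper `putTo` are hypothetical names. It assumes `CHPL_LOCALE_MODEL=flat` and that `Communication.put` takes (destination, source, locale ID, size in bytes), as in the call in the original report.

```chapel
use CTypes;
use Communication;

// Hypothetical Chapel pointer type: a raw address plus the ID of the locale
// that owns the memory (flat locale model assumed). Not a real Chapel type.
record chplPtr {
  type eltType;
  var addr: c_ptr(eltType);
  var locId: int;
}

// Hypothetical helper: write 'val' to the location described by 'dst' by
// passing the record's two fields to the existing Communication.put.
proc putTo(dst: chplPtr(?t), val: t) {
  var src = val;  // local copy, so c_ptrTo only ever sees local memory
  put(dst.addr: c_ptr(void), c_ptrTo(src): c_ptr(void), dst.locId,
      c_sizeof(t));
}

var x: int;

// Build the pointer on x's own locale, where c_ptrTo is already allowed.
var p = new chplPtr(eltType=int, addr=c_ptrTo(x), locId=here.id);

on Locales[numLocales-1] {
  putTo(p, 42);  // uses only the record's fields, from any locale
}

writeln(x);
```

Because the pointer is created on the locale that owns `x`, no `c_ptrTo` call ever sees remote data, which is exactly the property today's local check enforces.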

Also, I think it's theoretically possible that somebody would want to pass a C pointer to remote data to an extern proc. It'd be nice if there was a way to at least opt-out of the checking in such a case, without resorting to turning off all the checks. That said, I don't have a specific plausible example of doing this at hand.

That's a good point. One trivial case might be if they wanted to use printf()s from Chapel to debug their remote C pointers.

I'd be curious what @riftEmber, @e-kayrakli, and @jabraham17 think about this, as people who've been involved in implementing these routines and in the issues linked above.

@e-kayrakli
Contributor

I like the proposal here. The linked issue, #22755, does not list what's proposed here as an option, but I think it can be a nice solution.

@riftEmber
Member

I feel we shouldn't allow c_ptrTo on remote data, and instead should require a new Chapel (wide) pointer type for that. Reasoning:

  1. c_ptrs and CTypes in general are for C interoperability, and I can't think of a native C equivalent to getting a pointer to remote data.
  2. We'll be making a Chapel pointer type eventually, and when we do it will make far more sense to use that for remote data than c_ptr. But if we allow c_ptr to be used for that now, down the line we'll be providing two ways to do the same thing.
  3. It would be nice to not make c_ptr less safe. I know we already don't guarantee safety for it, but compared to existing unsafe cases (casting, or pointing to managed heap-allocated memory) this seems easier to fall into accidentally.

I view allowing c_ptrTo on remote data to enable the motivating case as a workaround for not having a wide pointer type. That said, designing the Chapel pointer type would be a lot of work, so I could see justifying a workaround to punt that issue.

@bradcray
Member

Thanks for taking a contrary position, Anna! Here's my rebuttal now that Michael has gotten me off of the initial fence I was sitting on:

  • c_ptrs and CTypes in general are for C interoperability

They are for that, but I'd add "but not just for that" — they're also for representing C types in Chapel code. Since some Chapel routines (like, in this case, the Communication routines) require C types, there are cases where a user would want to use CTypes and c_ptrs without doing any interoperability (as in the OP).

revcomp9.chpl is another example of an important program that uses C pointers, not for interoperability, but because they permit inherently unsafe things for which there's no alternative in Chapel today (and maybe there never will be since it effectively uses them for type punning, which I'd guess we may want to disallow for Chapel pointers).

That said—as noted earlier—the Communication module is unstable and would definitely be more Chapeltastic if it accepted a native Chapel pointer type, once we have one. But since we're not there yet...

and I can't think of a native C equivalent to getting a pointer to remote data.

True, but there's arguably also not a C equivalent to taking a pointer to a Chapel array or string, yet we support those things. Basically, I think of CTypes as providing access to C types in a way that makes sense within the Chapel context, which may include supporting computations and patterns that C doesn't. As a specific example, in Chapel I can do the following:

use CTypes;

config const iWantAnError = false;

var x: int;
var p = c_ptrTo(x);
on Locales[1] {
  var p2 = p;
  if iWantAnError then writeln(p2.deref());
}
writeln(p.deref());

which I can't do in C, but that doesn't imply to me that we shouldn't permit it.

  • We'll be making a Chapel pointer type eventually, and when we do it will make far more sense to use that for remote data than c_ptr

I don't know if I'd go so far as to say it will make more sense (e.g., maybe I want to store a remote address as a field in my record and don't want to spend the two ints that a Chapel pointer would require), but I agree that it'd be preferable. And, going further, I'll say that I believe that once we have Chapel pointers, programs that use them over c_ptr will be the preferred style from a clarity/safety perspective and will be more Chapeltastic.

  • It would be nice to not make c_ptr less safe.

This resonates with me because it's where I started, but I was pretty heavily convinced by Michael's argument that C is an inherently unsafe language, and its pointers one of the more unsafe aspects, so it doesn't feel to me like we should bear the burden of making them safer than they are—Chapel pointers should be the safe option. Also, since there are workarounds to this issue, like:

use BlockDist, CTypes;

config const n = 10;
var A = blockDist.createArray({1..n}, real);
var p: c_ptr(real);
on Locales[1] {
  const mylo = A.localSubdomain().low;
  p = c_ptrTo(A[mylo]);  // taken on Locales[1], where A[mylo] is local
}
// … do something with p …

it's not as though a user can't take a pointer to a remote variable today, they just have to go to a lot more trouble to do it. Not only do today's workarounds involve more typing, but they will also likely be more expensive since we can often know where a remote variable or array element lives without communicating back to the remote locale. For example:

use CTypes;

var x: int;
on Locales[1] {
  x += 1;  // we know x's address by virtue of it being passed into the active message implementing the on-clause
  const px = c_ptrTo(x);  // so this could be computed without communicating back to locale 0 (today it trips the local check)
}

I view allowing c_ptrTo on remote data to enable the motivating case as a workaround for not having a wide pointer type.

I sort of agree with this, but sort of don't. I'd probably say that if we had a wide pointer type, we should avoid using c_ptr and c_ptrTo() almost always. In such a world, I think it'd be preferable to use Chapel pointers for everything, and then to use a throwing cast (myChplPtrToInt: c_ptr(int)) or method (myChplPtrToInt.getLocalCptr()) to convert it to a C pointer (e.g., when passing it off to a C routine that required one), the idea being that the operation would throw if it wasn't local.
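A minimal sketch of the throwing conversion described above, assuming a hypothetical `chplPtr` record that pairs an address with a locale ID (flat locale model). The method name `getLocalCptr` comes from the comment; the record, its fields, and the error text are illustrative only.

```chapel
use CTypes;

// Hypothetical Chapel pointer: address plus owning locale ID (flat model).
record chplPtr {
  type eltType;
  var addr: c_ptr(eltType);
  var locId: int;

  // Hand back the raw C pointer only when the memory it refers to lives on
  // the current locale; otherwise throw instead of returning it.
  proc getLocalCptr(): c_ptr(eltType) throws {
    if locId != here.id then
      throw new Error("pointer refers to memory on locale " + locId:string +
                      ", not locale " + here.id:string);
    return addr;
  }
}

var x = 7;
var p = new chplPtr(eltType=int, addr=c_ptrTo(x), locId=here.id);

// On the owning locale the conversion succeeds.
var cp = p.getLocalCptr();
writeln(cp.deref());

// On any other locale it throws rather than producing a C pointer that
// would be invalid to dereference there.
if numLocales > 1 {
  on Locales[1] {
    try {
      p.getLocalCptr();
    } catch e {
      writeln("caught: ", e.message());
    }
  }
}
```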

So summarizing, my stance is:

  • C pointers are inherently unsafe and I'm not sure this proposal makes them that much less safe
  • the alternatives to supporting this are more unwieldy (in terms of user code written) and expensive/non-optimizable
  • C pointers are a temporarily unfortunate thing for Chapel to live with until we have Chapel pointers, at which point C pointers should be avoided as much as possible
  • any Chapel standard routines that use C pointers should ultimately be rewritten / discouraged in favor of versions that use Chapel pointers (including the Communication routines in the OP here)

@riftEmber
Member

c_ptrs and CTypes in general are for C interoperability

They are for that, but I'd add "but not just for that" — they're also for representing C types in Chapel code. Since some Chapel routines (like, in this case, the Communication routines) require C types, there are cases where a user would want to use CTypes and c_ptrs without doing any interoperability (as in the OP).
revcomp9.chpl is another example of an important program that uses C pointers, not for interoperability, but because they permit inherently unsafe things for which there's no alternative in Chapel today (and maybe there never will be since it effectively uses them for type punning, which I'd guess we may want to disallow for Chapel pointers).
That said—as noted earlier—the Communication module is unstable and would definitely be more Chapeltastic if it accepted a native Chapel pointer type, once we have one. But since we're not there yet...

True, I agree they're not just for C interop. I don't see the Communication routines using c_ptr as a good example of that, since as you mention it'd be more Chapeltastic to do that via a native Chapel pointer, so to me the status quo is also sort of a workaround. But I do see using them to do unsafe things as a good example and also lean towards keeping that around long-term.

and I can't think of a native C equivalent to getting a pointer to remote data.

True, but there's arguably also not a C equivalent to taking a pointer to a Chapel array or string, yet we support those things. Basically, I think of CTypes as providing access to C types in a way that makes sense within the Chapel context, which may include supporting computations and patterns that C doesn't. As a specific example, in Chapel I can do the following

I didn't think of c_ptrs to Chapel types (without C equivalents) as an example of allowing something without a C equivalent, though I wasn't sure why. I think it's because I view this as an explicit composition of a C feature (pointer to an arbitrary address/chunk of memory) with a Chapel feature (the type, which only has meaning within Chapel), which now that I write it out sounds a lot like what you're saying about the function of CTypes.

Though technically a c_ptr to a remote Chapel variable could be described as such a composition, where the Chapel feature is transparently referencing remote data, it still feels like a bridge too far to me. I think my sticking point is that now the C component, the address, no longer exists in the C context of a local address space; from the C point of view we're creating a pointer with no higher-level meaning, it's just an unsigned long wearing void* clothes. And while that's allowed, Chapel requires a cast to create such a c_ptr, and even C requires a cast or explicit pointer arithmetic (I think).

We'll be making a Chapel pointer type eventually, and when we do it will make far more sense to use that for remote data than c_ptr

I don't know if I'd go so far as to say it will make more sense (e.g., maybe I want to store a remote address as a field in my record and don't want to spend the two ints that a Chapel pointer would require), but I agree that it'd be preferable. And, going further, I'll say that I believe that once we have Chapel pointers, programs that use them over c_ptr will be the preferred style from a clarity/safety perspective and will be more Chapeltastic.

I didn't think about wanting to use a narrow pointer to save space, that's a good point. I think it might be worth considering also introducing a Chapel narrow pointer type, but that's a separate conversation, and I wouldn't be that against leaving c_ptr in use there.

I guess having c_ptr still usable for this even if we have a more Chapeltastic way later isn't really a big deal, particularly with your point that we can change standard routines to prefer the latter.

It would be nice to not make c_ptr less safe.

This resonates with me because it's where I started, but I was pretty heavily convinced by Michael's argument that C is an inherently unsafe language, and its pointers one of the more unsafe aspects, so it doesn't feel to me like we should bear the burden of making them safer than they are—Chapel pointers should be the safe option. Also, since there are workarounds to this issue, like: [...]

it's not as though a user can't take a pointer to a remote variable today, they just have to go to a lot more trouble to do it. Not only do today's workarounds involve more typing, but they will also likely be more expensive since we can often know where a remote variable or array element lives without communicating back to the remote locale. For example: [...]

The points I made about semantics and safety are sort of overlapping for me, but essentially I think we shouldn't add what I consider a new "class" of unsafety without requiring the user to do it explicitly. The example of getting a remote pointer today explicitly goes across locales, so I don't have an issue with it on those grounds.

Not only do today's workarounds involve more typing, but they will also likely be more expensive since we can often know where a remote variable or array element lives without communicating back to the remote locale.

That is a good point and definitely a downside of not allowing c_ptr to remote.


I think where I'm landing on this is that, though I could live with c_ptr to remote data, I would prefer a separate c_ptrToRemote proc or an optional c_ptrTo argument to allow remote data. That would address all my concerns and also let us write concise and efficient code for the motivating case(s). I know adding yet another c_ptrTo* is kind of onerous, though.
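The optional-argument alternative might look something like the sketch below. The name `c_ptrToOptRemote` and the `allowRemote` flag are hypothetical. The remote path reuses the internal `_wide_get_addr` primitive from the workaround posted earlier in the thread, so it carries the same caveats. Because `allowRemote` is a `param`, the unused branch is folded away at compile time, so the default path behaves exactly like `c_ptrTo`.

```chapel
use CTypes;

// Hypothetical opt-in variant of c_ptrTo (name and flag are illustrative).
// By default it defers to the standard c_ptrTo, so today's local check still
// applies. With allowRemote=true it extracts the address from the (possibly
// wide) reference via an internal, undocumented primitive.
// Intended for scalar variables and array elements, not whole arrays.
proc c_ptrToOptRemote(ref x, param allowRemote: bool = false): c_ptr(x.type) {
  if allowRemote then
    return __primitive("_wide_get_addr", x): c_ptr(x.type);
  else
    return c_ptrTo(x);
}

var y = 5;
var q = c_ptrToOptRemote(y);  // local data: same as c_ptrTo(y)
writeln(q.deref());

// Inside an on-clause, a caller would opt in explicitly, e.g.:
//   var dstPtr = c_ptrToOptRemote(A[0], allowRemote=true);
```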

Projects
None yet
Development

No branches or pull requests

4 participants