You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
The semantics of the best_affinity_to constraint in Realm machine model processor and memory queries is currently poorly defined and can lead to surprising query results. Consider the following simple example:
Machine::MemoryQuery query(Machine::get_machine());
query.only_kind(Memory::Z_COPY_MEM);
query.best_affinity_to(target_proc); // target proc is a GPU
assert(query.count() > 0);
Today this code can fail the assertion, even if the best_affinity_to call is a replaced with a has_affinity_to call, which is surprising because as long as there is at least one zero-copy memory with affinity to the target GPU, then one of those zero-copy memories should be able to have the best affinity to that particular GPU (if there's a tie then just picking any of them is fine, but the set should not be empty).
There are two possible ways to interpret the constraints of the machine model:
The constraints are aggregated all together and then only evaluated conjunctively. This should always result in the assertion passing today.
Constraints are evaluated eagerly as applied, but the ordering of the constraints might matter to the result of the query (e.g., application of the constraints is not commutative). This should also always result in the code above passing.
The second kind of evaluation might also lead to the assertion failing if the ordering of the constraints is reversed. For example:
Machine::MemoryQuery query(Machine::get_machine());
query.best_affinity_to(target_proc); // target proc is a GPU
query.only_kind(Memory::Z_COPY_MEM);
assert(query.count() > 0);
would likely fail for the second kind of evaluation because the memories with the best affinity to a GPU would likely be the framebuffer, and then none of such memories would be a zero-copy memory so the query could return no results. If you evaluate this code with the first strategy though then it would return the same result regardless of the ordering of the constraints.
We need to clarify the semantics of best_affinity_to and the evaluation strategy of constraints on processor and memory queries.
Can someone write down what the current semantics actually are?
What I seem to remember is that they're something wildly unintuitive, like query.best_affinity_to(P) only finds memories M if M's best affinity is to P (among all processors that M is connected to), rather than finding the M such that P's best affinity (among all memories P is connected to) is maximized. That may be the reason why you're not finding anything, aside from the conjunction and evaluation order issues.
Basically, I think there are two possible semantics:
M.has_best_affinity_to_processor(P): there is no other processor P2 with a better affinity to M
M.has_best_affinity_from_processor(P): there is no other memory M2 with a better affinity from P
There may be better names for these, but the point is that there are two possible directions for the query to go.
In addition to that, an open question is what to do when we have a tie. It appears that we may be breaking the tie by choosing exactly one "best" processor (out of a set with identical affinities) where it probably makes sense to return all of the tied options when they have the same affinity.
Can someone write down what the current semantics actually are?
Yeah, you're right, best_affinity_to(p) filters the query of memories to only the memories that are considered the "best" for the processor, keeping all ties. So in effect, it returns the intersection of whatever the previous predicates result in and the list of all memories that have the same "best" bandwidth.
I think the naming of this API makes it seem like it takes the current memory query with all the previous predicates applied and only keeps the best ones of that subset. That's definitely not the case now. Do we want to change the current API or provide a new API to give the interpretation that people expect?
The semantics of the
best_affinity_to
constraint in Realm machine model processor and memory queries is currently poorly defined and can lead to surprising query results. Consider the following simple example:Today this code can fail the assertion, even if the
best_affinity_to
call is a replaced with ahas_affinity_to
call, which is surprising because as long as there is at least one zero-copy memory with affinity to the target GPU, then one of those zero-copy memories should be able to have the best affinity to that particular GPU (if there's a tie then just picking any of them is fine, but the set should not be empty).There are two possible ways to interpret the constraints of the machine model:
The second kind of evaluation might also lead to the assertion failing if the ordering of the constraints is reversed. For example:
would likely fail for the second kind of evaluation because the memories with the best affinity to a GPU would likely be the framebuffer, and then none of such memories would be a zero-copy memory so the query could return no results. If you evaluate this code with the first strategy though then it would return the same result regardless of the ordering of the constraints.
We need to clarify the semantics of
best_affinity_to
and the evaluation strategy of constraints on processor and memory queries.Assigning to @apryakhin to triage.
The text was updated successfully, but these errors were encountered: