
[core, runtime] Modify the launchers to support returning results. #2277

Merged: 3 commits, Oct 18, 2024

Conversation

schweitzpgi
Collaborator

@schweitzpgi schweitzpgi commented Oct 14, 2024

At some point, the launchers stopped fully supporting dynamic kernel results. This PR adds that functionality back into the mix. The support is added pervasively across the runtime library calls.

Some notes:

  • Return values of static size continue to be supported as they were before. Specifically, these values are stored into the data buffer by the thunk adaptor so they can be returned, ultimately, to the original caller.

  • Return values of dynamic size follow exactly one of two possible calling conventions. The convention must be selected by the runtime layers.

    1. Everything is running within a single process; i.e., this is a simulation. In this case, the kernel will create a span of data, and that span will be returned to the original caller, which will use it to construct the std::vector result object.

    2. There are multiple processes and/or memory spaces involved. The result span will be appended to the original data packet, and the new data packet will be returned as a new span by the runtime. The calling code will follow a similar process, but the data will be passed through the runtime in a pointer-free encoding.
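The buffer and span handling above can be sketched in plain C++. This is a hypothetical illustration, not the actual CUDA-Q API: `ResultSpan`, `fromSpan`, `appendResult`, and `storeStatic` are made-up names standing in for the runtime's real types and thunk-adaptor logic.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// A minimal span-of-bytes type standing in for the runtime's result span.
struct ResultSpan {
  const char *data;
  std::size_t size;
};

// Static-size returns: the thunk adaptor writes the value back into a
// preallocated slot of the argument data buffer.
void storeStatic(char *buffer, std::size_t offset, double result) {
  std::memcpy(buffer + offset, &result, sizeof result);
}

// Convention 1 (single process / simulation): the kernel allocates the
// span and the original caller constructs the std::vector from it.
std::vector<double> fromSpan(ResultSpan span) {
  const double *begin = reinterpret_cast<const double *>(span.data);
  return {begin, begin + span.size / sizeof(double)};
}

// Convention 2 (multiple processes or memory spaces): the result bytes
// are appended to a copy of the original data packet, and the combined
// buffer is what crosses the boundary (a pointer-free encoding).
std::vector<char> appendResult(const std::vector<char> &packet,
                               ResultSpan span) {
  std::vector<char> out(packet);
  out.insert(out.end(), span.data, span.data + span.size);
  return out;
}
```

The essential difference is who interprets the bytes: in convention 1 the caller can dereference the span directly, while in convention 2 the span is only meaningful after the combined packet is shipped back and decoded.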

Requires merge of #2268 (consider this part 2)

Collaborator

@annagrin annagrin left a comment


LGTM!

Review thread on tpls/Stim (outdated, resolved)
@schweitzpgi force-pushed the ch-rezultvaloro.revivigxo branch 3 times, most recently from 295d8eb to 75b9c03 on October 17, 2024 at 22:17
Collaborator

@1tnguyen 1tnguyen left a comment


👍

@schweitzpgi
Collaborator Author

I have no idea why it picked up those tpls changes. Total chaos.

@bmhowe23
Collaborator

> I have no idea why it picked up those tpls changes. Total chaos.

When people `git pull` into their local repo, submodule changes are not automatically updated (by git's design). In any case, for future reference, running `git submodule status` would have revealed the discrepancy between your workspace and what is committed, and you can simply run `git submodule update tpls/Stim` to resolve it.
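That workflow can be reproduced end to end in a scratch repository. This is a sketch using throwaway local repos, with `tpls/Sub` standing in for `tpls/Stim`; the `protocol.file.allow` override is only needed because the submodule here is a local path.

```shell
set -e
tmp=$(mktemp -d)

# A tiny repo to act as the submodule.
git init -q "$tmp/sub"
git -C "$tmp/sub" -c user.email=a@example.com -c user.name=a \
    commit -q --allow-empty -m "submodule init"

# A superproject that pins the submodule at its current commit.
git init -q "$tmp/main"
cd "$tmp/main"
git -c protocol.file.allow=always submodule add -q "$tmp/sub" tpls/Sub
git -c user.email=a@example.com -c user.name=a commit -qm "pin submodule"

# Drift: the checked-out submodule moves past the pinned commit.
git -C tpls/Sub -c user.email=a@example.com -c user.name=a \
    commit -q --allow-empty -m "local drift"
git submodule status        # a leading '+' marks the discrepancy

# The fix: reset the submodule to the commit the superproject records.
git submodule update tpls/Sub
git submodule status        # a leading space means clean again
```

A leading `+` in `git submodule status` output means the checked-out submodule commit differs from the one the superproject records, which is exactly the stray tpls change a commit would pick up.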

@schweitzpgi
Collaborator Author

> > I have no idea why it picked up those tpls changes. Total chaos.
>
> When people `git pull` into their local repo, submodule changes are not automatically updated (by git's design). In any case, for future reference, running `git submodule status` would have revealed the discrepancy between your workspace and what is committed, and you can simply run `git submodule update tpls/Stim` to resolve it.

I realize that. I saw that it was updated on main and thought I had avoided it. I think it happened because I recently changed aliases for the sign-off checking. Traded one landmine for another.


Make the cast more robust to sneak it past -Werror.

Update another launchKernel override.

Add some doxygen goop to try and evade CI issues.

Fix the python builder errors.

Sachin's fix.

Signed-off-by: Eric Schweitz <[email protected]>
@schweitzpgi schweitzpgi merged commit 4fe3c4b into NVIDIA:main Oct 18, 2024
132 checks passed
@schweitzpgi schweitzpgi deleted the ch-rezultvaloro.revivigxo branch October 18, 2024 18:25
Labels
runtime CUDA quantum runtime
4 participants