OpenSHMEM Queues for Aggregation #483

Open
manjugv opened this issue Oct 29, 2021 · 1 comment
@manjugv
Collaborator

manjugv commented Oct 29, 2021

Problem:

Communication and data aggregation are known to improve performance for PGAS/OpenSHMEM applications [1][2]. However, the OpenSHMEM programming model lacks abstractions that applications can use to express aggregation intent, or that developers can use to optimize OpenSHMEM implementations for aggregation.

[1] Jason Devinney's Conveyors keynote
[2] Brad Chamberlain's Chapel keynote

Proposal:

Introduce OpenSHMEM queues as an abstraction to aggregate data and communication.

Details in the document pdf

(Caution: The document requires work to make it a specification-compliant document.)

Impact on Users:

This gives users the ability to aggregate communication and data.

Impact on implementation:

Implementations will have to implement the new interfaces described in the pdf.

@manjugv manjugv self-assigned this Nov 1, 2021
@naveen-rn
Contributor

naveen-rn commented Jan 28, 2022

@manjugv Queries on data queues, and miscellaneous questions on the endgame and on support for user-defined op-types (which are essential for the targeted use cases), are slated for another comment. Let me first try to understand the basic communication queues.

Please clarify the following:

  1. The differentiation between communication queues and contexts is very minimal. It almost seems that both SHMEM objects perform similar operations.

    • Assume an EXCLUSIVE queue. I suppose it is used by a single thread? How does it differ from a PRIVATE context?
    • AFAIU, there is no mandatory requirement that a queue be tied to a target process? If so, there is no difference from a context on the source side either. It looks like, irrespective of queues or contexts, sorting is required on the source side to aggregate the messages?
    • In general, it would be beneficial to understand why sorting on a queue would be much more effective than sorting on a context.
  2. With respect to communication queues, the semantics of the push operation are not clear.

    • Does the return from a push operation guarantee immediate progress? As the use of queue_progress doesn't seem mandatory, it looks like the return from a push operation guarantees immediate progress?
    • There is no guarantee that a new push operation will follow later, so once an op is pushed into the queue, the progress engine (either a host thread or a thread from the smartNIC) would immediately pick up the pushed event. If so, how do we chain the pushed operations?
  3. What is the need for the query_size operation? Why does the user need to know about the pending operations in the queue?

  4. Is the communication queue an object to set up SHMEM users for future data aggregation operations with data queues? If so, it makes sense. But I don't see a real benefit in introducing a new SHMEM object that duplicates all existing operations to suit the new object.
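
For illustration only, the source-side sorting raised in item 1 could look roughly like the sketch below. It is a minimal, single-process C sketch and not the interface from the proposal pdf: `agg_t`, `agg_push`, `agg_flush`, `agg_flush_all`, `NUM_PES`, and `BUF_CAP` are hypothetical names, and the flush only counts transfers where a real implementation might issue one bulk put per target.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NUM_PES 4 /* hypothetical number of target PEs */
#define BUF_CAP 3 /* hypothetical per-target buffer capacity, in items */

/* One aggregation buffer per target PE (hypothetical layout). */
typedef struct {
    long   items[NUM_PES][BUF_CAP]; /* pending items, sorted by target */
    size_t count[NUM_PES];          /* pending items per target */
    size_t flushes[NUM_PES];        /* bulk transfers issued per target */
    size_t sent[NUM_PES];           /* items handed off per target */
} agg_t;

static void agg_init(agg_t *a) { memset(a, 0, sizeof *a); }

/* Stand-in for one bulk transfer to `pe` (for example, a single
 * shmem_putmem call in a real program). Here it only updates counters. */
static void agg_flush(agg_t *a, int pe) {
    if (a->count[pe] == 0) return; /* nothing pending for this target */
    a->sent[pe] += a->count[pe];
    a->flushes[pe]++;
    a->count[pe] = 0;
}

/* Place `value` in the buffer of its target; flush that buffer when full. */
static void agg_push(agg_t *a, int pe, long value) {
    assert(pe >= 0 && pe < NUM_PES);
    a->items[pe][a->count[pe]++] = value;
    if (a->count[pe] == BUF_CAP) agg_flush(a, pe);
}

/* Drain every partially filled buffer, e.g. before a synchronization point. */
static void agg_flush_all(agg_t *a) {
    for (int pe = 0; pe < NUM_PES; pe++) agg_flush(a, pe);
}
```

In this sketch the sorting cost is paid at push time on the source, whichever object (queue or context) owns the buffers, which is the point the question above is probing.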
