The RPC stage implementation, DistRpcPipelineStage, doesn't enforce message ordering, so outputs can't be verified against their labels and model accuracy can't be measured. The primary challenge is that we don't directly manage the PyTorch RPC threads. Furthermore, the current implementation requires the sending stage to wait until the receiver is ready, which can deadlock if a send fails. We could implement queues and threads as in the P2P comm framework, but that is likely to be harder to initialize from the master rank, at which point you might as well just use the P2P comm framework.
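One possible way to restore ordering without managing the RPC threads is to tag each message with a sequence number on the sender and reorder on the receiver. The sketch below is only an illustration under that assumption: `ReorderBuffer`, `receive`, and the `deliver` callback are hypothetical names, not part of DistRpcPipelineStage or PyTorch RPC, and the code uses only the Python standard library. It also does not address the send-failure deadlock, since a lost message would stall delivery of everything after it.

```python
import threading


class ReorderBuffer:
    """Hypothetical helper: releases payloads in sequence order,
    even when they arrive out of order from concurrent threads."""

    def __init__(self, deliver):
        self._deliver = deliver      # callback invoked once per payload, in order
        self._next_seq = 0           # next sequence number to release
        self._pending = {}           # seq -> payload, held until its turn
        self._lock = threading.Lock()

    def receive(self, seq, payload):
        # Never waits for earlier messages: an early arrival is parked
        # and the calling thread returns immediately.
        with self._lock:
            self._pending[seq] = payload
            # Release every consecutive payload that is now available.
            while self._next_seq in self._pending:
                self._deliver(self._pending.pop(self._next_seq))
                self._next_seq += 1


# Usage: messages arrive as 2, 0, 1 but are delivered as 0, 1, 2.
out = []
buf = ReorderBuffer(out.append)
for seq, payload in [(2, "c"), (0, "a"), (1, "b")]:
    buf.receive(seq, payload)
print(out)  # ['a', 'b', 'c']
```

Because `receive` parks early arrivals instead of blocking, a thread that delivers message 2 before message 0 is not held up. The trade-off is that the sender must assign sequence numbers, and the buffer grows without bound if a message never arrives.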