Reset parallel.loadmanager in EngineTestRunner after every data loading #517

Merged: 1 commit, Dec 15, 2023

@ptim0626 ptim0626 commented Dec 1, 2023

This PR resets parallel.loadmanager after each test run through EngineTestRunner completes, so that the subdivision of data is consistent across tests.

This is necessary because every division of the data into blocks uses the same parallel.loadmanager instance, and the partition calculation depends on self.load, which the previous load has already modified in place. That is fine for a normal reconstruction (though not strictly for stochastic-type reconstructions), but not when comparing across tests, where a consistent partition is wanted.
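To make the mechanism concrete, here is a minimal sketch of a stateful balancer. It is a toy, not ptypy's implementation: `ToyLoadManager`, its tie-breaking rule, and the item count of 161 are all assumptions chosen for illustration. It only shows how a partition that depends on an accumulated `load` drifts between calls unless the state is reset.

```python
class ToyLoadManager:
    """Hypothetical stand-in for a stateful load balancer (not ptypy's code)."""

    def __init__(self, size):
        self.size = size
        self.load = [0] * size  # cumulative number of items given to each rank

    def reset(self):
        # Forget everything assigned so far.
        self.load = [0] * self.size

    def assign(self, n_items):
        """Return per-rank counts for n_items and update self.load in place."""
        base, extra = divmod(n_items, self.size)
        counts = [base] * self.size
        # Leftover items go to the ranks with the lowest accumulated load
        # (ties broken by rank index, an arbitrary choice for this toy).
        by_load = sorted(range(self.size), key=lambda r: (self.load[r], r))
        for r in by_load[:extra]:
            counts[r] += 1
        for r in range(self.size):
            self.load[r] += counts[r]
        return counts


def run(n_loads, reset):
    """Assign 161 items n_loads times over 4 ranks, optionally resetting."""
    manager = ToyLoadManager(size=4)
    history = []
    for _ in range(n_loads):
        history.append(manager.assign(161))
        if reset:
            manager.reset()
    return history


if __name__ == "__main__":
    print("without reset:", run(3, reset=False))
    print("with reset:   ", run(3, reset=True))
```

Without the reset, the rank holding the extra item changes on every call because the earlier calls have shifted `load`; with the reset, every call starts from the same state and returns the same partition.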

This small script using the MoonFlowerScan scan illustrates the difference:

from ptypy import utils as u
from ptypy.core import Ptycho
from ptypy.utils import parallel


def construct(reset=False):
    p = u.Param()
    p.scans = u.Param()
    p.scans.MF = u.Param()
    p.scans.MF.name = 'BlockFull'
    p.scans.MF.propagation = 'farfield'
    p.scans.MF.data = u.Param()
    p.scans.MF.data.name = 'MoonFlowerScan'
    p.scans.MF.data.num_frames = 200

    P = Ptycho(p, level=2)

    if reset:
        parallel.loadmanager.reset()

    return P

if __name__ == '__main__':
    for _ in range(5):
        P = construct(reset=False)
        active = [p.active for _, p in P.pods.items()]
        print(f'[{parallel.rank}] {sum(active)}')
        parallel.barrier()
        if parallel.master:
            print('----------')
        parallel.barrier()

When executing with 4 MPI ranks, you would get something similar to this:

[1] 40
[2] 40
[3] 41
[0] 40
----------
[3] 40
[1] 40
[2] 41
[0] 40
----------
[2] 40
[3] 40
[1] 41
[0] 40
----------
[2] 40
[3] 40
[0] 41
[1] 40
----------
[1] 40
[3] 41
[2] 40
[0] 40
----------

Note that the count of 41 active pods lands on a different rank from one iteration of the for-loop to the next. This should not happen in testing. Changing to reset=True in the above script gives:

[0] 40
[3] 41
[1] 40
[2] 40
----------
[3] 41
[1] 40
[2] 40
[0] 40
----------
[2] 40
[3] 41
[0] 40
[1] 40
----------
[2] 40
[1] 40
[0] 40
[3] 41
----------
[3] 41
[1] 40
[0] 40
[2] 40
----------
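The comparison between the two outputs can also be expressed as an automated check. The sketch below copies the per-rank counts from the two outputs in this description into dictionaries keyed by rank; `same_partition` is a hypothetical helper written for this illustration and is not part of ptypy or its test suite.

```python
def same_partition(runs):
    """Return True if every run assigns the same count to each rank as the first run."""
    return all(run == runs[0] for run in runs[1:])


# Per-rank active pod counts from the reset=False output (rank -> count).
without_reset = [
    {0: 40, 1: 40, 2: 40, 3: 41},
    {0: 40, 1: 40, 2: 41, 3: 40},
    {0: 40, 1: 41, 2: 40, 3: 40},
    {0: 41, 1: 40, 2: 40, 3: 40},
    {0: 40, 1: 40, 2: 40, 3: 41},
]

# Per-rank active pod counts from the reset=True output (rank -> count).
with_reset = [{0: 40, 1: 40, 2: 40, 3: 41} for _ in range(5)]

if __name__ == "__main__":
    print("consistent without reset:", same_partition(without_reset))
    print("consistent with reset:   ", same_partition(with_reset))
```

Keying by rank rather than comparing printed lines matters here, since the order in which ranks print is not deterministic even when the partition is.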

@ptim0626 ptim0626 changed the title Reset parallel.loadmanager in EngineTestRunner after each test completion Reset parallel.loadmanager in EngineTestRunner after every data loading Dec 1, 2023
@daurer daurer merged commit b2cf5d8 into dev Dec 15, 2023
4 checks passed
@daurer daurer deleted the reset_parallel_loadmanager branch December 15, 2023 16:17