BUG: darshan-runtime and Python STDIO/print captures #933

Open
tylerjereddy opened this issue May 12, 2023 · 0 comments

tylerjereddy commented May 12, 2023

While onboarding Yaris, we ran into some very confusing behavior when trying to capture CPython STDIO/print() activity in a few Python "learning scripts." I've just reproduced some of it locally, so I'll share and describe the behavior below.

This is unlikely to show up in, say, large National Lab code workflows, but I can certainly see how it could cause a great deal of confusion for someone learning the workflow from darshan runtime monitoring through to the HTML report.

First, let's try the first exercise I suggested: just print on two ranks. It is hard to imagine something simpler that is still MPI-aware:

# example code from Yaris' exercise

import time
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()


if rank == 0:
    print("rank 0", flush=True)
if rank == 1:
    print("rank 1", flush=True)

mpirun -x LD_PRELOAD=/home/tyler/darshan_install/lib/libdarshan.so -x DARSHAN_LOGPATH=/home/tyler/LANL/rough_work/darshan/python_stagger_tests -n 2 python test.py

The report shows no captured IO data, even with flush=True (on my machine).

[image: HTML report with red text indicating no captured IO data]
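To confirm the same thing from the log itself rather than from the HTML report, the set of captured modules can be inspected with pydarshan. The following is only a minimal sketch: `missing_modules` and `captured_modules` are hypothetical helper names, the expected module names `POSIX` and `STDIO` are an assumption about what this exercise should produce, and the log path is whatever file darshan wrote under `DARSHAN_LOGPATH`.

```python
def missing_modules(captured, expected=("POSIX", "STDIO")):
    """Return the expected module names that are absent from `captured`."""
    return [name for name in expected if name not in captured]


def captured_modules(log_path):
    """Return the module names recorded in a darshan log (requires pydarshan)."""
    import darshan  # imported lazily so missing_modules() works without it

    report = darshan.DarshanReport(log_path)
    # report.modules is keyed by module name, e.g. "POSIX" or "STDIO"
    return set(report.modules)
```

With a real log, `missing_modules(captured_modules(path))` lists which of the expected modules have no entry in that log, which can be compared against what the HTML report displays.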

If I increase the amount of data printed, there is still no capture of IO (same red text on the report):

# example code from Yaris' exercise

import time
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()


if rank == 0:
    print("rank 0" * 100, flush=True)
if rank == 1:
    print("rank 1" * 100, flush=True)

Even if I add a few seconds of sleep after the print, the HTML report still indicates no IO capture:

# example code from Yaris' exercise

import time
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()


if rank == 0:
    print("rank 0" * 100, flush=True)
    time.sleep(5)
if rank == 1:
    print("rank 1" * 100, flush=True)
    time.sleep(5)
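A possible factor, stated here as an assumption rather than a confirmed diagnosis: CPython's print() does not go through C stdio calls such as fwrite() or printf(). Python's io module layers a TextIOWrapper over a BufferedWriter over a FileIO object that writes directly to a file descriptor. If the runtime's STDIO instrumentation only observes C stdio calls, that could be related to the missing capture, but this issue does not establish that. The sketch below just illustrates the layering, using a pipe as a hypothetical stand-in for file descriptor 1 so the bytes can be read back.

```python
import io
import os

# A pipe stands in for stdout's file descriptor (illustrative only).
read_fd, write_fd = os.pipe()

# Rebuild the layering CPython normally uses for sys.stdout:
# TextIOWrapper -> BufferedWriter -> FileIO -> raw file descriptor.
raw = io.FileIO(write_fd, mode="w")
stream = io.TextIOWrapper(io.BufferedWriter(raw), encoding="utf-8", newline="\n")

# flush=True pushes the text through every layer down to the descriptor.
print("rank 0", file=stream, flush=True)

# The bytes are already on the descriptor; no C stdio buffer was involved.
captured = os.read(read_fd, 1024)

stream.close()  # closes the wrapped FileIO and therefore write_fd
os.close(read_fd)
```

The point of the sketch is only that flush=True empties Python's own buffers onto the descriptor; it says nothing about which calls darshan actually intercepts.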

If I switch to explicit POSIX by writing to a file, all is good in the world again:

# example code from Yaris' exercise

import time
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()


if rank == 0:
    with open(f"{rank}.txt", "w") as outfile:
        outfile.write("hello")
if rank == 1:
    with open(f"{rank}.txt", "w") as outfile:
        outfile.write("hello")

[image: HTML report for the POSIX file-write example]

We can stagger the IO with POSIX as well. That was the original purpose of the exercise: to understand IO patterns with simple examples like the one below. But STDIO was invisible, which made the exercise pretty confusing!

# example code from Yaris' exercise

import time
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()


if rank == 0:
    with open(f"{rank}.txt", "w") as outfile:
        outfile.write("hello")
if rank == 1:
    time.sleep(5)
    with open(f"{rank}.txt", "w") as outfile:
        outfile.write("hello")

[image: HTML report for the staggered POSIX example]
