Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reproducer tests: grab output/error, and use the correct python #1554

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

tfogal
Copy link
Collaborator

@tfogal tfogal commented Dec 13, 2024

Had a minor issue on my system and needed the output to debug what was going wrong.

What does this PR do?

This changes what assertion errors will print out in the case of an error. Specifically this gives the output from subprocess.run in some tests.

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

@tfogal tfogal requested a review from kiya00 December 13, 2024 16:16
@tfogal
Copy link
Collaborator Author

tfogal commented Dec 13, 2024

Note this bit from the docs:

If capture_output is true, stdout and stderr will be captured. When used, the internal Popen object is automatically created with stdout and stderr both set to PIPE. The stdout and stderr arguments may not be supplied at the same time as capture_output. If you wish to capture and combine both streams into one, set stdout to PIPE and stderr to STDOUT, instead of using capture_output.

https://docs.python.org/3.13/library/subprocess.html#module-subprocess

@tfogal
Copy link
Collaborator Author

tfogal commented Dec 13, 2024

The errors from the broken CI were:

    from omegaconf import OmegaConf
E   ModuleNotFoundError: No module named 'omegaconf'
    from sentencepiece import SentencePieceProcessor
E   ModuleNotFoundError: No module named 'sentencepiece'
    import sentencepiece as spm
E   ModuleNotFoundError: No module named 'sentencepiece'
    from sentencepiece import SentencePieceProcessor
E   ModuleNotFoundError: No module named 'sentencepiece'
    import sentencepiece as spm
E   ModuleNotFoundError: No module named 'sentencepiece'
    import psutil
E   ModuleNotFoundError: No module named 'psutil'
    import nvfuser
E   ModuleNotFoundError: No module named 'nvfuser'
    assert (E   AssertionError: Trying to import a Triton executor, but it requires Triton version 2.1 or greater, and the current Triton version is None
    from thunder.core.langctx import langctx, Languages
E   ModuleNotFoundError: No module named 'thunder.core.langctx'
    from thunder.core.langctx import langctx, Languages

omegaconf, sentencepiece, and psutil might just need to be installed. But nvfuser, triton, and thunder.core.langctx are obviously there.

These are similar to issues I hit on my local system: the environment from a subprocess is not the same as the environment while running the main process. It's weird that this is hit now: as can be seen, the changes don't really change the test at all. Were these not getting run in CI before and somehow my fiddling caused them to run?

Instead of subprocess.run, I wonder if we should switch to creating a file with directories dumped (i.e. tmp_path, here), and then modify .azure/gpu-tests.yml to just run them as separate top-level processes.

But really want to hear from @kiya00 first.

@tfogal
Copy link
Collaborator Author

tfogal commented Dec 13, 2024

The errors from the broken CI were:
...
These are similar to issues I hit on my local system: the environment from a subprocess is not the same as the environment while running the main process. It's weird that this is hit now: as can be seen, the changes don't really change the test at all. Were these not getting run in CI before and somehow my fiddling caused them to run?

@mruberry pointed me at https://stackoverflow.com/questions/29523246/python-subprocess-is-running-a-different-version-of-python which had great advice---this was indeed the problem.

I've updated the patch to use sys.executable which resolves the test failures on my local system (and hopefully also in CI 🤞).

@tfogal tfogal changed the title reproducer tests: grab output/error. reproducer tests: grab output/error, and use the correct python Dec 13, 2024
@mruberry
Copy link
Collaborator

CI failure looks relevant:

FAILED thunder/tests/test_dynamo.py::test_dynamo_reproducer_split_DynamoThunder_cpu_None[repro] - TypeError: expected str, bytes or os.PathLike object, not list
FAILED thunder/tests/test_dynamo.py::test_dynamo_reproducer_split_DynamoThunder_cpu_None[benchmark] - TypeError: expected str, bytes or os.PathLike object, not list

@tfogal
Copy link
Collaborator Author

tfogal commented Dec 14, 2024

CI failure looks relevant:

agh, thanks though. will push a fix in a sec

Had a minor issue on my system and needed the actual output to
debug what was going wrong.
This ensures the python we pass to subprocess.run is the same one
as the thing that's running.
@tfogal tfogal force-pushed the tfogal/test-messages branch from 9797ac0 to 77ff13b Compare December 14, 2024 00:47
@tfogal
Copy link
Collaborator Author

tfogal commented Dec 14, 2024

Most recent CI issue was Error: The operation was canceled. Sounds like a timeout. Merging main just to restart CI and try again.
This is ready for review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants