Support for setting different DLIO_LOG_LEVEL #222

Merged
merged 20 commits into from
Feb 24, 2025

@zhenghh04 (Member) commented Aug 27, 2024

In this PR, we changed the per-step output from info to debug to reduce the logging overhead.

We added support for changing the logging level through the environment variable DLIO_LOG_LEVEL, which can be set to error, warning, output, info, or debug. The default is output.

All per-step logs are emitted only at the info level or a more verbose one. The output level contains only per-epoch logs, which significantly reduces the log file size.

In the debug level, the log format contains source file and line number, while other levels do not.
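
For readers following along, the behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: the numeric value 25 for the output level, the helper names `resolve_level`, `make_formatter`, and `configure_logger`, the logger name, and the fallback for unrecognized values are all assumptions.

```python
import logging
import os

# Assumption: "output" sits between INFO (20) and WARNING (30), so it shows
# per-epoch messages while hiding per-step info messages.
OUTPUT = 25
logging.addLevelName(OUTPUT, "OUTPUT")

LEVELS = {
    "error": logging.ERROR,
    "warning": logging.WARNING,
    "output": OUTPUT,
    "info": logging.INFO,
    "debug": logging.DEBUG,
}


def resolve_level(env=None):
    """Map DLIO_LOG_LEVEL to a numeric level; default to OUTPUT."""
    env = os.environ if env is None else env
    name = env.get("DLIO_LOG_LEVEL", "output").strip().lower()
    # Assumption: unrecognized values fall back to the default.
    return LEVELS.get(name, OUTPUT)


def make_formatter(level):
    """Include source file and line number only at debug level."""
    if level <= logging.DEBUG:
        return logging.Formatter(
            "%(asctime)s %(levelname)s [%(filename)s:%(lineno)d] %(message)s"
        )
    return logging.Formatter("%(asctime)s %(levelname)s %(message)s")


def configure_logger(name="dlio_sketch", env=None):
    """Build a logger whose level and format follow DLIO_LOG_LEVEL."""
    level = resolve_level(env)
    logger = logging.getLogger(name)
    logger.setLevel(level)
    logger.handlers.clear()
    handler = logging.StreamHandler()
    handler.setFormatter(make_formatter(level))
    logger.addHandler(handler)
    logger.propagate = False
    return logger
```

In this sketch, running with `DLIO_LOG_LEVEL=debug` in the environment would add the file name and line number to each record, while leaving the variable unset would show only messages at the output level or more severe.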

@hariharan-devarajan (Collaborator) left a comment

We should add error logging as an option for pure performance mode.
For the final per-metric output, we should use a different logging path that outputs PRINT messages to STDOUT and redirects them to the dlio.log file.

The other logging is optional from the benchmark.

@hariharan-devarajan (Collaborator) commented

@zhenghh04 Can we avoid printing the file name and line number for info logging? We should only do that for debug logging. This will significantly reduce log size.

@zhenghh04 (Member Author) commented

@hariharan-devarajan, I made the changes you suggested. Please review again.

@hariharan-devarajan (Collaborator) left a comment

We are almost there. I see we have some code which got reverted from the mock logging.

@zhenghh04 (Member Author) commented

@hariharan-devarajan, I am hesitating between using workflow.log_level to control the log level and using the DLIO_LOG_LEVEL environment variable. I lean slightly toward the latter, which is more common in other apps. What is your preference?

@hariharan-devarajan (Collaborator) commented

> @hariharan-devarajan, I am hesitating between using workflow.log_level to control the log level and using the DLIO_LOG_LEVEL environment variable. I lean slightly toward the latter, which is more common in other apps. What is your preference?

I see value in both, but I prefer the environment variable too. By default, we should use the WARN level, not info. Then DLIO_LOG_LEVEL can be set to more verbose levels such as INFO and DEBUG.

We should switch the per epoch time to print and include the sample rate per epoch there.

Then, per step becomes info.

And variable logging becomes debug.

@zhenghh04 (Member Author) commented

@hariharan-devarajan After reading the documentation more closely (https://docs.python.org/3/library/logging.html#levels), I think the default logging level should be info, and everything should go through logging rather than print.

What do you think?

@hariharan-devarajan (Collaborator) commented

In my opinion, printing is different from logging.

Printing is for the things that tell us about the benchmark at a high level, such as initialization, progress, and metrics.

Logging here is for internal parts.

I wrote a logger in C++ in the past in which I added print as the highest level, above error.

For benchmarks, I feel it makes sense to have this.
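
The idea of a print level that sits above error could be sketched in Python's `logging` module roughly as below. This is only an illustration of the concept, not the C++ logger mentioned in the comment and not code from this PR; the level value (`logging.CRITICAL + 10`), the name `PRINT`, the logger name, and the helper `make_logger` are assumptions.

```python
import logging

# Assumption: place PRINT above every built-in level so that it is never
# filtered out by any ordinary threshold, including ERROR.
PRINT = logging.CRITICAL + 10
logging.addLevelName(PRINT, "PRINT")


def make_logger(level):
    """Return a logger with the given threshold; PRINT records always pass."""
    logger = logging.getLogger("sketch.print_level")
    logger.setLevel(level)
    return logger
```

With `make_logger(logging.ERROR)`, a call such as `logger.log(PRINT, "epoch done")` would still be emitted, while info and debug records would be dropped.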

@hariharan-devarajan (Collaborator) commented

Now, if you want to use logging for printing, I would create two loggers: one for printing and one for logging internal details.

The printing logger could always stay at info level internally and would not be changeable through benchmark parameters.
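
A two-logger arrangement of this kind might look roughly like the sketch below. It is an illustration under stated assumptions, not the implementation merged here: the logger names, the helper `build_loggers`, and its parameters are hypothetical; only the `dlio.log` file name comes from this thread.

```python
import logging
import sys


def build_loggers(internal_level=logging.WARNING, log_file="dlio.log"):
    """Create a fixed-level print logger and a configurable internal logger."""
    # Print logger: always INFO, writes to STDOUT and to the log file.
    printer = logging.getLogger("sketch.print")
    printer.setLevel(logging.INFO)
    printer.propagate = False
    printer.handlers.clear()
    plain = logging.Formatter("%(message)s")
    for handler in (logging.StreamHandler(sys.stdout),
                    logging.FileHandler(log_file)):
        handler.setFormatter(plain)
        printer.addHandler(handler)

    # Internal logger: its level is the only one the user can change.
    internal = logging.getLogger("sketch.internal")
    internal.setLevel(internal_level)
    internal.propagate = False
    internal.handlers.clear()
    stderr_handler = logging.StreamHandler()
    stderr_handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    )
    internal.addHandler(stderr_handler)
    return printer, internal
```

Keeping the print logger's level fixed means high-level benchmark output survives even when the internal logger is set to error for a pure performance run.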

@zhenghh04 (Member Author) commented

@hariharan-devarajan How about we keep the logging system as it is in this PR and improve it in a follow-up PR?

I need this feature for now to have an easy way to reduce some I/O overhead from the logging.

@zhenghh04 (Member Author) commented

> We should add error logging as an option for pure performance mode. For the final per-metric output, we should use a different logging path that outputs PRINT messages to STDOUT and redirects them to the dlio.log file.
>
> The other logging is optional from the benchmark.

Now it is doing that.

@zhenghh04 added the enhancement label on Feb 14, 2025

@zhenghh04 (Member Author) commented

> @zhenghh04 Can we avoid printing the file name and line number for info logging? We should only do that for debug logging. This will significantly reduce log size.

This is done!

@zhenghh04 (Member Author) commented

> Now, if you want to use logging for printing, I would create two loggers: one for printing and one for logging internal details.
>
> The printing logger could always stay at info level internally and would not be changeable through benchmark parameters.

Done!

@zhenghh04 changed the title from "Changing logging levels" to "DLIO log level support" on Feb 14, 2025

@zhenghh04 (Member Author) commented

@hariharan-devarajan This PR is ready for review again.

@zhenghh04 changed the title from "DLIO log level support" to "Support for setting different DLIO_LOG_LEVEL" on Feb 21, 2025

@hariharan-devarajan (Collaborator) left a comment

The documentation still refers to debug in workflow.

The rest looks good.

https://github.com/search?q=repo%3Aargonne-lcf%2Fdlio_benchmark+debug+path%3A*.rst&type=code

@zhenghh04 (Member Author) commented

> The documentation still refers to debug in workflow.
>
> The rest looks good.
>
> https://github.com/search?q=repo%3Aargonne-lcf%2Fdlio_benchmark+debug+path%3A*.rst&type=code

Just fixed this.

@hariharan-devarajan (Collaborator) left a comment

Looks good.

@zhenghh04 merged commit 6185001 into main on Feb 24, 2025
12 checks passed