-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Consolidate cmor setups and logging #261
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I visually inspected the code change. Looks good over all.
@@ -60,7 +62,7 @@ def handle(infiles, tables, user_input_path, **kwargs): | |||
|
|||
ds = mpas.remap(ds, 'mpasocean', mappingFileName) | |||
|
|||
mpas.setup_cmor(VAR_NAME, tables, user_input_path, component='ocean') | |||
util.setup_cmor(VAR_NAME, tables, TABLE, user_input_path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel that the syntax:(var_name=VAR_NAME, table_path=tables, table_name=TABLE, user_input_path=user_input_path)
is easier to follow than the simplified version(VAR_NAME, tables, TABLE, user_input_path)
, especially to differentiate tables
and TABLE
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can change it across the board using "sed". The fact that it exists in 41 modules is clear evidence that this is NOT where we should be calling "cmor.setup" (or setup_cmor). The only variables in the call parameters are VAR_NAME and TABLE, and in principle we would not even know which handler module to select if we did not already have (essentially) this information BEFORE we call the handler.
Someday ... we should consolidate this processing. I know we want to "precompile" and not "interpret" for performance reasons, buy that should be solvable with an array of pre-compiled formulas indexed by variable and table., whose inputs are pointers to locations where data will be written for access.
In the meantime, I'll make the changes and push.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Converted to named parameters for mpas handler calls to util.setup_cmor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chengzhuzhang Also converted 4 residual "vars" handlers (areacella, clisccp, orog, sftlf), and had to change util because the named parameter was "varname", not "var_name". Now it is and all should be consistent.
The update looks good to me. Thank you for addressing the review comment. |
OK, I will wait for Tom's review next (and continue testing with the dsm_generate_CMIP6.sh driver). |
Can you paste what the log output looked like before and after this PR? I'm aware that you updated the logger message outputs to minimize verbosity. However, I suggest keeping at least the module name and the exact line of code because it helps immensely with debugging. |
@tomvothecoder For the most part, I did not change the log-output content. What I did was reduce the number of different "logger-setup" functions being called (now only the one in "_logger.py" is used, and named it "e2c_logger" to avoid conflicts. The log output itself has not (much) changed. But where the messages get logged (default logfile, console stream, etc) may have, due to changes in "propagate", and which handlers get registered with the logger, etc. CMOR (via "cmor.setup") establishes its own handlers into the logger, and I cannot but control the name of the log-file ("cmor_logs"). |
I created a new issue to explore including CMOR logs with the main |
From what I've seem, the "cmor_log" messages involve issues of CMIP6 conformance in the metadata. I tend to prefer to keep those (glaring) messages separate from the regular e2c messages - they hurt my eyes and are not amenable to machine-read. |
Yes I agree, the CMOR log messages aren't formatted well. I think it is a good idea to combine the e3sm_to_cmip and CMOR logs because it is easier to debug a single log file. Currently, when |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TonyB9000 here is my review. I left suggestions and comments for you to address. Great work refactoring.
@tomvothecoder I think I covered most of the comments, save for a few. But I made the changes locally and cannot push them until I figure out the right way to do it. |
I was finally able to push through the "merge conflicts", and address most of the recommendations and comments Tom made. There are still some residual issues. Hopefully these can now stand-out and be more carefully addressed. I will need to do more testing before any merge-to-master occurs, but at least the major changes are now secured. |
My comment about combining logs was a separate suggestion and should not be included in this PR. You can move your comment above to #266.
I will do a re-review of this PR. There are also some failing unit tests that need to be fixed. Does this PR need to be merged any time soon? I'm assuming this is a nice-to-have enhancement and not a high priority/blocker, which means we can take our time reviewing. Correct? |
@tomvothecoder There is no rush as far as I am concerned - take your time. And of course, let me know where there are things I have neglected to address. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @TonyB9000, I had some minor suggestions from this review and the previous review. Can you address them before we decide to merge? It looks like we're almost done. Thanks.
num_success = 0 | ||
num_failure = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TonyB9000 Can you open a separate GitHub issue for my comment above? If we decide to address it, then it should be performed in a separate PR.
e3sm_to_cmip/_logger.py
Outdated
|
||
def _setup_root_logger() -> str: # pragma: no cover | ||
"""Sets up the root logger. | ||
def _logger(name=None, logfilename=DEFAULT_LOG, set_log_level=None, to_console=False, to_logfile=False, propagate=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def _logger(name=None, logfilename=DEFAULT_LOG, set_log_level=None, to_console=False, to_logfile=False, propagate=False): | |
def _logger(name=None, logfilename=DEFAULT_LOG, set_log_level=None, to_console=False, to_logfile=False, propagate=False): |
Suggestions
- Format function definition using
black
and use type annotations. - Rename
set_log_level
tolog_level
and passlogging
levels (e.g.,logging.level
). - Update docstring based on previous changes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Weird - the "suggested change" (above) looks like no change, due to wrapping."
Addressed 2 and 3, and performed black.
Are type annotations really needed? There are scored of function definitions in mpas.py, etc, and none of them employ type annotations.
e3sm_to_cmip/_logger.py
Outdated
if set_log_level == "None" or set_log_level == "DEBUG": | ||
log_level = DEFAULT_LOG_LEVEL | ||
elif set_log_level == "INFO": | ||
log_level = logging.INFO | ||
elif set_log_level == "WARNING": | ||
log_level = logging.WARNING | ||
elif set_log_level == "ERROR": | ||
log_level = logging.ERROR | ||
elif set_log_level == "CRITICAL": | ||
log_level = logging.CRITICAL | ||
else: log_level = DEFAULT_LOG_LEVEL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We agreed to remove this logic and make set_log_level
the logging
module levels instead of string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also suggest renaming set_log_level
to log_level
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned earlier - each of the handler modules would need to "import logging", just to be able to pass the defined values "logging.WARNING", "logging.INFO", etc. I can do that. - I don't suppose any significant overhead is incurred in having each handler "import" another module ("logging", on top of our "_logger.py"). It would certainly make "_logger" cleaner.
@@ -65,8 +66,7 @@ def handle(infiles, tables, user_input_path, **kwargs): | |||
# area_b is in square radians, so need to multiply by the earth_radius**2 | |||
ds[VAR_NAME] = earth_radius**2*area_b*ds[VAR_NAME] | |||
|
|||
setup_cmor(var_name=VAR_NAME, table_path=tables, table_name=TABLE, | |||
user_input_path=user_input_path) | |||
util.setup_cmor(var_name=VAR_NAME, table_path=tables, table_name=TABLE, user_input_path=user_input_path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Format using black
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can run black
using pre-commit
or directly with the black
command.
Co-authored-by: Tom Vo <[email protected]>
Co-authored-by: Tom Vo <[email protected]>
Co-authored-by: Tom Vo <[email protected]>
Co-authored-by: Tom Vo <[email protected]>
Co-authored-by: Tom Vo <[email protected]>
@tomvothecoder Question: If I have accepted a pile of "commit changes" here on the web, but also want to edit and push larger changes from my local repo, how do I avoid clobbering one or the other? Can I "pull" from my remote PR to update my local repo first, before advancing new edits? NVMD - looks like "git pull" worked to fast-forward. |
Co-authored-by: Tom Vo <[email protected]>
…40+ modules to include logging module to accommodate, and addressed issues such as spurious newlines, or lack thereof.
…tups_and_logging' into consolidate_cmor_setups_and_logging
@tomvothecoder This should be VERY close now, to address your comments. I need to add the issue for "num_sucesses" vs "num_failures" in main.py, as this affects the conditions of exit. It may be necessary to make it another command-line option (exit on any failure, or return non-zero status on any failure, etc). |
@tomvothecoder I think I've addressed all of your comments. Could you re-review the changes? Thanks! |
Thank you @TonyB9000. I'm doing another review right now. A few notes:
|
Adding to my comment above, e3sm_to_cmip/_logger.py:45: error: Cannot resolve name "logger" (possible cyclic definition) [misc]
e3sm_to_cmip/_logger.py:45: note: Recursive types are not allowed at function scope
e3sm_to_cmip/util.py:24: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/util.py:502: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/util.py:512: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/util.py:528: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/util.py:552: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/util.py:561: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/util.py:571: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/util.py:645: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/util.py:649: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/util.py:662: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/cmor_handlers/handler.py:218: error: Item "None" of "Any | None" has no attribute "info" [union-attr]
e3sm_to_cmip/cmor_handlers/handler.py:248: error: Item "None" of "Any | None" has no attribute "info" [union-attr]
e3sm_to_cmip/cmor_handlers/handler.py:256: error: Item "None" of "Any | None" has no attribute "info" [union-attr]
e3sm_to_cmip/cmor_handlers/handler.py:279: error: Item "None" of "Any | None" has no attribute "info" [union-attr]
e3sm_to_cmip/cmor_handlers/handler.py:300: error: Item "None" of "Any | None" has no attribute "error" [union-attr]
e3sm_to_cmip/cmor_handlers/handler.py:342: error: Item "None" of "Any | None" has no attribute "info" [union-attr]
e3sm_to_cmip/cmor_handlers/handler.py:684: error: Item "None" of "Any | None" has no attribute "info" [union-attr]
e3sm_to_cmip/cmor_handlers/handler.py:688: error: Item "None" of "Any | None" has no attribute "info" [union-attr]
e3sm_to_cmip/cmor_handlers/handler.py:698: error: Item "None" of "Any | None" has no attribute "error" [union-attr]
e3sm_to_cmip/cmor_handlers/handler.py:701: error: Item "None" of "Any | None" has no attribute "info" [union-attr]
e3sm_to_cmip/cmor_handlers/handler.py:711: error: Item "None" of "Any | None" has no attribute "error" [union-attr]
e3sm_to_cmip/cmor_handlers/utils.py:22: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/cmor_handlers/utils.py:22: error: Unexpected keyword argument "set_log_level" for "_logger"; did you mean "log_level"? [call-arg]
e3sm_to_cmip/cmor_handlers/utils.py:83: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/cmor_handlers/utils.py:125: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/cmor_handlers/utils.py:221: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/cmor_handlers/utils.py:227: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/_logger.py:10: note: "_logger" defined here
e3sm_to_cmip/__main__.py:110: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/__main__.py:153: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/__main__.py:154: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/__main__.py:155: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/__main__.py:156: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/__main__.py:157: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/__main__.py:158: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/__main__.py:159: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/__main__.py:160: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/__main__.py:161: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/__main__.py:213: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/__main__.py:228: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/__main__.py:229: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/__main__.py:230: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/__main__.py:232: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/__main__.py:238: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/__main__.py:656: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/__main__.py:657: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/__main__.py:658: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/__main__.py:674: error: Unused "type: ignore" comment [unused-ignore]
e3sm_to_cmip/__main__.py:700: error: Incompatible types in assignment (expression has type "dict[str, Any]", variable has type "str | None") [assignment]
e3sm_to_cmip/__main__.py:708: error: Unsupported target for indexed assignment ("None") [index]
e3sm_to_cmip/__main__.py:710: error: Unsupported target for indexed assignment ("None") [index]
e3sm_to_cmip/__main__.py:716: error: Unused "type: ignore" comment [unused-ignore]
e3sm_to_cmip/__main__.py:752: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/__main__.py:772: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/__main__.py:773: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/__main__.py:774: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/__main__.py:836: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/__main__.py:861: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/__main__.py:865: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/__main__.py:878: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/__main__.py:970: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/__main__.py:972: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/__main__.py:985: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/__main__.py:989: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/__main__.py:990: error: Name "logger" is not defined [name-defined]
Found 65 errors in 5 files (checked 6 source files) I think an easier way to achieve a single global logger is to instantiate the logger once in Example (source):
|
I suggest we split up this PR into smaller PRs and merge the ones that are ready. I’m not 100% confident in the logging portion of this PR yet and think it can be done more cleanly/easily. It is also an enhancement and not a critical need. |
- Add type annotations to `_logger._logger` - Fix import order across modules using `isort`
@tomvothecoder Regarding your example for "achieve a single global logger", this was not my intent (although not a bad idea). I assumed different modules might want to configure logging differently (stream versus file, and propagate/not), and wanted a single and centralized call (found in _logger.py) to accomplish the work, based upon configuration choices. Also, I had not created the "_logger.py" module myself, and did not want to deviate too far from what I thought was its functionality. The main changes were to reduce the two functions (one for "root logger" one not) into a single function, and to provide "only console, only logfile, or both" as options. But I am certainly up for another approach. I did try to kill multiple birds with one stone - perhaps unwise. (ASIDE: I found it strange to have "main" be reduced to a single class that encapsulates everything, reducing the entire "main()" function call to "app = " and then "return app.run()". Is this a common pattern?) |
@tomvothecoder You asked: "Can you please paste the log output on main vs. with this PR?" I can do this, but a trial for atmos, land or ocean may produce different output, sue to their differences in handling. I'll work this up and provide the output. |
I have a bit more time and will look into a solution to logging. I implemented the
Can you clarify how you find this design pattern strange? From my experience, this is a common API design pattern for software programs. The core program is a class ( |
To split up this PR, you can:
|
@tomvothecoder @chengzhuzhang Having given this consideration, I am inclined to abandon this branch, create two new branches off of main (obtained by a fresh git-clone),
and then - reviewing each logged commit - re-introduce only the "cmor setup" or only the "logfile" work upon each branch. That way, if any commit happened to "overlap" (contain both sorts of changes), they can be easily kept separate. We can then test these changes separately as well. This seems the cleanest (and safest) approach, compared to branching off of a "muddy" branch and trying to selectively revert. |
Sounds good with me. Thank you Tony. |
@tomvothecoder @chengzhuzhang I am closing this issue/PR as abandoned, since it has been split into issue #275 (Consolidate cmor setup, since merged to master) and issue #274 (Redesign logging, soon to be PR'd). |
Description
This update to e2c addresses two areas of consolidation: setup_cmor functions (calls to cmor.setup) and e2c logging configurations. Other minor bug-fix and cleanup conducted. All changes tested against a sample from each variable type (Amon 2D, Amon 3D, day, 3hr, CFmon, Lmon, LImon, Omon, SImon).
CMOR SETUP CONSOLIDATION:
The motivation for cmor.setup reconfigurations is that this cmor function was called from 7 separate locations:
Aside from unnecessary replication of codes, each call to cmor.setup invokes a cmor-internal "logger" setup, whose only external accessibility is the name of the logfile to supply. There is no ability to alter its propagate(True/False), date-format or other attributes. (I believe it is responsible for the "duplicated log messages" problem, as only those messages with the CMOR date-format appear to be duplicated.)
Now, there are only TWO functions that invoke cmor.setup: util.setup_cmor(), and handler._setup_cmor_module(). The latter is applied to every handler managed through the "handlers.yaml" construct. The former handles all other variables. The function in mpas.py has been eliminated.
LOGGER CONSOLIDATION:
Likewise, the motivation in consolidating the logger functions is that python logging is "global" - different modules adding stream or file handlers to their loggers accumulate across all logging.
Now, there is a single "e2c_logger()" function in _logger.py, and it is called by ANY (code-accessible) module in e2c that requires logging. It can be configures to return the root logger, or a named-logger, to apply logging to a file or to console or both, and to propagate or not.
FIXING the problem where "--help" and "--version" cause a cmor_logs directory to be created in the user's current directory required some main-line refactoring. Previously, modules like mpas.py, handler.py, etc each would call their logger setup in the global space. That global space is activated whenever a module is "imported", irrespective of whether any functions defined within have been called. And ALL of these imports occur BEFORE we get a chance to parse the main command-line args (and discover that only --help or --version was being invoked.)
Now each of these modules has its own function to be called to set-up its logger. These are imported into "main" and invoked only AFTER arg-parsing is completed (and not at all for --help and --version calls).
CHANGES BY MODULE:
Aside from virtually ALL modules now using _logger:e2c_logger() for logging operations, the following changes have been made:
MINOR CHANGES:
Replaced some - but not all ".format()" string variable formatting with f"{va}" format style.
Checklist
If applicable: