Add mem variables #4692
@@ -224,11 +224,33 @@ def get_job_overrides(self, job, case):
if thread_count:
overrides["thread_count"] = thread_count
else:
- total_tasks = case.get_value("TOTALPES") * int(case.thread_count)
+ total_tasks = case.get_value("TOTALPES")
This change scares me a lot. Looking at a couple lines below, I see total_tasks being multiplied by thread_count. It makes no sense how that ever worked because that would have the thread_count multiplied twice. So I would approve of the change if that was the only use of total_tasks, but it isn't.
I removed it because the behavior I saw was that total_tasks was totalpes*thread_count*thread_count (thread_count applied twice). With this removed, total_tasks=totalpes*thread_count, which I think is the intent.
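To make the double counting concrete, here is a minimal sketch with illustrative numbers (24 tasks, 3 threads). The function names are hypothetical and are not CIME code; they only mirror the two multiplications described in this thread.

```python
def total_tasks_before(totalpes, thread_count):
    # Old behavior as described above: threads are folded in at assignment,
    # then multiplied in again a couple of lines later.
    total_tasks = totalpes * thread_count
    total_tasks *= thread_count
    return total_tasks


def total_tasks_after(totalpes, thread_count):
    # With the first multiplication removed, threads are applied once.
    total_tasks = totalpes
    total_tasks *= thread_count
    return total_tasks


print(total_tasks_before(24, 3))  # 216
print(total_tasks_after(24, 3))   # 72
```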
Ah gotcha, so this was just a bug then. So task_count from a job's override is equivalent to TOTALPES?
Yes - that's correct.
@jedwards4b, I do see that the multiply below on line 229 is definitely wrong if we keep the original code for line 227. Are you saying case.get_value("TOTALPES") already takes threads into account?
If task_count and TOTALPES are equivalent then this makes more sense.
total_tasks = case.get_value("TOTALPES")
thread_count = case.thread_count
total_tasks *= thread_count
if int(total_tasks) < case.get_value("MAX_TASKS_PER_NODE"):
    overrides["max_tasks_per_node"] = int(total_tasks)
CIME/XML/env_batch.py
Outdated
@@ -224,11 +224,33 @@ def get_job_overrides(self, job, case):
if thread_count:
overrides["thread_count"] = thread_count
else:
- total_tasks = case.get_value("TOTALPES") * int(case.thread_count)
+ total_tasks = case.get_value("TOTALPES")
+ thread_count = case.thread_count
+ if int(total_tasks) * int(thread_count) < case.get_value("MAX_TASKS_PER_NODE"):
Maybe change this to:
if int(total_tasks) < case.get_value("MAX_TASKS_PER_NODE"):
I think that's correct.
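A minimal sketch of the check as suggested here, assuming `total_tasks` is whatever value the surrounding code has already settled on. `max_tasks_override` is a hypothetical helper for illustration, not part of CIME, and the plain dict stands in for the real `overrides` object.

```python
def max_tasks_override(total_tasks, max_tasks_per_node):
    """Hypothetical helper: return the overrides implied by the check."""
    overrides = {}
    # Only lower max_tasks_per_node when the job needs fewer tasks
    # than a node can hold; otherwise leave the default untouched.
    if int(total_tasks) < max_tasks_per_node:
        overrides["max_tasks_per_node"] = int(total_tasks)
    return overrides


print(max_tasks_override(72, 128))
print(max_tasks_override(256, 128))
```

The first call produces an override and the second returns an empty dict, since 256 is not below the per-node limit.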
try:
mem_per_task = case.get_value("MEM_PER_TASK")
max_mem_per_node = case.get_value("MAX_MEM_PER_NODE")
mem_per_node = total_tasks
And this to:
mem_per_node = case.get_value("TOTALPES")
Can't do that because total_tasks may be the product of an override. For example, the case.st_archive script overrides total_tasks and changes it to 1.
How about mem_per_node = total_tasks / thread_count?
much better, thanks!
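A rough sketch of where this thread lands. Only `total_tasks / thread_count` comes from the discussion above; scaling by MEM_PER_TASK and capping at MAX_MEM_PER_NODE are assumptions about how the two variables might be combined, and `mem_per_node_request` is a hypothetical name, not the code in this PR.

```python
def mem_per_node_request(total_tasks, thread_count, mem_per_task, max_mem_per_node):
    # From the thread: recover the task count from total_tasks rather than
    # re-reading TOTALPES, because total_tasks may come from a job override.
    tasks = total_tasks / thread_count
    # Assumption: scale by memory per task, then cap at the node limit.
    return min(tasks * mem_per_task, max_mem_per_node)


print(mem_per_node_request(72, 3, 2, 235))
print(mem_per_node_request(72, 3, 20, 235))
```

The second call hits the cap, so the request never exceeds what a node is assumed to provide.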
PBS on derecho requires specifying the memory requirement per node; this PR provides that capability. MEM_PER_TASK and MAX_MEM_PER_NODE are defined in cmeps and supported here. This is easily extendable to other systems if needed.
Test suite: scripts_regression_tests, ERP_Ln9_P24x3.f45_f45_mg37.QPWmaC6.derecho_intel.cam-outfrq9s_mee_fluxes
Test baseline:
Test namelist changes:
Test status: bit for bit
Fixes
User interface changes?:
Update gh-pages html (Y/N)?: