Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replacing FILENAME_MAX in runtime-directory #17059

Closed

Conversation

rchinmay
Copy link
Contributor

Replaces char arrays in runtime directory, which used FILENAME_MAX macro for its size, to dynamic memory allocation using malloc and free functions.

@rchinmay
Copy link
Contributor Author

Resolves #8757 for runtime directory.
@bradcray @daviditen Please review.
Thanks,
R. Chinmay

@bradcray

This comment has been minimized.

@rchinmay

This comment has been minimized.

@bradcray

This comment has been minimized.

@rchinmay

This comment has been minimized.

@bradcray

This comment has been minimized.

Signed-off-by: R Chinmay <[email protected]>
@@ -170,6 +170,17 @@ static char* chpl_launch_create_command(int argc, char* argv[],
#else
mypid = 0;
#endif
sysFilename=(char *)chpl_mem_allocMany((strlen(baseSysFilename) +
snprintf(NULL, 0, "%d", (int)mypid) + 1),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had not seen this pattern before—clever. Given that this same expression appears three times in a row, how about storing it into an int and then re-using that int in these three malloc calls. I.e., something like:

int mypidBuffLen = snprintf(NULL, 0, "%d", (int)mypid);

strlen(argv[0]) + strlen(tempArg);
tmpStdoutFileNoFmt = (char *)chpl_mem_allocMany(tmpStdoutFileNoFmtLen,
sizeof(char), CHPL_RT_MD_FILENAME, -1, 0);
sprintf(tmpStdoutFileNoFmt, format, tmpDir, argv[0], tempArg);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In cases like this where you've computed and stored the buffer length, wouldn't it be safer / more future-proof to use snprintf() rather than sprintf() (to avoid buffer overrun in case someone changes one part of the malloc/snprintf pattern in the future and doesn't change the other part to match?). Essentially, I'm curious whether there was a rationale for using sprintf() in some cases and snprintf() in others.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bradcray

I'm curious whether there was a rationale for using sprintf() in some cases and snprintf() in others.

I kept the usage of snprintf() and sprintf() the same as in the code before changes (this PR). But, now since we have the lengths of the char arrays with us, it would be better (safer) to convert sprintf() to snprintf(), if we have the lengths of the char array already stored. This would also be more robust to changes in future, rather than sprintf(). I will make these changes.
Thanks,
R. Chinmay

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your rationale makes sense, but if you're willing to make that change while here, I think that'd be great, thank you!

@bradcray
Copy link
Member

I think this is getting close, thanks @rchinmay (and apologies again for my slow response... you probably regret getting me as the lead reviewer).

I wanted to tag @ronawho here to see if he had thoughts about what I should do to test this before merging (I suspect there's no easy answer given that it's distinct launchers; but maybe I could test them all locally using asan just to make sure they "failed to launch" and didn't have memory errors, and then left the stress testing for the nightlies?)

Also tagging @cassella since he was the impetus behind the original issue #8757.

@ronawho
Copy link
Contributor

ronawho commented Feb 18, 2021

I wanted to tag @ronawho here to see if he had thoughts about what I should do to test this before merging (I suspect there's no easy answer given that it's distinct launchers; but maybe I could test them all locally using asan just to make sure they "failed to launch" and didn't have memory errors, and then left the stress testing for the nightlies?)

That's an interesting idea, but it'll probably fail in unhelpful ways today since some launchers call things like sinfo or cnselect to build the launch commands so you'll probably get early failures. I wonder if we could add a --dry-run option or something that would fake values for calls and then print out what command the launcher would use. Kinda like -v, but don't actually launch and don't require launcher utilities like cnselect/sinfo

For this PR, I think just ping me when you're ready for testing and I'll spot check a few launchers.

@bradcray
Copy link
Member

some launchers call things like sinfo or cnselect to build the launch commands

Oh, good point.

For this PR, I think just ping me when you're ready for testing and I'll spot check a few launchers.

That'd be very nice, thanks. I'd also be curious whether you'd prefer to hold off on merging this PR until after the release to live with it for awhile, or whether you think between your spot-checking and nightly testing we'd probably be safe enough.

@cassella
Copy link
Contributor

Hrm. In launch-pbs-aprun.c, I'm seeing the function genQsubOptions() sprintf()s into the global expectFilename:

static char* genQsubOptions(char* genFilename, char* projectString, qsubVersion qsub, 
                            int32_t numLocales, int32_t numCoresPerLocale) {
...
  char *qsubFilename = expectFilename;
...

  if (generate_qsub_script) {
    pid_t mypid = debug ? 0 : getpid();
    sprintf(qsubFilename, "qsub.%s-%d", genFilename, (int) mypid);

which may be a longer string.

This is pretty gross. If it were up to me, I'd make expectFilename local to chpl_launch_create_argv(), and give genQsubOptions() its own char * to manage.

(It's not straightforward since genQsubOptions() uses the buffer and returns it in one case, and returns it unmodified in the other. I think this is all just so the file didn't need to have two static buffers in it. If so, I think it would be cleaner if they didn't try to share like that. I don't know how much of a mess it'll be to straighten that out.)

@bradcray
Copy link
Member

@cassella : Just to make sure I'm following, your last comment is unrelated to this specific effort (retiring the use of FILENAME_MAX), is that right? (I understand that it's thematically related in that it relates to how launcher string buffers are allocated and populated).

Assuming so, I propose we consider that a pre-existing condition and not worry about it here. If something about this PR makes the existing code less attractive than it was, then I'd worry about it here.

@cassella
Copy link
Contributor

cassella commented Feb 24, 2021

@bradcray No, it is a new problem with this PR.

Currently, genQsubOptions() sprintf()s into the expectFilename buffer. Its pattern is small and the buffer is FILENAME_MAX, so that's probably fine (or at least, it has been thus far).

With the PR, it's still sprintf()ing into the same buffer, but now the buffer is sized precisely according to the batch filename and pid. Depending on what genQsubOptions()'s genFilename is, it may be sprintfing something larger than the buffer.

The "if it were up to me" part is just what I'd do about it, versus (I guess?) sizing the single array to the max of the two uses' lengths.

@ronawho
Copy link
Contributor

ronawho commented Mar 1, 2021

That'd be very nice, thanks. I'd also be curious whether you'd prefer to hold off on merging this PR until after the release to live with it for awhile, or whether you think between your spot-checking and nightly testing we'd probably be safe enough.

At this point I'd probably hold off until after the release. (I was hoping to get to this last week, but other release tasks have been taking up all my time)

@bradcray
Copy link
Member

@rchinmay : Now that we're through with the release, I wanted to ping you to see whether you're still interested in pursuing this effort? I think there were a few minor clean-up items in my last review that it looked like you'd intended to address; and then there's the more fundamental issue that @cassella brings up above.

@rchinmay
Copy link
Contributor Author

@bradcray, Thanks for informing on the release. I saw @cassella 's comment and I am currently working on it to solve the problem. Along with that, I will also make the clean-ups suggested by you.
Thanks,
R. Chinmay.

@bradcray
Copy link
Member

Great, thanks for the update!

@bradcray
Copy link
Member

@rchinmay : I'm looking through PRs that I'm associated with today and wanted to check in on this one. Are you still working on it? Thanks.

@bradcray
Copy link
Member

I'm going to park this until someone has a chance to get back to it.

@riftEmber
Copy link
Member

Picking this work back up in #26381, which incorporates the commits from this PR branch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants