Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Command line interface update #980

Closed
wants to merge 4 commits into from
Closed

Conversation

mjuric
Copy link
Member

@mjuric mjuric commented Jul 12, 2024

This PR reworks the command line arguments to follow GNU conventions (resolving #978), and updates how we do logging.

Changes:

  • Use - in long arg names, single characters for short args.
  • Change the name of some parameters to (hopefully) better reflect what they're about
  • Set reasonable defaults for the outputs, so they don't have to be specified. In particular, if the user doesn't override it with --output-dir or --prefix, the outputs (and the log) will all be stored into sorcha-results* files (where the suffixes depend on the format, the file being stored, etc). You can see that in the screenshot (note the sorcha-results.csv , sorcha-results-stats.csv, and sorcha-results.log files).
  • Change how we handle the logs -- by default, write a /single/ log file named sorcha-results.log and append to it (don't overwrite). This solves the problem of accumulating many inconveniently named log files generated for every run. The user can still easily check if there were errors in the log by grepping for "ERROR".

An example of how it all works is shown in the first attached screenshot.

When it comes to logging, we now log the PID on each line, so it's easy to pull out a specific run with something like grep 68218 sorcha-results.log. The second screenshot shows what the log looks like now.

image
image

Review Checklist for Source Code Changes

  • Does pip install still work?
  • Have you written a unit test for any new functions?
  • Do all the units tests run successfully?
  • Does Sorcha run successfully on a test set of input files/databases?
  • Have you used black on the files you have updated to confirm python programming style guide enforcement?

@mjuric mjuric self-assigned this Jul 12, 2024
Copy link

codecov bot commented Jul 12, 2024

Codecov Report

Attention: Patch coverage is 90.32258% with 3 lines in your changes missing coverage. Please review.

Project coverage is 80.82%. Comparing base (1bb235c) to head (fada53b).
Report is 1 commits behind head on main.

Files Patch % Lines
src/sorcha/modules/PPCommandLineParser.py 95.83% 1 Missing ⚠️
src/sorcha/sorcha.py 0.00% 1 Missing ⚠️
src/sorcha/utilities/sorchaArguments.py 75.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #980      +/-   ##
==========================================
+ Coverage   80.60%   80.82%   +0.22%     
==========================================
  Files          69       69              
  Lines        3078     3072       -6     
==========================================
+ Hits         2481     2483       +2     
+ Misses        597      589       -8     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@mschwamb
Copy link
Collaborator

I am all for the changes in principle. I have some concerns I would like to see addressed.

Having to write out the sorcha command a few times for my runs manually, this is now much more I have to write. Like --integrator-data be -i ? and --force be -f ?

I find the categories input, output, and advanced confusing - it's not what you get with most man pages/--help so I am happy to have changes made but then I strongly prefer required and optional be the two categories, rather than inputs, outputs, advanced. It's not what most people are used to when they look at man pages and --help for commands. Also -f forcing output to overwrite existing files is not really an advanced feature so I find it confusing it is listed in advanced. In the current version of the code base there is the ability to make the tally file (now --stats in this PR). It is an optional output. It is not required. From the way the current arguments are listed I would think it is required. So can we please use required and optional arguments for the classes here.

We call one of the input files he physical parameters (colors, H, phase curves) file and the complex parameters file (things that add-ons needs like for light curve or cometary activity parameters). I really do not want to have --colors because that's going to be confusing when the rest of the help document and the paper talks about the physical parameters file and that file contains crucially H, so I would prefer it not be called the colors file.

same thing for --extra-object-data - I do not intuitively know what that means and give it's called complex physical parameters in the documentation I would prefer we don't change it here.

I don't love --intergrator-data though maybe it's - maybe it's better than what we currently have.

@mschwamb
Copy link
Collaborator

A strong no to the change in the log. I found it helpful in my runs to have separate log files per run. I somehow had the super computer running processes twice and it was causing issues with my output and it was easy to figure out what happened because two log files appeared in my run directory - I believe it this case they would combine into one log file and I would have to figure out by looking there are two processes IDs

The multiple log files is only an issue for Rubin operations. I think you can find away around this without changing the code as its primarily job is for running on HPCs and mistakes like running processes twice can happen. seeing multiple log files makes it easy to see what happened and also someone may push all output for simulations to the ame output directory which would then cause havoc on the log file.

log_location,
log_format="%(asctime)s %(name)-12s %(levelname)-8s %(message)s ",
log_fn,
log_format="[%(asctime)s|%(process)d] %(levelname)s [%(name)s:%(lineno)d] %(message)s",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

process id needs to be in the log file name so I am against adding it here

@mjuric
Copy link
Member Author

mjuric commented Jul 13, 2024

Having to write out the sorcha command a few times for my runs manually, this is now much more I have to write. Like --integrator-data be -i ? and --force be -f ?

Easy to add -f (I'll do it), but think about it -- if you keep using --force to the point where you'd like to shorten it, shouldn't clobbering be the default?

I find the categories input, output, and advanced confusing - it's not what you get with most man pages/--help so I am happy to have changes made but then I strongly prefer required and optional be the two categories, rather than inputs, outputs, advanced. It's not what most people are used to when they look at man pages and --help for commands. Also -f forcing output to overwrite existing files is not really an advanced feature so I find it confusing it is listed in advanced. In the current version of the code base there is the ability to make the tally file (now --stats in this PR). It is an optional output. It is not required. From the way the current arguments are listed I would think it is required. So can we please use required and optional arguments for the classes here.

Sounds good -- that's just help output grouping, easy to change.

We call one of the input files he physical parameters (colors, H, phase curves) file and the complex parameters file (things that add-ons needs like for light curve or cometary activity parameters). I really do not want to have --colors because that's going to be confusing when the rest of the help document and the paper talks about the physical parameters file and that file contains crucially H, so I would prefer it not be called the colors file.

How about --photometric-parameters (or just --photometry, or --phot-params, or even --seds... the shorter the better, IMHO)?

To a dynamicist physical parameters are the mass, density, diameter, shape (or moments), strength, etc. (e.g. googling "asteroid physical parameters" surfaces this and this as first hits).

same thing for --extra-object-data - I do not intuitively know what that means and give it's called complex physical parameters in the documentation I would prefer we don't change it here.

How about --complex-photometric-parameters (or --complex-phot-params)?

Good point about the docs -- I forgot to update them.

(edit: 15 minutes later, I think I like --phot-params and --complex-phot-params)

@mjuric
Copy link
Member Author

mjuric commented Jul 13, 2024

For the logs, it comes from me getting disoriented with the number of files generated in a sorcha dir, as every invocation generates a new log file (including when you get the command-line arguments wrong). It's a pain to keep cleaning those up... willing to bet some small amount the end-users will hate it too :).

That said, you're totally right we need to have jobs write into separate log files (and outputs!) for each invocation. This is what --prefix is meant to take care of in the above scheme. Specifying --prefix changes the prefix of all outputs (what you called the "stem"), and you can use this in scheduler scripts. E.g., with SLURM, you'd write something like:

sorcha run --prefix="sorcha-outputs-$SLURM_JOB_ID" ...

in the submission script, which would then generate (for example, for job ID 42):

sorcha-outputs-42.csv
sorcha-outputs-42-stats.csv
sorcha-outputs-42.log

(allowing all jobs to share the same directory for output & logging).

Some options:

  1. We could retain the previous behavior, but add a --log-file [dest.log] switch which if present would redirect the log to (e.g) dest.log. My thinking was that most folks want something like this anyway, so making it the default seemed to make sense. Users can then use --prefix to cleanly manage cluster runs (not just for logs but for outputs as well).
  2. Or we could default to the new behavior (store the log into <prefix>.log by default), but add a command-line argument to activate the previous behavior if that's preferable for one's use case. E.g., something like --log-dest PID (or some better name?).
  3. Something else...

@mschwamb
Copy link
Collaborator

I think I like --phot-params and --complex-phot-params

We've had lots of discussions on the names of these files last July. Physical parameters and complex physical parameters is the names we came up with. We're not going to change them now. There's lots in the documentation and the paper to finish - that the priority needs to be on finishing this rather than should we rename files.

@mschwamb
Copy link
Collaborator

So let's keep physical parameters and complex physical parameters as the descriptors for the non-orbit input files for the simulated populations

@mschwamb
Copy link
Collaborator

mschwamb commented Jul 13, 2024

That said, you're totally right we need to have jobs write into separate log files (and outputs!) for each invocation. This is what --prefix is meant to take care of in the above scheme. Specifying --prefix changes the prefix of all outputs (what you called the "stem"), and you can use this in scheduler scripts. E.g., with SLURM, you'd write something like...

I somehow accidentally ran multiple identical sorcha commands on bura (strange things are going on when looking at what jobs are running on bura) and because I was using csv they just wrote on top of eachother/appended together. The only reason I new something was up was because I had two log files. Your method would means the log would append and I would have to actually read and understand the format of the output to understand what was happening. Seeing two log files actively being updated in the directory told me exactly what was going on. I think I had -f on, but I imagine most people want to have the -f option at least available so we shouldn't get rid of it but the log files give the safety of a double check. Your proposed method would mean I would have to take a detailed look at the log file. Most people will not.

I can maybe see the argument that the log will should have the prefix but the process ID needs to be there so it's always unique.

Happy to discuss on a call rather than here.

@mjuric
Copy link
Member Author

mjuric commented Jul 14, 2024

Allright, sounds like there's a no-go on this... 😢

@mjuric mjuric closed this Jul 14, 2024
This change adds a --process-subset option to `sorcha run`, which will make
Sorcha process only the specified subset of the input orbit file. Initially,
we allow the subsets to be specified as:

	--process-subset <split>/<nsplits>

i.e.

	--process-subset 2/5

In the example above, Sorcha would process the 2nd fifth of orbits in the
file.

This comes in handy when distributing the work on multi-core or multi-node
machines. I.e., if launching a SLURM job array, you'd write:

	sorcha run ... --process-subset $SLURM_ARRAY_TASK_ID/$SLURM_ARRAY_TASK_COUNT

in the SLURM submit script (launch the script with, say, `sbatch --array=1-5
...`)
@mschwamb mschwamb reopened this Jul 14, 2024
@mschwamb
Copy link
Collaborator

Closing again because aspects of this will be taken

@mschwamb mschwamb closed this Jul 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants