Dataset metrics #514
Conversation
The ninja build statements still need dependencies on the scripts, so that a rebuild triggers when the utils update.
"""Get the name of the build directory for this flow""" | ||
return "dataset_metrics" | ||
|
||
def add_ninja_deps(self, deps): |
I don't think this is required if it's just calling super. The behavior will be the same if you remove it.
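For context on the dependency comment, a hedged sketch of how the flow could register the scripts as ninja dependencies so rebuilds trigger when the utils change. The base class stub, path constant, and class name below are assumptions for illustration, not taken from the PR:

    from pathlib import Path

    # Assumed location of the metric utilities; the real bfasst layout may differ.
    UTILS_DIR = Path("bfasst/utils")

    class Flow:  # stand-in for the bfasst flow base class
        def add_ninja_deps(self, deps):
            pass

    class DatasetMetrics(Flow):
        def get_build_dir_name(self):
            """Get the name of the build directory for this flow"""
            return "dataset_metrics"

        def add_ninja_deps(self, deps):
            """Add the metric scripts as deps so ninja rebuilds when they change."""
            super().add_ninja_deps(deps)
            deps.append(str(UTILS_DIR / "process_graph.py"))
            deps.append(str(UTILS_DIR / "accumulate_metrics.py"))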
def sort_metrics(metrics):
    """Sort the values for each metric in the dictionary."""
    for ip, _ in metrics.items():
If you want to loop through keys, just use:
for ip in metrics:
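A minimal sketch of how sort_metrics could apply that suggestion, assuming metrics maps each IP name to a dict of metric-name to value-list (the data shape is an assumption, not confirmed by the PR):

    def sort_metrics(metrics):
        """Sort the values for each metric of every IP, in place."""
        for ip in metrics:  # iterate keys directly, per the review suggestion
            for metric, values in metrics[ip].items():
                metrics[ip][metric] = sorted(values)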
@KeenanRileyFaulkner It looks like this is failing CI. Anything I can help with?
Sorry, I didn't realize there was a formatting issue. I'm out of town right now, but will try to fix it when I have a chance this weekend.
@jgoeders All tests are passing now.
Adds two scripts to bfasst, allowing for efficient computation of metrics on randsoc-based graphs for GNN training datasets. The metrics can be computed ad hoc through command line options. bfasst is used to iterate through a dataset, run process_graph.py on each graph in it, and then run accumulate_metrics.py at the end.
If the run takes too long and you don't want to wait for every instance of process_graph.py to finish, kill the run and invoke the accumulation script directly on the output directory.
The final output is contained in a file called summary_stats.log, with individual metrics for each instance of each IP contained in master_metrics.log. accumulate_metrics.py will overwrite these files every time, but options are available to specify a different output name if the utility is used directly.
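A hedged sketch of what a direct accumulation pass over an output directory could look like. The per-graph file layout and JSON format are assumptions; only the output file names (master_metrics.log, summary_stats.log) and the overwrite behavior come from the description above:

    import json
    import statistics
    from pathlib import Path

    def accumulate(output_dir, master_name="master_metrics.log", summary_name="summary_stats.log"):
        """Merge per-graph metrics and write summary statistics."""
        output_dir = Path(output_dir)
        master = {}  # {ip: {metric: [values across all instances]}}

        # Assumed layout: one JSON file of {ip: {metric: value}} per processed graph.
        for graph_file in output_dir.glob("*.json"):
            per_graph = json.loads(graph_file.read_text())
            for ip, metrics in per_graph.items():
                for metric, value in metrics.items():
                    master.setdefault(ip, {}).setdefault(metric, []).append(value)

        # Overwrites any existing outputs, matching the behavior described above.
        (output_dir / master_name).write_text(json.dumps(master, indent=2))

        summary = {
            ip: {
                metric: {"min": min(v), "max": max(v), "mean": statistics.mean(v)}
                for metric, v in metrics.items()
            }
            for ip, metrics in master.items()
        }
        (output_dir / summary_name).write_text(json.dumps(summary, indent=2))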