Dataset metrics #514

Merged: 22 commits, Dec 9, 2024

KeenanRileyFaulkner
Contributor

Adds two scripts to bfasst that efficiently compute metrics on randsoc-based graphs used in GNN training datasets. The metrics can be computed ad hoc using command-line options. bfasst iterates through a dataset, runs process_graph.py on each graph in it, and then runs accumulate_metrics.py at the end.
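The split between a per-graph step and a final accumulation step is essentially a map/reduce. A minimal sketch of that pattern follows. `process_graph()` and `accumulate()` are hypothetical in-process stand-ins for process_graph.py and accumulate_metrics.py, and the edge-list graph format and metric names are assumptions for illustration, not the PR's actual code.

```python
from statistics import mean


def process_graph(edges):
    """Hypothetical per-graph step: compute example metrics for one graph.

    The graph is assumed to be an undirected edge list of (src, dst) pairs.
    """
    nodes = {n for edge in edges for n in edge}
    num_nodes = len(nodes)
    num_edges = len(edges)
    return {
        "num_nodes": num_nodes,
        "num_edges": num_edges,
        # Average degree of an undirected graph is 2E / N (0.0 if empty).
        "avg_degree": 2 * num_edges / num_nodes if num_nodes else 0.0,
    }


def accumulate(per_graph_metrics):
    """Hypothetical accumulation step: reduce per-graph dicts to min/max/mean."""
    collected = {}
    for metrics in per_graph_metrics:
        for name, value in metrics.items():
            collected.setdefault(name, []).append(value)
    return {
        name: {"min": min(vals), "max": max(vals), "mean": mean(vals)}
        for name, vals in collected.items()
    }


dataset = [
    [(0, 1), (1, 2)],          # path graph with 3 nodes
    [(0, 1), (1, 2), (2, 0)],  # triangle
]
summary = accumulate(process_graph(g) for g in dataset)
print(summary["num_edges"])  # {'min': 2, 'max': 3, 'mean': 2.5}
```

Keeping the per-graph step independent is what lets each graph be processed in parallel, with only the cheap reduction running at the end.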

If a run takes too long and you don't want to wait for every instance of process_graph.py to finish, kill the run and invoke the accumulation script directly on the output directory.
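Accumulating from a killed run works cleanly if each per-graph result appears in the output directory only once it is complete. The sketch below assumes, hypothetically, that each process_graph.py instance writes one JSON file per graph; the helper names and file layout are illustrative. Writing to a temporary file and renaming it with `os.replace` means an interrupted writer leaves no half-written `.json` behind, and the loader also skips any `.json` file that fails to parse.

```python
import json
import os
import tempfile
from pathlib import Path


def write_metrics_atomically(out_dir, name, metrics):
    """Hypothetical writer: temp file first, then an atomic rename into place."""
    fd, tmp_path = tempfile.mkstemp(dir=out_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(metrics, f)
    # os.replace is atomic when source and target are on the same filesystem.
    os.replace(tmp_path, Path(out_dir) / f"{name}.json")


def load_finished_metrics(out_dir):
    """Hypothetical loader: collect every finished graph, ignore leftovers."""
    results = {}
    for path in sorted(Path(out_dir).glob("*.json")):  # *.tmp files are skipped
        try:
            results[path.stem] = json.loads(path.read_text())
        except json.JSONDecodeError:
            continue  # truncated file from a writer that was not atomic
    return results


with tempfile.TemporaryDirectory() as out_dir:
    write_metrics_atomically(out_dir, "graph_a", {"num_nodes": 3})
    # Simulate a killed run: a temp file that was never renamed...
    Path(out_dir, "graph_b.tmp").write_text('{"num_no')
    # ...and a truncated JSON file from a non-atomic writer.
    Path(out_dir, "graph_c.json").write_text('{"num_no')
    finished = load_finished_metrics(out_dir)

print(finished)  # {'graph_a': {'num_nodes': 3}}
```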

The final output is written to summary_stats.log, and the individual metrics for each instance of each IP are written to master_metrics.log. accumulate_metrics.py overwrites these files on every run, but when the utility is invoked directly, options are available to specify different output names.
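The output-name options could look something like the following argparse sketch. The flag names `--summary-file` and `--master-file` and the positional `output_dir` argument are hypothetical; only the default file names come from the description above.

```python
import argparse


def build_parser():
    """Hypothetical CLI for the accumulation step; flag names are illustrative."""
    parser = argparse.ArgumentParser(description="Accumulate per-graph metrics.")
    parser.add_argument("output_dir", help="Directory containing per-graph results")
    parser.add_argument(
        "--summary-file",
        default="summary_stats.log",
        help="Summary statistics file (overwritten on each run)",
    )
    parser.add_argument(
        "--master-file",
        default="master_metrics.log",
        help="Per-instance metrics file (overwritten on each run)",
    )
    return parser


# Keep an earlier summary by giving this run a different name.
args = build_parser().parse_args(
    ["dataset_metrics", "--summary-file", "run2_summary.log"]
)
print(args.summary_file, args.master_file)  # run2_summary.log master_metrics.log
```

argparse maps `--summary-file` to the attribute `summary_file`, so only the flag that is passed changes while the other keeps its default.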

@KeenanRileyFaulkner
Contributor Author

This still needs dependencies on the scripts in the ninja build statements, so that a rebuild is triggered when the utils are updated.
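In Ninja, files that a rule depends on but does not take as ordinary inputs are usually listed as implicit dependencies after `|` in the build statement: a change to any of them marks the outputs dirty without adding them to `$in`. The helper below is a hypothetical sketch of emitting such a statement (it does not escape spaces or `$` in paths), and the paths in the usage are made up for illustration.

```python
def ninja_build_statement(outputs, rule, inputs, implicit_deps=()):
    """Format a Ninja build statement (hypothetical helper).

    Paths after "|" are implicit dependencies: editing them triggers a
    rebuild of the outputs, but they are not passed to the rule via $in.
    """
    line = f"build {' '.join(outputs)}: {rule} {' '.join(inputs)}"
    if implicit_deps:
        line += " | " + " ".join(implicit_deps)
    return line


stmt = ninja_build_statement(
    ["dataset_metrics/graph_0.metrics"],
    "process_graph",
    ["dataset/graph_0.pkl"],
    implicit_deps=["utils/process_graph.py"],  # hypothetical script path
)
print(stmt)
# build dataset_metrics/graph_0.metrics: process_graph dataset/graph_0.pkl | utils/process_graph.py
```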

        """Get the name of the build directory for this flow"""
        return "dataset_metrics"

    def add_ninja_deps(self, deps):
Member


I don't think this is required if it's just calling super. The behavior will be the same if you remove it.
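A small illustration of the reviewer's point, using hypothetical classes rather than bfasst's real flow hierarchy: an override whose body only calls `super()` behaves exactly like inheriting the method.

```python
class Flow:
    """Hypothetical stand-in for a flow base class."""

    def add_ninja_deps(self, deps):
        deps.append("base_flow.py")  # illustrative dependency
        return deps


class WithRedundantOverride(Flow):
    def add_ninja_deps(self, deps):
        # Only delegates to the parent, so this override adds nothing.
        return super().add_ninja_deps(deps)


class WithoutOverride(Flow):
    pass  # inherits Flow.add_ninja_deps unchanged


print(WithRedundantOverride().add_ninja_deps([]))  # ['base_flow.py']
print(WithoutOverride().add_ninja_deps([]))        # ['base_flow.py']
```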


def sort_metrics(metrics):
    """Sort the values for each metric in the dictionary."""
    for ip, _ in metrics.items():
Member


If you want to loop through keys, just use:

for ip in metrics:
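Applied to `sort_metrics`, the suggestion might look like this. The `{ip: {metric_name: [values]}}` layout is an assumption about the data structure, and the IP and metric names in the usage are made up.

```python
def sort_metrics(metrics):
    """Sort the values for each metric in place.

    Assumes a hypothetical layout of {ip: {metric_name: [values, ...]}}.
    """
    for ip in metrics:  # iterating a dict yields its keys directly
        for values in metrics[ip].values():
            values.sort()
    return metrics


metrics = {
    "ip_a": {"num_nodes": [30, 10, 20]},
    "ip_b": {"num_nodes": [5, 1]},
}
sort_metrics(metrics)
print(metrics)  # {'ip_a': {'num_nodes': [10, 20, 30]}, 'ip_b': {'num_nodes': [1, 5]}}
```

If the loop never uses `ip` except to index `metrics[ip]`, iterating over `metrics.values()` is a further simplification.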

@jgoeders
Member

jgoeders commented Dec 5, 2024

@KeenanRileyFaulkner It looks like this is failing CI. Anything I can help with?

@KeenanRileyFaulkner
Contributor Author

Sorry, I didn't realize there was a formatting issue. I'm out of town right now, but I'll try to fix it when I have a chance this weekend.

@KeenanRileyFaulkner
Contributor Author

@jgoeders All tests are passing now

jgoeders merged commit 1461dd1 into main on Dec 9, 2024
18 checks passed
jgoeders deleted the dataset_metrics branch on December 9, 2024, 17:55