Parameter Tuning Code Integration #193
base: master
… what idea to use
config/egfr.yaml
Outdated
Should I add the gold standard to this?
edge_freq.to_csv(OUT_DIR + 'node-ensemble.csv', sep="\t", index=False)
assert filecmp.cmp(OUT_DIR + 'node-ensemble.csv', EXPECT_DIR + 'expected-node-ensemble.csv', shallow=False)

def test_precision_recal_curve_ensemble_nodes(self):
I don't know how else to test the ensemble node outputs other than looking at the image
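One possible alternative to inspecting the image is to test the numbers behind the curve. The sketch below is only an illustration under assumptions: `pr_points` is a hypothetical helper, not a function from this PR, and it assumes the node ensemble can be reduced to a mapping from node to frequency plus a set of gold standard nodes.

```python
def pr_points(node_freq, gold_nodes):
    """Return (threshold, precision, recall) for each distinct frequency.

    Hypothetical helper: a node counts as predicted when its ensemble
    frequency is at or above the threshold.
    """
    points = []
    for threshold in sorted(set(node_freq.values()), reverse=True):
        predicted = {node for node, freq in node_freq.items() if freq >= threshold}
        true_pos = len(predicted & gold_nodes)
        precision = true_pos / len(predicted)
        recall = true_pos / len(gold_nodes) if gold_nodes else 0.0
        points.append((threshold, precision, recall))
    return points


# Made-up toy data, not taken from the repository's test fixtures.
node_freq = {"A": 1.0, "B": 0.5, "C": 0.5, "D": 0.25}
gold = {"A", "B", "E"}
points = pr_points(node_freq, gold)
for point in points:
    print(point)
```

A test could then compare these tuples (or a CSV written from them) against expected values, in the same way the node ensemble file is compared with `filecmp.cmp`.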
# adds evaluation per algorithm per dataset-goldstandard pair
# evaluation per algorithm will not run unless ml include and ml aggregate_per_algorithm are set to true
aggregate_per_algorithm: true
# TODO: should we decouple parts of eval that involve ml
Lots of coupling happening now. I put in a solution for now in config.py, but is it worth separating the functions into their own true/false options?
Maybe deal with some of the coupling by giving warnings and stopping the flow rather than silently shutting things off
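A rough sketch of what "warn and stop" could look like follows. The function name and its three boolean parameters are hypothetical and only mirror the settings mentioned in the config comment; this is not the code in config.py.

```python
def check_eval_config(ml_include: bool, aggregate_per_algorithm: bool,
                      eval_per_algorithm: bool) -> None:
    """Fail loudly when per-algorithm evaluation lacks its ML prerequisites.

    Hypothetical check: all three arguments are assumed to have been read
    from the parsed config already.
    """
    if not eval_per_algorithm:
        return  # nothing requested, nothing to validate
    missing = []
    if not ml_include:
        missing.append("ml include")
    if not aggregate_per_algorithm:
        missing.append("ml aggregate_per_algorithm")
    if missing:
        raise ValueError(
            "evaluation aggregate_per_algorithm is true but these settings are false: "
            + ", ".join(missing)
        )


check_eval_config(ml_include=True, aggregate_per_algorithm=True, eval_per_algorithm=True)
```

Raising here stops the workflow with a message that names the conflicting settings, instead of silently skipping the per-algorithm evaluation.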
@@ -142,8 +142,14 @@ def pca(dataframe: pd.DataFrame, output_png: str, output_var: str, output_coord:
    if not isinstance(labels, bool):
        raise ValueError(f"labels={labels} must be True or False")

    scaler = StandardScaler()
    #TODO: MinMaxScaler changes nothing about the data
I don't know if it is better to use StandardScaler or MinMaxScaler for the binary data
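To make the difference concrete, here is a small sketch in plain Python that applies the textbook min-max and standard-score formulas to one binary column. It assumes these are the formulas scikit-learn's `MinMaxScaler` and `StandardScaler` apply per column with default settings; the helper names and the toy column are made up for illustration.

```python
from statistics import fmean, pstdev


def min_max_scale(column):
    """Rescale a column to [0, 1] using (x - min) / (max - min)."""
    lo, hi = min(column), max(column)
    span = hi - lo
    if span == 0:
        # Constant column: returning zeros is this sketch's own choice.
        return [0.0 for _ in column]
    return [(x - lo) / span for x in column]


def standard_scale(column):
    """Center a column and divide by its population standard deviation."""
    mean, std = fmean(column), pstdev(column)
    if std == 0:
        return [0.0 for _ in column]
    return [(x - mean) / std for x in column]


# One edge's membership across four pathways (toy data).
membership = [0, 1, 1, 1]
print(min_max_scale(membership))   # values stay at 0 and 1
print(standard_scale(membership))  # centered, unit variance
```

For a column that contains both 0 and 1, min-max scaling returns the same values, which matches the TODO in the diff. Standard scaling gives every non-constant column unit variance, so edges that appear in few pathways and edges that appear in many are weighted equally in the PCA; whether that is desirable is the open question here.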
for file in file_paths:
    df = pd.read_table(file, sep="\t", header=0, usecols=["Node1", "Node2"])
    # TODO: do we want to include the pathways that are empty for evaluation / in the pr_df?
Currently the code will add a precision and recall for empty pathways. Is that something we shouldn't include?
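For context on why empty pathways are a special case: with no predicted items, precision is 0/0 and therefore undefined, while recall is simply 0. The sketch below uses a hypothetical `precision_recall` helper (not the evaluation code in this PR) that reports the undefined case as `None` so the caller can decide whether to drop or keep that row.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall); None marks an undefined value.

    Hypothetical helper operating on any hashable items (nodes or edges).
    """
    predicted, gold = set(predicted), set(gold)
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else None
    recall = true_pos / len(gold) if gold else None
    return precision, recall


print(precision_recall([], ["A", "B"]))          # (None, 0.0)
print(precision_recall(["A", "C"], ["A", "B"]))  # (0.5, 0.5)
```

If empty pathways stay in the results, it may help to document which value is substituted for the undefined precision, since substituting 0 or 1 would change how the precision-recall plot reads.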
final_input.extend(expand('{out_dir}{sep}{dataset_gold_standard_pair}-eval{sep}precision-recall-per-pathway.png',out_dir=out_dir,sep=SEP,dataset_gold_standard_pair=dataset_gold_standard_pairs))
final_input.extend(expand('{out_dir}{sep}{dataset_gold_standard_pair}-eval{sep}precision-recall-pca-chosen-pathway.txt',out_dir=out_dir,sep=SEP,dataset_gold_standard_pair=dataset_gold_standard_pairs))
final_input.extend(expand('{out_dir}{sep}{dataset_gold_standard_pair}-eval{sep}precision-recall-curve-ensemble-nodes.png',out_dir=out_dir,sep=SEP,dataset_gold_standard_pair=dataset_gold_standard_pairs,algorithm_params=algorithms_with_params))
# TODO: should we provide the node ensemble frequencies
Since we are already calculating the node ensembles, should we give them to the user?
No description provided.