-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LR with priors initial implementation #66
Conversation
Wiz Scan Summary
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🥳
@@ -57,17 +59,22 @@ def sklearn_surrogate( | |||
unique_values, counts = np.unique(vector_second, return_counts=True) | |||
|
|||
# Establish min support for this type of ranking. | |||
if counts[0] < len(unique_values) * (2**5): | |||
estimate_feature_importance = 0 | |||
# if counts[0] < len(unique_values) * (2**5): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's remove such comments
estimate_feature_importance = 1 + \ | ||
np.median(estimate_feature_importance_list) | ||
else: | ||
X = np.concatenate((X,vector_first.reshape(-1, 1)), axis=1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a space missing after X
, wondering why lint didn't catch that. @miha-jenko maybe some idea?
X = np.concatenate((X,vector_first.reshape(-1, 1)), axis=1) | ||
X = transf.fit_transform(X) | ||
estimate_feature_importance_list = cross_val_score( | ||
clf, X, vector_second, scoring='neg_log_loss', cv=4, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's put the num. of folds to top of the file as a constant for now
outrank/core_ranking.py
Outdated
@@ -130,9 +130,13 @@ def mixed_rank_graph( | |||
# Map the scoring calls to the worker pool | |||
pbar.set_description('Allocating thread pool') | |||
|
|||
reference_model_features = {} | |||
if 'prior' in args.heuristic: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you check for -prior
at some point, but prior
at some other point. Consider creating a helper function is_prior_heuristic
or something, that unifies this behavior (and centralizes it)
outrank/core_ranking.py
Outdated
) | ||
if args.reference_model_JSON != '': | ||
model_combinations = extract_features_from_reference_JSON(args.reference_model_JSON, combined_features_only = True) | ||
model_combinations = [tuple(sorted(combination.split(','))) for combination in model_combinations] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
combination delimiter could be a const, as it repeats
outrank/core_ranking.py
Outdated
random.shuffle(full_combination_space) | ||
full_combination_space = full_combination_space[ | ||
: args.combination_number_upper_bound | ||
] | ||
if is_prior_heuristic(args): | ||
full_combination_space = full_combination_space + [tuple for tuple in model_combinations if tuple not in full_combination_space] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this second part list(set(model_combinations).difference(set(full_combination_space)))
outrank/core_ranking.py
Outdated
@@ -225,7 +244,7 @@ def compute_combined_features( | |||
pbar.set_description('Concatenating into final frame ..') | |||
input_dataframe = pd.concat([input_dataframe, tmp_df], axis=1) | |||
del tmp_df | |||
|
|||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need for this space
outrank/core_utils.py
Outdated
"""Given a model's JSON, extract unique features""" | ||
|
||
with open(json_path) as jp: | ||
content = json.load(jp) | ||
|
||
unique_features = set() | ||
feature_space = content['desc'].get('features', []) | ||
if full_feature_space: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
full_feature_space
sounds somewhat odd for a flag that computes a set
outrank/core_utils.py
Outdated
@@ -641,3 +644,10 @@ def summarize_rare_counts( | |||
final_df.to_csv( | |||
f'{args.output_folder}/feature_sparsity_summary.tsv', index=False, sep='\t', | |||
) | |||
|
|||
|
|||
def is_prior_heuristic(args: Any): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing return type
No description provided.