Redo experiments with fixed dataset #71
…riments to be run tonight
… on linux machine
use_cases/eluc/data/conversion.py
Outdated
 if old_abbrev in MANUAL_MAP.keys() and MANUAL_MAP[old_abbrev] in codes_df["Numeric code"].unique():
-    countries_df.iloc[i]["abbrevs"] = codes_df[codes_df["Numeric code"] == MANUAL_MAP[old_abbrev]]["Alpha-2 code"].iloc[0]
+    countries_df.loc[i, "abbrevs"] = codes_df[codes_df["Numeric code"] == MANUAL_MAP[old_abbrev]]["Alpha-2 code"].iloc[0]
Was giving me a pandas warning because the old way will be deprecated soon.
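For readers unfamiliar with the warning: `countries_df.iloc[i]["abbrevs"] = ...` is a chained assignment. It first selects a row and then assigns into that intermediate object, so the write is not guaranteed to reach the original frame. A minimal sketch of the single-step form, using a made-up toy frame (only the column name `abbrevs` comes from the diff; the values and the `names` column are hypothetical):

```python
import pandas as pd

# Hypothetical toy data; only the "abbrevs" column name comes from the diff.
countries_df = pd.DataFrame(
    {"abbrevs": ["UK", "EL"], "names": ["United Kingdom", "Greece"]}
)

# Chained form (shown as a comment only, since its effect is not reliable):
#   countries_df.iloc[0]["abbrevs"] = "GB"

# Single-step form: row label and column name go into one .loc call,
# so pandas writes directly into countries_df.
countries_df.loc[0, "abbrevs"] = "GB"
countries_df.loc[1, "abbrevs"] = "GR"

fixed = countries_df["abbrevs"].tolist()
```

Note that `.loc` selects by label while `.iloc` selects by position, so swapping one for the other with the same `i` is only equivalent when the frame has the default integer index, as assumed here.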
@@ -169,6 +169,7 @@ def __init__(self, start_year=1851, test_year=2012, end_year=2022, countries=Non

 self.train_df = df.loc[start_year:test_year-1]
 self.test_df = df.loc[test_year:end_year-1]
+assert self.train_df['time'].max() == self.test_df["time"].min() - 1
Some assertions to make sure the same mistake doesn't happen again
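As a rough illustration of what this assertion guards against: `.loc` slices by label and includes both endpoints, so a year can land in both splits if the `- 1` is missing. The sketch below uses a small hypothetical frame indexed by year (the `time` and `ELUC` column names appear in the diff; the data, the year range, and the extra disjointness check are assumptions, and this is only one way such an overlap could arise):

```python
import pandas as pd

# Hypothetical frame indexed by year, with a matching "time" column.
years = list(range(2008, 2016))
df = pd.DataFrame({"time": years, "ELUC": [0.0] * len(years)}, index=years)

start_year, test_year, end_year = 2008, 2012, 2016

# .loc slicing is label-based and includes BOTH endpoints, hence the "- 1".
train_df = df.loc[start_year:test_year - 1]
test_df = df.loc[test_year:end_year - 1]

# Same check as in the diff: train ends exactly one year before test begins.
assert train_df["time"].max() == test_df["time"].min() - 1
# Extra check (an assumption, not in the diff): no year is in both splits.
assert set(train_df["time"]).isdisjoint(test_df["time"])

# For contrast, slicing up to test_year itself puts that year in both splits.
overlapping_train = df.loc[start_year:test_year]
shared_years = sorted(set(overlapping_train["time"]) & set(test_df["time"]))
```

In this toy setup `shared_years` holds the single boundary year, which is the kind of leak the assertion is meant to catch at construction time.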
Reran experiments with fixed dataset
Moved predictor significance portion of notebook to its own python script
…updating with new trained TorchPrescriptors
lgtm
"metadata": {},
"outputs": [],
"source": [
"# Note: The original paper trains from 1982 onwards but this is too slow and large for the\n",
"# purpose of this example.\n",
"forest.fit(dataset.train_df.loc[2002:][constants.NN_FEATS], dataset.train_df.loc[2002:][\"ELUC\"])\n",
"forest.save(\"predictors/sklearn/trained_models/experiment_rf\")"
"forest_year = 1982\n",
In the comment above you say training from 1982 "is too slow and large for the purpose of this example". But now you do it. Should you remove the comment?
]
}
],
"source": [
"forest.load(\"predictors/sklearn/trained_models/experiment_rf\")\n",
"# TODO: I don't think we can possibly load a model this big\n",
"# forest.load(\"predictors/sklearn/trained_models/no_overlap_rf\")\n",
You saved the model a few cells above. So it's not too big to be loaded. Should you remove the TODO?
Fully trained torch prescriptors and updated experiments
… so it should be fine
Previously, the year 2012 appeared in both the train and test sets. This PR reruns all the results in the notebooks with the corrected dataset. Additionally, the predictor significance analysis is moved out of the notebook into its own Python script.