
Redo experiments with fixed dataset #71

Merged: 11 commits merged into main on May 15, 2024
Conversation

danyoungday (Collaborator):

Previously, the year 2012 appeared in both the train and test sets of the dataset. This PR redoes all the results in the notebooks with the fixed dataset. Additionally, predictor significance is moved from a notebook to a standalone script.

@danyoungday danyoungday added the bug Something isn't working label Mar 13, 2024
@danyoungday danyoungday requested a review from ofrancon March 13, 2024 17:04
@danyoungday danyoungday self-assigned this Mar 13, 2024
  if old_abbrev in MANUAL_MAP.keys() and MANUAL_MAP[old_abbrev] in codes_df["Numeric code"].unique():
-     countries_df.iloc[i]["abbrevs"] = codes_df[codes_df["Numeric code"] == MANUAL_MAP[old_abbrev]]["Alpha-2 code"].iloc[0]
+     countries_df.loc[i, "abbrevs"] = codes_df[codes_df["Numeric code"] == MANUAL_MAP[old_abbrev]]["Alpha-2 code"].iloc[0]
danyoungday (Collaborator, Author) commented:

This was giving a pandas warning because the old chained-assignment pattern will be deprecated soon.
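
For context, a minimal sketch of the difference between the two indexing patterns (the toy frame and values below are made up; only the pattern matters):

import pandas as pd

# Toy stand-in for countries_df; the column name matches the PR, the values do not.
countries_df = pd.DataFrame({"abbrevs": ["XX", "YY"]})

# Old pattern: chained assignment writes into a temporary slice, so pandas emits a
# warning and the original frame may not actually be updated.
countries_df.iloc[0]["abbrevs"] = "AA"

# New pattern: a single .loc call with row label and column writes into the frame directly.
countries_df.loc[0, "abbrevs"] = "AA"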

@@ -169,6 +169,7 @@ def __init__(self, start_year=1851, test_year=2012, end_year=2022, countries=None

         self.train_df = df.loc[start_year:test_year-1]
         self.test_df = df.loc[test_year:end_year-1]
+        assert self.train_df['time'].max() == self.test_df["time"].min() - 1
danyoungday (Collaborator, Author) commented:

Added some assertions to make sure the same mistake doesn't happen again.
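
For illustration, a minimal sketch of how such an assertion guards the split, assuming a frame indexed by year with a "time" column as in the diff above (toy years, not the real dataset):

import pandas as pd

years = list(range(2008, 2016))
df = pd.DataFrame({"time": years}, index=years)
start_year, test_year, end_year = 2008, 2012, 2016

train_df = df.loc[start_year:test_year - 1]   # 2008..2011 (label slicing is inclusive)
test_df = df.loc[test_year:end_year - 1]      # 2012..2015

# Fails loudly if the two sets ever share a year again.
assert train_df["time"].max() == test_df["time"].min() - 1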

danyoungday (Collaborator, Author) commented:

Reran experiments with fixed dataset

danyoungday (Collaborator, Author) commented:

Moved the predictor significance portion of the notebook to its own Python script.
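
The script itself is not shown in this diff. As a rough, hypothetical sketch of what a standalone significance script can look like (synthetic data, made-up feature names, and permutation importance chosen purely as an illustration, not necessarily the method used in this PR):

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance


def main():
    # Synthetic stand-in data; the real script would load the project's dataset instead.
    rng = np.random.default_rng(0)
    X = pd.DataFrame(rng.normal(size=(500, 4)), columns=["f0", "f1", "f2", "f3"])
    y = 2 * X["f0"] - X["f2"] + rng.normal(scale=0.1, size=500)

    # Fit a forest, then rank features by mean permutation importance.
    model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
    result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
    for name, score in sorted(zip(X.columns, result.importances_mean), key=lambda t: -t[1]):
        print(f"{name}: {score:.3f}")


if __name__ == "__main__":
    main()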

@ofrancon (Member) left a comment:

lgtm

"metadata": {},
"outputs": [],
"source": [
"# Note: The original paper trains from 1982 onwards but this is too slow and large for the\n",
"# purpose of this example.\n",
"forest.fit(dataset.train_df.loc[2002:][constants.NN_FEATS], dataset.train_df.loc[2002:][\"ELUC\"])\n",
"forest.save(\"predictors/sklearn/trained_models/experiment_rf\")"
"forest_year = 1982\n",
@ofrancon (Member) commented:

In the comment above you say training from 1982 "is too slow and large for the purpose of this example". But now you do it. Should you remove the comment?
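
For reference, one possible shape of the updated cell, assuming forest_year simply parameterizes the training range; the full cell is not visible in this diff, so this is a guess built only from the names that do appear above:

forest_year = 1982
forest.fit(dataset.train_df.loc[forest_year:][constants.NN_FEATS],
           dataset.train_df.loc[forest_year:]["ELUC"])
forest.save("predictors/sklearn/trained_models/experiment_rf")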

]
}
],
"source": [
"forest.load(\"predictors/sklearn/trained_models/experiment_rf\")\n",
"# TODO: I don't think we can possibly load a model this big\n",
"# forest.load(\"predictors/sklearn/trained_models/no_overlap_rf\")\n",
@ofrancon (Member) commented:

You saved the model a few cells above. So it's not too big to be loaded. Should you remove the TODO?

@danyoungday danyoungday merged commit 87f5f97 into main May 15, 2024
1 check passed
@danyoungday danyoungday deleted the redo-results branch May 15, 2024 16:15