Refactor app with new predictor/prescriptor #87
Conversation
…till lots of work to do.
…nces to new predictor/prescriptor
…with new cand id format
df = pd.read_csv(constants.DATA_FILE_PATH, index_col=constants.INDEX_COLS)
countries_df = regionmask.defined_regions.natural_earth_v5_0_0.countries_110.to_dataframe()
df = pd.read_csv(app_constants.DATA_FILE_PATH, index_col=app_constants.INDEX_COLS)
df.rename(columns={col + ".1": col for col in app_constants.INDEX_COLS}, inplace=True)
This is a little hacky: since we need time/lat/lon as context as well as indices, we end up with two sets of time/lat/lon columns (time, lat, lon, time.1, lat.1, lon.1) which we need to separate.
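For reference, a standalone illustration (not project code) of why the ".1" columns show up: pandas mangles duplicate CSV column names on read, so the context copies of time/lat/lon come back suffixed and need renaming.

import io
import pandas as pd

# Toy CSV with time/lat/lon present twice: once for the index, once as context.
csv = "time,lat,lon,crop,time,lat,lon\n2020,1.0,2.0,0.5,2020,1.0,2.0\n"
df = pd.read_csv(io.StringIO(csv), index_col=["time", "lat", "lon"])
print(df.columns.tolist())  # ['crop', 'time.1', 'lat.1', 'lon.1']
# Strip the ".1" suffix so the duplicates can be used as regular context columns.
df = df.rename(columns={col + ".1": col for col in ["time", "lat", "lon"]})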
"hidden_size": 16, | ||
"out_size": len(constants.RECO_COLS) | ||
} | ||
prescriptor = TorchPrescriptor(None, encoder, None, 1, candidate_params) |
This is currently how we handle the prescriptor loading. We save a snapshot of the encoder and hard-code the candidate params (which could also theoretically be snapshotted after training later).
# If we have no prescription just return an empty chart
if all(slider == 0 for slider in sliders):
    return utils.create_treemap(pd.Series([]), type_context=False, year=year)
    return utils.create_treemap(pd.Series([], dtype=float), type_context=False, year=year)
Fixed a pandas warning about relying on the default dtype for an empty Series (we now pass dtype=float explicitly).
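For context, a minimal standalone illustration of the warning (pandas 1.x behavior):

import pandas as pd

# With no values to infer from, pandas warns that the default dtype for an
# empty Series will change in a future version; passing dtype silences it.
empty_default = pd.Series([])              # emits the deprecation warning
empty_float = pd.Series([], dtype=float)   # explicit dtype, no warning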
Extremely simple data processing script. In the future we may want to handle time/lat/lon being an index and context in a more elegant fashion.
from . import Predictor

class Encoder:
We don't need this anymore because we have data.ELUCEncoder now
@@ -106,27 +72,17 @@ def create_check_options(values: list) -> list:
        "value": val})
    return options

def compute_percent_change(context: pd.Series, presc: pd.Series) -> float:
def context_presc_to_df(context: pd.Series, presc: pd.Series) -> pd.DataFrame:
This is a new function to get the demo working with the new prescriptors. Since the prescriptor wants a context_actions dataframe, we have to convert the context (which we get from our dataset) and the prescriptions (which we get from our sliders) into a context_actions_df by computing the diff and processing the zero-change columns.
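Roughly, the shape of the conversion is the following sketch (the "_diff" column naming and the tolerance are assumptions, not the project's actual DIFF_RECO_COLS handling):

import pandas as pd

def context_presc_to_df_sketch(context: pd.Series, presc: pd.Series) -> pd.DataFrame:
    """Sketch: combine a dataset context row with slider prescriptions into a
    single-row context_actions dataframe by diffing the recommended columns."""
    reco_cols = list(presc.index)                      # sliders cover the RECO_COLS
    diff = presc - context[reco_cols]                  # prescribed change per column
    diff[diff.abs() < 1e-9] = 0.0                      # zero out the no-change columns
    diff.index = [f"{col}_diff" for col in reco_cols]  # assumed action column names
    return pd.concat([context, diff]).to_frame().T     # one-row context_actions_df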
"Primary Vegetation", "primf", "primn", | ||
"Secondary Vegetation", "secdf", "secdn", | ||
"Urban", | ||
"Fields", "pastr", "range"] | ||
parents = ["", title, | ||
title, "Crops", "Crops", "C3", "C3", "C3", "C4", "C4", | ||
title, |
Had to redo the treemap code because we no longer have specific crop types.
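For reference, labels/parents arrays like the ones above drive a plotly treemap roughly like this (the title and values here are placeholders, not the app's real data):

import plotly.graph_objects as go

# Each label's parent places it in the hierarchy:
# title -> coarse categories -> raw land-use columns.
labels = ["Land Use", "Primary Vegetation", "primf", "primn", "Urban"]
parents = ["", "Land Use", "Primary Vegetation", "Primary Vegetation", "Land Use"]
values = [1.0, 0.5, 0.4, 0.1, 0.5]  # dummy proportions
fig = go.Figure(go.Treemap(labels=labels, parents=parents, values=values,
                           branchvalues="total"))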
plo = px.colors.qualitative.Plotly
dar = px.colors.qualitative.Dark24
# ['crop', 'pastr', 'primf', 'primn', 'range', 'secdf', 'secdn', 'urban', 'nonland']
colors = [plo[4], plo[0], plo[2], dar[14], plo[5], plo[7], dar[2], plo[3], plo[1]]
Redid colors for treemap because there are no longer crop types
if type_context:
    fig.update_layout(showlegend=False)
    # To make up for the hidden legend
    fig.update_layout(margin={"t": 50, "b": 50, "l": 50, "r": 50})
Had to remove this code that added margins to the left pie chart: with the new size of the pie charts (they're smaller now that the extra crop sliders are gone), it's no longer necessary.
predictors["Global Random Forest"] = global_rf | ||
|
||
return predictors |
This function is currently hard-coded to load these predictors. We could handle this more elegantly, but for now users aren't submitting models, and ultimately we want to be in control of which models get added rather than adding them automatically, so in theory this should be fine.
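In other words, the structure is just a hard-coded mapping along these lines (the loader helpers are hypothetical stand-ins for however the app actually constructs each model; the nn path appears elsewhere in this diff, the rf path is a placeholder):

from typing import Dict

def load_neural_net(path: str):
    # Stub: in the real app this would download/construct the neural net predictor.
    raise NotImplementedError(f"would load the neural net from {path}")

def load_random_forest(path: str):
    # Stub: likewise for the random forest predictor.
    raise NotImplementedError(f"would load the random forest from {path}")

def load_predictors() -> Dict[str, object]:
    # Display names mirror the diff above.
    return {
        "Global Neural Network": load_neural_net("danyoung/eluc-global-nn"),
        "Global Random Forest": load_random_forest("<rf-model-path>"),
    }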
Yes it's fine
""" | ||
with open(path, "r", encoding="utf-8") as file: | ||
fields = json.load(file) | ||
return cls(fields) |
Allows us to load an encoder from a fields file. This enables us to use a pretrained one in the demo.
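As a sketch, the full method reads roughly like this (the surrounding class here is a stub just to show it in context; the method name is an assumption, the body mirrors the diff):

import json

class ELUCEncoder:
    """Stub standing in for the real encoder class in data."""

    def __init__(self, fields: dict):
        self.fields = fields  # per-column fields saved at training time

    @classmethod
    def from_json(cls, path: str) -> "ELUCEncoder":
        """Reconstruct the encoder from a saved fields file."""
        with open(path, "r", encoding="utf-8") as file:
            fields = json.load(file)
        return cls(fields)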
It's important we use the same encoder as the one that was used for training indeed.
Just some linting changes
Mostly linting changes but there are some modifications to how candidate ids are saved to align with ESP's method
"parents": self.parents, | ||
"NSGA-II_rank": self.rank, # Named this to match ESP | ||
"distance": self.distance, | ||
} |
Now we save our candidate id as "id" in our logs. I'm not a huge fan of this change since "id" is a Python builtin, so we can't name our variables that, but it matches ESP. It's still nicer than what I had before, which was using gen and id as a double index.
I don't think we have to match ESP anymore. distance is the NSGA-II_distance though, right? Maybe id could become cid for candidate id if needed. But dictionary keys don't conflict with Python builtins, so as long as we don't call our variables id we're fine.
Removed all the training code from this class to prescriptors/nsga2/trainer.py
@@ -30,7 +30,7 @@
print("Initializing prescription...")
if "seed_dir" in config["evolution_params"].keys():
    config["evolution_params"]["seed_dir"] = Path(config["evolution_params"]["seed_dir"])
tp = TorchPrescriptor(
tp = TorchTrainer(
Uses new trainer object instead of directly training a TorchPrescriptor
Just cut and pasted from TorchPrescriptor.
use_cases/eluc/tests/test_app.py
Unit tests for some functions in the app. Still not nearly robust enough.
Old tests that tested the compute_percent_change function from the demo, now adapted to our new compute_percent_change in the Prescriptor class.
lgtm
recursive=y

fail-under=9.0
fail-under=9.65
Thanks 👍
Little by little we can get to 10.
use_cases/eluc/app/process_data.py
""" | ||
Main function that loads the data and saves it. | ||
""" | ||
dataset = ELUCData(APP_START_YEAR-1, APP_START_YEAR, 2022) |
It's hard to know, without looking at the code, that the params are start_year, test_year and end_year. Maybe you can name the params here. Should 2022 be a constant too? It's hard to know what we're loading here.
Updated this to be clearer with comments and to pass the args in as kwargs.
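i.e. roughly (parameter names follow the review suggestion above; APP_END_YEAR is a hypothetical constant standing in for the literal 2022):

APP_END_YEAR = 2022  # hypothetical constant, per the review suggestion
dataset = ELUCData(start_year=APP_START_YEAR - 1,
                   test_year=APP_START_YEAR,
                   end_year=APP_END_YEAR)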
:param presc: Prescribed land use data
:return: Percent land use change
Takes a context with all columns and a presc with RECO_COLS and returns an updated context actions df.
This df takes the difference between the RECO_COLS in presc and context and sets the DIFF_RECO_COLS to that.
The parameters should still be documented, but we can take care of that later
predictors = {}
nn_path = "danyoung/eluc-global-nn"
Could move to constants
predictors["Global Random Forest"] = global_rf | ||
|
||
return predictors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it's fine
@@ -17,7 +17,7 @@
"RUS": 643,
"N": 578,
"F": 250,
"J": 388,
# "J": 388,
What was wrong with "J"?
There are multiple countries matching "J" in the converter I found. At some point the country-code-to-name conversion scheme may have to be overhauled because it doesn't work for some countries.
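A standalone illustration of the ambiguity (not the app's converter; codes are ISO 3166-1 numeric):

# A naive prefix lookup cannot tell which country a bare "J" refers to.
COUNTRIES = {"JPN": 392, "JAM": 388, "JOR": 400}
matches = {name: code for name, code in COUNTRIES.items() if name.startswith("J")}
print(matches)  # all three match, so the "J" override was dropped rather than guessed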
""" | ||
with open(path, "r", encoding="utf-8") as file: | ||
fields = json.load(file) | ||
return cls(fields) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's important we use the same encoder as the one that was used for training indeed.
"parents": self.parents, | ||
"NSGA-II_rank": self.rank, # Named this to match ESP | ||
"distance": self.distance, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we have to match ESP anymore.
distance is the NSGA-II_distance
though, right? Maybe we
id
could become cid
for candidate id if needed. But dictionary keys don't conflict with python buildins so as long as we don't call our variables id
we're fine.
Deleted the old demo predictor/prescriptor code and updated the app to use the current predictors/prescriptors. Additionally, refactored the torch prescriptor so that the prescriptor object is standalone from the trainer.
Still todo after this PR:
Automate data downloading for the app. Currently, spinning up the app requires running the process_data.py script; however, this script downloads the entire dataset before processing it. We should upload a demo dataset so that no processing has to be done by the app.
The predictor list is currently hard-coded, and we will need to make it easier to add user models.
Prescriptors still follow the ESP convention where a single directory holds many models saved from a training run. Since we want to be able to add heuristics and other users' trained prescriptors, we need to change the scope of the Prescriptor object from a group of prescriptors to a single model with context input and actions output (see the sketch below).
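A rough sketch of that narrower scope (names and signature are assumptions, not the final API):

from abc import ABC, abstractmethod
import pandas as pd

class Prescriptor(ABC):
    """Sketch: one Prescriptor wraps a single model (trained candidate or
    heuristic) rather than a whole training run's worth of candidates."""

    @abstractmethod
    def prescribe(self, context_df: pd.DataFrame) -> pd.DataFrame:
        """Take context rows in, return the prescribed actions for each row."""
        raise NotImplementedError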