-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add "other" industry demand #355
Add "other" industry demand #355
Conversation
…en/add-chemical-industry
…into add-other-industry-demand
…ther-industry-demand
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tud-mchen6 could you confirm that the only change in this PR compared to #340 is the chemicals_industry.py
and other_industry.py
files?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just realised that the PR title mentions only "other", so feel free to ignore my "chemical_industry.py" comments.
If compared to the commit of #340 at the same time point, the only change is indeed that Now the branch of #340 has developed quite far (e.g. integrating the JRC data processing script), and these branches need to be merged with some effort. However, #340 has never developed the functionality of chemical industry or "other" industry, so functionality wise they are still separated. |
This PR now integrates the JRC module code and
|
@irm-codebase ready to rebase onto develop now that #340 is merged in! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@brynpickering ready for review!
CHE pre-processing was leading to some funky issues, so I've added comments on how I dealt with it.
@@ -90,8 +81,11 @@ def fill_missing_countries_years( | |||
_to_fill = _to_fill.bfill(dim="year") | |||
all_filled = _to_fill.ffill(dim="year") | |||
|
|||
all_filled = jrc.ensure_standard_coordinates(all_filled) | |||
all_filled = all_filled.assign_attrs(units="twh") | |||
# TODO: CHE has no values for "Wood and wood products" and "Transport Equipment". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CHE was triggering assert
failures due to some missing data. For now I am just assuming those values are 0 (same as SCEC).
This is mostly because the CHE processing above this module does not seem to provide data for these sectors, meaning they are filled with nan
in all years, so none of our filling methods work.
Let me know what you think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we don't have much choice on that. I assume they are in "other industry" in the CHE data so we can't extract them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, just some minor changes to make. I'll go through it again tomorrow by running the code and checking the results at different points. Would be good if a data consistency check existed somewhere, i.e. that no energy demand is lost / added.
modules/industry/config.yaml
Outdated
@@ -9,5 +9,10 @@ industry: | |||
placeholder-out1: | |||
placeholder-out2: | |||
params: | |||
specific-industries: ["Iron and steel", "Chemicals Industry"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rename this param. "specific" isn't very descriptive and "industries" might be better as "subsectors". subsectors-to-decarbonise
, subsectors-to-electrify
, electrified-subsectors
, ... ? I don't like any of those particularly but maybe they can trigger a better idea 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
separated-subsectors
, subsectors-to-process-individually
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed it to non-generic-categories
and changed other
to generic-config
.
Hopefully this makes processing clearer.
modules/industry/industry.smk
Outdated
output: | ||
path_output = f"{BUILD_PATH}/annual_demand_steel.nc" | ||
script: f"{SCRIPT_PATH}/steel_industry.py" | ||
if "Iron and steel" in config["params"]["specific-industries"]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the best way to add this conditionality? @timtroendle maybe you can comment.
The other approach would be to use a conditional list as inputs in a later rule, e.g.:
rule merge_industry_demands:
input:
specific_industries = expand(f"{BUILD_PATH}/annual_demand_{subsector}.nc", subsector=subsector_translator(config["params"]["specific-industries"]))
where subsector_translator
is a helper function to map e.g. Steel and Iron
to steel
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I have something useful to say as I am not fully aware of the context. I wonder: Why would steel industry be excluded? Is there a use-case for that? If not, then there is no conditionality needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it is not a "specific" industry then it gets automatically lumped in as "other". So it's possible to overhaul the steel industry to decarbonise feedstocks or to just pipe all demands without converting any processes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the best way to add this conditionality? @timtroendle maybe you can comment.
The other approach would be to use a conditional list as inputs in a later rule, e.g.:
rule merge_industry_demands: input: specific_industries = expand(f"{BUILD_PATH}/annual_demand_{subsector}.nc", subsector=subsector_translator(config["params"]["specific-industries"]))where
subsector_translator
is a helper function to map e.g.Steel and Iron
tosteel
.
I like this approach because it's pretty easy to follow and does not add much complexity.
For now I'll keep it as-is since the merging step needs development (SCEC also has a scaling step there).
modules/industry/industry.smk
Outdated
conda: CONDA_PATH | ||
params: | ||
config_params = config["params"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a moment to separate out params. Pass specific-industries
and other
to the script individually so that e.g. steel
param changes don't re-trigger this rule.
modules/industry/industry.smk
Outdated
input: | ||
output: f"{BUILD_PATH}/other_industry.csv" | ||
path_energy_balances = config["inputs"]["path-energy-balances"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since inputs are all paths, I tend to prefer not prepending with path_
here and path-
in the config.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to keep path
to avoid confusion between variables holding data and those holding strings.
A bit more explicit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but if all inputs hold strings...? How about config["input-paths"]["energy-balances"]
etc?
path_jrc_industry_production: str, | ||
path_output: Optional[str] = None, | ||
) -> xr.DataArray: | ||
"""Execute the default data processing pipeline all non-specific industries. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"""Execute the default data processing pipeline all non-specific industries. | |
"""Merge all industries not selected for individual processing into a single `other` subsector using a default data processing pipeline. |
# Process data: | ||
# Extract useful dem. -> remove useful dem. from rest -> extract final dem. | ||
selected_useful = config_params["other"]["useful-demands"] | ||
other_useful_demand = jrc.convert_subsec_demand_to_carrier( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
subsec -> subsector. Not worth the lack of readibility caused by removal of letters
|
||
final_method = config_params["other"]["final-energy-method"] | ||
jrc_energy = jrc_energy.drop_sel(subsection=selected_useful) | ||
if final_method == "priority": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there other methods on the nearterm horizon? If not, it doesn't seem worth introducing this feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added another method that just keeps all the final demand without assumptions. There are probably better methods to process this part, but these two are good enough for now.
I changed the check to a match
-case
style so this can grow over time.
) | ||
|
||
# Fix the naming | ||
for carrier in JRC_TO_CALLIOPE: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is quite verbose. Something like:
other_demand.coords["carrier_name"] = other_demand["carrier_name"].to_series().rename(index=JRC_TO_CALLIOPE).index
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've actually removed the renaming step from this file. It makes more sense to do it once we merge all category files into one and re-scale it (if necessary).
def transform_final_demand_by_priority( | ||
jrc_energy: xr.Dataset, carrier_priority: list[str] | ||
) -> xr.DataArray: | ||
"""Transform final demand of all sectors by giving priority to certain carriers. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
subsectors
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to "category" (see prev. comment).
@brynpickering I've implemented your comments, and a couple of extras. The biggest updates are:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm almost happy for this one to go through! Just a couple of minor comments on naming and unit checking.
modules/industry/config.yaml
Outdated
@@ -9,5 +9,10 @@ industry: | |||
placeholder-out1: | |||
placeholder-out2: | |||
params: | |||
steel: | |||
non-generic-categories: ["Iron and steel", "Chemicals Industry"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still concerned that these will cause confusion down the line. How about "discrete-categories" and "merged-categories"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about "explicit-categories" or "explicit-subsectors" or even "explicitly-modelled-subsectors"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought of "explicit" initially too. I then had a look in the dictionary and decided it wasn't quite right (it's more about leaving nothing implied). I'm not sure that "discrete" is right either. "separate"? "independent"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm... I also struggled with names. A good name has to do two things:
- Specify that these are specific / individual to a given category.
- Imply that all the rests will go to the "other" / generic / merged.
Here are words with the antonym, based on your comments.
- Specific / generic categories
- Separate / combined categories < --- I like this one the best.
- Independent / dependent categories
- Explicit / implicit categories
- Discrete / joined categories
I suggest "separate-category-processing" and "combined-category-processing".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally think "subsectors" is clearer, because "category" is generic. "category" is also an implementation detail: should we move to a different data source than JRC, that name would change. Anyways, "category" is fine by me.
I like the "separate" and "combined" distinction!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@timtroendle : I also do not like category, but keeping the code close to the data helps in this case because JRC-IDEES is already hard to wrap your head around due to the amount of columns it has (sector/category/subcategory/process/energy type...)
A different data source is going to require a different approach. This (and the JRC module above it) are just very tied to how the data was constructed. I think this is unavoidable for things as heterogeneous as industry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, "Separate" / "combined" sounds good!
modules/industry/config.yaml
Outdated
recycled-steel-share: 0.5 # % of recycled scrap steel for H-DRI | ||
generic-config: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"merged-categories-config"?
modules/industry/industry.smk
Outdated
input: | ||
output: f"{BUILD_PATH}/other_industry.csv" | ||
path_energy_balances = config["inputs"]["path-energy-balances"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but if all inputs hold strings...? How about config["input-paths"]["energy-balances"]
etc?
modules/industry/industry.smk
Outdated
output: f"{BUILD_PATH}/other_industry.csv" | ||
script: f"{SCRIPT_PATH}/other_industry.py" | ||
path_output = f"{BUILD_PATH}/annual_demand_generic.nc" | ||
script: f"{SCRIPT_PATH}/generic_processing.py" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"merged_category_processing.py"? I would always have category
added to whatever adjective we choose!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. Why not use "subsector" instead of "category"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@irm-codebase is using "category" as it aligns with the JRC-IDEES naming convention. I don't really mind if it's subsector or category, so long as it remains the same throughout the whole submodule.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@timtroendle: Bryn is right on this one. Category is not a word we use often, but it makes sense int the context of JRC-IDEES data. I'd like to keep it that way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will it become subsector
at some point in the module, as it gets closer to being a Calliope model input?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that would make sense! the idea is:
jrc categories -> specific/combined scripts -> combine/scale/disaggregate scripts -> calliope subsector
jrc_prod = jrc_prod.drop_sel(cat_name=non_generic_categories) | ||
|
||
# Process data: | ||
# Extract useful dem. -> remove useful dem. from rest -> extract final dem. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dem -> demand
other_final_demand = transform_final_demand_by_priority( | ||
jrc_energy, generic_config["final-energy-carriers"] | ||
) | ||
case "keep everything": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume this doesn't lead to any double counting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An assert statement would be helpful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@brynpickering it shouldn't. It's basically "assume nothing, do nothing with the data and just give me the final demand.
If a useful demand is requested, the lines above should remove it. So no double counting is possible.
@@ -90,8 +81,11 @@ def fill_missing_countries_years( | |||
_to_fill = _to_fill.bfill(dim="year") | |||
all_filled = _to_fill.ffill(dim="year") | |||
|
|||
all_filled = jrc.ensure_standard_coordinates(all_filled) | |||
all_filled = all_filled.assign_attrs(units="twh") | |||
# TODO: CHE has no values for "Wood and wood products" and "Transport Equipment". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we don't have much choice on that. I assume they are in "other industry" in the CHE data so we can't extract them.
@@ -64,16 +92,15 @@ def get_subsection_final_intensity( | |||
final_intensity = useful_intensity / carrier_eff | |||
|
|||
# Prettify | |||
final_intensity = ensure_standard_coordinates(final_intensity) | |||
final_intensity = final_intensity.assign_attrs(units="twh/kt") | |||
final_intensity = standardize(final_intensity, "twh/kt", name="final_intensity") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it worth checking the unit of the incoming data before setting this to twh/kt
. I.e., useful_demand
should have the unit twh
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm actually setting it because xarray
, for some horrible reason, decided to actually delete attributes by default.
The library is limited to handling only twh
as inputs.
I'll add a check at the start to ensure this is specified.
Another option would be to use pint (https://xarray.dev/blog/introducing-pint-xarray), but without standardising this on the JRC side there is not much benefit...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm actually setting it because xarray, for some horrible reason, decided to actually delete attributes by default.
At what point in the process?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Operations in general will delete xarray attributes: https://docs.xarray.dev/en/stable/getting-started-guide/faq.html#what-is-your-approach-to-metadata
So, basically everywhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can change this in the xarray options, but I do not think it's a good idea unless we implement an actual unit checker. Better to lose them and mindfully set them than risk leaving invalid attributes during our operations.
useful_intensity = ensure_standard_coordinates(useful_intensity) | ||
useful_intensity = useful_intensity.assign_attrs(units="twh/kt") | ||
useful_intensity.name = "useful_intensity" | ||
useful_intensity = standardize(useful_intensity, "twh/kt", name="useful_intensity") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above RE checking unit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me overall. I have a few minor comments.
CHANGELOG.md
Outdated
@@ -4,7 +4,7 @@ | |||
|
|||
### Added (models) | |||
|
|||
* **ADD** industry module and steel industry energy demand processing. NOT CONNECTED TO THE MAIN WORKFLOW. Industry sectors pending: chemical, "other". (Fixes #308, #310, #347, #345 and #346) | |||
* **ADD** industry module and steel industry energy demand processing. NOT CONNECTED TO THE MAIN WORKFLOW. Industry sectors pending: chemical. (Fixes #308, #309, #310, #347, #345 and #346) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be "ADD industry module including steel and other industry energy demand"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated it with a simpler message. I'll add chemical industry once that is done.
modules/industry/config.yaml
Outdated
@@ -9,5 +9,10 @@ industry: | |||
placeholder-out1: | |||
placeholder-out2: | |||
params: | |||
steel: | |||
non-generic-categories: ["Iron and steel", "Chemicals Industry"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about "explicit-categories" or "explicit-subsectors" or even "explicitly-modelled-subsectors"?
modules/industry/industry.smk
Outdated
@@ -13,36 +13,46 @@ validate(config, "./schema.yaml") | |||
|
|||
# Ensure rules are defined in order. | |||
# Otherwise commands like "rules.rulename.output" won't work! | |||
rule steel_industry: | |||
message: "Calculate energy demand for the 'Iron and steel' sector in JRC-IDEES." | |||
if "Iron and steel" in config["params"]["non-generic-categories"]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The condition doesn't seem necessary to me. Instead of making the rule conditional, make the inputs to a rule downstream conditional.
So, if "iron and steel" is in the config, then a downstream rule will require the file f"{BUILD_PATH}/annual_demand_steel.nc"
. You will need that anyways, right? Would be good to see how this is integrated eventually.
modules/industry/industry.smk
Outdated
path_output = f"{BUILD_PATH}/annual_demand_steel.nc" | ||
script: f"{SCRIPT_PATH}/steel_processing.py" | ||
|
||
if "Chemicals Industry" in config["params"]["non-generic-categories"]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above.
modules/industry/industry.smk
Outdated
output: f"{BUILD_PATH}/other_industry.csv" | ||
script: f"{SCRIPT_PATH}/other_industry.py" | ||
path_output = f"{BUILD_PATH}/annual_demand_generic.nc" | ||
script: f"{SCRIPT_PATH}/generic_processing.py" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. Why not use "subsector" instead of "category"?
carrier_eff = carrier_tot["useful"] / carrier_tot["final"] | ||
|
||
# Fill NaNs (where there is demand, but no consumption in that country) | ||
# First by country avg. (all years), then by year avg. (all countries). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment should have a ASSUME statement so we can find it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. I've also added it to other parts.
}) | ||
|
||
# Prettify | ||
new_carrier_useful_dem = standardize(new_carrier_useful_dem, "twh") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could be inlined with line 188.
jrc_prod = xr.open_dataarray(path_jrc_industry_production) | ||
|
||
# Remove data from all specifically processed industries | ||
cat_names_df = cat_names_df[~cat_names_df["jrc_idees"].isin(non_generic_categories)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to explicitly list the "generic_categories"/"generically_modelled_subsectors" instead of using all that are non_generic?
This would (1) document better which subscectors are included here, (2) safe-guard that list to possible changes in the list in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather avoid that, because it introduces the risk of not processing an added category.
Besides, JRC-IDESS is unfortunately quite old (2015), and this processing is very tied to it. Do not see this is a future issue.
I have my sights on this other dataset: https://iopscience.iop.org/article/10.1088/2753-3751/ad4e39
but it is going to require a different module with its own steps.
other_final_demand = transform_final_demand_by_priority( | ||
jrc_energy, generic_config["final-energy-carriers"] | ||
) | ||
case "keep everything": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An assert statement would be helpful.
|
||
other_demand = jrc.standardize(other_demand, "twh") | ||
|
||
if path_output: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this for? If it's needed, than the typt hint of the return type in the function signature must be updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you mean path_output
, because the function just returns an xarray (to aid in testing). The file is saved only as an option.
Otherwise, the function would always return a None
, making testing it harder.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I needed for testing purposes, I'd recommend to let the function return a dataset which is stored somewhere else, as discussed in the last dev call. Otherwise you increase complexity of the function signature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit puzzled. That is already what this is doing: the return other_demand
is an xarray. You could choose to save it or evaluate it directly in a test.
I'll just remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me know if more fixes are needed
path-carrier-names: config/energy-balances/energy-balance-carrier-names.csv | ||
path-jrc-industry-energy: build/data/jrc-idees/industry/processed-energy.nc | ||
path-jrc-industry-production: build/data/jrc-idees/industry/processed-production.nc | ||
input-paths: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Name change requested by Bryn. This change is also reflected in rule scripts.
conda: CONDA_PATH | ||
params: | ||
config_steel = config["params"]["steel"] | ||
non_generic_categories = config["params"]["non-generic-categories"], | ||
generic_config = config["params"]["generic-config"], | ||
input: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was added to the "combine" rule.
# output: | ||
# script: | ||
SUFFIXES = [i.lower().replace(" ", "_") for i in config["params"]["specific-categories"]] | ||
rule combine_and_scale: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This new rule makes sure that both "combined" and "specific" JRC categories are processed.
In the future they will be combined and scaled (chemicals industry must be finished first).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All the requested name changes show here, too.
# Combine and fill missing countries | ||
other_demand = xr.concat( | ||
[other_useful_demand, other_final_demand], dim="carrier_name" | ||
) | ||
|
||
assert other_demand.sum() < jrc_energy["final"].sum(), "Potential double counting!" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Simple assert to avoid double counting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be <=
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right! I'll make a quick update.
@@ -6,6 +6,13 @@ | |||
STANDARD_COORDS = ["cat_name", "year", "country_code", "carrier_name"] | |||
|
|||
|
|||
def check_units(jrc_energy: xr.Dataset, jrc_prod: xr.DataArray) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New method to ensure JRC data has the right format.
carrier_eff = carrier_tot["useful"] / carrier_tot["final"] | ||
|
||
# Fill NaNs (where there is demand, but no consumption in that country) | ||
# First by country avg. (all years), then by year avg. (all countries). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. I've also added it to other parts.
conda: CONDA_PATH | ||
params: | ||
config_steel = config["params"]["steel"] | ||
non_generic_categories = config["params"]["non-generic-categories"], | ||
generic_config = config["params"]["generic-config"], | ||
input: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the sake of being explicit rather than implicit, I would add them here, too.
conda: CONDA_PATH | ||
params: | ||
config_steel = config["params"]["steel"] | ||
non_generic_categories = config["params"]["non-generic-categories"], | ||
generic_config = config["params"]["generic-config"], | ||
input: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, the "combine" rule is downstream of this, so this won't work, right? I would definitely make the dependencies explicit, then you do not even have to think about this at all.
# Combine and fill missing countries | ||
other_demand = xr.concat( | ||
[other_useful_demand, other_final_demand], dim="carrier_name" | ||
) | ||
|
||
assert other_demand.sum() < jrc_energy["final"].sum(), "Potential double counting!" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be <=
?
@timtroendle quick fix for the The aggregated rule runs with no problems! |
No idea why the docs are not working. It's not related to my changes, for what I can see. |
carrier_eff = carrier_eff.fillna(carrier_eff.mean(dim="year")) | ||
carrier_eff = carrier_eff.fillna(carrier_eff.mean(dim="country_code")) | ||
|
||
carrier_final_demand = useful_dem_tot / carrier_eff |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This basically means [all carriers demand added together] divided by [specific carrier efficiency]? But this formula only works for the entries that relate to the defined carrier and any other entries with different carriers will become null.
This then becomes an issue for line 165. useful_dem_tot.sum()
will likely be larger than carrier_final_demand.sum()
because the previous one contains demand of all carriers, while the latter only 'transformed' demand of one carrier. At least that is what I get from processing chemicals industry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I must admit that I found this function very confusing when converting it from SCEC's code. @brynpickering please correct if I say something wrong here.
This basically means [all carriers demand added together] divided by [specific carrier efficiency]? But this formula only works for the entries that relate to the defined carrier and any other entries with different carriers will become null.
This is intended. Basically, if a carrier was never used to meet a specific useful demand, it's efficiency (carrier_eff
) should be full of nan
values.
This then becomes an issue for line 165. useful_dem_tot.sum() will likely be larger than carrier_final_demand.sum() because the previous one contains demand of all carriers, while the latter only 'transformed' demand of one carrier. At least that is what I get from processing chemicals industry.
I see your point. How about this?
assert jrc_energy["useful].sel(carrier_name=carrier) < carrier_final_demand.sum(), "Creating energy!"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tud-mchen6 that being said, I think this function is only used in the "other"/combined industries processing, even in the old SCEC code. Just please make sure that this is the actual function you wish to use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tud-mchen6 that being said, I think this function is only used in the "other"/combined industries processing, even in the old SCEC code. Just please make sure that this is the actual function you wish to use.
As I understand, that is not true. The old SCEC code uses electrical_consumption
in the function get_chem_energy_consumption
, and electrical_consumption
is generated by the function get_carrier_demand
, which is a similar version to get_carrier_demand
in jrc_idees_parser.py
in the old industry module.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I must admit that I found this function very confusing when converting it from SCEC's code. @brynpickering please correct if I say something wrong here.
This basically means [all carriers demand added together] divided by [specific carrier efficiency]? But this formula only works for the entries that relate to the defined carrier and any other entries with different carriers will become null.
This is intended. Basically, if a carrier was never used to meet a specific useful demand, it's efficiency (
carrier_eff
) should be full ofnan
values.This then becomes an issue for line 165. useful_dem_tot.sum() will likely be larger than carrier_final_demand.sum() because the previous one contains demand of all carriers, while the latter only 'transformed' demand of one carrier. At least that is what I get from processing chemicals industry.
I see your point. How about this?
assert jrc_energy["useful].sel(carrier_name=carrier) < carrier_final_demand.sum(), "Creating energy!"
Your suggestion at the end looks good, except that you may need to add a .sum()
to the left hand side?
Fixes #309 .
Adding "other" industry in the industry module.
#340 is a prerequisite for this PR.
Checklist
Any checks which are not relevant to the PR can be pre-checked by the PR creator. All others should be checked by the reviewer. You can add extra checklist items here if required by the PR.