Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add "other" industry demand #355

Merged

Conversation

tud-mchen6
Copy link

Fixes #309 .

Adding "other" industry in the industry module.

#340 is a prerequisite for this PR.

Checklist

Any checks which are not relevant to the PR can be pre-checked by the PR creator. All others should be checked by the reviewer. You can add extra checklist items here if required by the PR.

  • CHANGELOG updated
  • Minimal workflow tests pass
  • Tests added to cover contribution (not relevant)
  • Documentation updated (not relevant)
  • Configuration schema updated (not relevant)

Copy link
Member

@brynpickering brynpickering left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tud-mchen6 could you confirm that the only change in this PR compared to #340 is the chemicals_industry.py and other_industry.py files?

modules/industry/src/chemicals_industry.py Outdated Show resolved Hide resolved
modules/industry/src/chemicals_industry.py Outdated Show resolved Hide resolved
modules/industry/src/chemicals_industry.py Outdated Show resolved Hide resolved
modules/industry/src/chemicals_industry.py Outdated Show resolved Hide resolved
modules/industry/src/chemicals_industry.py Outdated Show resolved Hide resolved
Copy link
Member

@brynpickering brynpickering left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just realised that the PR title mentions only "other", so feel free to ignore my "chemical_industry.py" comments.

modules/industry/src/other_industry.py Outdated Show resolved Hide resolved
modules/industry/src/other_industry.py Outdated Show resolved Hide resolved
modules/industry/src/other_industry.py Outdated Show resolved Hide resolved
@brynpickering brynpickering mentioned this pull request Apr 12, 2024
5 tasks
@tud-mchen6
Copy link
Author

@tud-mchen6 could you confirm that the only change in this PR compared to #340 is the chemicals_industry.py and other_industry.py files?

If compared to the commit of #340 at the same time point, the only change is indeed that other_industry.py is added and relevant rule is added in the industry.smk file.

Now the branch of #340 has developed quite far (e.g. integrating the JRC data processing script), and these branches need to be merged with some effort. However, #340 has never developed the functionality of chemical industry or "other" industry, so functionality wise they are still separated.

@irm-codebase irm-codebase added the Industry Industrial energy demand label Apr 23, 2024
@irm-codebase
Copy link
Contributor

This PR now integrates the JRC module code and xarray processing.
Also, I've added some quality of life processing for "other industries":

  • You can now "turn off" the processing of specific sectors via the configuration. All non specific sectors will be parsed through the "other" rule.
  • You can now select which carriers to extract at a final energy level and at an end-use level.
  • You can select the method of extraction. For now only "priority" is available, which reflects how SCEC does it.

@brynpickering
Copy link
Member

@irm-codebase ready to rebase onto develop now that #340 is merged in!

Copy link
Contributor

@irm-codebase irm-codebase left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@brynpickering ready for review!

CHE pre-processing was leading to some funky issues, so I've added comments on how I dealt with it.

@@ -90,8 +81,11 @@ def fill_missing_countries_years(
_to_fill = _to_fill.bfill(dim="year")
all_filled = _to_fill.ffill(dim="year")

all_filled = jrc.ensure_standard_coordinates(all_filled)
all_filled = all_filled.assign_attrs(units="twh")
# TODO: CHE has no values for "Wood and wood products" and "Transport Equipment".
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CHE was triggering assert failures due to some missing data. For now I am just assuming those values are 0 (same as SCEC).

This is mostly because the CHE processing above this module does not seem to provide data for these sectors, meaning they are filled with nan in all years, so none of our filling methods work.

Let me know what you think.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we don't have much choice on that. I assume they are in "other industry" in the CHE data so we can't extract them.

Copy link
Member

@brynpickering brynpickering left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just some minor changes to make. I'll go through it again tomorrow by running the code and checking the results at different points. Would be good if a data consistency check existed somewhere, i.e. that no energy demand is lost / added.

@@ -9,5 +9,10 @@ industry:
placeholder-out1:
placeholder-out2:
params:
specific-industries: ["Iron and steel", "Chemicals Industry"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rename this param. "specific" isn't very descriptive and "industries" might be better as "subsectors". subsectors-to-decarbonise, subsectors-to-electrify, electrified-subsectors, ... ? I don't like any of those particularly but maybe they can trigger a better idea 😅

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

separated-subsectors, subsectors-to-process-individually?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed it to non-generic-categories and changed other to generic-config.
Hopefully this makes processing clearer.

output:
path_output = f"{BUILD_PATH}/annual_demand_steel.nc"
script: f"{SCRIPT_PATH}/steel_industry.py"
if "Iron and steel" in config["params"]["specific-industries"]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the best way to add this conditionality? @timtroendle maybe you can comment.

The other approach would be to use a conditional list as inputs in a later rule, e.g.:

rule merge_industry_demands:
  input:
    specific_industries = expand(f"{BUILD_PATH}/annual_demand_{subsector}.nc", subsector=subsector_translator(config["params"]["specific-industries"]))

where subsector_translator is a helper function to map e.g. Steel and Iron to steel.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I have something useful to say as I am not fully aware of the context. I wonder: Why would steel industry be excluded? Is there a use-case for that? If not, then there is no conditionality needed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it is not a "specific" industry then it gets automatically lumped in as "other". So it's possible to overhaul the steel industry to decarbonise feedstocks or to just pipe all demands without converting any processes

Copy link
Contributor

@irm-codebase irm-codebase Jun 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the best way to add this conditionality? @timtroendle maybe you can comment.

The other approach would be to use a conditional list as inputs in a later rule, e.g.:

rule merge_industry_demands:
  input:
    specific_industries = expand(f"{BUILD_PATH}/annual_demand_{subsector}.nc", subsector=subsector_translator(config["params"]["specific-industries"]))

where subsector_translator is a helper function to map e.g. Steel and Iron to steel.

I like this approach because it's pretty easy to follow and does not add much complexity.
For now I'll keep it as-is since the merging step needs development (SCEC also has a scaling step there).

modules/industry/industry.smk Show resolved Hide resolved
conda: CONDA_PATH
params:
config_params = config["params"],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a moment to separate out params. Pass specific-industries and other to the script individually so that e.g. steel param changes don't re-trigger this rule.

input:
output: f"{BUILD_PATH}/other_industry.csv"
path_energy_balances = config["inputs"]["path-energy-balances"],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since inputs are all paths, I tend to prefer not prepending with path_ here and path- in the config.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to keep path to avoid confusion between variables holding data and those holding strings.
A bit more explicit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but if all inputs hold strings...? How about config["input-paths"]["energy-balances"] etc?

path_jrc_industry_production: str,
path_output: Optional[str] = None,
) -> xr.DataArray:
"""Execute the default data processing pipeline all non-specific industries.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"""Execute the default data processing pipeline all non-specific industries.
"""Merge all industries not selected for individual processing into a single `other` subsector using a default data processing pipeline.

# Process data:
# Extract useful dem. -> remove useful dem. from rest -> extract final dem.
selected_useful = config_params["other"]["useful-demands"]
other_useful_demand = jrc.convert_subsec_demand_to_carrier(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

subsec -> subsector. Not worth the lack of readibility caused by removal of letters


final_method = config_params["other"]["final-energy-method"]
jrc_energy = jrc_energy.drop_sel(subsection=selected_useful)
if final_method == "priority":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there other methods on the nearterm horizon? If not, it doesn't seem worth introducing this feature.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added another method that just keeps all the final demand without assumptions. There are probably better methods to process this part, but these two are good enough for now.

I changed the check to a match-case style so this can grow over time.

)

# Fix the naming
for carrier in JRC_TO_CALLIOPE:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is quite verbose. Something like:

other_demand.coords["carrier_name"] = other_demand["carrier_name"].to_series().rename(index=JRC_TO_CALLIOPE).index

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've actually removed the renaming step from this file. It makes more sense to do it once we merge all category files into one and re-scale it (if necessary).

def transform_final_demand_by_priority(
jrc_energy: xr.Dataset, carrier_priority: list[str]
) -> xr.DataArray:
"""Transform final demand of all sectors by giving priority to certain carriers.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

subsectors

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to "category" (see prev. comment).

@irm-codebase
Copy link
Contributor

@brynpickering I've implemented your comments, and a couple of extras.

The biggest updates are:

  • improved names of stuff to reduce ambiguity
  • standardized naming from "sector" to "category" to match how JRC names things
  • carrier names are no longer modified (better do it when aggregating/rescaling files)
  • added an additional processing option for final energy demand

Copy link
Member

@brynpickering brynpickering left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm almost happy for this one to go through! Just a couple of minor comments on naming and unit checking.

@@ -9,5 +9,10 @@ industry:
placeholder-out1:
placeholder-out2:
params:
steel:
non-generic-categories: ["Iron and steel", "Chemicals Industry"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still concerned that these will cause confusion down the line. How about "discrete-categories" and "merged-categories"?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about "explicit-categories" or "explicit-subsectors" or even "explicitly-modelled-subsectors"?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought of "explicit" initially too. I then had a look in the dictionary and decided it wasn't quite right (it's more about leaving nothing implied). I'm not sure that "discrete" is right either. "separate"? "independent"?

Copy link
Contributor

@irm-codebase irm-codebase Jun 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm... I also struggled with names. A good name has to do two things:

  • Specify that these are specific / individual to a given category.
  • Imply that all the rests will go to the "other" / generic / merged.

Here are words with the antonym, based on your comments.

  • Specific / generic categories
  • Separate / combined categories < --- I like this one the best.
  • Independent / dependent categories
  • Explicit / implicit categories
  • Discrete / joined categories

I suggest "separate-category-processing" and "combined-category-processing".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally think "subsectors" is clearer, because "category" is generic. "category" is also an implementation detail: should we move to a different data source than JRC, that name would change. Anyways, "category" is fine by me.

I like the "separate" and "combined" distinction!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@timtroendle : I also do not like category, but keeping the code close to the data helps in this case because JRC-IDEES is already hard to wrap your head around due to the amount of columns it has (sector/category/subcategory/process/energy type...)

A different data source is going to require a different approach. This (and the JRC module above it) are just very tied to how the data was constructed. I think this is unavoidable for things as heterogeneous as industry.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, "Separate" / "combined" sounds good!

recycled-steel-share: 0.5 # % of recycled scrap steel for H-DRI
generic-config:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"merged-categories-config"?

input:
output: f"{BUILD_PATH}/other_industry.csv"
path_energy_balances = config["inputs"]["path-energy-balances"],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but if all inputs hold strings...? How about config["input-paths"]["energy-balances"] etc?

output: f"{BUILD_PATH}/other_industry.csv"
script: f"{SCRIPT_PATH}/other_industry.py"
path_output = f"{BUILD_PATH}/annual_demand_generic.nc"
script: f"{SCRIPT_PATH}/generic_processing.py"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"merged_category_processing.py"? I would always have category added to whatever adjective we choose!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. Why not use "subsector" instead of "category"?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@irm-codebase is using "category" as it aligns with the JRC-IDEES naming convention. I don't really mind if it's subsector or category, so long as it remains the same throughout the whole submodule.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@timtroendle: Bryn is right on this one. Category is not a word we use often, but it makes sense int the context of JRC-IDEES data. I'd like to keep it that way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will it become subsector at some point in the module, as it gets closer to being a Calliope model input?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that would make sense! the idea is:
jrc categories -> specific/combined scripts -> combine/scale/disaggregate scripts -> calliope subsector

jrc_prod = jrc_prod.drop_sel(cat_name=non_generic_categories)

# Process data:
# Extract useful dem. -> remove useful dem. from rest -> extract final dem.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dem -> demand

other_final_demand = transform_final_demand_by_priority(
jrc_energy, generic_config["final-energy-carriers"]
)
case "keep everything":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this doesn't lead to any double counting?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An assert statement would be helpful.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@brynpickering it shouldn't. It's basically "assume nothing, do nothing with the data and just give me the final demand.

If a useful demand is requested, the lines above should remove it. So no double counting is possible.

@@ -90,8 +81,11 @@ def fill_missing_countries_years(
_to_fill = _to_fill.bfill(dim="year")
all_filled = _to_fill.ffill(dim="year")

all_filled = jrc.ensure_standard_coordinates(all_filled)
all_filled = all_filled.assign_attrs(units="twh")
# TODO: CHE has no values for "Wood and wood products" and "Transport Equipment".
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we don't have much choice on that. I assume they are in "other industry" in the CHE data so we can't extract them.

@@ -64,16 +92,15 @@ def get_subsection_final_intensity(
final_intensity = useful_intensity / carrier_eff

# Prettify
final_intensity = ensure_standard_coordinates(final_intensity)
final_intensity = final_intensity.assign_attrs(units="twh/kt")
final_intensity = standardize(final_intensity, "twh/kt", name="final_intensity")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it worth checking the unit of the incoming data before setting this to twh/kt. I.e., useful_demand should have the unit twh.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm actually setting it because xarray, for some horrible reason, decided to actually delete attributes by default.
The library is limited to handling only twh as inputs.

I'll add a check at the start to ensure this is specified.

Another option would be to use pint (https://xarray.dev/blog/introducing-pint-xarray), but without standardising this on the JRC side there is not much benefit...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm actually setting it because xarray, for some horrible reason, decided to actually delete attributes by default.

At what point in the process?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Operations in general will delete xarray attributes: https://docs.xarray.dev/en/stable/getting-started-guide/faq.html#what-is-your-approach-to-metadata

So, basically everywhere?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can change this in the xarray options, but I do not think it's a good idea unless we implement an actual unit checker. Better to lose them and mindfully set them than risk leaving invalid attributes during our operations.

useful_intensity = ensure_standard_coordinates(useful_intensity)
useful_intensity = useful_intensity.assign_attrs(units="twh/kt")
useful_intensity.name = "useful_intensity"
useful_intensity = standardize(useful_intensity, "twh/kt", name="useful_intensity")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above RE checking unit

Copy link
Member

@timtroendle timtroendle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me overall. I have a few minor comments.

CHANGELOG.md Outdated
@@ -4,7 +4,7 @@

### Added (models)

* **ADD** industry module and steel industry energy demand processing. NOT CONNECTED TO THE MAIN WORKFLOW. Industry sectors pending: chemical, "other". (Fixes #308, #310, #347, #345 and #346)
* **ADD** industry module and steel industry energy demand processing. NOT CONNECTED TO THE MAIN WORKFLOW. Industry sectors pending: chemical. (Fixes #308, #309, #310, #347, #345 and #346)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be "ADD industry module including steel and other industry energy demand"?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated it with a simpler message. I'll add chemical industry once that is done.

@@ -9,5 +9,10 @@ industry:
placeholder-out1:
placeholder-out2:
params:
steel:
non-generic-categories: ["Iron and steel", "Chemicals Industry"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about "explicit-categories" or "explicit-subsectors" or even "explicitly-modelled-subsectors"?

@@ -13,36 +13,46 @@ validate(config, "./schema.yaml")

# Ensure rules are defined in order.
# Otherwise commands like "rules.rulename.output" won't work!
rule steel_industry:
message: "Calculate energy demand for the 'Iron and steel' sector in JRC-IDEES."
if "Iron and steel" in config["params"]["non-generic-categories"]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The condition doesn't seem necessary to me. Instead of making the rule conditional, make the inputs to a rule downstream conditional.

So, if "iron and steel" is in the config, then a downstream rule will require the file f"{BUILD_PATH}/annual_demand_steel.nc". You will need that anyways, right? Would be good to see how this is integrated eventually.

path_output = f"{BUILD_PATH}/annual_demand_steel.nc"
script: f"{SCRIPT_PATH}/steel_processing.py"

if "Chemicals Industry" in config["params"]["non-generic-categories"]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above.

output: f"{BUILD_PATH}/other_industry.csv"
script: f"{SCRIPT_PATH}/other_industry.py"
path_output = f"{BUILD_PATH}/annual_demand_generic.nc"
script: f"{SCRIPT_PATH}/generic_processing.py"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. Why not use "subsector" instead of "category"?

carrier_eff = carrier_tot["useful"] / carrier_tot["final"]

# Fill NaNs (where there is demand, but no consumption in that country)
# First by country avg. (all years), then by year avg. (all countries).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment should have a ASSUME statement so we can find it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I've also added it to other parts.

})

# Prettify
new_carrier_useful_dem = standardize(new_carrier_useful_dem, "twh")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be inlined with line 188.

jrc_prod = xr.open_dataarray(path_jrc_industry_production)

# Remove data from all specifically processed industries
cat_names_df = cat_names_df[~cat_names_df["jrc_idees"].isin(non_generic_categories)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to explicitly list the "generic_categories"/"generically_modelled_subsectors" instead of using all that are non_generic?

This would (1) document better which subscectors are included here, (2) safe-guard that list to possible changes in the list in the future.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather avoid that, because it introduces the risk of not processing an added category.
Besides, JRC-IDESS is unfortunately quite old (2015), and this processing is very tied to it. Do not see this is a future issue.

I have my sights on this other dataset: https://iopscience.iop.org/article/10.1088/2753-3751/ad4e39
but it is going to require a different module with its own steps.

other_final_demand = transform_final_demand_by_priority(
jrc_energy, generic_config["final-energy-carriers"]
)
case "keep everything":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An assert statement would be helpful.


other_demand = jrc.standardize(other_demand, "twh")

if path_output:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this for? If it's needed, than the typt hint of the return type in the function signature must be updated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you mean path_output, because the function just returns an xarray (to aid in testing). The file is saved only as an option.

Otherwise, the function would always return a None, making testing it harder.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I needed for testing purposes, I'd recommend to let the function return a dataset which is stored somewhere else, as discussed in the last dev call. Otherwise you increase complexity of the function signature.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit puzzled. That is already what this is doing: the return other_demand is an xarray. You could choose to save it or evaluate it directly in a test.

I'll just remove it.

Copy link
Contributor

@irm-codebase irm-codebase left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me know if more fixes are needed

path-carrier-names: config/energy-balances/energy-balance-carrier-names.csv
path-jrc-industry-energy: build/data/jrc-idees/industry/processed-energy.nc
path-jrc-industry-production: build/data/jrc-idees/industry/processed-production.nc
input-paths:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Name change requested by Bryn. This change is also reflected in rule scripts.

conda: CONDA_PATH
params:
config_steel = config["params"]["steel"]
non_generic_categories = config["params"]["non-generic-categories"],
generic_config = config["params"]["generic-config"],
input:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was added to the "combine" rule.

# output:
# script:
SUFFIXES = [i.lower().replace(" ", "_") for i in config["params"]["specific-categories"]]
rule combine_and_scale:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This new rule makes sure that both "combined" and "specific" JRC categories are processed.
In the future they will be combined and scaled (chemicals industry must be finished first).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the requested name changes show here, too.

# Combine and fill missing countries
other_demand = xr.concat(
[other_useful_demand, other_final_demand], dim="carrier_name"
)

assert other_demand.sum() < jrc_energy["final"].sum(), "Potential double counting!"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simple assert to avoid double counting.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be <=?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right! I'll make a quick update.

@@ -6,6 +6,13 @@
STANDARD_COORDS = ["cat_name", "year", "country_code", "carrier_name"]


def check_units(jrc_energy: xr.Dataset, jrc_prod: xr.DataArray) -> None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New method to ensure JRC data has the right format.

carrier_eff = carrier_tot["useful"] / carrier_tot["final"]

# Fill NaNs (where there is demand, but no consumption in that country)
# First by country avg. (all years), then by year avg. (all countries).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I've also added it to other parts.

conda: CONDA_PATH
params:
config_steel = config["params"]["steel"]
non_generic_categories = config["params"]["non-generic-categories"],
generic_config = config["params"]["generic-config"],
input:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the sake of being explicit rather than implicit, I would add them here, too.

conda: CONDA_PATH
params:
config_steel = config["params"]["steel"]
non_generic_categories = config["params"]["non-generic-categories"],
generic_config = config["params"]["generic-config"],
input:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, the "combine" rule is downstream of this, so this won't work, right? I would definitely make the dependencies explicit, then you do not even have to think about this at all.

# Combine and fill missing countries
other_demand = xr.concat(
[other_useful_demand, other_final_demand], dim="carrier_name"
)

assert other_demand.sum() < jrc_energy["final"].sum(), "Potential double counting!"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be <=?

@irm-codebase
Copy link
Contributor

@timtroendle quick fix for the assert case you mentioned. I also updated the name of the files, to make them match the category (did not do it before to avoid breaking the review relations).

The aggregated rule runs with no problems!

@irm-codebase
Copy link
Contributor

No idea why the docs are not working. It's not related to my changes, for what I can see.

@brynpickering brynpickering merged commit 5ad4e0f into calliope-project:develop Jul 9, 2024
4 checks passed
carrier_eff = carrier_eff.fillna(carrier_eff.mean(dim="year"))
carrier_eff = carrier_eff.fillna(carrier_eff.mean(dim="country_code"))

carrier_final_demand = useful_dem_tot / carrier_eff
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This basically means [all carriers demand added together] divided by [specific carrier efficiency]? But this formula only works for the entries that relate to the defined carrier and any other entries with different carriers will become null.
This then becomes an issue for line 165. useful_dem_tot.sum() will likely be larger than carrier_final_demand.sum() because the previous one contains demand of all carriers, while the latter only 'transformed' demand of one carrier. At least that is what I get from processing chemicals industry.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I must admit that I found this function very confusing when converting it from SCEC's code. @brynpickering please correct if I say something wrong here.

This basically means [all carriers demand added together] divided by [specific carrier efficiency]? But this formula only works for the entries that relate to the defined carrier and any other entries with different carriers will become null.

This is intended. Basically, if a carrier was never used to meet a specific useful demand, it's efficiency (carrier_eff) should be full of nan values.

This then becomes an issue for line 165. useful_dem_tot.sum() will likely be larger than carrier_final_demand.sum() because the previous one contains demand of all carriers, while the latter only 'transformed' demand of one carrier. At least that is what I get from processing chemicals industry.

I see your point. How about this?

assert jrc_energy["useful].sel(carrier_name=carrier) < carrier_final_demand.sum(), "Creating energy!"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tud-mchen6 that being said, I think this function is only used in the "other"/combined industries processing, even in the old SCEC code. Just please make sure that this is the actual function you wish to use.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tud-mchen6 that being said, I think this function is only used in the "other"/combined industries processing, even in the old SCEC code. Just please make sure that this is the actual function you wish to use.

As I understand, that is not true. The old SCEC code uses electrical_consumption in the function get_chem_energy_consumption, and electrical_consumption is generated by the function get_carrier_demand, which is a similar version to get_carrier_demand in jrc_idees_parser.py in the old industry module.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I must admit that I found this function very confusing when converting it from SCEC's code. @brynpickering please correct if I say something wrong here.

This basically means [all carriers demand added together] divided by [specific carrier efficiency]? But this formula only works for the entries that relate to the defined carrier and any other entries with different carriers will become null.

This is intended. Basically, if a carrier was never used to meet a specific useful demand, it's efficiency (carrier_eff) should be full of nan values.

This then becomes an issue for line 165. useful_dem_tot.sum() will likely be larger than carrier_final_demand.sum() because the previous one contains demand of all carriers, while the latter only 'transformed' demand of one carrier. At least that is what I get from processing chemicals industry.

I see your point. How about this?

assert jrc_energy["useful].sel(carrier_name=carrier) < carrier_final_demand.sum(), "Creating energy!"

Your suggestion at the end looks good, except that you may need to add a .sum() to the left hand side?

@irm-codebase irm-codebase mentioned this pull request Jul 25, 2024
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Industry Industrial energy demand
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Get demand for "other" industry and process it
4 participants