Filter external repos #396

Open: wants to merge 12 commits into base: main

@phackstock (Contributor) commented Sep 19, 2024

Closes #326.

This PR adds the feature to filter external repositories using include and exclude filters.
Any number of filters, combining any attributes, can be defined. For example:

```yaml
repositories:
  common-definitions:
    url: https://github.com/IAMconsortium/common-definitions.git/
definitions:
  variable:
    repository:
      name: common-definitions
      include:
        - name: [Primary Energy*, Final Energy*]
        - name: "Population*"
          tier: 1
      exclude:
        - name: "Final Energy|*|*"
  region:
    repository:
      name: common-definitions
      include:
        - hierarchy: [R5, R10]
```

For the variable section in the example above, we are including:

  1. All variables starting with Primary Energy or Final Energy
  2. All variables starting with Population and with the tier attribute equal to 1

From this list we then exclude all variables that match "Final Energy|*|*".
This means that the final list will contain no Final Energy variables with
three or more levels (i.e. names with at least two "|" separators).

For the region section, we are taking only the R5 and R10 regions.
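The include/exclude semantics described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code in this PR: each code is assumed to be a flat dict of attributes; a filter matches when every attribute it lists matches (AND), a list of values means "any of these" (OR); a code is kept if it matches any include filter and is then dropped if it matches any exclude filter. String patterns are matched with `fnmatch.fnmatchcase` because the review indicates the PR relies on fnmatch. The helper names `value_matches`, `matches_filter` and `apply_filters`, and all codes and region names, are hypothetical.

```python
from __future__ import annotations

from fnmatch import fnmatchcase
from typing import Any

# Hypothetical shapes: a code is a flat dict of attributes, and a filter maps
# attribute names to one allowed value/pattern or a list of them.
Code = dict[str, Any]
Filter = dict[str, Any]


def value_matches(value: Any, pattern: Any) -> bool:
    """Glob-match strings (case-sensitive); compare other types for equality."""
    if isinstance(value, str) and isinstance(pattern, str):
        return fnmatchcase(value, pattern)
    return value == pattern


def matches_filter(code: Code, flt: Filter) -> bool:
    """True if every attribute in the filter matches (AND across attributes)."""
    for attribute, allowed in flt.items():
        if attribute not in code:
            return False
        # A list means "any of these" (OR within one attribute).
        candidates = allowed if isinstance(allowed, list) else [allowed]
        if not any(value_matches(code[attribute], p) for p in candidates):
            return False
    return True


def apply_filters(
    codes: list[Code],
    include: list[Filter] | None = None,
    exclude: list[Filter] | None = None,
) -> list[Code]:
    """Keep codes matching any include filter, then drop any matching an exclude filter."""
    if include:
        codes = [c for c in codes if any(matches_filter(c, f) for f in include)]
    if exclude:
        codes = [c for c in codes if not any(matches_filter(c, f) for f in exclude)]
    return codes


# Usage mirroring the variable section of the example (hypothetical codes).
variables = [
    {"name": "Primary Energy|Coal"},
    {"name": "Final Energy|Electricity"},
    {"name": "Final Energy|Electricity|Residential"},
    {"name": "Population", "tier": 1},
    {"name": "Population|Urban", "tier": 2},
    {"name": "GDP|MER"},
]
selected = apply_filters(
    variables,
    include=[
        {"name": ["Primary Energy*", "Final Energy*"]},
        {"name": "Population*", "tier": 1},
    ],
    exclude=[{"name": "Final Energy|*|*"}],
)
print([v["name"] for v in selected])
# ['Primary Energy|Coal', 'Final Energy|Electricity', 'Population']

# Usage mirroring the region section (hypothetical region names).
regions = [
    {"name": "World", "hierarchy": "common"},
    {"name": "Region A (R5)", "hierarchy": "R5"},
    {"name": "Region B (R10)", "hierarchy": "R10"},
    {"name": "Country X", "hierarchy": "countries"},
]
selected_regions = apply_filters(regions, include=[{"hierarchy": ["R5", "R10"]}])
print([r["name"] for r in selected_regions])
# ['Region A (R5)', 'Region B (R10)']
```

In this sketch an empty or missing include list keeps every code, which is an assumption rather than something the PR description states.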

Changes

One of the changes I have made is that every repository in the definitions section now needs the name keyword.
This is a breaking change, so all workflow repositories that use external repositories would need to be updated.
I'd be happy to streamline it again so that the plain repository name is still accepted, but for code simplicity I opted against that for now.

@phackstock added the enhancement (New feature or request) label Sep 19, 2024
@phackstock self-assigned this Sep 19, 2024
@danielhuppmann (Member) left a comment

A few suggestions inline.

As with my other review, I’d strongly advise against using fnmatch.
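The review does not spell out the objection to fnmatch. Two documented properties of Python's `fnmatch` module are relevant to variable names: `fnmatch.fnmatch` case-normalizes both arguments with `os.path.normcase` (so it is case-insensitive on Windows), and besides `*` it also treats `?` and `[...]` as wildcards. The sketch below shows one possible alternative, escaping everything except `*` with `re`; it is an assumption for illustration, not necessarily what the reviewer has in mind, and the helper names and the bracketed variable name are hypothetical.

```python
from __future__ import annotations

import re
from fnmatch import fnmatchcase


def wildcard_regex(pattern: str) -> re.Pattern[str]:
    """Compile a pattern in which only '*' is a wildcard; all else is literal."""
    # re.escape turns '*' into '\*'; turn that back into '.*' (any characters).
    return re.compile(re.escape(pattern).replace(r"\*", ".*"))


def wildcard_match(value: str, pattern: str) -> bool:
    """True if the whole value matches the '*'-only wildcard pattern."""
    return wildcard_regex(pattern).fullmatch(value) is not None


# Hypothetical variable name containing square brackets.
name = "Emissions|CO2 [Mt]"

# fnmatch reads "[Mt]" as a character class (one 'M' or 't'), so the name
# does not even match itself.
print(fnmatchcase(name, name))  # False
print(wildcard_match(name, name))  # True

# Both approaches still let '*' match across '|' separators.
print(fnmatchcase("Final Energy|A|B|C", "Final Energy|*|*"))  # True
print(wildcard_match("Final Energy|A|B|C", "Final Energy|*|*"))  # True
```

Note that this alternative only removes the `?`/`[...]` and platform-dependent case behaviour; `*` still spans hierarchy levels.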

```yaml
        - name: "Population*"
          tier: 1
      exclude:
        - name: "Final Energy|*|*"
```
Member

This is a misleading example: it seems to show only level-2 exclusion when in fact it excludes all variables at level 2 or below. Better to use the level argument explicitly.

Contributor Author

It seems this is subjective, then. To me it was clear that this excludes anything at level 2 and beyond.
I can see your point, though, about this being ambiguous.
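A small sketch makes both readings concrete. Here "level" is assumed to be the number of `|` separators in a name (so `Final Energy|Electricity|Residential` is level 2); this definition and the `level` helper are hypothetical and are not the level argument the reviewer refers to, whose exact semantics are not given in this thread.

```python
from fnmatch import fnmatchcase


def level(name: str) -> int:
    """Hypothetical depth measure: the number of '|' separators in a name."""
    return name.count("|")


names = [
    "Final Energy",
    "Final Energy|Electricity",
    "Final Energy|Electricity|Residential",
    "Final Energy|Electricity|Residential|Heating",
]

# Glob pattern from the PR example: '*' also matches '|', so this catches
# level 2 and every deeper level, not just level 2.
glob_excluded = [n for n in names if fnmatchcase(n, "Final Energy|*|*")]

# Explicit alternatives that make the intended depth visible.
final_energy_children = [n for n in names if n.startswith("Final Energy|")]
only_level_2 = [n for n in final_energy_children if level(n) == 2]
level_2_and_deeper = [n for n in final_energy_children if level(n) >= 2]

print(glob_excluded == level_2_and_deeper)  # True
print(only_level_2)  # ['Final Energy|Electricity|Residential']
```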

@@ -0,0 +1,17 @@
repositories:
Member

I suggest adding more structure to the validation test data by using subfolders.

Contributor Author

Sure, good idea.

Contributor Author

#399 implements a cleanup of the test data folder. Once that PR is merged, I'll rebase this one.

@@ -126,12 +181,22 @@ def repos(self) -> dict[str, str]:
```python
        }


class MappingRepository(BaseModel):
    name: str
```
Member

The mapping will also have to inherit the region filters, right? Otherwise, a model could map to a region that is not included in the DataStructureDefinition.
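The concern can be expressed as a consistency check between a model mapping and the filtered region list. This is a minimal sketch under assumptions: the mapping is modelled as a dict from common-region name to its constituent native regions, and the function name and all region names are hypothetical. Whether the mapping should inherit the filters or merely be validated against them is a design choice this thread leaves open.

```python
from __future__ import annotations


def undefined_common_regions(
    common_regions: dict[str, list[str]], allowed_regions: set[str]
) -> list[str]:
    """Return common regions targeted by a mapping but absent from the definitions."""
    return sorted(name for name in common_regions if name not in allowed_regions)


# Regions left in the DataStructureDefinition after a filter such as
# `hierarchy: [R5]` (hypothetical names).
allowed = {"World", "Asia (R5)", "OECD & EU (R5)"}

# A model mapping: common region -> constituent native regions (hypothetical).
mapping = {
    "World": ["model_a|China", "model_a|India", "model_a|Europe"],
    "Asia (R5)": ["model_a|China", "model_a|India"],
    "India (R10)": ["model_a|India"],
}

print(undefined_common_regions(mapping, allowed))  # ['India (R10)']
```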

```diff
@@ -5,5 +5,5 @@ repositories:
     url: https://github.com/IAMconsortium/legacy-definitions.git/
 mappings:
   repositories:
-    - common-definitions
-    - legacy-definitions
+    - name: common-definitions
```
Member

Is the name now required? That could cause a lot of headaches if it is now mandatory…

Contributor Author

As I wrote in the PR description, as it stands now it would be required.
I can change the pydantic parsing so that repositories can still be specified without the name attribute.

Member

Yes, better to add a field validator that translates a plain string to name: {value}, to avoid breaking current workflows.
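A minimal sketch of the suggested translation, written as plain functions so it runs without pydantic. In a pydantic v2 model this logic would typically be called from a `@field_validator(..., mode="before")` classmethod so that it runs before field parsing. The function names are hypothetical; accepting both a single entry and a list mirrors the `repository:` and `repositories:` keys in the configuration examples in this PR.

```python
from __future__ import annotations

from typing import Any


def normalize_repository_entry(value: Any) -> dict[str, Any]:
    """Accept a bare repository name or a mapping that has a 'name' key."""
    if isinstance(value, str):
        # Legacy form, e.g. "common-definitions" -> {"name": "common-definitions"}
        return {"name": value}
    if isinstance(value, dict) and "name" in value:
        # New form: keep any include/exclude filters untouched.
        return value
    raise ValueError(f"expected a repository name or a mapping with 'name', got {value!r}")


def normalize_repositories(value: Any) -> list[dict[str, Any]]:
    """Accept a single entry or a list of entries and return a list of mappings."""
    entries = value if isinstance(value, list) else [value]
    return [normalize_repository_entry(entry) for entry in entries]


print(normalize_repositories(["common-definitions", "legacy-definitions"]))
# [{'name': 'common-definitions'}, {'name': 'legacy-definitions'}]
print(normalize_repositories({"name": "common-definitions", "include": [{"hierarchy": ["R5", "R10"]}]}))
# [{'name': 'common-definitions', 'include': [{'hierarchy': ['R5', 'R10']}]}]
```

With this in place, existing files using the bare-name form keep working, while the new form can carry include and exclude filters.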

Labels: enhancement (New feature or request)

Successfully merging this pull request may close this issue: Allow attribute filtering in nomenclature.yaml for importing definitions form external repo

2 participants