Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal of expansion of anemoi-wide pre-commit hooks and ruff ruleset for code quality, security and maintainability #10

Open
JesperDramsch opened this issue Jul 22, 2024 · 5 comments
Labels
enhancement New feature or request

Comments

@JesperDramsch
Copy link
Member

Is your feature request related to a problem? Please describe.

Maintainability and code quality are important parts to keep anemoi alive and well and avoid technical debt.

Pre-commit hooks are an easy way to automatically flag and possibly fix these issues, but above this we can run pre-commit in Github Actions to automatically check code compliance across many definitions.

We already implement many pre-commit hooks, but from experience I would propose the following hooks in addition:

Describe the solution you'd like

  1. Use pygrep-hooks to enforce type annotations, check for blanket noqa's (they should be specific to what they exempt) and check for log.warns
-   repo: https://github.com/pre-commit/pygrep-hooks
    rev: v1.10.0  # Use the ref you want to point at
    hooks:
    -   id: python-use-type-annotations # Check for missing type annotations
    -   id: python-check-blanket-noqa # Check for # noqa: all
    -   id: python-no-log-warn # Check for log.warn
  1. Use docsig to check docstrings against function signature
-   repo: https://github.com/jshwi/docsig # Check docstrings against function sig
    rev: v0.44.2
    hooks:
    -   id: docsig
        args:
        - --ignore-no-params # Allow docstrings without parameters
        - --check-dunders    # Check dunder methods
        - --check-overridden # Check overridden methods
        - --check-protected  # Check protected methods
        - --check-class      # Check class docstrings
        - --disable=E113     # Disable empty docstrings
        - --summary          # Print a summary

There is no formal check that docstrings actually represent the function they describe. The additional settings make the docstring checking only necessary, when parameters are set and doesn't fail when no docstring is set, as a fairly lenient implementation. (This can be a point of discussion, whether we want to force docstrings and parameter descriptions.)

  1. I would suggest expanding the ruff ruleset to ALL, which expands the checking to a wide variety of possible problems.

By default ruff includes ruleset E and F which are pydocstyle errors and flake8 errors.

There are many more rulesets that improve the overall code quality and should be considered. We could also enable those explicitly, or simply rely on ALL to include the full expansion of community accepted best practices.

From experimentation I had also disabled some specific rulesets:

"E203", whitespace before punctuation

"D100", missing docstring in public module
"D101", missing docstring in public class
"D102", missing docstring in public method
"D103", missing docstring in public function
"D104", missing docstring in public package
"D105", missing docstring in magic method
"D401", First line of docstring written in imperative mood

"S101", Use of assert (asserts are usually good in our case)

"PT018", Composite pytest asserts

These could be discussed, as empty docstrings may actually not be wanted in the framework.

Describe alternatives you've considered

Alternatives would be to enable each package we want individually:

Currently in AIFS we are working with these:

select = [
    "A",    # flake8 builtins
    "B",    # flake8-bugbear
    "D",    # pydocstyle
    "E",    # pycodestyle error
    "W",    # pycodestyle warning
    "F",    # flake8 error
    "UP",   # Pyupgrade
    "SIM",  # flake8-simplify
    "N",    # pep8 naming
    "YTT",  # flake8-2020
    "S",    # bandit
    "COM",  # Commas
    "C4",   # Comprehensions
    "DTZ",  # Datetimes
    "ISC",  # Implicit string concatenation
    "ICN",  # Import conventions
    "LOG",  # Logging
    "PIE",  # Misc lints
    "T20",  # Print statements
    "PT",   # Pytest
    "Q",    # Quotes
    "RSE",  # Raises
    "TID",  # Tidy imports
    "PTH",  # Use Pathlib
    "PGH",  # Pygrep hooks
    "R",    # Refactor
    "FLY",  # Fstrings
    "PERF", # Perfomance linting
    "FURB", # Modernising code
    "RUF",  # Ruff specific
    "NPY",  # Numpys
    # "PL",  # Pylint
    # "TD",  # Todos
    # "FBT", # Boolean traps
    # "CPY", # Copyright
]

Only disabling four, which would still be useful for a framework. The boolean traps ruleset might make coding harder however, and might necessitate some refactoring of parts of the code.

Alternatively we can introduce pre-commit hooks these rulesets implement:

  1. Use bandit for common code vulnerabilities (implemented in ruleset S):
-   repo: https://github.com/pycqa/bandit # Check code for common security issues
    rev: 1.7.7
    hooks:
    -   id: bandit
  1. Use docformatter to enforce consistent formatting of docstrings (ruleset from pydocstyle D, E, W):
-   repo: https://github.com/PyCQA/docformatter # Format docstrings
    rev: v1.7.5
    hooks:
    -   id: docformatter
        args:
        - -s numpy
        - --black
        - --in-place
  1. Use pyupgrade to automatically upgrade Python syntax from older patterns (e.g. upgrading from percent style formatting '%s %s' % (a, b) to '{} {}'.format(a, b))
-   repo: https://github.com/asottile/pyupgrade # Upgrade Python syntax
    rev: v3.15.1
    hooks:
    -   id: pyupgrade
        args:
        - --py38-plus

Additional context

Bandit or ruleset S especially may expose vulnerabilities in code, which makes our life as maintainers a little easier.

Organisation

ECMWF

@JesperDramsch JesperDramsch added the enhancement New feature or request label Jul 22, 2024
@leifdenby
Copy link

Just commenting to say I strongly support adding these (particularly type annotations and docstrings) - sounds great! 🥳 . I missed the technical subgroup meeting where pre-commit was discussed due to holiday.

@JesperDramsch
Copy link
Member Author

Thanks @leifdenby!

I actually have an ammendment, we have experimentally switched on ALL during our work to factor out anemoi-training. And some of the ruleset is way too strict unfortunately. (Specifically the Pylint Conventions in C are a nightmare to work with and some of the annotations are over the top.)

I would like to amend the proposal to:

select = [
  "A", # flake8 builtins
  # "ANN", # flake8-annotations
  "ANN001", # Type annotation
  "ANN201", # Return type public
  "ANN202", # Return type private
  "ANN205", # Return type static method
  "ANN206", # Return type class method
  "ARG",    # Unused arguments
  "B",      # flake8-bugbear
  "BLE",    # Non-specific exceptions
  "C4",     # Comprehensions
  "C90",    # Mccabe complexity
  "COM",    # Commas
  "CPY",    # Copyright
  # "C",  # pylint convention
  "D", # pydocstyle
  # "DOC", # pydoclint
  "DTZ", # Datetimes
  "E",   # pycodestyle error
  "EM",  # Error messages
  "ERA", # Found unused code
  "F",   # flake8 error
  "FA",  # Future annotations
  # "EXE", # Executable
  # "FBT", # Boolean traps
  "FLY",  # Fstrings
  "FURB", # Modernising code
  "G",    # flake8 logging format
  "ICN",  # Import conventions
  "ISC",  # Implicit string concatenation
  "LOG",  # Logging
  # "I", # isort
  "N",    # pep8 naming
  "NPY",  # Numpys
  "PERF", # Perfomance linting
  "PGH",  # Pygrep hooks
  "PIE",  # Misc lints
  # "PL",  # Pylint (superset of C, E, R, W)
  "PT",  # Pytest
  "PTH", # Use Pathlib
  "Q",   # Quotes
  "R",   # Refactor
  "RET", # Return statements
  "RSE", # Raises
  "RUF", # Ruff specific
  "S",   # bandit !IMPORTANT
  "SIM", # flake8-simplify
  "T10", # Debug statements
  "T20", # Print statements
  "TCH", # Type checking blocks
  "TD",  # Todos
  "TID", # Tidy imports
  "TRY", # Exception handling antipatterns
  "UP",  # Pyupgrade
  "W",   # pycodestyle warning
  "YTT", # flake8-2020
]

with these specific ignores:

ignore = [
  "B018",
  "B028",
  "D100",
  "D101",
  "D102",
  "D103",
  "D104",
  "D105",
  "D401",
  "E203",
  "PT018",
  "S101",
  "TD003",
] 

Since you noted annotations @leifdenby, here is my reasoning to use a subset:

Annotations for
✅ ANN001 - function inputs
❌ ANN002 - *args
❌ ANN003 - **kwargs
❌ ANN101 - deprecated will be removed
❌ ANN102 - deprecated will be removed
return types:
✅ ANN201 - public functions
✅ ANN202 - private functions
❌ ANN204 - special ( init, call etc) method
✅ ANN205 - static method
✅ ANN206 - class method
Any:
❌ ANN401 - disallow Any

Reasoning:
I think annotating *args/**kwargs is a bit over the top, as they could always just be list/dict, which defeats the purpose.
init doesn't necessarily need a return type and we don't use call a lot, can be convinced either way.
Disallowing Any when we're enforving typehints seems like a nasty move.


I chose the "explicit add" route, which is quite verbose, but I think it's quite useful to see all the options. But I can be convinced that ALL with ignore C, ... would be better. I like the explicitly though. (This is a slightly extended ruleset from what we had been using on AIFS for a while, which seems to improve code quality without causing people to commit -n constantly.

@JesperDramsch
Copy link
Member Author

We also got the suggestion that adding Conventional Commits would be a great addition as a pre-commit hook.

  - repo: https://github.com/compilerla/conventional-pre-commit
    rev: <git sha or tag>
    hooks:
      - id: conventional-pre-commit
        stages: [commit-msg]
        args: []

We had been using those across some internal repos already anyways, so i would be in favour.

@floriankrb
Copy link
Member

IMO, one point to have in mind in configuring ruff is that we should keep refactoring easy and fast.

This is especially true in the early stages of development where flexibility is crucial, but stays valid as long as the code is alive. As an example, requesting having a docstring for every function would not make sense and I don't think anybody wants this.
This applies also to annotations (or unit tests, or documentation) and I hope we will find the right trade-off between writing code fast and writing production-ready code.

@floriankrb
Copy link
Member

And duck typing should be kept possible too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants