ci: Run Pylint strictly on new files, softly on history #11212

Open · wants to merge 21 commits into base: main
105 changes: 99 additions & 6 deletions .github/workflows/code-formatting.yml
@@ -10,7 +10,7 @@ name: Isort and Black Formatting; PyLint Docs check
# For details see https://github.com/orgs/community/discussions/25702

on:
-pull_request_target:
+pull_request: #pull_request_target:
paths:
- '**.py'
types: [ opened, synchronize, reopened, labeled, unlabeled ]
@@ -20,7 +20,7 @@ defaults:
shell: bash -x -e -u -o pipefail {0}

jobs:
-reformat_with_isort_and_black_and_pylint:
+reformat_with_isort_and_black:
runs-on: ubuntu-latest
permissions:
# write permissions required to commit changes
@@ -72,16 +72,109 @@ jobs:
message: Apply isort and black reformatting
commit: --signoff

check_pylint:
name: "check_pylint (strict-mode: ${{ matrix.strict-mode }})"
runs-on: ubuntu-latest
permissions:
contents: write
pull-requests: write
env:
THRESHOLD: 1730937600 # 2024-11-07, the date we decided to adopt Pylint. Strict mode applies only to files first added after this date; for older files the workflow only posts a PR comment with Pylint's stdout.
strategy:
matrix:
strict-mode: ["true", "false"]
steps:
- name: Checkout branch
uses: actions/checkout@v4
with:
# setup repository and ref for PRs, see
# https://github.com/EndBug/add-and-commit?tab=readme-ov-file#working-with-prs
repository: ${{ github.event.pull_request.head.repo.full_name }}
ref: ${{ github.event.pull_request.head.ref }}

# https://github.com/tj-actions/changed-files
- name: Get changed files
id: changed-files
uses: tj-actions/changed-files@v44
with:
files: |
**.py

- name: Setup Python env
uses: actions/setup-python@v5
with:
python-version: "3.10"

- name: pylint
if: ${{ steps.changed-files.outputs.any_changed == 'true' && !contains( github.event.pull_request.labels.*.name, 'skip-docs') }}
id: pylint
env:
# only *.py files included
STRICT_MODE: ${{ matrix.strict-mode }}
CHANGED_FILES: "${{ steps.changed-files.outputs.all_changed_files }}"
run: |
pip install pylint

ADDITIONAL_PYLINT_ARGS=()
echo "${{ join(github.event.pull_request.labels.*.name, ', ') }}" # debug: list PR labels
FILTERED=()
for file in $CHANGED_FILES; do
DATE=$(git log --follow --format=%ad --date=unix $file | tail -1)

if [[ "$STRICT_MODE" == "true" ]]; then
if [[ "$DATE" -gt "$THRESHOLD" ]]; then
FILTERED+=("$file")
fi
else
FILTERED+=("$file")
fi
done

if [ ${#FILTERED[@]} -eq 0 ]; then
echo "No files to check."
exit 0
fi

echo "Will run on these files:
${FILTERED[@]}"

set +e
LOG=$(pylint ${FILTERED[@]})
EXIT_CODE=$?
set -e

echo "$LOG"
if [[ "$EXIT_CODE" -ne 0 && "$STRICT_MODE" == "true" ]]; then
echo $EXIT_CODE
else
echo "OUTPUT<<EOF" >> $GITHUB_ENV
echo "$LOG" >> $GITHUB_ENV
echo "EOF" >> $GITHUB_ENV

echo "log=$LOG"
fi
exit $([[ "$EXIT_CODE" -ne 0 && "$STRICT_MODE" == "true" ]] && echo $EXIT_CODE || echo 0)
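The final `exit` line above compresses the pass/fail decision into a command substitution. A standalone sketch of the same pattern (hypothetical `demo_exit` helper, run outside CI) shows how the two matrix legs diverge on the same pylint status:

```shell
#!/usr/bin/env bash

# Standalone mirror of the workflow's conditional exit: propagate pylint's
# status only when it failed AND the job runs in strict mode.
demo_exit() {
  local exit_code="$1" strict_mode="$2"
  ( exit $([[ "$exit_code" -ne 0 && "$strict_mode" == "true" ]] && echo "$exit_code" || echo 0) )
  echo $?   # status of the subshell above
}

demo_exit 4 true    # strict mode propagates the failure -> 4
demo_exit 4 false   # soft mode swallows it -> 0
demo_exit 0 true    # clean run -> 0
```

Soft mode therefore never fails the check; it only feeds the log into the PR-comment steps below.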

- name: Find Comment
if: ${{ matrix.strict-mode == 'false' && env.OUTPUT != '' }}
uses: peter-evans/find-comment@v3
id: fc
with:
issue-number: ${{ github.event.number }}
body-includes: <!-- pylint-output -->

- name: Add PR comment for PyLint
if: ${{ matrix.strict-mode == 'false' && env.OUTPUT != '' }}
uses: peter-evans/create-or-update-comment@v4
with:
comment-id: ${{ steps.fc.outputs.comment-id }}
issue-number: ${{ github.event.number }}
edit-mode: replace
body: |
<!-- pylint-output -->

Your code was analyzed with PyLint. The following annotations have been identified:

```
${{ env.OUTPUT }}
```

-pylint ${ADDITIONAL_PYLINT_ARGS[@]} $CHANGED_FILES || \
-  { echo "Pylint failed. In case of long strings, format docstrings and other strings manually."; exit 1; }
Please help us to raise the code-quality of NeMo!
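The strict/soft split above hinges on two values: a file's first-commit date (the `tail -1` of `git log --follow --format=%ad --date=unix`) and the hard-coded `THRESHOLD`. A small standard-library sketch shows where the epoch constant comes from and how the filter behaves (the function name `needs_strict_lint` is illustrative, not part of the PR):

```python
from datetime import datetime, timezone

# 2024-11-07 00:00 UTC, the adoption date, as a Unix timestamp.
THRESHOLD = int(datetime(2024, 11, 7, tzinfo=timezone.utc).timestamp())
print(THRESHOLD)  # 1730937600, matching the workflow's hard-coded value

def needs_strict_lint(first_commit_epoch: int, strict_mode: bool) -> bool:
    """Mirror of the bash filter: in strict mode, lint only files first
    committed after the threshold; in soft mode, lint everything."""
    return first_commit_epoch > THRESHOLD if strict_mode else True

print(needs_strict_lint(1735689600, strict_mode=True))   # 2025 file -> True
print(needs_strict_lint(1700000000, strict_mode=True))   # 2023 file -> False
print(needs_strict_lint(1700000000, strict_mode=False))  # soft mode -> True
```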
2 changes: 1 addition & 1 deletion nemo/core/classes/dataset.py
@@ -24,7 +24,7 @@
class Dataset(data.Dataset, Typing, Serialization):
"""Dataset with output ports
-Please Note: Subclasses of IterableDataset should *not* implement input_types.
+Please Note: Subclasses of IterableDataset should *not* implement input_types..
"""

def _collate_fn(self, batch):
108 changes: 108 additions & 0 deletions nemo/core/classes/dataset2.py
@@ -0,0 +1,108 @@
# Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from dataclasses import dataclass
from typing import Optional

from torch.utils import data

from nemo.core.classes import Serialization, Typing, typecheck

__all__ = ['Dataset', 'IterableDataset']


class Dataset(data.Dataset, Typing, Serialization):
    """Dataset with output ports

    Please Note: Subclasses of IterableDataset should *not* implement input_types.
    """

    def _collate_fn(self, batch):
        """
        A default implementation of a collation function.
        Users should override this method to define custom data loaders.
        """
        return data.dataloader.default_collate(batch)

    @typecheck()
    def collate_fn(self, batch):
        """
        This is the method that users pass as a functor to the DataLoader.
        It optionally performs neural type checking and adds types to the outputs.

        Please note, subclasses of Dataset should not implement `input_types`.

        Usage:

        .. code-block:: python

            dataloader = torch.utils.data.DataLoader(
                    ....,
                    collate_fn=dataset.collate_fn,
                    ....
            )

        Returns:
            Collated batch, with or without types.
        """
        if self.input_types is not None:
            raise TypeError("Datasets should not implement `input_types` as they are not checked")

        # Simply forward the inner `_collate_fn`
        return self._collate_fn(batch)


class IterableDataset(data.IterableDataset, Typing, Serialization):
    """Iterable Dataset with output ports

    Please Note: Subclasses of IterableDataset should *not* implement input_types.
    """

    def _collate_fn(self, batch):
        """
        A default implementation of a collation function.
        Users should override this method to define custom data loaders.
        """
        return data.dataloader.default_collate(batch)

    @typecheck()
    def collate_fn(self, batch):
        """
        This is the method that users pass as a functor to the DataLoader.
        It optionally performs neural type checking and adds types to the outputs.

        Usage:

        .. code-block:: python

            dataloader = torch.utils.data.DataLoader(
                    ....,
                    collate_fn=dataset.collate_fn,
                    ....
            )

        Returns:
            Collated batch, with or without types.
        """
        if self.input_types is not None:
            raise TypeError("Datasets should not implement `input_types` as they are not checked")

        # Simply forward the inner `_collate_fn`
        return self._collate_fn(batch)


@dataclass
class DatasetConfig:
    # ...
    batch_size: int = 32
    drop_last: bool = False
    shuffle: bool = False
    num_workers: Optional[int] = 0
    pin_memory: bool = True
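The `input_types` guard in both `collate_fn` implementations can be exercised without `torch` or `nemo`. A minimal stand-in (hypothetical class names, list-based collation in place of `default_collate`) illustrates why subclasses must leave `input_types` unset:

```python
class TypedDatasetSketch:
    """Stand-in for the Dataset above: same guard, no torch dependency."""

    input_types = None  # subclasses must NOT override this

    def _collate_fn(self, batch):
        # Placeholder for data.dataloader.default_collate
        return list(batch)

    def collate_fn(self, batch):
        if self.input_types is not None:
            raise TypeError("Datasets should not implement `input_types` as they are not checked")
        return self._collate_fn(batch)


class BadDataset(TypedDatasetSketch):
    input_types = {"x": "some neural type"}  # violates the contract


print(TypedDatasetSketch().collate_fn([1, 2, 3]))  # [1, 2, 3]
try:
    BadDataset().collate_fn([1, 2, 3])
except TypeError as exc:
    print("rejected:", exc)
```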
2 changes: 1 addition & 1 deletion nemo/core/classes/exportable.py
@@ -39,7 +39,7 @@
class Exportable(ABC):
"""
This Interface should be implemented by particular classes derived from nemo.core.NeuralModule or nemo.core.ModelPT.
-It gives these entities ability to be exported for deployment to formats such as ONNX.
+It gives these entities ability to be exported for deployment to formats such as ONNX..
Usage:
# exporting pre-trained model to ONNX file for deployment.