[Fixed]: Resolve issues raised by mypy for issue Make #22313 #2438

Open · wants to merge 4 commits into base: develop
1 change: 1 addition & 0 deletions nncf/common/graph/transformations/commands.py
@@ -140,6 +140,7 @@ def __init__(self, target_type: TargetType):
:param target_type: Type of the target point.
"""
self._target_type = target_type
self.target_node_name: str = ""
daniil-lyakhov (Collaborator) commented on Jan 31, 2024:
The target_node_name parameter is valid for all inherited classes except TFTargetPoint. I see three possible solutions here:

  1. Do not inherit TFTargetPoint from the TargetPoint class, and implement the target_node_name attribute as a required init attribute.
  2. Make an interface for the target point classes with a target_node_name attribute, and use it in the corresponding type hints.
  3. Rename layer_name and op_name to target_node_name in TFTargetPoint.

I suggest implementing the third solution; a sketch follows below.
CC: @alexsu52
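A minimal sketch of that third option (the constructor shape and field set here are assumptions for illustration, not the actual nncf.tensorflow code):

```python
# Hypothetical sketch of option 3: TFTargetPoint keeps the base-class contract
# by exposing `target_node_name` instead of `layer_name` / `op_name`.
from nncf.common.graph.transformations.commands import TargetPoint, TargetType


class TFTargetPoint(TargetPoint):
    def __init__(self, target_type: TargetType, target_node_name: str):
        super().__init__(target_type)
        # Formerly stored as `layer_name` / `op_name`; a single attribute now
        # matches the type hints used for the other TargetPoint subclasses.
        self.target_node_name = target_node_name
```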


@property
def type(self) -> TargetType:
4 changes: 4 additions & 0 deletions nncf/common/logging/track_progress.py
@@ -146,6 +146,10 @@ def __iter__(self) -> Iterable[ProgressType]:
self.sequence, total=self.total, description=self.description, update_period=self.update_period
)

def __next__(self):
with self.progress:
return next(self.iterator)

def __enter__(self):
self.progress.start()
self.task = self.progress.add_task(self.description, total=self.total)
26 changes: 9 additions & 17 deletions nncf/common/tensor_statistics/aggregator.py
@@ -10,13 +10,11 @@
# limitations under the License.
from abc import ABC
from abc import abstractmethod
from itertools import islice
from typing import Any, Dict, TypeVar

from nncf.common import factory
from nncf.common.graph.graph import NNCFGraph
from nncf.common.graph.transformations.layout import TransformationLayout
from nncf.common.logging.track_progress import track
from nncf.common.tensor import NNCFTensor
from nncf.common.tensor_statistics.statistic_point import StatisticPointsContainer
from nncf.data.dataset import Dataset
@@ -30,9 +28,9 @@ class StatisticsAggregator(ABC):
Base class for statistics collection.
"""

def __init__(self, dataset: Dataset):
def __init__(self, dataset: Dataset[int, int]):
self.dataset = dataset
self.stat_subset_size = None
self.stat_subset_size = 0
self.statistic_points = StatisticPointsContainer()

def collect_statistics(self, model: TModel, graph: NNCFGraph) -> None:
@@ -50,21 +48,15 @@ def collect_statistics(self, model: TModel, graph: NNCFGraph) -> None:

merged_statistics = self._get_merged_statistic_points(self.statistic_points, model, graph)
transformation_layout = self._get_transformation_layout_extra_outputs(merged_statistics)
model_with_outputs = model_transformer.transform(transformation_layout)
model_with_outputs: TModel = model_transformer.transform(transformation_layout)
engine = factory.EngineFactory.create(model_with_outputs)

dataset_length = self.dataset.get_length()
total = (
min(dataset_length or self.stat_subset_size, self.stat_subset_size)
if self.stat_subset_size is not None
else None
)
Collaborator commented on lines -56 to -61:

There was logic here before: if self.stat_subset_size is None, the subset size equals the length of the whole dataset. Since the default value of self.stat_subset_size is now 0 and this check is gone, I don't see where this logic is implemented. Could you please explain where it is?

Author replied:
The logic that sets self.stat_subset_size to the length of the whole dataset when it is None is not explicitly present. However, steps taken during the registration of statistic points indirectly handle the calculation of self.stat_subset_size.

[Screenshot: the register_statistic_points loop, attached 2024-02-12]

In that code, we iterate over the registered statistic points and their associated tensor collectors. For each tensor collector, if self.stat_subset_size is None, it is set to the collector's num_samples attribute; otherwise it is updated to the maximum of its current value and num_samples. This effectively sets self.stat_subset_size to the maximum number of samples encountered among all tensor collectors associated with the registered statistic points. It does not directly fall back to the length of the whole dataset when self.stat_subset_size is None, but it achieves a similar effect.
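Distilled, the bookkeeping described in that reply looks roughly like this (a sketch of the described logic, not the exact NNCF code; `collectors` stands in for every tensor collector reachable from the registered statistic points):

```python
stat_subset_size = None
for collector in collectors:
    if stat_subset_size is None:
        # First collector seen: adopt its sample count.
        stat_subset_size = collector.num_samples
    else:
        # Keep the maximum sample count encountered so far.
        stat_subset_size = max(stat_subset_size, collector.num_samples)
```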

Collaborator replied:
Got it. Even though this should work in every case, I would prefer an explicit double check.
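An explicit double check along those lines might look like this (a hypothetical addition, not code from this PR; it reuses the dataset.get_length() call from the removed lines below):

```python
# Fallback: if no tensor collector specified a sample count, use the length
# of the whole dataset, as the pre-change logic effectively did.
if self.stat_subset_size is None:
    self.stat_subset_size = self.dataset.get_length()
```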

empty_statistics = True
for input_data in track(
islice(self.dataset.get_inference_data(), self.stat_subset_size),
total=total,
description="Statistics collection",
):
data_iterable = iter([self.dataset.get_inference_data()])
Collaborator commented:
Please keep the track call here.
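A sketch of keeping the track call while still bounding iteration by the subset size, mirroring the removed lines above (the total handling is simplified here):

```python
from itertools import islice

from nncf.common.logging.track_progress import track

# Inside collect_statistics: keep the progress bar and the subset bound.
for input_data in track(
    islice(self.dataset.get_inference_data(), self.stat_subset_size),
    total=self.stat_subset_size,
    description="Statistics collection",
):
    outputs = engine.infer(input_data)
    processed_outputs = self._process_outputs(outputs)
    self._register_statistics(processed_outputs, merged_statistics)
```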

for input_data in data_iterable:
outputs = engine.infer(input_data)
processed_outputs = self._process_outputs(outputs)
self._register_statistics(processed_outputs, merged_statistics)
Collaborator commented on lines +56 to +59:

Code duplicate

outputs = engine.infer(input_data)
processed_outputs = self._process_outputs(outputs)
self._register_statistics(processed_outputs, merged_statistics)
@@ -87,7 +79,7 @@ def register_statistic_points(self, statistic_points: StatisticPointsContainer)

for _, _statistic_points in self.statistic_points.items():
for _statistic_point in _statistic_points:
for _, tensor_collectors in _statistic_point.algorithm_to_tensor_collectors.items():
for tensor_collectors in _statistic_point.algorithm_to_tensor_collectors.values():
for tensor_collector in tensor_collectors:
Collaborator commented on lines +82 to 83:
Is this a real issue?
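For reference, the change in question is behavior-preserving; a toy illustration (the dict contents are made up, not NNCF data):

```python
algorithm_to_tensor_collectors = {"minmax": ["c1", "c2"], "bias": ["c3"]}

# Before: iterate .items() and discard the unused key with `_`.
for _, tensor_collectors in algorithm_to_tensor_collectors.items():
    print(tensor_collectors)

# After: iterate .values() directly; the loop body sees the same values.
for tensor_collectors in algorithm_to_tensor_collectors.values():
    print(tensor_collectors)
```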

if self.stat_subset_size is None:
self.stat_subset_size = tensor_collector.num_samples