Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature: cross validate timings #233

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

alekseykalyagin
Copy link

@alekseykalyagin alekseykalyagin commented Dec 12, 2024

Added compute_timings argument to cross_validate

Closes #138

Copy link

codecov bot commented Dec 12, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base (9b3992e) to head (46e05c9).
Report is 86 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff             @@
##              main      #233     +/-   ##
===========================================
  Coverage   100.00%   100.00%             
===========================================
  Files           45        59     +14     
  Lines         2242      3896   +1654     
===========================================
+ Hits          2242      3896   +1654     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

rectools/model_selection/cross_validate.py Outdated Show resolved Hide resolved
rectools/model_selection/cross_validate.py Outdated Show resolved Hide resolved
rectools/model_selection/cross_validate.py Outdated Show resolved Hide resolved
tests/model_selection/test_cross_validate.py Outdated Show resolved Hide resolved
rectools/model_selection/cross_validate.py Outdated Show resolved Hide resolved
tests/model_selection/test_cross_validate.py Outdated Show resolved Hide resolved
tests/model_selection/test_cross_validate.py Outdated Show resolved Hide resolved
@@ -36,6 +57,7 @@ def cross_validate( # pylint: disable=too-many-locals
ref_models: tp.Optional[tp.List[str]] = None,
validate_ref_models: bool = False,
on_unsupported_targets: ErrorBehaviour = "warn",
compute_timings: bool = False,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add new argument to docstring

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we really need this param? what's wrong if we always measure the time?

else:
yield


def cross_validate( # pylint: disable=too-many-locals
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update CHANGELOG.MD

@blondered blondered requested a review from feldlime January 10, 2025 09:25
@@ -24,6 +25,26 @@
from .splitter import Splitter


@contextmanager
def compute_timing(label: str, timings: tp.Optional[tp.Dict[str, float]] = None) -> tp.Iterator[None]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems this function accepts the label timings param but don't really need it

Let's please rewrite it in one of the following ways:

  1. Remove both params and simply return the elapsed time without dictionaries
  2. Rewrite it as a class

I personally prefer the second option since it's clearer.
But anyway let's not use this labels and dictionary inside. We can easily fill them out of the class

And example (it's simplified a bit, please add init if required by linters, also types)

class Timer:        
    def __enter__(self):
        self._start = time.perf_counter()
        self._end = None
        return self

    def __exit__(self, *args):
        self._end = time.perf_counter()

    @property
    def elapsed(self):
        return self._end - self._start
    
    
with Timer() as timer:
    # code
    pass

fit_time = timer.elapsed

@@ -36,6 +57,7 @@ def cross_validate( # pylint: disable=too-many-locals
ref_models: tp.Optional[tp.List[str]] = None,
validate_ref_models: bool = False,
on_unsupported_targets: ErrorBehaviour = "warn",
compute_timings: bool = False,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we really need this param? what's wrong if we always measure the time?

Dictionary to store the timing results. If None, timing is not recorded.
"""
if timings is not None:
start_time = time.time()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use time.perf_counter instead, it's more correct for measuring time intervals

if timings is not None:
start_time = time.time()
yield
timings[label] = round(time.time() - start_time, 5)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't do this

  1. If we need to format the values somehow it should always be done separately. We should separate the computing level and presentation level. From the computing level we should always return the raw values.
  2. In this specific case I think we shouldn't format the value at all. I don't see much sense in it, also we're not doing this for other metrics.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Argument to compute timing in cross-validate
3 participants