-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature: cross validate timings #233
base: main
Are you sure you want to change the base?
Feature: cross validate timings #233
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #233 +/- ##
===========================================
Coverage 100.00% 100.00%
===========================================
Files 45 59 +14
Lines 2242 3896 +1654
===========================================
+ Hits 2242 3896 +1654 ☔ View full report in Codecov by Sentry. |
@@ -36,6 +57,7 @@ def cross_validate( # pylint: disable=too-many-locals | |||
ref_models: tp.Optional[tp.List[str]] = None, | |||
validate_ref_models: bool = False, | |||
on_unsupported_targets: ErrorBehaviour = "warn", | |||
compute_timings: bool = False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add new argument to docstring
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we really need this param? what's wrong if we always measure the time?
else: | ||
yield | ||
|
||
|
||
def cross_validate( # pylint: disable=too-many-locals |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please update CHANGELOG.MD
@@ -24,6 +25,26 @@ | |||
from .splitter import Splitter | |||
|
|||
|
|||
@contextmanager | |||
def compute_timing(label: str, timings: tp.Optional[tp.Dict[str, float]] = None) -> tp.Iterator[None]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems this function accepts the label
timings
param but don't really need it
Let's please rewrite it in one of the following ways:
- Remove both params and simply return the elapsed time without dictionaries
- Rewrite it as a class
I personally prefer the second option since it's clearer.
But anyway let's not use this labels and dictionary inside. We can easily fill them out of the class
And example (it's simplified a bit, please add init if required by linters, also types)
class Timer:
def __enter__(self):
self._start = time.perf_counter()
self._end = None
return self
def __exit__(self, *args):
self._end = time.perf_counter()
@property
def elapsed(self):
return self._end - self._start
with Timer() as timer:
# code
pass
fit_time = timer.elapsed
@@ -36,6 +57,7 @@ def cross_validate( # pylint: disable=too-many-locals | |||
ref_models: tp.Optional[tp.List[str]] = None, | |||
validate_ref_models: bool = False, | |||
on_unsupported_targets: ErrorBehaviour = "warn", | |||
compute_timings: bool = False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we really need this param? what's wrong if we always measure the time?
Dictionary to store the timing results. If None, timing is not recorded. | ||
""" | ||
if timings is not None: | ||
start_time = time.time() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use time.perf_counter
instead, it's more correct for measuring time intervals
if timings is not None: | ||
start_time = time.time() | ||
yield | ||
timings[label] = round(time.time() - start_time, 5) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please don't do this
- If we need to format the values somehow it should always be done separately. We should separate the computing level and presentation level. From the computing level we should always return the raw values.
- In this specific case I think we shouldn't format the value at all. I don't see much sense in it, also we're not doing this for other metrics.
Added
compute_timings
argument tocross_validate
Closes #138