-
-
Notifications
You must be signed in to change notification settings - Fork 222
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Profile some of the CLE report's methods #35830
base: master
Are you sure you want to change the base?
Conversation
@CodeRabbit please review |
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (2)
corehq/apps/reports/tests/test_case_list_explorer_report.py (2)
87-108
: Improve property access and approve test implementation.The test implementation correctly verifies the profiling functionality. However, there's a minor improvement needed in how the property is accessed.
Apply this diff to improve the property access:
- cle.json_response + _ = cle.json_responseThis change explicitly shows that we're intentionally accessing the property for its side effects.
🧰 Tools
🪛 Ruff (0.8.2)
106-106: Found useless expression. Either assign it to a variable or remove it.
(B018)
102-107
: Enhance test assertions and fix property access.The assertions verify the profiler state correctly, but there are a few improvements to consider:
- Add descriptive messages to assertions for better test failure diagnostics
- Fix the property access on line 106 which is currently not assigned or used
Apply this diff to improve the test:
- assert cle.profiler_enabled - assert cle.profiler.timing_context.to_dict()['percent_total'] is None + assert cle.profiler_enabled, "Profiler should be enabled for CaseListExplorer" + assert cle.profiler.timing_context.to_dict()['percent_total'] is None, "Initial percent_total should be None" # Run the CLE - cle.json_response - assert cle.profiler.timing_context.to_dict()['percent_total'] == 100.0 + _ = cle.json_response # Access the property to trigger profiling + assert cle.profiler.timing_context.to_dict()['percent_total'] == 100.0, "Final percent_total should be 100.0"🧰 Tools
🪛 Ruff (0.8.2)
106-106: Found useless expression. Either assign it to a variable or remove it.
(B018)
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
corehq/apps/reports/standard/__init__.py
(3 hunks)corehq/apps/reports/standard/cases/basic.py
(5 hunks)corehq/apps/reports/standard/cases/case_list_explorer.py
(6 hunks)corehq/apps/reports/tests/test_case_list_explorer_report.py
(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.py`: - Review the code following best practises and st...
**/*.py
: - Review the code following best practises and standards
corehq/apps/reports/tests/test_case_list_explorer_report.py
corehq/apps/reports/standard/__init__.py
corehq/apps/reports/standard/cases/basic.py
corehq/apps/reports/standard/cases/case_list_explorer.py
🪛 Ruff (0.8.2)
corehq/apps/reports/tests/test_case_list_explorer_report.py
106-106: Found useless expression. Either assign it to a variable or remove it.
(B018)
🔇 Additional comments (4)
corehq/apps/reports/standard/__init__.py (1)
313-377
: Well-implemented profiling decorator with comprehensive documentation!The implementation is clean and follows best practices:
- Proper use of
wraps
to preserve function metadata- Elegant fallback to
nullcontext()
for when profiling is disabled- Clear and detailed documentation with usage examples
corehq/apps/reports/standard/cases/case_list_explorer.py (2)
66-67
: Profiler configuration looks good!The profiler is properly configured with appropriate values for
profiler_enabled
andprofiler_name
.
127-130
: Good separation of concerns for profiling!Extracting
_apply_xpath_case_search_filter
into a separate method allows for more granular profiling of the XPath case filter application.corehq/apps/reports/tests/test_case_list_explorer_report.py (1)
87-101
: LGTM! Well-structured test setup.The test setup follows consistent patterns with other test methods in the class and properly sets up the required test data.
@CodeRabbit resolve |
9decda0
to
0ab5c4f
Compare
return self._get_rows(data) | ||
|
||
with self.profiler.timing_context("Retrieving rows"): | ||
data = (wrap_case_search_hit(row) for row in self.es_results['hits'].get('hits', [])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could consider tracking "Fetching results from ES" and "Wrapping result" as two different metrics, just to be very specific about how long ES took.
|
||
with self.profiler.timing_context("Retrieving rows"): | ||
data = (wrap_case_search_hit(row) for row in self.es_results['hits'].get('hits', [])) | ||
with self.profiler.timing_context("Parsing rows"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: preparing data for display?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like that. I'll update.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -334,3 +336,18 @@ def _get_search_class(self): | |||
if not self.search_class: | |||
raise NotImplementedError("You must define a search_class attribute.") | |||
return self.search_class | |||
|
|||
|
|||
def profile(name=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ooo very nice!
with self.profiler.timing_context("Parsing rows"): | ||
return self._get_rows(data) | ||
data = (wrap_case_search_hit(row) for row in self.es_results['hits'].get('hits', [])) | ||
return self._get_rows(data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I liked the specific tracking before. There is a lot happening here to put under one metric.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add a screenshot so you can see how the different timings are being reported.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so, the "wrap_case_search_hit" and the time taken by ES to get the data is clubbed into "Retrieving rows" right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Feel free to suggest different wording if you think it's necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feel free to ignore this is this is drilling to deep for metrics.
I was just thinking of
@profile("Retrieving rows")
def rows(self):
self.track_search()
es_hits = self.es_results()
data = self._wrap_results(es_hits)
return self._get_rows(data)
@profile("Fetching Elasticsearch hits")
def es_results(self):
#...
@profile("Wrap Elasticsearch hits")
def _wrap_results(es_hits):
return [wrap_case_search_hit(row) for row in es_hits]
def _get_rows():
#...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good.
Can you share a screenshot or two in the description on how this would look?
@mkangia See updated description. |
Now that I'm looking at the timings again more closely I think there's still some room to add more profiling, since the "ES Query" part only makes up about 15% of the "Retrieving rows", which in turn makes up 96% of the total (for that test run). Maybe I'll run it a couple of times more and see if I can capture the other parts. |
Actually, on a bunch of subsequent runs locally it seem like the execution of the query takes the majority of the time of the "ES query" block, which in turn makes up > 80% of the "Retrieving rows" block, so I think the current profiling "checkpoints" is good enough. |
@@ -13,7 +15,7 @@ | |||
from corehq.apps.reports.generic import ElasticProjectInspectionReport | |||
from corehq.apps.reports.standard import ( | |||
ProjectReport, | |||
ProjectReportParametersMixin, | |||
ProjectReportParametersMixin, ESQueryProfilerMixin, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: isort.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def wrapper(obj, *args, **kwargs): | ||
with obj.profiler.timing_context(name) if obj.profiler_enabled else contextlib.nullcontext(): | ||
result = func(obj, *args, **kwargs) | ||
return result |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: is it simpler with?
if obj.profiler_enabled:
with obj.profiler.timing_context(name):
return func(obj, *args, **kwargs)
else:
return func(obj, *args, **kwargs)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -107,7 +109,8 @@ def _build_query(self): | |||
@memoized | |||
def es_results(self): | |||
try: | |||
return self._build_query().run().raw | |||
with self.profiler.timing_context("ES query") if self.profiler_enabled else contextlib.nullcontext(): | |||
return self._build_query().run().raw |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: for sake of code clarity can this be extracted in a method and decorated with profile as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
As a follow up, you could consider how the metrics how up on admin restore page. just a better UI experience than a JSON.
Note: This PR is the second in a series to capture timing information around the CLE processing.
Technical Summary
Ticket
This ticket makes use of the
ESQueryProfilerMixin
by adding it to theCaseListMixin
and augmenting some methods of theCaseListExplorer
to capture timing information around the processing of report rows.A new decorator,
profile
, has been created to make it easier (and less verbose) to make use of the profiler for capturing a report method's timing information.Below is a screenshot of the raw json of what the profiling data looks like (example taken from running on local machine). Pay attention to the nesting and what each nested block reports.
Safety Assurance
Safety story
Tested locally
Automated test coverage
Appended one-liner to test. Don't think much more is really required at this stage.
QA Plan
QA to commence as part of this PR
Rollback instructions
Labels & Review