Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix][fn] Record Pulsar Function processing time properly for asynchronous functions #23811

Open
wants to merge 14 commits into
base: master
Choose a base branch
from

Conversation

walkinggo
Copy link

@walkinggo walkinggo commented Jan 4, 2025

Fixes #23705

Motivation

See #23705
In the realm of asynchronous processing, precise timing and performance metrics are essential for effective monitoring and optimization. Apache Pulsar Functions, as a distributed compute platform, relies heavily on asynchronous operations to process and transform data streams. However, the current implementation lacks a robust mechanism for accurately capturing and reporting the execution times of these asynchronous functions.

Modifications

This Pull Request introduces several modifications to the Apache Pulsar Functions project aimed at improving the accuracy and reliability of asynchronous function execution time statistics. The primary goal is to enhance monitoring and analysis capabilities for function performance. Here’s a breakdown of the key changes:

  1. Removal of processTimeStart Method: The processTimeStart method in ComponentStatsManager has been removed. This method was previously used to record the start time of asynchronous function execution. The start time is now recorded in the JavaExecutionResult object, providing a more centralized and consistent approach.

  2. Modification of JavaInstanceRunnable: The run method in JavaInstanceRunnable has been updated to reflect the removal of processTimeStart. The stats.processTimeStart() call has been deleted, and the stats.processTimeEnd() method now accepts the start time as a parameter to calculate the total execution time.

  3. Update to FunctionStatsManager: The processTimeEnd method in FunctionStatsManager has been modified to remove the processTimeStart member variable and accept the start time as a parameter. This allows for accurate calculation of the function’s execution time.

  4. Changes to SinkStatsManager and SourceStatsManager: The processTimeEnd methods in SinkStatsManager and SourceStatsManager have been updated to accept the start time as a parameter. However, since these classes do not record processing time, the method bodies remain empty.

  5. Addition of startTime in JavaExecutionResult: The JavaExecutionResult class now includes a startTime member variable to store the start time of asynchronous function execution. This allows for accurate calculation of execution time within the handleResult method of JavaInstanceRunnable.

  6. Modification of AsyncFuncRequest: The AsyncFuncRequest class in JavaInstance now includes a result member variable of type JavaExecutionResult. This change ensures that the processAsyncResultsInInputOrder method uses the existing JavaExecutionResult object instead of creating a new one, maintaining consistency and avoiding duplication.

  7. Use of Same ExecutionResult in Non-asyncPreserveInputOrder Mode: In scenarios where asyncPreserveInputOrderForOutputMessages is disabled, the same executionResult object is now used to avoid unnecessary object creation and potential issues with result assignment.

  8. Fix for Result and Exception Handling: Two patches address potential bugs related to result and exception handling in JavaInstance. The processAsyncResultsInInputOrder method now ensures that the result and userException fields of JavaExecutionResult are properly set, improving the reliability of the execution result.

  9. Addition of Test Case: A new test case, testAsyncFunctionTime, has been added to verify the accuracy of asynchronous function execution time recording and calculation. This test ensures that the start time recorded in JavaExecutionResult is within an acceptable range of the actual start time.

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository:

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change doesn't seem to be related to the PR.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, uploading these file changes was a mistake. I've force-pushed in a later commit to remove those changes.

Comment on lines 341 to 359
CompletableFuture<InstanceObserver> future = CompletableFuture.supplyAsync(() -> {
JavaExecutionResult result;
InstanceObserver instanceObserver = new InstanceObserver();
instanceObserver.setStartTime(System.nanoTime());
// process the message
Thread.currentThread().setContextClassLoader(functionClassLoader);
result = javaInstance.handleMessage(
currentRecord,
currentRecord.getValue(),
asyncResultConsumer,
asyncErrorHandler);
Thread.currentThread().setContextClassLoader(instanceClassLoader);
instanceObserver.setJavaExecutionResult(result);
return instanceObserver;
}).whenComplete((res, ex) -> {
stats.processTimeEnd(res.getStartTime());
});

JavaExecutionResult result = future.join().getJavaExecutionResult();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't make sense and is completely unnecessary.
Instead, you should simply pass the starting time nanos as an argument that gets added to the JavaExecutionResult instance field (add a new field startTimeNanos to JavaExecutionResult).
The result can be recorded in the handleResult method. The processTimeStart method should be completely removed from the ComponentStatsManager class and the processTimeEnd method should accept the starting time as a parameter. The value gets taken from the startTimeNanos field of the JavaExecutionResult instance.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, I have completed these tasks.

@walkinggo walkinggo force-pushed the complete-get-proper-time branch from 58d57af to 3511c82 Compare January 4, 2025 17:10
@walkinggo
Copy link
Author

It looks like due to code formatting, many unintended changes were introduced. I added startTimeNanos in JavaExecutionResult and removed it from ComponentStatsManager. Now, the processTimeEnd method in ComponentStatsManager takes the start time as a parameter and calculates the final time. I also removed the processTimeStart method.

@walkinggo
Copy link
Author

Do I need to create a new PR to remove the unnecessary line changes caused by formatting? @ @lhotari

Copy link
Member

@lhotari lhotari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good start, going in the right direction!

@lhotari
Copy link
Member

lhotari commented Jan 4, 2025

Do I need to create a new PR to remove the unnecessary line changes caused by formatting? @ @lhotari

It makes it slightly harder to read the diff when there are formatting changes. However there's the "hide whitespace" option to address that in GitHub UI.
Are you using IntelliJ/IDEA for development? Have you setup the code style according to the instructions at https://pulsar.apache.org/contribute/setup-ide/#configure-code-style ? I noticed that the import formatting doesn't match the Pulsar formatting since there are no empty lines in the Pulsar code style.
In the past, I have noticed that some code in the Pulsar Functions packages is using some non-default formatting so if you reformat the code, it will result in many changes. As long as you are using the Pulsar code style, the formatting changes are fine to be included in this PR.

@lhotari lhotari changed the title [Bug][pulsar-function]Solve pulsar Function processing time doesn't get properly recorded for asynchronous functions with completefuture . [fix][fn] Record Pulsar Function processing time properly for asynchronous functions Jan 4, 2025
@walkinggo
Copy link
Author

I didn't set the correct code formatting in IDEA. Therefore, I will close this PR and open a new one to fix the issue.see #23820

@walkinggo walkinggo closed this Jan 7, 2025
@lhotari
Copy link
Member

lhotari commented Jan 7, 2025

I didn't set the correct code formatting in IDEA. Therefore, I will close this PR and open a new one to fix the issue.see #23820

@walkinggo There shouldn't be a need to create new PRs in this type of cases. You could have simply reformatted the code. All commits in the PR will get squashed into a single commit when the PR gets merged.

@walkinggo walkinggo reopened this Jan 7, 2025
@walkinggo
Copy link
Author

@lhotari Alright, I've used another branch to solve the issue. I'll now overwrite the current branch with the other one and close the next PR.

@walkinggo walkinggo force-pushed the complete-get-proper-time branch 2 times, most recently from 5775db1 to 0344726 Compare January 7, 2025 17:02
@walkinggo
Copy link
Author

Should I modify the related code logic based on the following error that appeared in my local CI check? @lhotari

[ERROR] Medium: org.apache.pulsar.functions.instance.JavaInstance$AsyncFuncRequest.getResult() may expose internal representation by returning JavaInstance$AsyncFuncRequest.result [org.apache.pulsar.functions.instance.JavaInstance$AsyncFuncRequest] At JavaInstance.java:[line 50] EI_EXPOSE_REP

@walkinggo
Copy link
Author

I pass the start time of the result as a parameter rather than making it an inherent attribute of the result. This is because I don't want the result to become an inherent variable of AsyncFuncRequest. If it were an inherent variable, the result would be exposed, triggering the EI_EXPOSE_REP warning.

@lhotari
Copy link
Member

lhotari commented Jan 9, 2025

I pass the start time of the result as a parameter rather than making it an inherent attribute of the result. This is because I don't want the result to become an inherent variable of AsyncFuncRequest. If it were an inherent variable, the result would be exposed, triggering the EI_EXPOSE_REP warning.

That makes the code worse in my opinion. I'd recommend adding a definition to the src/main/resources/findbugsExclude.xml file to ignore the EI_EXPOSE_REP SpotBugs/FindBugs check for the particular method. We do that already for many other locations. This is the file where you can add a new entry: https://github.com/apache/pulsar/blob/master/pulsar-functions/instance/src/main/resources/findbugsExclude.xml .

  <Match>
    <Class name="org.apache.pulsar.functions.instance.JavaInstance$AsyncFuncRequest"/>
    <Method name="getResult"/>
    <Bug pattern="EI_EXPOSE_REP"/>
  </Match>

After adding this rule, you can revert 7aba620 to get back to the better version of the code.

@walkinggo
Copy link
Author

I have added the content to the corresponding file. Thank you for your help.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
doc-not-needed Your PR changes do not impact docs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug] Pulsar Function processing time doesn't get properly recorded for asynchronous functions
2 participants