Send raw history blobs from history service to frontend #7179

Merged
23 commits merged on Feb 4, 2025

prathyushpv (Contributor)

What changed?

This change sends raw history blobs from the history service to the frontend service. The history service returns a new proto message that has a repeated bytes history field.
This response is wire compatible with the original response, which has type temporal.api.history.v1.History for this field. As a result, the history service no longer needs to deserialize the events in these data blobs, which considerably reduces its CPU usage.
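
Why the two shapes decode the same can be shown with a minimal, stdlib-only Go sketch that hand-encodes protobuf framing. It assumes, as the deserializer excerpt later in this PR suggests, that each stored blob is a serialized History message whose events occupy field 1, and that field 1 of the response is a singular History in the old schema and repeated bytes in the new one. A protobuf parser merges repeated occurrences of a singular message field, and History.events is repeated, so the events from every blob are concatenated in order. The helpers (appendField, encodeHistory, eachLenField, decodeAsOldResponse, sameEvents) are hypothetical, and the event payloads are opaque placeholders rather than real HistoryEvent encodings.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
)

// appendField appends one length-delimited (wire type 2) protobuf field.
func appendField(buf []byte, fieldNum uint64, payload []byte) []byte {
	buf = binary.AppendUvarint(buf, fieldNum<<3|2)
	buf = binary.AppendUvarint(buf, uint64(len(payload)))
	return append(buf, payload...)
}

// encodeHistory builds a toy History message: every event is field 1.
func encodeHistory(events ...[]byte) []byte {
	var h []byte
	for _, e := range events {
		h = appendField(h, 1, e)
	}
	return h
}

// eachLenField calls fn for every field in msg. Only wire type 2 is
// handled, which is all these toy messages use.
func eachLenField(msg []byte, fn func(field uint64, payload []byte)) error {
	for len(msg) > 0 {
		key, n := binary.Uvarint(msg)
		if n <= 0 || key&7 != 2 {
			return errors.New("malformed or unsupported field")
		}
		msg = msg[n:]
		size, m := binary.Uvarint(msg)
		if m <= 0 || size > uint64(len(msg)-m) {
			return errors.New("truncated field")
		}
		fn(key>>3, msg[m:m+int(size)])
		msg = msg[m+int(size):]
	}
	return nil
}

// decodeAsOldResponse reads resp as the old schema would: field 1 is a
// singular History message. Repeated occurrences of it are merged, and
// History.events is repeated, so all events are concatenated in order.
func decodeAsOldResponse(resp []byte) ([][]byte, error) {
	var events [][]byte
	var innerErr error
	err := eachLenField(resp, func(field uint64, history []byte) {
		if field != 1 || innerErr != nil {
			return
		}
		innerErr = eachLenField(history, func(f uint64, event []byte) {
			if f == 1 {
				events = append(events, event)
			}
		})
	})
	if err != nil {
		return nil, err
	}
	return events, innerErr
}

// sameEvents reports whether two event lists are byte-for-byte equal.
func sameEvents(a, b [][]byte) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if !bytes.Equal(a[i], b[i]) {
			return false
		}
	}
	return true
}

func main() {
	e1, e2, e3 := []byte("event-1"), []byte("event-2"), []byte("event-3")

	// Old shape: a single History message containing every event.
	oldResp := appendField(nil, 1, encodeHistory(e1, e2, e3))

	// New shape (repeated bytes history = 1): each stored blob is copied
	// in as-is, without decoding the events inside it.
	blob1, blob2 := encodeHistory(e1, e2), encodeHistory(e3)
	newResp := appendField(appendField(nil, 1, blob1), 1, blob2)

	oldEvents, _ := decodeAsOldResponse(oldResp)
	newEvents, _ := decodeAsOldResponse(newResp)
	fmt.Println(len(newEvents), sameEvents(oldEvents, newEvents)) // 3 true
}
```

The two responses differ byte for byte; only their decoded meaning is the same, which is the property a wire-compatibility test would check.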

The history service still needs the event_id and version of each history event. For this, it uses a new proto message, StrippedHistoryEvent, which has only these two fields. Decoding events into this message takes considerably less CPU than decoding full HistoryEvents.
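
A rough sketch of why decoding into a two-field message is cheap: the stdlib-only Go decoder below reads only the varints for event_id and version and skips every other field by its wire type, never descending into nested messages such as event attributes. It assumes event_id is field 1 and version is field 4 in HistoryEvent (StrippedHistoryEvent would need the same numbers to stay wire compatible); the other field numbers in the example are illustrative. The names strippedEvent, decodeStripped, appendVarintField and appendBytesField are hypothetical, and generated protobuf code does not look like this.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// strippedEvent is a hypothetical stand-in for StrippedHistoryEvent.
type strippedEvent struct {
	EventID int64 // assumed field 1, as in HistoryEvent
	Version int64 // assumed field 4, as in HistoryEvent
}

// decodeStripped keeps fields 1 and 4 of a serialized event and skips every
// other field by wire type, without decoding nested messages.
func decodeStripped(buf []byte) (strippedEvent, error) {
	var ev strippedEvent
	for len(buf) > 0 {
		key, n := binary.Uvarint(buf)
		if n <= 0 {
			return ev, errors.New("malformed key")
		}
		buf = buf[n:]
		switch field, wireType := key>>3, key&7; wireType {
		case 0: // varint
			v, m := binary.Uvarint(buf)
			if m <= 0 {
				return ev, errors.New("malformed varint")
			}
			buf = buf[m:]
			if field == 1 {
				ev.EventID = int64(v)
			} else if field == 4 {
				ev.Version = int64(v)
			}
		case 1, 5: // fixed 64-bit or 32-bit
			width := 8
			if wireType == 5 {
				width = 4
			}
			if len(buf) < width {
				return ev, errors.New("truncated fixed field")
			}
			buf = buf[width:]
		case 2: // length-delimited: skipped without looking inside
			size, m := binary.Uvarint(buf)
			if m <= 0 || size > uint64(len(buf)-m) {
				return ev, errors.New("truncated field")
			}
			buf = buf[m+int(size):]
		default:
			return ev, fmt.Errorf("unsupported wire type %d", wireType)
		}
	}
	return ev, nil
}

// appendVarintField appends a wire-type-0 field.
func appendVarintField(buf []byte, field, v uint64) []byte {
	buf = binary.AppendUvarint(buf, field<<3)
	return binary.AppendUvarint(buf, v)
}

// appendBytesField appends a wire-type-2 field.
func appendBytesField(buf []byte, field uint64, p []byte) []byte {
	buf = binary.AppendUvarint(buf, field<<3|2)
	buf = binary.AppendUvarint(buf, uint64(len(p)))
	return append(buf, p...)
}

func main() {
	var event []byte
	event = appendVarintField(event, 1, 42)                 // event_id
	event = appendBytesField(event, 2, []byte{0x08, 0x01})  // some nested message, skipped
	event = appendVarintField(event, 3, 5)                  // some other scalar, ignored
	event = appendVarintField(event, 4, 7)                  // version
	event = appendBytesField(event, 20, make([]byte, 1024)) // large illustrative payload, skipped
	ev, err := decodeStripped(event)
	fmt.Println(ev.EventID, ev.Version, err) // 42 7 <nil>
}
```

In the PR the decoding goes through proto.UnmarshalOptions, so this loop only models which work the runtime can avoid.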

Why?

We have seen incidents of high CPU usage on the history service when a large number of GetWorkflowExecutionHistory calls are made for workflows with large histories. This change reduces the CPU burden on the history service for this API call.

How did you test it?

Existing unit tests, plus manual testing by running workflows.

Potential risks

Documentation

Is hotfix candidate?

No

dnr (Member) left a comment

I know it's a draft, but I just started looking and had a few comments.

Makefile (outdated, resolved)
common/persistence/history_manager_util.go (outdated, resolved)
common/persistence/serialization/serializer.go (outdated, resolved)
switch data.EncodingType {
case enumspb.ENCODING_TYPE_PROTO3:
err = proto.UnmarshalOptions{
DiscardUnknown: true,
Member

Maybe add a comment here noting that this option is key to the performance gains.

Contributor Author

Done

@@ -65,3 +66,28 @@ message TaskRange {
TaskKey inclusive_min_task_key = 1;
TaskKey exclusive_max_task_key = 2;
}

message StrippedHistoryEvent {
Member

Add a comment that this is a subset of HistoryEvent

Contributor Author

Done

message GetWorkflowExecutionHistoryResponse {
repeated bytes history = 1;
// Raw history is an alternate representation of history that may be returned if configured on
// the frontend. This is not supported by all SDKs. Either this or `history` will be set.
Member

Remove the comment about SDK support? That's not relevant here, right?

Contributor Author

Yeah, right. Removed it.

@@ -981,6 +981,10 @@ message GetWorkflowExecutionHistoryResponse {
temporal.api.workflowservice.v1.GetWorkflowExecutionHistoryResponse response = 1;
}

message GetWorkflowExecutionHistoryResponseWithRaw {
Member

Add a comment that this is wire compatible with GetWorkflowExecutionHistoryResponse?

Contributor Author

Done

prathyushpv marked this pull request as ready for review on February 1, 2025 04:40
prathyushpv requested a review from a team as a code owner on February 1, 2025 04:40
prathyushpv requested a review from yycptt on February 1, 2025 04:40
develop/protoc.sh (outdated, resolved)
@@ -177,6 +179,33 @@ func (t *serializerImpl) DeserializeEvents(data *commonpb.DataBlob) ([]*historyp
return events.Events, nil
}

func (t *serializerImpl) DeserializeStrippedEvents(data *commonpb.DataBlob) ([]*historyspb.StrippedHistoryEvent, error) {
Member

Can we add a test for this? To make sure the two proto definitions are wire compatible, and also that unknown fields are dropped during decoding (i.e., the decoded message has no unknown fields).

Contributor Author

Added a test for this.
Also added a test to verify that HistoryService's response is wire compatible with GetWorkflowExecutionHistoryResponse.

@@ -305,3 +305,91 @@ func validateTransientWorkflowTaskEvents(

return nil
}

func FixFollowEvents(
Member

nit: move this function to service/frontend if it's no longer used by the history service?

Contributor Author

Done

service/frontend/workflow_handler.go (resolved)
service/history/api/get_history_util.go (resolved)
common/persistence/history_manager.go (resolved)
Co-authored-by: Yichao Yang <[email protected]>
prathyushpv merged commit e4cfc5a into main on Feb 4, 2025
50 checks passed
prathyushpv deleted the ppv/raw_history branch on February 4, 2025 21:10
Labels: None yet
Projects: None yet
3 participants