Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[server] ConsumptionTask Refactor #1318

Open
wants to merge 32 commits into
base: main
Choose a base branch
from

Conversation

KaiSernLim
Copy link
Contributor

@KaiSernLim KaiSernLim commented Nov 19, 2024

Summary

ConsumerRefactorx2

StoreIngestionTask.java is getting too large and should be refactored.

There are three threads: ConsumptionTask, StoreBufferDrainer, and StoreIngestionTask that all share the methods in StoreIngestionTask.java, and it is difficult to know which part of the ingestion pipeline a StoreIngestionTask method is part of, just by looking at it.

  1. Moved all ConsumptionTask-only functions and member variables out of StoreIngestionTask into a class such as StorePartitionDataReceiver.
  2. I also plan on renaming StorePartitionDataReceiver to ConsumedPartitionReceiver to better indicate that it's part of the ConsumptionTask thread, but I'll leave it out of the diff to help with the review. Feel free to provide additional suggestions.

How was this PR tested?

GHCI

Does this PR introduce any user-facing changes?

  • No. You can skip the rest of this section.

Copy link
Contributor

@FelixGV FelixGV left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, I'm supportive of reducing the size of the SIT classes and subclasses.

I left a few minor comments and questions. Please take a look and LMK what you think.

Comment on lines 2914 to 2531
protected Lazy<VeniceWriter<byte[], byte[], byte[]>> getVeniceWriter(
PartitionConsumptionState partitionConsumptionState) {
return partitionConsumptionState.getVeniceWriterLazyRef();
}

protected void setRealTimeVeniceWriterRef(PartitionConsumptionState partitionConsumptionState) {
partitionConsumptionState.setVeniceWriterLazyRef(veniceWriterForRealTime);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need functions for this? Why not just call partitionConsumptionState.getVeniceWriterLazyRef() directly wherever we need to get the VW? And same for the setter?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's because the writer on the PCS is not the real time writer, and it's being swapped out for veniceWriterForRealTime, which is a member of LeaderFollowerStoreIngestionTask.

Comment on lines +321 to +494
PartitionConsumptionState partitionConsumptionState =
storeIngestionTask.getPartitionConsumptionState(topicPartition.getPartitionNumber());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'd be great if the PCS became a class property of the StorePartitionDataReceiver, rather than be held in a map inside the SIT. I suspect this is easier said than done, and no need to do it just yet... A large refactoring like this is ideally kept free of functional changes, and that one may be too risky. Better do it as a smaller follow up change. But just wanted to share this crazy dream 😂

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooo, that's an interesting idea. I've never thought of that before. I'd definitely be interested in attempting something like that out someday.

The way I saw it was that the PCS map is a "global" variable that is shared among the ConsumptionTask, Drainer, and SIT threads.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PCS becoming part of consumer thread only would require offset update not happening in either SIT or Drainer then? This seems like it will break the offset commit behavior IIUC

@sixpluszero
Copy link
Contributor

I have a general comment - from my understanding, SIT thread also interacting with the SIT functions and PCS states. Even though I don't think this part of change is scoped into current PR, but the diagram created above seems to not including SIT thread. Do you mind adding a minor update to include that as well? This can be added to future diagram to talk about how the overall refactor works before vs after (btw I like all your diagrams, as they are clear to understand, thank you!)

@KaiSernLim KaiSernLim self-assigned this Jan 3, 2025
@KaiSernLim KaiSernLim changed the title [server] Consumer Refactor (Part 1) [server] ConsumptionTask Refactor Jan 6, 2025
…nTask` into `StorePartitionDataReceiver`. 🧹
…ngestionTask` into `StorePartitionDataReceiver`. 🧹
…ght before `waitUntilValueSchemaAvailable()` from `StoreIngestionTask` into `StorePartitionDataReceiver`. 🧹
…rom `StoreIngestionTask` / `LeaderFollowerStoreIngestionTask` into `StorePartitionDataReceiver`. 🧹
…from LeaderFollowerStoreIngestionTask / ActiveActiveStoreIngestionTask into StorePartitionDataReceiver. 🧹
…FollowerStoreIngestionTask` into `StorePartitionDataReceiver`. 🧹
…rStoreIngestionTask` into `StorePartitionDataReceiver`. 🧹
…rom LeaderFollowerStoreIngestionTask into StorePartitionDataReceiver. 🧹
…werStoreIngestionTask` / `ActiveActiveStoreIngestionTask` into `StorePartitionDataReceiver`. 🧹
…tionTask` into `StorePartitionDataReceiver`. 🧹
…tionTask` into `StorePartitionDataReceiver`. 🧹
…Task` / `ActiveActiveStoreIngestionTask` into `StorePartitionDataReceiver`. 🧹
…ionTask` / `ActiveActiveStoreIngestionTask` into `StorePartitionDataReceiver`. 🧹
…k` / `ActiveActiveStoreIngestionTask` into `StorePartitionDataReceiver`. 🧹
…stionTask` into `StorePartitionDataReceiver`. 🧹
…veStoreIngestionTask` into `StorePartitionDataReceiver`. 🧹
…ecord()` from `ActiveActiveStoreIngestionTask` into `StorePartitionDataReceiver`. 🧹
…sk` into `StorePartitionDataReceiver`. 🧹
…onTask` into `StorePartitionDataReceiver`. 🧹
…toreIngestionTask` into `StorePartitionDataReceiver`. 🧹
…rFollowerStoreIngestionTask` into `StorePartitionDataReceiver`. 🧹
…SchemaByteBufferFromStorage()` from `ActiveActiveStoreIngestionTask` into `StorePartitionDataReceiver`. 🧹
…Manager` from `ActiveActiveStoreIngestionTask` into `StorePartitionDataReceiver`. 🧹
…ActiveActiveStoreIngestionTask` into `StorePartitionDataReceiver`. 🧹
…Task`, because it's just `versionedIngestionStats`? 🧹
@@ -193,59 +153,9 @@ public static int getKeyLevelLockMaxPoolSizeBasedOnServerConfig(VeniceServerConf
* serverConfig.getKafkaClusterIdToUrlMap().size() * multiplier + 1;
}

@Override
protected DelegateConsumerRecordResult delegateConsumerRecord(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change is too big and is a bit confusing with this override method. I think you moved these all into StorePartitionDataReceiver class, but how do you do the override there then?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was moved into delegateConsumerRecordMaybeWithLock()

Copy link
Contributor

@sixpluszero sixpluszero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I finally took a pass over this gigantic change. Overall it looks good, left a comment on whether we can further split the data receiver class out by moving util method out. But overall the structure is ok. And need a batch of rebase again...

}
}

private void validatePostOperationResultsAndRecord(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Is it possible that we move some util methods out of this class and make the file size smaller?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or do you think it is wise to instead let the DataReceiver class sub-class by AA or not?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants