[SPARK-52470][ML][CONNECT] Support model summary offloading #51187
Signed-off-by: Weichen Xu <[email protected]>
The CI failure https://github.com/WeichenXu123/spark/actions/runs/15703082903/job/44242454735#step:7:3170 is not related to my PR.
Merged to master.
@WeichenXu123 The test case "pyspark.ml.tests.connect.test_parity_pipeline" has timed out twice in recent commit pipelines, and I don't think I've encountered this issue before. Could you please help confirm whether this is related to the current PR? Thanks ~
Also cc @zhengruifeng
It may also hang in other ML Connect tests: pyspark.ml.tests.connect.test_parity_classification in https://github.com/apache/spark/actions/runs/15741346719/job/44367299872
What changes were proposed in this pull request?
This PR adds support for model summary offloading in Spark Connect ML.
Model summary offloading is hard to support because a summary contains a Spark dataset, which can't be easily serialized in the Spark driver (NOTE: we can't use the Java serializer to serialize the Spark dataset logical plan, because that would be an RCE vulnerability).
To address the issue, when a summary is saved to disk, only the necessary data fields are saved.
When the summary is loaded back, the client needs to send the dataset to the Spark driver again.
To achieve this, 2 new proto messages are introduced:
1: CreateSummary in MlCommand
2: model_summary_dataset in MlRelation
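The offload-and-restore flow described above can be illustrated with a small standalone sketch. This is not the code in this PR and does not use Spark: `SummaryFields`, `Summary`, `DriverCache`, and the JSON file layout are hypothetical names chosen for illustration, and a plain Python list stands in for the Spark dataset. It only shows the idea that the plain data fields are written to disk while the dataset is supplied again by the client on reload.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass
from typing import Any, Dict, List, Optional


@dataclass
class SummaryFields:
    # Hypothetical plain-data part of a summary; safe to write to disk.
    total_iterations: int
    objective_history: List[float]


class Summary:
    def __init__(self, fields: SummaryFields, dataset: Any) -> None:
        self.fields = fields
        # Stand-in for a Spark dataset; it is never serialized in this sketch.
        self.dataset = dataset


class DriverCache:
    """Hypothetical driver-side cache that can offload summaries to disk."""

    def __init__(self, offload_dir: str) -> None:
        self.offload_dir = offload_dir
        self.in_memory: Dict[str, Summary] = {}

    def _path(self, model_id: str) -> str:
        return os.path.join(self.offload_dir, f"{model_id}.summary.json")

    def put(self, model_id: str, summary: Summary) -> None:
        self.in_memory[model_id] = summary

    def get(self, model_id: str) -> Optional[Summary]:
        return self.in_memory.get(model_id)

    def offload(self, model_id: str) -> None:
        # Evict from memory, keeping only the plain data fields on disk.
        summary = self.in_memory.pop(model_id)
        with open(self._path(model_id), "w") as f:
            json.dump(asdict(summary.fields), f)

    def create_summary(self, model_id: str, dataset: Any) -> Summary:
        # Rebuild the summary from the saved fields plus the dataset
        # that the client sends again.
        with open(self._path(model_id)) as f:
            fields = SummaryFields(**json.load(f))
        summary = Summary(fields, dataset)
        self.in_memory[model_id] = summary
        return summary


def demo() -> Summary:
    with tempfile.TemporaryDirectory() as offload_dir:
        cache = DriverCache(offload_dir)
        dataset = ["row1", "row2"]
        cache.put("m1", Summary(SummaryFields(3, [0.9, 0.5, 0.4]), dataset))
        cache.offload("m1")
        assert cache.get("m1") is None  # summary is no longer in memory
        return cache.create_summary("m1", dataset)
```

In this sketch the `create_summary` method plays a role loosely analogous to the CreateSummary command: it takes the dataset from the caller instead of trying to deserialize one from disk.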
Why are the changes needed?
Support model summary offloading.
Without this, the model summary is evicted from Spark driver memory after the default 15-minute timeout, which makes the model.summary API unavailable.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Unit tests.
Was this patch authored or co-authored using generative AI tooling?
No.