-
Notifications
You must be signed in to change notification settings - Fork 1.9k
[ENH]: Garbage collection for soft deleted attached functions #5774
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Reviewer ChecklistPlease leverage this checklist to ensure your code review is thorough before approving Testing, Bugs, Errors, Logs, Documentation
System Compatibility
Quality
|
3e8363c to
d996d65
Compare
|
Background Garbage-Collector Pipeline for Soft-Deleted Attached Functions This PR wires up an end-to-end garbage-collection workflow that permanently removes “attached functions” after they have been soft-deleted. The change introduces a Rust-based GC worker, new SysDB task orchestration plumbing, configuration knobs for retention/batching, and updates the front-end to soft-delete instead of immediately hard-deleting. The goal is to keep the reversible-delete safety net while guaranteeing eventual storage reclamation and lower operational cost. Roll-out is non-breaking but requires a small DB migration for the new task kind and the addition of a GC configuration block. Key Changes• New Rust garbage_collector_component.rs that scans for AttachedFunctions with deleted=true beyond a configurable grace period and issues hard deletes in batches. Affected Areas• Rust GC worker (src/garbage_collector_component.rs) This summary was automatically generated by @propel-code-bot |
d996d65 to
9d104b4
Compare
| for attached_function_id in attached_functions_to_delete { | ||
| match self | ||
| .sysdb_client | ||
| .finish_attached_function_deletion(attached_function_id) | ||
| .await | ||
| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1+n problem. Perhaps we make finish_attached_function_deletion take a list of function ids?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Following the pattern of collection deletion. It seems like the deletions for those weren't batched as it's easier to debug.
9d104b4 to
9763d56
Compare
| DetachFunctionError::Internal(Box::new(chroma_error::TonicError( | ||
| tonic::Status::unimplemented("Not implemented"), | ||
| ))) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[BestPractice]
It's good that you're handling the NotImplemented error case. However, returning a generic tonic::Status::unimplemented("Not implemented") might be a bit opaque for clients. It would be more informative to specify which backend is missing the implementation, which can aid in debugging, especially in environments where different backends might be in use.
Consider making the error message more specific.
Context for Agents
[**BestPractice**]
It's good that you're handling the `NotImplemented` error case. However, returning a generic `tonic::Status::unimplemented("Not implemented")` might be a bit opaque for clients. It would be more informative to specify which backend is missing the implementation, which can aid in debugging, especially in environments where different backends might be in use.
Consider making the error message more specific.
File: rust/frontend/src/impls/service_based_frontend.rs
Line: 19819763d56 to
eae5d7f
Compare
| impl ChromaError for GetSoftDeletedAttachedFunctionsError { | ||
| fn code(&self) -> ErrorCodes { | ||
| ErrorCodes::Internal | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[BestPractice]
To improve error handling consistency, the ChromaError implementation for GetSoftDeletedAttachedFunctionsError should propagate the underlying gRPC status code instead of always returning Internal. This follows the standard tonic pattern for error propagation and provides more specific error information to upstream callers.
Suggested Change
| impl ChromaError for GetSoftDeletedAttachedFunctionsError { | |
| fn code(&self) -> ErrorCodes { | |
| ErrorCodes::Internal | |
| } | |
| } | |
| impl ChromaError for GetSoftDeletedAttachedFunctionsError { | |
| fn code(&self) -> ErrorCodes { | |
| match self { | |
| GetSoftDeletedAttachedFunctionsError::FailedToGetSoftDeletedAttachedFunctions(e) => { | |
| e.code().into() | |
| } | |
| GetSoftDeletedAttachedFunctionsError::ServerReturnedInvalidData => ErrorCodes::Internal, | |
| GetSoftDeletedAttachedFunctionsError::NotImplemented => ErrorCodes::Internal, | |
| } | |
| } | |
| } |
⚡ Committable suggestion
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
Context for Agents
[**BestPractice**]
To improve error handling consistency, the `ChromaError` implementation for `GetSoftDeletedAttachedFunctionsError` should propagate the underlying gRPC status code instead of always returning `Internal`. This follows the standard tonic pattern for error propagation and provides more specific error information to upstream callers.
<details>
<summary>Suggested Change</summary>
```suggestion
impl ChromaError for GetSoftDeletedAttachedFunctionsError {
fn code(&self) -> ErrorCodes {
match self {
GetSoftDeletedAttachedFunctionsError::FailedToGetSoftDeletedAttachedFunctions(e) => {
e.code().into()
}
GetSoftDeletedAttachedFunctionsError::ServerReturnedInvalidData => ErrorCodes::Internal,
GetSoftDeletedAttachedFunctionsError::NotImplemented => ErrorCodes::Internal,
}
}
}
```
⚡ **Committable suggestion**
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
</details>
File: rust/sysdb/src/sysdb.rs
Line: 2297| impl ChromaError for FinishAttachedFunctionDeletionError { | ||
| fn code(&self) -> ErrorCodes { | ||
| ErrorCodes::Internal | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[BestPractice]
Similar to other error types in this file, the ChromaError implementation for FinishAttachedFunctionDeletionError should propagate the gRPC status code from the tonic::Status to provide more detailed error information. This follows the standard tonic error handling pattern.
Suggested Change
| impl ChromaError for FinishAttachedFunctionDeletionError { | |
| fn code(&self) -> ErrorCodes { | |
| ErrorCodes::Internal | |
| } | |
| } | |
| impl ChromaError for FinishAttachedFunctionDeletionError { | |
| fn code(&self) -> ErrorCodes { | |
| match self { | |
| FinishAttachedFunctionDeletionError::FailedToFinishDeletion(e) => e.code().into(), | |
| FinishAttachedFunctionDeletionError::NotImplemented => ErrorCodes::Internal, | |
| } | |
| } | |
| } |
⚡ Committable suggestion
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
Context for Agents
[**BestPractice**]
Similar to other error types in this file, the `ChromaError` implementation for `FinishAttachedFunctionDeletionError` should propagate the gRPC status code from the `tonic::Status` to provide more detailed error information. This follows the standard tonic error handling pattern.
<details>
<summary>Suggested Change</summary>
```suggestion
impl ChromaError for FinishAttachedFunctionDeletionError {
fn code(&self) -> ErrorCodes {
match self {
FinishAttachedFunctionDeletionError::FailedToFinishDeletion(e) => e.code().into(),
FinishAttachedFunctionDeletionError::NotImplemented => ErrorCodes::Internal,
}
}
}
```
⚡ **Committable suggestion**
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
</details>
File: rust/sysdb/src/sysdb.rs
Line: 2311eae5d7f to
d667ef4
Compare
d667ef4 to
8936c5b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my main question is whether we should be hard deleting functions prior to executing collection GC
should we hard delete functions per collection instead to match the existing pattern?
|
Functions could be deleted for undeleted collections. Also in the future, it could be possible to have multiple input collections for a function. I guess one thing to consider is maybe disabling a function if one of its input collections is deleted but that could also be detected when the function attempts to run. |
8936c5b to
d719e8c
Compare

Description of changes
This change introduces garbage collection for soft deleted attached functions.
Test plan
How are these changes tested?
pytestfor python,yarn testfor js,cargo testfor rustMigration plan
Are there any migrations, or any forwards/backwards compatibility changes needed in order to make sure this change deploys reliably?
Observability plan
What is the plan to instrument and monitor this change?
Documentation Changes
Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the _docs section?_