Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC-5313: Remove Metakey #5313

Merged
merged 3 commits into from
Nov 13, 2024
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
120 changes: 120 additions & 0 deletions core/src/docs/rfcs/5314_remove_metakey.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,120 @@
- Proposal Name: `remove_metakey`
- Start Date: 2024-11-12
- RFC PR: [apache/opendal#5313](https://github.com/apache/opendal/pull/5313)
- Tracking Issue: [apache/opendal#5314](https://github.com/apache/opendal/issues/5314)

# Summary

Remove the `Metakey` concept from OpenDAL and replace it with a simpler and more predictable metadata handling mechanism.

# Motivation

The current `Metakey` design has several issues:

1. Performance Impact: Users often unknowingly trigger expensive operations (e.g., using `Metakey::Full` causes additional stat calls)
2. Usability Issues: Users frequently attempt to access metadata that wasn't explicitly requested
3. API Confusion: Overlap between `Metakey::Version` and the new `version(bool)` parameter
4. Implementation Complexity: Service developers find it challenging to implement `Metakey` correctly
Xuanwo marked this conversation as resolved.
Show resolved Hide resolved

The expected outcome is a simpler, more predictable API that prevents common mistakes and improves performance by default.
Xuanwo marked this conversation as resolved.
Show resolved Hide resolved

# Guide-level explanation

Instead of using `Metakey` to specify which metadata fields to fetch, services will now declare their metadata capabilities upfront through a new `MetadataCapability` struct:

```rust
let entries = op.list("path").await?;
for entry in entries {
if op.metadata_capability().content_type {
println!("Content-Type: {}", entry.metadata().content_type());
}
}
```

If users need additional metadata not provided by `list`:

```rust
let entries = op.list("path").await?;
for entry in entries {
let mut meta = entry.metadata();
if !op.metadata_capability().etag {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on this usage, it seems to be the metadata capability of the list operation ?

meta = op.stat(&entry.path()).await?;
}
println!("Content-Type: {}", meta.etag());
}
```

For existing OpenDAL users, the main changes are:

- Remove all `metakey()` calls from their code
- Use `metadata_capability()` to check available metadata
- Explicitly call `stat()` when needed

# Reference-level explanation

The implementation involves:

1. Remove the `Metakey` enum
2. Add new `MetadataCapability` struct:
```rust
pub struct MetadataCapability {
pub content_length: bool,
pub content_type: bool,
pub last_modified: bool,
pub etag: bool,
pub mode: bool,
pub version: bool,
...
}
```

3. Add method to Operator to query capabilities:
```rust
impl Operator {
pub fn metadata_capability(&self) -> MetadataCapability;
}
```

4. Modify list operation to never perform implicit stat calls
Xuanwo marked this conversation as resolved.
Show resolved Hide resolved
5. Update all service implementations to declare their metadata capabilities

Each service implementation will need to:
- Remove `Metakey` handling logic
- Implement `metadata_capability()` to accurately reflect what metadata they provide by default
Xuanwo marked this conversation as resolved.
Show resolved Hide resolved
- Ensure list operations return metadata that's always available without extra API calls

# Drawbacks

- Breaking change for existing users
- Loss of fine-grained control over metadata fetching
- Potential increased API calls if users need multiple metadata fields

# Rationale and alternatives

This design is superior because:
- Prevents performance pitfalls by default
- Makes metadata availability explicit
Xuanwo marked this conversation as resolved.
Show resolved Hide resolved
- Simplifies service implementation
- Provides clearer mental model

Alternatives considered:
1. Keep `Metakey` but make it more restrictive
2. Add warnings for potentially expensive operations
Xuanwo marked this conversation as resolved.
Show resolved Hide resolved
3. Make stat calls async/lazy

Not making this change would continue the current issues of performance problems and API misuse.

# Prior art

None

# Unresolved questions

None

# Future possibilities

- Add metadata prefetching optimization
- Add metadata caching layer
- Support for custom metadata fields
- Automated capability detection
4 changes: 4 additions & 0 deletions core/src/docs/rfcs/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -236,3 +236,7 @@ pub mod rfc_4382_range_based_read {}
/// Executor API
#[doc = include_str!("4638_executor.md")]
pub mod rfc_4638_executor {}

/// Remove metakey
#[doc = include_str!("5314_remove_metakey.md")]
pub mod rfc_5314_remove_metakey {}
Loading