RFC-5444: Operator From Uri #5444

Merged on Dec 27, 2024 (6 commits)
144 changes: 144 additions & 0 deletions core/src/docs/rfcs/5444_operator_from_uri.md
@@ -0,0 +1,144 @@
- Proposal Name: `operator_from_uri`
- Start Date: 2024-12-23
- RFC PR: [apache/opendal#5444](https://github.com/apache/opendal/pull/5444)
- Tracking Issue: [apache/opendal#5445](https://github.com/apache/opendal/issues/5445)

# Summary

This RFC proposes adding URI-based configuration support to OpenDAL, allowing users to create operators directly from URIs. The proposal introduces a new `from_uri` API on both `Operator` and the `Configurator` trait, along with an `OperatorRegistry` that manages operator factories keyed by URI scheme.

# Motivation

Currently, creating an operator in OpenDAL requires explicit configuration through builder patterns. While this approach provides type safety and clear documentation, it can be verbose and inflexible for simple use cases. Many storage systems are naturally identified by URIs (e.g., `s3://bucket/path`, `fs:///path/to/dir`).

Adding URI-based configuration would:

- Simplify operator creation for common use cases
- Enable configuration via connection strings (common in many applications)
- Make OpenDAL more approachable for new users
- Allow dynamic operator creation based on runtime configuration

# Guide-level explanation

The new API allows creating operators directly from URIs:

```rust
// Create an operator using URI
let op = Operator::from_uri("s3://my-bucket/path", vec![
    ("access_key_id".to_string(), "xxx".to_string()),
    ("secret_access_key".to_string(), "yyy".to_string()),
])?;

// Create a file system operator
let op = Operator::from_uri("fs:///tmp/test", vec![])?;

// Using with custom registry
let registry = OperatorRegistry::new();
registry.register("custom", my_factory);
let op = registry.parse("custom://endpoint", options)?;
```

# Reference-level explanation

The implementation consists of three main components:

1. The `OperatorRegistry`:

```rust
type OperatorFactory = fn(http::Uri, HashMap<String, String>) -> Result<Operator>;

pub struct OperatorRegistry {
    register: Arc<Mutex<HashMap<String, OperatorFactory>>>,
}

impl OperatorRegistry {
    fn register(&self, scheme: &str, factory: OperatorFactory) {
        ...
    }

    fn parse(&self, uri: &str, options: impl IntoIterator<Item = (String, String)>) -> Result<Operator> {
        ...
    }
}
```
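
The scheme-based dispatch described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the RFC's implementation: `Operator` is a stand-in struct, the factory takes the raw `&str` instead of an `http::Uri`, errors are plain `String`s, the scheme is extracted with `split_once("://")` rather than a real URI parser, and `demo_factory` is a hypothetical factory.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Stand-in for OpenDAL's `Operator`; hypothetical, for illustration only.
#[derive(Debug, PartialEq)]
pub struct Operator {
    pub scheme: String,
    pub location: String,
    pub options: HashMap<String, String>,
}

/// Simplified factory signature: the RFC passes an `http::Uri`,
/// this sketch passes the raw URI string.
type OperatorFactory = fn(&str, HashMap<String, String>) -> Result<Operator, String>;

#[derive(Clone, Default)]
pub struct OperatorRegistry {
    registry: Arc<Mutex<HashMap<String, OperatorFactory>>>,
}

impl OperatorRegistry {
    pub fn new() -> Self {
        Self::default()
    }

    /// Associate a scheme with a factory. In this sketch a later
    /// registration silently replaces an earlier one.
    pub fn register(&self, scheme: &str, factory: OperatorFactory) {
        self.registry
            .lock()
            .unwrap()
            .insert(scheme.to_string(), factory);
    }

    /// Look up the factory for the URI's scheme and delegate to it.
    pub fn parse(
        &self,
        uri: &str,
        options: impl IntoIterator<Item = (String, String)>,
    ) -> Result<Operator, String> {
        let (scheme, _) = uri
            .split_once("://")
            .ok_or_else(|| format!("uri {uri} has no scheme"))?;
        // Copy the fn pointer out so the lock is released before the factory runs.
        let factory = self
            .registry
            .lock()
            .unwrap()
            .get(scheme)
            .copied()
            .ok_or_else(|| format!("scheme {scheme} is not registered"))?;
        factory(uri, options.into_iter().collect())
    }
}

/// Hypothetical factory that simply records what it was given.
pub fn demo_factory(uri: &str, options: HashMap<String, String>) -> Result<Operator, String> {
    let (scheme, location) = uri.split_once("://").ok_or("missing scheme")?;
    Ok(Operator {
        scheme: scheme.to_string(),
        location: location.to_string(),
        options,
    })
}
```

Registering `demo_factory` under `"demo"` and calling `parse("demo://endpoint", ...)` would route to that factory, while an unregistered scheme yields an error instead of a panic.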

2. The `Configurator` trait extension:

```rust
impl Configurator for S3Config {
    fn from_uri(uri: &str, options: impl IntoIterator<Item = (String, String)>) -> Result<Self> {
        ...
    }
}
```
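
To make the idea of a per-service `from_uri` more concrete, here is a hedged sketch of how a config type might map URI parts and options onto its fields. Everything in it is an assumption for illustration: `DemoS3Config` is not OpenDAL's `S3Config`, the mapping (authority to bucket, path to root) is one plausible choice, unknown option keys are rejected, and errors are plain `String`s.

```rust
/// Hypothetical config, loosely modeled on an S3-style service.
/// This is not OpenDAL's `S3Config`.
#[derive(Debug, Default, PartialEq)]
pub struct DemoS3Config {
    pub bucket: String,
    pub root: String,
    pub endpoint: Option<String>,
    pub region: Option<String>,
}

impl DemoS3Config {
    /// Build a config from an `s3://bucket/path` URI plus key/value options.
    pub fn from_uri(
        uri: &str,
        options: impl IntoIterator<Item = (String, String)>,
    ) -> Result<Self, String> {
        let rest = uri
            .strip_prefix("s3://")
            .ok_or_else(|| format!("expected an s3:// uri, got {uri}"))?;
        // Assumed mapping: authority -> bucket, path -> root.
        let (bucket, path) = rest.split_once('/').unwrap_or((rest, ""));
        if bucket.is_empty() {
            return Err("bucket is missing from the uri".to_string());
        }
        let mut cfg = DemoS3Config {
            bucket: bucket.to_string(),
            root: format!("/{path}"),
            ..Default::default()
        };
        // Options fill the remaining fields; unknown keys are an error here.
        for (key, value) in options {
            match key.as_str() {
                "endpoint" => cfg.endpoint = Some(value),
                "region" => cfg.region = Some(value),
                other => return Err(format!("unknown option: {other}")),
            }
        }
        Ok(cfg)
    }
}
```

Rejecting unknown keys is a design choice made for this sketch; an implementation could equally ignore them or collect them for later validation.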

3. The `Operator` factory method:

```rust
impl Operator {
    pub fn from_uri(
        uri: &str,
        options: impl IntoIterator<Item = (String, String)>,
    ) -> Result<Self> {
        static REGISTRY: Lazy<OperatorRegistry> = Lazy::new(|| {
            let registry = OperatorRegistry::new();
            // Register built-in operators
            registry.register("s3", s3_factory);
            registry.register("fs", fs_factory);
            // ...
            registry
        });

        REGISTRY.parse(uri, options)
    }
}
```

We are intentionally using `&str` instead of `Scheme` here to simplify working with external components outside this crate. Additionally, we plan to remove `Scheme` from our public API soon to enable splitting OpenDAL into multiple crates.
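
The lazily initialized global registry keyed by `&str` schemes can also be approximated without an external crate. The sketch below swaps `Lazy` for the standard library's `std::sync::OnceLock` and makes several simplifying assumptions: `Operator` is a stand-in struct, options are omitted, the table is a plain read-only `HashMap`, and the `s3_factory` and `fs_factory` bodies are hypothetical placeholders.

```rust
use std::collections::HashMap;
use std::sync::OnceLock;

/// Stand-in for OpenDAL's `Operator`; hypothetical.
#[derive(Debug)]
pub struct Operator {
    pub scheme: &'static str,
    pub uri: String,
}

/// Simplified factory signature (no options, raw URI string).
type OperatorFactory = fn(&str) -> Result<Operator, String>;

// Hypothetical built-in factories.
fn s3_factory(uri: &str) -> Result<Operator, String> {
    Ok(Operator { scheme: "s3", uri: uri.to_string() })
}

fn fs_factory(uri: &str) -> Result<Operator, String> {
    Ok(Operator { scheme: "fs", uri: uri.to_string() })
}

/// Process-wide table of built-in schemes, built once on first use.
pub fn builtin_registry() -> &'static HashMap<&'static str, OperatorFactory> {
    static REGISTRY: OnceLock<HashMap<&'static str, OperatorFactory>> = OnceLock::new();
    REGISTRY.get_or_init(|| {
        let mut map: HashMap<&'static str, OperatorFactory> = HashMap::new();
        map.insert("s3", s3_factory);
        map.insert("fs", fs_factory);
        map
    })
}

impl Operator {
    /// Dispatch on the scheme string, which is a plain `&str` rather than an enum.
    pub fn from_uri(uri: &str) -> Result<Self, String> {
        let (scheme, _) = uri
            .split_once("://")
            .ok_or_else(|| format!("uri {uri} has no scheme"))?;
        let factory = builtin_registry()
            .get(scheme)
            .ok_or_else(|| format!("scheme {scheme} is not registered"))?;
        factory(uri)
    }
}
```

Because schemes are ordinary strings, a crate outside the core could contribute a factory without the core needing to know about a new enum variant, which is the motivation the paragraph above gives for avoiding `Scheme`.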

# Drawbacks

- Increases API surface area
- Less type safety compared to builder patterns
- Potential for confusing error messages with invalid URIs
- Need to maintain backwards compatibility

# Rationale and alternatives

Alternatives considered:

1. Connection string format instead of URIs
2. Builder pattern with URI parsing
3. Macro-based configuration

URI-based configuration was chosen because:

- URIs are widely understood
- Natural fit for storage locations
- Extensible through custom schemes
- Common in similar tools

# Prior art

Similar patterns exist in:

- Rust's `url` crate
- Database connection strings (PostgreSQL, MongoDB)
- AWS SDK endpoint configuration
- Python's `urllib`

# Unresolved questions

- Should we support custom URI parsing per operator?
Review comment (Contributor):

I don't think we should. URI parsing should be done with crates like https://docs.rs/http/latest/http/uri/struct.Uri.html

What we can do is use custom query parameters, for example, to specify options that wouldn't be present in the "official" connection URI.

But I don't think we should do custom parsing.

- How to handle scheme conflicts?
Review comment (Contributor):

I think the register method of the SchemaRegistry should panic in that case.

- Should we support URI validation?
- How to handle complex configurations that don't map well to URIs?
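
One way to picture the query-parameter idea raised in the review comment on the first question: options embedded in the URI are merged with explicitly passed options. The sketch below is hypothetical throughout. The function name `merge_query_options` is invented, explicit options overriding query parameters is an assumed policy, and percent-decoding and `#fragment` handling are left out.

```rust
use std::collections::HashMap;

/// Split `uri` into the part before `?` and a merged option map.
/// Query parameters are read first; explicit `options` override them
/// (an assumed precedence, not something the RFC specifies).
pub fn merge_query_options(
    uri: &str,
    options: impl IntoIterator<Item = (String, String)>,
) -> (String, HashMap<String, String>) {
    let (base, query) = uri.split_once('?').unwrap_or((uri, ""));
    let mut merged = HashMap::new();
    for pair in query.split('&').filter(|p| !p.is_empty()) {
        // A key without `=` is kept with an empty value.
        let (key, value) = pair.split_once('=').unwrap_or((pair, ""));
        merged.insert(key.to_string(), value.to_string());
    }
    // Explicit options win over anything found in the query string.
    merged.extend(options);
    (base.to_string(), merged)
}
```

With this policy, a URI such as `s3://bucket/path?region=us-east-1` combined with an explicit `region` option would resolve to the explicit value, keeping the URI itself parseable by a standard URI parser.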

# Future possibilities

- Support for connection string format
- URI templates for batch operations
- Custom scheme handlers
- Configuration presets
- URI validation middleware
- Dynamic operator loading based on URI schemes
4 changes: 4 additions & 0 deletions core/src/docs/rfcs/mod.rs
@@ -240,3 +240,7 @@ pub mod rfc_4638_executor {}
/// Remove metakey
#[doc = include_str!("5314_remove_metakey.md")]
pub mod rfc_5314_remove_metakey {}

/// Operator from uri
#[doc = include_str!("5444_operator_from_uri.md")]
pub mod rfc_5444_operator_from_uri {}