feat: Lance storage options #2830

Draft
wants to merge 1 commit into
base: main
1 change: 1 addition & 0 deletions rust/lance-io/src/object_store.rs
@@ -33,6 +33,7 @@
use url::Url;

use super::local::LocalObjectReader;
mod options;
mod tracing;
use self::tracing::ObjectStoreTracingExt;
use crate::{object_reader::CloudObjectReader, object_writer::ObjectWriter, traits::Reader};
@@ -610,7 +611,7 @@
    pub fn remove_stream<'a>(
        &'a self,
        locations: BoxStream<'a, Result<Path>>,
    ) -> BoxStream<Result<Path>> {
        // CI annotation (GitHub Actions / linux-build (nightly)): warning on line 614 in
        // rust/lance-io/src/object_store.rs: elided lifetime has a name
        self.inner
            .delete_stream(locations.err_into::<ObjectStoreError>().boxed())
            .err_into::<Error>()
98 changes: 98 additions & 0 deletions rust/lance-io/src/object_store/options.rs
@@ -0,0 +1,98 @@
// SPDX-License-Identifier: Apache-2.0
// SPDX-FileCopyrightText: Copyright The Lance Authors

// Inspired by AmazonS3ConfigKey and friends from object-store

use std::str::FromStr;

/// Configuration keys for the Lance object store.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum LanceStorageOption {
    /// Whether to use the same size for all parts in a multipart upload.
    /// By default, this is false, unless `ENDPOINT_URL` is set to a Cloudflare
    /// R2 endpoint.
    UseConstantSizeUploadParts,
    /// Whether it is safe to assume that list operations return results in
    /// lexicographical order. This is used to optimize discovery of the
    /// latest manifest.
    ListIsLexigraphicallySorted,
    /// The number of IO operations to perform in parallel.
    IoParallelism,
}

impl AsRef<str> for LanceStorageOption {
    fn as_ref(&self) -> &str {
        match self {
            Self::UseConstantSizeUploadParts => "lance_use_constant_size_upload_parts",
            Self::ListIsLexigraphicallySorted => "lance_list_is_lexigraphically_sorted",
            Self::IoParallelism => "lance_io_parallelism",
        }
    }
}

impl FromStr for LanceStorageOption {
    type Err = ();

    fn from_str(s: &str) -> Result<Self, ()> {
        match s {
            "lance_use_constant_size_upload_parts" => Ok(Self::UseConstantSizeUploadParts),
            "lance_list_is_lexigraphically_sorted" => Ok(Self::ListIsLexigraphicallySorted),
            "lance_io_parallelism" => Ok(Self::IoParallelism),
            _ => Err(()),
        }
    }
}
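
The `AsRef<str>` and `FromStr` impls above list the same three key strings twice, so they can drift apart when a new option is added. A minimal sketch of one way to keep them in sync, assuming a hypothetical `ALL` constant and `key` method that are not part of this PR:

```rust
use std::str::FromStr;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum LanceStorageOption {
    UseConstantSizeUploadParts,
    ListIsLexigraphicallySorted,
    IoParallelism,
}

impl LanceStorageOption {
    /// Hypothetical helper (not in the PR): every variant, for iteration.
    const ALL: [Self; 3] = [
        Self::UseConstantSizeUploadParts,
        Self::ListIsLexigraphicallySorted,
        Self::IoParallelism,
    ];

    /// Hypothetical helper (not in the PR): the single source of truth
    /// for the key strings, copied from the PR's `AsRef<str>` impl.
    fn key(self) -> &'static str {
        match self {
            Self::UseConstantSizeUploadParts => "lance_use_constant_size_upload_parts",
            Self::ListIsLexigraphicallySorted => "lance_list_is_lexigraphically_sorted",
            Self::IoParallelism => "lance_io_parallelism",
        }
    }
}

impl AsRef<str> for LanceStorageOption {
    fn as_ref(&self) -> &str {
        self.key()
    }
}

impl FromStr for LanceStorageOption {
    type Err = ();

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Search the variants by key instead of repeating the string table.
        Self::ALL.iter().copied().find(|opt| opt.key() == s).ok_or(())
    }
}
```

With this shape, parsing is case-sensitive exactly as in the PR: `"lance_io_parallelism"` parses, while `"LANCE_IO_PARALLELISM"` is rejected until the caller lowercases it.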

// CI annotation (GitHub Actions / linux-arm, linux-build (stable), linux-build (nightly)):
// warning on line 46 in rust/lance-io/src/object_store/options.rs:
// function `extract_lance_storage_options` is never used
/// Keeps only the entries whose key is a recognized Lance storage option;
/// entries with any other key are skipped.
fn extract_lance_storage_options<'a>(
    options: impl IntoIterator<Item = (&'a str, &'a str)> + 'a,
) -> impl Iterator<Item = (LanceStorageOption, &'a str)> + 'a {
    options.into_iter().filter_map(|(key, value)| {
        let key = key.parse().ok()?;
        Some((key, value))
    })
}

#[derive(Default, Debug)]
pub struct LanceStorageConfig {
    // CI annotation (GitHub Actions / linux-arm, linux-build (stable), linux-build (nightly)):
    // warning on line 57 in rust/lance-io/src/object_store/options.rs:
    // fields `use_constant_size_upload_parts`, `list_is_lexigraphically_sorted`, and
    // `io_parallelism` are never read
    use_constant_size_upload_parts: Option<bool>,
    list_is_lexigraphically_sorted: Option<bool>,
    io_parallelism: Option<usize>,
}

impl LanceStorageConfig {
    // CI annotation (GitHub Actions / linux-arm, linux-build (stable), linux-build (nightly)):
    // warning on line 63 in rust/lance-io/src/object_store/options.rs:
    // method `with_config` is never used
    /// Sets one option from its string value.
    ///
    /// Panics if `value` does not parse as the option's type (`bool` or `usize`).
    fn with_config(&mut self, key: LanceStorageOption, value: &str) {
        match key {
            LanceStorageOption::UseConstantSizeUploadParts => {
                self.use_constant_size_upload_parts = Some(value.parse().unwrap());
            }
            LanceStorageOption::ListIsLexigraphicallySorted => {
                self.list_is_lexigraphically_sorted = Some(value.parse().unwrap());
            }
            LanceStorageOption::IoParallelism => {
                self.io_parallelism = Some(value.parse().unwrap());
            }
        }
    }
}
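
Because `with_config` calls `unwrap` on every parse, a malformed value such as `LANCE_IO_PARALLELISM=eight` in the environment would panic. A minimal sketch of a non-panicking variant follows; `try_with_config` and `InvalidOptionValue` are hypothetical names introduced for illustration, not part of this PR, and the enum is re-declared only to keep the example self-contained:

```rust
use std::fmt;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LanceStorageOption {
    UseConstantSizeUploadParts,
    ListIsLexigraphicallySorted,
    IoParallelism,
}

/// Hypothetical error type (not in the PR): which option failed and the raw value.
#[derive(Debug, PartialEq, Eq)]
struct InvalidOptionValue {
    key: LanceStorageOption,
    value: String,
}

impl fmt::Display for InvalidOptionValue {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "invalid value {:?} for option {:?}", self.value, self.key)
    }
}

impl std::error::Error for InvalidOptionValue {}

#[derive(Default, Debug)]
struct LanceStorageConfig {
    use_constant_size_upload_parts: Option<bool>,
    list_is_lexigraphically_sorted: Option<bool>,
    io_parallelism: Option<usize>,
}

impl LanceStorageConfig {
    /// Hypothetical fallible counterpart of `with_config`: on a parse failure
    /// the config is left unchanged and the error is returned to the caller.
    fn try_with_config(
        &mut self,
        key: LanceStorageOption,
        value: &str,
    ) -> Result<(), InvalidOptionValue> {
        let invalid = || InvalidOptionValue { key, value: value.to_string() };
        match key {
            LanceStorageOption::UseConstantSizeUploadParts => {
                let parsed = value.parse::<bool>().map_err(|_| invalid())?;
                self.use_constant_size_upload_parts = Some(parsed);
            }
            LanceStorageOption::ListIsLexigraphicallySorted => {
                let parsed = value.parse::<bool>().map_err(|_| invalid())?;
                self.list_is_lexigraphically_sorted = Some(parsed);
            }
            LanceStorageOption::IoParallelism => {
                let parsed = value.parse::<usize>().map_err(|_| invalid())?;
                self.io_parallelism = Some(parsed);
            }
        }
        Ok(())
    }
}
```

Note that `bool`'s `FromStr` accepts only `"true"` and `"false"`, so values like `"yes"` or `"1"` are rejected in both the PR's version and this sketch. Whether a bad value should be an error, a logged warning, or silently ignored is a design decision left to the PR.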

// CI annotation (GitHub Actions / linux-arm, linux-build (stable), linux-build (nightly)):
// warning on line 78 in rust/lance-io/src/object_store/options.rs:
// function `infer_lance_storage_options` is never used
/// Builds a `LanceStorageConfig` from `LANCE_*` environment variables, then
/// applies `options`, so explicitly passed options override the environment.
pub fn infer_lance_storage_options<'a>(
    options: impl IntoIterator<Item = (&'a str, &'a str)> + 'a,
) -> LanceStorageConfig {
    let mut config = LanceStorageConfig::default();

    // Environment variables that are not valid UTF-8 are skipped.
    for (os_key, os_value) in std::env::vars_os() {
        if let (Some(key), Some(value)) = (os_key.to_str(), os_value.to_str()) {
            if key.starts_with("LANCE_") {
                if let Ok(config_key) = key.to_ascii_lowercase().parse() {
                    config.with_config(config_key, value);
                }
            }
        }
    }

    for (key, value) in extract_lance_storage_options(options) {
        config.with_config(key, value);
    }

    config
}
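
The precedence in `infer_lance_storage_options` (environment first, explicit options second, so explicit options win) can be illustrated without touching the real process environment. The sketch below is an assumption-laden stand-in, not the PR's code: `resolve_raw_options` and `LANCE_KEYS` are hypothetical, the environment is passed in as an iterator, and values are kept as raw strings rather than parsed:

```rust
use std::collections::HashMap;

/// The three key strings defined in the PR's `AsRef<str>` impl.
const LANCE_KEYS: [&str; 3] = [
    "lance_use_constant_size_upload_parts",
    "lance_list_is_lexigraphically_sorted",
    "lance_io_parallelism",
];

/// Hypothetical helper (not in the PR): resolves raw option strings with the
/// same two-pass precedence as `infer_lance_storage_options`.
fn resolve_raw_options<'a>(
    env: impl IntoIterator<Item = (String, String)>,
    options: impl IntoIterator<Item = (&'a str, &'a str)>,
) -> HashMap<&'static str, String> {
    let mut resolved = HashMap::new();

    // Pass 1: environment variables prefixed with `LANCE_`, lowercased
    // before being matched against the known keys.
    for (name, value) in env {
        if name.starts_with("LANCE_") {
            let lowered = name.to_ascii_lowercase();
            if let Some(key) = LANCE_KEYS.iter().copied().find(|k| *k == lowered.as_str()) {
                resolved.insert(key, value);
            }
        }
    }

    // Pass 2: explicit options, matched case-sensitively. Inserting again
    // overwrites whatever the environment supplied for the same key.
    for (name, value) in options {
        if let Some(key) = LANCE_KEYS.iter().copied().find(|k| *k == name) {
            resolved.insert(key, value.to_string());
        }
    }

    resolved
}
```

For example, with `LANCE_IO_PARALLELISM=4` in the environment and `("lance_io_parallelism", "16")` passed explicitly, the resolved value is `"16"`. Keys that are not Lance options, such as `aws_region`, are dropped in both passes.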