-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add no_hardlinks option to LocalConfig and fix error handling #642
Conversation
I've reviewed the changes in apache/arrow-rs#5094 and made some additional modifications. I'm pleased to report that we've made significant progress. The Given that this functionality is designed for file systems without hardlinks, which presents some testing challenges, I've conducted manual experiments after building the project. The results have been positive so far. Here's an example of the SQL operations I've successfully tested: CREATE TABLE test (application_ID TEXT);
SELECT * FROM test;
INSERT INTO test (application_ID)
VALUES
('APP2024-001'),
('APP2024-002'),
('APP2024-003'),
('APP2024-004'),
('APP2024-005'),
('APP2024-006'),
('APP2024-007'),
('APP2024-008'),
('APP2024-009'),
('APP2024-010');
SELECT * FROM test;
ALTER TABLE test RENAME TO testy;
SELECT * FROM testy; All of these operations executed without any errors, which is encouraging. I'd appreciate any suggestions for other common query operations that we should verify. Your expertise would be valuable in ensuring we've covered all the essential scenarios. Thank you for your ongoing support and collaboration on this project! |
@mrchypark Looks good, thanks! Do you mind fixing the formatting / Clippy errors in CI (it's failing right now: https://github.com/mrchypark/seafowl/actions/runs/10667722628/job/29565871004)? I'm trying to see how to enable workflows to run inside of this PR but can't, I can only see CI runs on your original repo. |
I think we need to extend seafowl/.github/workflows/ci.yml Lines 3 to 4 in c22c3fc
We also probably need some settings tweaks along these lines: https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/enabling-features-for-your-repository/managing-github-actions-settings-for-a-repository#controlling-changes-from-forks-to-workflows-in-public-repositories |
@gruuya got it, thanks! @mrchypark if you rebase on our main branch to pick up https://github.com/splitgraph/seafowl/pull/649/files, we should be able to run CI on this PR right here |
@mrchypark I checked your branch locally and fixed Clippy/formatting errors, if you push this patch to your branch, CI should pass and we'll merge: commit 45c9215f38d91491c42e6aaf688c4985d6ed9802
Author: Artjoms Iškovs <[email protected]>
Date: Tue Sep 3 10:47:50 2024 +0100
Fix formatting and Clippy
diff --git a/object_store_factory/src/local.rs b/object_store_factory/src/local.rs
index 9d107a4..c494b2b 100644
--- a/object_store_factory/src/local.rs
+++ b/object_store_factory/src/local.rs
@@ -19,20 +19,27 @@ impl LocalConfig {
map: &HashMap<String, String>,
) -> Result<Self, object_store::Error> {
Ok(Self {
- data_dir: map.get("data_dir")
- .ok_or_else(|| object_store::Error::Generic {
- store: "local",
- source: "Missing data_dir".into(),
- })?
- .clone(),
- disable_hardlinks: map.get("disable_hardlinks").map(|s| s == "true").unwrap_or(false),
+ data_dir: map
+ .get("data_dir")
+ .ok_or_else(|| object_store::Error::Generic {
+ store: "local",
+ source: "Missing data_dir".into(),
+ })?
+ .clone(),
+ disable_hardlinks: map
+ .get("disable_hardlinks")
+ .map(|s| s == "true")
+ .unwrap_or(false),
})
}
pub fn to_hashmap(&self) -> HashMap<String, String> {
let mut map = HashMap::new();
map.insert("data_dir".to_string(), self.data_dir.clone());
- map.insert("disable_hardlinks".to_string(), self.disable_hardlinks.to_string());
+ map.insert(
+ "disable_hardlinks".to_string(),
+ self.disable_hardlinks.to_string(),
+ );
map
}
@@ -57,7 +64,7 @@ mod tests {
let config = LocalConfig::from_hashmap(&map)
.expect("Failed to create config from hashmap");
assert_eq!(config.data_dir, "/tmp/data".to_string());
- assert_eq!(config.disable_hardlinks, false); // Default value
+ assert!(!config.disable_hardlinks); // Default value
}
#[test]
@@ -69,7 +76,7 @@ mod tests {
let config = LocalConfig::from_hashmap(&map)
.expect("Failed to create config from hashmap");
assert_eq!(config.data_dir, "/tmp/data".to_string());
- assert_eq!(config.disable_hardlinks, true);
+ assert!(config.disable_hardlinks);
}
#[test]
@@ -81,7 +88,7 @@ mod tests {
let config = LocalConfig::from_hashmap(&map)
.expect("Failed to create config from hashmap");
assert_eq!(config.data_dir, "/tmp/data".to_string());
- assert_eq!(config.disable_hardlinks, false);
+ assert!(!config.disable_hardlinks);
}
#[test]
@@ -133,7 +140,7 @@ mod tests {
#[test]
fn test_default_false() {
- assert_eq!(default_false(), false);
+ assert!(!default_false());
}
#[test]
@@ -146,7 +153,7 @@ mod tests {
let config: LocalConfig = serde_json::from_str(json).unwrap();
assert_eq!(config.data_dir, "/tmp/data");
- assert_eq!(config.disable_hardlinks, false);
+ assert!(!config.disable_hardlinks);
}
#[test]
@@ -160,6 +167,6 @@ mod tests {
let config: LocalConfig = serde_json::from_str(json).unwrap();
assert_eq!(config.data_dir, "/tmp/data");
- assert_eq!(config.disable_hardlinks, true);
+ assert!(config.disable_hardlinks);
}
-}
\ No newline at end of file
+}
diff --git a/src/main.rs b/src/main.rs
index 46eab5f..1caf26e 100644
--- a/src/main.rs
+++ b/src/main.rs
@@ -29,7 +29,7 @@ use seafowl::{
use tokio::time::{interval, Duration};
use tracing::level_filters::LevelFilter;
-use tracing::{error, info, debug, subscriber, warn};
+use tracing::{debug, error, info, subscriber, warn};
use tracing_log::LogTracer;
use tracing_subscriber::filter::EnvFilter;
diff --git a/src/object_store/wrapped.rs b/src/object_store/wrapped.rs
index ad4446d..a17b10c 100644
--- a/src/object_store/wrapped.rs
+++ b/src/object_store/wrapped.rs
@@ -16,8 +16,8 @@ use url::Url;
use object_store_factory::aws::S3Config;
use object_store_factory::google::GCSConfig;
-use object_store_factory::ObjectStoreConfig;
use object_store_factory::local::LocalConfig;
+use object_store_factory::ObjectStoreConfig;
// Wrapper around the object_store crate that holds on to the original config
// in order to provide a more efficient "upload" for the local object store
@@ -152,8 +152,22 @@ impl ObjectStore for InternalObjectStore {
payload: PutPayload,
opts: PutOptions,
) -> Result<PutResult> {
- if let ObjectStoreConfig::Local(LocalConfig { disable_hardlinks: true, .. }) = self.config {
- return self.inner.put_opts(location, payload, PutOptions{mode: object_store::PutMode::Overwrite, ..opts}).await
+ if let ObjectStoreConfig::Local(LocalConfig {
+ disable_hardlinks: true,
+ ..
+ }) = self.config
+ {
+ return self
+ .inner
+ .put_opts(
+ location,
+ payload,
+ PutOptions {
+ mode: object_store::PutMode::Overwrite,
+ ..opts
+ },
+ )
+ .await;
};
self.inner.put_opts(location, payload, opts).await
}
@@ -243,7 +257,11 @@ impl ObjectStore for InternalObjectStore {
///
/// Will return an error if the destination already has an object.
async fn copy_if_not_exists(&self, from: &Path, to: &Path) -> Result<()> {
- if let ObjectStoreConfig::Local(LocalConfig { disable_hardlinks: true, .. }) = self.config {
+ if let ObjectStoreConfig::Local(LocalConfig {
+ disable_hardlinks: true,
+ ..
+ }) = self.config
+ {
return self.inner.copy(from, to).await;
}
self.inner.copy_if_not_exists(from, to).await
@@ -261,7 +279,11 @@ impl ObjectStore for InternalObjectStore {
// this with a lock too, so look into using that down the line instead.
return self.inner.rename(from, to).await;
}
- if let ObjectStoreConfig::Local(LocalConfig { disable_hardlinks: true, .. }) = self.config {
+ if let ObjectStoreConfig::Local(LocalConfig {
+ disable_hardlinks: true,
+ ..
+ }) = self.config
+ {
return self.inner.rename(from, to).await;
}
self.inner.rename_if_not_exists(from, to).await
@@ -274,7 +296,7 @@ mod tests {
use crate::object_store::wrapped::InternalObjectStore;
use datafusion::common::Result;
use rstest::rstest;
-
+
use object_store_factory::aws::S3Config;
use object_store_factory::ObjectStoreConfig;
|
OK @mrchypark I'll fix the build on a separate branch in our repo and raise it as a separate PR, give me a few minutes |
What am I missing? |
All good now @mrchypark, I fixed the tests in #650 and we'll merge that PR instead, I'll close this PR. Thanks a lot for the contribution! |
@mrchypark for future needs you can also do |
What
Add
no_hardlinks
option toLocalConfig
for object store operations. Try to resolve this issue.How
no_hardlinks
boolean field toLocalConfig
structfrom_hashmap
andto_hashmap
methods to handle the new fieldcopy_if_not_exists
andrename_if_not_exists
methods inInternalObjectStore
to useno_hardlinks
optionWhy
Tests
no_hardlinks
option