Accelerate large file downloads using concurrent chunks for lakectl download recursive #8613
base: master
Conversation
…kectl fs download recursive

The current `lakectl fs download` supports downloading large files by splitting them and downloading the content in parallel. This change adds the same support to the recursive option, with the ability to run multiple downloads (each made of parallel chunks) in parallel.

TODO:
- The new downloader does not use the sync infrastructure; it currently supports the same flags: use-presign and parallel downloads.
- Look into progress bar support with chunks.
- Support the no-progress flag as sync does.
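For readers of the diff, here is a minimal sketch of the pattern described above (not the PR's actual code: the package, type, constant values, and the use of `errgroup` with pre-signed Range GETs are assumptions for illustration). A bounded number of files download at once, and each file is fetched with concurrent HTTP range requests written at their offsets.

```go
// Sketch only: illustrates "multiple downloads, each made of parallel chunks".
// Assumptions: pre-signed HTTP URLs that honor Range requests, and object
// sizes known from a listing.
package chunked

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"os"

	"golang.org/x/sync/errgroup"
)

const (
	chunkSize    = 8 << 20 // illustrative: 8 MiB per ranged GET
	fileWorkers  = 4       // files downloaded at the same time
	chunkWorkers = 4       // concurrent chunks per file
)

// Object describes one item to download (hypothetical type for this sketch).
type Object struct {
	URL  string // pre-signed GET URL
	Path string // local destination path
	Size int64  // object size from the listing
}

// DownloadAll downloads several objects in parallel, each in parallel chunks.
func DownloadAll(ctx context.Context, objects []Object) error {
	g, ctx := errgroup.WithContext(ctx)
	g.SetLimit(fileWorkers)
	for _, obj := range objects {
		obj := obj
		g.Go(func() error { return downloadChunked(ctx, obj) })
	}
	return g.Wait()
}

// downloadChunked fetches one object using concurrent HTTP range requests,
// writing each chunk at its own offset so chunks may finish in any order.
func downloadChunked(ctx context.Context, obj Object) error {
	f, err := os.Create(obj.Path)
	if err != nil {
		return err
	}
	defer f.Close()
	if err := f.Truncate(obj.Size); err != nil {
		return err
	}

	g, ctx := errgroup.WithContext(ctx)
	g.SetLimit(chunkWorkers)
	for off := int64(0); off < obj.Size; off += chunkSize {
		off := off
		end := off + chunkSize - 1
		if end >= obj.Size {
			end = obj.Size - 1
		}
		g.Go(func() error {
			req, err := http.NewRequestWithContext(ctx, http.MethodGet, obj.URL, nil)
			if err != nil {
				return err
			}
			req.Header.Set("Range", fmt.Sprintf("bytes=%d-%d", off, end))
			resp, err := http.DefaultClient.Do(req)
			if err != nil {
				return err
			}
			defer resp.Body.Close()
			if resp.StatusCode != http.StatusPartialContent {
				return fmt.Errorf("GET %s: unexpected status %s", obj.Path, resp.Status)
			}
			_, err = io.Copy(io.NewOffsetWriter(f, off), resp.Body)
			return err
		})
	}
	return g.Wait()
}
```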
if !recursive {
	src := uri.URI{
The description says that we are modifying download with the recursive option. However, there appear to be significant changes to the `!recursive` code branch as well. What is the nature of these changes?
@@ -644,10 +642,6 @@ func TestLakectlFsDownload(t *testing.T) {
	require.Contains(t, sanitizedResult, "download ro/ro_1k.2")
	require.Contains(t, sanitizedResult, "download ro/ro_1k.3")
	require.Contains(t, sanitizedResult, "download ro/ro_1k.4")
-	require.Contains(t, sanitizedResult, "Download Summary:")
Doesn't the user still want to see these summaries? Shouldn't they be supported?
@@ -690,7 +680,6 @@ func TestLakectlFsUpload(t *testing.T) {
	t.Run("single_file_with_recursive", func(t *testing.T) {
		vars["FILE_PATH"] = "data/ro/ro_1k.0"
		RunCmdAndVerifySuccessWithFile(t, Lakectl()+" fs upload --recursive -s files/ro_1k lakefs://"+repoName+"/"+mainBranch+"/"+vars["FILE_PATH"]+" -s files/ro_1k", false, "lakectl_fs_upload", vars)
-
This is ultimately a performance improvement, so benchmarks should be taken to verify not just correctness but also comparative performance between the new and old implementations in various cases, e.g. few small files, lots of small files, few large files, lots of large files...
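One possible shape for such a benchmark (a sketch only: `oldDownload`, `newDownload`, `makeObjects`, and the case sizes are placeholders, not names from this PR):

```go
// Sketch of a comparative benchmark; oldDownload, newDownload, and makeObjects
// are placeholders standing in for the previous code path, the new chunked
// code path, and a helper that seeds test objects.
package chunked

import (
	"fmt"
	"testing"
)

var implementations = []struct {
	name string
	run  func(b *testing.B, count int, size int64)
}{
	{"old", oldDownload},
	{"new", newDownload},
}

func BenchmarkRecursiveDownload(b *testing.B) {
	cases := []struct {
		name  string
		count int
		size  int64
	}{
		{"few_small", 10, 1 << 10},
		{"many_small", 10000, 1 << 10},
		{"few_large", 10, 1 << 30},
		{"many_large", 100, 1 << 30},
	}
	for _, c := range cases {
		c := c
		for _, impl := range implementations {
			impl := impl
			b.Run(fmt.Sprintf("%s/%s", c.name, impl.name), func(b *testing.B) {
				makeObjects(b, c.count, c.size)     // seed test data (omitted)
				b.SetBytes(int64(c.count) * c.size) // report throughput per op
				b.ResetTimer()
				for i := 0; i < b.N; i++ {
					impl.run(b, c.count, c.size)
				}
			})
		}
	}
}

// Placeholder hooks: wire these to the real implementations under test.
func oldDownload(b *testing.B, count int, size int64) { b.Skip("not wired") }
func newDownload(b *testing.B, count int, size int64) { b.Skip("not wired") }
func makeObjects(b *testing.B, count int, size int64) {}
```

Running it with `go test -bench RecursiveDownload -benchmem` would report time, throughput, and allocations per case/implementation pair.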
The current `lakectl fs download` supports downloading large files by splitting the files and downloading the content in parallel. This change adds the same support to the recursive option, with the ability to run multiple downloads (each made of parallel chunks) in parallel.
Close #8383