Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Accelerate large files download using concurent chunks for lakectl download recursive #8613

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

nopcoder
Copy link
Contributor

@nopcoder nopcoder commented Feb 5, 2025

The current lakectl fs download supports downloading large files by splitting the files and downloading the content in parallel.

This change add the same support to the recursive option with the support to run multiple downloads (of parallel chunks) in parallel.

Close #8383

…kectl fs download recursive

The curent `lakectl fs download` supports downloading large files by
splitting the files and downloading the content in parallel.

This change add the same support to the recursive option with the
support to run multiple downloads (of parallel chunks) in parallel.

TODO

New downloader do not uses sync infrastructure, currently support the same flags: use-presign and parallel downloads.
Need to look into progress bar with chunks support.
Support no-progress flag as sync does.
@nopcoder nopcoder self-assigned this Feb 5, 2025
@nopcoder nopcoder added the include-changelog PR description should be included in next release changelog label Feb 5, 2025
Copy link

github-actions bot commented Feb 5, 2025

E2E Test Results - DynamoDB Local - Local Block Adapter

13 passed

Copy link

github-actions bot commented Feb 5, 2025

E2E Test Results - Quickstart

11 passed

@nopcoder nopcoder requested a review from a team February 6, 2025 17:17
@nopcoder nopcoder removed the request for review from a team February 7, 2025 10:58
@nopcoder nopcoder requested a review from a team February 7, 2025 21:05
if !recursive {
src := uri.URI{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The description says that we are modifying download with the recursive option. However there appear to be significant changes to the !recursive code branch. What is the nature of these changes?

@@ -644,10 +642,6 @@ func TestLakectlFsDownload(t *testing.T) {
require.Contains(t, sanitizedResult, "download ro/ro_1k.2")
require.Contains(t, sanitizedResult, "download ro/ro_1k.3")
require.Contains(t, sanitizedResult, "download ro/ro_1k.4")
require.Contains(t, sanitizedResult, "Download Summary:")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't the user still want to see these summaries? Shouldn't they be supported?

@@ -690,7 +680,6 @@ func TestLakectlFsUpload(t *testing.T) {
t.Run("single_file_with_recursive", func(t *testing.T) {
vars["FILE_PATH"] = "data/ro/ro_1k.0"
RunCmdAndVerifySuccessWithFile(t, Lakectl()+" fs upload --recursive -s files/ro_1k lakefs://"+repoName+"/"+mainBranch+"/"+vars["FILE_PATH"]+" -s files/ro_1k", false, "lakectl_fs_upload", vars)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is ultimately a performance improvement issue. Therefore some benchmarks should be taken to verify not just correctness but comparative performance between new/old implementation in various cases i.e. few small files, lots of small files, few large files, lots of large files...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
include-changelog PR description should be included in next release changelog
Projects
None yet
Development

Successfully merging this pull request may close these issues.

lakectl fs recursive download to use multipart
2 participants