Multiprocess Option #10

Open
Petahhh opened this issue Aug 7, 2015 · 3 comments

@Petahhh
Contributor

Petahhh commented Aug 7, 2015

For directories with 20+ million files, multiple processes will be necessary to complete the upload within a reasonable amount of time.

@Petahhh Petahhh self-assigned this Aug 7, 2015
@cudevmaxwell

This is the big difference between the two versions, and I'm still not sure which is the 'better' option.

Using a multiprocessing pool of workers that call the swift library directly might be more efficient, but not as 'safe'. We could reuse the code from before the PR.

Using subcommand to fire off batches of 'swift upload...' shell commands is less efficient, but 'safer'.
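For comparison, a sketch of the batched shell-command approach using the standard-library `subprocess` module (an assumption; the thread doesn't show how subcommand is invoked). `batched` and `build_upload_commands` are hypothetical helpers, and the batch size of 1000 is an illustrative guess meant to keep each command line well under the OS argument-length limit.

```python
import subprocess
from itertools import islice


def batched(iterable, size):
    """Yield lists of up to `size` items (itertools.batched needs 3.12+)."""
    it = iter(iterable)
    while batch := list(islice(it, size)):
        yield batch


def build_upload_commands(container, paths, batch_size=1000):
    """Build one `swift upload CONTAINER FILE...` argv per batch of paths."""
    return [["swift", "upload", container, *batch]
            for batch in batched(paths, batch_size)]


def run_batches(commands):
    """Run the commands one after another; return those that exited non-zero."""
    failed = []
    for cmd in commands:
        if subprocess.run(cmd).returncode != 0:
            failed.append(cmd)
    return failed
```

Passing an argv list (no `shell=True`) avoids quoting problems with odd file names, which is part of what makes this route 'safer'.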

The latest PR was a big speed regression, and I definitely want to fix that.

I'm willing to try replacing subcommand with http://docs.openstack.org/developer/python-swiftclient/swiftclient.html#module-swiftclient.multithreading or with our own multiprocess pool, to see if we can speed up uploads while still doing md5sum checks.

@cudevmaxwell

While doing the large batch upload for #9, it's become pretty clear that a single thread calling 'swift upload' millions of times isn't going to cut it. Way too slow. Even if we don't call the library directly, and continue to use subcommand, we'll need a multiprocess pool to call 'swift upload' in parallel. We can use existing code in bulkupload.py.
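A sketch of calling 'swift upload' in parallel. Because each call already runs as a separate OS process, this swaps in a `concurrent.futures.ThreadPoolExecutor` (the threads only wait on their child processes) instead of a multiprocessing pool; it is not the existing code in bulkupload.py, and `run_parallel` and the default of 8 workers are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_command(cmd):
    """Run one argv to completion and return (argv, exit code)."""
    # Discard stdout: swift upload prints every file name, which adds up
    # to a lot of output over millions of files. stderr is left visible.
    return cmd, subprocess.run(cmd, stdout=subprocess.DEVNULL).returncode


def run_parallel(commands, workers=8):
    """Keep up to `workers` commands running at once; return failed argvs."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return [cmd for cmd, code in pool.map(run_command, commands)
                if code != 0]
```

The input can be any list of argvs, for example one `swift upload CONTAINER FILE...` command per batch of files.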

@cudevmaxwell

On my test VM, uploading 1,000,000 files of 1-2 KB each to a container using swift upload took 838m28.226s. We should aim for this tool to finish within 110% of that time.
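Putting that baseline in per-file terms helps size the target. A quick calculation from the numbers in this thread; the 20-million-file figure assumes upload time scales linearly with file count, which is an assumption rather than a measurement:

```python
SECONDS_PER_DAY = 86_400


def to_seconds(minutes, seconds):
    """Convert bash `time` output such as 838m28.226s to seconds."""
    return minutes * 60 + seconds


def fmt_minutes(seconds):
    """Format seconds like bash `time`: <minutes>m<seconds>s."""
    m, s = divmod(seconds, 60)
    return f"{int(m)}m{s:.1f}s"


files = 1_000_000
baseline = to_seconds(838, 28.226)  # swift upload, 1,000,000 small files
per_file = baseline / files         # seconds per file
throughput = files / baseline       # files per second
budget = baseline * 1.10            # "within 110%" target for this tool

# Linear extrapolation to the 20-million-file case (assumption).
days_20m = 20_000_000 * per_file / SECONDS_PER_DAY

print(f"{per_file * 1000:.1f} ms/file, {throughput:.1f} files/s")
print(f"110% budget: {fmt_minutes(budget)}")
print(f"20M files at the same rate: {days_20m:.1f} days")
```

This works out to about 50.3 ms per file (roughly 19.9 files/s), a 110% budget of 922m19.0s (about 15h22m), and about 11.6 days for 20 million files at the same rate.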

2 participants