async S3 HTTP with botornado #14

Closed
wants to merge 2 commits into from

Conversation

@pda commented Nov 18, 2014

Summary: thumbor runs on tornado, which is async, but thumbor_aws uses boto for its HTTP requests to S3, which are blocking, causing awful performance. This patch uses botornado, a partial async port of boto to tornado, to make the S3 requests non-blocking.

Test material at https://gist.github.com/pda/5dcf54410b696072fcbd


In production we have an ELB load balancer in front of two EC2 instances, each running thumbor + thumbor_aws inside a docker container. We had reports of blank images, and then hints that they were HTTP 503 errors, but it wasn't reproducible. Our logs show that the load balancer frequently marked the instances as unhealthy, meaning they had failed to respond to health checks. The thumbor processes had been up and running for many days, and had no suspicious log entries.

I suspected that something was blocking inside the thumbor event loop, and narrowed it down to boto in thumbor_aws.

Using a controllably slow backend behind thumbor_aws, I was able to demonstrate the single-concurrency of boto by showing slow requests blocking fast requests:

$ ./race.rb http://localhost:8888/unsafe/sleep/{10,1000,100,10000,500}

dispatching http://localhost:8888/unsafe/sleep/10
dispatching http://localhost:8888/unsafe/sleep/1000
dispatching http://localhost:8888/unsafe/sleep/100
dispatching http://localhost:8888/unsafe/sleep/10000
dispatching http://localhost:8888/unsafe/sleep/500
78 ms for http://localhost:8888/unsafe/sleep/10
2030 ms for http://localhost:8888/unsafe/sleep/1000
2168 ms for http://localhost:8888/unsafe/sleep/100
22108 ms for http://localhost:8888/unsafe/sleep/10000
23051 ms for http://localhost:8888/unsafe/sleep/500

(The doubled times in the benchmarks are due to the test server sleeping n milliseconds for both the HEAD and the GET request. Perhaps that HEAD request should be eliminated from thumbor_aws, but that's a separate issue.)

Note that the responses are serialized into the order they arrived, and that the times are roughly cumulative: the final request, which should have taken about 1,000 ms (500 ms each for the HEAD and the GET), took 23,000 ms.
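To illustrate the mechanism (this sketch is not from the patch, and uses the stdlib asyncio loop rather than the tornado IOLoop, purely for illustration): a blocking call holds the event loop so concurrent requests serialize and total time approaches the sum of the delays, while an awaited call yields back to the loop so total time approaches the longest delay.

```python
import asyncio
import time

async def blocking_fetch(delay_ms):
    # time.sleep() holds the event loop for the full delay,
    # just as boto's synchronous HTTP calls do inside tornado.
    time.sleep(delay_ms / 1000)

async def async_fetch(delay_ms):
    # await yields control back to the event loop while waiting,
    # which is what botornado's async HTTP calls allow.
    await asyncio.sleep(delay_ms / 1000)

async def race(fetch, delays_ms):
    # Dispatch all "requests" concurrently and time the whole batch.
    start = time.monotonic()
    await asyncio.gather(*(fetch(d) for d in delays_ms))
    return (time.monotonic() - start) * 1000

delays = [10, 100, 50]
blocking_ms = asyncio.run(race(blocking_fetch, delays))
async_ms = asyncio.run(race(async_fetch, delays))
print(f"blocking: {blocking_ms:.0f} ms (roughly the sum of the delays)")
print(f"async:    {async_ms:.0f} ms (roughly the longest delay)")
```

With the blocking handler the batch takes at least 160 ms (10 + 100 + 50); with the async handler it takes a little over 100 ms, the longest single delay.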

Switching s3_loader.py from boto to botornado fixes that:

$ ./race.rb http://localhost:8888/unsafe/sleep/{10,1000,100,10000,500}

dispatching http://localhost:8888/unsafe/sleep/10
dispatching http://localhost:8888/unsafe/sleep/1000
dispatching http://localhost:8888/unsafe/sleep/100
dispatching http://localhost:8888/unsafe/sleep/10000
dispatching http://localhost:8888/unsafe/sleep/500
89 ms for http://localhost:8888/unsafe/sleep/10
228 ms for http://localhost:8888/unsafe/sleep/100
1024 ms for http://localhost:8888/unsafe/sleep/500
2028 ms for http://localhost:8888/unsafe/sleep/1000
20024 ms for http://localhost:8888/unsafe/sleep/10000

This fix uses botornado to do async HTTP requests to S3 in a way that fits the tornado event loop model, without blocking other requests.

I've only applied it to the loaders module, as that's the only component that we're using. It should probably also be applied to the storages and result_storages modules. Perhaps we can ship this fix first, though?

@willtrking (Owner) commented
Thanks for the patch! My only concern with merging this is backwards compatibility of botornado with boto. Aside from installing botornado, is this a drop-in replacement for the existing plugin, or are there additional installation steps?

A good solution may be to have a config option allowing people to switch between the two.
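(For reference: thumbor selects its loader via the LOADER setting in thumbor.conf, which is itself a Python file, so a switch could simply be two separate loader modules. The async_s3_loader module name below is hypothetical, not something from this PR.)

```python
# thumbor.conf (hypothetical sketch; module names are assumptions)

# Blocking boto-based loader (the existing behaviour):
# LOADER = 'thumbor_aws.loaders.s3_loader'

# Async botornado-based loader (this patch):
LOADER = 'thumbor_aws.loaders.async_s3_loader'
```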

@pda (Author) commented Nov 18, 2014

I don't write much python, so I've barely used boto, and I'd never heard of botornado until I found it trying to fix this. I couldn't comment on backwards compatibility except that “it seems to work” :)

I'm not sure about the config option: I think using a blocking client like boto in an async environment like tornado is a serious bug. I would suggest tagging the current version as v1.0.0 with a big warning in the README about the blocking behaviour, then integrating botornado and releasing it as v2.0.0. That way people can deliberately upgrade, or stay on the blocking version if it's working for them and they're concerned about incompatibility.

@fluke commented Feb 22, 2015

I think this needs to be implemented. Could you merge this in?

@pda (Author) commented Feb 22, 2015

I discovered that this wasn't loading AWS credentials from the environment in the expected way.
In the end, I wrote https://github.com/99designs/thumbor_botornado which has a very simple botornado loader, plus a loader that can sit in front of it to handle non-S3 URLs separately. 99designs.com has since been using it to serve millions of images.

So: I'm not sure about this PR; it might need more work, but I'm not using it anymore.

@fluke commented Feb 23, 2015

@pda Well, as of now I'm simply using the HTTP loader with URL signing, since I was running into issue #19 with multiple buckets. What are the benefits of using the AWS bucket method?

@pda (Author) commented Feb 23, 2015

@kartikluke If the original images are in a publicly readable S3 bucket, then the HTTP loader is perfect. But if the original images aren't publicly readable, then S3 credentials are required… as far as I know the HTTP loader doesn't know how to do that…?
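(As an aside, not part of this PR: private objects can still be served over plain HTTP via pre-signed URLs. The sketch below implements the 2015-era AWS signature version 2 query-string authentication for S3, the scheme boto 2 used, with the stdlib alone. The bucket, key, and credentials are made up.)

```python
import base64
import hashlib
import hmac
import urllib.parse

def presign_s3_get(bucket, key, access_key_id, secret_key, expires_epoch):
    """Build a pre-signed S3 GET URL using AWS signature version 2
    (query-string authentication). Purely illustrative."""
    # StringToSign for a GET with no Content-MD5 or Content-Type headers:
    string_to_sign = f"GET\n\n\n{expires_epoch}\n/{bucket}/{key}"
    digest = hmac.new(secret_key.encode(), string_to_sign.encode(),
                      hashlib.sha1).digest()
    signature = base64.b64encode(digest).decode()
    # urlencode percent-escapes the base64 signature for us.
    params = urllib.parse.urlencode({
        "AWSAccessKeyId": access_key_id,
        "Expires": expires_epoch,
        "Signature": signature,
    })
    return f"https://{bucket}.s3.amazonaws.com/{key}?{params}"

# Made-up credentials, purely illustrative:
url = presign_s3_get("my-bucket", "originals/cat.jpg",
                     "AKIDEXAMPLE", "wJalrXUtnFEMI", 1424649000)
```

Anyone holding such a URL can fetch the private object until the Expires timestamp, so a signing front-end plus the plain HTTP loader is one way around needing S3 credentials inside thumbor itself.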

@fluke commented Feb 23, 2015

Ah, that's correct. Sorry, got confused.


dhardy92 added a commit to dhardy92/thumbor_aws that referenced this pull request Mar 24, 2015
* hashed paths in S3 prevent searching keys by prefix, without real advantage (S3 is not a real FS)
* add vows-based storage tests
* add .gitignore for *.pyc
@pda (Author) commented Apr 14, 2015

Closing this; as I mentioned, I wrote https://github.com/99designs/thumbor_botornado, which we're using. Feel free to use the code from this PR or from that repo, though.

@pda closed this Apr 14, 2015