-
Notifications
You must be signed in to change notification settings - Fork 152
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow non-AWS endpoints #100
Conversation
b1886d4
to
714ccf8
Compare
lib/logstash/outputs/s3.rb
Outdated
@@ -147,6 +152,7 @@ def aws_s3_config | |||
|
|||
def full_options | |||
aws_options_hash.merge(signature_options) | |||
aws_options_hash.merge(endpoint_options) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.merge
returns a new hash rather than modifying aws_options_hash
in-place. I'm not a Rubyist so I can't say what the prettiest way to fix this is but aws_options_hash.merge(signature_options).merge(endpoint_options)
should get the tests passing again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
This PR needs to track AWS v2 library changes after #102 merges. |
714ccf8
to
83fab47
Compare
Addressed all outstanding review comments. |
lib/logstash/outputs/s3.rb
Outdated
@@ -84,6 +86,9 @@ class LogStash::Outputs::S3 < LogStash::Outputs::Base | |||
# S3 bucket | |||
config :bucket, :validate => :string | |||
|
|||
# endpoint | |||
config :endpoint, :validate => :string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like you'd want to reject a config where both endpoint and region are specified. Right? Can you add that?
Also, can you add more detailed docs for this new setting? For users only familiar with AWS, I imagine this setting may be confusing. Further, describing the format (a url) would be good as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
region: it may be valid to provide both a region and an endpoint for some providers. v4 signing includes the region in its signature.
docs: added blurb and example.
83fab47
to
e5caf36
Compare
Rebased onto master and addressed all outstanding review comments. |
Any news here? We'd love to start using it! If you need any help please give me a shout. |
Rebased onto master. @ph What remains to merge this pull request? |
@jordansissel is there anything missing to update your code review status ? |
@andrewgaul can you separate the |
@robbat2 Done. Can you test the latest commit? @jordansissel @ph What remains to merge this pull request? |
@andrewgaul The change seems OK to me, can we add some unit tests to make sure we don't break it in a future release. Also this branch seems to broke all the tests see #100 ? Can you verify that and I will do a review of the code. As a side node, this change enable to use third party S3 implementation, but for now we have no |
@ph for the test suite, do you have a way to handle credentials and keep them private? If so, I can set up a test account for you on a large public Ceph deployment. |
@robbat2 We were currently not running the integration suite on Travis we usually do it locally when we develop a feature, I know that Travis can keep credentials hidden in the environment, but I haven't tried to implement it yet. If you want to test the current suite with Celph, you can look at the content of this file And we you run the test with the appropriate tag like this:
|
@@ -106,6 +109,13 @@ class LogStash::Outputs::S3 < LogStash::Outputs::Base | |||
# S3 bucket | |||
config :bucket, :validate => :string, :required => true | |||
|
|||
# Specify a custom endpoint for use with non-AWS S3 implementations, e.g., | |||
# Ceph. Provide a URL in the format http://127.0.0.1:8080/ | |||
config :endpoint, :validate => :string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is "endpoint" the best name?
Is http
the right scheme for the url? Citing existing exsample:
s3cmd
allowss3://
uris3cmd
config hashost_base
andhost_bucket
for setting hostnames different from AWS S3.- hadoop uses
s3a://
(and olders3n
ands3
) - boto allows users to set the
host
for the endpoint, but not a URL (based on my probably-incomplete research)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Endpoint is the name in the AWS documentation, but it's NOT a URL, it's a host and port.
- Eg
boto.s3.S3Connection
has ahost
parameter that takes an optional port, eg127.0.0.1:8080
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some users want HTTP on trusted networks and others HTTPS on the wider Internet.
lib/logstash/outputs/s3.rb
Outdated
config :endpoint, :validate => :string | ||
|
||
# When false, specify the bucket in the subdomain. When true, specify the bucket in the path. | ||
config :force_path_style, :validate => :bool, :default => false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"force path style" feels like the wrong name for this setting.
It feels weird to have users specify the bucket in the path. Here are some citations of existing examples:
- hadoop uses the "host" part of a URI for the bucket, such as
s3a://bucket-name/...
- s3cmd lets you specify how to take a bucket name and turn it into a hostname (host_bucket)
The name of the setting "force path style" doesn't indicate clearly enough what it does. We need a more informative name for this setting. Alternately, we can step back and look at the problem space from a different perspective and maybe come up with a solution that doesn't need this kind of flag.
:bool
is not a correct validation. Did you mean :boolean
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Boto uses CallingFormat
, Boto3 uses addressing_style
(https://github.com/boto/botocore/blob/develop/botocore/client.py#L156).
More significantly, with Boto3 & AWSv4 signatures, it almost always uses path-style, and suffer the 301 redirect in some cases: https://github.com/boto/botocore/blob/develop/botocore/utils.py#L649
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This "path style" thing seems like an easy way to confuse users. Is it necessary? The linked boto code seems to transform user input (a path with a bucket as the first part) and turn it into a dns name. Am I reading this right? If so, why do we need two ways to specify an s3 path? Having multiple ways feels like an easy way to confuse users and make troubleshooting harder.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we end up agreeing that this "path style" thing is necessary, then I propose naming it in a way that more strongly indicates the actual behavior ("path style" is not an obvious behavior). For example, bucket_name_in_path
instead.
My first preference would be to remove the setting entirely, so let's figure out if we can do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do agree the naming is terrible, but it's inherited from the AWS documentation.
There are two inputs: S3-server-address (host+port
), bucket.
There are two common formats:
- Path:
http://host:port/bucket/
,https://host:port/bucket/
** AWS documentation uses: Path-style in AWS documentation. - Subdomain:
http://bucket.host:port/
,https://bucket.host:port/
** AWS documentation uses both: Subdomain calling format and Virtual Hosted–Style - (and some more exotic variations in calling format that are not relevant to us)
To use the Subdomain format, there are a LOT of conditions:
- If there is a DNS wildcard for
*.host
AND the bucket name is DNS-compatible [lowercase-only, no characters invalid in DNS] - If there is SSL on the port [which might NOT be port 443], there ALSO needs to be a wildcard SSL certificate and the bucket name ALSO must not contain periods [which would cause the cert to never match [or you have to write custom HTTPS hostname validation].
If you're running something like minio or a small ceph install as a local S3 server, and thus have an IP+port only for connecting, eg http://127.0.0.1:8080/, then you MUST use path-style access. You cannot use Subdomain access at all. This is also relevant if you don't have a DNS wildcard.
if you're connecting to a larger S3 implementation, including AWS & multi-region/federated Ceph deployments, and the protocol ALLOWS you to be issued a 307 redirection to another host, usually one where your data is actually sorted (this can be seen on AWS if you create an EU s3 bucket, then try to access it via the US endpoints): this redirect can add significant latency in some cases, so hitting the correct host right away is important, and is more common in path-style access than subdomain.
This PR has needs unit tests and most importantly needs integration tests. Because this PR adds commentary indicating things like CEPH will work with this plugin, and given this is a plugin is on our support matrix, I want to resolve the lack of integration tests against CEPH before we can merge this. Next steps:
I've also added some in-line comments about the new config settings which I'd like to work towards resolving. |
CHANGELOG.md
Outdated
@@ -1,3 +1,6 @@ | |||
## 4.0.6 | |||
- Support for non-AWS endpoints |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR adds new functionality, so instead of a patch/bugfix version bump (4.0.5 -> 4.0.6) this needs to be a minor bump (4.0.5 -> 4.1.0)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Let's also get the documentation on this plugin updated to indicate that non-AWS S3 endpoints are a best effort as we (logstash team) have no subject matter expertise in CEPH or any other S3-compatible system. |
I addressed the trivial comments but I lack both interest and ability to test this further as I no longer use logstash. Happy for you to close this PR or merge it if you want to address a popular user request. |
@andrewgaul Thank you for your efforts! I think @ph and I can carry it forward from here. |
I just wanted to say this feature is extremely useful! I have managed to successfully test this commit with logstash 5.3.0 uploading to a local ceph cluster. Thanks @andrewgaul |
@av3ri what is the endpoint parameter name? If I use
I have this error:
|
Ok, this commit isn't merged :( |
@harobed
Hope this helps. |
@av3ri thanks Now I have this issue: With :
And:
I have this error:
What is my mistake? Best regards, |
Fixed with this option:
|
Do you have an ETA for merging ? |
Could you maybe merge the newest master (4.0.13) into this PR if there is still long before it could be merged to master? |
This is useful for local Ceph S3 deployments. Fixes logstash-plugins#10. Fixes logstash-plugins#65.
bc3035b
to
a46f4d8
Compare
@fulder I rebased the change but lack insight into if or when this might merge. |
@gaul thanks! Im currently using this PR and there were some bugs fixed at master not present in this PR before the rebase. Looking forward to see it merged to master in the future ;) |
@acchen97 - If this gets merged, because we are lacking staffing to support CEPH and other technologies, we'll need to update the support matrix to specifically exclude the use of this new |
@jordansissel I'm good with that. |
Outstanding items needing further review:
If you have questions or want guidance on any of the above items, we can help! :) |
@RemiDesgrange Please see my above comment for things missing, but necessary, before we can merge this. |
I noticed the custom-endpoint branch is no more. Does that mean the merge is to happen soon-ish? :) |
@@ -106,6 +109,13 @@ class LogStash::Outputs::S3 < LogStash::Outputs::Base | |||
# S3 bucket | |||
config :bucket, :validate => :string, :required => true | |||
|
|||
# Specify a custom endpoint for use with non-AWS S3 implementations, e.g., | |||
# Ceph. Provide a URL in the format http://127.0.0.1:8080/ | |||
config :endpoint, :validate => :string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will be available through the mixin all the aws like plugins use
So no need to implement it here.
Only the force_path_style
setting below is s3 specific, can you respin this PR to only add that setting?
endpoint customization has been done in https://github.com/logstash-plugins/logstash-mixin-aws/blob/master/lib/logstash/plugin_mixins/aws_config/generic.rb#L29-L30 |
This is useful for local Ceph S3 deployments. Fixes #10. Fixes #65.