
Allow non-AWS endpoints #100

Closed
wants to merge 1 commit into from

Conversation


@gaul gaul commented Sep 7, 2016

This is useful for local Ceph S3 deployments. Fixes #10. Fixes #65.

@gaul gaul force-pushed the custom-endpoint branch 3 times, most recently from b1886d4 to 714ccf8 Compare September 12, 2016 17:08
@@ -147,6 +152,7 @@ def aws_s3_config

def full_options
aws_options_hash.merge(signature_options)
aws_options_hash.merge(endpoint_options)
@mattbriancon commented Sep 19, 2016

.merge returns a new hash rather than modifying aws_options_hash in-place. I'm not a Rubyist so I can't say what the prettiest way to fix this is but aws_options_hash.merge(signature_options).merge(endpoint_options) should get the tests passing again.
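The non-mutating behavior described here is easy to demonstrate in a minimal Ruby sketch (the hash contents are illustrative, not the plugin's actual options):

```ruby
# Hash#merge returns a new hash and leaves the receiver untouched,
# so two separate merge calls on the same base hash discard each other.
aws_options_hash = { region: "us-east-1" }

signature_options = { signature_version: "v4" }
endpoint_options  = { endpoint: "http://127.0.0.1:8080" }

# Wrong: each line builds (and throws away) a separate merged copy.
aws_options_hash.merge(signature_options)
aws_options_hash.merge(endpoint_options)
aws_options_hash  # => { region: "us-east-1" } (unchanged)

# Right: chain the merges and use the returned value.
full_options = aws_options_hash.merge(signature_options).merge(endpoint_options)
```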

@gaul (Author):

Done.

@gaul (Author) commented Sep 19, 2016

This PR needs to track AWS v2 library changes after #102 merges.

@gaul (Author) commented Nov 11, 2016

Addressed all outstanding review comments.

@@ -84,6 +86,9 @@ class LogStash::Outputs::S3 < LogStash::Outputs::Base
# S3 bucket
config :bucket, :validate => :string

# endpoint
config :endpoint, :validate => :string
Contributor:

It seems like you'd want to reject a config where both endpoint and region are specified. Right? Can you add that?

Also, can you add more detailed docs for this new setting? For users only familiar with AWS, I imagine this setting may be confusing. Further, describing the format (a url) would be good as well.

@gaul (Author):

region: it may be valid to provide both a region and an endpoint for some providers. v4 signing includes the region in its signature.

docs: added blurb and example.

@gaul (Author) commented Jan 4, 2017

Rebased onto master and addressed all outstanding review comments.

@marcinwyszynski

Any news here? We'd love to start using it! If you need any help please give me a shout.

@gaul (Author) commented Jan 27, 2017

Rebased onto master.

@ph What remains to merge this pull request?

@sboulkour commented Jan 27, 2017

@jordansissel is there anything missing to update your code review status?

@robbat2 commented Jan 27, 2017

@andrewgaul can you separate the force_path_style from the endpoint usage? It is not always going to be required even with Ceph, and in some cases will generate a needless redirect (to the subdomain calling format).

@gaul (Author) commented Feb 2, 2017

@robbat2 Done. Can you test the latest commit?

@jordansissel @ph What remains to merge this pull request?

@ph (Contributor) commented Feb 2, 2017

@andrewgaul The change seems OK to me. Can we add some unit tests to make sure we don't break it in a future release? Also, this branch seems to break all the tests (see #100); can you verify that? Then I will do a review of the code.

As a side note, this change enables the use of third-party S3 implementations, but for now we have no immediate plan to include these implementations in our test cycle, and we will need the community to support and test these changes.

@robbat2 commented Feb 2, 2017

@ph for the test suite, do you have a way to handle credentials and keep them private? If so, I can set up a test account for you on a large public Ceph deployment.

@ph (Contributor) commented Feb 3, 2017

@robbat2 We are not currently running the integration suite on Travis; we usually run it locally while developing a feature. I know that Travis can keep credentials hidden in the environment, but I haven't tried to implement that yet.

If you want to run the current suite against Ceph, you can look at the contents of this file:
https://github.com/logstash-plugins/logstash-output-s3/blob/master/spec/supports/helpers.rb, which sets up the general configuration used by the whole integration suite.

Then run the tests with the appropriate tag like this:

bundle exec rspec --tag integration

@@ -106,6 +109,13 @@ class LogStash::Outputs::S3 < LogStash::Outputs::Base
# S3 bucket
config :bucket, :validate => :string, :required => true

# Specify a custom endpoint for use with non-AWS S3 implementations, e.g.,
# Ceph. Provide a URL in the format http://127.0.0.1:8080/
config :endpoint, :validate => :string
Contributor:

Is "endpoint" the best name?

Is http the right scheme for the url? Citing existing examples:

  • s3cmd allows s3:// uri
  • s3cmd config has host_base and host_bucket for setting hostnames different from AWS S3.
  • hadoop uses s3a:// (and older s3n and s3)
  • boto allows users to set the host for the endpoint, but not a URL (based on my probably-incomplete research)

Reply:

  • Endpoint is the name in the AWS documentation, but it's NOT a URL; it's a host and port.
  • E.g. boto.s3.S3Connection has a host parameter that takes an optional port, e.g. 127.0.0.1:8080.

@gaul (Author):

Some users want HTTP on trusted networks and others HTTPS on the wider Internet.

config :endpoint, :validate => :string

# When false, specify the bucket in the subdomain. When true, specify the bucket in the path.
config :force_path_style, :validate => :bool, :default => false
Contributor:

"force path style" feels like the wrong name for this setting.

It feels weird to have users specify the bucket in the path. Here are some citations of existing examples:

  • hadoop uses the "host" part of a URI for the bucket, such as s3a://bucket-name/...
  • s3cmd lets you specify how to take a bucket name and turn it into a hostname (host_bucket)

The name of the setting "force path style" doesn't indicate clearly enough what it does. We need a more informative name for this setting. Alternately, we can step back and look at the problem space from a different perspective and maybe come up with a solution that doesn't need this kind of flag.

:bool is not a correct validation. Did you mean :boolean ?

Reply:

Boto uses CallingFormat, Boto3 uses addressing_style (https://github.com/boto/botocore/blob/develop/botocore/client.py#L156).

More significantly, with Boto3 & AWSv4 signatures it almost always uses path-style, and suffers the 301 redirect in some cases: https://github.com/boto/botocore/blob/develop/botocore/utils.py#L649

Contributor:

This "path style" thing seems like an easy way to confuse users. Is it necessary? The linked boto code seems to transform user input (a path with a bucket as the first part) and turn it into a dns name. Am I reading this right? If so, why do we need two ways to specify an s3 path? Having multiple ways feels like an easy way to confuse users and make troubleshooting harder.

Contributor:

If we end up agreeing that this "path style" thing is necessary, then I propose naming it in a way that more strongly indicates the actual behavior ("path style" is not an obvious behavior). For example, bucket_name_in_path instead.

My first preference would be to remove the setting entirely, so let's figure out if we can do that.

Reply:

I do agree the naming is terrible, but it's inherited from the AWS documentation.

There are two inputs: the S3 server address (host+port) and the bucket.

There are two common formats:

  • Path: http://host:port/bucket/, https://host:port/bucket/ (AWS documentation calls this "path-style")
  • Subdomain: http://bucket.host:port/, https://bucket.host:port/ (AWS documentation uses both "subdomain calling format" and "virtual hosted-style")
  • (and some more exotic variations in calling format that are not relevant to us)

To use the subdomain format, there are a LOT of conditions:

  • There must be a DNS wildcard for *.host, AND the bucket name must be DNS-compatible (lowercase only, no characters invalid in DNS).
  • If there is SSL on the port (which might NOT be port 443), there ALSO needs to be a wildcard SSL certificate, and the bucket name ALSO must not contain periods (which would cause the certificate to never match, unless you write custom HTTPS hostname validation).

If you're running something like minio or a small Ceph install as a local S3 server, and thus have only an IP+port for connecting, e.g. http://127.0.0.1:8080/, then you MUST use path-style access; you cannot use subdomain access at all. This is also relevant if you don't have a DNS wildcard.

If you're connecting to a larger S3 implementation, including AWS and multi-region/federated Ceph deployments, the protocol ALLOWS you to be issued a 307 redirect to another host, usually one where your data is actually stored (you can see this on AWS if you create an EU S3 bucket and then try to access it via the US endpoints). This redirect can add significant latency in some cases, so hitting the correct host right away is important; the redirect is more common with path-style access than with subdomain access.
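The two addressing styles above can be sketched in a few lines of Ruby. This is a hypothetical helper for illustration only, not the plugin's code; it shows why a bare-IP endpoint forces path-style:

```ruby
require "uri"

# Compose an object URL from an endpoint URL, bucket, and key in either
# addressing style. Subdomain (virtual-hosted) style prepends the bucket
# to the hostname, which cannot work for a bare IP endpoint.
def s3_object_url(endpoint, bucket, key, force_path_style: false)
  uri = URI.parse(endpoint)
  if force_path_style
    "#{uri.scheme}://#{uri.host}:#{uri.port}/#{bucket}/#{key}"
  else
    "#{uri.scheme}://#{bucket}.#{uri.host}:#{uri.port}/#{key}"
  end
end

s3_object_url("http://127.0.0.1:8080", "logs", "2017/app.log", force_path_style: true)
# => "http://127.0.0.1:8080/logs/2017/app.log"
```

With `force_path_style: false` the same IP endpoint would yield the nonsensical host `logs.127.0.0.1`, which is the failure mode described above.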

@jordansissel (Contributor) commented Feb 24, 2017

This PR needs unit tests and, most importantly, integration tests.

Because this PR adds commentary indicating that things like Ceph will work with this plugin, and given this plugin is on our support matrix, I want to resolve the lack of integration tests against Ceph before we can merge this.

Next steps:

  • Add unit tests to verify the new behavior
  • Add integration tests to verify this plugin works correctly against CEPH

I've also added some in-line comments about the new config settings which I'd like to work towards resolving.

CHANGELOG.md Outdated
@@ -1,3 +1,6 @@
## 4.0.6
- Support for non-AWS endpoints
Contributor:

This PR adds new functionality, so instead of a patch/bugfix version bump (4.0.5 -> 4.0.6) this needs to be a minor bump (4.0.5 -> 4.1.0)

@gaul (Author):

Done.

@jordansissel (Contributor) commented:

Let's also get the documentation on this plugin updated to indicate that non-AWS S3 endpoints are a best effort as we (logstash team) have no subject matter expertise in CEPH or any other S3-compatible system.

@gaul gaul force-pushed the custom-endpoint branch from ca79415 to bc3035b Compare April 5, 2017 21:12
@gaul (Author) commented Apr 5, 2017

I addressed the trivial comments but I lack both interest and ability to test this further as I no longer use logstash. Happy for you to close this PR or merge it if you want to address a popular user request.

@jordansissel (Contributor) commented:

@andrewgaul Thank you for your efforts! I think @ph and I can carry it forward from here.

@av3ri commented Apr 20, 2017

I just wanted to say this feature is extremely useful! I have managed to successfully test this commit with logstash 5.3.0 uploading to a local ceph cluster. Thanks @andrewgaul

@harobed commented Apr 28, 2017

@av3ri what is the endpoint parameter name?

If I use endpoint with logstash:5.3:

output {
    s3 {
        access_key_id => "..."
        secret_access_key => "..."
        endpoint => "server-elk:9000"
        bucket => "logstash-output"
        size_file => 2048
        time_file => 1
    }
}

I have this error:

16:07:43.667 [LogStash::Runner] ERROR logstash.outputs.s3 - Unknown setting 'endpoint' for s3

@harobed commented Apr 28, 2017

Ok, this commit isn't merged :(


@av3ri commented Apr 30, 2017

@harobed
Yeah, this commit isn't merged, but you can still get it working:

  1. Clone the plugin and switch to the custom-endpoint branch.
  2. Install the plugin into logstash by editing the Gemfile in logstash home to point to your cloned plugin path (gem "logstash-output-s3", :path => "/your/local/logstash-output-s3") and running bin/logstash-plugin install --no-verify.
  3. The endpoint setting in the conf refers to a ceph endpoint like a rados gateway.

Hope this helps.
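For step 2 above, the Gemfile edit looks like this (the checkout path is whatever you used in step 1; shown here as a placeholder):

```ruby
# Gemfile in the Logstash home directory: point the plugin gem at the
# local checkout instead of rubygems.org, then run:
#   bin/logstash-plugin install --no-verify
gem "logstash-output-s3", :path => "/your/local/logstash-output-s3"
```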

@harobed commented Apr 30, 2017

@av3ri thanks

Now I have this issue:

With :

output {
    s3 {
        access_key_id => "..."
        secret_access_key => "..."
        region => "us-east-1"
        endpoint => "http://server-elk:9000"
        bucket => "logstash-output"
        size_file => 2048
        time_file => 1
    }
}

And:

$ mc policy myminio/logstash-output
Access permission for `myminio/logstash-output` is `public`

I have this error:

logstash_1       | 17:33:31.867 [[main]-pipeline-manager] ERROR logstash.outputs.s3 - Error validating bucket write permissions! {:message=>"initialize: name or service not known", :class=>"Seahorse::Client::NetworkingError"}
logstash_1       | 17:33:31.875 [[main]-pipeline-manager] ERROR logstash.pipeline - Error registering plugin {:plugin=>"#<LogStash::OutputDelegator ... <LogStash::Outputs::S3 region=>\"us-east-1\", endpoint=>\"http://server-elk:9000\", bucket=>\"logstash-output\", size_file=>2048, time_file=>1, force_path_style=>false, restore=>true, canned_acl=>\"private\", ... > ...>", :error=>"Logstash must have the privileges to write to root bucket `logstash-output`, check you credentials or your permissions."}
logstash_1       | 17:33:31.881 [[main]-pipeline-manager] ERROR logstash.agent - Pipeline aborted due to error {:exception=>#<LogStash::ConfigurationError: Logstash must have the privileges to write to root bucket `logstash-output`, check you credentials or your permissions.>, :backtrace=>[".../logstash-output-s3-4.0.6/lib/logstash/outputs/s3.rb:211:in `register'", ...]}

What is my mistake?

Best regards,
Stéphane

@harobed commented Apr 30, 2017

Fixed with this option:

output {
    s3 {
        ...
        force_path_style => true
        ...
    }
}

@RemiDesgrange commented Jan 30, 2018

Do you have an ETA for merging?

@fulder commented Feb 20, 2018

Could you maybe merge the newest master (4.0.13) into this PR if it will still be a while before it can be merged to master?

This is useful for local Ceph S3 deployments.  Fixes logstash-plugins#10.  Fixes logstash-plugins#65.
@gaul
Copy link
Author

gaul commented Feb 20, 2018

@fulder I rebased the change but lack insight into if or when this might merge.

@fulder commented Feb 20, 2018

@gaul thanks! I'm currently using this PR, and some bugs fixed on master were missing from this PR before the rebase. Looking forward to seeing it merged to master in the future ;)

@jordansissel (Contributor) commented:

@acchen97 - If this gets merged, then because we lack the staffing to support Ceph and other technologies, we'll need to update the support matrix to specifically exclude the use of this new endpoint setting (meaning only AWS S3 is commercially supported, not other endpoints like Ceph, etc.)

@acchen97

@jordansissel I'm good with that.

@jordansissel (Contributor) commented Feb 23, 2018

Outstanding items needing further review:

  • force_path_style -- it is unclear to me if this is the most appropriate name for this option. If it is, the documentation comment above it is confusing to me and does not help me (as a reader) understand when I should set this value. Is this setting required for this PR, or can it be removed and sent in a separate PR?
  • Documentation -- I don't see any docs for this issue. See docs/index.asciidoc for where you can add docs for this PR.
  • Tests -- I don't see any accompanying tests for this issue. Tests help us ensure that new features stay working. Without tests, we cannot have confidence that this feature will continue working over time.

If you have questions or want guidance on any of the above items, we can help! :)

@jordansissel (Contributor) commented:

Do you have an ETA for merging?

@RemiDesgrange Please see my above comment for things missing, but necessary, before we can merge this.

@bitva77 commented Feb 26, 2018

I noticed the custom-endpoint branch is no more. Does that mean the merge is to happen soon-ish? :)

@@ -106,6 +109,13 @@ class LogStash::Outputs::S3 < LogStash::Outputs::Base
# S3 bucket
config :bucket, :validate => :string, :required => true

# Specify a custom endpoint for use with non-AWS S3 implementations, e.g.,
# Ceph. Provide a URL in the format http://127.0.0.1:8080/
config :endpoint, :validate => :string
Member:

This will be available through the mixin that all the AWS-like plugins use, so there is no need to implement it here.

Only the force_path_style setting below is S3-specific; can you respin this PR to add only that setting?

@jsvd (Member) commented Mar 31, 2018

Endpoint customization has been done in https://github.com/logstash-plugins/logstash-mixin-aws/blob/master/lib/logstash/plugin_mixins/aws_config/generic.rb#L29-L30; the force_path_style setting can be added after #173 is released.
