Enhancement request - Please provide an OpenSearch destination #11738

Open
ryn9 opened this issue Mar 8, 2022 · 27 comments
Labels
domain: sinks Anything related to the Vector's sinks provider: aws Anything `aws` service provider related type: feature A value-adding code addition that introduce new functionality.

Comments

@ryn9

ryn9 commented Mar 8, 2022

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Use Cases

Elastic changed its latest client libraries so that they no longer work with OpenSearch.
Please make a dedicated OpenSearch destination using the latest OpenSearch libraries.

Attempted Solutions

n/a

Proposal

Elastic changed its latest client libraries so that they no longer work with OpenSearch.
Please make a dedicated OpenSearch destination using the latest OpenSearch libraries.

References

Please have a look at the blog post about the client libraries available for OpenSearch: https://opensearch.org/blog/community/2021/08/community-clients/

Version

n/a

@ryn9 ryn9 added the type: feature A value-adding code addition that introduce new functionality. label Mar 8, 2022
@jszwedko jszwedko added the domain: sinks Anything related to the Vector's sinks label Mar 9, 2022
@protochron
Contributor

protochron commented Jun 6, 2022

One way to get this going in the meantime is to use the AWS sigv4-proxy to sign the requests to OpenSearch. It's the same workaround outlined in #6204, so it's not ideal but it does work.
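
As a rough sketch of that workaround, the elasticsearch sink can point at a locally running signing proxy instead of at the OpenSearch endpoint itself. The sink name, the input name, and the proxy address (localhost:8080) are assumptions for illustration only; the proxy has to be set up separately so that it signs requests and forwards them to the actual domain.

```yaml
sinks:
  opensearch_via_proxy:            # hypothetical sink name
    type: elasticsearch
    inputs:
      - my_source                  # hypothetical upstream component
    # Assumed address of a local sigv4 signing proxy. Vector sends
    # unsigned requests here; the proxy signs and forwards them.
    endpoints:
      - "http://localhost:8080"
```

With this layout Vector itself needs no AWS credentials, which is why the signing is described as being handled separately.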

@jszwedko jszwedko added the provider: aws Anything `aws` service provider related label Jun 6, 2022
@zamazan4ik
Contributor

@jszwedko I suppose we need some discussion here.

OpenSearch is getting more and more popular in the community for different reasons, so I think OpenSearch is kinda important for Vector.

If we agree to support OpenSearch, the main remaining question is how it should be implemented in Vector. I suggest creating a dedicated source/sink for OpenSearch, even if for now it shares a lot of code with the existing Elasticsearch source/sink. In the future I guess OpenSearch and Elasticsearch will diverge more and more.

What do you think?

@jszwedko
Member

jszwedko commented Nov 4, 2022

I'm open to creating a new opensearch sink. At present, I think it could largely just wrap the elasticsearch sink and set some defaults. If the HTTP APIs between the two products diverge more, then we could split up the implementation more, but I think just wrapping would be sufficient for now.

@ryn9
Author

ryn9 commented Nov 4, 2022

Forgive me for not knowing what's under the hood, but please know that Elastic-supplied libraries after 7.14 have been modified specifically to not work with OpenSearch. So if you are using these libraries, it would be best to start diverging sooner rather than later.

@zamazan4ik
Contributor

@jszwedko as @ryn9 mentioned earlier, that is not possible, since the Elasticsearch team explicitly broke OpenSearch support in their libraries.

@jszwedko
Member

jszwedko commented Nov 4, 2022

> Forgive me for not knowing what's under the hood, but please know that Elastic-supplied libraries after 7.14 have been modified specifically to not work with OpenSearch. So if you are using these libraries, it would be best to start diverging sooner rather than later.

Ah, yes, meant to mention in my other comment that we don't rely on any SDKs for Elasticsearch but just make HTTP calls directly using the hyper crate.

@zamazan4ik
Contributor

Hmmm, in this case I tend to agree with @jszwedko's approach: create an opensearch sink that just wraps the existing elasticsearch sink with some defaults. Later, if there is a need, we will be able to separate the opensearch and elasticsearch sinks step by step.

@ryn9
Author

ryn9 commented Nov 4, 2022

Eventually there will be feature divergence for ingestion. When that time comes, the OpenSearch project does maintain this library: https://github.com/opensearch-project/opensearch-rs. The OpenSearch maintainers would love to hear your feedback on it and/or have you speak at one of their meetups: https://www.meetup.com/opensearch/

EDIT:
And that lib has aws sigv4 built in :)

@zamazan4ik
Contributor

@ryn9 you are right, eventually these projects will diverge a lot. For now we could start with the already implemented elasticsearch sink and just wrap it as opensearch. Later, step by step, we can rewrite it with OpenSearch-specific details in mind, e.g. start using opensearch-rs in the opensearch sink instead of raw hyper-based requests.

@zamazan4ik
Contributor

@ryn9 by the way, have you already tried using the elasticsearch sink with an OpenSearch installation? Did you notice any problems?

@ryn9
Author

ryn9 commented Nov 7, 2022

Apologies - I have not tried for a while - but I believe it was working when I last tested against an OpenSearch 1.x release

@protochron
Contributor

I use the existing elasticsearch sink with OpenSearch 1.x in AWS and it works fine, with the caveat that I handle signing separately.

@ryn9
Author

ryn9 commented Dec 19, 2022

@jszwedko when the elasticsearch output code is updated, is it also tested against OpenSearch?

I see that in 0.26 the following change was made to Vector:

  • The elasticsearch sink now supports an api_version option to specify the API version the targeted Elasticsearch instance exposes. This replaces and deprecates the suppress_type_name option, which was previously used for controlling Elasticsearch version compatibility.
  • It can be set to auto to attempt to automatically determine the Elasticsearch version by querying the Elasticsearch version endpoint.

OpenSearch 2.x mimics the Elasticsearch 7.x line protocol but, like Elasticsearch 8.x, will not accept _type.
I am not sure what other changes are applied when using the Elasticsearch 8.x line protocol (i.e., whether it is just the _type removal), but at this point, without testing, I cannot be sure 0.26 is compatible with OpenSearch 2.x.

@jszwedko
Member

> @jszwedko when the elasticsearch output code is updated, is it also tested against OpenSearch?
>
> I see that in 0.26 the following change was made to Vector:
>
>   • The elasticsearch sink now supports an api_version option to specify the API version the targeted Elasticsearch instance exposes. This replaces and deprecates the suppress_type_name option, which was previously used for controlling Elasticsearch version compatibility.
>   • It can be set to auto to attempt to automatically determine the Elasticsearch version by querying the Elasticsearch version endpoint.
>
> OpenSearch 2.x mimics the Elasticsearch 7.x line protocol but, like Elasticsearch 8.x, will not accept _type. I am not sure what other changes are applied when using the Elasticsearch 8.x line protocol (i.e., whether it is just the _type removal), but at this point, without testing, I cannot be sure 0.26 is compatible with OpenSearch 2.x.

Setting api_version to 7 should suppress _type as well. Only api_version 6 should send it.

I see OpenSearch has a docker image, https://hub.docker.com/r/opensearchproject/opensearch, so it seemingly wouldn't be too hard to add it to our integration tests to ensure continued compatibility.
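
A minimal sketch of pinning the API version as described above, rather than relying on auto-detection, might look like this. The sink and input names are hypothetical, and the value `v7` is an assumption about how version 7 is spelled in the config; check the sink reference for the accepted values.

```yaml
sinks:
  sink_opensearch:                 # hypothetical sink name
    type: elasticsearch
    inputs:
      - my_source                  # hypothetical upstream component
    endpoints:
      - "https://<CLUSTERNAME>.<REGION>.es.amazonaws.com:443"
    # Pin the API version so that _type is not sent. "v7" is assumed
    # to be the accepted spelling of version 7.
    api_version: v7
```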

@ryn9
Author

ryn9 commented Dec 19, 2022

For anyone stumbling upon this thread: writing back to confirm that 0.26 suppresses _type when sending to OpenSearch 2.3 and pushes messages successfully.

It is live with a config that looks like this:

  sink_opensearch:
    type: elasticsearch
    inputs:
      - transform_remap_for_opensearch
    compression: gzip
    healthcheck: false
    endpoints:
      - "https://<CLUSTERNAME>.<REGION>.es.amazonaws.com:443"
    auth:
      strategy: "basic"
      user: "<USERNAME>"
      password: "<PASSWORD>"
    distribution:
      retry_max_duration_secs: 300
    bulk:
      index: "{{ field1 }}-{{ field2 }}-{{ field3 }}-%G.%V"

@nike21oct

Hi, I have an AWS OpenSearch cluster with fine-grained access control enabled, so it requires credentials, and an EKS cluster with Vector installed. I put the config below in Vector's ConfigMap to get the logs into OpenSearch. Is this syntax correct for that setup? I cannot see any index or logs in OpenSearch.
Need your support on this.

  sinks:
    elasticsearch:
      type: elasticsearch
      inputs: [kubernetes_logs]
      healthcheck: false
      endpoints:
      auth:
        strategy: "basic"
        user: ""
        password: ""

@nike21oct

Is it possible to pass the username and password to the ConfigMap from a Secret? It is not a good idea to keep the credentials directly in the ConfigMap.

@jszwedko
Member

> Is it possible to pass the username and password to the ConfigMap from a Secret? It is not a good idea to keep the credentials directly in the ConfigMap.

I think you can use normal Kubernetes secrets mechanisms unless I'm missing something.
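
One way this could look, as a sketch rather than a tested setup: expose the Secret to the Vector container as environment variables and reference them from the config, since Vector interpolates `${VAR}` from the environment when it loads the config. The Secret name `opensearch-credentials`, its keys, the variable names, and the sink name are all hypothetical.

```yaml
# Fragment of the Vector container spec (DaemonSet or Deployment).
# The Secret name "opensearch-credentials" and its keys are hypothetical.
env:
  - name: OPENSEARCH_USER
    valueFrom:
      secretKeyRef:
        name: opensearch-credentials
        key: username
  - name: OPENSEARCH_PASSWORD
    valueFrom:
      secretKeyRef:
        name: opensearch-credentials
        key: password
---
# Fragment of the Vector config stored in the ConfigMap. The values
# below are filled in from the environment variables defined above.
sinks:
  opensearch:                      # hypothetical sink name
    type: elasticsearch
    inputs: [kubernetes_logs]
    endpoints:
      - "https://<CLUSTERNAME>.<REGION>.es.amazonaws.com:443"
    auth:
      strategy: "basic"
      user: "${OPENSEARCH_USER}"
      password: "${OPENSEARCH_PASSWORD}"
```

This keeps the ConfigMap free of credentials; only the variable names appear in it.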

@nike21oct

I have another question.
I have Vector installed in an EKS cluster, sending logs to AWS OpenSearch, but the only index I see in OpenSearch is named vector. Is there a way to configure the index name in Vector's config file so that it shows up in OpenSearch?
Can you please help me with this?

@ryn9
Author

ryn9 commented Apr 6, 2023

> I have another question. I have Vector installed in an EKS cluster, sending logs to AWS OpenSearch, but the only index I see in OpenSearch is named vector. Is there a way to configure the index name in Vector's config file so that it shows up in OpenSearch? Can you please help me with this?

https://vector.dev/docs/reference/configuration/sinks/elasticsearch/#bulk.index

@nike21oct

Hello, I have configured Vector in a Kubernetes cluster with Kubernetes logs as the source and elasticsearch as the sink. Are logs transferred to Elasticsearch instantly?
I cannot see the latest logs of my applications; it only shows logs that are about a month old.
Can you please help me with this?

@bruceg
Member

bruceg commented Apr 24, 2023

Technically, Vector won't be sending it instantly, but it should be close enough given the above configuration. The default batch timeout for elasticsearch is just one second, after which it would send anything that has been queued up.

If you run vector with debugging enabled, do you see requests being sent out to the elasticsearch server?
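
For reference, the batch timeout mentioned above can be set explicitly on the sink. This is only a sketch: the sink name is hypothetical, and the value shown simply restates the one-second default described in this comment.

```yaml
sinks:
  opensearch:                      # hypothetical sink name
    type: elasticsearch
    inputs: [kubernetes_logs]
    endpoints:
      - "https://<CLUSTERNAME>.<REGION>.es.amazonaws.com:443"
    batch:
      # Flush whatever has been queued after at most this many seconds,
      # even if the batch is not full.
      timeout_secs: 1
```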

@nike21oct

> For anyone stumbling upon this thread: writing back to confirm that 0.26 suppresses _type when sending to OpenSearch 2.3 and pushes messages successfully.
>
> It is live with a config that looks like this:
>
>   sink_opensearch:
>     type: elasticsearch
>     inputs:
>       - transform_remap_for_opensearch
>     compression: gzip
>     healthcheck: false
>     endpoints:
>       - "https://<CLUSTERNAME>.<REGION>.es.amazonaws.com:443"
>     auth:
>       strategy: "basic"
>       user: "<USERNAME>"
>       password: "<PASSWORD>"
>     distribution:
>       retry_max_duration_secs: 300
>     bulk:
>       index: "{{ field1 }}-{{ field2 }}-{{ field3 }}-%G.%V"

Can we create multiple indices from this, like so?

  bulk:
    index: "{{ field1 }}-{{ field2 }}-{{ field3 }}-%G.%V"
    index2: "{{ field1 }}-{{ field2 }}-{{ field3 }}-%G.%V"

Is it possible to have multiple indices?

@ryn9
Author

ryn9 commented May 11, 2023

> Can we create multiple indices from this, like so?
>
>   bulk:
>     index: "{{ field1 }}-{{ field2 }}-{{ field3 }}-%G.%V"
>     index2: "{{ field1 }}-{{ field2 }}-{{ field3 }}-%G.%V"
>
> Is it possible to have multiple indices?

You need to create multiple sinks, each with their own index definition
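
A sketch of that layout, with one sink per index definition, could look like the following. The sink names, input names, and index patterns are hypothetical, and it assumes the events have already been split upstream into the two inputs.

```yaml
sinks:
  opensearch_app:                  # hypothetical sink name
    type: elasticsearch
    inputs: [app_logs]             # hypothetical upstream component
    endpoints:
      - "https://<CLUSTERNAME>.<REGION>.es.amazonaws.com:443"
    bulk:
      index: "app-%Y.%m.%d"        # hypothetical index pattern
  opensearch_audit:                # hypothetical sink name
    type: elasticsearch
    inputs: [audit_logs]           # hypothetical upstream component
    endpoints:
      - "https://<CLUSTERNAME>.<REGION>.es.amazonaws.com:443"
    bulk:
      index: "audit-%Y.%m.%d"      # hypothetical index pattern
```

Each sink writes only to the index its own bulk.index template resolves to.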

@spencergilbert
Contributor

> Can we create multiple indices from this, like so?
>
>   bulk:
>     index: "{{ field1 }}-{{ field2 }}-{{ field3 }}-%G.%V"
>     index2: "{{ field1 }}-{{ field2 }}-{{ field3 }}-%G.%V"
>
> Is it possible to have multiple indices?

As @ryn9 pointed out, that's not a valid configuration. However I don't understand what you're trying to do with the two indices that the template fields don't already do. Any unique set of field1, field2, field3 (and the timestamp) will create a new index.

Additionally, please open a new Discussion for questions unrelated to the original issue - thanks.

@spencergilbert
Contributor

Adding this issue as a difference between OS and ES we need to handle:
#17690

@sandervandegeijn

I'm using the elasticsearch sink with the v7 API version. It does work, but we would welcome a dedicated opensearch sink as well :)

8 participants