Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove the need for external tools for managing Elasticsearch #6283

Open
5 tasks
yurishkuro opened this issue Dec 1, 2024 · 16 comments
Open
5 tasks

Remove the need for external tools for managing Elasticsearch #6283

yurishkuro opened this issue Dec 1, 2024 · 16 comments
Labels
area/storage good first issue Good for beginners help wanted Features that maintainers are willing to accept but do not have cycles to implement storage/elasticsearch v2

Comments

@yurishkuro
Copy link
Member

PR #5797 is a good example of eliminating the need for schema initialization script for Cassandra by moving the logic directly into Jaeger binary.

We have three standalone tools for managing ES that we can also merge into Jaeger binary directly. All of them are already implemented in Go, so are fairly straightforward to embed in the Jaeger binary (fewer binary artifacts to release).

  • cmd/es-index-cleaner/ is used as a cron job to delete old ES indices. The logic can instead be executed from the Jaeger binary on a timer.
  • cmd/es-rollover/ - similar to cleaner but for rolling over indices.
  • esmapping-generator/ - just prints the index mappings to stdout (probably for manual installation)

Tasks

  • extends configuration for ES in Jaeger v2 to support additional parameters for the cleaner & rollover scripts, such as frequency of running, including enabled properties for each.
  • enable index cleaner capability in v2, update integration tests to use that
  • enable index rollover capability in v2, update integration tests to use that
  • add a sub-command es-mappings to jaeger v2 binary to print es mappings
  • update documentation accordingly to no longer reference the separate scripts
@yurishkuro yurishkuro added good first issue Good for beginners help wanted Features that maintainers are willing to accept but do not have cycles to implement labels Dec 1, 2024
@Rohanraj123
Copy link
Contributor

I am working on it

@akstron
Copy link
Contributor

akstron commented Dec 1, 2024

Hi @yurishkuro , for cmd/es-index-cleaner/, we would only require the filters to be added in the config, rest other fields used for client creation would already be available to us.
Below config should suffice it I believe:

type Cleaner struct {
	// Separator between date fragments.
	IndexDateSeparator string
	// Whether to filter archive indices.
	Archive bool
	// Whether to filter rollover indices.
	Rollover bool
	// Indices created before this date will be deleted.
	DeleteBeforeThisDate time.Time
}

Does it sound good?

@yurishkuro
Copy link
Member Author

IndexDateSeparator - if the main storage doesn't need this argument, why does cleaner need it?

Archive bool - archive storage is configured as separate storage in v2, so it can have its own cleaner if desired.

Rollover bool - not clear. What's the alternative?

DeleteBeforeThisDate time.Time - doesn't make sense, cleaner is a permanent periodic job, not something that takes an absolute time. We already have max span age parameter iirc, is it not sufficient to drive the behavior of the cleaner?

@akstron
Copy link
Contributor

akstron commented Dec 2, 2024

I mostly took out the filters used in the current implementation.

IndexDateSeparator - if the main storage doesn't need this argument, why does cleaner need it?

Yes, I don't think so we need it. There is DateLayout in index config, which we can probably use to find indices here.

Archive bool - archive storage is configured as separate storage in v2, so it can have its own cleaner if desired.

Makes sense. Thanks

Rollover bool - not clear. What's the alternative?

I was thinking maybe I can use the exact filter used for the rollover case, in the current implementation. But I probably have to read more about how this is being used here before commenting further. If you can provide me with some pointers, that would be really helpful? Thanks

DeleteBeforeThisDate time.Time - doesn't make sense, cleaner is a permanent periodic job, not something that takes an absolute time. We already have max span age parameter iirc, is it not sufficient to drive the behavior of the cleaner?

Was mostly referring to the time calculated before which we should delete indices. This calculation should actually be done based on a time.Duration i.e., spanMaxAge you are referring to.

Also, looks like rohan has already started working on it.

@akstron
Copy link
Contributor

akstron commented Dec 2, 2024

Also, shouldn't this be added as an extension?

@yurishkuro
Copy link
Member Author

it could be an extension but I think it's very closely coupled with the storage implementation that it makes more sense for it to be just an option in the config

@akstron
Copy link
Contributor

akstron commented Dec 3, 2024

Rollover bool - not clear. What's the alternative?

Just wanted to add this info. We won't need this config as well. Rollover indices can be treated as index-per-day pattern, where we just need to delete those based on the creationTime.
In conclusion, I believe maxAge should be more than enough for cleaning indices.

@akstron
Copy link
Contributor

akstron commented Dec 18, 2024

Hi @Rohanraj123 , let me know if you are working on it? I would like to take up merging cmd/es-index-cleaner/. Thanks

@Rohanraj123
Copy link
Contributor

Hi @Rohanraj123 , let me know if you are working on it? I would like to take up merging cmd/es-index-cleaner/. Thanks

yeah go ahead please!

@akstron
Copy link
Contributor

akstron commented Dec 23, 2024

Thanks, I will start working on it.

@akstron
Copy link
Contributor

akstron commented Dec 23, 2024

Hi @yurishkuro , I see that for cmd/es-index-cleaner, the IndicesClient wraps a simple http client underneath instead of wrapping the es go client library. Do you know if there is any particular reason for it? If not, should we refactor it first to use the es go client library?

@yurishkuro
Copy link
Member Author

ideally all logic should go through regular clients

@Manik2708
Copy link
Contributor

Manik2708 commented Dec 30, 2024

@yurishkuro I am little confused on the question, whether after this change, will we still be giving the cmds (For those who still want to keep these services in their control) or should we have to discard the cmd and move the code to pkg or a mixture of these two, keeping the cmd as it is but moving the code to a sharerd package for es and cmd?

@yurishkuro
Copy link
Member Author

@Manik2708 they should become sub-commands of jaeger binary, i.e. added to this list:

$ go run ./cmd/jaeger help
2024/12/30 11:06:11 application version: git-commit=, git-version=, build-date=
Jaeger backend v2

Usage:
  jaeger [flags]
  jaeger [command]

Available Commands:
  completion  Generate the autocompletion script for the specified shell
  components  Outputs available components in this collector distribution
  docs        Generates documentation
  help        Help about any command
  validate    Validates the config without running the collector
  version     Print the version.

@Manik2708
Copy link
Contributor

@yurishkuro Also is there any connection between RolloverFrequency and es-rollover?

@yurishkuro
Copy link
Member Author

@Manik2708 you have to check the code where/how it's used

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/storage good first issue Good for beginners help wanted Features that maintainers are willing to accept but do not have cycles to implement storage/elasticsearch v2
Projects
None yet
Development

No branches or pull requests

4 participants