-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Bug]? prometheus-exporter-plugin-for-opensearch shows the cluster status red, but the background view cluster status is always yellow. #278
Comments
@east4ming Hi, thanks for reporting. Before I investigate further I have some questions:
In other words is it possible that Prometheus scraped the metrics right after the index was created but before even the primary shards were allocated (thus the state would be red 🔴 )? This can happen for a very short period of time but if that is the moment Prometheus scrapes the metrics then the next update of metric will come with the next scraping cycle. |
Thank you for your quick reply. Answer: Q: What action caused the cluster health state to change? Was it index creation? Q: And what is the Prometheus scraping interval? global:
evaluation_interval: 1m
scrape_interval: 1m
scrape_timeout: 10s
scrape_configs:
- job_name: 'log_opensearch'
scrape_timeout: 30s
static_configs:
- targets:
- 192.168.1.1:9200
metrics_path: "/_prometheus/metrics"
basic_auth:
username: 'xxxx'
password: 'xxxxxxxx'
metric_relabel_configs:
- action: keep
regex: opensearch_circuitbreaker_tripped_count|opensearch_cluster_datanodes_number|opensearch_cluster_nodes_number|opensearch_cluster_pending_tasks_number|opensearch_cluster_shards_active_percent|opensearch_cluster_shards_number|opensearch_cluster_status|opensearch_cluster_task_max_waiting_time_seconds|opensearch_fs_io_total_read_bytes|opensearch_fs_io_total_write_bytes|opensearch_fs_path_free_bytes|opensearch_fs_path_total_bytes|opensearch_index_fielddata_evictions_count|opensearch_index_flush_total_count|opensearch_index_flush_total_time_seconds|opensearch_index_indexing_delete_current_number|opensearch_index_indexing_index_count|opensearch_index_indexing_index_current_number|opensearch_index_indexing_index_failed_count|opensearch_index_indexing_index_time_seconds|opensearch_index_merges_current_size_bytes|opensearch_index_merges_total_docs_count|opensearch_index_merges_total_stopped_time_seconds|opensearch_index_merges_total_throttled_time_seconds|opensearch_index_merges_total_time_seconds|opensearch_index_querycache_evictions_count|opensearch_index_querycache_hit_count|opensearch_index_querycache_memory_size_bytes|opensearch_index_querycache_miss_number|opensearch_index_refresh_total_count|opensearch_index_refresh_total_time_seconds|opensearch_index_requestcache_evictions_count|opensearch_index_requestcache_hit_count|opensearch_index_requestcache_memory_size_bytes|opensearch_index_requestcache_miss_count|opensearch_index_search_fetch_count|opensearch_index_search_fetch_current_number|opensearch_index_search_fetch_time_seconds|opensearch_index_search_query_count|opensearch_index_search_query_current_number|opensearch_index_search_query_time_seconds|opensearch_index_search_scroll_count|opensearch_index_search_scroll_current_number|opensearch_index_search_scroll_time_seconds|opensearch_index_segments_memory_bytes|opensearch_index_segments_number|opensearch_index_shards_number|opensearch_index_store_size_bytes|opensearch_index_translog_operations_number|opensearch_indices_indexing_index_count|opensearch_indices_store_size_bytes|opensearch_ingest_total_count|opensearch_ingest_total_failed_count|opensearch_ingest_total_time_seconds|opensearch_jvm_bufferpool_number|opensearch_jvm_bufferpool_total_capacity_bytes|opensearch_jvm_bufferpool_used_bytes|opensearch_jvm_gc_collection_count|opensearch_jvm_gc_collection_time_seconds|opensearch_jvm_mem_heap_committed_bytes|opensearch_jvm_mem_heap_used_bytes|opensearch_jvm_mem_nonheap_committed_bytes|opensearch_jvm_mem_nonheap_used_bytes|opensearch_jvm_threads_number|opensearch_jvm_uptime_seconds|opensearch_os_cpu_percent|opensearch_os_mem_used_percent|opensearch_os_swap_free_bytes|opensearch_os_swap_used_bytes|opensearch_threadpool_tasks_number|opensearch_threadpool_threads_number|opensearch_transport_rx_bytes_count|opensearch_transport_server_open_number|opensearch_transport_tx_bytes_count|up|opensearch_os_cpu_percent And, my alert rule see below(I thought you might want to know): - alert: opensearchClusterRed
expr: opensearch_cluster_status == 2
for: 0m
labels:
severity: emergency
annotations:
summary: opensearch Cluster Red (instance {{ $labels.instance }}, node {{ $labels.node }})
description: "Elastic Cluster Red status\n VALUE = {{ $value }}" |
Thanks for more details.
Would you mind sharing a bit more information about this process please?
I am trying to see if we can recreate the sequence of steps to reliably replicate this issue, that is why I ask all the questions. Thanks a lot! |
Hi Lukáš, Answers: Thanks |
@east4ming Thanks for the details. I know that a red state can happen (for a short period) when a new index is created but based on your explanation this is not the case because you are actually closing and then deleting indices. I do not think there is any reason why the cluster should become red in this scenario. Q: Just for clarity, do you make sure the index |
Yes , close operation finished, delete operation will start. These operation is called by opensearch-curator tool. |
Backgrounds
We've just switched from ES to OpenSearch not long ago.
Recently, we found that appear such a case:
The prometheus-exporter-plugin-for-opensearch shows that the cluster status is red, but the direct view of the cluster status in the background (via the OpenSearch api) is always yellow(no transition to red).
See the following figure for details:
Note: UTC is 07:00 - 07:05, corresponding to UTC +8 is 15:00-15:05. The two pictures above are at the same time.
Details
cur -XGET -u username:password localhost:9200/_cat/health?v
Other
If you need more details, please reply. I'll attach it in due course.
Thank you, sir.
The text was updated successfully, but these errors were encountered: