Not recognizing worker in celery-exporter logs. #262
Comments
The cleanup of metrics happens when workers time out; it helps lower label cardinality, but it can be disabled. However, it happens by default only after a timeout of 5 minutes. Do you not see any metrics in your Prometheus TSDB?
The timeout happens for me on startup and going forward. I have access to some metrics, but not all of them. This is what I see from the metrics endpoint. But even the metrics that are being sent do not include the complete list of tasks that should be firing.
Are your workers picking up the tasks and executing them? The metrics above just indicate that the client has sent tasks to the queue. Have you enabled metrics for the workers?
Yeah, I have metrics enabled for the worker, and the tasks are being executed. The tasks are also being picked up correctly in Flower.
Try running it with debug logging.
These are the logs. It registers the heartbeat but still times out the worker. I have the workers sending a heartbeat every 10 seconds.

```
2023-08-08 16:48:25 2023-08-08 23:48:25.806 | DEBUG | src.exporter:track_worker_heartbeat:324 - Updated gauge='celery_worker_tasks_active' value='0'
2023-08-08 16:48:25 2023-08-08 23:48:25.806 | DEBUG | src.exporter:track_worker_heartbeat:327 - Updated gauge='celery_worker_up' value='1'
2023-08-08 16:48:26 2023-08-08 23:48:26.368 | INFO | src.exporter:track_timed_out_workers:188 - Have not seen 7e7bf8bb1670 for 25200.56 seconds. Removing from metrics
2023-08-08 16:48:26 2023-08-08 23:48:26.368 | DEBUG | src.exporter:forget_worker:147 - Updated gauge='celery_worker_tasks_active' value='0'
2023-08-08 16:48:26 2023-08-08 23:48:26.368 | DEBUG | src.exporter:forget_worker:150 - Updated gauge='celery_worker_up' value='0'
```

The individual tasks also end up getting registered in the statistics, but the metrics aren't showing up correctly.

```
2023-08-08 16:48:47 2023-08-08 23:48:47.277 | DEBUG | src.exporter:track_task_event:257 - Received event='task-sent' for task='scheduler.tasks.send_notification'
```

I did notice that every time it times out the worker, the reported time is always 25000+ seconds. I'm curious if that might be the issue.
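As a side observation (a sketch, not part of the original thread): the ~25200-second gap in the log above is exactly 7 hours, which matches the offset between the two timestamps printed on each log line (container-local vs. the logger's own clock):

```python
from datetime import datetime

# Timestamps copied from one log line above: the host clock and the
# logger's clock disagree on each line.
host_time = datetime.fromisoformat("2023-08-08 16:48:26")
log_time = datetime.fromisoformat("2023-08-08 23:48:26")

offset = (log_time - host_time).total_seconds()
print(offset)         # 25200.0 seconds
print(offset / 3600)  # 7.0 hours
```

That the "have not seen" duration equals the clock offset, rather than the real heartbeat interval, points at a timezone/clock mismatch rather than missing heartbeats.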
It seems to be working fine: events are received and metrics appear to be generated. Why your worker times out, I'm not sure, but it has nothing to do with the exporter (the exporter just marks it as not up and cleans up its metrics). I'd still expect metrics in your Prometheus TSDB if you are scraping the exporter.
I run everything in my environment inside containers, so for me it had something to do with the timezone, in `src.exporter.track_timed_out_workers`:
so if the exporter's timezone (`time.time()`) is something different than what Celery reports (`worker_status["ts"]`), you may have a problem. I fixed it by setting the exporter's timezone environment variable to mine (`TZ=America/Detroit`).
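A minimal sketch of how such a staleness check misfires when the two clocks disagree (hypothetical names; this is not the exporter's actual code, only an illustration of the comparison described above):

```python
import time

# 5-minute default timeout mentioned earlier in the thread.
TIMEOUT_SECONDS = 300

def worker_timed_out(worker_status: dict) -> bool:
    # worker_status["ts"] is the epoch timestamp of the last heartbeat
    # event. If the clock that produced it is skewed relative to the
    # exporter's time.time() (e.g. a container whose clock has a
    # timezone offset baked into it), the difference is off by whole
    # hours and the worker looks stale immediately.
    since_seen = time.time() - worker_status["ts"]
    return since_seen > TIMEOUT_SECONDS

# A worker that reported a heartbeat "now", but from a clock running
# 7 hours behind, appears 25200 seconds stale and gets forgotten:
skewed = {"ts": time.time() - 7 * 3600}
print(worker_timed_out(skewed))  # True
```

With both clocks in agreement the same heartbeat would yield a difference of roughly zero, well under the timeout.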
Something similar happened to me. In the logs I saw a difference of 10800 seconds, which is the 3-hour offset of my timezone. Adding a `TZ` environment variable fixed it for me.
Hello, I am building out a monitoring stack for a Django project. I am using Celery, Redis, and Flower. Everything is running in Docker containers.
The worker is recognized through Flower, but when I check the logs for celery-exporter, I get this message:

```
2023-08-03 15:54:22 2023-08-03 22:54:22.160 | INFO | src.exporter:track_timed_out_workers:188 - Have not seen cc2ae99afb58 for 25201.57 seconds. Removing from metrics
```

The tasks are being sent and recorded correctly in Flower, but when I query the celery_tasks_received_total metric, none of them are registering.
I have events enabled and have the worker sending out a heartbeat every 10 seconds. celery-exporter connects to the broker. I am not sure what is going on or how to even approach this anymore. Hoping someone could guide me in the right direction.
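For reference, a sketch of the documented Celery settings that control the events the exporter consumes (app name and broker URL here are placeholders, not from this thread):

```python
from celery import Celery

app = Celery("scheduler", broker="redis://localhost:6379/0")

# Emit task-sent events from clients, so task publishes are visible
# to event consumers like celery-exporter.
app.conf.task_send_sent_event = True

# Emit task lifecycle events from workers (equivalent to starting
# the worker with the -E flag).
app.conf.worker_send_task_events = True
```

These settings only enable event emission; as the thread shows, the worker-up/timeout behavior also depends on the exporter's clock agreeing with the clocks in the worker containers.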