[BUG] Grafana dashboards are outdated and make stats is failing #4346

Closed
2 tasks done
devictr opened this issue Nov 1, 2023 · 2 comments · Fixed by #4382
Assignees
Labels
backlogged For internal use. Reserved for contributor team workflow. bug Something isn't working

Comments


devictr commented Nov 1, 2023

Describe the bug

The Grafana dashboards in the repo are outdated. They appear to have last been updated 2 years ago, and many of the metrics they reference have since been renamed or are no longer produced.
I tried running make stats to generate an updated version, but it fails with the following error:

Traceback (most recent call last):
  File "/home/victor.delepine/.local/bin/generate-dashboard", line 8, in <module>
    sys.exit(generate_dashboard_script())
  File "/home/victor.delepine/.local/lib/python3.8/site-packages/grafanalib/_gen.py", line 242, in generate_dashboard_script
    run_script(generate_dashboard)
  File "/home/victor.delepine/.local/lib/python3.8/site-packages/grafanalib/_gen.py", line 80, in run_script
    sys.exit(f(sys.argv[1:]))
  File "/home/victor.delepine/.local/lib/python3.8/site-packages/grafanalib/_gen.py", line 223, in generate_dashboard
    dashboard = loader(opts.dashboard)
  File "/home/victor.delepine/.local/lib/python3.8/site-packages/grafanalib/_gen.py", line 74, in loader
    raise DefinitionError(
grafanalib._gen.DefinitionError: Definition /tmp/flyte/stats/flytepropeller_dashboard.py does not define a variable '/tmp/flyte/stats/flytepropeller_dashboard'
make: *** [Makefile:62: stats] Error 1

(Please disregard the /tmp/ path, I was trying to check something else)
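For anyone debugging the same failure: the error message suggests the loader derives the variable name it looks for from the file path, since the name it reports is the path with only the final ".py" removed. The sketch below reproduces that derivation under the assumption that the loader takes the second-to-last dot-separated piece of the path. This is inferred from the message above and not confirmed against grafanalib's source. The function name and the second file name are hypothetical.

```python
def expected_variable_name(path: str) -> str:
    """Guess the variable name the loader looks up for a definition file.

    Assumption (inferred from the error message only): the name is the
    second-to-last dot-separated component of the path.
    """
    return path.split(".")[-2]


# The path from the traceback: the whole stem becomes the "variable name",
# which no Python module could define.
print(expected_variable_name("/tmp/flyte/stats/flytepropeller_dashboard.py"))

# A hypothetical file name with an extra ".dashboard" component would make
# the loader look for a variable called "dashboard" instead.
print(expected_variable_name("stats/flytepropeller.dashboard.py"))
```

If that assumption holds, the failure comes from a file-naming mismatch and not from the dashboard definitions themselves.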

Examples of outdated metrics:

All the flyte:admin:database:postgres:repositories:* metrics are now under flyte:admin:admin:database:*
Most metrics that measure durations, like flyte:propeller:all:workflow:failure_duration_ms, now need "unlabeled" inserted before the unit suffix: flyte:propeller:all:workflow:failure_duration_unlabeled_ms.
A lot of Flyte Admin metrics need a second "admin" prefix: flyte:admin:list_launch_plan:codes:OK becomes flyte:admin:admin:list_launch_plan:codes:OK
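The three renames above can be sketched as a small helper for rewriting metric names in dashboard queries. This is only an illustration built from the examples in this issue: the function name is hypothetical, and because the rules apply to "most" or "a lot of" metrics and not all of them, every rewritten name should still be checked against what Prometheus actually exposes.

```python
OLD_REPO_PREFIX = "flyte:admin:database:postgres:repositories:"
NEW_REPO_PREFIX = "flyte:admin:admin:database:"
ADMIN_PREFIX = "flyte:admin:"
DOUBLE_ADMIN_PREFIX = "flyte:admin:admin:"


def migrate_metric_name(name: str) -> str:
    """Apply the renames described in this issue to one metric name.

    Assumption: the rules are generalized from three examples, so they may
    rewrite metrics that were never actually renamed.
    """
    # Repository metrics moved under the doubled admin prefix.
    if name.startswith(OLD_REPO_PREFIX):
        return NEW_REPO_PREFIX + name[len(OLD_REPO_PREFIX):]

    # Flyte Admin metrics gained a second "admin" component.
    if name.startswith(ADMIN_PREFIX) and not name.startswith(DOUBLE_ADMIN_PREFIX):
        name = DOUBLE_ADMIN_PREFIX + name[len(ADMIN_PREFIX):]

    # Duration metrics gained "unlabeled" before the "_ms" unit suffix.
    if name.endswith("_duration_ms"):
        name = name[: -len("_ms")] + "_unlabeled_ms"

    return name


print(migrate_metric_name("flyte:propeller:all:workflow:failure_duration_ms"))
print(migrate_metric_name("flyte:admin:list_launch_plan:codes:OK"))
```

Names that already follow the new scheme pass through unchanged, so the helper can be run more than once over the same dashboard source.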

Expected behavior

The Grafana dashboards should be in sync with the Prometheus metrics currently produced by the code in the repo.

Additional context to reproduce

No response

Screenshots

No response

Are you sure this issue hasn't been raised already?

  • Yes

Have you read the Code of Conduct?

  • Yes
@devictr devictr added bug Something isn't working untriaged This issues has not yet been looked at by the Maintainers labels Nov 1, 2023
@eapolinario eapolinario added backlogged For internal use. Reserved for contributor team workflow. and removed untriaged This issues has not yet been looked at by the Maintainers labels Nov 2, 2023
@Tom-Newton
Contributor

I managed to get the generation scripts working and updated them where necessary. I can probably make a PR.

@Tom-Newton
Contributor

I raised a PR for review that I think resolves this: #4382

4 participants