Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Suggested Changes to SQLite Database #274

Open
spacehamster87 opened this issue Jun 27, 2024 · 2 comments · May be fixed by #289
Open

Suggested Changes to SQLite Database #274

spacehamster87 opened this issue Jun 27, 2024 · 2 comments · May be fixed by #289
Labels
backend backend API related issues enhancement New feature or request
Milestone

Comments

@spacehamster87
Copy link
Contributor

spacehamster87 commented Jun 27, 2024

Collected here for documentation

  1. Add Index for job.project (and maybe other columns)
  • Lokal Bench:
SELECT * FROM job WHERE project = 'abcd100'
  
--SCAN TABLE job -> Run Time: real 4.769 user 1.152004 sys 3.609234
--SEARCH TABLE job USING INDEX job_by_project (project=?) -> Run Time: real 0.019 user 0.011652 sys 0.007310
  1. Add accelerator metric average column
  • acc_utilization minimum
  • acc_mem_used, nv_mem_util would be helpful as well
  • Will allow new job filtering in frontend
  1. Add variable JSON based footprint column / metadata column JSON field (might include accelerator metrics from 2. instead of stat specific columns)
  • Remove current specific stat columns like load_avg ?

Additionally: Provide Migration from/to new DB version

@spacehamster87 spacehamster87 added enhancement New feature or request backend backend API related issues labels Jun 27, 2024
@spacehamster87 spacehamster87 added this to the Release 1.4.0 milestone Jun 27, 2024
@giesselmann
Copy link
Collaborator

Hi,
We've been talking about the clustercockpit db internally, and wonder what the roadmap here is.
Adding columns for new fields doesn't seem to scale, acc_util, job_energy, etc.
A future nice to have feature would be, to allow custom user meta data such as application timers for instance. This won't work with the current static approach.

We were wondering if moving from sqlite to postgres would be a thing. As far as I can tell the json and jsonb column types look very well suited for the cc use case.
You can index these columns and have a lot more flexibility with changing schemas.

@moebiusband73
Copy link
Member

moebiusband73 commented Jun 27, 2024

We already have a json column with metadata. I internally tested to make the footprint metric columns a json column, means to put the stats metrics into a json. Then it can be flexible to add and remove metrics and you can have different set of metrics for eg different clusters.

sqlite has builtin json functions you can use in a query and in initial tests putting the metrics into a json column came with almost zero overhead. I tested this on our current production DB.
You even can put an index on a key in the json document. I already started to implement this. It will also solve the issues with hardcoded metrics.

We currently support sqlite and MySQL/MariaDB. I think it should be possible to support Postgres as a third option. The infrastructure to support multiple databases already exists.

I still think that an in process DB is always faster than an external DB. I also did tests with DuckDB, the new kid on the block. While Statistic queries where slightly faster with DuckDB, as soon as you have an index sqlite is twice as fast. At least in my tests. Also the count() queries, something we frequently need are blazingly fast with sqlite, with DuckDB the take as long as a standard query. sqlite seems to do something special for the count() queries.

@spacehamster87 spacehamster87 linked a pull request Sep 24, 2024 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend backend API related issues enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants