Add Index for job.project (and maybe other columns)
1. Add an index for job.project (and maybe other columns)
   Local bench:
   SELECT * FROM job WHERE project = 'abcd100'
   -- SCAN TABLE job -> Run Time: real 4.769 user 1.152004 sys 3.609234
   -- SEARCH TABLE job USING INDEX job_by_project (project=?) -> Run Time: real 0.019 user 0.011652 sys 0.007310
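The effect of the proposed index can be reproduced with a minimal sketch using Python's builtin sqlite3 module; the table and index names follow the issue, but the schema and data here are made up for illustration:

```python
import sqlite3

# In-memory toy stand-in for the job table (hypothetical minimal schema).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE job (id INTEGER PRIMARY KEY, project TEXT)")
con.executemany("INSERT INTO job (project) VALUES (?)",
                [("abcd%d" % i,) for i in range(1000)])

# Without an index, the query plan is a full table scan.
plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM job WHERE project = 'abcd100'"
).fetchall()
print(plan[0][3])  # e.g. "SCAN job"

# With the proposed index, the same query becomes an index search.
con.execute("CREATE INDEX job_by_project ON job (project)")
plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM job WHERE project = 'abcd100'"
).fetchall()
print(plan[0][3])  # e.g. "SEARCH job USING INDEX job_by_project (project=?)"
```

The detail string in the last column of EXPLAIN QUERY PLAN output varies slightly between SQLite versions, but the SCAN-vs-SEARCH distinction is what accounts for the ~250x speedup in the bench above.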
2. Add an accelerator metric average column
   - acc_utilization minimum
   - acc_mem_used and nv_mem_util would be helpful as well
   - Will allow new job filtering in the frontend
3. Add a variable JSON-based footprint column / metadata column JSON field (might include the accelerator metrics from 2. instead of stat-specific columns)
   - Remove the current specific stat columns like load_avg?

Additionally: Provide a migration from/to the new DB version
Hi,
We've been talking about the ClusterCockpit DB internally and wonder what the roadmap here is.
Adding columns for new fields (acc_util, job_energy, etc.) doesn't seem to scale.
A nice-to-have future feature would be to allow custom user metadata, such as application timers. This won't work with the current static approach.
We were wondering if moving from SQLite to Postgres would be an option. As far as I can tell, the json and jsonb column types look very well suited for the cc use case. You can index these columns and have a lot more flexibility with changing schemas.
We already have a JSON column for metadata. I tested internally turning the footprint metric columns into a JSON column, i.e. putting the stat metrics into a JSON document. This makes it flexible to add and remove metrics, and you can have different sets of metrics for, e.g., different clusters.
SQLite has builtin JSON functions you can use in queries, and in initial tests putting the metrics into a JSON column came with almost zero overhead. I tested this on our current production DB.
You can even put an index on a key in the JSON document. I have already started to implement this. It will also solve the issues with hardcoded metrics.
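The JSON-footprint approach sketched above can be tried out with SQLite's builtin json_extract() and an expression index. This is a toy illustration, not the actual ClusterCockpit schema; the column and index names (footprint, job_by_acc_util) are assumptions:

```python
import json
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical schema: stat metrics live in one JSON footprint column
# instead of one hardcoded column per metric.
con.execute("CREATE TABLE job (id INTEGER PRIMARY KEY, footprint TEXT)")
con.executemany(
    "INSERT INTO job (footprint) VALUES (?)",
    [(json.dumps({"load_avg": i % 10, "acc_utilization": i % 100}),)
     for i in range(1000)],
)

# Builtin JSON functions can filter on individual keys in the document.
rows = con.execute(
    "SELECT count(*) FROM job "
    "WHERE json_extract(footprint, '$.acc_utilization') > 90"
).fetchone()
print(rows[0])  # -> 90

# An expression index on a key turns such filters into an index search;
# the query must repeat the exact same expression for the index to apply.
con.execute(
    "CREATE INDEX job_by_acc_util "
    "ON job (json_extract(footprint, '$.acc_utilization'))"
)
plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM job "
    "WHERE json_extract(footprint, '$.acc_utilization') > 90"
).fetchall()
print(plan[0][3])  # e.g. "SEARCH job USING INDEX job_by_acc_util (<expr>>?)"
```

New metrics then only require writing different JSON, not a schema migration; only keys that need fast filtering get an expression index.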
We currently support sqlite and MySQL/MariaDB. I think it should be possible to support Postgres as a third option. The infrastructure to support multiple databases already exists.
I still think that an in-process DB is always faster than an external DB. I also did tests with DuckDB, the new kid on the block. While statistics queries were slightly faster with DuckDB, as soon as you have an index SQLite is twice as fast, at least in my tests. Also, the count() queries, something we need frequently, are blazingly fast with SQLite; with DuckDB they take as long as a standard query. SQLite seems to do something special for count() queries.