Add Index for job.project (and maybe other columns)
1. Add an index for job.project (and maybe other columns)
   Local bench:
   SELECT * FROM job WHERE project = 'abcd100'
   -- SCAN TABLE job -> Run Time: real 4.769 user 1.152004 sys 3.609234
   -- SEARCH TABLE job USING INDEX job_by_project (project=?) -> Run Time: real 0.019 user 0.011652 sys 0.007310
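The effect of the proposed index can be reproduced with a minimal sketch using Python's builtin sqlite3 module; the table and index names follow the issue, but the schema and data here are made up for illustration:

```python
import sqlite3

# In-memory toy stand-in for the job table (hypothetical minimal schema).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE job (id INTEGER PRIMARY KEY, project TEXT)")
con.executemany("INSERT INTO job (project) VALUES (?)",
                [("abcd%d" % i,) for i in range(1000)])

# Without an index, the query plan is a full table scan.
plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM job WHERE project = 'abcd100'"
).fetchall()
print(plan[0][3])  # e.g. "SCAN job"

# With the proposed index, the same query becomes an index search.
con.execute("CREATE INDEX job_by_project ON job (project)")
plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM job WHERE project = 'abcd100'"
).fetchall()
print(plan[0][3])  # e.g. "SEARCH job USING INDEX job_by_project (project=?)"
```

The detail string in the last column of EXPLAIN QUERY PLAN output varies slightly between SQLite versions, but the SCAN-vs-SEARCH distinction is what accounts for the ~250x speedup in the bench above.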
2. Add an accelerator metric average column
   - acc_utilization minimum
   - acc_mem_used and nv_mem_util would be helpful as well
   - Will allow new job filtering in the frontend
3. Add a variable JSON-based footprint column / metadata column JSON field (might include the accelerator metrics from 2. instead of stat-specific columns)
   - Remove the current specific stat columns like load_avg?

Additionally: Provide a migration from/to the new DB version
Hi,
We've been talking about the ClusterCockpit DB internally and wonder what the roadmap here is.
Adding columns for new fields (acc_util, job_energy, etc.) doesn't seem to scale.
A nice-to-have future feature would be to allow custom user metadata, such as application timers. This won't work with the current static approach.
We were wondering if moving from SQLite to Postgres would be an option. As far as I can tell, the json and jsonb column types look very well suited for the cc use case. You can index these columns and have a lot more flexibility with changing schemas.
We already have a JSON column for metadata. I tested internally turning the footprint metric columns into a JSON column, i.e. putting the stat metrics into a JSON document. This makes it flexible to add and remove metrics, and you can have different sets of metrics for, e.g., different clusters.
SQLite has builtin JSON functions you can use in queries, and in initial tests putting the metrics into a JSON column came with almost zero overhead. I tested this on our current production DB.
You can even put an index on a key in the JSON document. I have already started to implement this. It will also solve the issues with hardcoded metrics.
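The JSON-footprint approach sketched above can be tried out with SQLite's builtin json_extract() and an expression index. This is a toy illustration, not the actual ClusterCockpit schema; the column and index names (footprint, job_by_acc_util) are assumptions:

```python
import json
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical schema: stat metrics live in one JSON footprint column
# instead of one hardcoded column per metric.
con.execute("CREATE TABLE job (id INTEGER PRIMARY KEY, footprint TEXT)")
con.executemany(
    "INSERT INTO job (footprint) VALUES (?)",
    [(json.dumps({"load_avg": i % 10, "acc_utilization": i % 100}),)
     for i in range(1000)],
)

# Builtin JSON functions can filter on individual keys in the document.
rows = con.execute(
    "SELECT count(*) FROM job "
    "WHERE json_extract(footprint, '$.acc_utilization') > 90"
).fetchone()
print(rows[0])  # -> 90

# An expression index on a key turns such filters into an index search;
# the query must repeat the exact same expression for the index to apply.
con.execute(
    "CREATE INDEX job_by_acc_util "
    "ON job (json_extract(footprint, '$.acc_utilization'))"
)
plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM job "
    "WHERE json_extract(footprint, '$.acc_utilization') > 90"
).fetchall()
print(plan[0][3])  # e.g. "SEARCH job USING INDEX job_by_acc_util (<expr>>?)"
```

New metrics then only require writing different JSON, not a schema migration; only keys that need fast filtering get an expression index.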
We currently support sqlite and MySQL/MariaDB. I think it should be possible to support Postgres as a third option. The infrastructure to support multiple databases already exists.
I still think that an in-process DB is always faster than an external DB. I also did tests with DuckDB, the new kid on the block. While statistics queries were slightly faster with DuckDB, as soon as you have an index SQLite is twice as fast, at least in my tests. Also, the count() queries, something we need frequently, are blazingly fast with SQLite; with DuckDB they take as long as a standard query. SQLite seems to do something special for count() queries.