Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Support PostgreSQL #23

Closed
wants to merge 14 commits into from

Conversation

bluthg
Copy link
Contributor

@bluthg bluthg commented Feb 10, 2020

The code that ended up in the Github is not the actually working version. It has a badly named alias that leads to NULL results sometimes!

I also added the pg_tap tests with a small README.md.

)
RETURNS float
LANGUAGE SQL
id INTEGER,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why the change from bigint to int for id? AFAICT object_id is still bigint or bigserial, so this would require casting.

@bluthg
Copy link
Contributor Author

bluthg commented Feb 11, 2020

Maybe this just didn't age well, while the rest of the code went on... it's been a while ;-)

I'll fix that.

@mbanck
Copy link
Contributor

mbanck commented Feb 12, 2020

Apart from #22 (i.e. if I change state > 1 AS down to state > 0 AS down), it seems I am getting correct host SLAs with this version now, thanks!

@@ -0,0 +1 @@
../get_sla_ok_percent_postgres.sql
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a symlink, but ../get_sla_ok_percent_postgres.sql does not exist in this branch, it's ../get_sla_ok_percent.sql

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@mbanck
Copy link
Contributor

mbanck commented Feb 12, 2020

I think there is a problem if a host stays down, then the SLA for a random timerange since it went down does not seem to be 0.

For example for icinga_statehistory:

COPY icinga_statehistory FROM STDIN;
2019-02-01 00:00:00+01	1	0	1
2019-02-05 11:00:00+01	1	3	0
2019-02-05 12:00:00+01	1	3	1
\.

The host went down hard at 12:00 and never came back up.

Not sure it matters, but I also set icinga_hoststatus as such:

COPY icinga_hoststatus FROM STDIN;
0	2019-02-10 15:00:00+01	1	1
\.

Then I get a failure for the following as t/07-test-func.sql:

SELECT plan(3);
SELECT ok(idoreports_get_sla_ok_percent(1,'2019-02-05 11:00', '2019-02-05 13:00') = 50.0,'Host 1 was down 1 out of 2 hours');
SELECT ok(idoreports_get_sla_ok_percent(1,'2019-02-05 12:00', '2019-02-05 13:00') = 0.0,'Host 1 was down during that period');
SELECT ok(idoreports_get_sla_ok_percent(1,'2019-02-05 13:00', '2019-02-05 14:00') = 0.0,'Host 1 was down during that period');

The test passes if I set the expected result of the third (but not the second!) test to 100, which looks clearly wrong to me.

@mbanck
Copy link
Contributor

mbanck commented Feb 12, 2020

I think the problem might be this line in the relevant CTE:

WHERE down != next_down OR down != prev_down

If I keep it in, before and after correctly show the host as down, however, due to no other events the WHERE conditions seems to filter that out, so relevant returns nothing, resulting in a 100% SLA.

If I comment out the above WHERE condition, I seem to get the correct SLA for my cases, but it is then probably incorrect for cases with one or more hard events during the considered timeperiod.

@bluthg
Copy link
Contributor Author

bluthg commented Feb 12, 2020

That WHERE was a shortcut (idea was to keep the number of rows smaller). The end result is still correct when omitting it, the CTE just has to look at more rows.
So it is now gone, and all the tests pass.

@mbanck
Copy link
Contributor

mbanck commented Feb 12, 2020

Cool! give it another round of tests tomorrow hopefully.

@mbanck
Copy link
Contributor

mbanck commented Feb 13, 2020

I've tested it again now and it seems to work fine for me. I've deployed it for a customer and asked them to check as well, but not sure whether we get much feedback soon.

In any case, this is much better than what's currently in feature/postgresql

@bluthg
Copy link
Contributor Author

bluthg commented Mar 3, 2020

As pointed out by @mbanck, the referred tables all use TIMESTAMP, not TIMESTAMP WITH TIME ZONE, which leads ti skewed results in Grafana.
So, use TIMESTAMP for the function too.

@mbanck mbanck mentioned this pull request Mar 27, 2020
bluthg and others added 2 commits March 27, 2020 11:34
… time (@mbanck found a corner case where that was not the case). Also re-syncs the "plain-sql" file (as it _is_ useful ;-)
@mbanck
Copy link
Contributor

mbanck commented Mar 27, 2020

So I implemented similar test cases for mysql, see #28. However, I had to include quite a few more columns in the test data as they are referenced from the corresponding mysql function:

icinga_statehistory.last_state, icinga_statehistory.last_hard_state,
icinga_servicestatus.last_state_change, icinga_servicestatus.last_hard_state_change,
icinga_servicestatus.last_hard_state, icinga_hoststatus.last_state_change,
icinga_hoststatus.last_hard_state_change, icinga_hoststatus.last_hard_state

That makes me wonder whether there are still corner cases lurking. I am not saying that there is a problem, just that there might be.

@lippserd
Copy link
Member

@mbanck Yeah, there may be cases which the PostgreSQL procedure is not covering at the moment. Many of the MySQL code is for older Icinga 2 versions where we had slight problems with the history tables. I do think that this is sufficient for the moment.

I took the liberty to polish the commits a little bit and created a new PR #30

Thanks everybody for the effort. Great work 👍

@bluthg You should have commit rights in the new PR, if needed.

Closing here.

@lippserd lippserd closed this Mar 31, 2020
@lippserd lippserd added the duplicate This issue or pull request already exists label Mar 31, 2020
@lippserd lippserd changed the title Feature/postgresql WIP: Support PostgreSQL Mar 31, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
duplicate This issue or pull request already exists
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants