feat: identify and expose when connections are being closed or crashing constantly #101

Open
viniarck opened this issue Feb 15, 2023 · 1 comment
Labels
enhancement New feature or request

Comments

@viniarck
Member

viniarck commented Feb 15, 2023

Problem:

Network operators who are deploying Kytos-ng in production and using of_core need to be able to identify (and hook into external health-check mechanisms) when OpenFlow connections aren't stable, whether because of packet/handshake issues or generalized crashes. Our Python runtime shouldn't struggle to handle connections as long as their number is reasonable; if it does, of_core should expose that this is happening (maybe through an endpoint) so that it can be used externally to spin up and switch over to a different kytosd instance, which can help with recoverable errors.
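A minimal sketch of one way of_core could detect flapping connections, assuming it counts connection closes/crashes in a sliding time window and reports "unhealthy" once a threshold is exceeded. `ConnectionFlapMonitor`, `record_close`, `status`, and the default thresholds are hypothetical names, not existing of_core APIs, and wiring `status()` into a REST endpoint is left out.

```python
import time
from collections import deque
from typing import Deque, Optional


class ConnectionFlapMonitor:
    """Hypothetical sliding-window counter of OpenFlow connection closes/crashes.

    Not part of of_core; names and thresholds are illustrative assumptions.
    """

    def __init__(self, window_s: float = 60.0, max_events: int = 10) -> None:
        self.window_s = window_s
        self.max_events = max_events
        self._events: Deque[float] = deque()

    def record_close(self, now: Optional[float] = None) -> None:
        """Record one connection close or crash at time `now` (seconds)."""
        self._events.append(time.monotonic() if now is None else now)

    def _prune(self, now: float) -> None:
        # Drop events that are older than the sliding window.
        while self._events and now - self._events[0] > self.window_s:
            self._events.popleft()

    def status(self, now: Optional[float] = None) -> dict:
        """Return a payload that an endpoint could serve to an external health check."""
        now = time.monotonic() if now is None else now
        self._prune(now)
        count = len(self._events)
        return {
            "healthy": count <= self.max_events,
            "closes_in_window": count,
            "window_s": self.window_s,
            "max_events": self.max_events,
        }
```

For example, with `ConnectionFlapMonitor(window_s=60, max_events=3)`, five closes recorded at t=0..4 make `status(now=5.0)` report `healthy: False`, while `status(now=100.0)` reports healthy again once those events age out of the window. A sliding window keeps the signal tied to recent behavior, so an external checker can trigger a switchover without a restart resetting any counters.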

Other than that, outside of the code-related implementation, network operators should also have alerts for how many errors or tracebacks have happened over time. We can have this readily available on ES with Kibana; although alerts are a premium ES feature, the data is there, so a script could also poll or query it:

[Screenshot: 20230215_150853]

[Screenshot: 20230215_150842]
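A rough sketch of such a polling script, assuming it uses Elasticsearch's `_count` API over plain HTTP. The index pattern, the `@timestamp` and `log.level` field names, the level values, and the threshold are all assumptions about how Kytos logs are shipped to ES; authentication and TLS are not handled.

```python
import json
import urllib.request


def build_error_count_query(minutes: int = 15) -> dict:
    # Assumes log documents carry "@timestamp" and "log.level" fields (hypothetical mapping).
    return {
        "query": {
            "bool": {
                "filter": [
                    {"range": {"@timestamp": {"gte": f"now-{minutes}m"}}},
                    {"terms": {"log.level": ["ERROR", "CRITICAL"]}},
                ]
            }
        }
    }


def fetch_error_count(es_url: str, index: str, minutes: int = 15) -> int:
    """POST the query to the Elasticsearch _count API and return the hit count."""
    body = json.dumps(build_error_count_query(minutes)).encode()
    req = urllib.request.Request(
        f"{es_url}/{index}/_count",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)["count"]


def should_alert(error_count: int, threshold: int) -> bool:
    # Alert once the number of errors in the window reaches the threshold.
    return error_count >= threshold


if __name__ == "__main__":
    # Hypothetical ES URL and index pattern; adjust to the actual deployment.
    count = fetch_error_count("http://localhost:9200", "kytos-*", minutes=15)
    if should_alert(count, threshold=50):
        print(f"ALERT: {count} errors in the last 15 minutes")
```

Run periodically (e.g. from cron), this gives a basic alert without relying on the premium ES alerting feature.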

cc'ing @italovalcy for his info

This issue still needs further discussion, but overall that's the problem we need to solve.

@viniarck viniarck added enhancement New feature or request future_release Planned for the next release labels Feb 15, 2023
@viniarck viniarck changed the title feat: Identify when connections are being closed or crashing constantly feat: identify when connections are being closed or crashing constantly Feb 15, 2023
@viniarck viniarck changed the title feat: identify when connections are being closed or crashing constantly feat: identify and expose when connections are being closed or crashing constantly Feb 15, 2023
@italovalcy

I agree, @viniarck. This feature could be part of a watchdog NApp or something similar, which consolidates all validations (not only of_core's) and translates them into an operational status (which could indicate success, failure, or partial failure, including failure in non-critical components, and so on).
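A minimal sketch of how such a watchdog could fold individual validations into one operational status, assuming each check reports whether it passed and whether its component is critical. `OpStatus`, `CheckResult`, `aggregate`, and the check names are hypothetical; no watchdog NApp with this API exists in Kytos-ng.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable


class OpStatus(Enum):
    SUCCESS = "success"
    PARTIAL_FAILURE = "partial_failure"
    FAILURE = "failure"


@dataclass(frozen=True)
class CheckResult:
    name: str       # e.g. "of_core.connections" (hypothetical check name)
    ok: bool        # whether the validation passed
    critical: bool  # a failing critical check fails the whole system


def aggregate(results: Iterable[CheckResult]) -> OpStatus:
    """Map individual check results to a single operational status."""
    results = list(results)
    if any(not r.ok and r.critical for r in results):
        return OpStatus.FAILURE
    if any(not r.ok for r in results):
        # Only non-critical components are failing.
        return OpStatus.PARTIAL_FAILURE
    return OpStatus.SUCCESS
```

For example, a failing non-critical check alongside a passing `of_core.connections` check yields `PARTIAL_FAILURE`, whereas a failing `of_core.connections` marked critical yields `FAILURE`. Keeping the critical flag on each result lets every NApp contribute checks without the watchdog hardcoding which components matter.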

@viniarck viniarck removed the future_release Planned for the next release label Feb 28, 2024