-
Notifications
You must be signed in to change notification settings - Fork 269
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prometheus add nodes gauge for SQS mode #1083
base: main
Are you sure you want to change the base?
Conversation
6e910d0
to
91e34ac
Compare
@stevehipwell can you please give me a help on this. Now i only stuck at writing unit test for Do you think i should refactor this file along with this PR or should I open another separate PR (eg: make opentelemetry.go testable) |
@phuhung273 I'm not a maintainer here but I like to see untestable code refactored to be testable, following the boy scout rule. If you do the work in this PR you can always split the refactoring to a separate PR before it's merged. @LikithaVemulapalli what do you think? |
Yes I agree with @stevehipwell here, let's separate refactoring PR to have a clear idea on the changes made. @phuhung273 if you want to test your changes for this PR let me know I will approve and run so for the future commits you can verify if the existing tests are working, if there are any conflicts or not, once you change the PR status to ready I will run the workflow, appreciate for your contribution. Thanks! |
This PR has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If you want this PR to never become stale, please ask a maintainer to apply the "stalebot-ignore" label. |
/remove-lifecycle stale |
This PR has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If you want this PR to never become stale, please ask a maintainer to apply the "stalebot-ignore" label. |
Hi @phuhung273, could you resolve the conflicts in |
b9e1617
to
95fc5a9
Compare
95fc5a9
to
f4ae6de
Compare
Thanks and Happy new year @tiationg-kho. Conflicts resolved |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @phuhung273,
Thanks for resolving the conflicts. I have left some comments.
Also, we can run the e2e test in local (docker desktop) to make sure our modification is valid.
- Run all e2e test cases:
make e2e-test
- Run certain e2e test case:
./test/k8s-local-cluster-test/run-test -a ./test/e2e/<test-case> -d -b e2e-test
Thanks @tiationg-kho for your instruction on the e2e test, took a while to realize where does it fail. I will try to include the 2 new metrics in e2e. |
e2e metrics test in SQS mode added, phew... Please let me know your thought @tiationg-kho, thanks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for adding a new e2e test case! Just left few comments.
@LikithaVemulapalli could you please take a look too?
@@ -215,6 +215,12 @@ func main() { | |||
} | |||
log.Debug().Msgf("AWS Credentials retrieved from provider: %s", creds.ProviderName) | |||
|
|||
ec2Client := ec2.New(sess) | |||
|
|||
if nthConfig.EnablePrometheus { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we add nil check for metrics
here in case we have any error during observability.InitMetrics
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to, but the compiler doesn't like it. Can you guys have a try?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, "nil check" is incorrect. Since we return a zero-value struct when err exists.
Could we rename the error then check that error here?
metrics, initMetricsErr := observability.InitMetrics(nthConfig.EnablePrometheus, nthConfig.PrometheusPort)
if initMetricsErr != nil {
nthConfig.Print()
log.Fatal().Err(initMetricsErr).Msg("Unable to instantiate observability metrics,")
}
if [[ $cordoned -eq 0 ]]; then | ||
echo "❌ Worker node was not cordoned" | ||
else | ||
echo "❌ regular-pod-test was not evicted" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This check is not quite correct in the context of this test case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At first, I just tried to test the 2 new metrics. Later found out that https://github.com/aws/aws-node-termination-handler/blob/main/test/e2e/prometheus-metrics-test only test the metrics in IMDS mode.
So I decided to also include other metrics as well. Thats why the file name is prometheus-metrics-sqs
: test all metrics in SQS mode. How do you think ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we could remove this if-else check, and replace by adding a status variable before TAINT_CHECK_CYCLES
for-loop, and check it after the for-loop to prevent the misleading scenario here (even we already cordoned and evicted but still get a "not evicted" in our log)?
The metric part is good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
left some inline comments, thanks for your contribution :)
@@ -215,6 +215,12 @@ func main() { | |||
} | |||
log.Debug().Msgf("AWS Credentials retrieved from provider: %s", creds.ProviderName) | |||
|
|||
ec2Client := ec2.New(sess) | |||
|
|||
if nthConfig.EnablePrometheus { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree
654d8f1
to
a95474e
Compare
Issue #, if available:
Close #785
Description of changes:
k get node
return 5aws ec2 describe-instances
return 2Identify 3 nodes no longer under NTH control
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.