Some strange things while OOM #283

wwng2333 · 2024-11-13T13:05:17Z

Version of hub/agent: 0.8.0
1. My machine was OOM for about 1 hour 30 minutes, and during that time the down alert was sent more than 100 times by email. Maybe we need a limit?

2. The disk I/O speed reported during the OOM is not normal. This may be caused by an incorrect kernel state.

henrygd · 2024-11-13T16:27:40Z

Was the machine running the hub OOM or was it a remote agent?

This should not happen. If it was fully down for the whole period, only one status alert should be triggered. I'll look into it to see if there's a bug.
I'll add a check for this. If values are that extreme then we can assume that something is wrong and reset the stats.
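
A check of this kind could look roughly like the following. It is a minimal Python sketch, not the project's actual code: the class name `DiskRateTracker` and the ceiling `MAX_PLAUSIBLE_BYTES_PER_SEC` are hypothetical, and the 50 GiB/s limit is an arbitrary assumption. The idea is to derive a rate from cumulative byte counters and, when the result is implausible, report zero and start again from the new baseline.

```python
from dataclasses import dataclass

# Assumed ceiling for a believable disk rate (hypothetical value: 50 GiB/s).
MAX_PLAUSIBLE_BYTES_PER_SEC = 50 * 1024**3


@dataclass
class DiskRateTracker:
    """Turns cumulative byte counters into a bytes/sec rate (hypothetical helper)."""

    last_bytes: int = 0
    last_time: float = 0.0
    has_baseline: bool = False

    def rate(self, total_bytes: int, now: float) -> float:
        """Return bytes/sec since the previous sample, or 0.0 if the sample is unusable."""
        had_baseline = self.has_baseline
        elapsed = now - self.last_time
        delta = total_bytes - self.last_bytes

        # Always adopt the newest sample as the baseline, so one bad reading
        # cannot distort the readings that follow it.
        self.last_bytes = total_bytes
        self.last_time = now
        self.has_baseline = True

        # First sample, clock going backwards, or a counter that went backwards.
        if not had_baseline or elapsed <= 0 or delta < 0:
            return 0.0

        value = delta / elapsed
        # An extreme value suggests something is wrong: discard it.
        if value > MAX_PLAUSIBLE_BYTES_PER_SEC:
            return 0.0
        return value
```

Resetting the baseline on every sample means a single corrupted reading costs one interval of data instead of producing a spike in the chart.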

wwng2333 · 2024-11-14T01:55:18Z

> Was the machine running the hub OOM or was it a remote agent?

The machine was only running the agent; the hub runs on another machine.

> This should not happen. If it was fully down for the whole period, only one status alert should be triggered. I'll look into it to see if there's a bug.

Yep, I got 120+ emails reporting that the machine was down, but none saying it was up again. That's why it feels strange.

> I'll add a check for this. If values are that extreme then we can assume that something is wrong and reset the stats.

OK, thank you sir.

henrygd · 2024-11-14T02:18:03Z

Thank you. Can you please tell me if the notifications were all sent at the same time? Or were they spaced out throughout the downtime?

wwng2333 · 2024-11-14T02:28:34Z

> Thank you. Can you please tell me if the notifications were all sent at the same time? Or were they spaced out throughout the downtime?

I will show you the SMTP records below.

wwng2333 · 2024-11-17T03:18:11Z

I ran into the same situation (lots of emails about the system being down) on another VM. The problem may be caused by the bad network connection between the US and China. The VM was running normally without any OOM situation, and the agent was running in Docker.
After I paused and unpaused the agent, it worked again.
I received about 13 emails for that.

Update: Issue resolved. The problem was caused by the network. I use a hostname to reach that host, and IPv6 was added to the host last night. I added the AAAA record for it but forgot to allow port 45876 through the firewall for IPv6, while IPv4 worked fine. After the DNS update, the hub tried to connect to my agent via IPv6 and was not able to connect. After I allowed the port in the firewall, it recovered automatically.
Question: If a hostname points to both an IPv4 and an IPv6 address, should the hub try to connect via both IPv4 and IPv6?

henrygd · 2024-11-17T06:18:34Z

I think by default it should fall back to ipv4 if ipv6 doesn't work, but I need to look further into it. Maybe there's a conflict with how we're handling the errors.
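
To make the fallback idea concrete, here is a minimal Python sketch of trying every resolved address in order until one accepts the connection. It is only an illustration and not how the hub actually connects to agents; the function names `resolve` and `connect_first_reachable` are hypothetical.

```python
import socket
from typing import Iterable, List, Tuple

Candidate = Tuple[socket.AddressFamily, str]


def resolve(host: str, port: int) -> List[Candidate]:
    """Return (family, ip) pairs for host in the order the resolver gives them."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    return [(family, sockaddr[0]) for family, _, _, _, sockaddr in infos]


def connect_first_reachable(
    candidates: Iterable[Candidate], port: int, timeout: float = 3.0
) -> socket.socket:
    """Try each address in turn and return the first socket that connects."""
    last_error = None
    for family, ip in candidates:
        try:
            sock = socket.socket(family, socket.SOCK_STREAM)
        except OSError as exc:  # e.g. IPv6 not available on this machine
            last_error = exc
            continue
        # A firewall that silently drops packets only fails after the timeout.
        sock.settimeout(timeout)
        try:
            sock.connect((ip, port))
            return sock
        except OSError as exc:
            last_error = exc
            sock.close()
    raise ConnectionError(f"no address reachable: {last_error}")
```

With a hostname that has both A and AAAA records, `connect_first_reachable(resolve(host, 45876), 45876)` would move on to the IPv4 address once the IPv6 attempt is refused or times out. Python's own `socket.create_connection` already loops over resolved addresses in a similar way.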

I'm very sorry about the emails. This definitely looks like a bug, but I haven't had time to work on it. I should have time in the next few days. I want to redo the status alerts anyway to allow specifying a time period like the other alerts.
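
As a rough illustration of what "only one status alert" plus a configurable time period could mean, here is a minimal Python sketch. It is an assumption about one possible design, not the existing or planned implementation; `StatusAlerter` and `min_down_seconds` are hypothetical names. An alert fires only on a state change, and a "down" alert waits until the system has been down for the configured period.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StatusAlerter:
    """Emits at most one 'down' and one 'up' notification per outage (hypothetical)."""

    min_down_seconds: float = 0.0
    down_since: Optional[float] = None
    alerted: bool = False

    def observe(self, is_up: bool, now: float) -> Optional[str]:
        """Record one status check; return 'down', 'up', or None (no notification)."""
        if is_up:
            was_alerted = self.alerted
            self.down_since = None
            self.alerted = False
            # Only announce recovery if the outage was announced in the first place.
            return "up" if was_alerted else None

        if self.down_since is None:
            self.down_since = now  # start of this outage

        if not self.alerted and now - self.down_since >= self.min_down_seconds:
            self.alerted = True
            return "down"
        return None  # still down, already reported or not yet past the period
```

Under this scheme, repeated failed checks during a 90 minute outage produce a single "down" email and a single "up" email, and a brief blip shorter than `min_down_seconds` produces none.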

wwng2333 changed the title ~~Some strange thigns while OOM~~ Some strange things while OOM Nov 13, 2024

henrygd added the bug (Something isn't working) label Nov 14, 2024
