Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: stable monitor connections in c sshnpd #1628

Open
wants to merge 7 commits into
base: trunk
Choose a base branch
from

Conversation

XavierChanth
Copy link
Member

Pinned to atsdk PR: atsign-foundation/at_c#479

- What I did

First, some of the changes I did in at_c which contribute to this PR:

  • Separate sockets from connections (connection code contains specifics to atDirectory and atServers)
  • Ensure the connection message is always well formed, by changing the way we read from the socket
    • Previously we read whatever was available and null terminate the string at a new line,
    • Now, we read until the newline, ensuring the read buffer is always well aligned
  • Revised socket reading approach as part of the new socket layer

NoPorts changes

  • Uptake changes in at_c
  • Set monitor connection read timeout to 5 seconds
  • Add a timeout counter to the monitor loop
    • Monitor will try reconnect if it exceeds a threshold
    • Non-error reads reset the counter
    • Error reads set the counter above the threshold triggering a reconnect
    • This relies on the monitor heartbeat, to reset the counter before it reaches the threshold
      • heartbeat is ~15 seconds
      • threshold is 40 seconds
        (marginally greater, as there's some synchronous calls in between counts)
  • No more noops on the monitor connection!

- How I did it

- How to verify it

  • Run c sshnpd (in verbose mode) on a device with multiple internet adapters
  • Disable / disconnect the primary connection
  • Wait for the threshold and watch it reconnect

- Description for the changelog
fix: stable monitor connections in c sshnpd

@XavierChanth XavierChanth marked this pull request as ready for review December 19, 2024 21:35
@XavierChanth
Copy link
Member Author

XavierChanth commented Dec 19, 2024

@cpswan @cconstab from what I can tell this monitor connection is resilient to:

  • changes in network adapter
  • loss of connection

In both cases, it seems to recover once the timeout reaches the threshold (and network connectivity is restored on an adapter)

I'm also going to leave a long running session to see if I notice anything.

@XavierChanth
Copy link
Member Author

Looks like the e2e tests are failing, it's timing out on the Dart tests before it even get's to the C ones...

- monitor bug fix & atdirectory returns null bug fix
@XavierChanth
Copy link
Member Author

XavierChanth commented Dec 20, 2024

I've had a C daemon from commit f1a9132 running since last night.
I am keeping it up in tmux for as long as I can to determine if we have any long-term hangs.

edc2cb fixes two bugs:

  • When atDirectory responds with "@null", the prompt parser is still considered a non error, since it should be considered an error later on.
  • Make sure the notification RECV: debug log is printed with the correct length since it's not null terminated

I've also got a edc2cbe daemon running now, in addition to keeping the other one (with non-major bugs) alive

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants