Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use environment var for ResolveIPAddressTimeout #2455

Merged
merged 1 commit into from
Aug 8, 2024

Conversation

SmartManoj
Copy link
Contributor

@SmartManoj SmartManoj commented Jun 30, 2024

Like #1625; Resolves #2454


Use the CUSTOM_TIMEOUT environment variable to configure the timeout for resolving IP addresses.

  • Import the strconv package.
  • Add a new variable customTimeout to read the CUSTOM_TIMEOUT environment variable.
  • Parse the CUSTOM_TIMEOUT value and set the timeout variable accordingly.

For more details, open the Copilot Workspace session.

@SmartManoj
Copy link
Contributor Author

Will you follow this option?


How about adding a GitHub Actions workflow to sign off commits automatically if not done? This will help ensure compliance with the Developer Certificate of Origin (DCO) and streamline the contribution process.

Here is an example workflow that can be added to the repository:

name: Auto Sign-Off

on:
  pull_request:
    types: [opened, synchronize]

jobs:
  signoff:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout code
        uses: actions/checkout@v2

      - name: Set up Git
        run: |
          git config --global user.name "github-actions[bot]"
          git config --global user.email "github-actions[bot]@users.noreply.github.com"

      - name: Sign off commits
        run: |
          for commit in $(git rev-list --reverse HEAD ^origin/main); do
            git cherry-pick $commit --strategy=recursive -X theirs --signoff
          done

      - name: Push changes
        run: |
          git push origin HEAD:refs/heads/${{ github.head_ref }} --force
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

@AkihiroSuda
Copy link
Member

How about adding a GitHub Actions workflow to sign off commits automatically if not done? This will help ensure compliance with the Developer Certificate of Origin (DCO) and streamline the contribution process.

Nope, rather opposite, as automated signs are unlikely valid

@@ -74,7 +74,7 @@ func (c *Client) ResolveAndForwardSSH(ipAddr string, sshPort int) error {
}

func (c *Client) ResolveIPAddress(ctx context.Context, vmMacAddr string) (string, error) {
timeout := time.After(2 * time.Minute)
timeout := time.After(3 * time.Minute)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does it take more than 2 minutes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

During #1625, Why does it take more than 1 minute?

Copy link
Member

@afbjorklund afbjorklund Jun 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems rather arbitrary, since it started out as 500ms

Seeing the same type of issues when working with mDNS - sometimes my machine is really slow to respond (like when waking up from idle) - but I also don't want to wait several seconds for when the machine really is gone. For now that (other) issue just has a timeout workaround, when doing lookups in the .local domain.

Copy link
Member

@jandubois jandubois Jun 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find that the GitHub macOS runners are painfully slow, so I can see that you may need special accommodations for them. Maybe instead of relaxing timeouts generally, we should have some kind of setting, like LIMA_SLOW_HOST_FACTOR=2 that would double all the timeouts in Lima to be more forgiving?

Feels quite kludgy, but still better than making the timeouts longer for the regular use case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SmartManoj Can you try setting --cpus=1 --memory=1 see this helps ??

We are also using macOS runners. 2mins as timeout is working fine. I feel like even after the above 3mins change you will see some failures like this.

Copy link
Contributor Author

@SmartManoj SmartManoj Jul 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How will reducing memory help?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Problem with github macos runners is that if it takes more memory they will terminate the vm process / take more time to start basically slowers all other stuff.

In this memory, cpu setup we saw better success rate with macOS runners

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

jandubois
jandubois previously approved these changes Jul 5, 2024
Copy link
Member

@jandubois jandubois left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with this change, so am approving it.

As it is kind of a hack though, I won't merge it unless it gets another approval from another @lima-vm/reviewers

@jandubois jandubois requested a review from a team July 5, 2024 06:31
@afbjorklund
Copy link
Member

it is kind of a hack though, I won't merge it unless it gets another approval

It was already hack with 2 minutes, I don't see how 3 minutes make it worse...

The real hack is making GitHub Actions into a special case, is what I'm thinking

@SmartManoj
Copy link
Contributor Author

SmartManoj commented Jul 5, 2024

Why was the test failed with msg="did not receive an event with the \"running\" status"?

@jandubois
Copy link
Member

Why was the test failed with msg="did not receive an event with the \"running\" status"?

We can re-run the test to see if it is just a flake, but you also need to squash your commits together and sign it with a DCO before it can be considered for merging (assuming there will otherwise be consensus that the change is desirable).

@SmartManoj
Copy link
Contributor Author

Why was the test failed with msg="did not receive an event with the \"running\" status"?

Due to this timeout.

const DefaultWatchHostAgentEventsTimeout = 10 * time.Minute

@jandubois Are there any objections to changing this in this PR?

@jandubois
Copy link
Member

Due to this timeout.

const DefaultWatchHostAgentEventsTimeout = 10 * time.Minute

@jandubois Are there any objections to changing this in this PR?

I would like to first see some evidence that a longer timeout "fixes" the issue.

I have a suspicion that something is just wedged, and it will never get ready until it is killed and restarted. In that case increasing the timeout will just mean it takes even longer before CI eventually fails.

So maybe use a much longer timeout in a branch, and re-run the tests a bunch of times, to see if they will then always pass.

Only then can we meaningfully discuss if we want to extend the timeout, and if it should again be gates by running inside GitHub actions.

@SmartManoj
Copy link
Contributor Author

SmartManoj commented Jul 18, 2024

it will never get ready until it is killed and restarted.

first two attempts failed with msg="[hostagent] QEMU did not exit in 3m0s, forcibly killing QEMU"
third one with msg="error starting docker: cannot restart, VM not previously started" (inference => 1st essential requirement is satisfied in ~7mins (3+3+1))

here I didn't kill, stop, or delete it.

Action link


Same behavior even after stopped.

Action link

@SmartManoj
Copy link
Contributor Author

Use the CUSTOM_TIMEOUT environment variable to configure the timeout for resolving IP addresses.

  • Import the strconv package.
  • Add a new variable customTimeout to read the CUSTOM_TIMEOUT environment variable.
  • Parse the CUSTOM_TIMEOUT value and set the timeout variable accordingly.

For more details, open the Copilot Workspace session.

pkg/networks/usernet/client.go Outdated Show resolved Hide resolved
pkg/networks/usernet/client.go Outdated Show resolved Hide resolved
@SmartManoj SmartManoj changed the title Increase timeout for resolving IP Address. Use environment var for ResolveIPAddressTimeout Jul 23, 2024
@jandubois
Copy link
Member

@lima-vm/reviewers Where are we with this PR? I think we can merge it now, but I think the environment variable should be documented "somewhere", I just can't figure out where. Any suggestions?

@SmartManoj
Copy link
Contributor Author

@jandubois
Copy link
Member

https://lima-vm.io/docs/usage/#customization
Here?

I think that page is supposed to be simple and have just the high-level examples.

Maybe we need an extra page under the "Configuration Guide"?

@SmartManoj
Copy link
Contributor Author

Maybe we need an extra page under the "Configuration Guide"?

With the name "Environment Variables", documenting $LIMA_INSTANCE too?

@AkihiroSuda
Copy link
Member

Please squash the commits

@jandubois
Copy link
Member

Maybe we need an extra page under the "Configuration Guide"?

With the name "Environment Variables", documenting $LIMA_INSTANCE too?

Sounds good to me.

In that case include all the other variables too (LIMA_SHELL, LIMA_WORKDIR, LIMACTL), and document where/when they are used.

LIMA_INSTANCE and LIMACTL are supported by lima and lima.bat, but LIMA_SHELL and LIMA_WORKDIR only by lima. LIMA_INSTANCE is also supported by docker.lima and kubectl.lima... Please check all the places!

Signed-off-by: மனோஜ்குமார் பழனிச்சாமி <[email protected]>
@SmartManoj
Copy link
Contributor Author

Copilot Workspace for updating docs

It touches 8 files. First 2 is enough?

Should I update the docs in this PR itself?

SmartManoj added a commit to SmartManoj/lima that referenced this pull request Aug 2, 2024
Created a new PR to avoid the blockage of lima-vm#2455

Add documentation for environment variables used in Lima.

* **New Documentation Page**: Add `website/content/en/docs/config/environment-variables.md` to document environment variables `LIMA_INSTANCE`, `LIMA_SHELL`, `LIMA_WORKDIR`, `LIMACTL`, and `LIMA_USERNET_RESOLVE_IP_ADDRESS_TIMEOUT`.
* **Update Configuration Guide**: Modify `website/content/en/docs/config/_index.md` to include a link to the new 'Environment Variables' page.
* **Update README**: Modify `README.md` to include a link to the new 'Environment Variables' page.
* **Add Comments in Scripts**:
  - `cmd/lima`: Add comments documenting `LIMA_INSTANCE`, `LIMA_SHELL`, `LIMA_WORKDIR`, and `LIMACTL`.
  - `cmd/lima.bat`: Add comments documenting `LIMA_INSTANCE` and `LIMACTL`.
  - `cmd/docker.lima`: Add comments documenting `LIMA_INSTANCE`.
  - `cmd/kubectl.lima`: Add comments documenting `LIMA_INSTANCE`.
  - `cmd/limactl/usernet.go`: Add comments documenting `LIMA_USERNET_RESOLVE_IP_ADDRESS_TIMEOUT`.

---

For more details, open the [Copilot Workspace session](https://copilot-workspace.githubnext.com/lima-vm/lima/pull/2455?shareId=XXXX-XXXX-XXXX-XXXX).
SmartManoj added a commit to SmartManoj/lima that referenced this pull request Aug 2, 2024
Created a new PR to avoid the blockage of lima-vm#2455

Add documentation for environment variables used in Lima.

* **New Documentation Page**: Add `website/content/en/docs/config/environment-variables.md` to document environment variables `LIMA_INSTANCE`, `LIMA_SHELL`, `LIMA_WORKDIR`, `LIMACTL`, and `LIMA_USERNET_RESOLVE_IP_ADDRESS_TIMEOUT`.
* **Update Configuration Guide**: Modify `website/content/en/docs/config/_index.md` to include a link to the new 'Environment Variables' page.
* **Update README**: Modify `README.md` to include a link to the new 'Environment Variables' page.
* **Add Comments in Scripts**:
  - `cmd/lima`: Add comments documenting `LIMA_INSTANCE`, `LIMA_SHELL`, `LIMA_WORKDIR`, and `LIMACTL`.
  - `cmd/lima.bat`: Add comments documenting `LIMA_INSTANCE` and `LIMACTL`.
  - `cmd/docker.lima`: Add comments documenting `LIMA_INSTANCE`.
  - `cmd/kubectl.lima`: Add comments documenting `LIMA_INSTANCE`.
  - `cmd/limactl/usernet.go`: Add comments documenting `LIMA_USERNET_RESOLVE_IP_ADDRESS_TIMEOUT`.

---

For more details, open the [Copilot Workspace session](https://copilot-workspace.githubnext.com/lima-vm/lima/pull/2455?shareId=XXXX-XXXX-XXXX-XXXX).

Signed-off-by: மனோஜ்குமார் பழனிச்சாமி <[email protected]>
Copy link
Member

@jandubois jandubois left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, LGTM

@jandubois
Copy link
Member

Should I update the docs in this PR itself?

No, separate PR is fine since it will document more than just the single variable in this PR.

@jandubois jandubois added this to the v0.23.0 milestone Aug 2, 2024
@SmartManoj
Copy link
Contributor Author

Please squash the commits

image

Isn't squash and merge not enabled for this repo?

@jandubois
Copy link
Member

Isn't squash and merge not enabled for this repo?

No, it is up to the PR author to combine commits and write a commit message that covers the squashed commits correctly.

Also (but rarely) there is justification for squashing commits into just a smaller number of commits instead of a single one. This should only be done to separate different areas of functionality, and not to maintain the individual steps that lead to the final change.

Typically this means that the 2 separate commits should have been 2 separate PRs.

@jandubois jandubois merged commit 200db6d into lima-vm:master Aug 8, 2024
26 of 27 checks passed
@AkihiroSuda
Copy link
Member

மனோஜ்குமார் பழனிச்சாமி

Next time please consider using Latin alphabets

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[hostagent] Driver stopped due to error: \"usernet unable to resolve IP for SSH forwarding\"
6 participants