Reapply "[supervisor] set pod failure reason when supervisor is reaped" #20327

Merged
merged 5 commits into from
Oct 30, 2024


mustard-mh
Contributor

@mustard-mh mustard-mh commented Oct 29, 2024

Description

/hold

How to confirm the root cause: check the AWS S3 content backed up from @kylos101's reproduction workspace, extract the tar archive, and inspect /workspace/.gitpod/supervisor-termination.log. That log is printed by process.kill (for the relevant code, see [1] [2]). So once we receive signals, we should ignore this case (if dev/term/log is empty as well).
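The rule described above (ignore the exit when a signal was received and the termination log is empty) can be sketched in Go roughly as follows. This is only an illustration under assumptions, not the code from this PR: `failureMessage`, its parameters, the fallback message, and the path used in `main` are all hypothetical names chosen for the example.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// failureMessage is a hypothetical helper, not the function from this PR.
// It returns the message to report as the pod failure reason, or "" when
// the exit should be ignored: a signal was received and the termination
// log is empty or missing.
func failureMessage(signalReceived bool, terminationLogPath string) string {
	msg := ""
	// A missing or unreadable log is treated the same as an empty one.
	if data, err := os.ReadFile(terminationLogPath); err == nil {
		msg = strings.TrimSpace(string(data))
	}
	if signalReceived && msg == "" {
		// Expected shutdown path: nothing to report.
		return ""
	}
	if msg == "" {
		// Assumed fallback text, for illustration only.
		return "supervisor exited without a termination message"
	}
	return msg
}

func main() {
	// The path is a placeholder that does not exist on most systems.
	fmt.Printf("%q\n", failureMessage(true, "/nonexistent/termination-log"))
	fmt.Printf("%q\n", failureMessage(false, "/nonexistent/termination-log"))
}
```

Keeping the decision in one small pure-ish function makes it easy to exercise both branches (signal with empty log, no signal with empty log) without starting a workspace.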


Related Issue(s)

Fixes CLC-877

How to test

Repeat steps in #20318 and:

  • Create 4 workspaces from the gitpod-io/gitpod repo with VS Code Browser
  • Set the timeout to 1m: gp timeout set 1m
  • Check the pod failure message after ☕
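For the last step, one possible way to inspect a pod's failure message is to read the container termination state from the pod's JSON (for example, the output of `kubectl get pod <name> -o json`). The fields `status.containerStatuses[].state.terminated.exitCode`, `reason`, and `message` are part of the Kubernetes Pod API; that the supervisor's failure reason shows up there, the container name `workspace`, and the sample values are assumptions made for this sketch.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// pod mirrors only the parts of the Kubernetes Pod object needed here.
type pod struct {
	Status struct {
		ContainerStatuses []struct {
			Name  string `json:"name"`
			State struct {
				Terminated *struct {
					ExitCode int    `json:"exitCode"`
					Reason   string `json:"reason"`
					Message  string `json:"message"`
				} `json:"terminated"`
			} `json:"state"`
		} `json:"containerStatuses"`
	} `json:"status"`
}

// terminationMessages is a hypothetical helper: it maps each terminated
// container's name to a one-line summary of its termination state.
func terminationMessages(podJSON []byte) (map[string]string, error) {
	var p pod
	if err := json.Unmarshal(podJSON, &p); err != nil {
		return nil, err
	}
	out := map[string]string{}
	for _, cs := range p.Status.ContainerStatuses {
		if t := cs.State.Terminated; t != nil {
			out[cs.Name] = fmt.Sprintf("exit=%d reason=%s message=%s", t.ExitCode, t.Reason, t.Message)
		}
	}
	return out, nil
}

func main() {
	// Sample input with made-up values, not output from a real workspace pod.
	sample := []byte(`{"status":{"containerStatuses":[{"name":"workspace","state":{"terminated":{"exitCode":1,"reason":"Error","message":"example message"}}}]}}`)
	msgs, err := terminationMessages(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(msgs["workspace"])
}
```

Containers that are still running or waiting have no `terminated` state and are skipped, so an empty result means no container has reported a termination message yet.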

Compare with a reproducible preview env at https://hw-clc-877-reproduce.preview.gitpod-dev.com/workspaces, where you should be able to reproduce the issue with the gitpod-io/gitpod repo (try with 4 workspaces). (If the preview env has been deleted, you may need to recreate it with branch hw/CLC-877-reproduce > exec leeway run dev:preview)

(Or use https://github.com/mustard-mh/test/tree/hw/hang-task instead, as the gitpod repo is too large to start in the preview env: no space left.)

Documentation

Preview status

Gitpod was successfully deployed to your preview environment.

Build Options

Build
  • /werft with-werft
    Run the build with werft instead of GHA
  • leeway-no-cache
  • /werft no-test
    Run Leeway with --dont-test
Publish
  • /werft publish-to-npm
  • /werft publish-to-jb-marketplace
Installer
  • analytics=segment
  • with-dedicated-emulation
  • workspace-feature-flags
    Add desired feature flags to the end of the line above, space separated
Preview Environment / Integration Tests
  • /werft with-local-preview
    If enabled this will build install/preview
  • /werft with-preview
  • /werft with-large-vm
  • /werft with-gce-vm
    If enabled this will create the environment on GCE infra
  • /werft preemptible
    Saves cost. Untick this only if you're really sure you need a non-preemptible machine.
  • with-integration-tests=all
    Valid options are all, workspace, webapp, ide, jetbrains, vscode, ssh. If enabled, with-preview and with-large-vm will be enabled.
  • with-monitoring

/hold

Member

@filiptronicek filiptronicek left a comment


I get timed out correctly in the preview env and code changes make sense. Thanks!

@kylos101
Contributor

@mustard-mh if you enable with-large-vm, and disable preemptible, the preview VM will be much more stable:

image

@mustard-mh
Contributor Author

@filiptronicek @kylos101 thank you for your help! I plan to:

  1. First hot-deploy the buggy changes to catfood to make sure I can reproduce the issue
  2. Then hot-deploy this fix and repeat the reproduction steps from step 1 to verify that it's fixed there

Will update you when I make progress


New and removed dependencies detected. Learn more about Socket for GitHub ↗︎

Package New capabilities Transitives Size Publisher
golang/github.com/gitpod-io/[email protected] None 0 9.19 kB
golang/github.com/ramr/[email protected] None 0 11.2 kB

🚮 Removed packages: golang/github.com/ramr/[email protected]

View full report↗︎

@mustard-mh
Contributor Author

We did some manual testing on a production installation [internal chat]. With the latest changes and the reproduction steps, 25/25 workspaces work well with the gitpod-io/gitpod repo.

/unhold

@roboquat roboquat merged commit 825fb44 into main Oct 30, 2024
24 of 27 checks passed
@roboquat roboquat deleted the hw/CLC-877-2 branch October 30, 2024 19:05