[v15] Handle SSM errors and add stdout/stderr and invocation url to audit log #41664
* Improve unavailable messages when EC2 Auto Discover with SSM fails

EC2 Auto Discover calls ssm:SendCommand to install teleport in a set of EC2 instances. This requires the SSM Agent to be running and reporting back to the AWS SSM service.

This PR adds a new API call which queries the current status of the SSM agent in the target EC2 instance. If the agent did not register, is not currently online, or the EC2 instance is running an unsupported operating system, an error is reported. The specific error is returned and the user can see it in the Audit Log.

As an example, let's say we have 3 instances:
- i-A: missing IAM permissions to connect to SSM
- i-B: SSM ran but is now unhealthy
- i-C: instance is running Windows

Previously we had the following observable output after running the Discovery Service:

i-A (missing IAM permissions)
- Log message with stack trace indicating that "instance is not valid for account", with a link for further troubleshooting
- No audit log was emitted

i-B (SSM is unhealthy)
- No app log, but audit log with status:failed and exit_code:-1

i-C (Windows instance)
- No app log, but audit log with status:success and exit_code:0

After this PR, the following is reported:

i-A (missing IAM permissions)
- No app log
- Audit log with a clear status message (see code/tests)

i-B (SSM is unhealthy)
- No app log
- Audit log with a clear status message (see code/tests)

i-C (Windows instance)
- No app log
- Audit log with a clear status message (see code/tests)

If any other error happens, it will still be reported in the generic handler for the SendCommand API call.

Given this is a new API call, if the Role does not allow it, a log warning is emitted and the behavior is the same as before.

* best effort on emitting events

* improve maxresults param
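The pre-check described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `agentInfo` struct, the `classify` function, and the failure messages are hypothetical names, and the status and platform strings ("Online", "Linux") are assumed values loosely modelled on what an SSM agent status query might return.

```go
package main

// agentInfo is a hypothetical, trimmed-down view of what an SSM agent
// status query might return for a single EC2 instance.
type agentInfo struct {
	InstanceID   string
	PingStatus   string // assumed values, e.g. "Online" or "ConnectionLost"
	PlatformType string // assumed values, e.g. "Linux" or "Windows"
}

// classify splits the requested instance IDs into those that can receive
// the install command and those that cannot. For the latter it records a
// human-readable reason that could be emitted in an audit event.
func classify(requested []string, reported []agentInfo) (valid []string, failures map[string]string) {
	byID := make(map[string]agentInfo, len(reported))
	for _, info := range reported {
		byID[info.InstanceID] = info
	}

	failures = make(map[string]string)
	for _, id := range requested {
		info, found := byID[id]
		switch {
		case !found:
			// The agent never registered (e.g. missing IAM permissions).
			failures[id] = "SSM agent is not registered"
		case info.PingStatus != "Online":
			// The agent registered at some point but is not reporting now.
			failures[id] = "SSM agent is not online"
		case info.PlatformType != "Linux":
			// The install script only targets supported operating systems.
			failures[id] = "unsupported operating system"
		default:
			valid = append(valid, id)
		}
	}
	return valid, failures
}
```

With the three example instances, i-A would be absent from the reported list, i-B would be reported with a non-online status, and i-C would be reported with a Windows platform, so each ends up in `failures` with its own message instead of a generic SendCommand error.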
This PR adds two new fields to the SSMRun audit events:
- stdout
- stderr

This will help diagnose the failures of teleport installations in EC2 instances using SSM (EC2 Auto Discover).
This PR adds a new field to the SSMRun audit event: invocation url. EC2 Auto Discover uses SSM to install teleport in the target instance. An invocation is the execution of a Command in an Instance. This URL points to that invocation, so users can more easily debug what went wrong and how to fix it in case of a failure.
…#41479) This PR fills in the stdout and stderr fields of the SSMRun audit event. The script to install teleport in EC2 instances has two steps: download and run shell script. This will help diagnose what failed during the auto discovery of EC2 instances.
For agentless installations we would send an extra param to the ssm:SendCommand API. Customers can create and use custom SSM Documents; however, when using the default one, that parameter does not exist. The ssm:SendCommand API returns an error if an extra param is sent. This PR makes a best effort to accommodate that: if a known error is returned and the known extra param was sent, remove it and try again.
This PR adds the invocation URL into the audit log when running the teleport installer script during EC2 Auto Discover.
This PR uses a new AWS API that lists the steps of the current invocation. After listing them, it asks for the output of each one. Previously, we were using a static list of steps: those defined in the default SSM Document. However, that would fail for custom documents with a different list of steps. If the client does not have access to this new API, we fall back to the list of steps that exist in the default SSM Document. If we ask for the status of one of those steps and receive a known error indicating that the step does not exist, instead of failing we emit the overall invocation result (which doesn't include stdout/stderr, but is better than nothing).
Backport #41465, #41478, #41663, #41479, #41532, #41689 and #41706 to v15
changelog: Improves EC2 Auto Discovery by adding the SSM script output and more explicit error messages.