[v15] Handle SSM errors and add stdout/stderr and invocation url to audit log (#41664)

* Improve messages when EC2 Auto Discover with SSM fails (#41465)

* Improve unavailable messages when EC2 Auto Discover with SSM fails

EC2 Auto Discover calls ssm:SendCommand to install teleport in a set of
EC2 Instances.
This requires the SSM Agent to be running and reporting back to the
AWS SSM Service.

This PR adds a new API call (ssm:DescribeInstanceInformation), which is
used to query the current status of the SSM Agent on the target EC2
instance.

If the agent is not registered, is not currently online, or the EC2
instance is running an unsupported operating system, an error is
reported.
The specific error is returned and the user can see it in the Audit
Log.

As an example, let's say we have 3 instances:
- i-A: missing IAM permissions to connect to SSM
- i-B: SSM ran but is now unhealthy
- i-C: instance is running Windows

Previously we had the following observable output after running the
Discovery Service:
i-A (missing iam permissions)
Log message with a stack trace indicating that "instance is not valid for
account", with a link for further troubleshooting.
No audit log was emitted

i-B (SSM is unhealthy)
No app log, but audit log with status:failed and exit_code:-1

i-C (windows instance)
No app log, but audit log with status:success and exit_code:0

After this PR, the following is reported:
i-A (missing iam permissions)
No app log
Audit log with a clear status message (see code/tests)

i-B (SSM is unhealthy)
No app log
Audit log with a clear status message (see code/tests)

i-C (windows instance)
No app log
Audit log with a clear status message (see code/tests)

If any other error happens, it will still be reported in the generic
handler for the SendCommand API call.

Because this is a new API call, if the IAM Role does not allow it, a
warning is logged and the behavior is the same as before.
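A minimal sketch of that fallback, assuming a hypothetical `describe` callback that wraps ssm:DescribeInstanceInformation and returns the set of online instance IDs, plus a hypothetical `errAccessDenied` sentinel for an AWS access-denied error. For brevity it only filters instances; it does not emit the audit events for the rejected ones.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// errAccessDenied is a hypothetical stand-in for an AWS access-denied error.
var errAccessDenied = errors.New("AccessDeniedException")

// filterReadyInstances returns the instances that should receive ssm:SendCommand.
// describe is a hypothetical wrapper around ssm:DescribeInstanceInformation that
// reports which of the given instance IDs have an online SSM Agent.
func filterReadyInstances(ids []string, describe func([]string) (map[string]bool, error)) ([]string, error) {
	online, err := describe(ids)
	if errors.Is(err, errAccessDenied) {
		// The IAM Role lacks the new permission: warn and keep the old behavior.
		log.Printf("warning: cannot check SSM Agent status, sending command to all instances: %v", err)
		return ids, nil
	}
	if err != nil {
		return nil, err
	}
	var ready []string
	for _, id := range ids {
		if online[id] {
			ready = append(ready, id)
		}
	}
	return ready, nil
}

func main() {
	denyFn := func([]string) (map[string]bool, error) { return nil, errAccessDenied }
	ids, err := filterReadyInstances([]string{"i-A", "i-B"}, denyFn)
	fmt.Println(ids, err)
}
```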

* best effort on emitting events

* improve maxresults param

* Add SSM Commands stdout/err to audit log (#41478)

This PR adds two new fields to the SSMRun audit events:
- stdout
- stderr

This will help diagnose the failures of teleport installations in EC2
instances using SSM (EC2 Auto Discover).

* SSMRun Audit Event: add invocation url (#41663)

This PR adds a new field in the SSMRun audit event: invocation url.

EC2 Auto Discover uses SSM to install teleport in the target instance.
An invocation is the execution of a Command in an Instance.
This URL points to that invocation, so users can more easily debug what
went wrong and how to fix it in case of a failure.
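A sketch of building such a link from the region, command ID and instance ID. The console URL layout used here is an assumption for illustration, not taken from this commit; verify it against the AWS Web Console before reusing it.

```go
package main

import (
	"fmt"
	"net/url"
)

// invocationURL builds a link to one SSM command invocation in the AWS Web Console.
// ASSUMPTION: the console path layout below is illustrative, not confirmed by this commit.
func invocationURL(region, commandID, instanceID string) string {
	return fmt.Sprintf(
		"https://%s.console.aws.amazon.com/systems-manager/run-command/%s/%s?region=%s",
		url.PathEscape(region), url.PathEscape(commandID), url.PathEscape(instanceID), url.QueryEscape(region),
	)
}

func main() {
	fmt.Println(invocationURL("eu-west-1", "example-command-id", "i-0123456789abcdef0"))
}
```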

* EC2 Auto Discover with SSM: add script stdout and stderr to audit log (#41479)

This PR fills in the stdout and stderr fields of the SSMRun audit
event.
The script to install teleport in EC2 instances has two steps:
downloadContent and runShellScript.

This will help diagnose what failed during the EC2 Auto Discover of
instances.
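One way to gather per-step output, sketched under assumptions: `getStepOutput` is a hypothetical callback standing in for one ssm:GetCommandInvocation request per step (that API selects a step through its PluginName parameter), and prefixing each chunk with the step name is an illustrative choice, not necessarily what Teleport does.

```go
package main

import (
	"fmt"
	"strings"
)

// stepOutput is a hypothetical view of the fields read from ssm:GetCommandInvocation.
type stepOutput struct {
	Stdout string
	Stderr string
}

// collectOutput queries each step and concatenates its stdout and stderr,
// prefixing every chunk with the step name so failures are easy to locate.
func collectOutput(steps []string, getStepOutput func(step string) (stepOutput, error)) (string, string, error) {
	var stdout, stderr strings.Builder
	for _, step := range steps {
		o, err := getStepOutput(step)
		if err != nil {
			return "", "", fmt.Errorf("step %s: %w", step, err)
		}
		fmt.Fprintf(&stdout, "[%s] %s\n", step, o.Stdout)
		fmt.Fprintf(&stderr, "[%s] %s\n", step, o.Stderr)
	}
	return stdout.String(), stderr.String(), nil
}

func main() {
	fake := func(step string) (stepOutput, error) {
		return stepOutput{Stdout: step + " done"}, nil
	}
	out, _, err := collectOutput([]string{"downloadContent", "runShellScript"}, fake)
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```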

* Fix EC2 Auto Discover SSM failure when sending an extra param (#41532)

For agentless installations, we would send an extra param to the
ssm:SendCommand API.
Customers can create and use custom SSM Documents; however, the default
SSM Document does not define that parameter, and the ssm:SendCommand
API returns an error if an extra param is sent.

This PR makes a best-effort attempt to handle that: if a known error is
returned and the known extra param was sent, remove it and try again.

* EC2 Auto Discover with SSM: add invocation url to audit log (#41689)

This PR adds the invocation URL into the audit log when running the
teleport installer script during EC2 Auto Discover.

* EC2 Auto Discover SSM: add support for debugging custom SSM Docs (#41706)

This PR uses a new AWS API call (ssm:ListCommandInvocations) that lists
the steps of the current invocation.
After listing them, it requests the output of each one.

Previously, we were using a static list of steps: those defined in the
default SSM Document.

However, that would fail for custom documents with a different list of
steps.

If the client does not have access to this new API, we will fall back to
the list of steps that exist in the default SSM Document.
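A sketch of the step selection, assuming a hypothetical `listSteps` callback that wraps ssm:ListCommandInvocations (with details enabled, each invocation lists its plugins, i.e. steps) and a hypothetical `errAccessDenied` sentinel. `defaultSteps` mirrors `EC2DiscoverySSMDocumentSteps` added in `lib/cloud/aws/ssm_documents.go`.

```go
package main

import (
	"errors"
	"fmt"
)

// defaultSteps mirrors EC2DiscoverySSMDocumentSteps (lib/cloud/aws/ssm_documents.go).
var defaultSteps = []string{"downloadContent", "runShellScript"}

// errAccessDenied is a hypothetical stand-in for an AWS access-denied error.
var errAccessDenied = errors.New("AccessDeniedException")

// stepsForInvocation returns the step names to query for output. It prefers the
// steps reported by listSteps and falls back to the default document's steps
// when the caller lacks permission for ssm:ListCommandInvocations.
func stepsForInvocation(listSteps func() ([]string, error)) ([]string, error) {
	steps, err := listSteps()
	switch {
	case errors.Is(err, errAccessDenied):
		return defaultSteps, nil
	case err != nil:
		return nil, err
	}
	return steps, nil
}

func main() {
	// Hypothetical step names of a custom SSM Document.
	custom := func() ([]string, error) { return []string{"installDeps", "runInstaller"}, nil }
	steps, _ := stepsForInvocation(custom)
	fmt.Println(steps) // [installDeps runInstaller]
}
```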

If we ask for the status of one of those steps and receive a known
error indicating that the step does not exist, instead of failing we
will emit the overall invocation result (which doesn't include
stdout/stderr, but is better than nothing).
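And a sketch of the per-step fallback. `errStepNotFound` is a hypothetical stand-in for the known "step does not exist" error, and `result` is a hypothetical summary type; the overall result carries a status but no stdout/stderr.

```go
package main

import (
	"errors"
	"fmt"
)

// errStepNotFound is a hypothetical stand-in for the known error returned when
// a step name does not exist in the invoked SSM Document.
var errStepNotFound = errors.New("step does not exist")

// result is a hypothetical summary of an invocation or one of its steps.
type result struct {
	Status string
	Stdout string
	Stderr string
}

// stepResults queries every step. If any step is unknown to the document, it
// gives up on per-step output and returns only the overall invocation result.
func stepResults(steps []string, getStep func(string) (result, error), overall result) ([]result, error) {
	var out []result
	for _, step := range steps {
		r, err := getStep(step)
		if errors.Is(err, errStepNotFound) {
			// The overall result lacks stdout/stderr, but still reports the status.
			return []result{overall}, nil
		}
		if err != nil {
			return nil, err
		}
		out = append(out, r)
	}
	return out, nil
}

func main() {
	getStep := func(step string) (result, error) {
		if step == "runShellScript" {
			return result{}, errStepNotFound
		}
		return result{Status: "Success", Stdout: "downloaded"}, nil
	}
	rs, _ := stepResults([]string{"downloadContent", "runShellScript"}, getStep, result{Status: "Failed"})
	fmt.Println(len(rs), rs[0].Status)
}
```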
marcoandredinis authored May 22, 2024
1 parent a27a4ab commit f68078d
Showing 11 changed files with 1,732 additions and 944 deletions.
12 changes: 12 additions & 0 deletions api/proto/teleport/legacy/types/events/events.proto
@@ -5201,6 +5201,18 @@ message SSMRun {

// Region is the AWS region the command was ran in.
string Region = 7 [(gogoproto.jsontag) = "region"];
+
+// StandardOutput contains the stdout of the executed command.
+// Only the first 24000 chars are returned.
+string StandardOutput = 8 [(gogoproto.jsontag) = "stdout"];
+
+// StandardError contains the stderr of the executed command.
+// Only the first 24000 chars are returned.
+string StandardError = 9 [(gogoproto.jsontag) = "stderr"];
+
+// InvocationURL is a link to AWS Web Console for this invocation.
+// An invocation is the execution of a Command in an Instance.
+string InvocationURL = 10 [(gogoproto.jsontag) = "invocation_url"];
}

// CassandraSession is emitted when a Cassandra client sends
1,926 changes: 1,034 additions & 892 deletions api/types/events/events.pb.go

Large diffs are not rendered by default.

4 changes: 4 additions & 0 deletions docs/pages/auto-discovery/servers/ec2-discovery.mdx
@@ -165,7 +165,9 @@ AWS
"Effect": "Allow",
"Action": [
"ec2:DescribeInstances",
+"ssm:DescribeInstanceInformation",
"ssm:GetCommandInvocation",
+"ssm:ListCommandInvocations",
"ssm:SendCommand"
],
"Resource": [
@@ -183,7 +185,9 @@ AWS
"Effect": "Allow",
"Action": [
"ec2:DescribeInstances",
+"ssm:DescribeInstanceInformation",
"ssm:GetCommandInvocation",
+"ssm:ListCommandInvocations",
"ssm:SendCommand"
],
"Resource": [
1 change: 1 addition & 0 deletions lib/cloud/aws/policy_statements.go
@@ -147,6 +147,7 @@ func StatementForEC2SSMAutoDiscover() *Statement {
"ec2:DescribeInstances",
"ssm:DescribeInstanceInformation",
"ssm:GetCommandInvocation",
+"ssm:ListCommandInvocations",
"ssm:SendCommand",
},
Resources: allResources,
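For reference, the full set of actions can be rendered as an IAM policy document. This Go sketch uses hypothetical `policyDocument` and `statement` types rather than Teleport's own `Statement` type, and a `*` resource as in the `aws_test.go` expectations of this commit.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// statement and policyDocument are hypothetical minimal types for an IAM policy.
type statement struct {
	Effect   string   `json:"Effect"`
	Action   []string `json:"Action"`
	Resource []string `json:"Resource"`
}

type policyDocument struct {
	Version   string      `json:"Version"`
	Statement []statement `json:"Statement"`
}

// ec2SSMAutoDiscoverPolicy lists the actions required after this commit.
func ec2SSMAutoDiscoverPolicy() policyDocument {
	return policyDocument{
		Version: "2012-10-17",
		Statement: []statement{{
			Effect: "Allow",
			Action: []string{
				"ec2:DescribeInstances",
				"ssm:DescribeInstanceInformation",
				"ssm:GetCommandInvocation",
				"ssm:ListCommandInvocations",
				"ssm:SendCommand",
			},
			Resource: []string{"*"},
		}},
	}
}

func main() {
	b, err := json.MarshalIndent(ec2SSMAutoDiscoverPolicy(), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))
}
```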
7 changes: 7 additions & 0 deletions lib/cloud/aws/ssm_documents.go
@@ -58,3 +58,10 @@ mainSteps:
}

const EC2DiscoveryPolicyName = "TeleportEC2Discovery"
+
+// EC2DiscoverySSMDocumentSteps is the list of Steps defined in the default SSM Document for Teleport Discovery.
+// Used to query step results after executing a command using SSM.
+var EC2DiscoverySSMDocumentSteps = []string{
+"downloadContent",
+"runShellScript",
+}
8 changes: 6 additions & 2 deletions lib/configurators/aws/aws_test.go
@@ -584,7 +584,9 @@ func TestAWSIAMDocuments(t *testing.T) {
"ec2:DescribeInstances",
"ssm:DescribeInstanceInformation",
"ssm:GetCommandInvocation",
-"ssm:SendCommand"},
+"ssm:ListCommandInvocations",
+"ssm:SendCommand",
+},
Resources: []string{"*"},
},
},
@@ -595,7 +597,9 @@ func TestAWSIAMDocuments(t *testing.T) {
"ec2:DescribeInstances",
"ssm:DescribeInstanceInformation",
"ssm:GetCommandInvocation",
-"ssm:SendCommand"},
+"ssm:ListCommandInvocations",
+"ssm:SendCommand",
+},
Resources: []string{"*"},
},
},
6 changes: 5 additions & 1 deletion lib/integrations/awsoidc/ec2_ssm_iam_config.go
@@ -124,14 +124,18 @@ func NewEC2SSMConfigureClient(ctx context.Context, region string) (EC2SSMConfigu
}

// ConfigureEC2SSM creates the required resources in AWS to enable EC2 Auto Discover using script mode..
-// It creates an embedded policy with the following permissions:
+// It creates an inline policy with the following permissions:
//
// Action: List EC2 instances where teleport is going to be installed.
// - ec2:DescribeInstances
//
+// Action: Get SSM Agent Status
+// - ssm:DescribeInstanceInformation
+//
// Action: Run a command and get its output.
// - ssm:SendCommand
// - ssm:GetCommandInvocation
+// - ssm:ListCommandInvocations
//
// Besides setting up the required IAM policies, this method also adds the SSM Document.
// This SSM Document downloads and runs the Teleport Installer Script, which installs teleport in the target EC2 instance.
7 changes: 6 additions & 1 deletion lib/srv/discovery/discovery.go
@@ -461,9 +461,14 @@ func (s *Server) initAWSWatchers(matchers []types.AWSMatcher) error {
s.caRotationCh = make(chan []types.Server)

if s.ec2Installer == nil {
-s.ec2Installer = server.NewSSMInstaller(server.SSMInstallerConfig{
+ec2installer, err := server.NewSSMInstaller(server.SSMInstallerConfig{
Emitter: s.Emitter,
})
+if err != nil {
+return trace.Wrap(err)
+}
+
+s.ec2Installer = ec2installer
}

lr, err := newLabelReconciler(&labelReconcilerConfig{
3 changes: 3 additions & 0 deletions lib/srv/server/ec2_watcher.go
@@ -272,6 +272,9 @@ const (
)

// awsEC2APIChunkSize is the max number of instances SSM will send commands to at a time
+// This is used for limiting the number of instances for API Calls:
+// ssm:SendCommand only accepts between 0 and 50.
+// ssm:DescribeInstanceInformation only accepts between 5 and 50.
const awsEC2APIChunkSize = 50

func newEC2InstanceFetcher(cfg ec2FetcherConfig) *ec2InstanceFetcher {
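A minimal sketch of honoring those limits: batch instance IDs into groups of at most 50, and keep the DescribeInstanceInformation page size within 5 to 50. `chunk` and `clampMaxResults` are hypothetical helpers, not functions from this commit; assuming the request filters by the batch's instance IDs, asking for 5 results when a batch has fewer than 5 instances is harmless.

```go
package main

import "fmt"

// awsEC2APIChunkSize is copied from lib/srv/server/ec2_watcher.go.
const awsEC2APIChunkSize = 50

// chunk splits ids into consecutive batches of at most size elements.
func chunk(ids []string, size int) [][]string {
	var batches [][]string
	for len(ids) > size {
		// The full slice expression caps capacity so appends cannot clobber the next batch.
		batches = append(batches, ids[:size:size])
		ids = ids[size:]
	}
	if len(ids) > 0 {
		batches = append(batches, ids)
	}
	return batches
}

// clampMaxResults keeps a DescribeInstanceInformation page size within [5, 50].
func clampMaxResults(n int) int {
	if n < 5 {
		return 5
	}
	if n > awsEC2APIChunkSize {
		return awsEC2APIChunkSize
	}
	return n
}

func main() {
	ids := make([]string, 120)
	for i := range ids {
		ids[i] = fmt.Sprintf("i-%03d", i)
	}
	for _, b := range chunk(ids, awsEC2APIChunkSize) {
		fmt.Println(len(b), clampMaxResults(len(b)))
	}
}
```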