This repository has been archived by the owner on Mar 4, 2021. It is now read-only.

Unable to perform SSH cases - HTTP 401 being returned from a wrong region #285

Open
VinnieGogniti opened this issue Dec 13, 2016 · 26 comments

Comments

@VinnieGogniti

Hello Everyone,

I've been stuck on this issue for a week now. I've looked through all the threads related to it, and apparently it is still open with no definitive solution.

The issue: even though the region in my client config is "us-west-2", my SSH cases fail with an HTTP 401 from the wrong region (us-east-1).
I scanned through the entire code and replaced every "us-east-1" reference with "us-west-2", but I still can't get around this issue. I believe the code must be making an AWS SDK call to fetch the current region via the API, somehow getting "us-east-1" returned, and overriding my config.
This has absolutely baffled me for days now.

Please, if anyone has resolved this before or can think of a better solution, help me resolve it. The error log follows. Thank you!

2016-12-13 05:24:05.356 - INFO BasicChaosInstanceSelector - [BasicChaosInstanceSelector.java:65] Randomly selecting 2 from 2 instances, excluding null
2016-12-13 05:24:07.084 - WARN ChaosInstance - [ChaosInstance.java:105] Error making SSH connection to instance
org.jclouds.rest.AuthorizationException: POST https://ec2.us-east-1.amazonaws.com/ HTTP/1.1 -> HTTP/1.1 401 Unauthorized
.
.
.
.
2016-12-13 05:24:07.089 - WARN ScriptChaosType - [ScriptChaosType.java:61] Strategy disabled because SSH credentials failed
2016-12-13 05:24:07.089 - WARN BasicChaosMonkey - [BasicChaosMonkey.java:124] No chaos type was applicable to the instance: i-009863xxxxxx
2016-12-13 05:24:07.205 - WARN ChaosInstance - [ChaosInstance.java:105] Error making SSH connection to instance
org.jclouds.rest.AuthorizationException: POST https://ec2.us-east-1.amazonaws.com/ HTTP/1.1 -> HTTP/1.1 401 Unauthorized
at org.jclouds.aws.handlers.ParseAWSErrorFromXmlContent.refineException(ParseAWSErrorFromXmlContent.java:122)

@ebukoski
Contributor

ebukoski commented Dec 13, 2016 via email

@VinnieGogniti
Author

VinnieGogniti commented Dec 13, 2016 via email

@ebukoski
Contributor

ebukoski commented Dec 13, 2016 via email

@VinnieGogniti
Author

VinnieGogniti commented Dec 13, 2016

Thanks for responding. I did replace that part (and everywhere else it's hardcoded too), but it still doesn't appear to work. Here is how I have it set in my code.

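String defaultRegion = "us-west-2";
// AWS SDK helper: the region of the EC2 instance this runs on, or null if it can't be determined.
Region currentRegion = Regions.getCurrentRegion();

if (currentRegion != null) {
    // Auto-detection disabled; the default is forced to us-west-2 either way.
    //  defaultRegion = currentRegion.getName();
    defaultRegion = "us-west-2";
}

// The client property, when set, takes precedence over defaultRegion.
region = config.getStrOrElse("simianarmy.client.aws.region", defaultRegion);
GLOBAL_OWNER_TAGKEY = config.getStrOrElse("simianarmy.tags.owner", "owner");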

===========================================================================
And of course, I didn't overlook the region property in the client config. It is set to us-west-2, and Chaos Monkey does consume it. I can see it fetch all the available auto-scaling groups in the west region and get as far as randomly picking an instance (in the specified ASG) to SSH into. That's where it receives an HTTP 401 from the east region, as you can see in the log in my first post.

simianarmy.client.aws.region = us-west-2
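A minimal sketch of the lookup order the snippet above implies: `config.getStrOrElse(...)` returns the `simianarmy.client.aws.region` value whenever it is present, so the hard-coded `defaultRegion` only matters when the property is missing. Here `java.util.Properties` stands in for Simian Army's own config object, and `RegionResolution`/`resolveRegion` are hypothetical names, not Simian Army code.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical helper mirroring config.getStrOrElse("simianarmy.client.aws.region", defaultRegion).
final class RegionResolution {
    static final String REGION_KEY = "simianarmy.client.aws.region";

    private RegionResolution() {}

    // Returns the configured region if the key is present, otherwise the fallback.
    static String resolveRegion(Properties config, String fallback) {
        return config.getProperty(REGION_KEY, fallback);
    }

    public static void main(String[] args) throws IOException {
        // Same line as in client.properties; Properties.load trims the spaces around '='.
        Properties loaded = new Properties();
        loaded.load(new StringReader("simianarmy.client.aws.region = us-west-2"));
        System.out.println(resolveRegion(loaded, "us-east-1"));           // us-west-2
        // Property never loaded or injected: the fallback wins.
        System.out.println(resolveRegion(new Properties(), "us-east-1")); // us-east-1
    }
}
```

The second call is the interesting case: if the property file is never read, or its values never reach the component that builds the client, the fallback is used silently.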

@VinnieGogniti
Author

VinnieGogniti commented Dec 13, 2016

@ebukoski I suspect this is the particular piece of code that constructs the EC2 client endpoint (ec2.us-east-1.amazonaws.com), the one in the HTTP 401 error. Please review it and let me know your thoughts.
https://github.com/Netflix/SimianArmy/blob/master/src/main/java/com/netflix/simianarmy/client/aws/AWSClient.java#L215
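The endpoint in the error follows the usual regional EC2 host pattern, so the host name alone shows which region the failing client was built for. The hypothetical helpers below sketch that mapping in both directions; they are not taken from `AWSClient` or jclouds, and they only handle the standard `ec2.<region>.amazonaws.com` form (not other AWS partitions).

```java
import java.net.URI;

// Hypothetical helpers for the standard pattern https://ec2.<region>.amazonaws.com/
final class Ec2Endpoints {
    private Ec2Endpoints() {}

    // Builds the regional EC2 endpoint for a region name such as "us-west-2".
    static URI endpointFor(String region) {
        return URI.create("https://ec2." + region + ".amazonaws.com/");
    }

    // Extracts the region from a standard regional EC2 host, or returns null.
    static String regionOf(URI endpoint) {
        String host = endpoint.getHost();
        String prefix = "ec2.";
        String suffix = ".amazonaws.com";
        if (host == null || host.length() <= prefix.length() + suffix.length()
                || !host.startsWith(prefix) || !host.endsWith(suffix)) {
            return null;
        }
        return host.substring(prefix.length(), host.length() - suffix.length());
    }

    public static void main(String[] args) {
        // The URL from the 401 in the error log.
        URI failing = URI.create("https://ec2.us-east-1.amazonaws.com/");
        System.out.println(regionOf(failing));         // us-east-1
        System.out.println(endpointFor("us-west-2"));  // https://ec2.us-west-2.amazonaws.com/
    }
}
```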

Error log:
org.jclouds.rest.AuthorizationException: POST https://ec2.us-east-1.amazonaws.com/ HTTP/1.1 -> HTTP/1.1 401 Unauthorized
.
.
.
2016-12-13 05:24:07.089 - WARN ScriptChaosType - [ScriptChaosType.java:61] Strategy disabled because SSH credentials failed
2016-12-13 05:24:07.089 - WARN BasicChaosMonkey - [BasicChaosMonkey.java:124] No chaos type was applicable to the instance: i-009863xxxxxx
2016-12-13 05:24:07.205 - WARN ChaosInstance - [ChaosInstance.java:105] Error making SSH connection to instance
org.jclouds.rest.AuthorizationException: POST https://ec2.us-east-1.amazonaws.com/ HTTP/1.1 -> HTTP/1.1 401 Unauthorized

@VinnieGogniti
Author

I am willing to pay a reasonable amount to anyone who can fix this.

@jsuh555

jsuh555 commented Dec 14, 2016

I'm having the same problem, but I get
Caused by: org.jclouds.http.HttpResponseException: request: POST https://ec2.us-east-1.amazonaws.com/ HTTP/1.1 [Action=DescribeRegions] failed with response: HTTP/1.1 401 Unauthorized

I'm trying to see if there is something wrong with my IAM user or role permissions.

I wonder if the temporary credentials retrieved from the Amazon STS service aren't valid right away and need some time (a few seconds?) before they work with the EC2 DescribeRegions API. Just guessing; I'm not an AWS expert by any means.

@VinnieGogniti
Author

If that HTTP 401 is coming from a different region than the one in your client config, then it is certainly a bug, and one that has been open and unaddressed for a very long time.

@Ten48BASE
Contributor

Have you two ensured that this property exists in your properties file and is being consumed by Chaos Monkey as Ed suggested:
simianarmy.client.aws.region

Also, check out this Region Detection feature: #233

@VinnieGogniti
Author

VinnieGogniti commented Dec 14, 2016

Yes, during startup I can see it consume the region and detect all auto-scaling groups available in that region. It actually gets as far as picking an instance in that region for executing a termination strategy.
But that's where it receives an HTTP 401 from a different region (us-east-1). I'm attaching the logs again for your reference.

AWSClient - [AWSClient.java:360] Got 37 auto-scaling groups in region us-west-2.
.
.
.
INFO BasicChaosInstanceSelector - [BasicChaosInstanceSelector.java:65] Randomly selecting 1 from 2 instances, excluding null
INFO ScriptChaosType - [ScriptChaosType.java:73] Running script for BurnCpu on instance i-0995xxxx
ERROR BasicChaosMonkey - [BasicChaosMonkey.java:201] failed to terminate instance i-0995xxxx
org.jclouds.rest.AuthorizationException: POST https://ec2.us-east-1.amazonaws.com/ HTTP/1.1 -> HTTP/1.1 401 Unauthorized

@Ten48BASE
Contributor

Taking a shot in the dark here: if you look at the error, it is an authentication error against the AWS API, not an error making the SSH connection itself.

When connecting via SSH, Chaos Monkey passes only the instanceId to the connectSsh method, not the instanceId and region. It may be that Apache jclouds is querying multiple regions to locate your instance's region so that it can query the instance and populate the NodeMetaData. Check this method: https://github.com/Netflix/SimianArmy/blob/master/src/main/java/com/netflix/simianarmy/client/aws/AWSClient.java#L880

Is it possible that the IAM credentials your Monkey uses don't have read access to the API in the us-east-1 region? Are you restricting the regions the Monkey is allowed to query?
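To illustrate the hypothesis above, here is a minimal, hypothetical model of a lookup that knows only the instance ID and therefore probes regions one at a time. It is not jclouds code: `RegionProbe`, `RegionApi` and `AuthFailure` are invented for the sketch. It shows how an authorization failure in the first probed region (here `us-east-1`) can abort the whole lookup even though the instance lives in `us-west-2`, and how knowing the region up front avoids the extra call.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

final class RegionProbe {
    private RegionProbe() {}

    // Stand-in for an authenticated per-region API call (hypothetical).
    interface RegionApi {
        // Returns true if the instance exists in the region; throws AuthFailure on a 401.
        boolean hasInstance(String region, String instanceId);
    }

    static final class AuthFailure extends RuntimeException {
        AuthFailure(String region) { super("401 Unauthorized from " + region); }
    }

    // Probes each region in order; an AuthFailure propagates and ends the search.
    static Optional<String> findRegion(List<String> regions, String instanceId, RegionApi api) {
        for (String region : regions) {
            if (api.hasInstance(region, instanceId)) {
                return Optional.of(region);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Map<String, String> instances = Map.of("i-0995xxxx", "us-west-2");
        RegionApi api = (region, id) -> {
            if (region.equals("us-east-1")) {
                throw new AuthFailure(region); // simulate the 401 seen in the logs
            }
            return region.equals(instances.get(id));
        };
        // Region known up front: only us-west-2 is queried and the lookup succeeds.
        System.out.println(findRegion(List.of("us-west-2"), "i-0995xxxx", api));
        try {
            // Probing us-east-1 first fails before us-west-2 is ever tried.
            findRegion(List.of("us-east-1", "us-west-2"), "i-0995xxxx", api);
        } catch (AuthFailure e) {
            System.out.println(e.getMessage());
        }
    }
}
```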

@VinnieGogniti
Author

VinnieGogniti commented Dec 15, 2016

Not that I'm aware of. I can manually run "aws ec2 describe-instances --region us-east-1" from the monkey instance against the east region without any issues.
The Chaos Monkey instance role has full EC2 permissions across all regions and, as far as I can tell, is not restricted to any region.

aws ec2 describe-instances --region us-east-1
Output (empty, since nothing is running in us-east-1):
{
"Reservations": []
}

Is it possible to restrict Apache jclouds to query only the region specified in the AWS client config, which is us-west-2 in this case?
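One possible direction, offered as an untested sketch: jclouds reads a `jclouds.regions` override property (the constant `LocationConstants.PROPERTY_REGIONS`) that limits which regions a provider context works with. Whether the jclouds version bundled with Simian Army honors it for `aws-ec2`, and whether it also suppresses the `DescribeRegions` call seen in the stack traces, are assumptions to verify. The stdlib-only helper below just builds the override `Properties`; the commented lines show where they would be passed to `ContextBuilder.overrides(...)`. `JcloudsOverrides` is a hypothetical name.

```java
import java.util.Properties;

final class JcloudsOverrides {
    // Assumed key; corresponds to org.jclouds.location.reference.LocationConstants.PROPERTY_REGIONS.
    static final String REGIONS_KEY = "jclouds.regions";

    private JcloudsOverrides() {}

    // Builds overrides that restrict the context to the single configured region.
    static Properties regionOverrides(String region) {
        Properties overrides = new Properties();
        overrides.setProperty(REGIONS_KEY, region);
        return overrides;
    }

    public static void main(String[] args) {
        Properties overrides = regionOverrides("us-west-2");
        // With jclouds on the classpath, the overrides would be applied along these lines:
        // ContextBuilder.newBuilder("aws-ec2").credentials(key, secret)
        //     .overrides(overrides).buildView(ComputeServiceContext.class);
        System.out.println(overrides.getProperty(REGIONS_KEY)); // us-west-2
    }
}
```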

@jsuh555

jsuh555 commented Dec 15, 2016

An AWS tech tried to replicate my issue. He was only able to when using IAM roles, not when using a regular user access key and secret key. He also couldn't see any API request being made, so it appears something is wrong with the signature used when making the DescribeRegions API request.

I should also mention that I only get the 401 error when triggering a terminate-on-demand via HTTP POST.

@VinnieGogniti
Author

When I use my AWS access and secret keys, it fails at the step that creates the SimpleDB domain, again in the wrong region (us-east-1).
It doesn't seem to recognize that I have the "us-west-2" region in my client config. Is there any way to make this monkey work at all?

WARN SimpleDBRecorder - [SimpleDBRecorder.java:287] Error while trying to auto-create SimpleDB domain
com.amazonaws.services.simpledb.model.AmazonSimpleDBException: User (arn:aws:iam::xxxxx:user/xxxx) does not have permission to perform (sdb:ListDomains) on resource (arn:aws:sdb:us-east-1:xxxx:domain/). Contact account owner. (Service: AmazonSimpleDB; Status Code: 403; Error Code: AuthorizationFailure;

@jsuh555

jsuh555 commented Dec 16, 2016

There is something wrong with your Amazon permissions.
I have no problems writing to and reading from SimpleDB. I'm in region us-west-2, but this is not specified in my client.properties.

Try doing this for your permissions. see attachment
simpleDB_permissions.txt

@VinnieGogniti
Author

VinnieGogniti commented Dec 16, 2016

I have the following permissions, which basically grant full EC2, ASG and SDB access regardless of region.
My problem is that it attempts to create the SDB domain in a different region than the one specified in my client config, and only when I use my AWS access and secret key for permissions.

{
  "Statement": [
    {
      "Sid": "Globals",
      "Action": [
        "autoscaling:*",
        "ec2:*",
        "elasticloadbalancing:*",
        "sdb:*",
        "ses:SendEmail"
      ],
      "Effect": "Allow",
      "Resource": "*"
    }
  ]
}

Error:
User (arn:aws:iam::xxxxx:user/xxxx) does not have permission to perform (sdb:ListDomains) on resource (arn:aws:sdb:us-east-1:xxxx:domain/).

@VinnieGogniti
Author

VinnieGogniti commented Dec 19, 2016

I ran the build with extended logging enabled and can now see some useful stack trace information that wasn't exposed before.
At this point, I'm almost certain the issue is within the Apache jclouds library, where the code makes an AWS API call with the instance ID and credentials, somehow ends up with the wrong region (maybe the API default), and gets a 401 "Unauthorized" error from the east region.
Maybe Apache jclouds or the AWS/EC2 API needs to be updated, but would that really solve the issue?

Any useful two cents from anyone? How do I override it in the code to return "us-west-2"?

at org.jclouds.aws.ec2.compute.strategy.AWSEC2ListNodesStrategy.pollRunningInstances(AWSEC2ListNodesStrategy.java:65)
22:02:39.476 [QUIET] [system.out] at org.jclouds.ec2.compute.strategy.EC2ListNodesStrategy.listDetailsOnNodesMatching(EC2ListNodesStrategy.java:107)
22:02:39.476 [QUIET] [system.out] at org.jclouds.ec2.compute.strategy.EC2ListNodesStrategy.listNodes(EC2ListNodesStrategy.java:86)
22:02:39.476 [QUIET] [system.out] at org.jclouds.ec2.compute.strategy.EC2ListNodesStrategy.listNodes(EC2ListNodesStrategy.java:58)
22:02:39.476 [QUIET] [system.out] at org.jclouds.compute.internal.BaseComputeService.listNodes(BaseComputeService.java:335)
22:02:39.477 [QUIET] [system.out] at com.netflix.simianarmy.client.aws.AWSClient.getJcloudsNode(AWSClient.java:906)
22:02:39.477 [QUIET] [system.out] at com.netflix.simianarmy.client.aws.AWSClient.connectSsh(AWSClient.java:886)
22:02:39.477 [QUIET] [system.out] at com.netflix.simianarmy.chaos.ChaosInstance.connectSsh(ChaosInstance.java:123)
22:02:39.477 [QUIET] [system.out] at com.netflix.simianarmy.chaos.ChaosInstance.canConnectSsh(ChaosInstance.java:101)
22:02:39.477 [QUIET] [system.out] at com.netflix.simianarmy.chaos.ScriptChaosType.canApply(ScriptChaosType.java:60)
22:02:39.478 [QUIET] [system.out] at com.netflix.simianarmy.basic.chaos.BasicChaosMonkey.pickChaosType(BasicChaosMonkey.java:141)
22:02:39.478 [QUIET] [system.out] at
.
.
.
22:02:39.480 [QUIET] [system.out] Caused by: org.jclouds.http.HttpResponseException: request: POST https://ec2.us-east-1.amazonaws.com/ HTTP/1.1 [Action=DescribeRegions] failed with response: HTTP/1.1 401 Unauthorized

@mlafeldt
Contributor

To find the source of this problem, it might also help to use an artifact that is known to work, e.g. this Docker image: https://github.com/mlafeldt/docker-simianarmy

@jsuh555

jsuh555 commented Jan 5, 2017

This error (401 unauthorized) only occurs if I use IAM roles, but if I use the normal user access key and secret key, there are NO problems.

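I created a basic jclouds project and got the same issue when using the role's access key. With the normal user key, listNodes() worked.

    import java.util.Set;
    import org.jclouds.ContextBuilder;
    import org.jclouds.compute.ComputeService;
    import org.jclouds.compute.ComputeServiceContext;
    import org.jclouds.compute.domain.ComputeMetadata;

    // Build an aws-ec2 compute context from an explicit access key / secret key pair.
    ComputeServiceContext jcloudsContext = ContextBuilder.newBuilder("aws-ec2")
            .credentials("ASdsdsYdsdsdssdQ", "DdBz/PMcpr6Fkmpsdsdsds0Hxje")
            .buildView(ComputeServiceContext.class);
    ComputeService client = jcloudsContext.getComputeService();
    Set<? extends ComputeMetadata> nodes = null;
    try {
        // Fails with a 401 for role keys, succeeds for the normal user key.
        nodes = client.listNodes();
    } catch (Exception e) {
        // Print the actual failure instead of a bare "error".
        e.printStackTrace();
    }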

Maybe a bug in jclouds? Maybe a bug in aws sdk?

@VinnieGogniti
Author

That's what I think too!

@jsuhhome

I don't have much time to look into it further, but here are two things:

  1. When using an IAM role, Simian Army needs to pass the access key ID, the secret key and the session token.
  2. From my non-exhaustive look, it appears Simian Army does this correctly. It's just that jclouds isn't sending the session token to Amazon.
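The session-token point above can be made concrete. Temporary credentials, such as those an EC2 instance profile hands out through STS, are only valid together with their session token, and their access key IDs conventionally start with `ASIA` (long-term IAM user keys start with `AKIA`). The hypothetical guard below is stdlib-only and not part of Simian Army or jclouds; it flags a credential set that looks temporary but has no token attached, which matches the role-versus-user difference reported in this thread.

```java
// Hypothetical value type; not a Simian Army or jclouds class.
final class AwsCreds {
    final String accessKeyId;
    final String secretKey;
    final String sessionToken; // null for long-term IAM user keys

    AwsCreds(String accessKeyId, String secretKey, String sessionToken) {
        this.accessKeyId = accessKeyId;
        this.secretKey = secretKey;
        this.sessionToken = sessionToken;
    }

    // STS-issued (temporary) access key IDs conventionally begin with "ASIA".
    boolean looksTemporary() {
        return accessKeyId != null && accessKeyId.startsWith("ASIA");
    }

    // Temporary keys sent without their session token are rejected by AWS.
    boolean missingRequiredToken() {
        return looksTemporary() && (sessionToken == null || sessionToken.isEmpty());
    }

    public static void main(String[] args) {
        AwsCreds roleWithoutToken = new AwsCreds("ASIAEXAMPLE", "secret", null);
        AwsCreds userKey = new AwsCreds("AKIAEXAMPLE", "secret", null);
        System.out.println(roleWithoutToken.missingRequiredToken()); // true
        System.out.println(userKey.missingRequiredToken());          // false
    }
}
```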

@darrendao

I'm having the same problem, and I think it might have to do with the fact that my Chaos Monkey is in a private subnet and has to go through a proxy to talk to AWS. For people having this problem, is your setup similar?

@jsuhhome

jsuhhome commented Feb 8, 2017

I didn't use a proxy. Are you using IAM roles or users? It works for me when using users

@darrendao

Overall, I ran into multiple issues:

  1. The way Chaos Monkey uses jclouds, it doesn't pass in the proxy info. So if Chaos Monkey runs behind a proxy, it times out when using jclouds to query AWS for instances to SSH into. I tried updating Chaos Monkey to pass the proxy info into jclouds but wasn't able to do it successfully.
  2. jclouds doesn't seem to support implicit IAM roles. I ended up updating client.properties to include the IAM access key and secret key.
  3. The doMonkeyBusiness() method didn't seem to do anything. This is the same problem as #274 (BasicChaosMonkey.doMonkeyBusiness() method exits without finishing its job); the workaround in that thread works for me.
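For point 1, here is a sketch of overrides that could carry proxy settings into jclouds. The keys `jclouds.proxy-host` and `jclouds.proxy-port` are assumptions based on jclouds' `Constants.PROPERTY_PROXY_HOST` and `Constants.PROPERTY_PROXY_PORT` and should be checked against the jclouds version in use; `ProxyOverrides` and the proxy URL are hypothetical. The resulting `Properties` would be handed to `ContextBuilder.overrides(...)` when the context is built.

```java
import java.net.URI;
import java.util.Properties;

final class ProxyOverrides {
    // Assumed jclouds property names (see org.jclouds.Constants).
    static final String HOST_KEY = "jclouds.proxy-host";
    static final String PORT_KEY = "jclouds.proxy-port";

    private ProxyOverrides() {}

    // Parses a proxy URL such as "http://proxy.internal:3128" into jclouds overrides.
    static Properties fromProxyUrl(String proxyUrl) {
        URI uri = URI.create(proxyUrl);
        if (uri.getHost() == null || uri.getPort() == -1) {
            throw new IllegalArgumentException("proxy URL needs host and port: " + proxyUrl);
        }
        Properties overrides = new Properties();
        overrides.setProperty(HOST_KEY, uri.getHost());
        overrides.setProperty(PORT_KEY, Integer.toString(uri.getPort()));
        return overrides;
    }

    public static void main(String[] args) {
        Properties p = fromProxyUrl("http://proxy.internal:3128");
        System.out.println(p.getProperty(HOST_KEY) + ":" + p.getProperty(PORT_KEY)); // proxy.internal:3128
    }
}
```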

@vermapratyush

I am facing the same problem.
From what I was able to debug, #274 suggests excluding the dependency injection library to fix a version mismatch error.
This possibly results in no values from the properties file being injected into the org.jclouds library, so it defaults to us-east-1.

@ksolie

ksolie commented Feb 21, 2018

Is there any update on this issue? I am seeing similar failures running in the us-east-1 region, but I believe mine occur because jclouds doesn't use session tokens.

9 participants