Azure AKS actions display 'No AKS Cluster found' or 'No virtual machines' #117

Open
nikola1011 opened this issue Jun 23, 2020 · 4 comments

@nikola1011

Hello,

I am trying to run a chaostoolkit experiment against an Azure Kubernetes Service (AKS) cluster, but my chaostoolkit-azure extension does not seem to see the running cluster. The credentials used to connect to Azure were generated with the command az ad sp create-for-rbac --sdk-auth > credentials.json (as specified in the documentation).
The cluster is running and available (as confirmed in the Azure portal).
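
A quick sanity check on the generated file can help rule out a malformed credentials file before looking elsewhere. The following is a minimal sketch, assuming the file has the layout emitted by az ad sp create-for-rbac --sdk-auth; the helper name missing_credential_keys and the fallback path credentials.json are hypothetical and not part of chaostoolkit-azure.

```python
import json
import os

# Keys written by `az ad sp create-for-rbac --sdk-auth` that identify the
# service principal; the generated file contains further endpoint entries.
REQUIRED_KEYS = ("clientId", "clientSecret", "subscriptionId", "tenantId")


def missing_credential_keys(path):
    """Return the required keys that are absent or empty in the file at `path`."""
    with open(path, encoding="utf-8") as handle:
        credentials = json.load(handle)
    return [key for key in REQUIRED_KEYS if not credentials.get(key)]


if __name__ == "__main__":
    # AZURE_AUTH_LOCATION is the variable the env-based experiment relies on;
    # the fallback file name is only an assumption for local testing.
    location = os.environ.get("AZURE_AUTH_LOCATION", "credentials.json")
    if os.path.exists(location):
        print("missing keys:", missing_credential_keys(location))
    else:
        print("no credentials file at", location)
```

An empty list only shows that the file is structurally complete; it does not prove that the service principal has access to the subscription that holds the cluster.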

Attached are two experiment files: one uses the credentials file referenced by the AZURE_AUTH_LOCATION environment variable, and the other has the credentials (secrets and configuration) placed directly inside the experiment file (real values replaced with 'xxx').
env-experiment.txt
secrets-experiment.txt

Both experiment files generate the same output (so I don't think it is a credentials problem):

[2020-06-23 14:37:19 INFO] Validating the experiment's syntax
[2020-06-23 14:37:19 INFO] Experiment looks valid
[2020-06-23 14:37:19 INFO] Running experiment: ...
[2020-06-23 14:37:19 INFO] Steady state hypothesis: Services are all available and healthy
[2020-06-23 14:37:19 INFO] Probe: consumer-service-must-still-respond
[2020-06-23 14:37:19 INFO] Steady state hypothesis is met!
[2020-06-23 14:37:19 INFO] Action: restart-aks-node-at-random
[2020-06-23 14:37:21 WARNING] No virtual machines found
[2020-06-23 14:37:21 ERROR]   => failed: chaoslib.exceptions.ActivityFailed: No virtual machines found
[2020-06-23 14:37:21 INFO] Steady state hypothesis: Services are all available and healthy
[2020-06-23 14:37:21 INFO] Probe: consumer-service-must-still-respond
[2020-06-23 14:37:21 INFO] Steady state hypothesis is met!
[2020-06-23 14:37:21 INFO] Let's rollback...
[2020-06-23 14:37:21 INFO] No declared rollbacks, let's move on.
[2020-06-23 14:37:21 INFO] Experiment ended with status: completed

Even if I add the 'filter' parameter "filter": "where resourceGroup=='myResourceGroup' and name=='myFlaskCluster'" to the 'restart_node' function, only the error message changes (from No virtual machines found to No AKS clusters found):

[2020-06-23 14:51:21 INFO] Validating the experiment's syntax
[2020-06-23 14:51:21 INFO] Experiment looks valid
[2020-06-23 14:51:21 INFO] Running experiment: ...
[2020-06-23 14:51:21 INFO] Steady state hypothesis: Services are all available and healthy
[2020-06-23 14:51:21 INFO] Probe: consumer-service-must-still-respond
[2020-06-23 14:51:21 INFO] Steady state hypothesis is met!
[2020-06-23 14:51:21 INFO] Action: restart-aks-node-at-random
[2020-06-23 14:51:23 WARNING] No AKS clusters found
[2020-06-23 14:51:23 ERROR]   => failed: chaoslib.exceptions.ActivityFailed: No AKS clusters found
[2020-06-23 14:51:23 INFO] Steady state hypothesis: Services are all available and healthy
[2020-06-23 14:51:23 INFO] Probe: consumer-service-must-still-respond
[2020-06-23 14:51:23 INFO] Steady state hypothesis is met!
[2020-06-23 14:51:23 INFO] Let's rollback...
[2020-06-23 14:51:23 INFO] No declared rollbacks, let's move on.
[2020-06-23 14:51:23 INFO] Experiment ended with status: completed
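
For reference, the filtered action described above can be sketched as the experiment entry below. This is an illustration under assumptions, not the attached experiment file: the module path chaosazure.aks.actions is assumed, and the helper build_restart_node_action with its operator parameter is hypothetical.

```python
import json


def build_restart_node_action(resource_group, cluster_name, operator="=="):
    """Build an experiment entry for a filtered restart_node action."""
    # Same filter expression as quoted in this issue, with names substituted.
    query_filter = (
        f"where resourceGroup{operator}'{resource_group}' "
        f"and name{operator}'{cluster_name}'"
    )
    return {
        "type": "action",
        "name": "restart-aks-node-at-random",
        "provider": {
            "type": "python",
            # Assumed module path for the AKS actions of chaostoolkit-azure.
            "module": "chaosazure.aks.actions",
            "func": "restart_node",
            "arguments": {"filter": query_filter},
        },
    }


if __name__ == "__main__":
    action = build_restart_node_action("myResourceGroup", "myFlaskCluster")
    print(json.dumps(action, indent=2))
```

If the filter is evaluated as an Azure Resource Graph (Kusto) query, which is an assumption about the extension's internals, then == compares case-sensitively while =~ ignores case. Passing operator="=~" may therefore be worth a try when the stored resource names differ in case from the ones typed into the filter.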

Would you be so kind as to check whether I am missing something obvious or whether this is a bug?

@nikola1011
Author

I have checked the chaostoolkit.log file and verified that the configuration parameters (azure_subscription_id and the filter parameter) are correctly passed to the restart_node function and, from there, to the fetch_resources function.

@nikola1011
Author

Finally, these are the versions that I am using (latest releases, if I am not mistaken):

NAME                VERSION
CLI                 1.4.2
Core library        1.10.0

NAME                                    VERSION   LICENSE                       DESCRIPTION
chaostoolkit-azure                      0.8.3     Apache License Version 2.0    Microsoft Azure
chaostoolkit-kubernetes                 0.22.0    Apache License Version 2.0    Kubernetes

@PranayWankhede

PranayWankhede commented Nov 4, 2020

@nikola1011 @HemantAHK @buderre @xpdable Is there any update on this? I am also facing a similar issue. Thanks!

@nikola1011
Author

@PranayWankhede Unfortunately, no. I haven't been able to solve it, so I moved my development to a local cluster only.
Note that the virtual machines are visible to Chaostoolkit, but the actual AKS cluster is not. Maybe there is a workaround there.
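
One possible shape for such a workaround, sketched here purely as an untested idea, is to skip the AKS lookup and target the node machines directly through the machine actions. The sketch assumes that chaosazure.machine.actions provides restart_machines, that the cluster uses the default AKS node resource group name (MC_<resource group>_<cluster name>_<location>), and that the nodes are exposed as plain virtual machines; the helper name and the location value are hypothetical.

```python
import json


def build_restart_machines_action(resource_group, cluster_name, location):
    """Build an experiment entry that targets AKS node VMs directly."""
    # Default name AKS gives to the resource group that holds the node VMs.
    node_resource_group = f"MC_{resource_group}_{cluster_name}_{location}"
    return {
        "type": "action",
        "name": "restart-aks-node-vm",
        "provider": {
            "type": "python",
            # Assumed module path for the VM actions of chaostoolkit-azure.
            "module": "chaosazure.machine.actions",
            "func": "restart_machines",
            "arguments": {
                # =~ is the case-insensitive equality operator in Kusto.
                "filter": f"where resourceGroup=~'{node_resource_group}'"
            },
        },
    }


if __name__ == "__main__":
    # "westeurope" is a placeholder; use the region the cluster runs in.
    action = build_restart_machines_action(
        "myResourceGroup", "myFlaskCluster", "westeurope"
    )
    print(json.dumps(action, indent=2))
```

The filter may match more than one machine in the node resource group, so it is worth narrowing it before running this against anything that matters.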
