Starter Project #4 Differentiate between misconfiguration and bugs #221

Open
Spedoske opened this issue Jun 9, 2023 · 1 comment

Spedoske commented Jun 9, 2023

What we encountered

We found that some test cases generated by Acto contain misconfigurations. The example below (see CRD Definition) shows a mutation from state 0 to state 1 in which Acto adds a livenessProbe override to the custom resource. The override is invalid because RabbitMQ does not use port 8500, so the pod can never pass the liveness check and Kubernetes kills it repeatedly.

The alarm report contains many similar cases, such as an invalid image name or a missing field. This issue aims to solve the problem, or at least mitigate it.

What we could do

  • Improve the test cases generated by Acto.
  • Collect events and logs from Kubernetes, and classify the alarms.

Improve the test cases generated by Acto

TBD

Collect events (and logs) from Kubernetes, and classify the alarms.

The event below indicates that the pod has an invalid config and could not be created, which is different from a crash event.
We think this kind of event may indicate a misconfiguration.

  Warning  FailedCreate      50s (x19 over 5m40s)  statefulset-controller  create Pod test-cluster-server-2 in StatefulSet test-cluster-server failed error: Pod "test-cluster-server-2" is invalid: spec.containers[0].image: Required value
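
The classification idea can be sketched as follows. This is a minimal sketch, not Acto's implementation: the `Event` class, the `parse_event` and `classify` helpers, and the three labels are hypothetical names introduced for illustration. It assumes events arrive as rows of the Events table printed by `kubectl describe`, and it uses only two heuristics: an API validation message containing "is invalid:" suggests a misconfiguration, while a kubelet `BackOff` event about restarting a failed container is treated as a crash that needs further triage.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical labels for illustration; Acto's real alarm categories may differ.
MISCONFIGURATION = "misconfiguration"
CRASH = "crash"
UNKNOWN = "unknown"


@dataclass
class Event:
    type: str     # "Normal" or "Warning"
    reason: str   # e.g. "FailedCreate"
    source: str   # e.g. "statefulset-controller"
    message: str


# One row of the Events table: TYPE  REASON  AGE  FROM  MESSAGE.
# AGE may carry a repeat count such as "50s (x19 over 5m40s)".
EVENT_ROW = re.compile(
    r"^\s*(?P<type>Normal|Warning)\s+(?P<reason>\S+)\s+"
    r"(?P<age>\S+(?:\s+\(x\d+\s+over\s+\S+\))?)\s+"
    r"(?P<source>\S+)\s+(?P<message>.*)$"
)


def parse_event(line: str) -> Optional[Event]:
    """Parse one event row; return None if the line is not an event row."""
    m = EVENT_ROW.match(line)
    if m is None:
        return None
    return Event(m["type"], m["reason"], m["source"], m["message"])


def classify(event: Event) -> str:
    """Apply the two heuristics described above to a single event."""
    if event.type != "Warning":
        return UNKNOWN
    # The API server rejected the object during validation.
    if "is invalid:" in event.message:
        return MISCONFIGURATION
    # The container started but keeps failing and being restarted.
    if event.reason == "BackOff" and "restarting failed container" in event.message:
        return CRASH
    return UNKNOWN


if __name__ == "__main__":
    row = (
        "  Warning  FailedCreate      50s (x19 over 5m40s)  statefulset-controller  "
        "create Pod test-cluster-server-2 in StatefulSet test-cluster-server failed "
        'error: Pod "test-cluster-server-2" is invalid: '
        "spec.containers[0].image: Required value"
    )
    event = parse_event(row)
    if event is not None:
        print(event.reason, classify(event))
```

A crash label here does not imply an operator bug: a misconfigured livenessProbe like the one in this issue also leads to restarts, so crash events would still need logs or further rules to separate the two.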

CRD Definition

Mutation:

$ diff mutated-0.yaml mutated-1.yaml 
>   override:
>     statefulSet:
>       spec:
>         template:
>           spec:
>             containers:
>             - livenessProbe:
>                 httpGet:
>                   port: 8500
>                 initialDelaySeconds: 10
>               name: b

The following custom resources demonstrate the mutation.
State 0:

apiVersion: rabbitmq.com/v1beta1
kind: RabbitmqCluster
metadata:
  name: test-cluster
  namespace: rabbitmq-system
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution: null
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app.kubernetes.io/name
            operator: In
            values:
            - test-cluster
        topologyKey: kubernetes.io/hostname
  image: null
  imagePullSecrets: null
  persistence:
    storage: 50Gi
  rabbitmq:
    additionalConfig: 'cluster_partition_handling = pause_minority

      vm_memory_high_watermark_paging_ratio = 0.99

      disk_free_limit.relative = 1.0

      collect_statistics_interval = 10000

      '
  replicas: 3
  resources:
    limits:
      cpu: 1
      memory: 4Gi
    requests:
      cpu: 1
      memory: 4Gi
  secretBackend: null
  service:
    type: ClusterIP
  skipPostDeploySteps: false
  terminationGracePeriodSeconds: 1024
  tls:
    caSecretName: null
    disableNonTLSListeners: false
    secretName: null
  tolerations: null

State 1:

apiVersion: rabbitmq.com/v1beta1
kind: RabbitmqCluster
metadata:
  name: test-cluster
  namespace: rabbitmq-system
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution: null
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app.kubernetes.io/name
            operator: In
            values:
            - test-cluster
        topologyKey: kubernetes.io/hostname
  image: null
  imagePullSecrets: null
  override:
    statefulSet:
      spec:
        template:
          spec:
            containers:
            - livenessProbe:
                httpGet:
                  port: 8500
                initialDelaySeconds: 10
              name: b
  persistence:
    storage: 50Gi
  rabbitmq:
    additionalConfig: 'cluster_partition_handling = pause_minority

      vm_memory_high_watermark_paging_ratio = 0.99

      disk_free_limit.relative = 1.0

      collect_statistics_interval = 10000

      '
  replicas: 3
  resources:
    limits:
      cpu: 1
      memory: 4Gi
    requests:
      cpu: 1
      memory: 4Gi
  secretBackend: null
  service:
    type: ClusterIP
  skipPostDeploySteps: false
  terminationGracePeriodSeconds: 1024
  tls:
    caSecretName: null
    disableNonTLSListeners: false
    secretName: null
  tolerations: null

Spedoske commented Jun 9, 2023

To-do list as of 6/9/2023

  • Find out why the test cases generated by quantity_increase do not contain the field image: ubuntu.
  • Add more information about the failure phase (preparation/setup phase or test case phase).
  • Store container logs for debugging.
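
One way the last two items could fit together is a per-alarm record that carries both the failure phase and the captured logs. The sketch below is only an assumption about what such a record might look like: `Phase`, `AlarmRecord`, `summarize`, and all field names are hypothetical and not taken from Acto.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class Phase(Enum):
    """Where the failure happened (hypothetical names)."""
    SETUP = "setup"          # preparation / setup phase
    TEST_CASE = "test_case"  # while running a generated test case


@dataclass
class AlarmRecord:
    test_case: str                                   # e.g. "quantity_increase"
    phase: Phase
    events: List[str] = field(default_factory=list)  # raw event rows
    # Container name -> captured log text, kept for later debugging.
    container_logs: Dict[str, str] = field(default_factory=dict)


def summarize(record: AlarmRecord) -> str:
    """Render a one-line summary suitable for an alarm report."""
    return (
        f"[{record.phase.value}] {record.test_case}: "
        f"{len(record.events)} event(s), {len(record.container_logs)} log(s)"
    )


if __name__ == "__main__":
    record = AlarmRecord("quantity_increase", Phase.TEST_CASE)
    record.events.append("Warning  FailedCreate  ...")
    print(summarize(record))
```

Recording the phase as an enum rather than free text keeps alarms from the setup phase easy to filter out when triaging test-case failures.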

@tianyin tianyin changed the title [WIP] [Starter Project #1] Differentiate between misconfiguration and bugs Starter Project #4 Differentiate between misconfiguration and bugs Jul 4, 2023