Error missing label of app on certain deployment #264

Open
namanjain98 opened this issue Apr 12, 2024 · 20 comments

Comments

@namanjain98

Describe the bug
I have deployed KRR in my Kubernetes cluster.
While running a simple test, I get an error saying that the app label on a certain deployment is missing.
I am not able to find which deployment that is.

This is the error I am receiving:
ERROR An unexpected error occurred runner.py:332
Traceback (most recent call last):
File "/Users/namanjain/Documents/krr/robusta_krr/core/runner.py", line 325, in run
result = await self._collect_result()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/namanjain/Documents/krr/robusta_krr/core/runner.py", line 279, in _collect_result
scans = await asyncio.gather(*[self._gather_object_allocations(k8s_object) for k8s_object in workloads])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/namanjain/Documents/krr/robusta_krr/core/runner.py", line 243, in _gather_object_allocations
recommendation = await self._calculate_object_recommendations(k8s_object)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/namanjain/Documents/krr/robusta_krr/core/runner.py", line 177, in _calculate_object_recommendations
object.pods = await self._k8s_loader.load_pods(object)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/namanjain/Documents/krr/robusta_krr/core/integrations/kubernetes/__init__.py", line 543, in load_pods
return await cluster_loader.list_pods(object)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/namanjain/Documents/krr/robusta_krr/core/integrations/kubernetes/__init__.py", line 118, in list_pods
ret: V1PodList = await loop.run_in_executor(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/anaconda3/lib/python3.11/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/namanjain/Documents/krr/robusta_krr/core/integrations/kubernetes/__init__.py", line 120, in <lambda>
lambda: self.core.list_namespaced_pod(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/anaconda3/lib/python3.11/site-packages/kubernetes/client/api/core_v1_api.py", line 15697, in list_namespaced_pod
return self.list_namespaced_pod_with_http_info(namespace, **kwargs) # noqa: E501
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/anaconda3/lib/python3.11/site-packages/kubernetes/client/api/core_v1_api.py", line 15812, in list_namespaced_pod_with_http_info
return self.api_client.call_api(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/anaconda3/lib/python3.11/site-packages/kubernetes/client/api_client.py", line 348, in call_api
return self.__call_api(resource_path, method,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/anaconda3/lib/python3.11/site-packages/kubernetes/client/api_client.py", line 180, in __call_api
response_data = self.request(
^^^^^^^^^^^^^
File "/usr/local/anaconda3/lib/python3.11/site-packages/kubernetes/client/api_client.py", line 373, in request
return self.rest_client.GET(url,
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/anaconda3/lib/python3.11/site-packages/kubernetes/client/rest.py", line 241, in GET
return self.request("GET", url,
^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/anaconda3/lib/python3.11/site-packages/kubernetes/client/rest.py", line 235, in request
raise ApiException(http_resp=r)
kubernetes.client.exceptions.ApiException: (400)
Reason: Bad Request
HTTP response headers: HTTPHeaderDict({'Audit-Id': '6c010d84-8974-4781-84cb-fd7d95a65e45', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Kubernetes-Pf-Flowschema-Uid': 'c6fecb2f-6615-45e0-8e84-073e86bb3e81',
'X-Kubernetes-Pf-Prioritylevel-Uid': '6e55d744-0422-4e52-a4ce-de0d56fbdd33', 'Date': 'Wed, 10 Apr 2024 08:53:18 GMT', 'Content-Length': '465'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"unable to parse requirement: values[0][matchLabels]: Invalid value: \"{'app':\": a valid label must be an empty string or consist of
alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character (e.g. 'MyValue', or 'my_value', or '12345', regex used for validation is
'(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?')","reason":"BadRequest","code":400}

The error shows that there is some issue with the deployment labels, but it does not show which deployment is affected.
Since I have hundreds of deployments running in my cluster, I am not able to find the right one. Can you please help check this?
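
The validation regex quoted in the error message can be applied locally to check candidate label values or selector strings before they reach the API server. The following is a minimal sketch, not part of krr: the helper names `is_valid_label_value` and `invalid_requirements` are hypothetical, and the selector parsing only handles simple comma-separated `key=value` selectors.

```python
import re

# Regex copied from the API server's error message for label values.
LABEL_VALUE_RE = re.compile(r"(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?")
MAX_LABEL_VALUE_LENGTH = 63  # Kubernetes limit on label value length


def is_valid_label_value(value: str) -> bool:
    """Return True if `value` would pass the label value validation."""
    if len(value) > MAX_LABEL_VALUE_LENGTH:
        return False
    return LABEL_VALUE_RE.fullmatch(value) is not None


def invalid_requirements(label_selector: str) -> list[str]:
    """Return the `key=value` parts of a simple equality selector
    whose value is missing or fails validation (hypothetical helper)."""
    bad = []
    for part in label_selector.split(","):
        key, sep, value = part.partition("=")
        if not sep or not is_valid_label_value(value.strip()):
            bad.append(part)
    return bad
```

For example, `is_valid_label_value("my-application")` is true, while the value `{'app':` from the error message fails the check.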

@aantn
Contributor

aantn commented Apr 15, 2024

Hey, if you run krr with --verbose does that help figure it out?

@pavangudiwada changed the title from "hi @here" to "Error missing label of app on certain deployment" on Apr 15, 2024
@aantn
Contributor

aantn commented Jun 15, 2024

Hey, would it be possible to provide more information (e.g. the app label from the problematic pod) to help us fix this?

Without more information, we're going to close until we can replicate.

@anil-repos

Getting a similar error:

 ERROR    An unexpected error occurred                                                                                  runner.py:332
                    Traceback (most recent call last):
                      File "C:\Users\anil.kumar\Desktop\krr\robusta_krr\core\runner.py", line 325, in run
                        result = await self._collect_result()
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                      File "C:\Users\anil.kumar\Desktop\krr\robusta_krr\core\runner.py", line 279, in
                    _collect_result
                        scans = await asyncio.gather(*[self._gather_object_allocations(k8s_object) for k8s_object in workloads])
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                      File "C:\Users\anil.kumar\Desktop\krr\robusta_krr\core\runner.py", line 243, in
                    _gather_object_allocations
                        recommendation = await self._calculate_object_recommendations(k8s_object)
                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                      File "C:\Users\anil.kumar\Desktop\krr\robusta_krr\core\runner.py", line 177, in
                    _calculate_object_recommendations
                        object.pods = await self._k8s_loader.load_pods(object)
                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                      File "C:\Users\anil.kumar\Desktop\krr\robusta_krr\core\integrations\kubernetes\__init__.py",
                    line 545, in load_pods
                        return await cluster_loader.list_pods(object)
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                      File "C:\Users\anil.kumar\Desktop\krr\robusta_krr\core\integrations\kubernetes\__init__.py",
                    line 119, in list_pods
                        ret: V1PodList = await loop.run_in_executor(
                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
                      File "C:\Program
                    Files\WindowsApps\PythonSoftwareFoundation.Python.3.11_3.11.2544.0_x64__qbz5n2kfra8p0\Lib\concurrent\futures\
                    thread.py", line 58, in run
                        result = self.fn(*self.args, **self.kwargs)
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                      File "C:\Users\anil.kumar\Desktop\krr\robusta_krr\core\integrations\kubernetes\__init__.py",
                    line 121, in <lambda>
                        lambda: self.core.list_namespaced_pod(
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                      File
                    "C:\Users\anil.kumar\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\loc
                    al-packages\Python311\site-packages\kubernetes\client\api\core_v1_api.py", line 15697, in list_namespaced_pod
                        return self.list_namespaced_pod_with_http_info(namespace, **kwargs)  # noqa: E501
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                      File
                    "C:\Users\anil.kumar\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\loc
                    al-packages\Python311\site-packages\kubernetes\client\api\core_v1_api.py", line 15812, in
                    list_namespaced_pod_with_http_info
                        return self.api_client.call_api(
                               ^^^^^^^^^^^^^^^^^^^^^^^^^
                      File
                    "C:\Users\anil.kumar\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\loc
                    al-packages\Python311\site-packages\kubernetes\client\api_client.py", line 348, in call_api
                        return self.__call_api(resource_path, method,
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                      File
                    "C:\Users\anil.kumar\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\loc
                    al-packages\Python311\site-packages\kubernetes\client\api_client.py", line 180, in __call_api
                        response_data = self.request(
                                        ^^^^^^^^^^^^^
                      File
                    "C:\Users\anil.kumar\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\loc
                    al-packages\Python311\site-packages\kubernetes\client\api_client.py", line 373, in request
                        return self.rest_client.GET(url,
                               ^^^^^^^^^^^^^^^^^^^^^^^^^
                      File
                    "C:\Users\anil.kumar\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\loc
                    al-packages\Python311\site-packages\kubernetes\client\rest.py", line 241, in GET
                        return self.request("GET", url,
                               ^^^^^^^^^^^^^^^^^^^^^^^^
                      File
                    "C:\Users\anil.kumar\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\loc
                    al-packages\Python311\site-packages\kubernetes\client\rest.py", line 235, in request
                        raise ApiException(http_resp=r)
                    kubernetes.client.exceptions.ApiException: (400)
                    Reason: Bad Request
                    HTTP response headers: HTTPHeaderDict({'Audit-Id': 'c0fbfc29-134c-4255-b585-fa66e718eb2a', 'Cache-Control':
                    'no-cache, private', 'Content-Type': 'application/json', 'X-Kubernetes-Pf-Flowschema-Uid':
                    '8cd1bb38-a9b2-4817-9dde-b2ef4e71ad9e', 'X-Kubernetes-Pf-Prioritylevel-Uid':
                    'e1fbf700-9281-4251-b87c-603e091edec1', 'Date': 'Fri, 14 Jun 2024 07:32:44 GMT', 'Content-Length': '465'})
                    HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"unable to
                    parse requirement: values[0][matchLabels]: Invalid value: \"{'app':\": a valid label must be an empty string
                    or consist of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character
                    (e.g. 'MyValue',  or 'my_value',  or '12345', regex used for validation is
                    '(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?')","reason":"BadRequest","code":400}

I am getting this error for almost all k8s resource types: jobs, cronjobs, daemonsets, statefulsets, rollouts and deployments.
I checked this by commenting out the k8s resources one by one in lines 64-70 of this file:
..\krr\robusta_krr\core\integrations\kubernetes\__init__.py

The label on these resources is as simple as:

app: my-application

@aantn
Contributor

aantn commented Jun 18, 2024

What is the full CLI command that you ran krr with? Are you passing a label selector?

@anil-repos

I am executing this command:

python krr.py simple --verbose

I get a similar error with the brew installation.
Also, I am not passing any label selector.

@aantn
Contributor

aantn commented Jun 20, 2024

Does this also occur on the branch prometheus-workload-loader? If so, does it occur on that branch if you run krr with --mode prometheus?

@aantn
Contributor

aantn commented Jun 20, 2024

And if that still does not solve the problem, please try the branch debug-build-anil-repos and share the log lines starting with Listing pods for namespace=.

We aren't able to replicate ourselves, but with your help I hope that we can get to the bottom of this!

@anil-repos

Hi @aantn
I realized that running krr against a particular namespace does not throw any error.
I checked on both the main and prometheus-workload-loader branches:

python krr.py simple --namespace myns

Thanks for your help !
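
The per-namespace workaround above can also be used to narrow down which namespace triggers the error, by running krr once per namespace and recording the runs that fail. This is a rough sketch under stated assumptions: the function names are hypothetical, the namespace list has to be supplied by the caller, and it is assumed (not confirmed in this thread) that a failing run either exits with a non-zero status or prints `ApiException` in its output.

```python
import subprocess
import sys
from typing import Callable, Iterable


def run_krr(namespace: str) -> subprocess.CompletedProcess:
    """Run krr against a single namespace, as in the command above."""
    return subprocess.run(
        [sys.executable, "krr.py", "simple", "--namespace", namespace],
        capture_output=True,
        text=True,
    )


def failing_namespaces(
    namespaces: Iterable[str],
    run: Callable[[str], subprocess.CompletedProcess] = run_krr,
    marker: str = "ApiException",
) -> list[str]:
    """Return the namespaces whose krr run looks like it failed."""
    failed = []
    for namespace in namespaces:
        result = run(namespace)
        output = (result.stdout or "") + (result.stderr or "")
        # Assumption: failure shows up as a non-zero exit code or in the output.
        if result.returncode != 0 or marker in output:
            failed.append(namespace)
    return failed
```

Calling `failing_namespaces(["myns", "other-ns"])` from the krr checkout directory would then return the namespaces worth inspecting with `--verbose`.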

@aantn
Contributor

aantn commented Jun 21, 2024

Any chance you can still run debug-build-anil-repos with verbose logging and see if you spot the problem? I assume the bug still exists so it would be great to solve it.

@headyj

headyj commented Jun 27, 2024

@aantn I just had the same issue on v1.11.0. I ran the following command on both the prometheus-workload-loader and debug-build-anil-repos branches and got the same error on both (with Python 3.12.3):

python krr.py simple --logtostderr \
 -f json \
 --history_duration 720 \
 --allow-hpa \
 --verbose
ERROR    An unexpected error occurred                                                         runner.py:349
Traceback (most recent call last):                                                                
    File                                                                                            
"/tmp/github/robusta-dev/krr/robusta_krr/core/runner.py", line              
342, in run                                                                                       
    result = await self._collect_result()                                                         
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                         
    File                                                                                            
"/tmp/github/robusta-dev/krr/robusta_krr/core/runner.py", line              
297, in _collect_result                                                                           
    scans = await asyncio.gather(*[self._gather_object_allocations(k8s_object) for                
k8s_object in workloads])                                                                         
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^              
^^^^^^^^^^^^^^^^^^^^^^^^                                                                          
    File                                                                                            
"/tmp/github/robusta-dev/krr/robusta_krr/core/runner.py", line              
236, in _gather_object_allocations                                                                
    recommendation = await self._calculate_object_recommendations(k8s_object)                     
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                     
    File                                                                                            
"/tmp/github/robusta-dev/krr/robusta_krr/core/runner.py", line              
156, in _calculate_object_recommendations                                                         
    object.pods = await cluster_loader.load_pods(object)                                          
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                          
    File                                                                                            
"/tmp/github/robusta-dev/krr/robusta_krr/core/integrations/kub              
ernetes/cluster_loader/__init__.py", line 160, in load_pods                                       
    return await self._workload_loaders[object.kind].list_pods(object)                            
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                            
    File                                                                                            
"/tmp/github/robusta-dev/krr/robusta_krr/core/integrations/kub              
ernetes/cluster_loader/loaders/base.py", line 79, in list_pods                                    
    ret: V1PodList = await loop.run_in_executor(                                                  
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                  
    File "/usr/lib/python3.12/concurrent/futures/thread.py", line 58, in run                        
    result = self.fn(*self.args, **self.kwargs)                                                   
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                   
    File                                                                                            
"/tmp/github/robusta-dev/krr/robusta_krr/core/integrations/kub              
ernetes/cluster_loader/loaders/base.py", line 81, in <lambda>                                     
    lambda: self.core.list_namespaced_pod(                                                        
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                        
    File                                                                                            
"/home/jni/.local/lib/python3.12/site-packages/kubernetes/client/api/core_v1_api.py"              
, line 15697, in list_namespaced_pod                                                              
    return self.list_namespaced_pod_with_http_info(namespace, **kwargs)  # noqa:                  
E501                                                                                              
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                           
    File                                                                                            
"/home/jni/.local/lib/python3.12/site-packages/kubernetes/client/api/core_v1_api.py"              
, line 15812, in list_namespaced_pod_with_http_info                                               
    return self.api_client.call_api(                                                              
            ^^^^^^^^^^^^^^^^^^^^^^^^^                                                              
    File                                                                                            
"/home/jni/.local/lib/python3.12/site-packages/kubernetes/client/api_client.py",                  
line 348, in call_api                                                                             
    return self.__call_api(resource_path, method,                                                 
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                 
    File                                                                                            
"/home/jni/.local/lib/python3.12/site-packages/kubernetes/client/api_client.py",                  
line 180, in __call_api                                                                           
    response_data = self.request(                                                                 
                    ^^^^^^^^^^^^^                                                                 
    File                                                                                            
"/home/jni/.local/lib/python3.12/site-packages/kubernetes/client/api_client.py",                  
line 373, in request                                                                              
    return self.rest_client.GET(url,                                                              
            ^^^^^^^^^^^^^^^^^^^^^^^^^                                                              
    File "/home/jni/.local/lib/python3.12/site-packages/kubernetes/client/rest.py",                 
line 241, in GET                                                                                  
    return self.request("GET", url,                                                               
            ^^^^^^^^^^^^^^^^^^^^^^^^                                                               
    File "/home/jni/.local/lib/python3.12/site-packages/kubernetes/client/rest.py",                 
line 235, in request                                                                              
    raise ApiException(http_resp=r)                                                               
kubernetes.client.exceptions.ApiException: (400)                                                  
Reason: Bad Request                                                                               
HTTP response headers: HTTPHeaderDict({'Audit-Id':                                                
'16d946e1-6076-4479-b7e5-7e38fb6711bb', 'Cache-Control': 'no-cache, private',                     
'Content-Type': 'application/json', 'X-Kubernetes-Pf-Flowschema-Uid':                             
'3669e6fd-765f-49b2-80b6-c36fbf14ed0b', 'X-Kubernetes-Pf-Prioritylevel-Uid':                      
'3bc3f8ff-716f-4dc2-b998-101d11246242', 'Date': 'Thu, 27 Jun 2024 08:47:19 GMT',                  
'Content-Length': '489'})                                                                         
HTTP response body:                                                                               
{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"unabl              
e to parse requirement: values[0][matchLabels]: Invalid value:                                    
\"{'app.kubernetes.io/component':\": a valid label must be an empty string or                     
consist of alphanumeric characters, '-', '_' or '.', and must start and end with an               
alphanumeric character (e.g. 'MyValue',  or 'my_value',  or '12345', regex used for               
validation is                                                                                     
'(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?')","reason":"BadRequest","code":400}

@aantn
Contributor

aantn commented Jun 27, 2024

Thanks. On the branch debug-build-anil-repos, do you have any lines of output starting with Listing pods for namespace?

The contents of that log will help us figure out what the issue is.

@headyj

headyj commented Jun 27, 2024

Yep, I do (sorry I had to anonymise most of the names :-( ):

[14:55:09] INFO     Listing pods for namespace=namespace-1 and label_selector=app.kubernetes.io/instance=namespace-1,app.kubernetes.io/name=my-helm,name=namespace-1-pod-1                                                     __init__.py:118
[14:55:10] DEBUG    Gathering PercentileCPULoader metric for StatefulSet namespace-1/namespace-1-pod-1/my-helm-pod-1                                                                                        prometheus_metrics_service.py:191
[14:55:13] INFO     Listing pods for namespace=namespace-2 and label_selector=batch.kubernetes.io/controller-uid=b77960f2-6231-41c6-a538-7cb0ce0ae219                                                                                                               __init__.py:118
           DEBUG    Gathering PercentileCPULoader metric for Job namespace-2/namespace-2-pod-3-1704989752/pod-3                                                                                                                     prometheus_metrics_service.py:191
           INFO     Listing pods for namespace=namespace-3 and label_selector=batch.kubernetes.io/controller-uid=6a2b4574-60c2-457d-9c46-3d44c5a89073                                                                                                    __init__.py:118
           DEBUG    Gathering PercentileCPULoader metric for Job namespace-3/namespace-3-company-update/update                                                                                                                     prometheus_metrics_service.py:191
           INFO     Listing pods for namespace=namespace-4 and label_selector=batch.kubernetes.io/controller-uid=4667ac78-0548-4ff7-b16d-f07907a2c046                                                                                                   __init__.py:118
           DEBUG    Gathering PercentileCPULoader metric for Job namespace-4/namespace-4-pod-3-1705047673/pod-3                                                                                             prometheus_metrics_service.py:191
[14:56:26] INFO     Listing pods for namespace=namespace-5 and label_selector=app.kubernetes.io/component=backend,app.kubernetes.io/instance=namespace-5,app.kubernetes.io/name=company,name=namespace-5-backend                                                  __init__.py:118
           DEBUG    Gathering PercentileCPULoader metric for Deployment namespace-5/namespace-5-backend/backend                                                                                                                               prometheus_metrics_service.py:191
[14:56:27] INFO     Listing pods for namespace=namespace-1 and label_selector=app.kubernetes.io/instance=namespace-1,app.kubernetes.io/name=my-helm,name=namespace-1-nginx                                                            __init__.py:118
           INFO     Listing pods for namespace=namespace-1 and label_selector=app.kubernetes.io/instance=namespace-1,app.kubernetes.io/name=my-helm,name=pod-2                                                                          __init__.py:118
           DEBUG    Gathering PercentileCPULoader metric for Deployment namespace-1/namespace-1-nginx/my-helm-nginx                                                                                                       prometheus_metrics_service.py:191
           DEBUG    Gathering PercentileCPULoader metric for Deployment namespace-1/pod-2/my-helm-pod-2                                                                                                                 prometheus_metrics_service.py:191
           INFO     Listing pods for namespace=namespace-6 and label_selector=app.kubernetes.io/instance=namespace-6,app.kubernetes.io/name=my-helm,name=pod-2                                                                        __init__.py:118
           INFO     Listing pods for namespace=namespace-6 and label_selector=app.kubernetes.io/instance=namespace-6,app.kubernetes.io/name=my-helm,name=namespace-6-nginx                                                         __init__.py:118
[14:56:28] DEBUG    Gathering PercentileCPULoader metric for Deployment namespace-6/namespace-6-nginx/my-helm-nginx                                                                                                     prometheus_metrics_service.py:191
           DEBUG    Gathering PercentileCPULoader metric for Deployment namespace-6/pod-2/my-helm-pod-2                                                                                                                prometheus_metrics_service.py:191
           INFO     Listing pods for namespace=namespace-7 and label_selector=app.kubernetes.io/component=backend,app.kubernetes.io/instance=namespace-7,app.kubernetes.io/name=company,name=namespace-7-pod-4                                     __init__.py:118
           DEBUG    Gathering PercentileCPULoader metric for Deployment namespace-7/namespace-7-pod-4/pod-4                                                                                                             prometheus_metrics_service.py:191
           INFO     Listing pods for namespace=prometheus and label_selector=app=prometheus,app.kubernetes.io/instance=prometheus,app.kubernetes.io/name=pushprox,component=pushprox,release=prometheus                                                       __init__.py:118
           DEBUG    Gathering PercentileCPULoader metric for Deployment prometheus/prometheus-pushprox/pushprox                                                                                                                             prometheus_metrics_service.py:191
           INFO     Listing pods for namespace=namespace-8 and label_selector=app=namespace-8                                                                                                                                               __init__.py:118
           INFO     Listing pods for namespace=namespace-8 and label_selector=app=curl                                                                                                                                                               __init__.py:118
           DEBUG    Gathering PercentileCPULoader metric for Deployment namespace-8/namespace-8/namespace-8                                                                                                      prometheus_metrics_service.py:191
           DEBUG    Gathering PercentileCPULoader metric for Deployment namespace-8/curl/curl                                                                                                                                      prometheus_metrics_service.py:191
           INFO     Listing pods for namespace=namespace-9 and label_selector=matchLabels={'app.kubernetes.io/component': 'backend', 'app.kubernetes.io/instance': 'namespace-9', 'app.kubernetes.io/name': 'company', 'name':                         __init__.py:118
                    'namespace-9-pod-5'}

One more thing that may be related: even though the JSON export works when I select a subset of namespaces, the generated JSON is usually not valid. Somewhere in the JSON (depending on which namespace I run against) I always get this description, which breaks the JSON validity:

[...]
"description": "Simple Strategy\n\nCPU request: 95.0% percentile, limit: unset\nMemory request: max + 15.0%, limit: max + 15.0%\nHistory: 720.0 hours\nStep: 1.25 minutes\n\nAll parameters can be customized. For example: `krr simple --cpu_percentile=90 
--memory_buffer_percentage=15 --history_duration=24 --timeframe_duration=0.5`\n\nLearn more: https://github.com/robusta-dev/krr#algorithm",
  "strategy": {
    "name": "simple",
    "settings": {
      "history_duration": 720.0,
      "timeframe_duration": 1.25,
      "cpu_percentile": 95.0,
      "memory_buffer_percentage": 15.0,
      "points_required": 100,
      "allow_hpa": true,
      "use_oomkill_data": false,
      "oom_memory_buffer_percentage": 25.0
    }
  },
[....]

This is not limited to any one of the branches above. All of them have the same issue as far as I have tested.
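
In the excerpt above, the "description" string appears to contain a literal line break after `--cpu_percentile=90`, which strict JSON parsers reject because raw control characters are not allowed inside strings. Whether that line break comes from krr itself or from terminal wrapping is an assumption here, not something confirmed in this thread. A minimal sketch of the symptom and a lenient way to read such a file with Python's standard `json` module (the helper name `parse_lenient` is hypothetical):

```python
import json

# A JSON document whose string value contains a raw line break,
# similar to the wrapped "description" field above.
broken = '{"description": "krr simple --cpu_percentile=90 \n--memory_buffer_percentage=15"}'


def parse_lenient(text: str):
    """Parse JSON while tolerating raw control characters inside strings."""
    return json.loads(text, strict=False)


try:
    json.loads(broken)  # default strict mode rejects the raw newline
    strict_ok = True
except json.JSONDecodeError:
    strict_ok = False

data = parse_lenient(broken)
```

This only works around the symptom when reading the file; it does not address why the line break ends up in the output.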

@aantn
Contributor

aantn commented Jun 27, 2024

Thank you. Did you include all the matching log lines or only some of them? I am particularly interested in the last log line before the exception.

I am trying to figure out which listing of pods had an invalid app.kubernetes.io/component value and why. From your original log:

\"{'app.kubernetes.io/component':\": a valid label must be an empty string or
consist of alphanumeric characters, '-', '_' or '.', and must start and end with an
alphanumeric character (e.g. 'MyValue',  or 'my_value',  or '12345', regex used for
validation is
'(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?')","reason":"BadRequest","code":400}

The mystery is which label value broke things and how that is possible.
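
For reference, a small sketch that checks candidate label values against the regex quoted in the error above. `is_valid_label_value` is a hypothetical helper, not part of krr or the Kubernetes client, and the 63-character limit comes from the Kubernetes label documentation rather than from this log:

```python
import re

# Regex copied from the API server's error message for label values.
LABEL_VALUE_RE = re.compile(r"(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?")
MAX_LABEL_VALUE_LENGTH = 63  # documented Kubernetes limit for label values


def is_valid_label_value(value: str) -> bool:
    """Return True if the whole value matches the label-value regex."""
    if len(value) > MAX_LABEL_VALUE_LENGTH:
        return False
    return LABEL_VALUE_RE.fullmatch(value) is not None


candidates = [
    "backend",                          # ordinary value from the selector
    "",                                 # the empty string is allowed
    "{'app.kubernetes.io/component':",  # the value quoted in the error
]
for candidate in candidates:
    print(repr(candidate), is_valid_label_value(candidate))
```

The rejected value looks like a fragment of a printed Python dict rather than a label set on any workload, which suggests the selector string itself was built incorrectly.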

Regarding the JSON export, can you please open a separate ticket?

@headyj

headyj commented Jun 27, 2024

> Did you include all the matching log lines or only some of them? I am particularly interested in the last log line before the exception.

Actually I missed only one, which is the last one before the exception:

INFO     Listing pods for namespace=my-project and label_selector=matchLabels={'app.kubernetes.io/component': 'backend', 'app.kubernetes.io/instance': 'my-project', 'app.kubernetes.io/name': 'company', 'name':                         __init__.py:118
                    'my-project-gateway'}                                                                                                                                                                                                                               
           DEBUG    Gathering PercentileCPULoader metric for StatefulSet db-4-lts/core-2/db                                                                                                                                           prometheus_metrics_service.py:191
           DEBUG    Gathering PercentileCPULoader metric for StatefulSet db-4-lts/core-3/db                                                                                                                                           prometheus_metrics_service.py:191
           DEBUG    Gathering PercentileCPULoader metric for StatefulSet prometheus/prometheus-alertmanager/alertmanager                                                                                                                    prometheus_metrics_service.py:191
           DEBUG    Gathering PercentileCPULoader metric for StatefulSet postgres/postgres-postgresql/postgresql                                                                                                                  prometheus_metrics_service.py:191
           DEBUG    Gathering PercentileCPULoader metric for StatefulSet db-standalone-4-lts-ci/db-standalone-4-lts-ci/db                                                                                                          prometheus_metrics_service.py:191
           DEBUG    Gathering PercentileCPULoader metric for StatefulSet rabbitmq/rabbitmq-server/rabbitmq                                                                                                                                  prometheus_metrics_service.py:191
           ERROR    An unexpected error occurred

And then comes the exception posted above.

> Regarding the JSON export, can you please open a separate ticket?

Yep, I will

@aantn
Contributor

aantn commented Jun 27, 2024

Thank you, we are very close to fixing this. I've narrowed it down to the problematic code, but I am still unable to reproduce it myself. What is the kind of the Kubernetes workload (e.g. Deployment, StatefulSet) and what are the contents of spec.matchLabels?

@aantn
Contributor

aantn commented Jun 27, 2024

Sorry, I meant: what are the contents of spec.selector?

@headyj

headyj commented Jun 28, 2024

Actually it's a Rollout (from Argo Rollouts) but it's very close to a Deployment and I don't think it makes a difference. The content of spec.selector is quite standard:

selector:
  matchLabels:
    app.kubernetes.io/component: backend
    app.kubernetes.io/instance: my-project
    app.kubernetes.io/name: company
    name: my-project-gateway

@aantn
Contributor

aantn commented Jun 28, 2024

Thanks, that was actually very important information! The Kubernetes Python client renames matchLabels to match_labels on its typed objects (e.g. Deployment.spec.selector.match_labels), but that renaming does not occur for CRDs, which come back as plain dicts!
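
To illustrate the difference, here is a rough sketch of reading the labels from either shape. It is not the code from the fix: `extract_match_labels` and `to_label_selector` are hypothetical helpers, and a `SimpleNamespace` stands in for the client's typed selector so the snippet runs without the kubernetes package installed:

```python
from types import SimpleNamespace
from typing import Any, Dict


def extract_match_labels(selector: Any) -> Dict[str, str]:
    """Return the match labels from a typed selector or a raw dict."""
    if selector is None:
        return {}
    if isinstance(selector, dict):
        # Custom resources (e.g. an Argo Rollout) are plain dicts,
        # so the key keeps its original camelCase name.
        return dict(selector.get("matchLabels") or {})
    # Typed models expose the snake_case attribute instead.
    return dict(getattr(selector, "match_labels", None) or {})


def to_label_selector(match_labels: Dict[str, str]) -> str:
    """Join labels into the comma-separated key=value selector form."""
    return ",".join(f"{key}={value}" for key, value in match_labels.items())


# Stand-in for a typed selector with illustrative labels.
typed_selector = SimpleNamespace(match_labels={"app": "web"})
# Part of the Rollout selector posted in this thread, as a raw dict.
crd_selector = {
    "matchLabels": {
        "app.kubernetes.io/component": "backend",
        "name": "my-project-gateway",
    }
}

print(to_label_selector(extract_match_labels(typed_selector)))
# app=web
print(to_label_selector(extract_match_labels(crd_selector)))
# app.kubernetes.io/component=backend,name=my-project-gateway
```

Treating the raw dict as if it were the labels themselves would produce a selector with the key `matchLabels` and a printed dict as its value, which matches the `label_selector=matchLabels={...}` lines in the logs.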

I have created a fix here: #308
Can you confirm that it works?

@headyj

headyj commented Jul 1, 2024

Yes I can confirm that it is working with Rollouts 👍

@aantn
Contributor

aantn commented Jul 1, 2024

Wonderful, thanks for the confirmation. I've merged the changes into the prometheus-workload-loader branch.
