feat: collect pprof data from data-collection and gateway components when diagnose #2114

Open: wants to merge 24 commits into base: main

24 commits
14dcfbe
feat: add OdigletAppLabelKey const to Odiglet
AvihuHenya Dec 29, 2024
2e3cade
feat: expose pprof port on all network interfaces
AvihuHenya Dec 31, 2024
55de5ef
feat: expose pprof on all network interfaces for collectors
AvihuHenya Dec 31, 2024
d1e25d1
feat: auto generate data-collection and gateway pprof data in diagnos…
AvihuHenya Dec 31, 2024
82f3642
fix api package version
AvihuHenya Jan 1, 2025
a897650
PR fixes
AvihuHenya Jan 1, 2025
f4e2388
remove comment
AvihuHenya Jan 1, 2025
a81f777
feat: remove unused struct declaration
AvihuHenya Jan 5, 2025
7dc0385
feat: improve documentation
AvihuHenya Jan 6, 2025
594bf91
feat: update test to expect pprof endpoint be at port 1777 in host ne…
AvihuHenya Jan 6, 2025
9c5fc82
feat: fix test
AvihuHenya Jan 6, 2025
57a279a
feat: support variable formatting in log message
AvihuHenya Jan 6, 2025
bf19dbd
Merge branch 'main' into diagnose-collector
AvihuHenya Jan 6, 2025
36fdbd0
feat: space fix in tests
AvihuHenya Jan 6, 2025
761395a
Merge branch 'diagnose-collector' of github.com:AvihuHenya/odigos int…
AvihuHenya Jan 6, 2025
4f7fa7e
feat: remove connector field
AvihuHenya Jan 6, 2025
6d1adc1
Merge remote-tracking branch 'upstream/main' into diagnose-collector
AvihuHenya Jan 6, 2025
412fcf4
Merge remote-tracking branch 'upstream/main' into diagnose-collector
AvihuHenya Jan 7, 2025
1567add
fix: add envronment formating in log
AvihuHenya Jan 7, 2025
8e0f82f
Merge branch 'main' into diagnose-collector
tamirdavid1 Jan 7, 2025
52240b8
Merge branch 'main' into diagnose-collector
tamirdavid1 Jan 7, 2025
bb6c064
Merge branch 'main' into diagnose-collector
tamirdavid1 Jan 7, 2025
134f2a6
Merge branch 'main' into diagnose-collector
tamirdavid1 Jan 8, 2025
1b60222
Merge branch 'main' into diagnose-collector
tamirdavid1 Jan 8, 2025
10 changes: 10 additions & 0 deletions .vscode/launch.json
@@ -64,6 +64,16 @@
"cwd": "${workspaceFolder}/cli",
"args": ["uninstall", "--yes"],
"buildFlags": "-tags=embed_manifests"
},
{
"name": "cli diagnose",
"type": "go",
"request": "launch",
"mode": "debug",
"program": "${workspaceFolder}/cli",
"args": ["diagnose"],
"cwd": "${workspaceFolder}/cli",
"buildFlags": "-tags=embed_manifests"
}
]
}
4 changes: 3 additions & 1 deletion autoscaler/controllers/datacollection/configmap.go
@@ -239,7 +239,9 @@ func calculateConfigMapData(nodeCG *odigosv1.CollectorsGroup, sources *odigosv1.
"health_check": config.GenericMap{
"endpoint": "0.0.0.0:13133",
},
"pprof": config.GenericMap{},
"pprof": config.GenericMap{
"endpoint": "0.0.0.0:1777",
},
},
Service: config.Service{
Pipelines: map[string]config.Pipeline{
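Note on the hunk above: the commit message for this change says the goal is to expose pprof on all network interfaces, and to my understanding an empty `pprof` block leaves the upstream OpenTelemetry Collector extension on a localhost-only default. The literal `"0.0.0.0:1777"` repeats the port that `consts.CollectorsPprofEndpointPort` also defines later in this PR. A minimal sketch of deriving the endpoint from one constant follows; `pprofEndpoint` is a hypothetical helper, and the plain `map[string]interface{}` is only a stand-in for `config.GenericMap`, whose definition is not shown in this diff.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// collectorsPprofPort mirrors consts.CollectorsPprofEndpointPort from this PR.
const collectorsPprofPort int32 = 1777

// pprofEndpoint is a hypothetical helper (not part of the PR). It builds a
// listen address that binds the given port on all network interfaces.
func pprofEndpoint(port int32) string {
	return net.JoinHostPort("0.0.0.0", strconv.Itoa(int(port)))
}

func main() {
	// Stand-in for the "extensions" section of the collector config.
	// The real code uses config.GenericMap; this map type is an assumption.
	extensions := map[string]interface{}{
		"pprof": map[string]interface{}{
			"endpoint": pprofEndpoint(collectorsPprofPort),
		},
	}
	fmt.Println(extensions["pprof"])
}
```

With this shape, the CLI's proxy URL and the collector config would read the port from the same place, so the two could not drift apart.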
135 changes: 89 additions & 46 deletions cli/cmd/diagnose_util/profiling_util.go
@@ -4,19 +4,49 @@ import (
"bytes"
"context"
"fmt"
"github.com/odigos-io/odigos/cli/cmd/resources"
"github.com/odigos-io/odigos/cli/pkg/kube"
"github.com/odigos-io/odigos/k8sutils/pkg/consts"
"io"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"os"
"path/filepath"
"strconv"
"sync"

"github.com/odigos-io/odigos/cli/cmd/resources"
"github.com/odigos-io/odigos/cli/pkg/kube"
"github.com/odigos-io/odigos/k8sutils/pkg/consts"
v1 "k8s.io/api/core/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/labels"
)

var ProfilingMetricsFunctions = []ProfileInterface{CPUProfiler{}, HeapProfiler{}, GoRoutineProfiler{}, AllocsProfiler{}}

type ProfilingPodConfig struct {
Port int32
Selector labels.Selector
}

// servicesProfilingMetadata is a map that associates service names with their corresponding pprof endpoint ports and selectors.
// To add new Odigos services to be profiled, include the service name along with the appropriate port and selectors in this map.
// Note: Since HostNetwork is set to true in DaemonSet services, pods expose ports on the node itself.
// Therefore, the same port cannot be used for multiple services on the same node.
var servicesProfilingMetadata = map[string]ProfilingPodConfig{
"odiglet": {
Port: consts.OdigletPprofEndpointPort,
Selector: labels.Set{
"app.kubernetes.io/name": resources.OdigletAppLabelValue}.AsSelector(),
},
"data-collection": {
Port: consts.CollectorsPprofEndpointPort,
Selector: labels.Set{
consts.OdigosCollectorRoleLabel: string(consts.CollectorsRoleNodeCollector)}.AsSelector(),
},
"gateway": {
Port: consts.CollectorsPprofEndpointPort,
Selector: labels.Set{
consts.OdigosCollectorRoleLabel: string(consts.CollectorsRoleClusterGateway)}.AsSelector(),
},
}

type ProfileInterface interface {
GetFileName() string
GetUrlSuffix() string
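Note on the hunk above: the comment on `servicesProfilingMetadata` warns that host-network services on the same node cannot share a port. The sketch below shows one way that rule could be checked mechanically. It is not part of the PR: `profilingTarget` is a simplified stand-in for `ProfilingPodConfig`, the `HostNetwork` field is hypothetical, and the `HostNetwork` values in `main` are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// profilingTarget is a simplified stand-in for ProfilingPodConfig.
// HostNetwork is a hypothetical field that the PR's struct does not have.
type profilingTarget struct {
	Port        int32
	HostNetwork bool
}

// hostPortConflicts returns every port claimed by more than one
// host-network service, mapped to the sorted names of those services.
func hostPortConflicts(targets map[string]profilingTarget) map[int32][]string {
	byPort := map[int32][]string{}
	for name, t := range targets {
		if t.HostNetwork {
			byPort[t.Port] = append(byPort[t.Port], name)
		}
	}
	conflicts := map[int32][]string{}
	for port, names := range byPort {
		if len(names) > 1 {
			sort.Strings(names)
			conflicts[port] = names
		}
	}
	return conflicts
}

func main() {
	// Ports come from the PR; the HostNetwork values are assumptions.
	targets := map[string]profilingTarget{
		"odiglet":         {Port: 6060, HostNetwork: true},
		"data-collection": {Port: 1777, HostNetwork: true},
		"gateway":         {Port: 1777, HostNetwork: false},
	}
	fmt.Println("conflicting host ports:", len(hostPortConflicts(targets)))
}
```

Under these assumptions the gateway can reuse port 1777 because only host-network pods compete for node ports.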
@@ -63,61 +93,74 @@ func (h AllocsProfiler) GetUrlSuffix() string {
}

func FetchOdigosProfiles(ctx context.Context, client *kube.Client, profileDir string) error {

odigosNamespace, err := resources.GetOdigosNamespace(client, ctx)
if err != nil {
return nil
}

odigletPods, err := client.CoreV1().Pods(odigosNamespace).List(ctx, metav1.ListOptions{
LabelSelector: "app.kubernetes.io/name=odiglet",
})
if err != nil {
return err
}

var wg sync.WaitGroup

for _, odigletPod := range odigletPods.Items {
fmt.Printf("Fetching profile for node: %v", odigletPod.Spec.NodeName)
nodeFilePath := filepath.Join(profileDir, odigletPod.Spec.NodeName)
err := os.Mkdir(nodeFilePath, os.ModePerm)
var podsWaitGroup sync.WaitGroup
for _, service := range servicesProfilingMetadata {
selector := service.Selector
podsToProfile, err := client.CoreV1().Pods(odigosNamespace).List(ctx, metav1.ListOptions{
LabelSelector: selector.String(),
})
if err != nil {
fmt.Printf("Error creating directory for node: %v, because: %v", nodeFilePath, err)
continue
return err
}

for _, profileMetricFunction := range ProfilingMetricsFunctions {
metricFilePath := filepath.Join(nodeFilePath, profileMetricFunction.GetFileName())
metricFile, err := os.OpenFile(metricFilePath, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0666)
if err != nil {
fmt.Printf("Error creating file: %v, because: %v", metricFilePath, err)
continue
}
defer metricFile.Close()

wg.Add(1)

go func() {
defer wg.Done()
podName := odigletPod.Name
err = captureProfile(ctx, client, podName, odigosNamespace, metricFile, profileMetricFunction)
for _, pod := range podsToProfile.Items {
fmt.Printf("Fetching profile for Pod: %s\n", pod.Name)
podsWaitGroup.Add(1)
go func(pod v1.Pod, pprofPort int32) {
defer podsWaitGroup.Done()

directoryName := fmt.Sprintf("%s-%s", pod.Name, pod.Spec.NodeName)
nodeFilePath := filepath.Join(profileDir, directoryName)
err := os.Mkdir(nodeFilePath, os.ModePerm)
if err != nil {
fmt.Printf("Error Getting Profile Data of: %v, because: %v\n", profileMetricFunction, err)
fmt.Printf("Error creating directory for node: %v, because: %v", nodeFilePath, err)
return
}
}()
// Inner WaitGroup for profiling functions of this pod
var profileWaitGroup sync.WaitGroup
for _, profileMetricFunction := range ProfilingMetricsFunctions {
profileMetricFunction := profileMetricFunction // Capture range variable
metricFilePath := filepath.Join(nodeFilePath, profileMetricFunction.GetFileName())

profileWaitGroup.Add(1)

go func(metricFilePath string, profileFunc ProfileInterface) {
defer profileWaitGroup.Done()
metricFile, err := os.OpenFile(metricFilePath, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0666)
if err != nil {
fmt.Printf("Error creating file: %v, because: %v\n", metricFilePath, err)
return
}
defer metricFile.Close()

err = captureProfile(ctx, client, pod.Name, pprofPort, odigosNamespace, metricFile, profileFunc)
if err != nil {
fmt.Printf(
"Failed to capture profile data for Pod: %s, Node: %s, Profile Type: %s. Reason: %v\n",
pod.Name,
pod.Spec.NodeName,
profileFunc.GetFileName(),
err,
)
}
}(metricFilePath, profileMetricFunction)
}
// Wait for all profiling tasks of pod to complete
profileWaitGroup.Wait()
}(pod, service.Port)
}

wg.Wait()
}

// Wait for all pod-level tasks to complete
podsWaitGroup.Wait()
return nil
}

func captureProfile(ctx context.Context, client *kube.Client, podName string, namespace string, metricFile *os.File, profileInterface ProfileInterface) error {
proxyURL := fmt.Sprintf("/api/v1/namespaces/%s/pods/%s:6060/proxy/debug/pprof%s", namespace, podName, profileInterface.GetUrlSuffix())
func captureProfile(ctx context.Context, client *kube.Client, podName string, pprofPort int32, namespace string, metricFile *os.File, profileInterface ProfileInterface) error {
proxyURL := fmt.Sprintf("/api/v1/namespaces/%s/pods/%s:%d/proxy/debug/pprof%s", namespace, podName, pprofPort, profileInterface.GetUrlSuffix())

// Make the HTTP GET request via the API server proxy
request := client.Clientset.CoreV1().RESTClient().
Get().
AbsPath(proxyURL).
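Note on the hunk above: `captureProfile` now takes the port as a parameter instead of hard-coding 6060, so the same API server proxy path works for odiglet and for the collectors. The sketch below isolates that path construction with the format string copied from the diff. The helper name `pprofProxyPath`, the namespace, the pod name, and the `/heap` suffix are hypothetical values chosen for illustration; the real suffixes come from `GetUrlSuffix()`, whose bodies are not shown here.

```go
package main

import "fmt"

// pprofProxyPath builds the API server pod-proxy path for a pprof endpoint.
// The format string is the one used by captureProfile in this PR; the
// function itself is a hypothetical extraction.
func pprofProxyPath(namespace, podName string, port int32, urlSuffix string) string {
	return fmt.Sprintf("/api/v1/namespaces/%s/pods/%s:%d/proxy/debug/pprof%s",
		namespace, podName, port, urlSuffix)
}

func main() {
	// All four arguments are illustrative, not values taken from the PR.
	fmt.Println(pprofProxyPath("odigos-system", "odigos-gateway-abc", 1777, "/heap"))
}
```

Keeping the path in a small pure function like this would also make it easy to unit test without a cluster.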
@@ -178,7 +221,7 @@ func collectMetrics(ctx context.Context, client *kube.Client, odigosNamespace st
var wg sync.WaitGroup

for _, collectorPod := range collectorPods.Items {
fmt.Printf("Fetching metrics for pod: %v", collectorPod.Name)
fmt.Println("Fetching metrics for pod:", collectorPod.Name)
metricFilePath := filepath.Join(metricsDir, collectorPod.Name)
metricFile, err := os.OpenFile(metricFilePath, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0666)
if err != nil {
2 changes: 1 addition & 1 deletion cli/cmd/resources/namespace.go
@@ -44,4 +44,4 @@ func GetOdigosNamespace(client *kube.Client, ctx context.Context) (string, error

func IsErrNoOdigosNamespaceFound(err error) bool {
return errors.Is(err, errNoOdigosNamespaceFound)
}
}
4 changes: 3 additions & 1 deletion common/config/root.go
@@ -224,7 +224,9 @@ func getBasicConfig(memoryLimiterConfig GenericMap) (*Config, []string) {
"health_check": GenericMap{
"endpoint": "0.0.0.0:13133",
},
"pprof": GenericMap{},
"pprof": GenericMap{
"endpoint": "0.0.0.0:1777",
},
},
Exporters: map[string]interface{}{},
Connectors: map[string]interface{}{},
3 changes: 2 additions & 1 deletion common/config/testdata/debugexporter.yaml
@@ -23,7 +23,8 @@ processors:
extensions:
health_check:
endpoint: 0.0.0.0:13133
pprof: {}
pprof:
endpoint: 0.0.0.0:1777
service:
extensions:
- health_check
3 changes: 2 additions & 1 deletion common/config/testdata/minimal.yaml
@@ -21,7 +21,8 @@ processors:
extensions:
health_check:
endpoint: 0.0.0.0:13133
pprof: {}
pprof:
endpoint: 0.0.0.0:1777
service:
extensions:
- health_check
5 changes: 5 additions & 0 deletions k8sutils/pkg/consts/consts.go
@@ -9,6 +9,11 @@ const (
CollectorsRoleNodeCollector CollectorRole = "NODE_COLLECTOR"
)

const (
OdigletPprofEndpointPort int32 = 6060
CollectorsPprofEndpointPort int32 = 1777
)

const (
// OdigosInjectInstrumentationLabel is the label used to enable the mutating webhook.
OdigosInjectInstrumentationLabel = "odigos.io/inject-instrumentation"