
Persist and Serve TaskRun Logs #198

Closed
adambkaplan opened this issue Jun 17, 2022 · 16 comments
Labels
area/roadmap: Issues that are part of the project (or organization) roadmap (usually an epic)
kind/feature: Categorizes issue or PR as related to a new feature.

Comments

@adambkaplan
Contributor

Feature request

Enhance Results to do the following:

  1. Store TaskRun step logs.
  2. Provide an API where logs for each TaskRun step can be served (see the sketch after this list).
  3. Use the existing Results RBAC controls to ensure logs are only served to those who have been granted the right permissions.
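
A minimal sketch of what fetching a stored step log from the Results apiserver could look like, assuming a REST surface and a hypothetical log resource name; the exact endpoint, resource naming, and the RESULTS_TOKEN environment variable are illustrative assumptions, not a settled API:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
)

func main() {
	// Hypothetical log resource name: <parent>/results/<result uid>/logs/<log name>.
	name := "default/results/12345678-aaaa-bbbb-cccc-123456789012/logs/my-taskrun-step-build"
	url := "https://results.example.com/apis/results.tekton.dev/v1alpha2/parents/" + name

	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		panic(err)
	}
	// Reuse the caller's token so the apiserver can apply the existing
	// Results RBAC checks before serving the log.
	req.Header.Set("Authorization", "Bearer "+os.Getenv("RESULTS_TOKEN"))

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		panic(fmt.Sprintf("unexpected status: %s", resp.Status))
	}
	io.Copy(os.Stdout, resp.Body) // stream the step log to stdout
}
```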

Use case

CI/CD users expect to view the full logs of any given step in a build/pipeline process.
This is primarily driven by two use cases:

  1. Debugging a failed build/pipeline.
  2. Auditing a build/pipeline process.

This is done in Tekton today by serving the underlying container logs from a TaskRun pod. These are stored on the host node and can be lost due to TaskRun pruning, cluster maintenance, or other mechanisms that delete the underlying pod.
For auditing purposes, build logs may need to be retained for as long as a particular version of the software is supported.

The most common means of persisting Kubernetes logs today is with log forwarding tools like fluentd and analysis engines like ElasticSearch, Amazon CloudWatch, and Grafana Loki.
These stacks are optimized to stream logs across systems for analysis in real time (this is a good thing!).
They are not built to retain and serve individual log files.

This feature request proposes that the Results watcher and apiserver be extended to store logs for TaskRun steps.
These logs can then be fetched from the apiserver through an API endpoint.
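
As a rough illustration of the watcher side, the step logs could be streamed out of the TaskRun pod through the Kubernetes API before the pod is pruned and handed to whatever storage the apiserver uses; the store callback below is a placeholder for illustration, not an existing Results function:

```go
package watcher

import (
	"context"
	"io"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/kubernetes"
)

// persistStepLogs streams the log of every step container in a TaskRun pod
// and passes it to a storage callback before the pod can be deleted.
func persistStepLogs(ctx context.Context, kc kubernetes.Interface, pod *corev1.Pod,
	store func(step string, log io.Reader) error) error {
	for _, step := range pod.Spec.Containers {
		req := kc.CoreV1().Pods(pod.Namespace).GetLogs(pod.Name, &corev1.PodLogOptions{
			Container: step.Name, // Tekton runs one "step-..." container per step
		})
		rc, err := req.Stream(ctx)
		if err != nil {
			return err
		}
		err = store(step.Name, rc)
		rc.Close()
		if err != nil {
			return err
		}
	}
	return nil
}
```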

adambkaplan added the kind/feature label on Jun 17, 2022
@adambkaplan
Contributor Author

Credit to @CathalOConnorRH who did a lot of research on our end with fluentd and Loki that led to this feature request.

cc @wlynch - this follows up the "Logs with Tekton Results" item discussed during the Tekton Community Summit.

@khrm
Contributor

khrm commented Jun 22, 2022

The most common means of persisting Kubernetes logs today is with log forwarding tools like fluentd and analysis engines like ElasticSearch, Amazon CloudWatch, and Grafana Loki. These stacks are optimized to stream logs across systems for analysis in real time (this is a good thing!). They are not built to retain and serve individual log files.

The ELK stack is optimized for long-term log storage. We are using it in OpenShift Logging. At present, we can view PipelineRun and TaskRun logs in OpenShift Logging, though there are some issues with this UX.

This feature request proposes that the Results watcher and apiserver be extended to store logs for TaskRun steps. These logs can then be fetched by the apiserver from an API endpoint.

The problem with storing logs in Postgres/MySQL is that they aren't built for this.

I will start working on this problem next week, in two phases. We had a discussion about this on Slack, but unfortunately it got lost.

First phase: a Kubernetes REST API service/proxy that gives us data from Tekton Results.
Second phase: design a plugin architecture in this service for fetching logs from various sources such as ELK, Splunk, etc. A sketch of one possible plugin interface follows below.
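
A sketch of what that second-phase plugin surface might look like, assuming a small Go interface per log source; all names here are illustrative and not part of any existing Results API:

```go
package logs

import (
	"context"
	"io"
)

// Fetcher retrieves the stored log for a single TaskRun step from an
// external aggregator (ELK, Splunk, Loki, ...).
type Fetcher interface {
	FetchLog(ctx context.Context, recordName string) (io.ReadCloser, error)
}

// registry maps a configured backend name to a constructor, so new log
// sources can be added without changing the proxy itself.
var registry = map[string]func(config map[string]string) (Fetcher, error){}

// Register makes a log source available under the given backend name.
func Register(name string, newFetcher func(config map[string]string) (Fetcher, error)) {
	registry[name] = newFetcher
}
```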

@vdemeester
Member

@khrm we discussed this a bit offline as well. There are two things that could be done as part of tektoncd/results, independent of ELK or anything else:

  • add something to the Results API to fetch the logs, so that we don't need another API for the logs
  • where we store the logs — this could be pluggable, and we could think of different "storage" backends (standard file, ELK, …); see the sketch after this list
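
For the second bullet, the storage side could be kept behind an equally small interface so that file, ELK, or object storage backends are interchangeable; again a sketch under assumed names, not the interface Results ended up with:

```go
package logs

import (
	"context"
	"io"
)

// Store persists a log stream and returns an opaque location that the
// Results API can later use to serve the log back to clients.
type Store interface {
	WriteLog(ctx context.Context, recordName string, log io.Reader) (location string, err error)
}
```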

@khrm
Contributor

khrm commented Jun 22, 2022

Yes, that's what I am planning to do after adding a proxy service.

@sayan-biswas
Contributor

@khrm I have added a REST proxy for the existing gRPC server, as part of some changes required to work with KCP.

https://github.com/sayan-biswas/tekton-results/blob/33d111248f3c6f7400a001030d7c3170d8aef174/cmd/api/main.go#L153

This branch has the proxy changes without the KCP changes. If you are thinking of implementing something like this, then I can create a PR next week to merge this.
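
The usual way to put a REST proxy in front of an existing gRPC service is grpc-gateway, and the linked branch appears to follow that pattern; a rough sketch, where the generated handler name and import path are assumptions based on the Results proto rather than verified code:

```go
package main

import (
	"context"
	"log"
	"net/http"

	"github.com/grpc-ecosystem/grpc-gateway/v2/runtime"
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"

	rpb "github.com/tektoncd/results/proto/v1alpha2/results_go_proto" // assumed import path
)

func main() {
	ctx := context.Background()
	mux := runtime.NewServeMux()

	// Dial the in-cluster gRPC apiserver and translate HTTP/JSON calls into gRPC.
	opts := []grpc.DialOption{grpc.WithTransportCredentials(insecure.NewCredentials())}
	if err := rpb.RegisterResultsHandlerFromEndpoint(ctx, mux, "localhost:50051", opts); err != nil {
		log.Fatal(err)
	}

	log.Fatal(http.ListenAndServe(":8080", mux))
}
```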

@jb-2020

jb-2020 commented Jun 23, 2022

One note: the Dashboard team has a MinIO walkthrough for log persistence. I only bring this up because, in the context of #82 ([integrations] Tekton Dashboard), it would be very nice if this change to Results were also usable in the Dashboard.

@adambkaplan
Contributor Author

I have submitted #203 as an initial proof of concept. There is a lot here - @khrm @vdemeester do you think this warrants a TEP?

@daniel-maganto

At Allianz Direct we have created a solution that fetches logs from S3 and shows them in the Tekton Dashboard, so long-term logs remain available when you need to delete a TaskRun from your cluster.
https://github.com/allianz-direct/tekton-s3-log-reader

@adambkaplan
Contributor Author

@afrittoli also pointed out that Tekton's dogfooding CI manually forwards logs to GCS with Tekton Tasks. It looks like we have, at minimum, the following use cases:

  1. No log forwarding or retrieval (the watcher and/or apiserver can be configured not to forward logs).
  2. Retrieve logs through the Results API, with log forwarding done externally (for example, Fluentd to Elasticsearch, Loki, etc.). A "driver" understands how to retrieve logs from the log aggregator.
  3. Retrieve logs through the Results API, with log forwarding done by the watcher. The apiserver supports drivers that retrieve and store logs in the backends below (see the sketch after this list):
    1. Local disk
    2. S3 (AWS or a compatible object storage service)
    3. GCS
    4. Azure or other cloud provider object storage
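
Tying the driver idea to the Store interface sketched earlier, driver selection could be a simple switch over a configured storage type; only the local-disk case is spelled out here, and the configuration keys and names are assumptions rather than the options Results actually ships:

```go
package logs

import (
	"context"
	"fmt"
	"io"
	"os"
	"path/filepath"
)

type fileStore struct{ root string }

// WriteLog saves the log under <root>/<recordName> and returns that path.
func (f fileStore) WriteLog(ctx context.Context, recordName string, log io.Reader) (string, error) {
	path := filepath.Join(f.root, filepath.FromSlash(recordName))
	if err := os.MkdirAll(filepath.Dir(path), 0o755); err != nil {
		return "", err
	}
	out, err := os.Create(path)
	if err != nil {
		return "", err
	}
	defer out.Close()
	if _, err := io.Copy(out, log); err != nil {
		return "", err
	}
	return path, nil
}

// NewStore picks a log storage driver from configuration.
func NewStore(storageType string, cfg map[string]string) (Store, error) {
	switch storageType {
	case "file": // local disk
		return fileStore{root: cfg["logs_path"]}, nil
	case "s3", "gcs", "azure": // object storage drivers would plug in here
		return nil, fmt.Errorf("driver %q not implemented in this sketch", storageType)
	default:
		return nil, fmt.Errorf("unsupported log storage type %q", storageType)
	}
}
```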

@adambkaplan
Contributor Author

Update: this feature was captured in TEP-0117, which was approved as a provisional proposal.

@tekton-robot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale with a justification.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

@adambkaplan
Contributor Author

/remove-lifecycle stale

@vdemeester
Member

/area roadmap

tekton-robot added the area/roadmap label on Feb 15, 2023
@adambkaplan
Contributor Author

@tektoncd/results-maintainers I think we can call this "done" and mark TEP-0117 as implemented. Thoughts?

@adambkaplan
Contributor Author

/close

This was implemented in v0.5.0

@tekton-robot

@adambkaplan: Closing this issue.

In response to this:

/close

This was implemented in v0.5.0

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
