
[Bug] [admin] When using a highly available K8s cluster, the jobId for the same task is the same every time it is executed. #4089

Open
jiangwwwei opened this issue Dec 24, 2024 · 2 comments

jiangwwwei commented Dec 24, 2024

Search before asking

  • I had searched in the issues and found no similar issues.

What happened

The logic in Flink's source code: when high availability is enabled and PipelineOptionsInternal.PIPELINE_FIXED_JOB_ID is not configured manually, the jobId is derived from the cluster id by default.

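For illustration, here is a minimal Java sketch of the behavior described above (not Flink's actual implementation; the hash used to fill the JobID is only illustrative): when HA is active and `$internal.pipeline.job-id` is not set, the job id is derived deterministically from the HA cluster id, so a fixed cluster id always yields the same JobID.

```java
// Minimal sketch, not Flink's actual code: derive the "fixed" job id
// from the HA cluster id when $internal.pipeline.job-id is not configured.
import org.apache.flink.api.common.JobID;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.configuration.HighAvailabilityOptions;
import org.apache.flink.configuration.PipelineOptionsInternal;

public class FixedJobIdSketch {

    static JobID effectiveJobId(Configuration conf) {
        String configured = conf.getString(PipelineOptionsInternal.PIPELINE_FIXED_JOB_ID);
        if (configured != null) {
            // A manually configured job id wins.
            return JobID.fromHexString(configured);
        }
        // Illustrative derivation: a stable hash of the cluster id fills the
        // lower 64 bits, the upper 64 bits stay zero.
        String clusterId = conf.getString(HighAvailabilityOptions.HA_CLUSTER_ID);
        return new JobID(clusterId.hashCode(), 0L);
    }

    public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.setString(HighAvailabilityOptions.HA_CLUSTER_ID.key(), "my-dinky-task"); // fixed task name
        System.out.println(effectiveJobId(conf)); // identical on every "submission"
        System.out.println(effectiveJobId(conf));
    }
}
```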

When Dinky submits a task to Kubernetes, the cluster id is the fixed task name, so the same task produces the same jobId on every execution.

When results are retrieved from the HistoryServer, jobIds that have already been fetched are not fetched again, so the results of newly submitted executions can never be retrieved.
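A simplified sketch of the "skip if already fetched" behavior described above; the class and method names here are illustrative, not Dinky's or Flink's actual code. When the fetch cache is keyed on jobId alone, a re-submission that reuses the old jobId is treated as already processed, and its new result is never picked up.

```java
// Illustrative only: a fetcher that caches by jobId ignores re-submissions
// that happen to reuse the same jobId.
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class HistoryServerFetchSketch {
    private final Set<String> fetchedJobIds = new HashSet<>();
    private final Map<String, String> jobStatusById = new HashMap<>();

    /** Pretend to pull one archived job from the HistoryServer. */
    void fetchArchive(String jobId, String archivedStatus) {
        if (!fetchedJobIds.add(jobId)) {
            return; // jobId already cached -> the new execution's archive is skipped
        }
        jobStatusById.put(jobId, archivedStatus);
    }

    public static void main(String[] args) {
        HistoryServerFetchSketch fetcher = new HistoryServerFetchSketch();
        fetcher.fetchArchive("a1b2c3", "CANCELED"); // first execution, archived
        fetcher.fetchArchive("a1b2c3", "FINISHED"); // re-submission reuses the jobId -> ignored
        System.out.println(fetcher.jobStatusById);  // {a1b2c3=CANCELED}
    }
}
```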

In addition, Dinky's JobRefreshHandler overwrites the information of currently running tasks with the failed/canceled state stored in the HistoryServer under the same jobId.

What you expected to happen

The jobId should change for each new instance of the submitted task.
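One possible workaround sketch, based on the option already mentioned above: explicitly set PipelineOptionsInternal.PIPELINE_FIXED_JOB_ID ($internal.pipeline.job-id) to a freshly generated JobID before every submission, so HA no longer derives it from the fixed cluster id. Where exactly Dinky would inject this into the effective configuration is an assumption, not a confirmed fix.

```java
// Sketch of the workaround: generate a new random JobID per submission and pin
// it via $internal.pipeline.job-id, overriding the cluster-id-based derivation.
import org.apache.flink.api.common.JobID;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.configuration.PipelineOptionsInternal;

public class PerSubmissionJobId {

    /** Assign a new, random job id for this submission (hypothetical hook). */
    static void assignFreshJobId(Configuration effectiveConfig) {
        effectiveConfig.setString(
                PipelineOptionsInternal.PIPELINE_FIXED_JOB_ID, new JobID().toHexString());
    }

    public static void main(String[] args) {
        Configuration conf = new Configuration();
        assignFreshJobId(conf);
        // Prints a different hex string on every run, even with the same cluster id.
        System.out.println(conf.getString(PipelineOptionsInternal.PIPELINE_FIXED_JOB_ID));
    }
}
```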

How to reproduce

  1. Use a Flink Kubernetes cluster, enable high availability in the cluster configuration, and set jobmanager.archive.fs.dir to the archive directory that the HistoryServer reads from (see the configuration sketch at the end of this section).
  2. Launch the HistoryServer and verify that it is running normally.
  3. Submit the task in Kubernetes application mode and observe that the jobId for the same task remains the same on every submission.

Resulting behavior:

  1. The job information retrieved from the HistoryServer is not updated when each execution completes.
  2. In the Operations Center, all task statuses change to the failed or canceled state recorded in the HistoryServer, which triggers alerts even though the tasks are running normally, and refreshing does not help. Disabling the HistoryServer and refreshing again restores the correct statuses.
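For reference, a sketch of the kind of configuration meant in step 1, with placeholder directory values; the exact keys and values depend on the Flink version and on how Dinky passes cluster configuration.

```java
// Placeholder reproduction config: Kubernetes HA plus a shared archive directory
// that both the JobManager and the HistoryServer point at.
import org.apache.flink.configuration.Configuration;

public class ReproClusterConfig {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.setString("high-availability", "kubernetes");          // enable Kubernetes HA
        conf.setString("kubernetes.cluster-id", "my-dinky-task");   // Dinky uses the fixed task name
        conf.setString("jobmanager.archive.fs.dir", "hdfs:///flink/completed-jobs");      // JobManager archives here
        conf.setString("historyserver.archive.fs.dirs", "hdfs:///flink/completed-jobs");  // HistoryServer reads here
        System.out.println(conf);
    }
}
```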

Anything else

No response

Version

1.2.0

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

  • I agree to follow this project's Code of Conduct

@jiangwwwei added the Bug (Something isn't working) and Waiting for reply labels on Dec 24, 2024

Hello @jiangwwwei, this issue is about K8S, so I assign it to @gaoyan1998 and @zackyoungh. If you have any questions, you can comment and reply.



Hello @jiangwwwei, this issue is about CDC/CDCSOURCE, so I assign it to @aiwenmo. If you have any questions, you can comment and reply.

