Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

系统任务执行状态自监控能力支持 #7457

Open
normal-wls opened this issue May 15, 2024 · 3 comments · May be fixed by #7516
Open

系统任务执行状态自监控能力支持 #7457

normal-wls opened this issue May 15, 2024 · 3 comments · May be fixed by #7516
Assignees
Labels

Comments

@normal-wls
Copy link
Member

  1. 梳理需要监控的指标
  2. 指标数据获取方案(获取方案,性能评估)
  3. 开发方案(开发结构设计、sops/engine)
  4. 产品前端支持(需要先沟通)
@lTimej
Copy link
Collaborator

lTimej commented May 20, 2024

监控指标:
    1.执行中失败的任务
    2.执行中的任务
    3.schedule轮训次数较多的数据

@lTimej
Copy link
Collaborator

lTimej commented May 31, 2024

执行失败任务sql

EXPLAIN SELECT
	tt.id,cp.name, pp.name
FROM
	`taskflow3_taskflowinstance` AS tt,
	`core_project` AS cp,
	`pipeline_pipelineinstance` AS pp,
	`eri_state` AS es
WHERE
		pp.instance_id = es.root_id 
		AND tt.pipeline_instance_id=pp.id
		AND tt.project_id=cp.id
		AND pp.is_deleted = 0 
		AND pp.is_expired = 0 
		AND pp.is_finished = 0 
		AND pp.is_revoked = 0 
		AND pp.is_started = 1 
		AND es.NAME = "FAILED" 
ORDER BY
pp.id DESC

执行失败任务ORM

        sql = """
            EXPLAIN SELECT
	tt.id,cp.name, pp.name
FROM
	`taskflow3_taskflowinstance` AS tt,
	`core_project` AS cp,
	`pipeline_pipelineinstance` AS pp,
	`eri_state` AS es
WHERE
		pp.instance_id = es.root_id 
		AND tt.pipeline_instance_id=pp.id
		AND tt.project_id=cp.id
		AND pp.is_deleted = 0 
		AND pp.is_expired = 0 
		AND pp.is_finished = 0 
		AND pp.is_revoked = 0 
		AND pp.is_started = 1 
		AND es.NAME = "FAILED" 
ORDER BY
pp.id DESC
        """
        with connection.cursor() as cursor:
            cursor.execute(sql)
            results = cursor.fetchall()
            print(results)

执行中任务sql

explain SELECT
	es.root_id,
	COUNT( es.root_id ) AS total_count,
	SUM( CASE WHEN es.NAME = 'Failed' THEN 1 ELSE 0 END ) AS failed_count,
	SUM( CASE WHEN es.NAME = 'FINISHED' THEN 1 ELSE 0 END ) AS finished_count 
FROM
	pipeline_pipelineinstance AS pp
	INNER JOIN eri_state AS es ON pp.instance_id = es.root_id 
WHERE
	pp.is_started = 1 
	AND pp.is_finished = 0 
	AND pp.is_revoked = 0 
	AND pp.is_expired = 0 
	AND is_deleted = 0 
GROUP BY
	es.root_id
HAVING
	failed_count = 0 
	AND finished_count != total_count 
ORDER BY
	pp.id
limit 1000

执行中任务ORM

sql = """
            SELECT
                es.root_id,
                COUNT( es.root_id ) AS total_count,
                SUM( CASE WHEN es.NAME = 'Failed' THEN 1 ELSE 0 END ) AS failed_count,
                SUM( CASE WHEN es.NAME = 'FINISHED' THEN 1 ELSE 0 END ) AS finished_count 
            FROM
                pipeline_pipelineinstance AS pp
                INNER JOIN eri_state AS es ON pp.instance_id = es.root_id 
            WHERE
                pp.is_started = 1 
                AND pp.is_finished = 0 
                AND pp.is_revoked = 0 
                AND pp.is_expired = 0 
                AND is_deleted = 0 
            GROUP BY
                es.root_id
            HAVING
                failed_count = 0 
                AND finished_count != total_count 
            ORDER BY
                pp.id
            limit 1000
        """
        with connection.cursor() as cursor:
            cursor.execute(sql)
            results = cursor.fetchall()
            print(results)

schedule轮训次数

EXPLAIN SELECT
                pp.creator,
                pp.id,
                esc.schedule_times 
            FROM
                eri_schedule AS esc,
                eri_state AS es,
                pipeline_pipelineinstance AS pp 
            WHERE
                esc.node_id = es.node_id 
                AND es.root_id = pp.instance_id 
                AND esc.scheduling = 0 
            ORDER BY
                esc.schedule_times DESC
            LIMIT 1000;
sql = """
            EXPLAIN SELECT
                pp.creator,
                pp.id,
                esc.schedule_times 
            FROM
                eri_schedule AS esc,
                eri_state AS es,
                pipeline_pipelineinstance AS pp 
            WHERE
                esc.node_id = es.node_id 
                AND es.root_id = pp.instance_id 
                AND esc.scheduling = 0 
            ORDER BY
                esc.schedule_times DESC
            LIMIT 1000;
        """
        with connection.cursor() as cursor:
            cursor.execute(sql)
            results = cursor.fetchall()
            print(results)

@normal-wls
Copy link
Member Author

补充:

  1. sql 性能优化:可以考虑拆分查询,减少联表
  2. mq 数据获取:通过 client 获取 mq 数据,并提供查询接口
  3. 聚合数据接口:某段时间(1天/3天)正在执行/失败任务数聚合

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
2 participants