Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(ava/advisor): init new advisor pipeline & built-in modules to plugin-based architecture #784

Open
wants to merge 6 commits into
base: refactor-advisor-pipeline
Choose a base branch
from

Conversation

chenluli
Copy link
Member

@chenluli chenluli commented Jun 11, 2024

advisor pipeline 重构为插件式的结构:

  1. Pipeline默认由4个阶段组成(数据处理, 图表类型推荐, 图表视觉通道映射, G2 spec 生成),每个阶段对应一个 Component 来处理
  2. 每个 component 的实际能力由 plugins 实现,通过自定义注册 plugins,实现自定义的对应环节的数据处理或推荐逻辑。每个 component 中的多个 plugins 并行执行,在 afterPluginsExecute hook 中处理多个插件的结果。
  3. 内部原函数改写为内置插件形式
    期望通过这样的处理,可以解耦原 adise 函数,灵活选择性地使用和定制图表推荐的部分环节。该 PR 是第一部分,初始化整体框架,拆分出 encode 和 spec generate 环节见后续 PR: WIP: refactor(ava/advisor): separate visual encodingfrom spec generator module #785
Screenshot 2024-06-17 at 17 27 35

refactor advisor pipeline:

  1. split the default Pipeline into four stage: data analyze, chart type recommend, encode, spec generate. Each stage is processed by a Component
  2. Component could have multiple plugins, these plugins are executed in parallel. The outputs of the plugins could be handled in afterPluginsExecute hook.
  3. The original functions that for processing data, getting chart type spec and etc. are converted to preset plugins.

@chenluli chenluli force-pushed the dev-refactor-pipeline1 branch 2 times, most recently from 6b28516 to 716ab18 Compare June 17, 2024 08:19
@chenluli chenluli changed the title WIP: refactor(ava/advisor): init new advisor pipeline & built-in modules to plugin-based architecture refactor(ava/advisor): init new advisor pipeline & built-in modules to plugin-based architecture Jun 17, 2024
@chenluli chenluli requested review from BBSQQ, Yzing and hustcc June 17, 2024 09:32
@@ -33,6 +38,10 @@ export function stackedAreaChart(data: Data, dataProps: BasicDataPropertyForAdvi
x: field4X.name,
y: field4Y.name,
color: field4Series.name,
size: (datum: Datum) => getLineSize(datum, data, { field4Split: field4Series, field4X }),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里 size 处理是因为,在折线图数据只有单点数据时,由于默认 size 太小,一个点看不见,会误以为空白,需要手动加大 size
@hustcc 这个有其他方式么,或者要在 G2 处理吗,否则默认情况下这种离散点展示不太好,用户自己处理有点麻烦
Screenshot 2024-06-19 at 11 18 14

dataProps,
data: filteredData,
chartTypeRecommendations: [{ chartType: 'pie_chart' }],
});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

只看调用。看不出来 .dataAnalyzer.specGeneratoradviceAsnyc 是 pipeline 的关系。对 execute 有疑惑 🤔

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

修改为不从 advisor 直接调用内部环节,而是从 pipeline 的方法上调用,每个 component 可以从 pipeline 上拿到并单独调用,如下示例:

 // 单独使用 pipeline 中的部分环节
    const dataAnalyzer = myChartAdvisor.pipeline.getComponent(PresetComponentName.dataAnalyzer)
 dataAnalyzer.execute({...})
// 之后增加 pipe 方法,dataAnalyzer.pipe(specGenerator).execute() ...

export type AdvisorPipelineContext = {
/** 原始数据 */
data?: Data;
advisor: Advisor;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pipeline 流转过程中 advisor 也是可选的?它一定有吗?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

一定有的

type?: 'async' | 'sync';
execute: (data: Input, context: AdvisorPipelineContext) => Output | Promise<Output>;
/** 判断插件运行的条件 */
condition?: (data?: Input, context?: AdvisorPipelineContext) => boolean | Promise<boolean>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

data 也属于 ctx 是不是不用放第一个 args 呢?上面 exec 同理。

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里的 data 是各环节 component 自己的 input, context 里原来是整个 pipeline 开头的原始数据,不一样
这个问题可能和下面哪个 pipeline context 里是否加上 input 和 output 一起看

}) => {
const chatTypes = Object.keys(chartWIKI);
const list: ScoringResultForChartType[] = chatTypes.map((chartType) => {
return scoreRules(chartType, chartWIKI, dataProps, ruleBase, options);
return scoreRules(chartType, chartWIKI, dataProps, ruleBase, options, advisorContext);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

看到了个之前遗留的不统一的点:chartWIKI 是不是要小驼峰?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

内部的函数我改了,但是之前有 validator 和 trigger 对外的 api 里有 chartWIKI,这个版本目前还不想产生不向下兼容的 break change... 所以对外接口里的还是先不改吧
Screenshot 2024-07-08 at 11 41 58

export type ChartTypeRecommendOutput = { chartTypeRecommendations: ScoringResultForChartType[] };

export type SpecGeneratorInput = {
// todo 实际上不应该需要 score 信息
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

所以 score 是不是 for 中间结果评测的?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

是的, score 模块不应该影响 spec 生成的环节,这个后面还要再拆

export type PipelineStageType = 'dataAnalyze' | 'chartTypeRecommend' | 'encode' | 'specGenerate';

/** 基础插件接口定义 */
export interface AdvisorPluginType<Input = any, Output = any> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

我有一个大胆的想法,如果 advice pipeline 是确定的,有几个 component 也是确定的。这条 pipeline 是不是都属于一个 ctx,那这条链路上的 input 和 output 其实都是从当前的 ctx 中获得,而不用区分?也就是说这里的 Input 和 Output 其实都是 ctx。况且看下面 stage 还有多个的情况。

Copy link
Member Author

@chenluli chenluli Jul 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

意思是说,pipeline context 里加上 input 和 output 属性么,这两个属性根据当前执行的环节进行更新?
每个环节的 input 和 output 可以挂到 ctx 上,不过前置环节的应该都需要挂上去,不能只挂当前环节的🤔️

每个 component 和 plugin 用类型约束,是想说用户在自定义自己的 plugin 时,能够明确需要的类型,都放 pipeline 感觉类型不好约束了?


components.forEach((component) => {
if (!component) return;
this.componentsManager.tapPromise(component.name, async (previousResult) => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

看起来 type?: 'async' | 'sync'; 并没有区分处理?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pipeline 这一层还没有区分,type?: 'async' | 'sync' 是 plugin 的定义,在 components 处理 plugin 执行时区分的。目前 pipeline 这一层抽象想的是串行执行各环节(里面的 components 只是个数组),暂不做编排工作?
Screenshot 2024-07-08 at 17 48 54

@@ -1,7 +1,7 @@
import { Info, RuleModule } from '../ruler';

import type { Specification } from '../../common/types';
import type { ScoringResultForRule, Lint } from '../types';
import type { ScoringResultForRule, Lint, AdvisorPipelineContext } from '../types';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lint 和 advice 现在的关系是什么样的?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lint 之前是在 spec generator 环节被内置调用的,保持原用法不变的话,它还是归属在 advise 的一环
另外,lint 也可看作一个独立的 pipeline,可以单独调用

pipelineComponent.registerPlugin(plugin);
return;
}
if (size(plugin.stage) === 1) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

所以当前还没有多 stage?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

我先拿掉了 stage 的概念,plugin 挂到 component 上,各 component 直接在 pipeline 中串行执行,好像 stage 不是必须的🤔️

// todo 内置的 visualEncode 和 spec generate 插件需要明确支持哪些图表类型
export const specGeneratorPlugin: AdvisorPluginType<SpecGeneratorInput, SpecGeneratorOutput> = {
name: 'defaultSpecGenerator',
stage: ['specGenerate'],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

如果注册多个 plugin 作用于同一个 stage 它的表现会如何,顺序是怎么管理的?
问这个问题是看到下面操作生成 spec 其实也是有一个生成 or 操作顺序的,那么其实可以拆多个 plugin 实现,实现业务自定义 theme 之类的需求

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

之前想的是,一个 component 里的 plugin 都是并行的(输入和输出都被约束为一致的),所以没有声明它的执行顺序。
如果需要所说的有一个顺序执行的,感觉可以后续扩展 componet 的定义,如上面的框图所示, component 有点是执行管理器的概念,可以是用扩展定义其类型和嵌套来实现?
Screenshot 2024-07-08 at 18 01 06


constructor(
config: AdvisorConfig = {},
custom: {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

当前仅 constructor 输入 custom 不知道是否支持后续 实例动态注册?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

需要的,后续打算在 advisor 上增加 updateCkb, updateRuleBase, resigterComponents 等 api

/** extra info to pass through the pipeline
* 额外透传到推荐 pipeline 中的业务信息
*/
extra?: Record<string, any>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

同理,extra 直接放 ctx 上?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

是放在 context 上的,这里是 constructor 时进行初始化传参

return Promise.resolve(this.execute(params));
}
const pluginsOutput = {};
return Promise.all(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

promise.all 是并行执行的,上面的 forEach 是顺序调用的,不知道对结果有没有什么影响?🤔

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

上面是同步执行的,所以直接 forEach 了,这里异步不考虑顺序关系,如上评论,后续 component 若需要扩展出串行类型,再修改这里的执行逻辑

}

private initDefaultComponents() {
this.dataAnalyzer = new BaseComponent('data', { plugins: [dataProcessorPlugin], context: this.context });
this.chartTypeRecommender = new BaseComponent('chartType', {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

encode 和 generate 都是以动词结尾的,这个 chartType 是一整个名次,是不是叫 category 之类的包含 “分类” 含义好一些?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

已修改,收在 enum PresetComponentName 下

/** 内置默认模块名 */
export enum PresetComponentName {
  dataAnalyzer,
  chartTypeRecommender,
  chartEncoder,
  specGenerator,
}

@chenluli chenluli marked this pull request as ready for review July 8, 2024 11:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants