Feature: Handle dirty data #1

Open
zipper-meng opened this issue Jun 21, 2023 · 0 comments

https://github.com/alibaba/DataX/blob/master/dataxPluginDev.md#脏数据处理

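Dirty data handling

What is dirty data?

There are currently three main kinds of dirty data:

  1. The Reader reads an unsupported type or an invalid value.
  2. An unsupported type conversion, for example converting Bytes to Date.
  3. A write to the target fails, for example an integer that exceeds the column length when writing to MySQL.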

How to handle dirty data

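In Reader.Task and Writer.Task, you can call AbstractTaskPlugin.getTaskPluginCollector() to obtain a TaskPluginCollector, which provides a family of collectDirtyRecord methods. When dirty data appears, call the appropriate collectDirtyRecord method and pass in the Record that is considered dirty.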
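The collection pattern described above might look roughly like the following sketch. It does not use the real DataX classes: `DirtyRecordCollector`, `InMemoryCollector`, and `IntWriterTask` are hypothetical stand-ins, records are simplified to strings, and only the `collectDirtyRecord` method name is taken from the text. The exact overloads on DataX's `TaskPluginCollector` should be checked against the DataX source.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for DataX's TaskPluginCollector; only the
// collectDirtyRecord name comes from the text above.
interface DirtyRecordCollector {
    void collectDirtyRecord(String record, Throwable cause);
}

// Simple collector that remembers every reported record in memory.
class InMemoryCollector implements DirtyRecordCollector {
    final List<String> dirty = new ArrayList<>();

    @Override
    public void collectDirtyRecord(String record, Throwable cause) {
        dirty.add(record + " (" + cause.getMessage() + ")");
    }
}

// Hypothetical writer task: treats each record as an integer column value.
class IntWriterTask {
    private final DirtyRecordCollector collector;
    final List<Integer> written = new ArrayList<>();

    IntWriterTask(DirtyRecordCollector collector) {
        this.collector = collector;
    }

    void write(List<String> records) {
        for (String record : records) {
            try {
                // Stands in for the real write to the target system.
                written.add(Integer.parseInt(record));
            } catch (NumberFormatException e) {
                // Report the bad record and continue with the next one;
                // deciding whether to abort is left to the caller.
                collector.collectDirtyRecord(record, e);
            }
        }
    }
}
```

The task never decides on its own to stop. It reports each bad record and moves on, which matches the division of labour where the plugin collects and the framework enforces limits.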

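Users can set a dirty data limit in the job configuration, either as a record count or as a percentage. When dirty data exceeds the limit, the framework ends the synchronization job and exits. A plugin only needs to make sure that every dirty record is collected; the framework takes care of the rest.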
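To make the count and percentage limits concrete, here is a minimal sketch of such a check. `DirtyLimit` and its fields are hypothetical names; this is not DataX's actual implementation, and the convention that a negative value disables a limit is an assumption made for the example.

```java
// Hypothetical limit checker illustrating a record-count limit and a
// percentage limit; not taken from the DataX code base.
class DirtyLimit {
    private final long maxRecords;       // assumed: negative disables the count limit
    private final double maxPercentage;  // fraction in [0, 1]; assumed: negative disables it

    DirtyLimit(long maxRecords, double maxPercentage) {
        this.maxRecords = maxRecords;
        this.maxPercentage = maxPercentage;
    }

    // Returns true when either configured limit has been exceeded.
    boolean exceeded(long totalRecords, long dirtyRecords) {
        if (maxRecords >= 0 && dirtyRecords > maxRecords) {
            return true;
        }
        if (maxPercentage >= 0 && totalRecords > 0) {
            double ratio = (double) dirtyRecords / totalRecords;
            return ratio > maxPercentage;
        }
        return false;
    }
}
```

A framework along these lines would evaluate `exceeded` as records are collected and end the job once it returns true, so the plugin itself never needs to track the totals.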

@zipper-meng zipper-meng self-assigned this Jun 21, 2023
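@zipper-meng zipper-meng changed the title from "Feature: Handle dirty data" to "Feature request: Handle dirty data" Jun 21, 2023
@zipper-meng zipper-meng changed the title from "Feature request: Handle dirty data" to "Feature: Handle dirty data" Jun 21, 2023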