[Feature][chunjun-core] Supports capturing dirty data from the source and when the source sends it downstream #1901 #1902

Merged
merged 1 commit into from
Jul 10, 2024


david-gao1
Contributor

… and when the source sends it downstream #1901

Purpose of this pull request

Which issue you fix

Fixes # (issue).

Checklist:

  • I have executed the 'mvn spotless:apply' command to format my code.
  • I have a meaningful commit message (including the issue id, the template of commit message is '[label-type-#issue-id][fixed-module] a meaningful commit message.')
  • I have performed a self-review of my own code.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have made corresponding changes to the documentation.
  • I have added tests that prove my fix is effective or that my feature works.
  • New and existing unit tests pass locally with my changes.
  • I have checked my code and corrected any misspellings.
  • My commit is only one. (If there are multiple commits, you can use 'git squash' to compress multiple commits into one.)

@github-actions github-actions bot added the CORE label Jun 25, 2024
@david-gao1
Contributor Author

david-gao1 commented Jul 9, 2024

Steps to reproduce the problem:

CREATE TABLE source
(
  `ID` int,
  `FloatColumn` string,
  `BinaryColumn` bytes,
  `VarBinaryColumn` bytes,
  `LongBinaryColumn` bytes
) WITH (
      'connector' = 'xxx-x',


      );
CREATE TABLE sink
(
  `ID` int,
  `FloatColumn` int,
  `BinaryColumn` string,
  `VarBinaryColumn` string,
  `LongBinaryColumn` string
) WITH (
      'connector' = 'stream-x'
      );
insert into sink 
select 
`ID` as `ID`,
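CAST(`FloatColumn` AS int)  as `FloatColumn`, -- e.g. if the source emits a dirty record such as 111aa here, an error is raised when the record is sent to the downstream operator, but the dirty record cannot be captured at that point, so the dirty-data manager cannot do its job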
CAST(`BinaryColumn` AS string)  as `BinaryColumn`,
CAST(`VarBinaryColumn` AS string)  as `VarBinaryColumn`,
CAST(`LongBinaryColumn` AS string)  as `LongBinaryColumn`
 from source ;

@david-gao1
Contributor Author

david-gao1 commented Jul 9, 2024

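This feature mainly strengthens dirty-data capture: it can now capture dirty data produced at the source, as well as dirty data that triggers an error when the source sends a record to the downstream operator. This reduces the need to remove dirty records by hand.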
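
To illustrate the idea, here is a minimal sketch of capturing a record that fails while being handed downstream. It is not the code from this PR: `DirtyCaptureSketch`, `CapturingEmitter`, and `DirtyRecord` are hypothetical names, not ChunJun or Flink APIs, and `Integer.parseInt` is only a stand-in for the failing `CAST(... AS int)`. It assumes the downstream step runs inside the source's emit call, so its exception surfaces there.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: none of these names are ChunJun or Flink APIs.
public class DirtyCaptureSketch {

    /** A rejected record plus the reason it was rejected. */
    public static final class DirtyRecord {
        public final String raw;
        public final String reason;

        public DirtyRecord(String raw, String reason) {
            this.raw = raw;
            this.reason = reason;
        }
    }

    /** Wraps the downstream hand-off so a failing record is captured, not fatal. */
    public static final class CapturingEmitter {
        private final Consumer<String> downstream;
        private final List<DirtyRecord> dirty = new ArrayList<>();

        public CapturingEmitter(Consumer<String> downstream) {
            this.downstream = downstream;
        }

        public void emit(String record) {
            try {
                downstream.accept(record);
            } catch (RuntimeException e) {
                // Assumption: any runtime failure downstream marks the record as dirty.
                dirty.add(new DirtyRecord(record, String.valueOf(e.getMessage())));
            }
        }

        public List<DirtyRecord> dirtyRecords() {
            return dirty;
        }
    }

    public static void main(String[] args) {
        List<Integer> sink = new ArrayList<>();
        // Stand-in for CAST(`FloatColumn` AS int): parseInt throws on "111aa".
        CapturingEmitter emitter = new CapturingEmitter(s -> sink.add(Integer.parseInt(s)));
        for (String value : new String[] {"111", "111aa", "42"}) {
            emitter.emit(value);
        }
        System.out.println("delivered=" + sink.size() + ", dirty=" + emitter.dirtyRecords().size());
    }
}
```

In this sketch the bad value `111aa` ends up in `dirtyRecords()` while the valid rows still reach the sink. A real implementation would presumably hand the captured record to the dirty-data manager and apply its configured error threshold instead of keeping an in-memory list.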

@zoudaokoulife zoudaokoulife merged commit 1028750 into DTStack:master Jul 10, 2024
2 of 3 checks passed
2 participants