
feat: refactor kafka source API and enhance virtual sources resolving logic #37

Merged

Conversation


@gabb1er gabb1er commented Apr 16, 2024

  • Added support for the binary format for the key and value. The Kafka message key or value is read as is, without casting to other types, so it is up to the user to use the virtual sources/streams functionality to cast the column to a type suitable for data quality checks.
  • Added a boolean flag that enables schema ID subtraction from the Kafka value: when a schema registry is used, the schema ID is embedded into the Kafka value (magic byte + 4 bytes of schema ID). Therefore, in order to parse the value, this prefix must first be stripped from it (see the sketch after this list).
  • Changed the Avro schema API by adding a boolean flag that enables or disables default value checks.
  • Refactored the logic for resolving virtual sources: they are now resolved not in declaration order but with respect to their parent dependencies (see the dependency-resolution sketch below).
  • Added tests to verify the virtual source resolving logic.
  • Updated the documentation with respect to the API changes.
  • Updated the configuration API version to 1.5.
  • Fixes:
    • Updated the SQLite dependency version: security patch [VDB-248999].
    • Fixed JoinVirtualSourceReader by adding aliases to the dataframes being joined, to avoid ambiguous column references in the resulting dataframe (see the aliasing sketch below).
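
For illustration only, here is a minimal Spark sketch of what the schema ID subtraction amounts to, assuming the value is read as a binary column in the Confluent wire format. The broker address, topic, and column names are placeholders; this is not the project's actual implementation.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.expr

// Confluent wire format: 1 magic byte + 4 bytes of schema ID, then the Avro payload.
// To parse the value, that 5-byte prefix has to be removed first.
val spark = SparkSession.builder()
  .appName("schema-id-strip-sketch")
  .master("local[*]")
  .getOrCreate()

val kafkaDf = spark.read
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092") // illustrative broker address
  .option("subscribe", "some-topic")                    // illustrative topic name
  .load()

// `value` is a binary column; keep everything after the first 5 bytes.
val stripped = kafkaDf.withColumn(
  "avro_payload",
  expr("substring(value, 6, length(value) - 5)")
)
```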

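The dependency-aware resolution order can be pictured as a simple iterative topological sort: a virtual source is resolved only once all of its parents are available. The sketch below is an illustrative reimplementation of that idea; the VirtualSourceDef case class and all names are invented for the example and are not the project's API.

```scala
// Hypothetical model of a virtual source definition: an id plus the ids of its parent sources.
case class VirtualSourceDef(id: String, parents: Seq[String])

def resolveOrder(defs: Seq[VirtualSourceDef], baseSources: Set[String]): Seq[VirtualSourceDef] = {
  val remaining = scala.collection.mutable.ListBuffer(defs: _*)
  val resolved  = scala.collection.mutable.LinkedHashSet(baseSources.toSeq: _*)
  val ordered   = scala.collection.mutable.ListBuffer.empty[VirtualSourceDef]

  while (remaining.nonEmpty) {
    // A virtual source is ready once all of its parents are already resolved.
    val ready = remaining.filter(_.parents.forall(resolved.contains))
    if (ready.isEmpty)
      throw new IllegalStateException(
        s"Cannot resolve virtual sources: ${remaining.map(_.id).mkString(", ")} (missing or cyclic parents)")
    ready.foreach { vs =>
      ordered   += vs
      resolved  += vs.id
      remaining -= vs
    }
  }
  ordered.toList
}
```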
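The JoinVirtualSourceReader fix boils down to aliasing each side of the join so that columns shared by both inputs can be referenced without ambiguity. Below is a minimal Spark illustration with hypothetical dataframes, not the reader's actual code.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder()
  .appName("join-alias-sketch")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Two hypothetical inputs that share a column name ("id").
val ordersDf    = Seq((1, "book"), (2, "pen")).toDF("id", "item")
val customersDf = Seq((1, "Alice"), (2, "Bob")).toDF("id", "name")

// Without aliases, referencing "id" after the join would be ambiguous.
// Aliasing each side makes every column addressable unambiguously.
val joined = ordersDf.as("left")
  .join(customersDf.as("right"), col("left.id") === col("right.id"), "inner")

joined.select(col("left.id"), col("right.name"), col("left.item")).show()
```
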
@gabb1er gabb1er merged commit 8ea9c23 into Raiffeisen-DGTL:main Apr 16, 2024
8 checks passed
cibaa-team-user pushed a commit that referenced this pull request Apr 16, 2024
# [1.6.0](v1.5.0...v1.6.0) (2024-04-16)

### Features

* refactor kafka source API and enhance virtual sources resolving logic ([#37](#37)) ([8ea9c23](8ea9c23))
@cibaa-team-user
Collaborator

🎉 This PR is included in version 1.6.0 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀
