feat: adding functionality to run quality checks over streaming sources #19

Merged

@gabb1er gabb1er commented Dec 8, 2023

  • Changed the core functionality of the framework to support streaming processing: added separate metric processor
    implementations for static and streaming data sources.
  • Added a streaming DQ job and the corresponding builders in the DQ context.
  • Refactored error accumulators to work with both batch and streaming processing.
  • Changed the source configuration model so that sources can be read as streaming dataframes.
    Sources that are not streamable are marked accordingly.
  • Added a trait with a loadDataStream method to mix into connections that support reading data as a stream.
  • Modified the Kafka connection to implement the loadDataStream method.
  • Modified source readers to provide a readStream method for sources.
  • Added functionality to read streaming virtual sources, limited to those virtual source types that can be built over streaming sources.
    Virtual source readers are updated accordingly.
  • Extended the application configuration with streaming settings.
  • Updated the documentation to describe running quality checks over streaming sources.
  • In addition, logging has been revised: Checkita uses the Log4j2 library. For earlier versions of Spark (which work with Log4j 1.x),
    the Log4j2 dependency is added explicitly, along with the log4j2 -> slf4j bridge.
  • Other minor fixes and changes related to the implementation of streaming processing.
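
For readers unfamiliar with the mix-in approach mentioned above, here is a minimal sketch of how a streaming capability can be added to selected connections only. It is not the code from this PR: apart from the loadDataStream and readStream method names, every identifier (Frame, Connection, Streamable, KafkaLikeConnection, FileLikeConnection, StreamReader) is hypothetical, and a plain case class stands in for a Spark dataframe so the example runs without Spark.

```scala
// Hypothetical stand-in for a Spark DataFrame; the real framework reads Spark dataframes.
final case class Frame(rows: Seq[String], isStreaming: Boolean)

// Hypothetical base contract: every connection supports a batch read.
trait Connection {
  def id: String
  def loadData(topic: String): Frame
}

// Mix-in for connections that can also be read as a stream.
// The self-type ensures it can only be mixed into a Connection.
trait Streamable { self: Connection =>
  def loadDataStream(topic: String): Frame
}

// A Kafka-like connection supports both batch and streaming reads.
final class KafkaLikeConnection(val id: String) extends Connection with Streamable {
  def loadData(topic: String): Frame =
    Frame(Seq(s"$topic-batch"), isStreaming = false)
  def loadDataStream(topic: String): Frame =
    Frame(Seq(s"$topic-stream"), isStreaming = true)
}

// A file-like connection is batch-only: it does not mix in Streamable.
final class FileLikeConnection(val id: String) extends Connection {
  def loadData(topic: String): Frame =
    Frame(Seq(s"$topic-file"), isStreaming = false)
}

object StreamReader {
  // Returns the stream for streamable connections and an error message otherwise.
  def readStream(conn: Connection, topic: String): Either[String, Frame] =
    conn match {
      case s: Streamable => Right(s.loadDataStream(topic))
      case other         => Left(s"Connection '${other.id}' does not support streaming reads")
    }
}
```

With this layout, a reader can decide at runtime whether a source is streamable by checking for the mix-in, which is one way to have non-streamable sources "marked accordingly" without changing the batch API.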

- changed the core functionality of the framework to support streaming processing
- added separate implementations of the metric processor for static and streaming data sources
- added a streaming DQ job and the corresponding builders in the DQ context
- refactored error accumulators
- added functionality to read streaming virtual sources
- other changes related to the implementation of streaming processing, plus minor bug fixes
- documentation updates describing how to run quality checks over streaming sources
- in addition, logging has been revised: Checkita uses the Log4j2 library. For earlier versions of Spark (which work with Log4j 1.x), the Log4j2 dependency is added explicitly, along with the log4j2 -> slf4j bridge.
@gabb1er gabb1er merged commit 0895fed into Raiffeisen-DGTL:main Dec 8, 2023
8 checks passed
cibaa-team-user pushed a commit that referenced this pull request Dec 8, 2023
# [1.1.0](v1.0.1...v1.1.0) (2023-12-08)

### Features

* added new type connections ([#17](#17)) ([205cbc8](205cbc8))
* adding functionality to run quality checks over streaming sources ([#19](#19)) ([0895fed](0895fed))
* adding streaming sources and stream readers ([#16](#16)) ([b5398fa](b5398fa))
@cibaa-team-user

🎉 This PR is included in version 1.1.0 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀
