ScanFile expression and visitor for CDF (#546)
## What changes are proposed in this pull request?

This PR introduces four concepts:

- `cdf_scan_row_schema`: The schema that engine data is transformed into at the end of the log replay phase. It prunes the log schema down to only the fields needed to produce CDF columns.
- `cdf_scan_row_expression`: A function that generates an expression to transform engine data into the `cdf_scan_row_schema`. The function takes a timestamp and a commit version as arguments because it inserts them as columns into the output engine data.
- `CDFScanFile`: A type that holds all the information needed to read a data file and generate its CDF rows: the path, the deletion vector, the type of action, and the paired remove deletion vector. The action type is encoded as the enum `CDFScanFileType`.
- `CDFScanFileVisitor`: A visitor that reads engine data with the `cdf_scan_row_schema` and constructs `CDFScanFile`s.

This PR is for internal use only and is expected to be used only by `TableChangesScan::execute` once that is implemented. Engines must *not* use the visitor or `CDFScanFile`.

## How was this change tested?

I generated a table with add, remove, and cdc actions. Then:

- The table is read.
- The engine data is transformed using `table_changes_action_iter`, which converts it into the `cdf_scan_row_schema` by applying the `cdf_scan_row_expression`.
- The transformed engine data is read again using the `CDFScanFileVisitor`, and the test asserts that the resulting `CDFScanFile`s are as expected.

This test checks the following cases:

- A remove with `None` partition values. An empty hashmap should be used for the partition values.
- A remove with partition values.
- An add/remove DV pair. The correct remove DV should be placed into the add's `CDFScanFile`.
- The visitor extracts the correct timestamp and commit version for each file.
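The flow described above (prune each action to the CDF-relevant fields, stamp it with its commit's timestamp and version, then visit the rows to build `CDFScanFile`s) can be sketched with plain Rust values standing in for engine data. This is a minimal, hypothetical sketch, not the kernel's implementation: `Action`, `ScanRow`, `DvInfo`, `to_scan_rows`, `visit_scan_rows`, and the `remove_dvs` map (file path to the DV of the paired remove, assumed to be collected earlier during log replay) are illustrative names, and the exact field set of `CDFScanFile` is an assumption based on what the tests check.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a deletion-vector descriptor (the real kernel type differs).
#[derive(Debug, Clone, PartialEq)]
struct DvInfo {
    path: String,
}

/// Which kind of log action a scan file came from.
#[derive(Debug, Clone, Copy, PartialEq)]
enum CDFScanFileType {
    Add,
    Remove,
    Cdc,
}

/// A log action pruned to the fields needed for CDF (hypothetical simplification).
#[derive(Debug, Clone)]
struct Action {
    scan_type: CDFScanFileType,
    path: String,
    dv: Option<DvInfo>,
    partition_values: Option<HashMap<String, String>>,
}

/// An action plus the commit-level columns injected by the scan-row transform.
#[derive(Debug, Clone)]
struct ScanRow {
    action: Action,
    commit_timestamp: i64,
    commit_version: i64,
}

/// Assumed analogue of applying the scan-row expression: every action in one
/// commit is stamped with that commit's timestamp and version as extra columns.
fn to_scan_rows(actions: &[Action], commit_timestamp: i64, commit_version: i64) -> Vec<ScanRow> {
    actions
        .iter()
        .cloned()
        .map(|action| ScanRow { action, commit_timestamp, commit_version })
        .collect()
}

/// Hypothetical field set: what is needed to read one file and emit its CDF rows.
#[derive(Debug, Clone, PartialEq)]
struct CDFScanFile {
    scan_type: CDFScanFileType,
    path: String,
    dv: Option<DvInfo>,
    remove_dv: Option<DvInfo>,
    partition_values: HashMap<String, String>,
    commit_timestamp: i64,
    commit_version: i64,
}

/// Visitor analogue: turns each scan row into a CDFScanFile.
fn visit_scan_rows(rows: &[ScanRow], remove_dvs: &HashMap<String, DvInfo>) -> Vec<CDFScanFile> {
    rows.iter()
        .map(|row| {
            let a = &row.action;
            // Only adds are paired with the DV of a remove for the same path.
            let remove_dv = match a.scan_type {
                CDFScanFileType::Add => remove_dvs.get(&a.path).cloned(),
                _ => None,
            };
            CDFScanFile {
                scan_type: a.scan_type,
                path: a.path.clone(),
                dv: a.dv.clone(),
                remove_dv,
                // Missing partition values become an empty map.
                partition_values: a.partition_values.clone().unwrap_or_default(),
                commit_timestamp: row.commit_timestamp,
                commit_version: row.commit_version,
            }
        })
        .collect()
}
```

In this sketch, carrying the timestamp and version as per-row columns lets the visitor build each `CDFScanFile` from a single row without tracking which commit it is currently in.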