-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(low-code): Fix legacy state migration in SubstreamPartitionRouter #261
Conversation
📝 WalkthroughWalkthroughThe changes focus on improving the Changes
Sequence DiagramsequenceDiagram
participant SR as SubstreamPartitionRouter
participant IS as Initial State
SR->>IS: set_initial_state()
IS-->>SR: Validate state type
alt Valid State
SR->SR: Copy state to parent streams
else Invalid State
SR->SR: Ignore state
end
Looks like you've added some nice defensive programming to the state handling! A couple of quick thoughts:
Would you be interested in exploring that logging addition? It could help with debugging unexpected state scenarios. Cheers! 🚀 ✨ Finishing Touches
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
airbyte_cdk/sources/declarative/partition_routers/substream_partition_router.py (1)
299-304
: LGTM! The state handling improvements look solid.The changes make the state migration more robust by filtering out non-scalar values. The variable rename from
substream_state
tosubstream_state_values
better reflects its purpose.What do you think about adding a debug log when filtering out a state value? This could help with troubleshooting, wdyt? 🤔
substream_state_values = list(stream_state.values()) substream_state = substream_state_values[0] if substream_state_values else {} # Filter out per partition state. Because we pass the state to the parent stream in the format {cursor_field: substream_state} if isinstance(substream_state, (list, dict)): + self.logger.debug(f"Filtering out non-scalar state value: {substream_state}") substream_state = {}
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
airbyte_cdk/sources/declarative/partition_routers/substream_partition_router.py
(1 hunks)unit_tests/sources/declarative/partition_routers/test_substream_partition_router.py
(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (4)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (Fast)
- GitHub Check: Analyze (python)
🔇 Additional comments (1)
unit_tests/sources/declarative/partition_routers/test_substream_partition_router.py (1)
405-512
: Great test coverage! 🎯The test cases thoroughly validate the state migration logic with well-structured parameterized tests. The test IDs and docstring provide clear documentation of what's being tested.
I found this error in CATs for Jira: The parent state was missing, causing the substream partition router to attempt migrating the substream state for the parent stream. However, it should only be migrated if it is in the legacy format. We didn’t encounter errors related to this issue previously because the parent state was updated during the sync. In contrast, with the concurrent per-partition cursor, the parent state is updated at the end of the sync, which created this corner case where the parent state is missing. With the fix, tests are successful. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
APPROVED
Created a DPath Enhancing Extractor Refactored the record enhancement logic - moved to the extracted class Split the tests of DPathExtractor and DPathEnhancingExtractor Fix the failing tests: FAILED unit_tests/sources/declarative/parsers/test_model_to_component_factory.py::test_create_custom_components[test_create_custom_component_with_subcomponent_that_uses_parameters] FAILED unit_tests/sources/declarative/parsers/test_model_to_component_factory.py::test_custom_components_do_not_contain_extra_fields FAILED unit_tests/sources/declarative/parsers/test_model_to_component_factory.py::test_parse_custom_component_fields_if_subcomponent FAILED unit_tests/sources/declarative/parsers/test_model_to_component_factory.py::test_create_page_increment FAILED unit_tests/sources/declarative/parsers/test_model_to_component_factory.py::test_create_offset_increment FAILED unit_tests/sources/file_based/test_file_based_scenarios.py::test_file_based_read[simple_unstructured_scenario] FAILED unit_tests/sources/file_based/test_file_based_scenarios.py::test_file_based_read[no_file_extension_unstructured_scenario] They faile because of comparing string and int values of the page_size (public) attribute. Imposed an invariant: on construction, page_size can be set to a string or int keep only values of one type in page_size for uniform comparison (convert the values of the other type) _page_size holds the internal / working value ... unless manipulated directly. Merged: feat(low-code concurrent): Allow async job low-code streams that are incremental to be run by the concurrent framework (airbytehq#228) fix(low-code): Fix declarative low-code state migration in SubstreamPartitionRouter (airbytehq#267) feat: combine slash command jobs into single job steps (airbytehq#266) feat(low-code): add items and property mappings to dynamic schemas (airbytehq#256) feat: add help response for unrecognized slash commands (airbytehq#264) ci: post direct links to html connector test reports (airbytehq#252) (airbytehq#263) fix(low-code): Fix legacy state migration in SubstreamPartitionRouter (airbytehq#261) fix(airbyte-cdk): Fix RequestOptionsProvider for PerPartitionWithGlobalCursor (airbytehq#254) feat(low-code): add profile assertion flow to oauth authenticator component (airbytehq#236) feat(Low-Code Concurrent CDK): Add ConcurrentPerPartitionCursor (airbytehq#111) fix: don't mypy unit_tests (airbytehq#241) fix: handle backoff_strategies in CompositeErrorHandler (airbytehq#225) feat(concurrent cursor): attempt at clamping datetime (airbytehq#234) fix(airbyte-cdk): Fix RequestOptionsProvider for PerPartitionWithGlobalCursor (airbytehq#254) feat(low-code): add profile assertion flow to oauth authenticator component (airbytehq#236) feat(Low-Code Concurrent CDK): Add ConcurrentPerPartitionCursor (airbytehq#111) fix: don't mypy unit_tests (airbytehq#241) fix: handle backoff_strategies in CompositeErrorHandler (airbytehq#225) feat(concurrent cursor): attempt at clamping datetime (airbytehq#234) ci: use `ubuntu-24.04` explicitly (resolves CI warnings) (airbytehq#244) Fix(sdm): module ref issue in python components import (airbytehq#243) feat(source-declarative-manifest): add support for custom Python components from dynamic text input (airbytehq#174) chore(deps): bump avro from 1.11.3 to 1.12.0 (airbytehq#133) docs: comments on what the `Dockerfile` is for (airbytehq#240) chore: move ruff configuration to dedicated ruff.toml file (airbytehq#237) Fix(sdm): module ref issue in python components import (airbytehq#243) feat(low-code): add DpathFlattenFields (airbytehq#227) feat(source-declarative-manifest): add support for custom Python components from dynamic text input (airbytehq#174) chore(deps): bump avro from 1.11.3 to 1.12.0 (airbytehq#133) docs: comments on what the `Dockerfile` is for (airbytehq#240) chore: move ruff configuration to dedicated ruff.toml file (airbytehq#237)
Summary by CodeRabbit
Bug Fixes
Tests