-
Notifications
You must be signed in to change notification settings - Fork 30
feat(query_properties): Add configured catalog to SimpleRetriever and only fetch properties that are included in the stream schema #788
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…hat are selected in the json_schema
👋 Greetings, Airbyte Team Member!Here are some helpful tips and reminders for your convenience. Testing This CDK VersionYou can test this version of the CDK using the following: # Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@brian/property_chunking_only_fetch_enabled_fields#egg=airbyte-python-cdk[dev]' --help
# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch brian/property_chunking_only_fetch_enabled_fields Helpful ResourcesPR Slash CommandsAirbyte Maintainers can execute the following slash commands on your PR:
|
PyTest Results (Fast)3 813 tests +11 3 801 ✅ +11 6m 16s ⏱️ -27s Results for commit afc67fd. ± Comparison against base commit 20ae208. This pull request removes 1 and adds 12 tests. Note that renamed tests count towards both.
♻️ This comment has been updated with latest results. |
PyTest Results (Full)3 816 tests 3 804 ✅ 11m 14s ⏱️ Results for commit afc67fd. ♻️ This comment has been updated with latest results. |
…exibility to modify selected properties
📝 WalkthroughWalkthroughAdds a JsonSchemaPropertySelector and integrates per-stream configured catalog propagation through ModelToComponentFactory; QueryProperties and PropertyChunking are updated to use selectors to filter requested properties. Tests and factory wiring updated to support selector creation and transformations. Changes
Sequence Diagram(s)sequenceDiagram
participant Source as ConcurrentDeclarativeSource
participant Factory as ModelToComponentFactory
participant QueryProps as QueryProperties
participant Selector as JsonSchemaPropertySelector
participant Transform as Transformation
participant Chunking as PropertyChunking
Note over Source,Factory: Factory is constructed with configured_catalog
Source->>Factory: instantiate(configured_catalog)
Factory->>Factory: build stream_name -> configured_stream map
Factory->>QueryProps: create_query_properties(model, stream_name)
QueryProps->>Factory: request property_selector creation
Factory->>Selector: create_json_schema_property_selector(model, stream_name)
Selector->>Selector: load configured_stream.json_schema.properties
alt transformations present
Selector->>Transform: apply each transformation
Transform-->>Selector: transformed properties
end
Selector-->>QueryProps: property_selector ready
QueryProps->>Selector: select()
Selector-->>QueryProps: configured_properties (Set[str] or None)
QueryProps->>Chunking: get_request_property_chunks(property_fields, always_include, configured_properties)
Chunking-->>QueryProps: property chunks
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Possibly related PRs
Suggested reviewers
Would you like me to add a short checklist of things to verify during review (e.g., stream mapping edge cases, transformation ordering, and performance of chunking with selectors)? wdyt? Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
airbyte_cdk/sources/declarative/requesters/query_properties/property_chunking.py (1)
48-53
: Respect configured property selection when not chunkingWhen
property_limit
isNone
we still return the rawproperty_fields
, so anyconfigured_properties
passed in are ignored. Streams that rely on the selector without chunking will continue to request every field, which defeats the point of this optimization. Could we filter the iterable before building the single chunk so we only send the selected set (plus any always-include fields) in this branch as well? wdyt?- single_property_chunk = list(property_fields) + filtered_fields = [ + field + for field in property_fields + if configured_properties is None or field in configured_properties + ] + single_property_chunk = filtered_fields
🧹 Nitpick comments (4)
unit_tests/sources/declarative/requesters/query_properties/property_selector/test_json_schema_property_selector.py (1)
41-94
: Consider verifying transformation application more explicitly?The test comprehensively covers the three catalog states (transformations, None, empty). However, in the first test case (lines 44-54), we're validating the final output but not explicitly checking that the transformations were applied. For instance, we could verify that "properties_" was indeed removed from "properties_statistics" → "statistics".
This might help catch bugs where transformations are silently failing but the test passes due to coincidental matching. Wdyt?
airbyte_cdk/sources/declarative/requesters/query_properties/query_properties.py (1)
60-63
: Consider simplifying the yield statement?The logic correctly filters fields when
configured_properties
is not None. However, line 61 usesyield from [[field for field in fields if field in configured_properties]]
, which is functionally equivalent toyield [field for field in fields if field in configured_properties]
.The
yield from
with a single-element list is a bit unusual. Would directyield
be clearer? E.g.:- if configured_properties is not None: - yield from [[field for field in fields if field in configured_properties]] - else: - yield list(fields) + if configured_properties is not None: + yield [field for field in fields if field in configured_properties] + else: + yield list(fields)This maintains the same behavior while being slightly more idiomatic. Wdyt?
unit_tests/sources/declarative/parsers/test_model_to_component_factory.py (1)
4566-4571
: Consider testing the property_selector's select() method behavior?The test validates that the
property_selector
is correctly instantiated with the expected transformation. However, it doesn't callselect()
to verify that the transformation is actually applied to the configured stream's properties. Would it be valuable to add an assertion that callsproperty_selector.select()
and verifies the returned property names are transformed as expected (e.g., "properties_field" → "field")? wdyt?airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)
679-679
: Thread configured_catalog through factory - well done!The implementation correctly threads the
configured_catalog
through the factory and builds the stream name mapping. The static method is clean and efficient.One consideration: if the
configured_catalog
contains multiple streams with the same name, the dictionary comprehension (line 817) would silently keep only the last one. While this is probably prevented by platform constraints, would it be worth adding a validation check to fail fast if duplicate stream names are detected? Something like:if configured_catalog: stream_names = [stream.stream.name for stream in configured_catalog.streams] if len(stream_names) != len(set(stream_names)): raise ValueError("Configured catalog contains duplicate stream names") return {stream.stream.name: stream for stream in configured_catalog.streams}This would make debugging easier if the platform constraint is ever violated. Wdyt?
Also applies to: 690-692, 812-820
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (17)
airbyte_cdk/sources/declarative/concurrent_declarative_source.py
(1 hunks)airbyte_cdk/sources/declarative/declarative_component_schema.yaml
(4 hunks)airbyte_cdk/sources/declarative/models/declarative_component_schema.py
(3 hunks)airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
(12 hunks)airbyte_cdk/sources/declarative/requesters/query_properties/json_schema_property_selector.py
(1 hunks)airbyte_cdk/sources/declarative/requesters/query_properties/property_chunking.py
(3 hunks)airbyte_cdk/sources/declarative/requesters/query_properties/property_selector/__init__.py
(1 hunks)airbyte_cdk/sources/declarative/requesters/query_properties/property_selector/json_schema_property_selector.py
(1 hunks)airbyte_cdk/sources/declarative/requesters/query_properties/property_selector/property_selector.py
(1 hunks)airbyte_cdk/sources/declarative/requesters/query_properties/query_properties.py
(3 hunks)airbyte_cdk/sources/declarative/retrievers/simple_retriever.py
(1 hunks)unit_tests/connector_builder/test_property_chunking.py
(1 hunks)unit_tests/sources/declarative/parsers/test_model_to_component_factory.py
(7 hunks)unit_tests/sources/declarative/requesters/query_properties/property_selector/__init__.py
(1 hunks)unit_tests/sources/declarative/requesters/query_properties/property_selector/test_json_schema_property_selector.py
(1 hunks)unit_tests/sources/declarative/requesters/query_properties/test_property_chunking.py
(3 hunks)unit_tests/sources/declarative/requesters/query_properties/test_query_properties.py
(8 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2024-12-11T16:34:46.319Z
Learnt from: pnilan
PR: airbytehq/airbyte-python-cdk#0
File: :0-0
Timestamp: 2024-12-11T16:34:46.319Z
Learning: In the airbytehq/airbyte-python-cdk repository, the `declarative_component_schema.py` file is auto-generated from `declarative_component_schema.yaml` and should be ignored in the recommended reviewing order.
Applied to files:
airbyte_cdk/sources/declarative/models/declarative_component_schema.py
🧬 Code graph analysis (12)
airbyte_cdk/sources/declarative/requesters/query_properties/property_selector/__init__.py (2)
airbyte_cdk/sources/declarative/requesters/query_properties/property_selector/json_schema_property_selector.py (1)
JsonSchemaPropertySelector
(16-52)airbyte_cdk/sources/declarative/requesters/query_properties/property_selector/property_selector.py (1)
PropertySelector
(9-24)
unit_tests/sources/declarative/requesters/query_properties/test_property_chunking.py (4)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (2)
PropertyChunking
(1040-1057)PropertyLimitType
(1035-1037)airbyte_cdk/sources/declarative/requesters/query_properties/property_chunking.py (3)
PropertyChunking
(25-76)PropertyLimitType
(14-21)get_request_property_chunks
(42-73)airbyte_cdk/sources/declarative/requesters/query_properties/strategies/group_by_key.py (1)
GroupByKey
(13-33)airbyte_cdk/sources/declarative/requesters/query_properties/query_properties.py (1)
get_request_property_chunks
(33-63)
airbyte_cdk/sources/declarative/requesters/query_properties/property_selector/property_selector.py (2)
airbyte_cdk/sources/declarative/requesters/query_properties/json_schema_property_selector.py (1)
select
(26-38)airbyte_cdk/sources/declarative/requesters/query_properties/property_selector/json_schema_property_selector.py (1)
select
(30-52)
unit_tests/sources/declarative/requesters/query_properties/test_query_properties.py (5)
airbyte_cdk/sources/declarative/requesters/query_properties/property_selector/json_schema_property_selector.py (1)
JsonSchemaPropertySelector
(16-52)unit_tests/sources/declarative/requesters/query_properties/property_selector/test_json_schema_property_selector.py (1)
_create_configured_airbyte_stream
(24-38)airbyte_cdk/sources/declarative/requesters/query_properties/query_properties.py (2)
get_request_property_chunks
(33-63)QueryProperties
(18-63)airbyte_cdk/sources/declarative/requesters/query_properties/property_chunking.py (2)
get_request_property_chunks
(42-73)PropertyLimitType
(14-21)airbyte_cdk/sources/declarative/requesters/query_properties/properties_from_endpoint.py (2)
PropertiesFromEndpoint
(14-40)get_properties_from_endpoint
(31-40)
unit_tests/sources/declarative/requesters/query_properties/property_selector/test_json_schema_property_selector.py (5)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (2)
JsonSchemaPropertySelector
(2034-2054)RemoveFields
(1079-1096)airbyte_cdk/sources/declarative/requesters/query_properties/property_selector/json_schema_property_selector.py (2)
JsonSchemaPropertySelector
(16-52)select
(30-52)airbyte_cdk/sources/declarative/transformations/transformation.py (1)
RecordTransformation
(13-37)airbyte_cdk/sources/declarative/transformations/keys_replace_transformation.py (1)
KeysReplaceTransformation
(14-61)airbyte_cdk/sources/declarative/requesters/query_properties/property_selector/property_selector.py (1)
select
(16-24)
airbyte_cdk/sources/declarative/requesters/query_properties/property_selector/json_schema_property_selector.py (3)
airbyte_cdk/sources/declarative/requesters/query_properties/property_selector/property_selector.py (2)
PropertySelector
(9-24)select
(16-24)airbyte_cdk/sources/declarative/transformations/transformation.py (1)
RecordTransformation
(13-37)airbyte_cdk/sources/declarative/models/declarative_component_schema.py (17)
Config
(136-137)Config
(150-151)Config
(164-165)Config
(178-179)Config
(192-193)Config
(206-207)Config
(220-221)Config
(234-235)Config
(248-249)Config
(262-263)Config
(276-277)Config
(292-293)Config
(306-307)Config
(320-321)Config
(354-355)Config
(378-379)JsonSchemaPropertySelector
(2034-2054)
airbyte_cdk/sources/declarative/concurrent_declarative_source.py (2)
unit_tests/connector_builder/test_connector_builder_handler.py (1)
configured_catalog
(468-471)unit_tests/sources/test_source.py (1)
catalog
(70-93)
airbyte_cdk/sources/declarative/requesters/query_properties/query_properties.py (4)
airbyte_cdk/sources/declarative/requesters/query_properties/properties_from_endpoint.py (1)
PropertiesFromEndpoint
(14-40)airbyte_cdk/sources/declarative/requesters/query_properties/property_chunking.py (2)
PropertyChunking
(25-76)get_request_property_chunks
(42-73)airbyte_cdk/sources/declarative/requesters/query_properties/property_selector/property_selector.py (2)
PropertySelector
(9-24)select
(16-24)airbyte_cdk/sources/declarative/requesters/query_properties/property_selector/json_schema_property_selector.py (1)
select
(30-52)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (6)
airbyte_cdk/sources/declarative/requesters/query_properties/json_schema_property_selector.py (1)
JsonSchemaPropertySelector
(13-38)airbyte_cdk/sources/declarative/requesters/query_properties/property_selector/json_schema_property_selector.py (1)
JsonSchemaPropertySelector
(16-52)airbyte_cdk/sources/declarative/transformations/add_fields.py (1)
AddFields
(37-148)airbyte_cdk/sources/declarative/transformations/remove_fields.py (1)
RemoveFields
(17-75)airbyte_cdk/sources/declarative/transformations/flatten_fields.py (1)
FlattenFields
(13-52)airbyte_cdk/sources/declarative/transformations/dpath_flatten_fields.py (1)
DpathFlattenFields
(30-100)
unit_tests/sources/declarative/parsers/test_model_to_component_factory.py (3)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1)
JsonSchemaPropertySelector
(2034-2054)airbyte_cdk/sources/declarative/requesters/query_properties/property_selector/json_schema_property_selector.py (1)
JsonSchemaPropertySelector
(16-52)airbyte_cdk/sources/declarative/transformations/keys_replace_transformation.py (1)
KeysReplaceTransformation
(14-61)
airbyte_cdk/sources/declarative/requesters/query_properties/json_schema_property_selector.py (3)
airbyte_cdk/sources/declarative/transformations/transformation.py (1)
RecordTransformation
(13-37)airbyte_cdk/sources/declarative/models/declarative_component_schema.py (17)
Config
(136-137)Config
(150-151)Config
(164-165)Config
(178-179)Config
(192-193)Config
(206-207)Config
(220-221)Config
(234-235)Config
(248-249)Config
(262-263)Config
(276-277)Config
(292-293)Config
(306-307)Config
(320-321)Config
(354-355)Config
(378-379)JsonSchemaPropertySelector
(2034-2054)airbyte_cdk/sources/declarative/requesters/query_properties/property_selector/json_schema_property_selector.py (2)
JsonSchemaPropertySelector
(16-52)select
(30-52)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (2)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (18)
JsonSchemaPropertySelector
(2034-2054)Config
(136-137)Config
(150-151)Config
(164-165)Config
(178-179)Config
(192-193)Config
(206-207)Config
(220-221)Config
(234-235)Config
(248-249)Config
(262-263)Config
(276-277)Config
(292-293)Config
(306-307)Config
(320-321)Config
(354-355)Config
(378-379)QueryProperties
(2810-2832)airbyte_cdk/sources/declarative/requesters/query_properties/property_selector/json_schema_property_selector.py (1)
JsonSchemaPropertySelector
(16-52)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Check: source-shopify
- GitHub Check: Pytest (All, Python 3.13, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.12, Ubuntu)
- GitHub Check: Pytest (Fast)
🔇 Additional comments (16)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1)
1-2
: Note: Auto-generated fileBased on learnings, this file is auto-generated from
declarative_component_schema.yaml
and should be ignored in the review order. The schema changes look consistent with the manual implementations in the PR.unit_tests/sources/declarative/requesters/query_properties/property_selector/test_json_schema_property_selector.py (1)
24-38
: LGTM! Clean helper functionThe
_create_configured_airbyte_stream
helper provides a nice abstraction for test setup. The dynamic property generation using a dictionary comprehension is elegant.unit_tests/sources/declarative/requesters/query_properties/test_property_chunking.py (3)
75-90
: LGTM! Correct parameter handlingThe addition of
configured_properties = set(property_fields)
before convertingproperty_fields
to an iterator is the right approach. This ensures all properties are configured by default in these tests, maintaining backward compatibility.
98-119
: Good coverage of empty configured_propertiesThis test validates the important edge case where
configured_properties
is an empty set, ensuring onlyalways_include_properties
are included in chunks. This behavior aligns with the intent that disabled properties should be filtered out.
121-144
: LGTM! Validates the None vs empty set distinctionThis test correctly validates that
configured_properties=None
results in no filtering (all properties are chunked), which differs from an empty set. This distinction is important for non-READ operations like CHECK and DISCOVER where there's no configured catalog.airbyte_cdk/sources/declarative/requesters/query_properties/property_selector/property_selector.py (1)
8-24
: LGTM! Clean abstract interfaceThe
PropertySelector
ABC follows Python best practices with clear documentation. TheOptional[Set[str]]
return type correctly captures the three states: None (no filtering), empty set (no properties), or a populated set (selected properties).airbyte_cdk/sources/declarative/requesters/query_properties/query_properties.py (2)
29-29
: LGTM! Property selector integrationThe
property_selector
field is correctly typed and positioned. Its optional nature allows backward compatibility for manifests that don't specify column selection.
45-45
: LGTM! Correct configured_properties derivationThe logic correctly derives
configured_properties
by callingproperty_selector.select()
when available, otherwise defaulting toNone
(no filtering).unit_tests/sources/declarative/parsers/test_model_to_component_factory.py (1)
246-266
: LGTM - Clean test for configured catalog mapping!This test nicely validates that the ModelToComponentFactory correctly maps stream names to their configured streams when initialized with a configured_catalog. The setup is clear and the assertion directly checks the expected behavior.
unit_tests/sources/declarative/requesters/query_properties/test_query_properties.py (2)
31-46
: LGTM - Nice helper function!This helper cleanly creates a ConfiguredAirbyteStream for testing purposes. The pattern of generating properties from a list is elegant and reusable.
48-93
: Verify test expectations align with property_selector behaviorIn this test, the property list includes
"remove_me"
(line 66), and the configured stream also includes it (line 50). TheRemoveFields
transformation should remove it from the configured properties, which would then filter it out during chunking.The expected result shows 2 chunks without "remove_me" (lines 90-92), which makes sense. However, could you verify this test actually passes? I have concerns about the
JsonSchemaPropertySelector.select()
implementation (see my comments on that file) that might cause this filtering to not work as expected.airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (5)
29-29
: LGTM! Imports are clean and necessary.The new imports for
ConfiguredAirbyteStream
,ConfiguredAirbyteCatalog
,JsonSchemaPropertySelectorModel
, andJsonSchemaPropertySelector
are all used appropriately throughout the file to support the property selection feature.Also applies to: 46-46, 319-321, 509-511
749-749
: LGTM! Model-to-constructor mapping updated correctly.The
JsonSchemaPropertySelectorModel
is properly wired to its constructor method, following the established pattern in the factory.
3047-3069
: LGTM! JsonSchemaPropertySelector factory method is well-implemented.The new
create_json_schema_property_selector
method follows the established factory pattern and correctly:
- Retrieves the
configured_stream
from the mapping (handling None gracefully)- Creates transformations by recursively processing transformation models
- Passes all necessary parameters to the
JsonSchemaPropertySelector
constructorThe handling of
configured_stream
being None is intentional and correct for CHECK/DISCOVER operations or when a stream isn't in the configured catalog, as documented in the relevant code snippets.
3294-3294
: LGTM! All call sites correctly pass stream_name.All invocations of
create_query_properties
within this file have been updated to pass the requiredstream_name
parameter. Thename
variable is available in the method context and represents the stream name appropriately.Also applies to: 3320-3320, 3326-3326
3013-3045
: Implementation is correct—all call sites properly updated.The changes look solid. You've correctly threaded
stream_name
through to the property_selector creation (line 3032 in your diff) where it's needed, while appropriately omitting it from property_list and property_chunking which don't require stream context.I verified that:
- Direct call sites within this file are updated (lines 3318-3322 and 3324-3327 both pass stream_name)
QueryPropertiesModel
only appears in this file—no external usages found- The nested dispatch for property_selector correctly passes stream_name, enabling stream-aware component behavior
Since this is an internal factory method and all callers are accounted for, the breaking change concern doesn't apply here. The implementation is ready.
airbyte_cdk/sources/declarative/requesters/query_properties/json_schema_property_selector.py
Outdated
Show resolved
Hide resolved
airbyte_cdk/sources/declarative/requesters/query_properties/json_schema_property_selector.py
Outdated
Show resolved
Hide resolved
airbyte_cdk/sources/declarative/requesters/query_properties/query_properties.py
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall makes sense. Just need a bit more context
airbyte_cdk/sources/declarative/requesters/query_properties/json_schema_property_selector.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (8)
airbyte_cdk/sources/declarative/requesters/query_properties/query_properties.py (4)
42-44
: Docstring param is stale (configured_stream).The method no longer accepts configured_stream; the doc should reflect the property_selector-based flow. Can we update the docstring accordingly, wdyt?
Apply this minimal fix:
- :param configured_stream: The customer configured stream being synced which is needed to identify which - record fields to query for and emit. + The configured stream context is provided via `property_selector` when present.
45-45
: Avoid recomputing and side effects: cache selected properties.
select()
may be non‑trivial (and can mutate internal state in current impl). Can we memoize once per instance to keep results stable across slices, wdyt?Apply this change:
- configured_properties = self.property_selector.select() if self.property_selector else None + if self.property_selector: + if not hasattr(self, "_configured_properties_cache"): + self._configured_properties_cache = self.property_selector.select() + configured_properties = self._configured_properties_cache + else: + configured_properties = None
60-65
: Simplify redundant conditional.Inner
if configured_properties is not None
is redundant inside the same guard. Can we trim it for clarity, wdyt?- if configured_properties is not None: - all_fields = ( - [field for field in fields if field in configured_properties] - if configured_properties is not None - else list(fields) - ) + if configured_properties is not None: + all_fields = [field for field in fields if field in configured_properties] else: all_fields = list(fields)
6-6
: Optional: remove unused import.
ConfiguredAirbyteStream
isn’t referenced here. Shall we drop it to keep imports tight, wdyt?unit_tests/sources/declarative/requesters/query_properties/test_query_properties.py (2)
152-189
: Add a test for “empty configured properties.”To lock in semantics, could we add a case where the configured catalog has zero properties selected and assert the produced chunks (empty vs none) and downstream behavior, wdyt?
191-236
: Edge case: no property_limit ordering.When
property_limit
is unset,PropertyChunking
currently appendsalways_include_properties
at the end of the chunk, while the no‑chunking path prepends it. Could we add a test to pin desired ordering and align implementations later, wdyt?airbyte_cdk/sources/declarative/requesters/query_properties/property_selector/json_schema_property_selector.py (2)
25-25
: Minor: default_factory can belist
directly.Using
default_factory=list
is slightly cleaner than a lambda. Want to switch, wdyt?- properties_transformations: List[RecordTransformation] = field(default_factory=lambda: []) + properties_transformations: List[RecordTransformation] = field(default_factory=list)
27-29
: Optional:_parameters
is unused.We store
_parameters
but don’t reference it. Should we drop it or add a TODO explaining upcoming use, wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
airbyte_cdk/sources/declarative/requesters/query_properties/property_selector/json_schema_property_selector.py
(1 hunks)airbyte_cdk/sources/declarative/requesters/query_properties/query_properties.py
(3 hunks)unit_tests/sources/declarative/requesters/query_properties/test_query_properties.py
(8 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
airbyte_cdk/sources/declarative/requesters/query_properties/query_properties.py (5)
airbyte_cdk/sources/declarative/requesters/query_properties/properties_from_endpoint.py (1)
PropertiesFromEndpoint
(14-40)airbyte_cdk/sources/declarative/requesters/query_properties/property_chunking.py (2)
PropertyChunking
(25-76)get_request_property_chunks
(42-73)airbyte_cdk/sources/declarative/requesters/query_properties/property_selector/property_selector.py (2)
PropertySelector
(9-24)select
(16-24)airbyte_cdk/sources/types.py (1)
StreamSlice
(75-169)airbyte_cdk/sources/declarative/requesters/query_properties/property_selector/json_schema_property_selector.py (1)
select
(30-52)
airbyte_cdk/sources/declarative/requesters/query_properties/property_selector/json_schema_property_selector.py (3)
airbyte_cdk/sources/declarative/requesters/query_properties/property_selector/property_selector.py (2)
PropertySelector
(9-24)select
(16-24)airbyte_cdk/sources/declarative/transformations/transformation.py (1)
RecordTransformation
(13-37)airbyte_cdk/sources/declarative/models/declarative_component_schema.py (17)
Config
(136-137)Config
(150-151)Config
(164-165)Config
(178-179)Config
(192-193)Config
(206-207)Config
(220-221)Config
(234-235)Config
(248-249)Config
(262-263)Config
(276-277)Config
(292-293)Config
(306-307)Config
(320-321)Config
(354-355)Config
(378-379)JsonSchemaPropertySelector
(2034-2054)
unit_tests/sources/declarative/requesters/query_properties/test_query_properties.py (5)
airbyte_cdk/sources/declarative/requesters/query_properties/property_selector/json_schema_property_selector.py (1)
JsonSchemaPropertySelector
(16-52)airbyte_cdk/sources/declarative/models/declarative_component_schema.py (5)
JsonSchemaPropertySelector
(2034-2054)RemoveFields
(1079-1096)QueryProperties
(2810-2832)PropertyLimitType
(1035-1037)PropertiesFromEndpoint
(2796-2807)airbyte_cdk/sources/declarative/requesters/query_properties/query_properties.py (2)
get_request_property_chunks
(33-72)QueryProperties
(18-72)airbyte_cdk/sources/declarative/requesters/query_properties/property_chunking.py (2)
get_request_property_chunks
(42-73)PropertyLimitType
(14-21)airbyte_cdk/sources/declarative/requesters/query_properties/properties_from_endpoint.py (2)
PropertiesFromEndpoint
(14-40)get_properties_from_endpoint
(31-40)
🔇 Additional comments (3)
airbyte_cdk/sources/declarative/requesters/query_properties/query_properties.py (1)
53-58
: Confirm “empty selection” semantics.If
configured_properties
is an empty set, chunking yields a single empty chunk (unlessalways_include_properties
). Is that intended (vs. yielding no chunks / no request)? Could this produce a request with an empty fields param in some retrievers? Should we short‑circuit to yield nothing when bothconfigured_properties
andalways_include_properties
are empty, wdyt?unit_tests/sources/declarative/requesters/query_properties/test_query_properties.py (2)
31-46
: Helper looks good.Factory for ConfiguredAirbyteStream is clear and reusable. Nice.
48-93
: Great coverage for static list + selector + chunking.Assertions match the new selection behavior; this is valuable.
...s/declarative/requesters/query_properties/property_selector/json_schema_property_selector.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
airbyte_cdk/sources/declarative/requesters/query_properties/property_selector/json_schema_property_selector.py (1)
27-28
: Isself._parameters
needed?I notice that
parameters
is stored inself._parameters
but never referenced elsewhere in the class. If it's not required for any base class protocol or future use, we could simplify by removing this line, wdyt?- def __post_init__(self, parameters: Mapping[str, Any]) -> None: - self._parameters = parameters + def __post_init__(self, parameters: Mapping[str, Any]) -> None: + pass
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
airbyte_cdk/sources/declarative/requesters/query_properties/property_selector/json_schema_property_selector.py
(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
airbyte_cdk/sources/declarative/requesters/query_properties/property_selector/json_schema_property_selector.py (3)
airbyte_cdk/sources/declarative/requesters/query_properties/property_selector/property_selector.py (2)
PropertySelector
(9-24)select
(16-24)airbyte_cdk/sources/declarative/transformations/transformation.py (1)
RecordTransformation
(13-37)airbyte_cdk/sources/declarative/models/declarative_component_schema.py (17)
Config
(136-137)Config
(150-151)Config
(164-165)Config
(178-179)Config
(192-193)Config
(206-207)Config
(220-221)Config
(234-235)Config
(248-249)Config
(262-263)Config
(276-277)Config
(292-293)Config
(306-307)Config
(320-321)Config
(354-355)Config
(378-379)JsonSchemaPropertySelector
(2034-2054)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
- GitHub Check: Check: source-shopify
- GitHub Check: Check: source-intercom
- GitHub Check: Check: source-pokeapi
- GitHub Check: Check: destination-motherduck
- GitHub Check: Check: source-hardcoded-records
- GitHub Check: Pytest (All, Python 3.12, Ubuntu)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.13, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (Fast)
- GitHub Check: SDM Docker Image Build
- GitHub Check: Manifest Server Docker Image Build
- GitHub Check: Analyze (python)
🔇 Additional comments (1)
airbyte_cdk/sources/declarative/requesters/query_properties/property_selector/json_schema_property_selector.py (1)
45-54
: Nice work on the deep copy!The use of
copy.deepcopy()
correctly addresses the mutation concern from the previous review. The transformation logic looks solid—applying transformations to the copied schema and returning the property keys maintains the integrity of the original configured catalog.
What
An optimization that we've been looking into for some of our certified connectors that require multiple requests to fetch the entire set of properties for each record. Some examples are:
source-linkedin-ads
- Analytics streams have a static list of almost 100 fields that requires multiple requests to get the complete record.source-hubspot
- Various CRM search and other streams have a dynamic list of fields to fetch and often will require multiple requests to get each complete record.The issue is that if a customer were to only select a few few fields when configuring their catalog, we still make the extra requests to get the complete record even if many of those fields are ultimately not emitted in the response. This can lead to a lot of wasted requests.
This change filters out schema properties that were de-selected by users which should allow for fewer API requests for customers who do not need to sync every column of a stream.
How
There's two pieces to focus on here:
configured_catalog
during a read and we load that into theModelToComponentFactory
and structure it as a mapping from stream names to configured streams.None
for check/read : Fetch all properties which is the default case.['selected_1
,selected_2
]: Fetch only selected values[]
: all values de-selected which should not fetch any propertiesNone
and[]
as different behaviors.JsonSchemaPropertySelector
that takes the arbitrary json_schema that comes in the catalog. We then treat this like a single record which allows us to reuse the existing record transformation logic. For Hubspot we need to rename all properties fields to remove theproperties_
prefix. For other sources, we can reuse this logic to remove unneeded fields likeid
,created_at
etc.Ruled out approaches
properties_
, we might end up with a collision at the top level which would be very badproperties
object. However, the current platform does not allow column selection for nested objects, only at the top levelSummary by CodeRabbit
New Features
Tests