-
Notifications
You must be signed in to change notification settings - Fork 30
fix: query properties for custom requesters #783
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: query properties for custom requesters #783
Conversation
👋 Greetings, Airbyte Team Member!Here are some helpful tips and reminders for your convenience. Testing This CDK VersionYou can test this version of the CDK using the following: # Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@maxi297/fix_query_properties_for_custom_requesters#egg=airbyte-python-cdk[dev]' --help
# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch maxi297/fix_query_properties_for_custom_requestersHelpful ResourcesPR Slash CommandsAirbyte Maintainers can execute the following slash commands on your PR:
|
📝 WalkthroughWalkthroughRenames a QueryProperties detection helper, adds _ensure_query_properties_to_model to migrate dict-based QueryProperties in request_parameters into QueryPropertiesModel and remove them from interpolated parameters, updates factory flows to call the new helpers, adds TestingRequester for tests, and introduces new unit tests for CustomRequester + QueryProperties. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant MF as ModelToComponentFactory
participant Req as RequesterModel (Http/Custom)
participant QP as QueryPropertiesModel
participant ROP as InterpolatedRequestOptionsProvider
MF->>Req: Inspect request_parameters
alt request_parameters contain QueryProperties-like dicts
MF->>MF: _ensure_query_properties_to_model(Req)
MF->>QP: Create QueryPropertiesModel(s) from dicts
MF->>Req: attach query_properties field
MF->>ROP: remove QueryProperties entries from interpolated request_parameters
else no QueryProperties present
MF-->>Req: proceed without migration
end
Req->>ROP: Build final request options (no duplicate QueryProperties)
Req-->>MF: Ready requester
sequenceDiagram
autonumber
participant Test as Unit Test
participant TReq as TestingRequester
participant ROP as InterpolatedRequestOptionsProvider
Test->>TReq: Instantiate with request_parameters, config, parameters
TReq->>ROP: __post_init__: create provider from request_parameters/config/parameters
TReq-->>Test: Requester ready for stream execution
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
Would you like me to add a short checklist of risky areas to double-check (re-interpolation edge cases, duplicated tests removal, requester backward-compatibility), wdyt? Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
🔇 Additional comments (3)
Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)
3193-3227: Consider consolidating QueryProperties cleanup logic.The query properties handling is now split between
create_http_requester(where the type error occurs) andcreate_simple_retriever(where the cleanup happens). This separation creates the pipeline failures we're seeing.Since QueryProperties need to be removed before being passed to
InterpolatedRequestOptionsProvider, should we extract this cleanup logic into a helper method that bothcreate_http_requesterandcreate_simple_retrievercan call? This would:
- Fix the pipeline failures
- Reduce code duplication
- Make the flow clearer: "ensure models, extract query properties, remove from params, proceed"
Alternatively, should the cleanup happen earlier in the pipeline, perhaps right after the model is parsed? Wdyt?
🧹 Nitpick comments (3)
unit_tests/sources/declarative/parsers/test_model_to_component_factory.py (1)
1193-1258: Enhance assertions to verify QueryProperties handling?This test validates that a CustomRequester with QueryProperties can be created and read from without errors, which is great! However, the test could be more thorough in verifying the core functionality being fixed, wdyt?
Consider adding assertions to verify:
- That the QueryProperties are correctly extracted from
request_parametersand moved to the retriever (as mentioned in the PR description)- That the HTTP request was made with the expected parameters (you could inspect
requests_mock.request_historyto verify the actual request)- That property chunking configuration is properly accessible via the retriever's
additional_query_propertiesFor example:
# Verify QueryProperties were removed from request parameters retriever = get_retriever(stream) request_options_provider = retriever.requester.request_options_provider assert isinstance(request_options_provider, InterpolatedRequestOptionsProvider) assert "query" not in request_options_provider.request_parameters assert request_options_provider.request_parameters.get("not_query") == 1 # Verify QueryProperties are accessible via retriever assert retriever.additional_query_properties is not None assert retriever.additional_query_properties.property_list == ["id", "field"]This would provide stronger evidence that the fix for custom requesters + query properties is working correctly.
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)
3179-3200: Potential runtime error: modifying dictionary during iteration.The code iterates over
request_parameters.keys()and then modifiesrequest_parameters[request_parameter_key]within the loop. While this specific pattern (iterating over.keys()) is safe in Python 3.7+, there are a couple of concerns:
Line 3191 uses
isinstance(request_parameters, Dict)which is stricter than necessary. Should this beMappingto handle a broader range of dict-like objects?The method modifies the model in place, which could be problematic if the model instance is reused elsewhere. Is this intentional?
Also, minor style note: the loop could be simplified using
items()since you're accessing both keys and values.Wdyt about these adjustments?
Consider this refactor for better type handling:
def _ensure_query_properties_to_model( self, requester: Union[HttpRequesterModel, CustomRequesterModel] ) -> None: """ For some reason, it seems like CustomRequesterModel request_parameters stays as dictionaries which means that the other conditions relying on it being QueryPropertiesModel instead of a dict fail. Here, we migrate them to proper model. """ if not hasattr(requester, "request_parameters"): return request_parameters = requester.request_parameters - if request_parameters and isinstance(request_parameters, Dict): - for request_parameter_key in request_parameters.keys(): - request_parameter = request_parameters[request_parameter_key] + if request_parameters and isinstance(request_parameters, Mapping): + for request_parameter_key, request_parameter in request_parameters.items(): if ( isinstance(request_parameter, Dict) and request_parameter.get("type") == "QueryProperties" ): request_parameters[request_parameter_key] = QueryPropertiesModel.parse_obj( request_parameter )unit_tests/sources/declarative/parsers/testing_components.py (1)
93-107: Verify TestingRequester matches actual CustomRequester behavior.The
TestingRequestermanually creates anInterpolatedRequestOptionsProviderwith onlyrequest_parameters. However, looking at the production code increate_http_requester(lines 2398-2407), theInterpolatedRequestOptionsProvideris initialized with many more fields:request_body,request_body_data,request_body_json,request_headers, andquery_properties_key.Should
TestingRequesteralso initialize these other fields to more accurately represent how a real custom requester would behave? Or is the simplified version sufficient for the test scenarios you're targeting?Also, is there a corresponding test that exercises this
TestingRequesterwith QueryProperties to validate the fix works end-to-end?Consider adding the missing fields for completeness:
def __post_init__(self, parameters: Mapping[str, Any]) -> None: """ Initializes the request options provider with the provided parameters and any configured request components like headers, parameters, or bodies. """ self.request_options_provider = InterpolatedRequestOptionsProvider( request_parameters=self.request_parameters, + request_headers=None, + request_body_data=None, + request_body_json=None, config=self.config, parameters=parameters or {}, ) super().__post_init__(parameters)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py(6 hunks)unit_tests/sources/declarative/parsers/test_model_to_component_factory.py(2 hunks)unit_tests/sources/declarative/parsers/testing_components.py(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
unit_tests/sources/declarative/parsers/testing_components.py (2)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (3)
HttpRequester(2529-2668)RequestOption(1100-1119)DefaultErrorHandler(1950-1978)airbyte_cdk/sources/declarative/requesters/request_options/interpolated_request_options_provider.py (1)
InterpolatedRequestOptionsProvider(31-178)
unit_tests/sources/declarative/parsers/test_model_to_component_factory.py (5)
airbyte_cdk/sources/declarative/yaml_declarative_source.py (1)
_parse(62-69)airbyte_cdk/sources/declarative/parsers/manifest_reference_resolver.py (1)
preprocess_manifest(102-107)airbyte_cdk/sources/declarative/parsers/manifest_component_transformer.py (1)
propagate_types_and_parameters(87-188)airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)
create_component(789-822)airbyte_cdk/sources/declarative/stream_slicers/declarative_partition_generator.py (1)
read(99-125)
🪛 GitHub Actions: Linters
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
[error] 2403-2403: mypy: Argument "request_parameters" to "InterpolatedRequestOptionsProvider" has incompatible type "dict[str, str | QueryProperties] | str | None"; expected "str | Mapping[str, str] | None".
[error] 3224-3226: mypy: Item "CustomRequester" of "HttpRequester | CustomRequester" has no attribute "request_parameters". (union-attr)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: Check: destination-motherduck
- GitHub Check: Check: source-intercom
- GitHub Check: Check: source-shopify
- GitHub Check: Check: source-hardcoded-records
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (All, Python 3.13, Ubuntu)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.12, Ubuntu)
- GitHub Check: SDM Docker Image Build
- GitHub Check: Manifest Server Docker Image Build
🔇 Additional comments (3)
unit_tests/sources/declarative/parsers/test_model_to_component_factory.py (1)
4-4: LGTM!The
jsonimport is appropriately used for serializing the mock response data in the new test.airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)
3364-3374: LGTM on the method rename.The rename from
_query_properties_in_request_parametersto_has_query_properties_in_request_parametersbetter conveys that this is a boolean check. The implementation looks good.unit_tests/sources/declarative/parsers/testing_components.py (1)
11-22: LGTM on the new imports.The imports are necessary for the new
TestingRequesterclass and are correctly organized.
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
Outdated
Show resolved
Hide resolved
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py(6 hunks)
🧰 Additional context used
🪛 GitHub Actions: Linters
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
[error] 3221-3221: Ruff formatting change required. 1 file would be reformatted by 'ruff format --diff'. Run 'ruff format airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py' (or 'ruff format .') to fix formatting.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: Check: source-hardcoded-records
- GitHub Check: Check: source-pokeapi
- GitHub Check: Check: destination-motherduck
- GitHub Check: Check: source-intercom
- GitHub Check: Check: source-shopify
- GitHub Check: Manifest Server Docker Image Build
- GitHub Check: SDM Docker Image Build
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (All, Python 3.12, Ubuntu)
- GitHub Check: Pytest (All, Python 3.13, Ubuntu)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
## What Addresses airbytehq/airbyte-internal-issues#14717 ## How Update the CDK version to pick airbytehq/airbyte-python-cdk#783 up ## Review guide <!-- 1. `x.py` 2. `y.py` --> ## User Impact <!-- * What is the end result perceived by the user? * If there are negative side effects, please list them. --> ## Can this PR be safely reverted and rolled back? <!-- * If unsure, leave it blank. --> - [x] YES 💚 - [ ] NO ❌
What
Addresses https://github.com/airbytehq/airbyte-internal-issues/issues/14717
Since this change property chunking is broken for custom requesters.
How
_remove_query_properties as part of the SimpleRetriever instead of the HttpRequester. This means that if the connector both has a custom retriever and a custom requester, this will not work. I think this is fine for now.
One thing that I'm unsure is the use of
_ensure_query_properties_to_modelwhich basically stems from the child of custom components not being parsed as models and staying as dict. If there is a way to propagate the model generation to custom components children, that would probably be a better solution. For now,_ensure_query_properties_to_modelpatches that for requesters and query properties specifically.Running
ad_campaign_analyticswith this version of the CDK and our sandbox account, I get 2 records.Summary by CodeRabbit
New Features
Bug Fixes
Tests