@maxi297 maxi297 commented Oct 7, 2025

What

Addresses https://github.com/airbytehq/airbyte-internal-issues/issues/14717

Since this change, property chunking has been broken for custom requesters.

How

Run _remove_query_properties as part of the SimpleRetriever instead of the HttpRequester. This means that if a connector has both a custom retriever and a custom requester, this will not work. I think this is fine for now.

One thing I'm unsure about is the use of _ensure_query_properties_to_model, which basically stems from the children of custom components not being parsed as models and staying as dicts. If there is a way to propagate the model generation to the children of custom components, that would probably be a better solution. For now, _ensure_query_properties_to_model patches that for requesters and query properties specifically.
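
As a rough illustration of the patch described above, the sketch below converts any request parameter that is still a raw dict tagged with type "QueryProperties" into a model object. QueryPropertiesStandIn and this standalone ensure_query_properties_to_model function are hypothetical stand-ins written for this example; they are not the CDK's QueryPropertiesModel or the actual helper, and the parameter names are borrowed from the test discussed later in this thread.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class QueryPropertiesStandIn:
    """Hypothetical stand-in for the CDK's QueryPropertiesModel."""

    property_list: List[str]


def ensure_query_properties_to_model(request_parameters: Dict[str, Any]) -> None:
    """Replace dict-shaped QueryProperties entries with model instances, in place."""
    for key, value in request_parameters.items():
        if isinstance(value, dict) and value.get("type") == "QueryProperties":
            # Only the value of an existing key changes, so the dict keeps its size.
            request_parameters[key] = QueryPropertiesStandIn(
                property_list=list(value.get("property_list", []))
            )


# Assumed manifest-like input: one plain parameter and one unparsed child dict.
params = {
    "not_query": "1",
    "query": {"type": "QueryProperties", "property_list": ["id", "field"]},
}
ensure_query_properties_to_model(params)
```

After the call, params["query"] is a model instance, so later isinstance checks against the model type would no longer be skipped for custom requesters.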

Running ad_campaign_analytics with this version of the CDK and our sandbox account, I get 2 records.

Summary by CodeRabbit

  • New Features

    • Improved handling and migration of query-parameter configurations so custom and HTTP requesters recognize and store query settings consistently.
    • Added a testing requester utility to simplify building interpolated request options in tests.
  • Bug Fixes

    • Prevented query-parameter entries from being re-applied during interpolation, avoiding duplicate or conflicting request parameters.
  • Tests

    • Added an end-to-end test covering custom requesters with query-parameter configurations.

@github-actions github-actions bot added bug Something isn't working security labels Oct 7, 2025

github-actions bot commented Oct 7, 2025

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

Testing This CDK Version

You can test this version of the CDK using the following:

# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@maxi297/fix_query_properties_for_custom_requesters#egg=airbyte-python-cdk[dev]' --help

# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch maxi297/fix_query_properties_for_custom_requesters

Helpful Resources

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

  • /autofix - Fixes most formatting and linting issues
  • /poetry-lock - Updates poetry.lock file
  • /test - Runs connector tests with the updated CDK
  • /poe build - Regenerate git-committed build artifacts, such as the pydantic models which are generated from the manifest JSON schema in YAML.
  • /poe <command> - Runs any poe command in the CDK environment



coderabbitai bot commented Oct 7, 2025

📝 Walkthrough


Renames a QueryProperties detection helper, adds _ensure_query_properties_to_model to migrate dict-based QueryProperties in request_parameters into QueryPropertiesModel and remove them from interpolated parameters, updates factory flows to call the new helpers, adds TestingRequester for tests, and introduces new unit tests for CustomRequester + QueryProperties.

Changes

| Cohort / File(s) | Summary of changes |
| --- | --- |
| Parser factory: QueryProperties handling (airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py) | Renamed _query_properties_in_request_parameters(...) → _has_query_properties_in_request_parameters(...). Added _ensure_query_properties_to_model(...) to convert dict entries to QueryPropertiesModel and remove them from interpolated request_parameters. Updated create_http_requester/create_simple_retriever flows to call the new helpers and ensure migrated entries are not re-interpolated. Centralized removal via _remove_query_properties(...) where appropriate. |
| Unit tests: CustomRequester + QueryProperties (unit_tests/sources/declarative/parsers/test_model_to_component_factory.py) | Added test_stream_with_custom_requester_and_query_properties (appears duplicated in diff) to validate DeclarativeStream behavior with CustomRequester and QueryProperties; added top-level json import used by the test. |
| Test utilities: TestingRequester (unit_tests/sources/declarative/parsers/testing_components.py) | Added TestingRequester (subclass of HttpRequester) with optional request_parameters and __post_init__ that initializes an InterpolatedRequestOptionsProvider. Updated imports to expose HttpRequester, InterpolatedRequestOptionsProvider, and RequestInput for testing paths. |

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant MF as ModelToComponentFactory
  participant Req as RequesterModel (Http/Custom)
  participant QP as QueryPropertiesModel
  participant ROP as InterpolatedRequestOptionsProvider

  MF->>Req: Inspect request_parameters
  alt request_parameters contain QueryProperties-like dicts
    MF->>MF: _ensure_query_properties_to_model(Req)
    MF->>QP: Create QueryPropertiesModel(s) from dicts
    MF->>Req: attach query_properties field
    MF->>ROP: remove QueryProperties entries from interpolated request_parameters
  else no QueryProperties present
    MF-->>Req: proceed without migration
  end
  Req->>ROP: Build final request options (no duplicate QueryProperties)
  Req-->>MF: Ready requester
sequenceDiagram
  autonumber
  participant Test as Unit Test
  participant TReq as TestingRequester
  participant ROP as InterpolatedRequestOptionsProvider

  Test->>TReq: Instantiate with request_parameters, config, parameters
  TReq->>ROP: __post_init__: create provider from request_parameters/config/parameters
  TReq-->>Test: Requester ready for stream execution

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested reviewers

  • brianjlai
  • darynaishchenko

Would you like me to add a short checklist of risky areas to double-check (re-interpolation edge cases, duplicated tests removal, requester backward-compatibility), wdyt?

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title Check | ✅ Passed | The title succinctly identifies the primary change (correcting query properties handling in custom requesters) without unnecessary detail, and it aligns directly with the changes in the PR. |


📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 751d1f3 and 529e0b1.

📒 Files selected for processing (1)
  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (6 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: Check: source-intercom
  • GitHub Check: Check: source-shopify
  • GitHub Check: Check: source-pokeapi
  • GitHub Check: Check: destination-motherduck
  • GitHub Check: Check: source-hardcoded-records
  • GitHub Check: SDM Docker Image Build
  • GitHub Check: Manifest Server Docker Image Build
  • GitHub Check: Pytest (All, Python 3.13, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.12, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (Fast)
🔇 Additional comments (3)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (3)

3193-3229: LGTM! Clean separation of QueryProperties handling.

The approach of migrating QueryProperties from dicts to models, extracting them, and then removing them from request_parameters is well thought out. The hasattr guard at line 3224 properly handles CustomRequester cases where request_parameters might not exist.

The mutation of model.requester.request_parameters in place is intentional since QueryProperties will be handled separately via the additional_query_properties parameter of SimpleRetriever.


3366-3377: LGTM! Clearer method name.

Renaming to _has_query_properties_in_request_parameters with the has_ prefix makes it immediately clear that this method returns a boolean, which aligns well with Python naming conventions.


2403-2403: No extra defensive check needed for create_http_requester.
Calls to create_http_requester only occur via the mapping in create_simple_retriever, where QueryProperties are stripped beforehand, so the current type ignore is safe. If in the future this method is reused elsewhere, should we consider adding an assertion or docstring to enforce its precondition? wdyt?
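
One way to make that precondition explicit is a guard at the point where parameters are handed over for interpolation. The sketch below is only an illustration under assumptions: QueryPropertiesStandIn and build_interpolated_parameters are hypothetical names for this example, not CDK APIs, and it raises an exception rather than using assert so the check survives running Python with -O.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Mapping, Optional


@dataclass
class QueryPropertiesStandIn:
    """Hypothetical stand-in for the CDK's QueryProperties component."""

    property_list: List[str] = field(default_factory=list)


def build_interpolated_parameters(
    request_parameters: Optional[Mapping[str, Any]],
) -> Dict[str, Any]:
    """Return parameters for interpolation, rejecting leftover QueryProperties."""
    params = dict(request_parameters or {})
    leftovers = [
        key for key, value in params.items() if isinstance(value, QueryPropertiesStandIn)
    ]
    if leftovers:
        # Precondition: the caller must strip QueryProperties entries beforehand.
        raise ValueError(
            f"QueryProperties must be removed before interpolation, found keys: {leftovers}"
        )
    return params
```

With such a guard, reusing the factory method from a new call site that forgets the stripping step would fail loudly instead of passing an unexpected type to the provider.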


Comment @coderabbitai help to get the list of available commands and usage tips.


github-actions bot commented Oct 7, 2025

PyTest Results (Fast)

3 780 tests  +1   3 768 ✅ +1   6m 27s ⏱️ -1s
    1 suites ±0      12 💤 ±0 
    1 files   ±0       0 ❌ ±0 

Results for commit 529e0b1. ± Comparison against base commit c67570b.

♻️ This comment has been updated with latest results.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)

3193-3227: Consider consolidating QueryProperties cleanup logic.

The query properties handling is now split between create_http_requester (where the type error occurs) and create_simple_retriever (where the cleanup happens). This separation creates the pipeline failures we're seeing.

Since QueryProperties need to be removed before being passed to InterpolatedRequestOptionsProvider, should we extract this cleanup logic into a helper method that both create_http_requester and create_simple_retriever can call? This would:

  1. Fix the pipeline failures
  2. Reduce code duplication
  3. Make the flow clearer: "ensure models, extract query properties, remove from params, proceed"

Alternatively, should the cleanup happen earlier in the pipeline, perhaps right after the model is parsed? Wdyt?
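
A consolidated helper along the lines of "ensure models, extract query properties, remove from params, proceed" could look roughly like the sketch below. It is a minimal illustration, assuming a hypothetical QueryPropertiesStandIn in place of the CDK model and a hypothetical split_query_properties function; unlike the PR's in-place approach, it returns new objects instead of mutating the input.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Mapping, Tuple


@dataclass
class QueryPropertiesStandIn:
    """Hypothetical stand-in for the CDK's QueryPropertiesModel."""

    property_list: List[str]


def split_query_properties(
    request_parameters: Mapping[str, Any],
) -> Tuple[List[QueryPropertiesStandIn], Dict[str, Any]]:
    """Return (extracted query properties, remaining plain parameters)."""
    query_properties: List[QueryPropertiesStandIn] = []
    remaining: Dict[str, Any] = {}
    for key, value in request_parameters.items():
        # Step 1: ensure models, converting dict-shaped entries first.
        if isinstance(value, dict) and value.get("type") == "QueryProperties":
            value = QueryPropertiesStandIn(
                property_list=list(value.get("property_list", []))
            )
        # Steps 2 and 3: extract query properties, keep everything else.
        if isinstance(value, QueryPropertiesStandIn):
            query_properties.append(value)
        else:
            remaining[key] = value
    return query_properties, remaining


original = {
    "not_query": 1,
    "query": {"type": "QueryProperties", "property_list": ["id", "field"]},
}
extracted, cleaned = split_query_properties(original)
```

Both factory methods could then call the same function, passing the cleaned parameters on for interpolation and the extracted entries to the retriever.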

🧹 Nitpick comments (3)
unit_tests/sources/declarative/parsers/test_model_to_component_factory.py (1)

1193-1258: Enhance assertions to verify QueryProperties handling?

This test validates that a CustomRequester with QueryProperties can be created and read from without errors, which is great! However, the test could be more thorough in verifying the core functionality being fixed, wdyt?

Consider adding assertions to verify:

  1. That the QueryProperties are correctly extracted from request_parameters and moved to the retriever (as mentioned in the PR description)
  2. That the HTTP request was made with the expected parameters (you could inspect requests_mock.request_history to verify the actual request)
  3. That property chunking configuration is properly accessible via the retriever's additional_query_properties

For example:

# Verify QueryProperties were removed from request parameters
retriever = get_retriever(stream)
request_options_provider = retriever.requester.request_options_provider
assert isinstance(request_options_provider, InterpolatedRequestOptionsProvider)
assert "query" not in request_options_provider.request_parameters
assert request_options_provider.request_parameters.get("not_query") == 1

# Verify QueryProperties are accessible via retriever
assert retriever.additional_query_properties is not None
assert retriever.additional_query_properties.property_list == ["id", "field"]

This would provide stronger evidence that the fix for custom requesters + query properties is working correctly.

airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)

3179-3200: Potential runtime error: modifying dictionary during iteration.

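The code iterates over request_parameters.keys() and then modifies request_parameters[request_parameter_key] within the loop. This specific pattern is safe, because reassigning the value of an existing key does not change the dictionary's size (only adding or removing keys during iteration raises an error), but there are a couple of concerns:

  1. Line 3191 uses isinstance(request_parameters, Dict) which is stricter than necessary. Should this be Mapping to handle a broader range of dict-like objects? Note that a read-only Mapping would not support the item assignment in the loop, so MutableMapping may be the more accurate check.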

  2. The method modifies the model in place, which could be problematic if the model instance is reused elsewhere. Is this intentional?

Also, minor style note: the loop could be simplified using items() since you're accessing both keys and values.

Wdyt about these adjustments?

Consider this refactor for better type handling:

 def _ensure_query_properties_to_model(
     self, requester: Union[HttpRequesterModel, CustomRequesterModel]
 ) -> None:
     """
     For some reason, it seems like CustomRequesterModel request_parameters stays as dictionaries which means that
     the other conditions relying on it being QueryPropertiesModel instead of a dict fail. Here, we migrate them to
     proper model.
     """
     if not hasattr(requester, "request_parameters"):
         return

     request_parameters = requester.request_parameters
-    if request_parameters and isinstance(request_parameters, Dict):
-        for request_parameter_key in request_parameters.keys():
-            request_parameter = request_parameters[request_parameter_key]
+    if request_parameters and isinstance(request_parameters, Mapping):
+        for request_parameter_key, request_parameter in request_parameters.items():
             if (
                 isinstance(request_parameter, Dict)
                 and request_parameter.get("type") == "QueryProperties"
             ):
                 request_parameters[request_parameter_key] = QueryPropertiesModel.parse_obj(
                     request_parameter
                 )
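
The dict-iteration concern raised in this comment can be checked with a small standalone experiment. The sketch below uses only the standard library and made-up data; it shows that reassigning values of existing keys during iteration is safe, that adding a key during iteration raises RuntimeError, and that an object can satisfy isinstance(..., Mapping) without supporting item assignment.

```python
from collections.abc import Mapping, MutableMapping
from types import MappingProxyType

params = {"a": 1, "b": 2}

# Reassigning the value of an existing key keeps the dict's size unchanged,
# so this is safe while iterating.
for key, value in params.items():
    params[key] = value * 10

# Adding a new key while iterating changes the size and raises RuntimeError.
resize_error = None
try:
    for key in params:
        params[key + "_copy"] = params[key]
except RuntimeError as error:
    resize_error = error

# A read-only view is a Mapping but not a MutableMapping, so an
# isinstance(..., Mapping) check alone does not guarantee item assignment.
read_only = MappingProxyType({"a": 1})
is_mapping = isinstance(read_only, Mapping)
is_mutable = isinstance(read_only, MutableMapping)
```

If the loop needs to assign converted models back into the container, checking against MutableMapping (or keeping the Dict check) matches what the code actually requires.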
unit_tests/sources/declarative/parsers/testing_components.py (1)

93-107: Verify TestingRequester matches actual CustomRequester behavior.

The TestingRequester manually creates an InterpolatedRequestOptionsProvider with only request_parameters. However, looking at the production code in create_http_requester (lines 2398-2407), the InterpolatedRequestOptionsProvider is initialized with many more fields: request_body, request_body_data, request_body_json, request_headers, and query_properties_key.

Should TestingRequester also initialize these other fields to more accurately represent how a real custom requester would behave? Or is the simplified version sufficient for the test scenarios you're targeting?

Also, is there a corresponding test that exercises this TestingRequester with QueryProperties to validate the fix works end-to-end?

Consider adding the missing fields for completeness:

 def __post_init__(self, parameters: Mapping[str, Any]) -> None:
     """
     Initializes the request options provider with the provided parameters and any
     configured request components like headers, parameters, or bodies.
     """
     self.request_options_provider = InterpolatedRequestOptionsProvider(
         request_parameters=self.request_parameters,
+        request_headers=None,
+        request_body_data=None,
+        request_body_json=None,
         config=self.config,
         parameters=parameters or {},
     )
     super().__post_init__(parameters)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c67570b and 57cbdd9.

📒 Files selected for processing (3)
  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (6 hunks)
  • unit_tests/sources/declarative/parsers/test_model_to_component_factory.py (2 hunks)
  • unit_tests/sources/declarative/parsers/testing_components.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
unit_tests/sources/declarative/parsers/testing_components.py (2)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (3)
  • HttpRequester (2529-2668)
  • RequestOption (1100-1119)
  • DefaultErrorHandler (1950-1978)
airbyte_cdk/sources/declarative/requesters/request_options/interpolated_request_options_provider.py (1)
  • InterpolatedRequestOptionsProvider (31-178)
unit_tests/sources/declarative/parsers/test_model_to_component_factory.py (5)
airbyte_cdk/sources/declarative/yaml_declarative_source.py (1)
  • _parse (62-69)
airbyte_cdk/sources/declarative/parsers/manifest_reference_resolver.py (1)
  • preprocess_manifest (102-107)
airbyte_cdk/sources/declarative/parsers/manifest_component_transformer.py (1)
  • propagate_types_and_parameters (87-188)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)
  • create_component (789-822)
airbyte_cdk/sources/declarative/stream_slicers/declarative_partition_generator.py (1)
  • read (99-125)
🪛 GitHub Actions: Linters
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py

[error] 2403-2403: mypy: Argument "request_parameters" to "InterpolatedRequestOptionsProvider" has incompatible type "dict[str, str | QueryProperties] | str | None"; expected "str | Mapping[str, str] | None".


[error] 3224-3226: mypy: Item "CustomRequester" of "HttpRequester | CustomRequester" has no attribute "request_parameters". (union-attr)
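
For the union-attr error above, one common way to read an attribute that exists on only one member of a union is getattr with a default, which should avoid the attribute access mypy objects to. The sketch below is a hedged illustration with hypothetical stand-in classes (HttpRequesterStandIn, CustomRequesterStandIn) and a hypothetical helper name; it does not claim to show how the PR ultimately resolved the error.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Union


@dataclass
class HttpRequesterStandIn:
    """Hypothetical stand-in for a model that declares request_parameters."""

    request_parameters: Optional[Dict[str, Any]] = None


@dataclass
class CustomRequesterStandIn:
    """Hypothetical stand-in for a model that does not declare the field."""

    class_name: str = "module.CustomRequester"


def get_request_parameters(
    requester: Union[HttpRequesterStandIn, CustomRequesterStandIn],
) -> Optional[Dict[str, Any]]:
    # getattr with a default works at runtime for both union members and does
    # not require the attribute to be declared on every member of the union.
    return getattr(requester, "request_parameters", None)
```

An isinstance check on the declaring class would narrow the type more precisely, but it would skip custom models that carry the attribute as an undeclared extra field, which is the case this PR is concerned with.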

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: Check: destination-motherduck
  • GitHub Check: Check: source-intercom
  • GitHub Check: Check: source-shopify
  • GitHub Check: Check: source-hardcoded-records
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.13, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.12, Ubuntu)
  • GitHub Check: SDM Docker Image Build
  • GitHub Check: Manifest Server Docker Image Build
🔇 Additional comments (3)
unit_tests/sources/declarative/parsers/test_model_to_component_factory.py (1)

4-4: LGTM!

The json import is appropriately used for serializing the mock response data in the new test.

airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)

3364-3374: LGTM on the method rename.

The rename from _query_properties_in_request_parameters to _has_query_properties_in_request_parameters better conveys that this is a boolean check. The implementation looks good.

unit_tests/sources/declarative/parsers/testing_components.py (1)

11-22: LGTM on the new imports.

The imports are necessary for the new TestingRequester class and are correctly organized.


github-actions bot commented Oct 7, 2025

PyTest Results (Full)

3 783 tests  +1   3 771 ✅ +1   11m 14s ⏱️ +20s
    1 suites ±0      12 💤 ±0 
    1 files   ±0       0 ❌ ±0 

Results for commit 529e0b1. ± Comparison against base commit c67570b.

♻️ This comment has been updated with latest results.

coderabbitai bot previously requested changes Oct 7, 2025

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 57cbdd9 and 751d1f3.

📒 Files selected for processing (1)
  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (6 hunks)
🧰 Additional context used
🪛 GitHub Actions: Linters
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py

[error] 3221-3221: Ruff formatting change required. 1 file would be reformatted by 'ruff format --diff'. Run 'ruff format airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py' (or 'ruff format .') to fix formatting.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: Check: source-hardcoded-records
  • GitHub Check: Check: source-pokeapi
  • GitHub Check: Check: destination-motherduck
  • GitHub Check: Check: source-intercom
  • GitHub Check: Check: source-shopify
  • GitHub Check: Manifest Server Docker Image Build
  • GitHub Check: SDM Docker Image Build
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.12, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.13, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)


@darynaishchenko darynaishchenko left a comment


lgtm

@maxi297 maxi297 merged commit b28c6e3 into main Oct 8, 2025
28 of 29 checks passed
@maxi297 maxi297 deleted the maxi297/fix_query_properties_for_custom_requesters branch October 8, 2025 12:52
maxi297 added a commit to airbytehq/airbyte that referenced this pull request Oct 14, 2025
## What
Addresses
airbytehq/airbyte-internal-issues#14717

## How
Update the CDK version to pick
airbytehq/airbyte-python-cdk#783 up

## Review guide
<!--
1. `x.py`
2. `y.py`
-->

## User Impact
<!--
* What is the end result perceived by the user?
* If there are negative side effects, please list them. 
-->

## Can this PR be safely reverted and rolled back?
<!--
* If unsure, leave it blank.
-->
- [x] YES 💚
- [ ] NO ❌