feat: cache properties from endpoint #808

maxi297 · 2025-10-22T13:37:39Z

What

https://github.com/airbytehq/airbyte-internal-issues/issues/14929

How

Adding a cache within PropertiesFromEndpoint. Note that with this solution, every instance of PropertiesFromEndpoint will have a different cache so it may be that the same stream as a child/main stream vs as a parent stream have different properties from endpoint if a field is added between the read of those streams. I don't see a case where this would be problematic though.

Summary by CodeRabbit

Performance Improvements
- Endpoint properties are cached to avoid redundant retrievals on repeated calls.
Bug Fixes
- Property values are consistently returned as strings.
- Chunking no longer mutates input property lists.
Improvements
- Simplified public API for property chunking and request-property generation (fewer parameters) and more consistent handling of property field types.
Tests
- New tests cover single-call caching, type coercion, and input immutability.
Documentation
- Clarified note that stream slices can't be interpolated from this retriever.

github-actions · 2025-10-22T13:37:58Z

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

Testing This CDK Version

You can test this version of the CDK using the following:

# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@maxi297/cache_properties_from_endpoint#egg=airbyte-python-cdk[dev]' --help

# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch maxi297/cache_properties_from_endpoint

Helpful Resources

CDK API Reference

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

/autofix - Fixes most formatting and linting issues
/poetry-lock - Updates poetry.lock file
/test - Runs connector tests with the updated CDK
/poe build - Regenerate git-committed build artifacts, such as the pydantic models which are generated from the manifest JSON schema in YAML.
/poe <command> - Runs any poe command in the CDK environment

📝 Edit this welcome message.

coderabbitai · 2025-10-22T13:43:16Z

📝 Walkthrough

Walkthrough

PropertiesFromEndpoint now caches computed property names and returns them as a List[str]; query-property APIs were tightened to accept List[str] and removed stream_slice parameters; call sites and tests updated to match new signatures and verify caching and type coercion.

Changes

Cohort / File(s)	Summary
Core Caching Implementation `airbyte_cdk/sources/declarative/requesters/query_properties/properties_from_endpoint.py`	Added private `_cached_properties: Optional[List[str]] = None`. Changed `get_properties_from_endpoint` to return `List[str]`, build and cache properties on first call (invoking `retriever.read_records` with an internally constructed empty slice), and added helper `_get_property(property_obj: Mapping[str, Any]) -> str` to extract and coerce values.
Type Annotation Tightening & Signature Changes `airbyte_cdk/sources/declarative/requesters/query_properties/property_chunking.py`, `airbyte_cdk/sources/declarative/requesters/query_properties/query_properties.py`	Tightened `property_fields` typing to `List[str]`. Updated `QueryProperties.get_request_property_chunks` signature to remove `stream_slice` and return `Iterable[List[str]]`. Internal chunk formation logic unchanged.
Retriever Call-site Updates & Cleanup `airbyte_cdk/sources/declarative/retrievers/simple_retriever.py`	Updated calls to `get_request_property_chunks()` (no `stream_slice` arg). For each chunk, constructs a `StreamSlice` with `extra_fields={"query_properties": properties}`. Removed an unused local variable in `read_records`.
Unit Tests — PropertiesFromEndpoint `unit_tests/sources/declarative/requesters/query_properties/test_properties_from_endpoint.py`	Tests updated to treat `get_properties_from_endpoint()` as returning `List[str]`. Added tests asserting retriever called only once across multiple calls and that integer property values are coerced to strings.
Unit Tests — Property Chunking `unit_tests/sources/declarative/requesters/query_properties/test_property_chunking.py`	Removed iterator-conversion in test setup; added test ensuring `get_request_property_chunks` does not mutate the input `property_fields` when `always_include_properties` is provided.
Unit Tests — QueryProperties `unit_tests/sources/declarative/requesters/query_properties/test_query_properties.py`	Updated tests to call `get_request_property_chunks()` without `stream_slice` and reflect tightened typing/behavior.
Schema doc `airbyte_cdk/sources/declarative/declarative_component_schema.yaml`	Documentation note added: "Note that stream_slices can't be interpolated from this retriever." No code change.

Sequence Diagram(s)

sequenceDiagram
    participant Caller
    participant PropertiesFromEndpoint
    participant Retriever

    Caller->>PropertiesFromEndpoint: get_properties_from_endpoint()
    alt cached not set
        PropertiesFromEndpoint->>Retriever: read_records(stream_slice = {"partition": {}, "cursor_slice": {}})
        Retriever-->>PropertiesFromEndpoint: iterable records
        rect rgb(220,240,220)
            PropertiesFromEndpoint->>PropertiesFromEndpoint: for each record -> _get_property -> build List[str]
            PropertiesFromEndpoint->>PropertiesFromEndpoint: store List[str] in _cached_properties
        end
        PropertiesFromEndpoint-->>Caller: return List[str]
    else cached
        PropertiesFromEndpoint-->>Caller: return cached List[str]
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

feat(query_properties): Add configured catalog to SimpleRetriever and only fetch properties that are included in the stream schema #788: Modifies query-properties flow and SimpleRetriever call sites; closely related to these signature/callsite edits.
feat(property-chunking): Allow the definition of query_properties on the HttpRequester component #638: Touches QueryProperties / PropertiesFromEndpoint and property-chunking usage; overlaps with API/behavior changes here.
fix(property chunking): Switch the ordering page iteration and property chunking process chunks first instead of pages first #487: Adjusts SimpleRetriever._read_pages and property-chunk handling; directly relevant to retriever changes in this PR.

Suggested reviewers

pnilan
darynaishchenko

Should cached properties ever be invalidated (e.g., if endpoint values can change per-run or per-slice), or is it acceptable to assume endpoint properties remain stable for the connector's lifetime — wdyt?

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 14.29% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check	✅ Passed	The pull request title "feat: cache properties from endpoint" directly aligns with the primary objective of the changeset. The main modification introduces an in-instance cache field `_cached_properties` within `PropertiesFromEndpoint` and refactors `get_properties_from_endpoint` to return a cached `List[str]` instead of streaming results. The title accurately captures this core feature in a concise, clear manner that uses conventional commit format. A teammate scanning the repository history would immediately understand that this PR adds caching functionality for endpoint properties without ambiguity.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch maxi297/cache_properties_from_endpoint

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

unit_tests/sources/declarative/requesters/query_properties/test_properties_from_endpoint.py (1)
1-1: Don't forget the formatting fix!

The pipeline is reporting formatting issues for this file too. Could you run ruff format to fix them, wdyt?
#!/bin/bash
cd unit_tests/sources/declarative/requesters/query_properties
ruff format test_properties_from_endpoint.py

🧹 Nitpick comments (1)

unit_tests/sources/declarative/requesters/query_properties/test_property_chunking.py (1)

108-128: Test logic is solid—just needs a formatting touch-up.

The test is excellent for validating the non-mutation guarantee. However, verification confirms the pipeline was right: the file is missing a final newline at the end.

Could you run ruff format unit_tests/sources/declarative/requesters/query_properties/test_property_chunking.py to add it, wdyt?

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 20ae208 and ad730cf.

📒 Files selected for processing (5)

airbyte_cdk/sources/declarative/requesters/query_properties/properties_from_endpoint.py (2 hunks)
airbyte_cdk/sources/declarative/requesters/query_properties/property_chunking.py (1 hunks)
airbyte_cdk/sources/declarative/requesters/query_properties/query_properties.py (1 hunks)
unit_tests/sources/declarative/requesters/query_properties/test_properties_from_endpoint.py (3 hunks)
unit_tests/sources/declarative/requesters/query_properties/test_property_chunking.py (1 hunks)

🧰 Additional context used

🧬 Code graph analysis (3)

unit_tests/sources/declarative/requesters/query_properties/test_property_chunking.py (1)

airbyte_cdk/sources/declarative/requesters/query_properties/property_chunking.py (3)

PropertyChunking (25-71)

PropertyLimitType (14-21)

get_request_property_chunks (42-68)

airbyte_cdk/sources/declarative/requesters/query_properties/properties_from_endpoint.py (3)

airbyte_cdk/sources/declarative/interpolation/interpolated_string.py (1)

InterpolatedString (13-79)

airbyte_cdk/sources/types.py (1)

StreamSlice (75-169)

airbyte_cdk/sources/declarative/retrievers/simple_retriever.py (1)

read_records (512-554)

unit_tests/sources/declarative/requesters/query_properties/test_properties_from_endpoint.py (3)

airbyte_cdk/sources/declarative/requesters/query_properties/properties_from_endpoint.py (2)

get_properties_from_endpoint (34-37)

PropertiesFromEndpoint (15-44)

airbyte_cdk/sources/types.py (5)

StreamSlice (75-169)

cursor_slice (107-112)

partition (99-104)

Record (21-72)

data (35-36)

airbyte_cdk/sources/declarative/retrievers/simple_retriever.py (1)

read_records (512-554)

🪛 GitHub Actions: Linters

unit_tests/sources/declarative/requesters/query_properties/test_property_chunking.py

[error] 1-1: Ruff format check failed. 3 files would be reformatted. Run 'ruff format' to fix code style issues in this file.

airbyte_cdk/sources/declarative/requesters/query_properties/properties_from_endpoint.py

[error] 1-1: Ruff format check failed. 3 files would be reformatted. Run 'ruff format' to fix code style issues in this file.

unit_tests/sources/declarative/requesters/query_properties/test_properties_from_endpoint.py

[error] 1-1: Ruff format check failed. 3 files would be reformatted. Run 'ruff format' to fix code style issues in this file.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)

GitHub Check: Check: source-pokeapi
GitHub Check: Check: source-intercom
GitHub Check: Check: source-hardcoded-records
GitHub Check: Check: destination-motherduck
GitHub Check: Check: source-shopify
GitHub Check: Pytest (All, Python 3.10, Ubuntu)
GitHub Check: Pytest (All, Python 3.13, Ubuntu)
GitHub Check: Pytest (All, Python 3.11, Ubuntu)
GitHub Check: Pytest (All, Python 3.12, Ubuntu)
GitHub Check: Pytest (Fast)
GitHub Check: SDM Docker Image Build
GitHub Check: Manifest Server Docker Image Build

🔇 Additional comments (6)

airbyte_cdk/sources/declarative/requesters/query_properties/property_chunking.py (1)

42-44: LGTM! Type tightening aligns with implementation.

The change from Iterable[str] to List[str] matches the actual usage pattern—the function converts to a list immediately on line 46 anyway. This makes the contract more explicit.

airbyte_cdk/sources/declarative/requesters/query_properties/query_properties.py (1)

37-41: LGTM! Type annotation correctly tightened.

The local fields variable type now accurately reflects that get_properties_from_endpoint returns List[str], making the typing more precise.

airbyte_cdk/sources/declarative/requesters/query_properties/properties_from_endpoint.py (1)

39-44: Nice helper method for property extraction!

The _get_property method cleanly handles path evaluation and type coercion to string. The logic for handling both string and InterpolatedString nodes looks solid.

unit_tests/sources/declarative/requesters/query_properties/test_properties_from_endpoint.py (3)

47-50: LGTM! Test updated for new return type.

Removing the list() wrapper is correct since get_properties_from_endpoint now returns List[str] directly.

136-156: Good caching test, but consider testing different stream_slice values?

This test verifies that the retriever is called only once, which is great! However, it uses the same stream_slice for all three calls. Given my concern about the cache not accounting for different slices (see my comment on properties_from_endpoint.py), it might be valuable to add a test that uses different stream_slice values to verify the expected behavior, wdyt?

158-177: Excellent test for type coercion!

This test ensures that integer property values are correctly converted to strings, which is important for the consistent List[str] return type. Nice edge case coverage!

airbyte_cdk/sources/declarative/requesters/query_properties/properties_from_endpoint.py

coderabbitai · 2025-10-22T13:43:20Z

airbyte_cdk/sources/declarative/requesters/query_properties/properties_from_endpoint.py

+    def get_properties_from_endpoint(self, stream_slice: Optional[StreamSlice]) -> List[str]:
+        if self._cached_properties is None:
+            self._cached_properties = list(map(self._get_property, self.retriever.read_records(records_schema={}, stream_slice=stream_slice)))  # type: ignore # extracted will be a MutableMapping, given input data structure
+        return self._cached_properties


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Critical: Cache doesn't account for stream_slice parameter.

The caching logic assumes all calls return the same properties, but stream_slice is passed to retriever.read_records. If different stream_slice values produce different properties, subsequent calls with different slices will incorrectly return the first cached result.

Could you verify whether stream_slice affects the properties returned? If it does, the cache should be keyed by the slice. If not, perhaps the parameter could be removed or documented as unused, wdyt?

Run this script to check if any retriever implementations use stream_slice when reading property records:

🏁 Script executed:

#!/bin/bash # Search for retriever implementations that might use stream_slice in read_records # Look for read_records method implementations and usage patterns ast-grep --pattern $'def read_records($$$, stream_slice$$$): $$$' # Also check if PropertiesFromEndpoint is used in contexts where stream_slice varies rg -n -A5 -B5 'PropertiesFromEndpoint' --type=py

Length of output: 42887

Now let me search for test cases and actual usage to confirm whether stream_slice variations would cause issues:

Based on my verification, the original review comment raises a valid and critical concern. Here's what I found:

Stream slices typically return a list of slices each with a different hourly timestamp to be used when creating requests, and the stream slice description (the input dict) is passed as input to each stream's read_stream method so the connector can use that to make queries as needed. The codebase demonstrates this through its factory comments that explicitly state the properties endpoint is called for every slice.

The factory code's comment—"properties endpoint will be called for every slice"—directly contradicts the current implementation. While there is HTTP-level caching on the retriever, the _cached_properties field on PropertiesFromEndpoint is keyed only by initialization, not by stream_slice. This means if the endpoint's response depends on the stream_slice parameter (which is clearly possible given it's passed as a parameter), you'll get incorrect results after the first slice.

Cache doesn't account for stream_slice parameter.

The _cached_properties field caches properties globally on the PropertiesFromEndpoint instance without considering the stream_slice parameter. Since stream_slice is passed to retriever.read_records() and can vary across multiple stream partitions (different dates, IDs, etc.), different slices may return different properties—but the cache will return the first result regardless.

If the properties endpoint accepts stream_slice as a filter (e.g., per-date properties), this will cause incorrect data to be used for all slices after the first one. Consider whether the cache should be keyed by stream_slice, or if the parameter is unused and could be removed or documented, wdyt?

@brianjlai this comment make sense to me but I'm also confused as to why the stream slice would be useful in extracting properties. Do you have more context on this?

I think this comment only makes sense if we allow for the stream_slice or whatever to be a part of the interpolation context for the API request we make to fetch the properties. It doesn't really feel like we have a good use case for needing stream slice to be in the interpolation context because schemas are 1:1 with the stream and therefore global. I can't think of an api that would have different properties or schemas for different date ranges or parent records.

We would also lose a bit of optimization if we keyed by stream slice too since all date ranges would have to make at least 1 HTTP request to cache versus using the global properties on the initial request.

We could just:

Not include the stream_slice context in the HttpRequester component since we control what we paste into the

Raise an error if the properties from endpoint attempts to use the stream slice context

I feel like we won't run into this issue very often if ever since stream slice doesn't make sense to perform schema discovery. but this extra work would just be to protect against it. I'm not worried though

Ok! So I think the step forward is to remove the stream_slice from the properties from endpoint stuff. I'll look at it a bit more then, stay tuned!

airbyte_cdk/sources/declarative/requesters/query_properties/properties_from_endpoint.py

github-actions · 2025-10-22T13:45:28Z

PyTest Results (Fast)

3 805 tests +3 3 793 ✅ +3 6m 27s ⏱️ -16s
1 suites ±0 12 💤 ±0
1 files ±0 0 ❌ ±0

Results for commit d22170d. ± Comparison against base commit 20ae208.

♻️ This comment has been updated with latest results.

maxi297 · 2025-10-22T13:46:42Z

/autofix

Auto-Fix Job Info

This job attempts to auto-fix any linting or formating issues. If any fixes are made,
those changes will be automatically committed and pushed back to the PR.

Note: This job can only be run by maintainers. On PRs from forks, this command requires
that the PR author has enabled the Allow edits from maintainers option.

PR auto-fix job started... Check job output.

✅ Changes applied successfully.

maxi297 · 2025-10-22T13:57:41Z

airbyte_cdk/sources/declarative/requesters/query_properties/properties_from_endpoint.py

+        if self._cached_properties is None:
+            self._cached_properties = list(
+                map(
+                    self._get_property,  # type: ignore # SimpleRetriever and AsyncRetriever only returns Record. Should we change the return type of Retriever.read_records?


The previous # type: ignore was catching both the return type of Retriever.read_records and the Mapping being actually a MutableMapping

maxi297 · 2025-10-22T13:58:16Z

airbyte_cdk/sources/declarative/requesters/query_properties/properties_from_endpoint.py

-            ]
-            yield dpath.get(property_obj, path, default=[])  # type: ignore # extracted will be a MutableMapping, given input data structure
+    def get_properties_from_endpoint(self, stream_slice: Optional[StreamSlice]) -> List[str]:
+        if self._cached_properties is None:


Concern to be resolved as if the cache should be per StreamSlice

coderabbitai

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ebac375 and f76ee08.

📒 Files selected for processing (2)

airbyte_cdk/sources/declarative/requesters/query_properties/properties_from_endpoint.py (1 hunks)
airbyte_cdk/sources/declarative/retrievers/simple_retriever.py (0 hunks)

💤 Files with no reviewable changes (1)

airbyte_cdk/sources/declarative/retrievers/simple_retriever.py

🧰 Additional context used

🧬 Code graph analysis (1)

airbyte_cdk/sources/declarative/requesters/query_properties/properties_from_endpoint.py (3)

airbyte_cdk/sources/declarative/interpolation/interpolated_string.py (1)

InterpolatedString (13-79)

airbyte_cdk/sources/types.py (1)

StreamSlice (75-169)

airbyte_cdk/sources/declarative/retrievers/simple_retriever.py (1)

read_records (512-553)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)

GitHub Check: Check: source-pokeapi
GitHub Check: Check: source-hardcoded-records
GitHub Check: Check: source-intercom
GitHub Check: Check: source-shopify
GitHub Check: Check: destination-motherduck
GitHub Check: Pytest (All, Python 3.11, Ubuntu)
GitHub Check: Pytest (All, Python 3.10, Ubuntu)
GitHub Check: Pytest (All, Python 3.13, Ubuntu)
GitHub Check: Pytest (All, Python 3.12, Ubuntu)
GitHub Check: Manifest Server Docker Image Build
GitHub Check: SDM Docker Image Build
GitHub Check: Pytest (Fast)
GitHub Check: Analyze (python)

airbyte_cdk/sources/declarative/requesters/query_properties/properties_from_endpoint.py

github-actions · 2025-10-22T14:09:04Z

PyTest Results (Full)

3 808 tests 3 796 ✅ 11m 15s ⏱️
1 suites 12 💤
1 files 0 ❌

Results for commit d22170d.

♻️ This comment has been updated with latest results.

brianjlai

left a few notes, but I think this change makes sense and nothing to block. Let me know how you feel about the notes I mentioned.

brianjlai · 2025-10-23T17:01:13Z

airbyte_cdk/sources/declarative/requesters/query_properties/properties_from_endpoint.py

+    def get_properties_from_endpoint(self, stream_slice: Optional[StreamSlice]) -> List[str]:
+        if self._cached_properties is None:
+            self._cached_properties = list(map(self._get_property, self.retriever.read_records(records_schema={}, stream_slice=stream_slice)))  # type: ignore # extracted will be a MutableMapping, given input data structure
+        return self._cached_properties


I think this comment only makes sense if we allow for the stream_slice or whatever to be a part of the interpolation context for the API request we make to fetch the properties. It doesn't really feel like we have a good use case for needing stream slice to be in the interpolation context because schemas are 1:1 with the stream and therefore global. I can't think of an api that would have different properties or schemas for different date ranges or parent records.

We would also lose a bit of optimization if we keyed by stream slice too since all date ranges would have to make at least 1 HTTP request to cache versus using the global properties on the initial request.

We could just:

Not include the stream_slice context in the HttpRequester component since we control what we paste into the

Raise an error if the properties from endpoint attempts to use the stream slice context

I feel like we won't run into this issue very often if ever since stream slice doesn't make sense to perform schema discovery. but this extra work would just be to protect against it. I'm not worried though

brianjlai · 2025-10-23T17:09:19Z

airbyte_cdk/sources/declarative/requesters/query_properties/properties_from_endpoint.py

-                for node in self._property_field_path
-            ]
-            yield dpath.get(property_obj, path, default=[])  # type: ignore # extracted will be a MutableMapping, given input data structure
+    def get_properties_from_endpoint(self, stream_slice: Optional[StreamSlice]) -> List[str]:


one risk that this does pose because we cache is if we have a crazy large number of properties and we run out of memory. That was why I had originally made this use an iterable so that we yield properties as they come in.

Granted it feels pretty necessary in order to allow caching, but it does also help make the argument for only maintaining a global set of property mapping. Otherwise if we had a large set of properties and we had to store it for every slice that might be more concerning

The other risk I've thought of is if someone removes a property during the sync. Then, it might fail because we ask for a property that HubSpot doesn't know about. That being said, I assume this happens rarely and if it's too much of a pain, we can have a refresh on properties maybe later...

I've pushed the change. I know declarative_component_schema.yaml reference the SimpleRetriever component which allows for interpolation on StreamSlices but I don't have an easy way to remove that. Do you think we should improve the description of PropertiesFromEndpoint to clarify this or we don't think people would do anything in the context of stream slice for properties retrieval?

maxi297 · 2025-10-23T19:27:13Z

/autofix

Auto-Fix Job Info

This job attempts to auto-fix any linting or formating issues. If any fixes are made,
those changes will be automatically committed and pushed back to the PR.

Note: This job can only be run by maintainers. On PRs from forks, this command requires
that the PR author has enabled the Allow edits from maintainers option.

PR auto-fix job started... Check job output.

✅ Changes applied successfully.

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (1)

airbyte_cdk/sources/declarative/requesters/query_properties/properties_from_endpoint.py (1)
43-48: Missing property path returns literal string "[]" instead of empty string.

When dpath.get doesn't find the property path, it returns the default value [] (empty list), which str() then converts to the literal string "[]". This means properties missing from the endpoint response will appear as the string "[]" in the property list rather than an empty string or being filtered out.

Is this the intended behavior, or should missing properties yield an empty string (using default="") or be skipped entirely, wdyt?

If you'd prefer empty strings for missing properties, apply this diff:
-        return str(dpath.get(property_obj, path, default=[]))  # type: ignore # extracted will be a MutableMapping, given input data structure
+        result = dpath.get(property_obj, path, default="")
+        return str(result) if result else ""  # type: ignore # extracted will be a MutableMapping, given input data structure
Alternatively, if missing properties should be skipped, you could filter them out in get_properties_from_endpoint instead.

🧹 Nitpick comments (1)

unit_tests/sources/declarative/requesters/query_properties/test_query_properties.py (1)
88-90: Mock return type should match the actual implementation.

The mock returns an iterator (iter([...])) but get_properties_from_endpoint now returns a concrete List[str]. While Python's duck typing makes this work in practice, the mock should match the actual type for accuracy and to catch potential type-related issues, wdyt?

Apply this diff to align the mock with the actual return type:
-    properties_from_endpoint_mock.get_properties_from_endpoint.return_value = iter(
-        ["alice", "clover", "dio", "k", "luna", "phi", "quark", "sigma", "tenmyouji"]
-    )
+    properties_from_endpoint_mock.get_properties_from_endpoint.return_value = [
+        "alice", "clover", "dio", "k", "luna", "phi", "quark", "sigma", "tenmyouji"
+    ]

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f76ee08 and 782a7b1.

📒 Files selected for processing (5)

airbyte_cdk/sources/declarative/requesters/query_properties/properties_from_endpoint.py (1 hunks)
airbyte_cdk/sources/declarative/requesters/query_properties/query_properties.py (1 hunks)
airbyte_cdk/sources/declarative/retrievers/simple_retriever.py (1 hunks)
unit_tests/sources/declarative/requesters/query_properties/test_properties_from_endpoint.py (3 hunks)
unit_tests/sources/declarative/requesters/query_properties/test_query_properties.py (3 hunks)

🚧 Files skipped from review as they are similar to previous changes (2)

unit_tests/sources/declarative/requesters/query_properties/test_properties_from_endpoint.py
airbyte_cdk/sources/declarative/retrievers/simple_retriever.py

🧰 Additional context used

🧬 Code graph analysis (3)

airbyte_cdk/sources/declarative/requesters/query_properties/properties_from_endpoint.py (3)

airbyte_cdk/sources/declarative/interpolation/interpolated_string.py (1)

InterpolatedString (13-79)

airbyte_cdk/sources/declarative/retrievers/simple_retriever.py (1)

read_records (510-551)

airbyte_cdk/sources/types.py (3)

StreamSlice (75-169)

partition (99-104)

cursor_slice (107-112)

unit_tests/sources/declarative/requesters/query_properties/test_query_properties.py (2)

airbyte_cdk/sources/declarative/requesters/query_properties/query_properties.py (1)

get_request_property_chunks (28-46)

airbyte_cdk/sources/declarative/requesters/query_properties/property_chunking.py (1)

get_request_property_chunks (42-68)

airbyte_cdk/sources/declarative/requesters/query_properties/query_properties.py (3)

airbyte_cdk/sources/declarative/requesters/query_properties/property_chunking.py (1)

get_request_property_chunks (42-68)

airbyte_cdk/sources/declarative/requesters/query_properties/properties_from_endpoint.py (2)

PropertiesFromEndpoint (14-48)

get_properties_from_endpoint (33-41)

airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1)

PropertiesFromEndpoint (2771-2782)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)

GitHub Check: Check: destination-motherduck
GitHub Check: Check: source-pokeapi
GitHub Check: Check: source-shopify
GitHub Check: Check: source-intercom
GitHub Check: Check: source-hardcoded-records
GitHub Check: Pytest (All, Python 3.10, Ubuntu)
GitHub Check: Pytest (All, Python 3.11, Ubuntu)
GitHub Check: Pytest (All, Python 3.12, Ubuntu)
GitHub Check: Pytest (All, Python 3.13, Ubuntu)
GitHub Check: Manifest Server Docker Image Build
GitHub Check: SDM Docker Image Build
GitHub Check: Pytest (Fast)

🔇 Additional comments (2)

airbyte_cdk/sources/declarative/requesters/query_properties/properties_from_endpoint.py (2)

25-26: LGTM! Cache field appropriately initialized.

The _cached_properties field is properly typed as Optional[List[str]] and initialized to None, enabling the lazy-load pattern implemented below.

33-41: LGTM! Caching implementation is sound.

The caching logic correctly checks for None on first call, populates the cache via map, and returns the cached result on subsequent calls. This ensures properties are fetched once per instance, as intended.

airbyte_cdk/sources/declarative/requesters/query_properties/query_properties.py

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

airbyte_cdk/sources/declarative/requesters/query_properties/properties_from_endpoint.py (1)
10-10: Minor: Consider removing unused StreamSlice import?

After removing the stream_slice parameter from get_properties_from_endpoint, the StreamSlice import on line 10 appears to be unused. Would you like to clean it up to keep imports minimal, wdyt?
#!/bin/bash
# Verify if StreamSlice is used anywhere in the file
rg -n "StreamSlice" airbyte_cdk/sources/declarative/requesters/query_properties/properties_from_endpoint.py | grep -v "^10:"

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d22170d and 2adf3ea.

📒 Files selected for processing (2)

airbyte_cdk/sources/declarative/declarative_component_schema.yaml (1 hunks)
airbyte_cdk/sources/declarative/requesters/query_properties/properties_from_endpoint.py (1 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

airbyte_cdk/sources/declarative/requesters/query_properties/properties_from_endpoint.py (2)

airbyte_cdk/sources/declarative/interpolation/interpolated_string.py (1)

InterpolatedString (13-79)

airbyte_cdk/sources/declarative/retrievers/simple_retriever.py (1)

read_records (512-553)

🔇 Additional comments (4)

airbyte_cdk/sources/declarative/declarative_component_schema.yaml (1)

3328-3328: LGTM! Clear documentation of the stream_slice limitation.

The note about stream_slices not being interpolatable from this retriever is a helpful clarification that aligns with the API changes in properties_from_endpoint.py. This will prevent users from attempting to use stream_slice context where it's not supported.

airbyte_cdk/sources/declarative/requesters/query_properties/properties_from_endpoint.py (3)

25-26: Good addition of the cache field.

The _cached_properties field with Optional[List[str]] type hint and None initialization is a clean way to implement lazy caching. The private naming convention is appropriate since this is an internal implementation detail.

33-41: Clean caching implementation for property retrieval.

The caching logic is straightforward and effective:

On first call, retrieves records and maps them through _get_property

Subsequent calls return the cached list

Using stream_slice=None aligns with the discussions about properties being global to the stream

One question: Line 37's # type: ignore comment mentions that the return type of Retriever.read_records might need updating. Is this something that should be addressed at the interface level, or is the type: ignore acceptable here, wdyt?

43-48: Clarify the intended behavior for missing property fields.

The tests all cover records with complete data—none test scenarios where the property field path is missing from a record. When dpath.get doesn't find the path, it returns [] (the default), which str() converts to the literal string "[]". This means incomplete records would contribute "[]" to the cached properties list.

The gap in test coverage makes it unclear whether this behavior is intentional. Should records with missing property fields contribute empty strings ("") instead, or is the "[]" behavior by design? Could you verify this and add a test case for missing properties to clarify the expected behavior, wdyt?

maxi297 added 2 commits October 22, 2025 09:23

cache properties from endpoint

cd1dc82

fix typing and add tests

ad730cf

maxi297 requested a review from brianjlai October 22, 2025 13:37

github-actions bot added the enhancement New feature or request label Oct 22, 2025

coderabbitai bot requested changes Oct 22, 2025

View reviewed changes

octavia-squidington-iii and others added 3 commits October 22, 2025 13:47

Auto-fix lint and format issues

ebac375

mypy

0d73b0d

clean up

f76ee08

maxi297 commented Oct 22, 2025

View reviewed changes

coderabbitai bot requested changes Oct 22, 2025

View reviewed changes

airbyte_cdk/sources/declarative/requesters/query_properties/properties_from_endpoint.py Show resolved Hide resolved

brianjlai approved these changes Oct 23, 2025

View reviewed changes

remove streamslice from properties_from_endpoint

782a7b1

maxi297 requested a review from brianjlai October 23, 2025 19:23

coderabbitai bot requested changes Oct 23, 2025

View reviewed changes

airbyte_cdk/sources/declarative/requesters/query_properties/query_properties.py Show resolved Hide resolved

octavia-squidington-iii and others added 3 commits October 23, 2025 19:28

Auto-fix lint and format issues

1222b18

update doc

d22170d

code review

2adf3ea

coderabbitai bot reviewed Oct 23, 2025

View reviewed changes

feat: cache properties from endpoint #808

Are you sure you want to change the base?

feat: cache properties from endpoint #808

Uh oh!

Conversation

maxi297 commented Oct 22, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What

How

Summary by CodeRabbit

Uh oh!

github-actions bot commented Oct 22, 2025

👋 Greetings, Airbyte Team Member!

Testing This CDK Version

Helpful Resources

PR Slash Commands

Uh oh!

coderabbitai bot commented Oct 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested reviewers

Pre-merge checks and finishing touches

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot Oct 22, 2025

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

github-actions bot commented Oct 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PyTest Results (Fast)

Uh oh!

maxi297 commented Oct 22, 2025 • edited by github-actions bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

github-actions bot commented Oct 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PyTest Results (Full)

Uh oh!

brianjlai left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

maxi297 commented Oct 23, 2025 • edited by github-actions bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

maxi297 commented Oct 22, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Oct 22, 2025 •

edited

Loading

github-actions bot commented Oct 22, 2025 •

edited

Loading

maxi297 commented Oct 22, 2025 •

edited by github-actions bot

Loading

github-actions bot commented Oct 22, 2025 •

edited

Loading

maxi297 commented Oct 23, 2025 •

edited by github-actions bot

Loading