@maxi297
Contributor

@maxi297 maxi297 commented Oct 22, 2025

What

https://github.com/airbytehq/airbyte-internal-issues/issues/14929

How

This adds a cache within PropertiesFromEndpoint. Note that with this solution, every instance of PropertiesFromEndpoint has its own cache, so the same stream used as a child/main stream versus as a parent stream may see different properties from the endpoint if a field is added between the reads of those streams. I don't see a case where this would be problematic, though.
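For readers who want the idea in isolation, here is a minimal sketch of a per-instance cache. It is not the CDK implementation: FakeRetriever and CachedProperties are hypothetical names made up for this illustration, and the real retriever takes more arguments.

```python
from typing import Any, Iterable, List, Mapping, Optional


class FakeRetriever:
    """Hypothetical stand-in for a retriever; counts how often it is read."""

    def __init__(self, records: List[Mapping[str, Any]]) -> None:
        self._records = list(records)
        self.calls = 0

    def read_records(self) -> Iterable[Mapping[str, Any]]:
        self.calls += 1
        return iter(self._records)


class CachedProperties:
    """Simplified illustration of a per-instance cache (not the CDK class)."""

    def __init__(self, retriever: FakeRetriever, field: str) -> None:
        self._retriever = retriever
        self._field = field
        self._cached: Optional[List[str]] = None

    def get_properties(self) -> List[str]:
        # Only the first call reads from the retriever; later calls reuse the list.
        if self._cached is None:
            self._cached = [str(r[self._field]) for r in self._retriever.read_records()]
        return self._cached
```

Because the cache lives on the instance, two instances built on the same retriever each trigger their own read, which is the child/parent situation described above.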

Summary by CodeRabbit

  • Performance Improvements

    • Endpoint properties are cached to avoid redundant retrievals on repeated calls.
  • Bug Fixes

    • Property values are consistently returned as strings.
    • Chunking no longer mutates input property lists.
  • Improvements

    • Simplified public API for property chunking and request-property generation (fewer parameters) and more consistent handling of property field types.
  • Tests

    • New tests cover single-call caching, type coercion, and input immutability.
  • Documentation

    • Clarified note that stream slices can't be interpolated from this retriever.

@maxi297 maxi297 requested a review from brianjlai October 22, 2025 13:37
@github-actions github-actions bot added the enhancement New feature or request label Oct 22, 2025
@github-actions

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

Testing This CDK Version

You can test this version of the CDK using the following:

# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@maxi297/cache_properties_from_endpoint#egg=airbyte-python-cdk[dev]' --help

# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch maxi297/cache_properties_from_endpoint

Helpful Resources

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

  • /autofix - Fixes most formatting and linting issues
  • /poetry-lock - Updates poetry.lock file
  • /test - Runs connector tests with the updated CDK
  • /poe build - Regenerate git-committed build artifacts, such as the pydantic models which are generated from the manifest JSON schema in YAML.
  • /poe <command> - Runs any poe command in the CDK environment


@coderabbitai
Contributor

coderabbitai bot commented Oct 22, 2025

📝 Walkthrough

PropertiesFromEndpoint now caches computed property names and returns them as a List[str]; query-property APIs were tightened to accept List[str] and removed stream_slice parameters; call sites and tests updated to match new signatures and verify caching and type coercion.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Core Caching Implementation: airbyte_cdk/sources/declarative/requesters/query_properties/properties_from_endpoint.py | Added private _cached_properties: Optional[List[str]] = None. Changed get_properties_from_endpoint to return List[str], build and cache properties on first call (invoking retriever.read_records with an internally constructed empty slice), and added helper _get_property(property_obj: Mapping[str, Any]) -> str to extract and coerce values. |
| Type Annotation Tightening & Signature Changes: airbyte_cdk/sources/declarative/requesters/query_properties/property_chunking.py, airbyte_cdk/sources/declarative/requesters/query_properties/query_properties.py | Tightened property_fields typing to List[str]. Updated QueryProperties.get_request_property_chunks signature to remove stream_slice and return Iterable[List[str]]. Internal chunk formation logic unchanged. |
| Retriever Call-site Updates & Cleanup: airbyte_cdk/sources/declarative/retrievers/simple_retriever.py | Updated calls to get_request_property_chunks() (no stream_slice arg). For each chunk, constructs a StreamSlice with extra_fields={"query_properties": properties}. Removed an unused local variable in read_records. |
| Unit Tests - PropertiesFromEndpoint: unit_tests/sources/declarative/requesters/query_properties/test_properties_from_endpoint.py | Tests updated to treat get_properties_from_endpoint() as returning List[str]. Added tests asserting retriever called only once across multiple calls and that integer property values are coerced to strings. |
| Unit Tests - Property Chunking: unit_tests/sources/declarative/requesters/query_properties/test_property_chunking.py | Removed iterator-conversion in test setup; added test ensuring get_request_property_chunks does not mutate the input property_fields when always_include_properties is provided. |
| Unit Tests - QueryProperties: unit_tests/sources/declarative/requesters/query_properties/test_query_properties.py | Updated tests to call get_request_property_chunks() without stream_slice and reflect tightened typing/behavior. |
| Schema doc: airbyte_cdk/sources/declarative/declarative_component_schema.yaml | Documentation note added: "Note that stream_slices can't be interpolated from this retriever." No code change. |

Sequence Diagram(s)

sequenceDiagram
    participant Caller
    participant PropertiesFromEndpoint
    participant Retriever

    Caller->>PropertiesFromEndpoint: get_properties_from_endpoint()
    alt cached not set
        PropertiesFromEndpoint->>Retriever: read_records(stream_slice = {"partition": {}, "cursor_slice": {}})
        Retriever-->>PropertiesFromEndpoint: iterable records
        rect rgb(220,240,220)
            PropertiesFromEndpoint->>PropertiesFromEndpoint: for each record -> _get_property -> build List[str]
            PropertiesFromEndpoint->>PropertiesFromEndpoint: store List[str] in _cached_properties
        end
        PropertiesFromEndpoint-->>Caller: return List[str]
    else cached
        PropertiesFromEndpoint-->>Caller: return cached List[str]
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested reviewers

  • pnilan
  • darynaishchenko

Should cached properties ever be invalidated (e.g., if endpoint values can change per-run or per-slice), or is it acceptable to assume endpoint properties remain stable for the connector's lifetime — wdyt?

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 14.29% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title Check | ✅ Passed | The pull request title "feat: cache properties from endpoint" directly aligns with the primary objective of the changeset. The main modification introduces an in-instance cache field _cached_properties within PropertiesFromEndpoint and refactors get_properties_from_endpoint to return a cached List[str] instead of streaming results. The title accurately captures this core feature in a concise, clear manner that uses conventional commit format. A teammate scanning the repository history would immediately understand that this PR adds caching functionality for endpoint properties without ambiguity. |

Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
unit_tests/sources/declarative/requesters/query_properties/test_properties_from_endpoint.py (1)

1-1: Don't forget the formatting fix!

The pipeline is reporting formatting issues for this file too. Could you run ruff format to fix them, wdyt?

#!/bin/bash
cd unit_tests/sources/declarative/requesters/query_properties
ruff format test_properties_from_endpoint.py
🧹 Nitpick comments (1)
unit_tests/sources/declarative/requesters/query_properties/test_property_chunking.py (1)

108-128: Test logic is solid—just needs a formatting touch-up.

The test is excellent for validating the non-mutation guarantee. However, verification confirms the pipeline was right: the file is missing a final newline at the end.

Could you run ruff format unit_tests/sources/declarative/requesters/query_properties/test_property_chunking.py to add it, wdyt?

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 20ae208 and ad730cf.

📒 Files selected for processing (5)
  • airbyte_cdk/sources/declarative/requesters/query_properties/properties_from_endpoint.py (2 hunks)
  • airbyte_cdk/sources/declarative/requesters/query_properties/property_chunking.py (1 hunks)
  • airbyte_cdk/sources/declarative/requesters/query_properties/query_properties.py (1 hunks)
  • unit_tests/sources/declarative/requesters/query_properties/test_properties_from_endpoint.py (3 hunks)
  • unit_tests/sources/declarative/requesters/query_properties/test_property_chunking.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
unit_tests/sources/declarative/requesters/query_properties/test_property_chunking.py (1)
airbyte_cdk/sources/declarative/requesters/query_properties/property_chunking.py (3)
  • PropertyChunking (25-71)
  • PropertyLimitType (14-21)
  • get_request_property_chunks (42-68)
airbyte_cdk/sources/declarative/requesters/query_properties/properties_from_endpoint.py (3)
airbyte_cdk/sources/declarative/interpolation/interpolated_string.py (1)
  • InterpolatedString (13-79)
airbyte_cdk/sources/types.py (1)
  • StreamSlice (75-169)
airbyte_cdk/sources/declarative/retrievers/simple_retriever.py (1)
  • read_records (512-554)
unit_tests/sources/declarative/requesters/query_properties/test_properties_from_endpoint.py (3)
airbyte_cdk/sources/declarative/requesters/query_properties/properties_from_endpoint.py (2)
  • get_properties_from_endpoint (34-37)
  • PropertiesFromEndpoint (15-44)
airbyte_cdk/sources/types.py (5)
  • StreamSlice (75-169)
  • cursor_slice (107-112)
  • partition (99-104)
  • Record (21-72)
  • data (35-36)
airbyte_cdk/sources/declarative/retrievers/simple_retriever.py (1)
  • read_records (512-554)
🪛 GitHub Actions: Linters
unit_tests/sources/declarative/requesters/query_properties/test_property_chunking.py

[error] 1-1: Ruff format check failed. 3 files would be reformatted. Run 'ruff format' to fix code style issues in this file.

airbyte_cdk/sources/declarative/requesters/query_properties/properties_from_endpoint.py

[error] 1-1: Ruff format check failed. 3 files would be reformatted. Run 'ruff format' to fix code style issues in this file.

unit_tests/sources/declarative/requesters/query_properties/test_properties_from_endpoint.py

[error] 1-1: Ruff format check failed. 3 files would be reformatted. Run 'ruff format' to fix code style issues in this file.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: Check: source-pokeapi
  • GitHub Check: Check: source-intercom
  • GitHub Check: Check: source-hardcoded-records
  • GitHub Check: Check: destination-motherduck
  • GitHub Check: Check: source-shopify
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.13, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.12, Ubuntu)
  • GitHub Check: Pytest (Fast)
  • GitHub Check: SDM Docker Image Build
  • GitHub Check: Manifest Server Docker Image Build
🔇 Additional comments (6)
airbyte_cdk/sources/declarative/requesters/query_properties/property_chunking.py (1)

42-44: LGTM! Type tightening aligns with implementation.

The change from Iterable[str] to List[str] matches the actual usage pattern—the function converts to a list immediately on line 46 anyway. This makes the contract more explicit.

airbyte_cdk/sources/declarative/requesters/query_properties/query_properties.py (1)

37-41: LGTM! Type annotation correctly tightened.

The local fields variable type now accurately reflects that get_properties_from_endpoint returns List[str], making the typing more precise.

airbyte_cdk/sources/declarative/requesters/query_properties/properties_from_endpoint.py (1)

39-44: Nice helper method for property extraction!

The _get_property method cleanly handles path evaluation and type coercion to string. The logic for handling both string and InterpolatedString nodes looks solid.

unit_tests/sources/declarative/requesters/query_properties/test_properties_from_endpoint.py (3)

47-50: LGTM! Test updated for new return type.

Removing the list() wrapper is correct since get_properties_from_endpoint now returns List[str] directly.


136-156: Good caching test, but consider testing different stream_slice values?

This test verifies that the retriever is called only once, which is great! However, it uses the same stream_slice for all three calls. Given my concern about the cache not accounting for different slices (see my comment on properties_from_endpoint.py), it might be valuable to add a test that uses different stream_slice values to verify the expected behavior, wdyt?


158-177: Excellent test for type coercion!

This test ensures that integer property values are correctly converted to strings, which is important for the consistent List[str] return type. Nice edge case coverage!

Comment on lines 34 to 37
    def get_properties_from_endpoint(self, stream_slice: Optional[StreamSlice]) -> List[str]:
        if self._cached_properties is None:
            self._cached_properties = list(map(self._get_property, self.retriever.read_records(records_schema={}, stream_slice=stream_slice)))  # type: ignore # extracted will be a MutableMapping, given input data structure
        return self._cached_properties
Contributor

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Critical: Cache doesn't account for stream_slice parameter.

The caching logic assumes all calls return the same properties, but stream_slice is passed to retriever.read_records. If different stream_slice values produce different properties, subsequent calls with different slices will incorrectly return the first cached result.

Could you verify whether stream_slice affects the properties returned? If it does, the cache should be keyed by the slice. If not, perhaps the parameter could be removed or documented as unused, wdyt?

Run this script to check if any retriever implementations use stream_slice when reading property records:


🏁 Script executed:

#!/bin/bash
# Search for retriever implementations that might use stream_slice in read_records
# Look for read_records method implementations and usage patterns
ast-grep --pattern $'def read_records($$$, stream_slice$$$):
  $$$'

# Also check if PropertiesFromEndpoint is used in contexts where stream_slice varies
rg -n -A5 -B5 'PropertiesFromEndpoint' --type=py

Length of output: 42887


Now let me search for test cases and actual usage to confirm whether stream_slice variations would cause issues:

Based on my verification, the original review comment raises a valid and critical concern. Here's what I found:

Stream slices typically return a list of slices each with a different hourly timestamp to be used when creating requests, and the stream slice description (the input dict) is passed as input to each stream's read_stream method so the connector can use that to make queries as needed. The codebase demonstrates this through its factory comments that explicitly state the properties endpoint is called for every slice.

The factory code's comment—"properties endpoint will be called for every slice"—directly contradicts the current implementation. While there is HTTP-level caching on the retriever, the _cached_properties field on PropertiesFromEndpoint is keyed only by initialization, not by stream_slice. This means if the endpoint's response depends on the stream_slice parameter (which is clearly possible given it's passed as a parameter), you'll get incorrect results after the first slice.


Cache doesn't account for stream_slice parameter.

The _cached_properties field caches properties globally on the PropertiesFromEndpoint instance without considering the stream_slice parameter. Since stream_slice is passed to retriever.read_records() and can vary across multiple stream partitions (different dates, IDs, etc.), different slices may return different properties—but the cache will return the first result regardless.

If the properties endpoint accepts stream_slice as a filter (e.g., per-date properties), this will cause incorrect data to be used for all slices after the first one. Consider whether the cache should be keyed by stream_slice, or if the parameter is unused and could be removed or documented, wdyt?

Contributor Author

@brianjlai this comment makes sense to me, but I'm also confused as to why the stream slice would be useful in extracting properties. Do you have more context on this?

Contributor

I think this comment only makes sense if we allow the stream_slice to be a part of the interpolation context for the API request we make to fetch the properties. It doesn't really feel like we have a good use case for needing the stream slice in the interpolation context, because schemas are 1:1 with the stream and therefore global. I can't think of an API that would have different properties or schemas for different date ranges or parent records.

We would also lose a bit of optimization if we keyed by stream slice too since all date ranges would have to make at least 1 HTTP request to cache versus using the global properties on the initial request.

We could just:

  1. Not include the stream_slice context in the HttpRequester component since we control what we paste into the
  2. Raise an error if the properties from endpoint attempts to use the stream slice context

I feel like we won't run into this issue very often, if ever, since using the stream slice doesn't make sense for schema discovery. This extra work would just be to protect against it. I'm not worried, though.
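To make the optimization trade-off above concrete, here is a small sketch under stated assumptions: count_fetches is a hypothetical helper invented for this comparison, the slice keys are made-up month strings, and each cache miss stands in for one HTTP request to the properties endpoint.

```python
from typing import Dict, List, Optional


def count_fetches(slices: List[str], keyed_by_slice: bool) -> int:
    """Count simulated endpoint fetches for a global cache vs. a per-slice cache."""
    fetches = 0
    cache: Dict[Optional[str], List[str]] = {}
    for stream_slice in slices:
        # A global cache uses one shared key; a keyed cache uses one key per slice.
        key = stream_slice if keyed_by_slice else None
        if key not in cache:
            fetches += 1  # stands in for one request to the properties endpoint
            cache[key] = ["prop_a", "prop_b"]
    return fetches


slices = ["2025-01", "2025-02", "2025-03"]
global_fetches = count_fetches(slices, keyed_by_slice=False)
per_slice_fetches = count_fetches(slices, keyed_by_slice=True)
```

With a global cache the number of fetches stays constant as slices are added, while a per-slice cache grows with the number of distinct slices.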

Contributor Author

Ok! So I think the step forward is to remove the stream_slice from the properties from endpoint stuff. I'll look at it a bit more then, stay tuned!

@github-actions

github-actions bot commented Oct 22, 2025

PyTest Results (Fast)

3 805 tests  +3   3 793 ✅ +3   6m 27s ⏱️ -16s
    1 suites ±0      12 💤 ±0 
    1 files   ±0       0 ❌ ±0 

Results for commit d22170d. ± Comparison against base commit 20ae208.

♻️ This comment has been updated with latest results.

@maxi297
Contributor Author

maxi297 commented Oct 22, 2025

/autofix

Auto-Fix Job Info

This job attempts to auto-fix any linting or formatting issues. If any fixes are made,
those changes will be automatically committed and pushed back to the PR.

Note: This job can only be run by maintainers. On PRs from forks, this command requires
that the PR author has enabled the Allow edits from maintainers option.

PR auto-fix job started... Check job output.

✅ Changes applied successfully.

        if self._cached_properties is None:
            self._cached_properties = list(
                map(
                    self._get_property,  # type: ignore # SimpleRetriever and AsyncRetriever only returns Record. Should we change the return type of Retriever.read_records?
Contributor Author

The previous # type: ignore was catching both the return type of Retriever.read_records and the Mapping being actually a MutableMapping

]
yield dpath.get(property_obj, path, default=[]) # type: ignore # extracted will be a MutableMapping, given input data structure
def get_properties_from_endpoint(self, stream_slice: Optional[StreamSlice]) -> List[str]:
if self._cached_properties is None:
Contributor Author

Concern to be resolved: should the cache be per StreamSlice?

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ebac375 and f76ee08.

📒 Files selected for processing (2)
  • airbyte_cdk/sources/declarative/requesters/query_properties/properties_from_endpoint.py (1 hunks)
  • airbyte_cdk/sources/declarative/retrievers/simple_retriever.py (0 hunks)
💤 Files with no reviewable changes (1)
  • airbyte_cdk/sources/declarative/retrievers/simple_retriever.py
🧰 Additional context used
🧬 Code graph analysis (1)
airbyte_cdk/sources/declarative/requesters/query_properties/properties_from_endpoint.py (3)
airbyte_cdk/sources/declarative/interpolation/interpolated_string.py (1)
  • InterpolatedString (13-79)
airbyte_cdk/sources/types.py (1)
  • StreamSlice (75-169)
airbyte_cdk/sources/declarative/retrievers/simple_retriever.py (1)
  • read_records (512-553)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
  • GitHub Check: Check: source-pokeapi
  • GitHub Check: Check: source-hardcoded-records
  • GitHub Check: Check: source-intercom
  • GitHub Check: Check: source-shopify
  • GitHub Check: Check: destination-motherduck
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.13, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.12, Ubuntu)
  • GitHub Check: Manifest Server Docker Image Build
  • GitHub Check: SDM Docker Image Build
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Analyze (python)

@github-actions

github-actions bot commented Oct 22, 2025

PyTest Results (Full)

3 808 tests   3 796 ✅  11m 15s ⏱️
    1 suites     12 💤
    1 files        0 ❌

Results for commit d22170d.

♻️ This comment has been updated with latest results.

Contributor

@brianjlai brianjlai left a comment

Left a few notes, but I think this change makes sense and there's nothing to block on. Let me know how you feel about the notes I mentioned.

for node in self._property_field_path
]
yield dpath.get(property_obj, path, default=[]) # type: ignore # extracted will be a MutableMapping, given input data structure
def get_properties_from_endpoint(self, stream_slice: Optional[StreamSlice]) -> List[str]:
Contributor

One risk this poses, because we cache, is running out of memory if there is a very large number of properties. That was why I had originally made this an iterable, so that we yield properties as they come in.

Granted, materializing the list feels pretty necessary in order to allow caching, but it also helps make the argument for maintaining only a global property mapping. Otherwise, if we had a large set of properties and had to store it for every slice, that might be more concerning.
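The tension between yielding lazily and caching can be seen in a few lines. This is only an illustrative sketch: stream_properties is a hypothetical generator made up here, and the records are toy data rather than anything returned by a real endpoint.

```python
from typing import Any, Iterator, List, Mapping


def stream_properties(records: List[Mapping[str, Any]]) -> Iterator[str]:
    """Yield property names one at a time (low memory, but single-use)."""
    for record in records:
        yield str(record["name"])


records = [{"name": "a"}, {"name": "b"}]

lazy = stream_properties(records)
first_pass = list(lazy)
second_pass = list(lazy)  # the generator is already exhausted at this point

# Caching requires holding the full list in memory so it can be reused.
cached = list(stream_properties(records))
```

A generator keeps memory flat but can be consumed only once, so any cache has to hold the materialized list, which is where the memory cost comes from.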

Contributor Author

The other risk I've thought of is if someone removes a property during the sync. Then, it might fail because we ask for a property that HubSpot doesn't know about. That being said, I assume this happens rarely and if it's too much of a pain, we can have a refresh on properties maybe later...

Contributor Author

I've pushed the change. I know declarative_component_schema.yaml references the SimpleRetriever component, which allows for interpolation on StreamSlices, but I don't have an easy way to remove that. Do you think we should improve the description of PropertiesFromEndpoint to clarify this, or do we think people wouldn't do anything in the context of stream slices for properties retrieval?

@maxi297 maxi297 requested a review from brianjlai October 23, 2025 19:23
@maxi297
Contributor Author

maxi297 commented Oct 23, 2025

/autofix

Auto-Fix Job Info

This job attempts to auto-fix any linting or formatting issues. If any fixes are made,
those changes will be automatically committed and pushed back to the PR.

Note: This job can only be run by maintainers. On PRs from forks, this command requires
that the PR author has enabled the Allow edits from maintainers option.

PR auto-fix job started... Check job output.

✅ Changes applied successfully.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
airbyte_cdk/sources/declarative/requesters/query_properties/properties_from_endpoint.py (1)

43-48: Missing property path returns literal string "[]" instead of empty string.

When dpath.get doesn't find the property path, it returns the default value [] (empty list), which str() then converts to the literal string "[]". This means properties missing from the endpoint response will appear as the string "[]" in the property list rather than an empty string or being filtered out.

Is this the intended behavior, or should missing properties yield an empty string (using default="") or be skipped entirely, wdyt?

If you'd prefer empty strings for missing properties, apply this diff:

-        return str(dpath.get(property_obj, path, default=[]))  # type: ignore # extracted will be a MutableMapping, given input data structure
+        result = dpath.get(property_obj, path, default="")
+        return str(result) if result else ""  # type: ignore # extracted will be a MutableMapping, given input data structure

Alternatively, if missing properties should be skipped, you could filter them out in get_properties_from_endpoint instead.
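The "[]" behavior can be reproduced without the CDK. The sketch below assumes a hypothetical lookup helper that stands in for a nested path lookup with a default; it is not dpath and makes no claim about dpath's internals, it only shows what str() does to each default value.

```python
from typing import Any, List, Mapping


def lookup(obj: Mapping[str, Any], path: List[str], default: Any) -> Any:
    """Hypothetical nested lookup: return default when any key on the path is missing."""
    current: Any = obj
    for key in path:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


record = {"id": 7}

missing_as_list = str(lookup(record, ["name"], default=[]))    # str([]) yields the two characters "[]"
missing_as_empty = str(lookup(record, ["name"], default=""))   # stays an empty string
present = str(lookup(record, ["id"], default=[]))              # an existing int is coerced to "7"
```

Whichever default is chosen, the coercion step turns it into a string, so the default decides what a missing property looks like in the resulting list.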

🧹 Nitpick comments (1)
unit_tests/sources/declarative/requesters/query_properties/test_query_properties.py (1)

88-90: Mock return type should match the actual implementation.

The mock returns an iterator (iter([...])) but get_properties_from_endpoint now returns a concrete List[str]. While Python's duck typing makes this work in practice, the mock should match the actual type for accuracy and to catch potential type-related issues, wdyt?

Apply this diff to align the mock with the actual return type:

-    properties_from_endpoint_mock.get_properties_from_endpoint.return_value = iter(
-        ["alice", "clover", "dio", "k", "luna", "phi", "quark", "sigma", "tenmyouji"]
-    )
+    properties_from_endpoint_mock.get_properties_from_endpoint.return_value = [
+        "alice", "clover", "dio", "k", "luna", "phi", "quark", "sigma", "tenmyouji"
+    ]
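One concrete way the two mock styles differ shows up when the mocked method is called more than once. This is a small standalone sketch using unittest.mock; the mock objects are illustrative and the name list is shortened.

```python
from unittest.mock import Mock

names = ["alice", "clover", "dio"]

# return_value is a single iterator object, handed back on every call.
iter_mock = Mock()
iter_mock.get_properties_from_endpoint.return_value = iter(names)
first = list(iter_mock.get_properties_from_endpoint())
second = list(iter_mock.get_properties_from_endpoint())  # same iterator, now exhausted

# A list can be consumed any number of times, like the real return type.
list_mock = Mock()
list_mock.get_properties_from_endpoint.return_value = list(names)
again_first = list(list_mock.get_properties_from_endpoint())
again_second = list(list_mock.get_properties_from_endpoint())
```

With the iterator, a test that reads the properties twice would silently see an empty result on the second read, which a list-returning mock avoids.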
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f76ee08 and 782a7b1.

📒 Files selected for processing (5)
  • airbyte_cdk/sources/declarative/requesters/query_properties/properties_from_endpoint.py (1 hunks)
  • airbyte_cdk/sources/declarative/requesters/query_properties/query_properties.py (1 hunks)
  • airbyte_cdk/sources/declarative/retrievers/simple_retriever.py (1 hunks)
  • unit_tests/sources/declarative/requesters/query_properties/test_properties_from_endpoint.py (3 hunks)
  • unit_tests/sources/declarative/requesters/query_properties/test_query_properties.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • unit_tests/sources/declarative/requesters/query_properties/test_properties_from_endpoint.py
  • airbyte_cdk/sources/declarative/retrievers/simple_retriever.py
🧰 Additional context used
🧬 Code graph analysis (3)
airbyte_cdk/sources/declarative/requesters/query_properties/properties_from_endpoint.py (3)
airbyte_cdk/sources/declarative/interpolation/interpolated_string.py (1)
  • InterpolatedString (13-79)
airbyte_cdk/sources/declarative/retrievers/simple_retriever.py (1)
  • read_records (510-551)
airbyte_cdk/sources/types.py (3)
  • StreamSlice (75-169)
  • partition (99-104)
  • cursor_slice (107-112)
unit_tests/sources/declarative/requesters/query_properties/test_query_properties.py (2)
airbyte_cdk/sources/declarative/requesters/query_properties/query_properties.py (1)
  • get_request_property_chunks (28-46)
airbyte_cdk/sources/declarative/requesters/query_properties/property_chunking.py (1)
  • get_request_property_chunks (42-68)
airbyte_cdk/sources/declarative/requesters/query_properties/query_properties.py (3)
airbyte_cdk/sources/declarative/requesters/query_properties/property_chunking.py (1)
  • get_request_property_chunks (42-68)
airbyte_cdk/sources/declarative/requesters/query_properties/properties_from_endpoint.py (2)
  • PropertiesFromEndpoint (14-48)
  • get_properties_from_endpoint (33-41)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1)
  • PropertiesFromEndpoint (2771-2782)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: Check: destination-motherduck
  • GitHub Check: Check: source-pokeapi
  • GitHub Check: Check: source-shopify
  • GitHub Check: Check: source-intercom
  • GitHub Check: Check: source-hardcoded-records
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.12, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.13, Ubuntu)
  • GitHub Check: Manifest Server Docker Image Build
  • GitHub Check: SDM Docker Image Build
  • GitHub Check: Pytest (Fast)
🔇 Additional comments (2)
airbyte_cdk/sources/declarative/requesters/query_properties/properties_from_endpoint.py (2)

25-26: LGTM! Cache field appropriately initialized.

The _cached_properties field is properly typed as Optional[List[str]] and initialized to None, enabling the lazy-load pattern implemented below.


33-41: LGTM! Caching implementation is sound.

The caching logic correctly checks for None on first call, populates the cache via map, and returns the cached result on subsequent calls. This ensures properties are fetched once per instance, as intended.
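For readers without the diff at hand, the lazy-load pattern described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `FakeRetriever`, `CachedProperties`, and the `property_field` default are hypothetical stand-ins, and only the names `_cached_properties`, `get_properties_from_endpoint`, `_get_property`, and `read_records` come from the review.

```python
from dataclasses import dataclass, field
from typing import Any, Iterable, List, Mapping, Optional


class FakeRetriever:
    """Hypothetical stand-in for a retriever; counts how often it is read."""

    def __init__(self, records: Iterable[Mapping[str, Any]]) -> None:
        self._records = list(records)
        self.calls = 0

    def read_records(self, records_schema: Mapping[str, Any], stream_slice: Any = None):
        self.calls += 1
        yield from self._records


@dataclass
class CachedProperties:
    """Illustrative class; the real component is PropertiesFromEndpoint."""

    retriever: FakeRetriever
    property_field: str = "name"  # assumed field name, for illustration only
    # Starts as None so the first call can tell "not fetched yet" from "empty".
    _cached_properties: Optional[List[str]] = field(default=None, init=False, repr=False)

    def get_properties_from_endpoint(self) -> List[str]:
        if self._cached_properties is None:
            # First call: read once and coerce every value to a string.
            self._cached_properties = list(
                map(
                    self._get_property,
                    self.retriever.read_records(records_schema={}, stream_slice=None),
                )
            )
        return self._cached_properties

    def _get_property(self, record: Mapping[str, Any]) -> str:
        return str(record[self.property_field])


retriever = FakeRetriever([{"name": "email"}, {"name": 42}])
props = CachedProperties(retriever)
first = props.get_properties_from_endpoint()
second = props.get_properties_from_endpoint()
```

In this sketch the second call returns the same list object and the retriever is read only once. Because the cache lives on the instance, two instances would each fetch their own copy, which matches the per-instance caveat in the PR description.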

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
airbyte_cdk/sources/declarative/requesters/query_properties/properties_from_endpoint.py (1)

10-10: Minor: Consider removing unused StreamSlice import?

After removing the stream_slice parameter from get_properties_from_endpoint, the StreamSlice import on line 10 appears to be unused. Would you like to clean it up to keep imports minimal, wdyt?

#!/bin/bash
# Verify if StreamSlice is used anywhere in the file
rg -n "StreamSlice" airbyte_cdk/sources/declarative/requesters/query_properties/properties_from_endpoint.py | grep -v "^10:"
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d22170d and 2adf3ea.

📒 Files selected for processing (2)
  • airbyte_cdk/sources/declarative/declarative_component_schema.yaml (1 hunks)
  • airbyte_cdk/sources/declarative/requesters/query_properties/properties_from_endpoint.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
airbyte_cdk/sources/declarative/requesters/query_properties/properties_from_endpoint.py (2)
airbyte_cdk/sources/declarative/interpolation/interpolated_string.py (1)
  • InterpolatedString (13-79)
airbyte_cdk/sources/declarative/retrievers/simple_retriever.py (1)
  • read_records (512-553)
🔇 Additional comments (4)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (1)

3328-3328: LGTM! Clear documentation of the stream_slice limitation.

The note about stream_slices not being interpolatable from this retriever is a helpful clarification that aligns with the API changes in properties_from_endpoint.py. This will prevent users from attempting to use stream_slice context where it's not supported.

airbyte_cdk/sources/declarative/requesters/query_properties/properties_from_endpoint.py (3)

25-26: Good addition of the cache field.

The _cached_properties field with Optional[List[str]] type hint and None initialization is a clean way to implement lazy caching. The private naming convention is appropriate since this is an internal implementation detail.


33-41: Clean caching implementation for property retrieval.

The caching logic is straightforward and effective:

  • On first call, retrieves records and maps them through _get_property
  • Subsequent calls return the cached list
  • Using stream_slice=None aligns with the discussions about properties being global to the stream

One question: Line 37's # type: ignore comment mentions that the return type of Retriever.read_records might need updating. Is this something that should be addressed at the interface level, or is the type: ignore acceptable here, wdyt?


43-48: Clarify the intended behavior for missing property fields.

The tests all cover records with complete data—none test scenarios where the property field path is missing from a record. When dpath.get doesn't find the path, it returns [] (the default), which str() converts to the literal string "[]". This means incomplete records would contribute "[]" to the cached properties list.

The gap in test coverage makes it unclear whether this behavior is intentional. Should records with missing property fields contribute empty strings ("") instead, or is the "[]" behavior by design? Could you verify this and add a test case for missing properties to clarify the expected behavior, wdyt?
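The concern can be reproduced without the CDK. The sketch below assumes, as the comment states, that the lookup falls back to `[]` when the path is absent. `get_path` and `get_property` are hypothetical helpers standing in for `dpath.get` and `_get_property`; they are not the library's or the PR's code.

```python
from typing import Any, Mapping, Sequence


def get_path(record: Mapping[str, Any], path: Sequence[str], default: Any) -> Any:
    """Hypothetical nested lookup that returns `default` when the path is missing."""
    current: Any = record
    for key in path:
        if not isinstance(current, Mapping) or key not in current:
            return default
        current = current[key]
    return current


def get_property(record: Mapping[str, Any], path: Sequence[str]) -> str:
    # Coercing the fallback list with str() yields the literal text "[]".
    return str(get_path(record, path, default=[]))


complete = get_property({"name": "email"}, ["name"])
missing = get_property({"label": "email"}, ["name"])
```

Here `complete` is `"email"` while `missing` is the two-character string `"[]"`, which is the value that would end up in the cached list. A test for the missing-field case could assert on whichever outcome the maintainers decide is intended.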

Labels

enhancement New feature or request
