-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Chore: Fix formatting and update slash commands #13
Conversation
WalkthroughThe pull request includes a series of changes across various workflow files, source files, and test files. Key modifications involve renaming workflows, updating job configurations, and enhancing error handling mechanisms. The workflows for auto-fixing, linting, and testing have been restructured for clarity and functionality, while source files have seen updates to interfaces and classes to improve abstraction and error handling. Additionally, numerous test files have been refined to enhance readability and coverage, ensuring robust validation of functionality. Changes
Poem
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
/fix-pr
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
🧹 Outside diff range and nitpick comments (23)
unit_tests/sources/streams/http/error_handlers/test_json_error_message_parser.py (1)
25-26
: LGTM! Consider adding nested error message test cases.The addition of the 'status_message' test case is a good enhancement to the error message parsing coverage. The test case follows the established pattern and maintains consistency with existing tests.
Consider adding test cases for nested error messages, which are common in API responses. For example:
# Additional test cases to consider: (b'{"error": {"message": "nested error message"}}', "nested error message"), (b'{"meta": {"error": {"description": "deeply nested error"}}}', "deeply nested error"),airbyte_cdk/test/catalog_builder.py (1)
Line range hint
53-64
: Consider enhancing the implementation with proper deprecation notice and improved readability.The implementation correctly handles both cases, but could benefit from some improvements:
- Add a proper deprecation warning for the string-based interface
- Move the implementation comments to a docstring
- Consider more descriptive variable naming
Here's a suggested improvement:
def with_stream(self, name: Union[str, ConfiguredAirbyteStreamBuilder], sync_mode: Union[SyncMode, None] = None) -> "CatalogBuilder": - # As we are introducing a fully fledge ConfiguredAirbyteStreamBuilder, we would like to deprecate the previous interface - # with_stream(str, SyncMode) + """Add a stream to the catalog. + + The string-based interface (name: str, sync_mode: SyncMode) is deprecated in favor of + using ConfiguredAirbyteStreamBuilder for more flexibility and type safety. + """ + if isinstance(name, str): + import warnings + warnings.warn( + "Using with_stream(str, SyncMode) is deprecated. Please use ConfiguredAirbyteStreamBuilder instead.", + DeprecationWarning, + stacklevel=2 + ) - # to avoid a breaking change, `name` needs to stay in the API but this can be either a name or a builder - name_or_builder = name - builder = ( - name_or_builder - if isinstance(name_or_builder, ConfiguredAirbyteStreamBuilder) - else ConfiguredAirbyteStreamBuilder().with_name(name_or_builder).with_sync_mode(sync_mode) - ) + stream_input = name # name parameter is overloaded to accept either str or builder + builder = (stream_input if isinstance(stream_input, ConfiguredAirbyteStreamBuilder) + else ConfiguredAirbyteStreamBuilder().with_name(stream_input).with_sync_mode(sync_mode)) self._streams.append(builder) return selfbin/generate_component_manifest_files.py (1)
72-72
: LGTM! The simplified await expression improves readability.The removal of redundant parentheses makes the code cleaner while maintaining the same functionality. Both awaits are necessary: first for
post_process_codegen
and second for the export operation.Consider adding error handling to gracefully handle potential failures during code generation or export:
- await (await post_process_codegen(codegen_container)).directory("/generated_post_processed").export(LOCAL_OUTPUT_DIR_PATH) + try: + processed_container = await post_process_codegen(codegen_container) + await processed_container.directory("/generated_post_processed").export(LOCAL_OUTPUT_DIR_PATH) + except Exception as e: + print(f"Error during code generation or export: {e}", file=sys.stderr) + sys.exit(1)airbyte_cdk/models/airbyte_protocol.py (1)
Line range hint
1-85
: Consider improving type checking setup to eliminate type ignore comments.The file contains multiple
# type: ignore [name-defined]
comments which might indicate room for improvement in the type system setup. Consider:
- Explicitly importing the types that are currently marked as undefined
- Using forward references (
from __future__ import annotations
) if there are circular dependencies- Creating a separate types module if the types are defined elsewhere
This would enhance type safety and make the codebase more maintainable.
🧰 Tools
🪛 Ruff
63-63:
AirbyteStateType
may be undefined, or defined from star imports(F405)
69-69:
AirbyteStateStats
may be undefined, or defined from star imports(F405)
70-70:
AirbyteStateStats
may be undefined, or defined from star imports(F405)
.github/workflows/test-command.yml (5)
Line range hint
14-39
: Consider security implications of storing raw PR JSON.The PR JSON might contain sensitive information. Consider filtering the JSON to only store required fields (sha, ref, etc.).
- PR_JSON=$(gh api "repos/${{ github.repository }}/pulls/${{ github.event.inputs.pr }}") - echo "$PR_JSON" > pr-info.json + PR_JSON=$(gh api "repos/${{ github.repository }}/pulls/${{ github.event.inputs.pr }}") + echo "$PR_JSON" | jq '{sha: .head.sha, ref: .head.ref}' > pr-info.jsonEnhance comment formatting for better visibility.
- body: | - - > PR test job started... [Check job output.][1] - - [1]: ${{ steps.pr-info.outputs.run-url }} + body: | + ### 🚀 Test Run Started + + [View test execution details](${{ steps.pr-info.outputs.run-url }})
Line range hint
40-54
: Enhance configuration documentation.Consider adding detailed comments explaining:
- Why Windows testing is currently disabled
- The specific scenarios where UTF-8 encoding issues occur
strategy: matrix: python-version: [ '3.10', '3.11', ] os: [ Ubuntu, - # Windows, # For now, we don't include Windows in the test matrix. + # Windows is temporarily disabled due to path handling issues + # Track progress: <link-to-issue> ] fail-fast: false runs-on: "${{ matrix.os }}-latest" env: - # Enforce UTF-8 encoding so Windows runners don't fail inside the connector code. - # TODO: See if we can fully enforce this within PyAirbyte itself. + # Windows runners may use different default encodings (e.g., cp1252) + # This ensures consistent UTF-8 handling for connector I/O operations + # TODO: Move this to PyAirbyte's core configuration PYTHONIOENCODING: utf-8
Line range hint
55-71
: Optimize and secure checkout process.
- The double checkout appears redundant
- Missing SHA verification between artifact and checked out code
steps: - name: Download PR info uses: actions/download-artifact@v4 with: name: pr-info +- name: Verify PR SHA + run: | + PR_SHA=$(jq -r .sha < pr-info.json) + echo "PR_SHA=$PR_SHA" >> $GITHUB_ENV + -- name: Checkout PR - uses: actions/checkout@v4 - with: - token: ${{ secrets.GITHUB_TOKEN }} - name: Checkout PR (${{ github.event.inputs.pr }}) uses: dawidd6/action-checkout-pr@v1 with: pr: ${{ github.event.inputs.pr }} + ref: ${{ env.PR_SHA }} # Ensure we checkout the exact SHA + +- name: Verify checkout + run: | + if [ "$(git rev-parse HEAD)" != "${{ env.PR_SHA }}" ]; then + echo "Error: Checked out SHA does not match expected SHA" + exit 1 + fi
Line range hint
72-109
: Enhance test results visibility and status reporting.Consider adding test results as artifacts and providing more detailed status context.
- name: Run Pytest timeout-minutes: 60 env: GCP_GSM_CREDENTIALS: ${{ secrets.GCP_GSM_CREDENTIALS }} run: > poetry run pytest --verbose + --junitxml=test-results.xml -m "not super_slow and not flaky" +- name: Upload test results + if: always() + uses: actions/upload-artifact@v4 + with: + name: test-results + path: test-results.xml + retention-days: 14 - name: Post CI Success to GitHub run: | + TOTAL_TESTS=$(grep -c "testcase" test-results.xml || echo "0") curl --request POST \ --url "https://api.github.com/repos/${{ github.repository }}/statuses/$(jq -r .head.sha < pr-info.json)" \ --header 'authorization: Bearer ${{ secrets.GITHUB_TOKEN }}' \ --header 'content-type: application/json' \ --data '{ "state": "success", - "context": "Pytest (All, Python ${{ matrix.python-version }}, ${{ matrix.os }})", + "context": "Pytest (${{ matrix.os }}, py${{ matrix.python-version }}, $TOTAL_TESTS tests)", "target_url": "https://github.com/${{ github.repository }}/actions/runs/${{ github.run_id }}", }'
Line range hint
110-145
: Enhance notification messages with more context.Consider adding more detailed information in the notification comments.
log-failure-comment: steps: - name: Append failure comment uses: peter-evans/create-or-update-comment@v4 with: issue-number: ${{ github.event.inputs.pr }} comment-id: ${{ github.event.inputs.comment-id }} reactions: confused body: | - > ❌ Tests failed. + ### ❌ Test Run Failed + + <details> + <summary>Test Configuration</summary> + + - Python: ${{ matrix.python-version }} + - OS: ${{ matrix.os }} + - Run: [View Details](https://github.com/${{ github.repository }}/actions/runs/${{ github.run_id }}) + </details> + + Please check the [test results](https://github.com/${{ github.repository }}/actions/runs/${{ github.run_id }}) for more information.unit_tests/sources/declarative/requesters/request_options/test_datetime_based_request_options_provider.py (1)
98-98
: Consider adding test cases with non-empty parameters.The empty parameters dict is correctly added, but consider enhancing test coverage by adding test cases that exercise the provider with non-empty parameters to ensure proper parameter handling.
Example test case to consider adding:
pytest.param( RequestOption(field_name="after", inject_into=RequestOptionType.request_parameter, parameters={"format": "ISO"}), RequestOption(field_name="before", inject_into=RequestOptionType.request_parameter, parameters={"format": "ISO"}), None, None, StreamSlice(cursor_slice={"start_time": "2024-06-01", "end_time": "2024-06-02"}, partition={}), {"after": "2024-06-01", "before": "2024-06-02"}, id="test_request_params_with_custom_parameters", ),.github/workflows/autofix-command.yml (4)
Line range hint
19-31
: Consider documenting Python version choice.While the configuration is correct, it would be helpful to document why Python 3.10 was specifically chosen. This helps maintainers understand if this needs to be updated when new Python versions are released.
Line range hint
89-96
: Consider adding error handling for Poetry setup.While the Poetry setup is good, consider adding error handling and retries for potential network issues during Poetry installation.
- name: Set up Poetry uses: Gr1N/setup-poetry@v9 + id: poetry-setup with: poetry-version: "1.7.1" + continue-on-error: true + +- name: Retry Poetry setup + if: steps.poetry-setup.outcome == 'failure' + uses: Gr1N/setup-poetry@v9 + with: + poetry-version: "1.7.1"
Line range hint
98-102
: Optimize Ruff formatting steps to reduce duplication.The format step is duplicated in both safe and unsafe fix sections. Since formatting is idempotent, it could be run once after all fixes are applied.
- name: Auto-Fix Ruff Lint Issues run: poetry run ruff check --fix . || true -- name: Auto-Fix Ruff Format Issues - run: poetry run ruff format . || true # Check for changes in git ... # Fix any further 'unsafe' lint issues in a separate commit - name: Auto-Fix Ruff Lint Issues (Unsafe) run: poetry run ruff check --fix --unsafe-fixes . || true -- name: Auto-Fix Ruff Format Issues - run: poetry run ruff format . || true +- name: Auto-Fix Ruff Format Issues (After all fixes) + run: poetry run ruff format . || trueAlso applies to: 121-124
Line range hint
144-148
: Add safety checks before pushing changes.Consider adding validation steps before pushing to ensure the changes haven't broken the codebase.
- name: Push changes to '(${{ steps.pr-info.outputs.repo }})' if: steps.git-diff.outputs.changes == 'true' || steps.git-diff-2.outputs.changes == 'true' run: | + # Validate the changes haven't introduced syntax errors + poetry run python -m compileall . + if [ $? -ne 0 ]; then + echo "Syntax validation failed. Aborting push." + exit 1 + fi git remote add contributor https://github.com/${{ steps.pr-info.outputs.repo }}.git git push contributor HEAD:${{ steps.pr-info.outputs.branch }}airbyte_cdk/sources/streams/concurrent/state_converters/datetime_stream_state_converter.py (1)
48-48
: LGTM: Abstract formatting method is well-defined.The flexible return type (Any) appropriately supports various output formats needed by subclasses.
Consider adding a docstring to document the expected format of the returned value for implementers, e.g.:
@abstractmethod def output_format(self, timestamp: datetime) -> Any: """Convert a datetime object to the format required for state persistence. Args: timestamp: The datetime object to format Returns: The formatted timestamp in the implementation-specific format (e.g., ISO string, epoch timestamp) """ ...unit_tests/sources/declarative/requesters/request_options/test_interpolated_request_options_provider.py (1)
124-127
: Fix typo in test ID.There's a duplicate word "request" in the test ID.
- "request_body_json", {"start": "{{ slice_interval.get('start_date') }}"}, False, id="test_request_request_body_json_no_state" + "request_body_json", {"start": "{{ slice_interval.get('start_date') }}"}, False, id="test_request_body_json_no_state"unit_tests/sources/file_based/stream/concurrent/test_adapters.py (2)
73-76
: LGTM! Consider extracting the mock setup for better test readability.The test cases effectively validate record transformation with different configurations. The mock setup is correct and covers both transformation scenarios appropriately.
Consider extracting the common mock setup to reduce duplication:
def create_mock_partition(stream_name): return Mock(spec=FileBasedStreamPartition, stream_name=Mock(return_value=stream_name)) # Then in the test: mock_partition = create_mock_partition(_STREAM_NAME) expected_records = [ Record({"data": "1"}, mock_partition), Record({"data": "2"}, mock_partition), ]Also applies to: 81-84
Line range hint
367-374
: Fix duplicate test IDs in parameterized test.The test implementation is correct, but there's a duplicate test ID "test_no_display_message" used for both test cases.
Update the test IDs to be unique and descriptive:
[ - pytest.param(Exception("message"), None, id="test_no_display_message"), - pytest.param(ExceptionWithDisplayMessage("message"), "message", id="test_no_display_message"), + pytest.param(Exception("message"), None, id="test_regular_exception"), + pytest.param(ExceptionWithDisplayMessage("message"), "message", id="test_exception_with_display_message"), ],airbyte_cdk/sources/streams/http/http_client.py (1)
Line range hint
246-251
: Add comment explaining auth reapplication.The logic for tracking request attempts and reapplying authentication is correct. Consider adding a comment explaining why authentication needs to be reapplied on retries, as this might not be immediately obvious to future maintainers.
Add this comment before the auth reapplication:
else: self._request_attempt_count[request] += 1 + # Reapply authentication as some authenticators modify request headers/params that need to be refreshed on retries if hasattr(self._session, "auth") and isinstance(self._session.auth, AuthBase): self._session.auth(request)
unit_tests/test_entrypoint.py (1)
258-262
: Consider enhancing error handling test coverage.The test could be parameterized to cover different error scenarios and validate the exact error message content. This would ensure consistent error handling across various config error cases.
Example enhancement:
@pytest.mark.parametrize( "error_message, expected_status_message", [ ("Invalid credentials", "Invalid credentials"), ("Connection timed out", "Connection timed out"), ("Missing required field", "Missing required field"), ] ) def test_run_check_with_config_error(error_message, expected_status_message, entrypoint, mocker, spec_mock, config_mock): exception = AirbyteTracedException.from_exception(ValueError(error_message)) exception.failure_type = FailureType.config_error parsed_args = Namespace(command="check", config="config_path") mocker.patch.object(MockSource, "check", side_effect=exception) messages = list(entrypoint.run(parsed_args)) assert len(messages) == 3 status_message = AirbyteMessage(**orjson.loads(messages[2])) assert status_message.connectionStatus.message == expected_status_messageunit_tests/sources/streams/concurrent/test_cursor.py (1)
611-611
: FIXME comment needs attention.There's a FIXME comment indicating a potential issue with granularity handling at the beginning of slices.
Would you like me to help investigate and propose a fix for the granularity handling issue?
Also applies to: 628-633
unit_tests/sources/file_based/scenarios/csv_scenarios.py (2)
456-457
: Remove redundantenum
property fordelivery_type
In the JSON schema definition, the
delivery_type
field already uses theconst
keyword to enforce a fixed value. Including anenum
with the same single value is redundant and unnecessary.Apply this diff to remove the redundant
enum
property:"const": "use_records_transfer", - "enum": ["use_records_transfer"], "type": "string",
471-472
: Remove redundantenum
property fordelivery_type
Similarly, in this schema definition, the
delivery_type
field has aconst
value. Theenum
with the same value is redundant and can be removed to simplify the schema.Apply this diff to clean up the redundant property:
"const": "use_file_transfer", - "enum": ["use_file_transfer"], "type": "string",
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (78)
.github/workflows/autofix-command.yml
(1 hunks).github/workflows/python_lint.yml
(1 hunks).github/workflows/slash_command_dispatch.yml
(1 hunks).github/workflows/test-command.yml
(1 hunks)airbyte_cdk/connector.py
(2 hunks)airbyte_cdk/connector_builder/main.py
(1 hunks)airbyte_cdk/destinations/destination.py
(0 hunks)airbyte_cdk/destinations/vector_db_based/embedder.py
(2 hunks)airbyte_cdk/models/airbyte_protocol.py
(1 hunks)airbyte_cdk/sources/connector_state_manager.py
(1 hunks)airbyte_cdk/sources/declarative/concurrent_declarative_source.py
(0 hunks)airbyte_cdk/sources/declarative/incremental/datetime_based_cursor.py
(1 hunks)airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
(1 hunks)airbyte_cdk/sources/declarative/partition_routers/substream_partition_router.py
(1 hunks)airbyte_cdk/sources/declarative/requesters/error_handlers/default_error_handler.py
(0 hunks)airbyte_cdk/sources/declarative/requesters/error_handlers/default_http_response_filter.py
(0 hunks)airbyte_cdk/sources/declarative/requesters/error_handlers/http_response_filter.py
(1 hunks)airbyte_cdk/sources/declarative/requesters/http_requester.py
(0 hunks)airbyte_cdk/sources/declarative/requesters/request_options/datetime_based_request_options_provider.py
(1 hunks)airbyte_cdk/sources/declarative/requesters/request_options/interpolated_nested_request_input_provider.py
(0 hunks)airbyte_cdk/sources/declarative/requesters/request_options/interpolated_request_input_provider.py
(0 hunks)airbyte_cdk/sources/declarative/retrievers/async_retriever.py
(0 hunks)airbyte_cdk/sources/file_based/availability_strategy/default_file_based_availability_strategy.py
(1 hunks)airbyte_cdk/sources/file_based/config/unstructured_format.py
(1 hunks)airbyte_cdk/sources/file_based/discovery_policy/abstract_discovery_policy.py
(1 hunks)airbyte_cdk/sources/file_based/file_types/jsonl_parser.py
(0 hunks)airbyte_cdk/sources/file_based/file_types/parquet_parser.py
(0 hunks)airbyte_cdk/sources/file_based/stream/abstract_file_based_stream.py
(1 hunks)airbyte_cdk/sources/file_based/stream/concurrent/cursor/abstract_concurrent_file_based_cursor.py
(1 hunks)airbyte_cdk/sources/file_based/stream/default_file_based_stream.py
(0 hunks)airbyte_cdk/sources/source.py
(1 hunks)airbyte_cdk/sources/streams/concurrent/cursor.py
(1 hunks)airbyte_cdk/sources/streams/concurrent/state_converters/abstract_stream_state_converter.py
(1 hunks)airbyte_cdk/sources/streams/concurrent/state_converters/datetime_stream_state_converter.py
(2 hunks)airbyte_cdk/sources/streams/http/exceptions.py
(0 hunks)airbyte_cdk/sources/streams/http/http.py
(0 hunks)airbyte_cdk/sources/streams/http/http_client.py
(1 hunks)airbyte_cdk/test/catalog_builder.py
(1 hunks)airbyte_cdk/utils/message_utils.py
(1 hunks)airbyte_cdk/utils/traced_exception.py
(1 hunks)bin/generate_component_manifest_files.py
(1 hunks)pyproject.toml
(1 hunks)unit_tests/conftest.py
(1 hunks)unit_tests/connector_builder/test_connector_builder_handler.py
(3 hunks)unit_tests/destinations/test_destination.py
(2 hunks)unit_tests/sources/concurrent_source/test_concurrent_source_adapter.py
(1 hunks)unit_tests/sources/declarative/async_job/test_integration.py
(4 hunks)unit_tests/sources/declarative/async_job/test_job_orchestrator.py
(10 hunks)unit_tests/sources/declarative/concurrency_level/test_concurrency_level.py
(3 hunks)unit_tests/sources/declarative/decoders/test_pagination_decoder_decorator.py
(1 hunks)unit_tests/sources/declarative/decoders/test_xml_decoder.py
(1 hunks)unit_tests/sources/declarative/extractors/test_record_filter.py
(5 hunks)unit_tests/sources/declarative/extractors/test_response_to_file_extractor.py
(1 hunks)unit_tests/sources/declarative/parsers/test_model_to_component_factory.py
(5 hunks)unit_tests/sources/declarative/partition_routers/test_parent_state_stream.py
(2 hunks)unit_tests/sources/declarative/requesters/error_handlers/test_default_http_response_filter.py
(0 hunks)unit_tests/sources/declarative/requesters/paginators/test_default_paginator.py
(2 hunks)unit_tests/sources/declarative/requesters/request_options/test_datetime_based_request_options_provider.py
(2 hunks)unit_tests/sources/declarative/requesters/request_options/test_interpolated_request_options_provider.py
(1 hunks)unit_tests/sources/declarative/requesters/test_http_job_repository.py
(3 hunks)unit_tests/sources/declarative/test_concurrent_declarative_source.py
(27 hunks)unit_tests/sources/file_based/in_memory_files_source.py
(0 hunks)unit_tests/sources/file_based/scenarios/csv_scenarios.py
(2 hunks)unit_tests/sources/file_based/stream/concurrent/test_adapters.py
(1 hunks)unit_tests/sources/file_based/stream/test_default_file_based_stream.py
(2 hunks)unit_tests/sources/message/test_repository.py
(0 hunks)unit_tests/sources/mock_server_tests/mock_source_fixture.py
(0 hunks)unit_tests/sources/streams/concurrent/scenarios/stream_facade_builder.py
(1 hunks)unit_tests/sources/streams/concurrent/scenarios/thread_based_concurrent_stream_scenarios.py
(6 hunks)unit_tests/sources/streams/concurrent/test_adapters.py
(1 hunks)unit_tests/sources/streams/concurrent/test_cursor.py
(21 hunks)unit_tests/sources/streams/http/error_handlers/test_default_backoff_strategy.py
(0 hunks)unit_tests/sources/streams/http/error_handlers/test_http_status_error_handler.py
(0 hunks)unit_tests/sources/streams/http/error_handlers/test_json_error_message_parser.py
(1 hunks)unit_tests/sources/streams/http/test_http.py
(0 hunks)unit_tests/test/test_entrypoint_wrapper.py
(1 hunks)unit_tests/test_entrypoint.py
(1 hunks)unit_tests/utils/test_traced_exception.py
(2 hunks)
💤 Files with no reviewable changes (20)
- airbyte_cdk/destinations/destination.py
- airbyte_cdk/sources/declarative/concurrent_declarative_source.py
- airbyte_cdk/sources/declarative/requesters/error_handlers/default_error_handler.py
- airbyte_cdk/sources/declarative/requesters/error_handlers/default_http_response_filter.py
- airbyte_cdk/sources/declarative/requesters/http_requester.py
- airbyte_cdk/sources/declarative/requesters/request_options/interpolated_nested_request_input_provider.py
- airbyte_cdk/sources/declarative/requesters/request_options/interpolated_request_input_provider.py
- airbyte_cdk/sources/declarative/retrievers/async_retriever.py
- airbyte_cdk/sources/file_based/file_types/jsonl_parser.py
- airbyte_cdk/sources/file_based/file_types/parquet_parser.py
- airbyte_cdk/sources/file_based/stream/default_file_based_stream.py
- airbyte_cdk/sources/streams/http/exceptions.py
- airbyte_cdk/sources/streams/http/http.py
- unit_tests/sources/declarative/requesters/error_handlers/test_default_http_response_filter.py
- unit_tests/sources/file_based/in_memory_files_source.py
- unit_tests/sources/message/test_repository.py
- unit_tests/sources/mock_server_tests/mock_source_fixture.py
- unit_tests/sources/streams/http/error_handlers/test_default_backoff_strategy.py
- unit_tests/sources/streams/http/error_handlers/test_http_status_error_handler.py
- unit_tests/sources/streams/http/test_http.py
✅ Files skipped from review due to trivial changes (27)
- airbyte_cdk/connector_builder/main.py
- airbyte_cdk/destinations/vector_db_based/embedder.py
- airbyte_cdk/sources/connector_state_manager.py
- airbyte_cdk/sources/declarative/incremental/datetime_based_cursor.py
- airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
- airbyte_cdk/sources/declarative/partition_routers/substream_partition_router.py
- airbyte_cdk/sources/declarative/requesters/error_handlers/http_response_filter.py
- airbyte_cdk/sources/declarative/requesters/request_options/datetime_based_request_options_provider.py
- airbyte_cdk/sources/file_based/availability_strategy/default_file_based_availability_strategy.py
- airbyte_cdk/sources/file_based/config/unstructured_format.py
- airbyte_cdk/sources/file_based/stream/concurrent/cursor/abstract_concurrent_file_based_cursor.py
- airbyte_cdk/sources/streams/concurrent/cursor.py
- airbyte_cdk/utils/message_utils.py
- airbyte_cdk/utils/traced_exception.py
- unit_tests/conftest.py
- unit_tests/connector_builder/test_connector_builder_handler.py
- unit_tests/sources/declarative/async_job/test_integration.py
- unit_tests/sources/declarative/concurrency_level/test_concurrency_level.py
- unit_tests/sources/declarative/decoders/test_pagination_decoder_decorator.py
- unit_tests/sources/declarative/decoders/test_xml_decoder.py
- unit_tests/sources/declarative/extractors/test_response_to_file_extractor.py
- unit_tests/sources/declarative/requesters/paginators/test_default_paginator.py
- unit_tests/sources/declarative/requesters/test_http_job_repository.py
- unit_tests/sources/streams/concurrent/scenarios/stream_facade_builder.py
- unit_tests/sources/streams/concurrent/scenarios/thread_based_concurrent_stream_scenarios.py
- unit_tests/test/test_entrypoint_wrapper.py
- unit_tests/utils/test_traced_exception.py
🔇 Additional comments (55)
airbyte_cdk/sources/file_based/discovery_policy/abstract_discovery_policy.py (1)
21-21
: LGTM: Abstract method signature remains unchanged.
The abstract method declaration is clean and maintains its interface contract.
.github/workflows/slash_command_dispatch.yml (1)
23-24
: LGTM! Command names are now more concise and clearer.
The updated command names (autofix
and test
) are more intuitive than their predecessors (fix-pr
and test-pr
).
Let's verify the integration with the corresponding workflow files:
✅ Verification successful
Command updates are properly integrated across workflows
The verification confirms that:
- Both
autofix-command.yml
andtest-command.yml
workflow files exist and are properly configured - The workflows are triggered by repository dispatch events with the new command names
- The slash command dispatch configuration correctly lists the new commands
- No references to old command names (
fix-pr
,test-pr
) were found
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify that the workflow files for these commands exist and use consistent naming
# Test 1: Check if the autofix workflow file exists and uses the updated name
echo "Checking autofix workflow..."
rg -l "name: On-Demand AutoFix" .github/workflows/autofix-command.yml
# Test 2: Check if the test workflow file exists and uses the updated name
echo "Checking test workflow..."
rg -l "name: On-Demand Test" .github/workflows/test-command.yml
Length of output: 338
Script:
#!/bin/bash
# Let's verify the actual command names used in these workflow files
# to ensure complete consistency
echo "Checking command name in autofix workflow..."
rg "repository_dispatch.*autofix" .github/workflows/autofix-command.yml
echo "Checking command name in test workflow..."
rg "repository_dispatch.*test" .github/workflows/test-command.yml
# Also check if there are any remaining references to the old command names
echo "Checking for old command names..."
rg "fix-pr|test-pr" .github/workflows/
Length of output: 451
Script:
#!/bin/bash
# Let's try a different approach to verify the workflow triggers
# Using more context in the grep to catch the event triggers
echo "Checking autofix workflow content..."
rg "on:" -A 5 .github/workflows/autofix-command.yml
echo "Checking test workflow content..."
rg "on:" -A 5 .github/workflows/test-command.yml
# Also check the entire content of slash command dispatch file
echo "Checking complete slash command dispatch config..."
cat .github/workflows/slash_command_dispatch.yml
Length of output: 4236
.github/workflows/python_lint.yml (1)
85-85
: LGTM! Consistent use of Poetry for dependency management.
The addition of poetry run
prefix ensures MyPy runs within the Poetry-managed virtual environment, maintaining consistency with other commands in the workflow.
Let's verify that all Python tool executions use Poetry consistently:
✅ Verification successful
Poetry is consistently used across all Python tool executions
The verification confirms that all Python tool executions (ruff check, ruff format, and mypy) in the workflow are properly prefixed with poetry run
, ensuring consistent dependency management and execution environment.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check if all Python tool executions use Poetry
# Expected: All commands should be prefixed with 'poetry run'
echo "Checking Python commands in workflow:"
rg 'run:.*(?:ruff|mypy|pytest|pip|python)' .github/workflows/python_lint.yml
Length of output: 382
airbyte_cdk/test/catalog_builder.py (1)
48-51
: LGTM! Well-defined method overloads.
The overloads clearly define the two supported patterns for stream creation, making the API more type-safe and self-documenting.
airbyte_cdk/models/airbyte_protocol.py (1)
65-67
: LGTM! The multi-line formatting improves readability.
The reformatting of the global_
attribute declaration enhances code readability while maintaining the same functionality. The comment clearly explains the reason for using the Alias
decorator.
airbyte_cdk/connector.py (1)
110-110
: LGTM! Verify impact on existing connectors.
Making the Connector
class fully abstract aligns well with its ABC nature and provides a clearer interface definition.
Let's verify this change doesn't break existing connectors:
✅ Verification successful
Let me check the actual implementations to verify if this change could cause any issues.
Change to abstract Connector class is safe and well-aligned
The verification shows this change is safe because:
- No direct instantiations of
Connector
class were found - All existing implementations (
MockConnector
,Destination
) already provide their own implementations - Other references (
BaseSource
,AirbyteConnectorError
) don't rely on concrete implementations fromConnector
The change properly enforces the abstract nature of this base class while maintaining backward compatibility.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for existing connector implementations that might be affected
# by the change to abstract Connector class
# Look for classes that inherit from Connector
echo "Searching for Connector implementations..."
rg -l "class \w+\(.*Connector.*\):"
# Look for direct instantiations of Connector class (which would now fail)
echo "Checking for direct Connector instantiations..."
ast-grep --pattern 'Connector($$$)'
Length of output: 429
Script:
#!/bin/bash
# Check the content of files that inherit from Connector
echo "=== Checking test_connector.py ==="
rg "class.*Connector" unit_tests/test_connector.py -A 5
echo -e "\n=== Checking source.py ==="
rg "class.*Connector" airbyte_cdk/sources/source.py -A 5
echo -e "\n=== Checking destination.py ==="
rg "class.*Connector" airbyte_cdk/destinations/destination.py -A 5
echo -e "\n=== Checking sql/exceptions.py ==="
rg "class.*Connector" airbyte_cdk/sql/exceptions.py -A 5
Length of output: 1770
.github/workflows/test-command.yml (1)
Line range hint 1-13
: LGTM: Well-structured workflow configuration.
The workflow configuration is clean with clear input parameters and proper requirement flags.
unit_tests/sources/declarative/requesters/request_options/test_datetime_based_request_options_provider.py (1)
89-90
: LGTM! Function signature formatting looks clean.
The consolidated parameter list improves readability while maintaining the same functionality.
.github/workflows/autofix-command.yml (3)
Line range hint 1-12
: LGTM! Clear and well-structured workflow configuration.
The workflow name change is more concise while maintaining clarity. The input parameters are well-documented with appropriate descriptions and types.
Line range hint 150-183
: LGTM! Well-implemented status updates.
The status update implementation is comprehensive, with:
- Clear differentiation between different outcomes (changes applied, no changes, failure)
- Appropriate use of reactions for quick visual feedback
- Consistent comment formatting
Line range hint 33-57
: Verify PAT token permissions and security.
The workflow uses GH_PAT_APPROVINGTON_OCTAVIA
for checkout to trigger CI checks. This is a good practice, but we should verify:
- The token has minimal required permissions
- The token is properly secured in repository secrets
✅ Verification successful
Token usage appears secure and follows established patterns
The GH_PAT_APPROVINGTON_OCTAVIA
token is used consistently across automation workflows:
- It's properly stored as a repository secret
- Its usage is limited to checkout operations to trigger CI checks
- Similar pattern is used in other automation workflows (poetry-lock-command.yml)
- The workflow runs only on the repository itself (restricted from forks)
- The token is used alongside GitHub's default
GITHUB_TOKEN
which handles API operations
The implementation follows security best practices by:
- Using repository secrets for token storage
- Limiting token exposure to specific checkout operations
- Using separate tokens for different automation purposes (e.g.,
GH_PAT_MAINTENANCE_OCTAVIA
for maintenance tasks)
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check if the PAT token is properly configured in repository secrets
gh api repos/$GITHUB_REPOSITORY/actions/secrets | jq '.secrets[] | select(.name=="GH_PAT_APPROVINGTON_OCTAVIA")'
Length of output: 179
Script:
#!/bin/bash
# Let's check the workflow file for any other references to this token and its usage
rg -A 3 -B 3 "GH_PAT_APPROVINGTON_OCTAVIA" .
# Also check for any documentation about this token
fd README.md | xargs rg -i "pat|token|secret|permission"
Length of output: 1910
Script:
#!/bin/bash
# Let's check the workflow file more thoroughly to understand the token's usage context
cat .github/workflows/autofix-command.yml
# Also check for any other workflow files that might reference this token
fd -e yml -e yaml . .github/workflows | xargs rg "GH_PAT|PAT_APPROVINGTON|OCTAVIA"
Length of output: 6253
unit_tests/sources/concurrent_source/test_concurrent_source_adapter.py (2)
44-46
: Clean and idiomatic list comprehension!
The refactored list comprehension is more concise and follows Python's best practices while maintaining the same functionality.
Line range hint 142-142
: Great test coverage enhancement!
The parameterized test now properly covers both scenarios:
- When exceptions are raised (raise_exception_on_missing_stream=True)
- When incomplete stream status is emitted (raise_exception_on_missing_stream=False)
This ensures robust error handling validation and maintains backward compatibility.
Also applies to: 144-167
airbyte_cdk/sources/file_based/stream/abstract_file_based_stream.py (1)
67-67
:
Breaking Change: primary_key
property converted to method.
The change from property decorator to regular method will affect how primary_key
is accessed in child classes and client code.
Let's verify the impact on child classes:
airbyte_cdk/sources/streams/concurrent/state_converters/datetime_stream_state_converter.py (3)
31-31
: LGTM: Abstract property definition is correct.
The explicit ellipsis syntax is more consistent with modern Python style while maintaining the abstract property behavior.
42-42
: LGTM: Abstract method signature is well-defined.
The method has proper type annotations and the explicit ellipsis syntax maintains the abstract method contract.
45-45
: LGTM: Abstract parsing method is properly defined.
The flexible input type (Any) appropriately supports various timestamp formats in subclasses while ensuring datetime output.
pyproject.toml (2)
101-102
: Verify the impact of skipping __init__.py
files.
The addition of skip = ["__init__.py"]
in the isort configuration appears to be a temporary workaround. While this addresses the immediate issue, skipping these files entirely from import sorting could lead to inconsistencies.
Let's check the content of __init__.py
files to understand the impact:
✅ Verification successful
Based on the output, I can now generate the final response since I have enough information about the __init__.py
files in the codebase.
Skipping __init__.py
files from isort is safe in this context
The __init__.py
files in this codebase follow a consistent pattern of only containing import statements for exposing public APIs. They primarily:
- Re-export classes and functions from internal modules
- Group related functionality
- Define package-level imports
These files don't contain complex logic or circular dependencies that would require special import ordering. The temporary skip is reasonable while issue #12 is being addressed.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check the content of __init__.py files to understand what we're skipping
# Test: Look for import statements in __init__.py files
rg "^(?:from|import)\s+" -g "__init__.py"
Length of output: 26748
105-106
: Consider using a more specific exclude pattern.
The ruff exclude configuration mirrors the isort skip setting. While this maintains consistency, consider using a more specific pattern if only certain __init__.py
files are problematic.
Let's analyze the ruff violations in __init__.py
files:
✅ Verification successful
Based on the analysis of the codebase and the linked issue #12, I can now provide a final response:
The current exclude pattern is appropriate and should be maintained.
The exclude pattern __init__.py
is intentionally broad because:
- There is an active issue (Bug: Root
__init__.py
import and unstable and cannot be sorted due to circular references #12) tracking circular import dependencies in the codebase's__init__.py
files - The comment in
airbyte_cdk/__init__.py
explicitly warns about import order sensitivity: "Warning: The below imports are not stable and will cause circular dependencies if auto-sorted with isort" - Multiple
__init__.py
files contain carefully ordered imports to prevent circular dependencies (e.g., intransformations/__init__.py
which has a special isort directive)
Until the circular dependency issues are resolved at an architectural level, applying ruff's import sorting to any __init__.py
files could break the codebase's import functionality.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check what ruff violations we're potentially ignoring
# Test: Run ruff check on __init__.py files only
fd "__init__.py" --exec ruff check {} || true
Length of output: 3970
Script:
#!/bin/bash
# Let's first check the content of the linked issue to understand the specific problem
gh issue view https://github.com/airbytehq/airbyte-python-cdk/issues/12 || true
# Let's also look for any __init__.py files to understand their content and potential issues
fd "__init__.py" --exec cat {}
Length of output: 37507
Script:
#!/bin/bash
# Let's check if there are any specific ruff configuration settings in pyproject.toml
cat pyproject.toml | grep -A 10 "\[tool.ruff\]"
# Let's also check for any __init__.py files and their imports to understand potential circular dependencies
fd "__init__.py" -x grep -l "import" {} \; | xargs -I {} sh -c 'echo "=== {} ==="; cat {}'
Length of output: 38259
unit_tests/sources/declarative/requesters/request_options/test_interpolated_request_options_provider.py (1)
111-122
: LGTM! Improved readability through better formatting.
The reformatting of these test cases enhances code readability while maintaining the original test logic.
unit_tests/destinations/test_destination.py (2)
101-101
: LGTM! Clean formatting change.
The addition of a trailing comma in the type annotation is a good practice for better git diffs when adding new types in the future.
243-246
: LGTM! Improved readability.
The parameters are now properly split across multiple lines with consistent indentation, making the code more readable.
unit_tests/sources/file_based/stream/test_default_file_based_stream.py (2)
259-259
: LGTM: Well-structured test record definition.
The test record includes all necessary file metadata (size and paths) with clear distinction between relative and absolute paths.
284-284
: LGTM: Appropriate test configuration for file transfer.
The use_file_transfer parameter is correctly set to True in setUp, enabling proper isolation and testing of file transfer functionality. The test methods adequately verify both record reading and transformation behaviors.
unit_tests/sources/declarative/extractors/test_record_filter.py (5)
50-72
: LGTM! Test cases are well-structured.
The test parameters are properly formatted and cover essential filtering scenarios including state-based, slice-based, and token-based filtering.
114-161
: LGTM! Comprehensive datetime format test coverage.
The test cases effectively cover various datetime formats (with/without timezone) and different state combinations, ensuring robust datetime-based filtering.
221-252
: LGTM! Well-documented cursor type test cases.
The test cases effectively cover various cursor scenarios with clear comments explaining each case's purpose, including global cursor and partition-based states.
256-263
: LGTM! Clear and descriptive test identifiers.
The test case IDs clearly describe the purpose of each test scenario, making it easy to understand the test coverage.
292-309
: LGTM! Clear cursor type handling implementation.
The code effectively handles different cursor types with clear comments explaining each case, making the implementation easy to understand and maintain.
unit_tests/sources/declarative/async_job/test_job_orchestrator.py (4)
80-82
: LGTM: Improved method signature formatting.
The method signature reformatting enhances readability while maintaining consistent style across the test file.
Also applies to: 95-97, 115-117, 155-157, 243-245, 296-298
85-85
: LGTM: Simplified mock side effect definitions.
The side effect definitions for update_jobs_status
have been consolidated into a more concise format while maintaining clarity.
Also applies to: 120-120, 131-131
197-199
: LGTM: Improved error creation readability.
The reformatting of the MessageRepresentationAirbyteTracedErrors
creation makes the error type and message more distinct.
267-267
: LGTM: Minor formatting improvements.
The added spacing and simplified side effect definition enhance code organization and readability.
Also applies to: 270-270
unit_tests/sources/streams/concurrent/test_adapters.py (1)
388-390
: LGTM! Clean formatting improvement.
The consolidation of constructor parameters improves readability while maintaining proper indentation.
airbyte_cdk/sources/streams/http/http_client.py (1)
235-237
: LGTM! Improved readability of chained backoff handlers.
The multi-line formatting enhances code readability while maintaining the same functionality.
unit_tests/test_entrypoint.py (1)
260-260
: LGTM! Good improvement in error message handling.
The change simplifies the error message construction by reusing the exception's message, which:
- Improves code readability by reducing verbosity
- Ensures consistency between the trace message and connection status message
unit_tests/sources/streams/concurrent/test_cursor.py (10)
410-413
: Fix indentation in test assertion.
The indentation of the assertion tuple has been improved for better readability.
518-518
: Update state structure to include slices array.
The state structure has been updated to use a consistent format with slices array containing start, end, and most_recent_cursor_value fields.
Also applies to: 532-532, 546-546
556-556
: Simplify sequential state assertions.
The assertions for sequential state have been simplified to check only the cursor field value, which is the expected behavior.
Also applies to: 569-569
756-756
: Improve test data structure.
The test data has been updated to:
- Use a more realistic config structure
- Include well-structured record data with consistent fields
Also applies to: 794-803
862-863
: Enhance state message structure.
The state message structure has been improved to include:
- Properly formatted datetime strings
- Consistent slice structure with start, end, and most_recent_cursor_value
Also applies to: 874-881
931-932
: Standardize partition test data.
The partition test data has been standardized to:
- Use consistent datetime format
- Include meaningful test data for different mythologies
Also applies to: 935-936, 939-940, 943-944
959-970
: Update state assertions format.
The state assertions have been updated to use a consistent format with proper indentation and complete slice information.
981-981
: Update config structure.
The config structure has been updated to use a consistent format with start_time and dynamic_cursor_key.
1023-1024
: Improve test data organization.
The test data organization has been improved with:
- Consistent datetime formats
- Logical grouping of related partitions
- Meaningful test data using mythology theme
Also applies to: 1027-1028, 1031-1032, 1035-1036, 1039-1040, 1043-1044
1061-1072
: Update final state assertions structure.
The final state assertions structure has been updated to use:
- Proper indentation
- Consistent datetime format
- Complete slice information including most_recent_cursor_value
unit_tests/sources/declarative/test_concurrent_declarative_source.py (4)
Line range hint 41-64
: LGTM! Improved configuration formatting.
The formatting changes to _CONFIG
and _CATALOG
improve readability by using consistent indentation and structure.
67-98
: LGTM! Enhanced JSON response formatting.
The JSON response data (_LOCATIONS_RESPONSE
, _PALACES_RESPONSE
, and _PARTY_MEMBERS_SKILLS_RESPONSE
) is now better structured with consistent formatting.
570-575
: LGTM! Improved test assertion readability.
The test assertions have been reformatted to be more readable by:
- Using consistent multi-line formatting
- Clearly separating expected and actual values
- Maintaining proper indentation
Also applies to: 585-590, 702-707, 773-778, 786-791, 844-846
1171-1174
: LGTM! Enhanced helper function formatting.
The dictionary comprehension in get_mocked_read_records_output
has been reformatted for better readability while maintaining functionality.
unit_tests/sources/declarative/partition_routers/test_parent_state_stream.py (2)
649-649
: Initialization of stream_state
appears correct
The addition of stream_state=AirbyteStateBlob({"created_at": "2024-01-02T00:00:00Z"})
in the initial state is appropriate and aligns with the test requirements.
681-682
: Inclusion of partition state for id
: 12 enhances test coverage
Adding the partition state for id
: 12 ensures that the test covers scenarios involving new partitions, improving the robustness of the state management logic.
unit_tests/sources/declarative/parsers/test_model_to_component_factory.py (3)
2563-2569
: Well-structured addition of test case
The new test parameter test_create_concurrent_cursor_with_state
appropriately enhances the coverage by testing the scenario where stream_state
is provided.
2622-2625
: Correct initialization of expected test values
The expected values for lookback_window
, datetime_format
, and cursor_granularity
are accurately set, ensuring the test aligns with the cursor's configuration.
2658-2680
: Comprehensive test cases for missing configuration fields
The added test parameters effectively cover scenarios where mandatory fields like partition_field_start
, partition_field_end
, step
, and cursor_granularity
are None
. This strengthens the robustness of the cursor creation logic by verifying that appropriate exceptions are raised or default values are used.
@@ -15,9 +15,7 @@ class AbstractDiscoveryPolicy(ABC): | |||
|
|||
@property | |||
@abstractmethod | |||
def n_concurrent_requests(self) -> int: | |||
... | |||
def n_concurrent_requests(self) -> int: ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codebase verification
Property removal is a breaking change affecting existing code
The verification confirms that n_concurrent_requests
is currently being used as a property in both production and test code:
- In
airbyte_cdk/sources/file_based/stream/default_file_based_stream.py
: Property access in concurrent task creation logic - In
unit_tests/sources/file_based/stream/test_default_file_based_stream.py
: Property assignment in test setup
Converting it from a property to a method will break these existing usages. The code needs to be updated to use method calls (n_concurrent_requests()
) instead of property access, or the property decorator should be retained to maintain backward compatibility.
🔗 Analysis chain
Breaking Change: Property to method conversion requires client updates.
The removal of the @property
decorator from n_concurrent_requests
changes how this value is accessed. Clients previously accessing it as a property (policy.n_concurrent_requests
) will need to update their code to call it as a method (policy.n_concurrent_requests()
).
Let's verify the impact on implementations:
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for implementations of AbstractDiscoveryPolicy
ast-grep --pattern 'class $_ (AbstractDiscoveryPolicy):'
# Search for property access usage
rg -l 'n_concurrent_requests[^(]'
Length of output: 232
Script:
#!/bin/bash
# Let's check the actual usage in the identified files
rg -A 2 'n_concurrent_requests[^(]' unit_tests/sources/file_based/stream/test_default_file_based_stream.py airbyte_cdk/sources/file_based/stream/default_file_based_stream.py
# Also check for any implementations of n_concurrent_requests
ast-grep --pattern 'def n_concurrent_requests'
Length of output: 1075
@@ -27,12 +27,10 @@ class ExperimentalClassWarning(DeprecationWarning): | |||
|
|||
class BaseSource(BaseConnector[TConfig], ABC, Generic[TConfig, TState, TCatalog]): | |||
@abstractmethod | |||
def read_state(self, state_path: str) -> TState: | |||
... | |||
def read_state(self, state_path: str) -> TState: ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Revert abstract method declarations to use ellipsis
The changes from ...
to .
in abstract method declarations are problematic:
- Python's standard way to declare abstract methods is with ellipsis (
...
) - Using a single dot (
.
) could be interpreted as a statement rather than a method placeholder - This deviates from PEP 484 typing conventions
Apply this diff to restore the standard abstract method declarations:
- def read_state(self, state_path: str) -> TState: .
+ def read_state(self, state_path: str) -> TState: ...
- def read_catalog(self, catalog_path: str) -> TCatalog: .
+ def read_catalog(self, catalog_path: str) -> TCatalog: ...
Also applies to: 33-33
💡 Codebase verification
No need to change abstract method declarations across the codebase
Looking at the codebase, all abstract method declarations consistently use ellipsis (...
) for their empty implementations, not a single dot. The file under review should maintain this consistent pattern.
airbyte_cdk/sources/source.py
: Should use...
instead of.
for abstract method declarations to maintain consistency with the rest of the codebase.
🔗 Analysis chain
Verify similar abstract method declarations across the codebase
Let's check if similar changes were made to other abstract method declarations in the codebase.
Also applies to: 33-33
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Find all Python abstract method declarations that use a single dot instead of ellipsis
# This helps identify if this is a broader pattern that needs to be addressed
echo "Searching for abstract method declarations using single dot..."
rg "@abstractmethod\s+def.*:\s*\." -g "*.py"
echo -e "\nComparing with standard abstract method declarations..."
rg "@abstractmethod\s+def.*:\s*\.\.\." -g "*.py"
Length of output: 359
Script:
#!/bin/bash
# Let's try a different approach to find abstract method declarations
# First, let's find files containing @abstractmethod decorator
echo "Files with abstract methods:"
rg -l "@abstractmethod" -g "*.py"
# Then, let's look at the actual abstract method declarations with context
echo -e "\nAbstract method declarations with context:"
rg -B 1 -A 1 "@abstractmethod" -g "*.py"
Length of output: 55565
@@ -96,8 +96,7 @@ def check(self, logger: logging.Logger, config: TConfig) -> AirbyteConnectionSta | |||
|
|||
class _WriteConfigProtocol(Protocol): | |||
@staticmethod | |||
def write_config(config: Mapping[str, Any], config_path: str) -> None: | |||
... | |||
def write_config(config: Mapping[str, Any], config_path: str) -> None: ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Remove @staticmethod decorator from protocol method.
The @staticmethod
decorator in protocol definitions is unusual and potentially confusing. Protocols typically define instance methods, and the static behavior should be determined by the implementing classes.
Apply this change:
- @staticmethod
- def write_config(config: Mapping[str, Any], config_path: str) -> None: ...
+ def write_config(config: Mapping[str, Any], config_path: str) -> None: ...
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def write_config(config: Mapping[str, Any], config_path: str) -> None: ... | |
def write_config(config: Mapping[str, Any], config_path: str) -> None: ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@coderabbitai - The goal of this PR is just to run the autoformat task from ruff
. I don't want to introduce new changes that could potentially have negative side effects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aaronsteers, understood. I'll refrain from suggesting additional changes in this PR to avoid unintended side effects.
✏️ Learnings added
Learnt from: aaronsteers
PR: airbytehq/airbyte-python-cdk#13
File: airbyte_cdk/connector.py:99-99
Timestamp: 2024-11-10T04:50:11.704Z
Learning: When a PR's goal is to run the autoformat task from `ruff`, avoid suggesting code changes beyond formatting to prevent potential negative side effects.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
@@ -155,5 +155,4 @@ def parse_value(self, value: Any) -> Any: | |||
|
|||
@property | |||
@abstractmethod | |||
def zero_value(self) -> Any: | |||
... | |||
def zero_value(self) -> Any: ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codebase verification
Restore @AbstractMethod decorator on zero_value to maintain the contract
The verification confirms that zero_value
is actively implemented by DatetimeStreamStateConverter
and the abstract class is imported in multiple places. The removal of @abstractmethod
would weaken the contract that ensures proper implementation.
DatetimeStreamStateConverter
implementszero_value
with a concrete return type ofdatetime
- The abstract class is imported by
cursor.py
andconcurrent_source_adapter.py
, indicating its role in the type hierarchy - The ellipsis implementation in the base class could lead to runtime errors if new subclasses miss implementing this method
- def zero_value(self) -> Any: ...
+ @abstractmethod
+ def zero_value(self) -> Any:
+ pass
🔗 Analysis chain
Revert the removal of @AbstractMethod decorator from zero_value property.
The change from an abstract property to a concrete one with ellipsis (...
) implementation has several issues:
- It removes the compile-time enforcement of the contract requiring subclasses to implement this property
- This could lead to runtime errors if subclasses fail to implement it
- The ellipsis implementation suggests it's still meant to be overridden, making the removal of
@abstractmethod
potentially unintentional
Apply this diff to restore the abstract contract:
- def zero_value(self) -> Any: ...
+ @abstractmethod
+ def zero_value(self) -> Any:
+ pass
Let's verify if there are any existing implementations that rely on this property:
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for classes that inherit from AbstractStreamStateConverter
ast-grep --pattern 'class $_ (AbstractStreamStateConverter):'
# Search for implementations of zero_value
ast-grep --pattern 'def zero_value(self)'
Length of output: 107
Script:
#!/bin/bash
# Search for classes that inherit from AbstractStreamStateConverter with better pattern
rg -l "class.*\(.*AbstractStreamStateConverter.*\)"
# Search for implementations of zero_value with context
rg "def zero_value" -A 2
# Search for imports of AbstractStreamStateConverter
rg "from.*AbstractStreamStateConverter import"
rg "import.*AbstractStreamStateConverter"
Length of output: 1135
This PR runs auto-format of the repo using
ruff
and fixes some smaller issues in the slash commands.Summary by CodeRabbit
New Features
FixtureAvailabilityStrategy
class for enhanced error handling with HTTP 403 status codes.DefaultFileBasedStream
tests to ensure proper message propagation and logging.Bug Fixes
ConcurrentCursor
tests.Documentation
Style