-
Notifications
You must be signed in to change notification settings - Fork 30
feat: Add metadata models package with dynamic schema download #807
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
- Add airbyte_cdk.metadata_models package with auto-generated Pydantic models - Models are generated from JSON schemas in airbytehq/airbyte repository - Schemas are downloaded on-demand during build process (no submodules) - Uses pydantic.v1 compatibility layer for consistency with declarative models - Includes comprehensive README with usage examples - Adds py.typed marker for type hint compliance The metadata models enable validation of connector metadata.yaml files using the same schemas maintained in the main Airbyte repository. Co-Authored-By: AJ Steers <[email protected]>
Original prompt from AJ Steers |
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces a new airbyte_cdk.metadata_models package containing auto-generated Pydantic models for validating connector metadata.yaml files. The models are dynamically generated during build by downloading JSON schemas from the airbytehq/airbyte repository.
Key changes:
- New metadata_models package with auto-generated Pydantic validation models
- Build script extension to download schemas from GitHub during build process
- Comprehensive README with usage examples and documentation
Reviewed Changes
Copilot reviewed 3 out of 35 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| airbyte_cdk/metadata_models/py.typed | Marker file for PEP 561 type hint compliance |
| airbyte_cdk/metadata_models/init.py | Package initialization re-exporting generated models |
| airbyte_cdk/metadata_models/README.md | Documentation covering usage, available models, and regeneration process |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
👋 Greetings, Airbyte Team Member!Here are some helpful tips and reminders for your convenience. Testing This CDK VersionYou can test this version of the CDK using the following: # Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@devin/1760999875-add-metadata-models#egg=airbyte-python-cdk[dev]' --help
# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch devin/1760999875-add-metadata-modelsHelpful ResourcesPR Slash CommandsAirbyte Maintainers can execute the following slash commands on your PR:
|
📝 WalkthroughWalkthroughAdds infrastructure to download external metadata YAML schemas and generate Python models alongside existing declarative component model generation; refactors the main orchestration to run a two-phase flow and introduces new helper functions, constants, and a connector_metadata test package. Changes
Sequence Diagram(s)sequenceDiagram
participant Main
participant Phase1 as Declarative Components
participant Phase2 as Metadata Models
participant GitHub as GitHub Schemas
participant Codegen as Datamodel-Codegen
Main->>Phase1: get_all_yaml_files_from_dir()
Phase1->>Codegen: generate models from YAML list
Codegen-->>Phase1: generated models
Phase1->>Phase1: post_process_codegen
Phase1-->>Main: phase1 complete
Main->>Phase2: download_metadata_schemas(temp_dir)
Phase2->>GitHub: fetch YAMLs (raw URLs, token if present)
GitHub-->>Phase2: YAML files
Phase2->>Phase2: write to temp_dir
Phase2-->>Main: return yaml_stems
Main->>Phase2: generate_models_from_schemas(..., metadata_models=True)
Phase2->>Codegen: generate metadata single-file models
Codegen-->>Phase2: generated models
Phase2->>Phase2: post_process_metadata_models
Phase2-->>Main: phase2 complete
Main->>Main: consolidate_yaml_schemas_to_json() & finish
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Would you agree this estimate looks right? Possibly related PRs
Suggested reviewers
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro ⛔ Files ignored due to path filters (1)
📒 Files selected for processing (4)
✅ Files skipped from review due to trivial changes (2)
🧰 Additional context used🪛 markdownlint-cli2 (0.18.1)airbyte_cdk/test/models/connector_metadata/README.md8-8: Fenced code blocks should have a language specified (MD040, fenced-code-language) ⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
🔇 Additional comments (4)
Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🧹 Nitpick comments (4)
airbyte_cdk/metadata_models/__init__.py (1)
1-1: Consider adding an explicit__all__for API clarity, wdyt?While wildcard imports are acceptable for package
__init__.pyfiles, defining an explicit__all__list would make the public API surface more obvious and give you finer control over what gets exported. This is especially useful for generated code where you might want to filter out internal types or helpers.Example approach:
+__all__ = [ + "ConnectorMetadataDefinitionV0", + # Add other public models as needed +] + from .generated import *Alternatively, you could generate the
__all__list ingenerated.pyitself during the build process.airbyte_cdk/metadata_models/README.md (1)
37-47: Second example assumes undefinedmetadata_dict, wdyt?The second usage example references
metadata_dicton line 40, but this variable isn't defined in that code block. While readers might understand it comes from the first example, making this example standalone (or explicitly noting it continues from the first) would improve clarity.Consider either:
- Making it self-contained:
from airbyte_cdk.metadata_models import ConnectorMetadataDefinitionV0 import yaml metadata_dict = yaml.safe_load(Path("path/to/metadata.yaml").read_text()) metadata = ConnectorMetadataDefinitionV0(**metadata_dict) # Access fields with full type safety print(f"Connector: {metadata.data.name}") # ...
- Or adding a comment linking to the first example:
# Continuing from the validation example above... from airbyte_cdk.metadata_models import ConnectorMetadataDefinitionV0 metadata = ConnectorMetadataDefinitionV0(**metadata_dict) # ...bin/generate_component_manifest_files.py (2)
26-27: Consider consolidating the YAML file discovery functions, wdyt?Both
get_all_yaml_files_without_ext()andget_all_yaml_files_from_dir()perform the same operation, with the new one being more parameterized. The old function is still used on line 237 for declarative models.You could simplify by using the new parameterized version everywhere:
-def get_all_yaml_files_without_ext() -> list[str]: - return [Path(f).stem for f in glob(f"{LOCAL_YAML_DIR_PATH}/*.yaml")] - - def get_all_yaml_files_from_dir(directory: str) -> list[str]: return [Path(f).stem for f in glob(f"{directory}/*.yaml")]Then update line 237:
- declarative_yaml_files = get_all_yaml_files_without_ext() + declarative_yaml_files = get_all_yaml_files_from_dir(LOCAL_YAML_DIR_PATH)Also applies to: 30-32
160-179: Consider extracting common post-processing logic, wdyt?There's code duplication between
post_process_metadata_modelsandpost_process_codegen(lines 136-157). Both functions:
- Create
/generated_post_processeddirectory- Iterate through
.pyfiles- Replace pydantic imports
- Write to new files
You could extract the common pattern:
async def apply_post_processing( codegen_container: dagger.Container, transform_fn: callable[[str], str] ) -> dagger.Container: """Apply a transformation function to all generated Python files.""" codegen_container = codegen_container.with_exec( ["mkdir", "/generated_post_processed"], use_entrypoint=True ) for generated_file in await codegen_container.directory("/generated").entries(): if generated_file.endswith(".py"): original_content = await codegen_container.file( f"/generated/{generated_file}" ).contents() post_processed_content = transform_fn(original_content) codegen_container = codegen_container.with_new_file( f"/generated_post_processed/{generated_file}", contents=post_processed_content ) return codegen_container async def post_process_metadata_models(codegen_container: dagger.Container): """Post-process metadata models to use pydantic.v1 compatibility layer.""" def transform(content: str) -> str: return content.replace("from pydantic", "from pydantic.v1") return await apply_post_processing(codegen_container, transform)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (31)
airbyte_cdk/metadata_models/generated/ActorDefinitionResourceRequirements.pyis excluded by!**/generated/**airbyte_cdk/metadata_models/generated/AirbyteInternal.pyis excluded by!**/generated/**airbyte_cdk/metadata_models/generated/AllowedHosts.pyis excluded by!**/generated/**airbyte_cdk/metadata_models/generated/ConnectorBreakingChanges.pyis excluded by!**/generated/**airbyte_cdk/metadata_models/generated/ConnectorBuildOptions.pyis excluded by!**/generated/**airbyte_cdk/metadata_models/generated/ConnectorIPCOptions.pyis excluded by!**/generated/**airbyte_cdk/metadata_models/generated/ConnectorMetadataDefinitionV0.pyis excluded by!**/generated/**airbyte_cdk/metadata_models/generated/ConnectorMetrics.pyis excluded by!**/generated/**airbyte_cdk/metadata_models/generated/ConnectorPackageInfo.pyis excluded by!**/generated/**airbyte_cdk/metadata_models/generated/ConnectorRegistryDestinationDefinition.pyis excluded by!**/generated/**airbyte_cdk/metadata_models/generated/ConnectorRegistryReleases.pyis excluded by!**/generated/**airbyte_cdk/metadata_models/generated/ConnectorRegistrySourceDefinition.pyis excluded by!**/generated/**airbyte_cdk/metadata_models/generated/ConnectorRegistryV0.pyis excluded by!**/generated/**airbyte_cdk/metadata_models/generated/ConnectorReleases.pyis excluded by!**/generated/**airbyte_cdk/metadata_models/generated/ConnectorTestSuiteOptions.pyis excluded by!**/generated/**airbyte_cdk/metadata_models/generated/GeneratedFields.pyis excluded by!**/generated/**airbyte_cdk/metadata_models/generated/GitInfo.pyis excluded by!**/generated/**airbyte_cdk/metadata_models/generated/JobType.pyis excluded by!**/generated/**airbyte_cdk/metadata_models/generated/NormalizationDestinationDefinitionConfig.pyis excluded by!**/generated/**airbyte_cdk/metadata_models/generated/RegistryOverrides.pyis excluded by!**/generated/**airbyte_cdk/metadata_models/generated/ReleaseStage.pyis excluded by!**/generated/**airbyte_cdk/metadata_models/generated/RemoteRegistries.pyis excluded by!**/generated/**airbyte_cdk/metadata_models/generated/ResourceRequirements.pyis excluded by!**/generated/**airbyte_cdk/metadata_models/generated/RolloutConfiguration.pyis excluded by!**/generated/**airbyte_cdk/metadata_models/generated/Secret.pyis excluded by!**/generated/**airbyte_cdk/metadata_models/generated/SecretStore.pyis excluded by!**/generated/**airbyte_cdk/metadata_models/generated/SourceFileInfo.pyis excluded by!**/generated/**airbyte_cdk/metadata_models/generated/SuggestedStreams.pyis excluded by!**/generated/**airbyte_cdk/metadata_models/generated/SupportLevel.pyis excluded by!**/generated/**airbyte_cdk/metadata_models/generated/TestConnections.pyis excluded by!**/generated/**airbyte_cdk/metadata_models/generated/__init__.pyis excluded by!**/generated/**
📒 Files selected for processing (4)
airbyte_cdk/metadata_models/README.md(1 hunks)airbyte_cdk/metadata_models/__init__.py(1 hunks)airbyte_cdk/metadata_models/py.typed(1 hunks)bin/generate_component_manifest_files.py(3 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.18.1)
airbyte_cdk/metadata_models/README.md
8-8: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🔇 Additional comments (3)
airbyte_cdk/metadata_models/py.typed (1)
1-1: Looks good! Quick packaging question though.The py.typed marker is correctly placed and will signal to type checkers that the package is fully typed. Just to confirm—have you verified that
py.typedwill be included in the package distribution? This is sometimes missed if the build configuration (pyproject.toml, setup.py, or MANIFEST.in) doesn't explicitly include package marker files. Wdyt?bin/generate_component_manifest_files.py (2)
5-5: LGTM on the new imports!The addition of
tempfilefor temporary schema storage andhttpxfor async HTTP requests are appropriate choices for this functionality.Also applies to: 11-11
224-231: Clarify mutual exclusivity ofpost_processandmetadata_modelsflags, wdyt?The conditional logic here suggests
post_processandmetadata_modelsshould be mutually exclusive, but the function signature allows both to beTrue. If both areTrue, theelifon line 227 meansmetadata_modelstakes precedence andpost_processis silently ignored.Consider either:
- Adding validation to ensure only one is True:
) -> None: """Generate Pydantic models from YAML schemas using datamodel-codegen.""" + if post_process and metadata_models: + raise ValueError("post_process and metadata_models cannot both be True") + init_module_content = generate_init_module_content(yaml_files)
- Or simplifying to a single enum parameter:
async def generate_models_from_schemas( dagger_client: dagger.Client, yaml_dir_path: str, output_dir_path: str, yaml_files: list[str], processing_mode: Literal["none", "declarative", "metadata"] = "none", ) -> None:This would make the intent clearer and prevent misuse.
| ``` | ||
| airbyte-ci/connectors/metadata_service/lib/metadata_service/models/src/ | ||
| ``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add language specifier to fenced code block, wdyt?
The fenced code block showing the repository path should have a language specifier for proper syntax highlighting and markdown lint compliance.
Apply this diff:
-```
+```text
airbyte-ci/connectors/metadata_service/lib/metadata_service/models/src/
<details>
<summary>🧰 Tools</summary>
<details>
<summary>🪛 markdownlint-cli2 (0.18.1)</summary>
8-8: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
</details>
</details>
<details>
<summary>🤖 Prompt for AI Agents</summary>
In airbyte_cdk/metadata_models/README.md around lines 8 to 10, the fenced code
block lacks a language specifier; update the opening fence from totext
so the block becomes a plain-text fenced code block (i.e., replace the existing
fence with ```text and keep the content and closing fence unchanged).
</details>
<!-- This is an auto-generated comment by CodeRabbit -->
| METADATA_SCHEMAS_GITHUB_URL = "https://api.github.com/repos/airbytehq/airbyte/contents/airbyte-ci/connectors/metadata_service/lib/metadata_service/models/src" | ||
| METADATA_SCHEMAS_RAW_URL_BASE = "https://raw.githubusercontent.com/airbytehq/airbyte/master/airbyte-ci/connectors/metadata_service/lib/metadata_service/models/src" | ||
| LOCAL_METADATA_OUTPUT_DIR_PATH = "airbyte_cdk/metadata_models/generated" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider pinning to a stable ref instead of master, wdyt?
The URLs point to the master branch, which means builds will always fetch the latest schemas. If the upstream repository introduces breaking changes or the master branch becomes temporarily unstable, CDK builds will fail.
Consider one of these approaches:
- Pin to a specific commit SHA or tag for reproducible builds
- Add a configuration mechanism to specify the ref (defaulting to a known-good SHA)
- Implement a fallback to cached/vendored schemas if download fails
This would improve build reliability and reproducibility.
Example with pinning:
-METADATA_SCHEMAS_GITHUB_URL = "https://api.github.com/repos/airbytehq/airbyte/contents/airbyte-ci/connectors/metadata_service/lib/metadata_service/models/src"
-METADATA_SCHEMAS_RAW_URL_BASE = "https://raw.githubusercontent.com/airbytehq/airbyte/master/airbyte-ci/connectors/metadata_service/lib/metadata_service/models/src"
+# Pin to a specific commit for reproducible builds
+METADATA_SCHEMAS_REF = "a1b2c3d4e5f6" # Update this when schemas change
+METADATA_SCHEMAS_GITHUB_URL = f"https://api.github.com/repos/airbytehq/airbyte/contents/airbyte-ci/connectors/metadata_service/lib/metadata_service/models/src?ref={METADATA_SCHEMAS_REF}"
+METADATA_SCHEMAS_RAW_URL_BASE = f"https://raw.githubusercontent.com/airbytehq/airbyte/{METADATA_SCHEMAS_REF}/airbyte-ci/connectors/metadata_service/lib/metadata_service/models/src"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| METADATA_SCHEMAS_GITHUB_URL = "https://api.github.com/repos/airbytehq/airbyte/contents/airbyte-ci/connectors/metadata_service/lib/metadata_service/models/src" | |
| METADATA_SCHEMAS_RAW_URL_BASE = "https://raw.githubusercontent.com/airbytehq/airbyte/master/airbyte-ci/connectors/metadata_service/lib/metadata_service/models/src" | |
| LOCAL_METADATA_OUTPUT_DIR_PATH = "airbyte_cdk/metadata_models/generated" | |
| # Pin to a specific commit for reproducible builds | |
| METADATA_SCHEMAS_REF = "a1b2c3d4e5f6" # Update this when schemas change | |
| METADATA_SCHEMAS_GITHUB_URL = f"https://api.github.com/repos/airbytehq/airbyte/contents/airbyte-ci/connectors/metadata_service/lib/metadata_service/models/src?ref={METADATA_SCHEMAS_REF}" | |
| METADATA_SCHEMAS_RAW_URL_BASE = f"https://raw.githubusercontent.com/airbytehq/airbyte/{METADATA_SCHEMAS_REF}/airbyte-ci/connectors/metadata_service/lib/metadata_service/models/src" | |
| LOCAL_METADATA_OUTPUT_DIR_PATH = "airbyte_cdk/metadata_models/generated" |
| async def download_metadata_schemas(temp_dir: Path) -> list[str]: | ||
| """Download metadata schema YAML files from GitHub to a temporary directory.""" | ||
| headers = { | ||
| "User-Agent": "airbyte-python-cdk-build", | ||
| "Accept": "application/vnd.github.v3+json", | ||
| } | ||
|
|
||
| async with httpx.AsyncClient(headers=headers, timeout=30.0) as client: | ||
| try: | ||
| response = await client.get(METADATA_SCHEMAS_GITHUB_URL) | ||
| response.raise_for_status() | ||
| files_info = response.json() | ||
| except httpx.HTTPStatusError as e: | ||
| if e.response.status_code == 403: | ||
| print( | ||
| "Warning: GitHub API rate limit exceeded. Using cached schemas if available.", | ||
| file=sys.stderr, | ||
| ) | ||
| raise | ||
| raise | ||
|
|
||
| yaml_files = [] | ||
| for file_info in files_info: | ||
| if file_info["name"].endswith(".yaml"): | ||
| file_name = file_info["name"] | ||
| file_url = f"{METADATA_SCHEMAS_RAW_URL_BASE}/{file_name}" | ||
|
|
||
| print(f"Downloading {file_name}...", file=sys.stderr) | ||
| file_response = await client.get(file_url) | ||
| file_response.raise_for_status() | ||
|
|
||
| file_path = temp_dir / file_name | ||
| file_path.write_text(file_response.text) | ||
| yaml_files.append(Path(file_name).stem) | ||
|
|
||
| return yaml_files | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rate limit and reliability concerns in schema download, wdyt?
The function has several reliability issues that could cause builds to fail:
-
Rate limit handling is misleading: Lines 54-60 catch 403 (rate limit) and print "Using cached schemas if available" but then re-raise the exception, causing the build to fail anyway. There's no actual fallback to cached schemas.
-
No GitHub authentication: Unauthenticated requests have a much lower rate limit (60/hour vs 5000/hour). Builds could easily hit this limit, especially in CI environments.
-
No fallback mechanism: If GitHub is down or rate-limited, builds will always fail. Consider vendoring a copy of the schemas as a fallback.
-
No retry logic: Transient network errors will cause immediate build failure.
Suggested improvements:
async def download_metadata_schemas(temp_dir: Path) -> list[str]:
"""Download metadata schema YAML files from GitHub to a temporary directory."""
+ # Try to use GitHub token for higher rate limits
+ github_token = os.environ.get("GITHUB_TOKEN")
headers = {
"User-Agent": "airbyte-python-cdk-build",
"Accept": "application/vnd.github.v3+json",
}
+ if github_token:
+ headers["Authorization"] = f"token {github_token}"
- async with httpx.AsyncClient(headers=headers, timeout=30.0) as client:
+ async with httpx.AsyncClient(headers=headers, timeout=30.0, max_retries=3) as client:
try:
response = await client.get(METADATA_SCHEMAS_GITHUB_URL)
response.raise_for_status()
files_info = response.json()
except httpx.HTTPStatusError as e:
if e.response.status_code == 403:
- print(
- "Warning: GitHub API rate limit exceeded. Using cached schemas if available.",
- file=sys.stderr,
- )
- raise
+ print("Error: GitHub API rate limit exceeded.", file=sys.stderr)
+ print("Set GITHUB_TOKEN environment variable for higher rate limits.", file=sys.stderr)
+ # TODO: Fall back to vendored schemas here
+ raise
raiseWould you like me to open an issue to track adding a proper fallback mechanism?
Committable suggestion skipped: line range outside the PR's diff.
PyTest Results (Fast)1 tests - 3 801 0 ✅ - 3 790 2s ⏱️ - 6m 32s For more details on these errors, see this check. Results for commit 933d478. ± Comparison against base commit 6b747fe. This pull request removes 3802 and adds 1 tests. Note that renamed tests count towards both.♻️ This comment has been updated with latest results. |
PyTest Results (Full)18 tests 0 ✅ 14s ⏱️ For more details on these errors, see this check. Results for commit 933d478. ♻️ This comment has been updated with latest results. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Devin, let's move the generated models module until this module instead: airbyte_cdk.test.models.connector_metadata
…etadata - Moved from airbyte_cdk.metadata_models.generated to airbyte_cdk.test.models.connector_metadata - Updated build script to output to new location - Updated README with new import paths Co-Authored-By: AJ Steers <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 15
♻️ Duplicate comments (13)
airbyte_cdk/test/models/connector_metadata/README.md (1)
8-10: Address the markdown lint issue.A previous review flagged this fenced code block for missing a language specifier. While this is a minor formatting issue, it's good practice to fix for markdown compliance.
Apply this diff:
-``` +```text airbyte-ci/connectors/metadata_service/lib/metadata_service/models/src/</blockquote></details> <details> <summary>airbyte_cdk/test/models/connector_metadata/ReleaseStage.py (1)</summary><blockquote> `9-13`: **Same duplication issue as SupportLevel, wdyt?** The `ReleaseStage` enum is also defined identically in at least 5 other generated files (ConnectorMetadataDefinitionV0.py, ConnectorRegistryDestinationDefinition.py, ConnectorRegistryReleases.py, ConnectorRegistrySourceDefinition.py, ConnectorRegistryV0.py), causing the same type identity and import conflicts described in the SupportLevel comment. This is part of the same systemic code generation issue where shared types are being duplicated rather than imported. </blockquote></details> <details> <summary>airbyte_cdk/test/models/connector_metadata/SourceFileInfo.py (1)</summary><blockquote> `11-16`: **Same duplication issue affects SourceFileInfo, wdyt?** The `SourceFileInfo` model is defined identically in at least 6 other files (ConnectorMetadataDefinitionV0.py, ConnectorRegistryDestinationDefinition.py, ConnectorRegistryReleases.py, ConnectorRegistrySourceDefinition.py, ConnectorRegistryV0.py, GeneratedFields.py). This is part of the same systemic code generation issue. Since this is a Pydantic model rather than just an enum, the type conflicts are even more severe—instances won't validate against models from different modules. </blockquote></details> <details> <summary>airbyte_cdk/test/models/connector_metadata/ResourceRequirements.py (1)</summary><blockquote> `11-18`: **ResourceRequirements also duplicated across 5+ files, wdyt?** The `ResourceRequirements` model is defined identically in at least 5 other files (ActorDefinitionResourceRequirements.py, ConnectorMetadataDefinitionV0.py, ConnectorRegistryDestinationDefinition.py, ConnectorRegistryReleases.py, RegistryOverrides.py), part of the same code generation issue. </blockquote></details> <details> <summary>airbyte_cdk/test/models/connector_metadata/TestConnections.py (1)</summary><blockquote> `9-14`: **TestConnections duplicated in 2 files, wdyt?** The `TestConnections` model is defined identically in ConnectorMetadataDefinitionV0.py (lines 56-61) and ConnectorTestSuiteOptions.py (lines 31-36), part of the same code generation issue. </blockquote></details> <details> <summary>airbyte_cdk/test/models/connector_metadata/RemoteRegistries.py (1)</summary><blockquote> `11-23`: **Both PyPi and RemoteRegistries duplicated, wdyt?** Both `PyPi` (lines 11-16) and `RemoteRegistries` (lines 19-23) are defined identically in ConnectorMetadataDefinitionV0.py (PyPi at lines 203-208, RemoteRegistries at lines 322-326), part of the same code generation issue. </blockquote></details> <details> <summary>airbyte_cdk/test/models/connector_metadata/GeneratedFields.py (1)</summary><blockquote> `13-32`: **GitInfo duplication already flagged.** This `GitInfo` definition is identical to the one in `GitInfo.py`. See the earlier comment on that file regarding the broader duplication pattern. </blockquote></details> <details> <summary>airbyte_cdk/test/models/connector_metadata/GitInfo.py (1)</summary><blockquote> `12-31`: **Duplicate model definition - part of broader pattern.** The `GitInfo` model is duplicated across at least 6 files in this package (ConnectorMetadataDefinitionV0.py, ConnectorRegistryDestinationDefinition.py, ConnectorRegistryReleases.py, ConnectorRegistrySourceDefinition.py, ConnectorRegistryV0.py, and GeneratedFields.py). This is the most widely duplicated model in the codebase. Consider this as part of the broader duplication issue flagged in Secret.py. The generation script should be configured to create these shared models once and use imports. Wdyt? </blockquote></details> <details> <summary>airbyte_cdk/test/models/connector_metadata/RolloutConfiguration.py (1)</summary><blockquote> `11-29`: **Model structure looks good; duplication noted.** The `RolloutConfiguration` model has well-chosen defaults and appropriate validation constraints (percentage ranges, minimum delay). However, it's duplicated in ConnectorMetadataDefinitionV0.py, ConnectorRegistryReleases.py, and ConnectorRegistrySourceDefinition.py. This is part of the broader duplication pattern. Wdyt about consolidating these shared models? </blockquote></details> <details> <summary>airbyte_cdk/test/models/connector_metadata/AirbyteInternal.py (1)</summary><blockquote> `29-39`: **Model structure is appropriate; duplication noted.** The `AirbyteInternal` model correctly uses `Extra.allow` (unlike most other models in this package that use `Extra.forbid`), which is appropriate for internal/extensible metadata. The defaults are sensible (isEnterprise=False, requireVersionIncrementsInPullRequests=True). However, this model along with the `Sl` and `Ql` enums is duplicated in ConnectorMetadataDefinitionV0.py, ConnectorRegistryDestinationDefinition.py, and ConnectorRegistrySourceDefinition.py—part of the broader duplication pattern. </blockquote></details> <details> <summary>airbyte_cdk/test/models/connector_metadata/ConnectorBreakingChanges.py (1)</summary><blockquote> `13-70`: **Breaking change models are well-structured; duplication noted.** The breaking change models use appropriate Pydantic v1 patterns: - `const=True` for type discrimination in `StreamBreakingChangeScope` - `__root__` wrappers for dict-based schemas - `date` and `AnyUrl` types for proper validation However, all five models are duplicated in ConnectorMetadataDefinitionV0.py—part of the broader duplication pattern discussed in earlier files. </blockquote></details> <details> <summary>airbyte_cdk/test/models/connector_metadata/ConnectorTestSuiteOptions.py (1)</summary><blockquote> `19-49`: **Duplicate definitions: Should import from Secret.py instead.** `SecretStore` (lines 19-29) and `Secret` (lines 40-49) are identical to the models defined in `Secret.py` within the same PR. Since both files are in the same package (`airbyte_cdk/test/models/connector_metadata/`), `ConnectorTestSuiteOptions.py` should import these models rather than redefining them: ```python from airbyte_cdk.test.models.connector_metadata.Secret import Secret, SecretStoreThis would eliminate the duplication and ensure consistency. Wdyt?
airbyte_cdk/test/models/connector_metadata/ActorDefinitionResourceRequirements.py (1)
22-48: Resource requirements models are well-structured; duplication noted.The models provide good flexibility with global defaults and per-job overrides. The
JobTypeenum comprehensively covers connector operations.However, all four models are duplicated in ConnectorMetadataDefinitionV0.py—part of the broader duplication pattern.
🧹 Nitpick comments (22)
airbyte_cdk/test/models/connector_metadata/ConnectorMetrics.py (1)
12-15: Consider stronger typing for metrics fields, wdyt?The
ConnectorMetricsclass usesOptional[Any]for all three fields (all, cloud, oss), which sacrifices type safety. This might be intentional for flexible metric schemas, but if the metrics structure is known, consider using more specific types (e.g.,Optional[ConnectorMetric]or a typed dictionary).That said, the flexibility might be needed if metrics schemas vary. What's the rationale here?
bin/generate_component_manifest_files.py (1)
160-179: Consider extracting the shared post-processing logic, wdyt?The
post_process_metadata_modelsfunction has nearly identical logic topost_process_codegen(lines 136-157), both iterating through generated files and applying string replacements. The main difference is thatpost_process_metadata_modelsonly replaces pydantic imports whilepost_process_codegendoes additional replacements.Consider extracting the common pattern into a shared helper:
+async def post_process_generated_files( + codegen_container: dagger.Container, + replacements: list[tuple[str, str]], + apply_deprecation_fix: bool = False +) -> dagger.Container: + """Apply string replacements to generated Python files.""" + codegen_container = codegen_container.with_exec( + ["mkdir", "/generated_post_processed"], use_entrypoint=True + ) + for generated_file in await codegen_container.directory("/generated").entries(): + if generated_file.endswith(".py"): + original_content = await codegen_container.file( + f"/generated/{generated_file}" + ).contents() + + post_processed_content = original_content + for old, new in replacements: + post_processed_content = post_processed_content.replace(old, new) + + if apply_deprecation_fix: + post_processed_content = replace_base_model_for_classes_with_deprecated_fields( + post_processed_content + ) + + codegen_container = codegen_container.with_new_file( + f"/generated_post_processed/{generated_file}", contents=post_processed_content + ) + return codegen_container async def post_process_codegen(codegen_container: dagger.Container): - codegen_container = codegen_container.with_exec( - ["mkdir", "/generated_post_processed"], use_entrypoint=True - ) - for generated_file in await codegen_container.directory("/generated").entries(): - if generated_file.endswith(".py"): - original_content = await codegen_container.file( - f"/generated/{generated_file}" - ).contents() - post_processed_content = original_content.replace( - " _parameters:", " parameters:" - ).replace("from pydantic", "from pydantic.v1") - - post_processed_content = replace_base_model_for_classes_with_deprecated_fields( - post_processed_content - ) - - codegen_container = codegen_container.with_new_file( - f"/generated_post_processed/{generated_file}", contents=post_processed_content - ) - return codegen_container + return await post_process_generated_files( + codegen_container, + replacements=[(" _parameters:", " parameters:"), ("from pydantic", "from pydantic.v1")], + apply_deprecation_fix=True + ) async def post_process_metadata_models(codegen_container: dagger.Container): """Post-process metadata models to use pydantic.v1 compatibility layer.""" - codegen_container = codegen_container.with_exec( - ["mkdir", "/generated_post_processed"], use_entrypoint=True - ) - for generated_file in await codegen_container.directory("/generated").entries(): - if generated_file.endswith(".py"): - original_content = await codegen_container.file( - f"/generated/{generated_file}" - ).contents() - - post_processed_content = original_content.replace( - "from pydantic", "from pydantic.v1" - ) - - codegen_container = codegen_container.with_new_file( - f"/generated_post_processed/{generated_file}", contents=post_processed_content - ) - return codegen_container + return await post_process_generated_files( + codegen_container, + replacements=[("from pydantic", "from pydantic.v1")], + apply_deprecation_fix=False + )airbyte_cdk/test/models/connector_metadata/ActorDefinitionResourceRequirements.py (1)
12-19: Consider adding validation for resource requirement formats.The
ResourceRequirementsfields areOptional[str]with no format validation. While this flexibility allows Kubernetes-style values like "100m" or "1Gi", it also accepts invalid values like "abc" or "100xyz" that would fail at runtime.Since these are auto-generated models, would it make sense to add format validation in the upstream YAML schema (e.g., regex patterns or custom validators)? Wdyt?
airbyte_cdk/test/models/connector_metadata/GeneratedFields.py (1)
49-58: Consider consolidating identical enums.
UsageandSyncSuccessRatehave identical members (low, medium, high). If these represent the same conceptual scale, they could be consolidated into a single enum likeMetricLevelto reduce duplication. However, keeping them separate provides better semantic clarity if they represent distinct dimensions.Wdyt—worth consolidating or better to keep separate for clarity?
airbyte_cdk/test/models/connector_metadata/RegistryOverrides.py (1)
12-18: Should Extra be forbid on simple leaf models to catch typos?AllowedHosts/SuggestedStreams/NormalizationDestinationDefinitionConfig use Extra.allow, which can mask misspelled fields on these small shapes. Would switching to Extra.forbid for these specific leaves help catch config mistakes, or is “additionalProperties” intentionally allowed in the upstream schema, wdyt?
Also applies to: 22-37
airbyte_cdk/test/models/connector_metadata/ConnectorReleases.py (2)
43-44: Prefer Literal for const fields for clearer runtime checksUsing Any with const works but is looser than necessary. Shall we make this a Literal for better type clarity, wdyt?
-from typing import Any, Dict, List, Optional +from typing import Any, Dict, List, Optional, Literal @@ class StreamBreakingChangeScope(BaseModel): - scopeType: Any = Field("stream", const=True) + scopeType: Literal["stream"] = "stream"
17-32: Add cross-field check: initialPercentage ≤ maxPercentageA simple validator will prevent inconsistent rollout configs. Should we add it, wdyt?
class RolloutConfiguration(BaseModel): @@ advanceDelayMinutes: Optional[conint(ge=10)] = Field( 10, description="The number of minutes to wait before advancing the rollout percentage.", ) + + from pydantic.v1 import root_validator + + @root_validator + def _validate_rollout_bounds(cls, values): + init = values.get("initialPercentage") + maxp = values.get("maxPercentage") + if init is not None and maxp is not None and init > maxp: + raise ValueError("initialPercentage must be ≤ maxPercentage") + return valuesairbyte_cdk/test/models/connector_metadata/ConnectorRegistrySourceDefinition.py (1)
298-301: Tighten URL/time/number types and add a small integrity checkWould you be open to these stricter types and a light data check, wdyt?
- Use AnyUrl where fields are URLs:
- documentationUrl: str + documentationUrl: AnyUrl @@ - sbomUrl: Optional[str] = Field(None, description="URL to the SBOM file") + sbomUrl: Optional[AnyUrl] = Field(None, description="URL to the SBOM file")
- Use datetime for timestamps in SourceFileInfo:
- metadata_last_modified: Optional[str] = None - registry_entry_generated_at: Optional[str] = None + metadata_last_modified: Optional[datetime] = None + registry_entry_generated_at: Optional[datetime] = None
- Enforce positive values for maxSecondsBetweenMessages:
-from typing import Any, Dict, List, Optional, Union +from typing import Any, Dict, List, Optional, Union @@ - maxSecondsBetweenMessages: Optional[int] = Field( + maxSecondsBetweenMessages: Optional[conint(ge=1)] = Field( None, description="Number of seconds allowed between 2 airbyte protocol messages. The source will timeout if this delay is reach", )
- Ensure unique jobType in jobSpecific (same pattern as suggested elsewhere):
class ActorDefinitionResourceRequirements(BaseModel): @@ jobSpecific: Optional[List[JobTypeResourceLimit]] = None + from pydantic.v1 import validator + @validator("jobSpecific") + def _unique_job_types(cls, v): + if not v: + return v + seen = set() + for item in v: + if item.jobType in seen: + raise ValueError(f"duplicate jobType '{item.jobType.value}' in jobSpecific") + seen.add(item.jobType) + return vAlso applies to: 331-334, 240-241, 165-170, 186-188, 326-329, 247-252
airbyte_cdk/test/models/connector_metadata/ConnectorRegistryV0.py (1)
303-304: Optional tightenings: URLs, datetimes, and rollout boundsWould you consider these small improvements, wdyt?
- Use AnyUrl for documentationUrl and sbomUrl:
- documentationUrl: str + documentationUrl: AnyUrl @@ - sbomUrl: Optional[str] = Field(None, description="URL to the SBOM file") + sbomUrl: Optional[AnyUrl] = Field(None, description="URL to the SBOM file")
- Use datetime for SourceFileInfo timestamps:
- metadata_last_modified: Optional[str] = None - registry_entry_generated_at: Optional[str] = None + metadata_last_modified: Optional[datetime] = None + registry_entry_generated_at: Optional[datetime] = None
- Add rollout bound check:
class RolloutConfiguration(BaseModel): @@ advanceDelayMinutes: Optional[conint(ge=10)] = Field( 10, description="The number of minutes to wait before advancing the rollout percentage.", ) + from pydantic.v1 import root_validator + @root_validator + def _validate_rollout_bounds(cls, values): + init, maxp = values.get("initialPercentage"), values.get("maxPercentage") + if init is not None and maxp is not None and init > maxp: + raise ValueError("initialPercentage must be ≤ maxPercentage") + return valuesAlso applies to: 395-399, 240-241, 183-189, 79-94
airbyte_cdk/test/models/connector_metadata/ConnectorRegistryReleases.py (1)
236-241: Minor hardening: URLs, datetimes, Literal scope type, rollout bound, uniquenessWould you consider these small improvements, wdyt?
- Use AnyUrl for sbomUrl:
- sbomUrl: Optional[str] = Field(None, description="URL to the SBOM file") + sbomUrl: Optional[AnyUrl] = Field(None, description="URL to the SBOM file")
- Use datetime for SourceFileInfo timestamps:
- metadata_last_modified: Optional[str] = None - registry_entry_generated_at: Optional[str] = None + metadata_last_modified: Optional[datetime] = None + registry_entry_generated_at: Optional[datetime] = None
- Prefer Literal for scopeType:
-from typing import Any, Dict, List, Optional, Union +from typing import Any, Dict, List, Optional, Union, Literal @@ class StreamBreakingChangeScope(BaseModel): - scopeType: Any = Field("stream", const=True) + scopeType: Literal["stream"] = "stream"
- Add rollout bounds validator:
class RolloutConfiguration(BaseModel): @@ advanceDelayMinutes: Optional[conint(ge=10)] = Field( 10, description="The number of minutes to wait before advancing the rollout percentage.", ) + from pydantic.v1 import root_validator + @root_validator + def _validate_rollout_bounds(cls, values): + init, maxp = values.get("initialPercentage"), values.get("maxPercentage") + if init is not None and maxp is not None and init > maxp: + raise ValueError("initialPercentage must be ≤ maxPercentage") + return values
- Ensure unique jobType in jobSpecific:
class ActorDefinitionResourceRequirements(BaseModel): @@ jobSpecific: Optional[List[JobTypeResourceLimit]] = None + from pydantic.v1 import validator + @validator("jobSpecific") + def _unique_job_types(cls, v): + if not v: + return v + seen = set() + for item in v: + if item.jobType in seen: + raise ValueError(f"duplicate jobType '{item.jobType.value}' in jobSpecific") + seen.add(item.jobType) + return vAlso applies to: 165-170, 146-161, 221-225, 35-37, 18-32
airbyte_cdk/test/models/connector_metadata/ConnectorRegistryDestinationDefinition.py (6)
1-3: Reproducibility: embed generator + schema provenance?Since builds fetch schemas at build time, could we stamp the generator version, source schema URL, and commit SHA in this header to make artifacts reproducible and debuggable, wdyt?
101-111: Use Literal for const field to tighten validation and typingscopeType is effectively a constant. Shall we use Literal["stream"] instead of Any+const for stronger type-checking and clearer errors, wdyt?
Apply this diff:
-from typing import Any, Dict, List, Optional, Union +from typing import Any, Dict, List, Optional, Union, Literal @@ -class StreamBreakingChangeScope(BaseModel): +class StreamBreakingChangeScope(BaseModel): @@ - scopeType: Any = Field("stream", const=True) + scopeType: Literal["stream"] = Field("stream", const=True)
236-241: URL typing: prefer AnyUrl for sbomUrlsbomUrl looks like a URL. Would you switch to AnyUrl for stricter validation, wdyt?
- sbomUrl: Optional[str] = Field(None, description="URL to the SBOM file") + sbomUrl: Optional[AnyUrl] = Field(None, description="URL to the SBOM file")
323-327: Semver: add a regex constraint on protocolVersionIf protocolVersion is semver, shall we add a constraint to avoid accidental non-semver strings, wdyt?
- protocolVersion: Optional[str] = Field( + protocolVersion: Optional[constr(regex=r'^\d+\.\d+\.\d+(?:-[0-9A-Za-z-.]+)?$')] = Field( None, description="the Airbyte Protocol version supported by the connector" )
294-301: <!-- [scratchpad_update]
[observations_updated]
- Web search confirms AnyUrl is a valid Pydantic type from the networks module
- AnyUrl provides URL validation that catches malformed values at parse-time
- Pydantic v2 behavior: AnyUrl appends trailing slash; may affect downstream usage
- The suggestion to use AnyUrl is technically sound and recommended by Pydantic docs
- Need to verify: (1) Pydantic version used in this codebase, (2) whether AnyUrl is imported
[pending]
- Check Pydantic import and version in ConnectorRegistryDestinationDefinition.py
- Verify if changing to AnyUrl would break any dependent code (due to type differences)
- Determine if this needs coordination across all 8 instances found
[scratchpad_end] --><!-- [scratchpad_update_2]
[key_finding]- Airbyte CDK 2.0 (released mid-2024) uses Pydantic V2
- The files being reviewed are test/models/connector_metadata files
- These are likely part of the updated Pydantic V2 stack
- AnyUrl is available in Pydantic V2
[critical_observation]
- Need to check if these files are already using Pydantic V2 imports
- Suggested change (str -> AnyUrl) is valid for Pydantic V2
- The pattern exists across 8 locations in 4 files
- This represents a systemic opportunity for coordinated refactoring
[pending]
- Verify imports in target file to confirm Pydantic V2 usage
- Check if AnyUrl is already imported
- Determine scope of refactor (all 8 instances or just the target file?)
[scratchpad_end] -->documentationUrl/iconUrl: widen refactor scope—apply across all registry modelsThe suggestion to use AnyUrl for validation is sound. However, the verification found this pattern repeated in 8 locations across 4 files:
- ConnectorRegistryV0.py (lines 303, 395)
- ConnectorRegistrySourceDefinition.py (lines 298, 389)
- ConnectorRegistryReleases.py (lines 335, 389)
- ConnectorRegistryDestinationDefinition.py (lines 298, 390)
Rather than fixing just the target file, consider coordinating this into a single refactor pass across all registry models to maintain consistency. With CDK 2.0, Pydantic V2 is available, so AnyUrl is a valid choice. Would updating all four files at once work for your workflow, or should we proceed incrementally?
436-439: Consider addinglocals()and includingSourceDefinitionfor robustness and consistencyYour suggestion is spot on. The file defines
ConnectorRegistrySourceDefinitionat line 382, yet it's not receiving anupdate_forward_refs()call while the other three registry classes are. Passinglocals()also aligns with Pydantic best practices for ensuring forward references resolve against all types in the current module scope.The refactored version would look like:
_ns = locals() ConnectorRegistryDestinationDefinition.update_forward_refs(**_ns) ConnectorRegistryReleases.update_forward_refs(**_ns) ConnectorReleaseCandidates.update_forward_refs(**_ns) VersionReleaseCandidate.update_forward_refs(**_ns) ConnectorRegistrySourceDefinition.update_forward_refs(**_ns)This makes the handling symmetric and more resilient. Worth adopting?
airbyte_cdk/test/models/connector_metadata/ConnectorMetadataDefinitionV0.py (6)
1-3: Reproducibility: capture generator + schema commit in headerTo help future debugging (and rate-limit workarounds), could we include the datamodel-codegen version, source schema URL(s), and pinned commit SHA here, wdyt?
166-172: Use Literal for const fieldSame as the other module: would you switch scopeType to Literal["stream"] for stricter typing, wdyt?
- scopeType: Any = Field("stream", const=True) + scopeType: Literal["stream"] = Field("stream", const=True)
330-335: URL typing: sbomUrl could be AnyUrlAlign with other URL fields to validate early. Shall we switch, wdyt?
- sbomUrl: Optional[str] = Field(None, description="URL to the SBOM file") + sbomUrl: Optional[AnyUrl] = Field(None, description="URL to the SBOM file")
397-404: connectorSubtype override: align type with enum while keeping flexibilityFor RegistryOverrides.connectorSubtype, would you type it as Optional[Union[str, ConnectorSubtype]] so we validate known values but still accept future strings, wdyt?
- connectorSubtype: Optional[str] = None + connectorSubtype: Optional[Union[str, ConnectorSubtype]] = None
441-488: StandardizedocumentationUrltype across metadata and registry definitionsThe inconsistency is confirmed:
ConnectorMetadataDefinitionV0.pyusesAnyUrlfordocumentationUrl(line 452), while all registry definition files (ConnectorRegistry*) usestr(6 instances across multiple files). This divergence could introduce parsing surprises when the same field is handled in different contexts. Would standardizing onAnyUrlacross both codegen flows make sense for your use cases?
471-474: Consider refactoring tags field to usedefault_factory=listwith non-optional typeThe current implementation uses a mutable default argument with an Optional type, which is a Python anti-pattern. The suggested change to
List[str] = Field(default_factory=list, ...)aligns with Pydantic best practices and eliminates the mutable default concern. A broader search of the codebase didn't surface other similar patterns, so this appears to be an isolated instance. Wdyt about making this change?- tags: Optional[List[str]] = Field( - [], - description="An array of tags that describe the connector. E.g: language:python, keyword:rds, etc.", - ) + tags: List[str] = Field( + default_factory=list, + description="An array of tags that describe the connector. E.g: language:python, keyword:rds, etc.", + )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (34)
airbyte_cdk/test/models/connector_metadata/ActorDefinitionResourceRequirements.py(1 hunks)airbyte_cdk/test/models/connector_metadata/AirbyteInternal.py(1 hunks)airbyte_cdk/test/models/connector_metadata/AllowedHosts.py(1 hunks)airbyte_cdk/test/models/connector_metadata/ConnectorBreakingChanges.py(1 hunks)airbyte_cdk/test/models/connector_metadata/ConnectorBuildOptions.py(1 hunks)airbyte_cdk/test/models/connector_metadata/ConnectorIPCOptions.py(1 hunks)airbyte_cdk/test/models/connector_metadata/ConnectorMetadataDefinitionV0.py(1 hunks)airbyte_cdk/test/models/connector_metadata/ConnectorMetrics.py(1 hunks)airbyte_cdk/test/models/connector_metadata/ConnectorPackageInfo.py(1 hunks)airbyte_cdk/test/models/connector_metadata/ConnectorRegistryDestinationDefinition.py(1 hunks)airbyte_cdk/test/models/connector_metadata/ConnectorRegistryReleases.py(1 hunks)airbyte_cdk/test/models/connector_metadata/ConnectorRegistrySourceDefinition.py(1 hunks)airbyte_cdk/test/models/connector_metadata/ConnectorRegistryV0.py(1 hunks)airbyte_cdk/test/models/connector_metadata/ConnectorReleases.py(1 hunks)airbyte_cdk/test/models/connector_metadata/ConnectorTestSuiteOptions.py(1 hunks)airbyte_cdk/test/models/connector_metadata/GeneratedFields.py(1 hunks)airbyte_cdk/test/models/connector_metadata/GitInfo.py(1 hunks)airbyte_cdk/test/models/connector_metadata/JobType.py(1 hunks)airbyte_cdk/test/models/connector_metadata/NormalizationDestinationDefinitionConfig.py(1 hunks)airbyte_cdk/test/models/connector_metadata/README.md(1 hunks)airbyte_cdk/test/models/connector_metadata/RegistryOverrides.py(1 hunks)airbyte_cdk/test/models/connector_metadata/ReleaseStage.py(1 hunks)airbyte_cdk/test/models/connector_metadata/RemoteRegistries.py(1 hunks)airbyte_cdk/test/models/connector_metadata/ResourceRequirements.py(1 hunks)airbyte_cdk/test/models/connector_metadata/RolloutConfiguration.py(1 hunks)airbyte_cdk/test/models/connector_metadata/Secret.py(1 hunks)airbyte_cdk/test/models/connector_metadata/SecretStore.py(1 hunks)airbyte_cdk/test/models/connector_metadata/SourceFileInfo.py(1 hunks)airbyte_cdk/test/models/connector_metadata/SuggestedStreams.py(1 hunks)airbyte_cdk/test/models/connector_metadata/SupportLevel.py(1 hunks)airbyte_cdk/test/models/connector_metadata/TestConnections.py(1 hunks)airbyte_cdk/test/models/connector_metadata/__init__.py(1 hunks)airbyte_cdk/test/models/connector_metadata/py.typed(1 hunks)bin/generate_component_manifest_files.py(3 hunks)
✅ Files skipped from review due to trivial changes (3)
- airbyte_cdk/test/models/connector_metadata/py.typed
- airbyte_cdk/test/models/connector_metadata/init.py
- airbyte_cdk/test/models/connector_metadata/JobType.py
🧰 Additional context used
🧬 Code graph analysis (21)
airbyte_cdk/test/models/connector_metadata/SupportLevel.py (5)
airbyte_cdk/test/models/connector_metadata/ConnectorMetadataDefinitionV0.py (1)
SupportLevel(72-75)airbyte_cdk/test/models/connector_metadata/ConnectorRegistryDestinationDefinition.py (1)
SupportLevel(21-24)airbyte_cdk/test/models/connector_metadata/ConnectorRegistryReleases.py (1)
SupportLevel(66-69)airbyte_cdk/test/models/connector_metadata/ConnectorRegistrySourceDefinition.py (1)
SupportLevel(28-31)airbyte_cdk/test/models/connector_metadata/ConnectorRegistryV0.py (1)
SupportLevel(21-24)
airbyte_cdk/test/models/connector_metadata/AllowedHosts.py (6)
airbyte_cdk/test/models/connector_metadata/ConnectorMetadataDefinitionV0.py (7)
AllowedHosts(78-85)Config(31-32)Config(45-46)Config(58-59)Config(79-80)Config(89-90)Config(107-108)airbyte_cdk/test/models/connector_metadata/ConnectorRegistryDestinationDefinition.py (1)
AllowedHosts(65-72)airbyte_cdk/test/models/connector_metadata/ConnectorRegistryReleases.py (1)
AllowedHosts(92-99)airbyte_cdk/test/models/connector_metadata/ConnectorRegistrySourceDefinition.py (1)
AllowedHosts(54-61)airbyte_cdk/test/models/connector_metadata/ConnectorRegistryV0.py (1)
AllowedHosts(65-72)airbyte_cdk/test/models/connector_metadata/RegistryOverrides.py (1)
AllowedHosts(12-19)
airbyte_cdk/test/models/connector_metadata/ResourceRequirements.py (5)
airbyte_cdk/test/models/connector_metadata/ActorDefinitionResourceRequirements.py (4)
ResourceRequirements(12-19)Config(13-14)Config(33-34)Config(41-42)airbyte_cdk/test/models/connector_metadata/ConnectorMetadataDefinitionV0.py (6)
ResourceRequirements(116-123)Config(31-32)Config(45-46)Config(58-59)Config(79-80)Config(89-90)airbyte_cdk/test/models/connector_metadata/ConnectorRegistryDestinationDefinition.py (1)
ResourceRequirements(27-34)airbyte_cdk/test/models/connector_metadata/ConnectorRegistryReleases.py (1)
ResourceRequirements(72-79)airbyte_cdk/test/models/connector_metadata/RegistryOverrides.py (1)
ResourceRequirements(50-57)
airbyte_cdk/test/models/connector_metadata/Secret.py (3)
airbyte_cdk/test/models/connector_metadata/ConnectorMetadataDefinitionV0.py (7)
SecretStore(44-54)Config(31-32)Config(45-46)Config(58-59)Config(79-80)Config(89-90)Secret(296-305)airbyte_cdk/test/models/connector_metadata/ConnectorTestSuiteOptions.py (2)
SecretStore(19-29)Secret(40-49)airbyte_cdk/test/models/connector_metadata/SecretStore.py (1)
SecretStore(11-21)
airbyte_cdk/test/models/connector_metadata/ConnectorIPCOptions.py (1)
airbyte_cdk/test/models/connector_metadata/ConnectorMetadataDefinitionV0.py (11)
SupportedSerializationEnum(269-272)SupportedTransportEnum(275-277)DataChannel(280-286)Config(31-32)Config(45-46)Config(58-59)Config(79-80)Config(89-90)Config(107-108)Config(117-118)ConnectorIPCOptions(289-293)
airbyte_cdk/test/models/connector_metadata/NormalizationDestinationDefinitionConfig.py (2)
airbyte_cdk/test/models/connector_metadata/ConnectorMetadataDefinitionV0.py (6)
NormalizationDestinationDefinitionConfig(88-103)Config(31-32)Config(45-46)Config(58-59)Config(79-80)Config(89-90)airbyte_cdk/test/models/connector_metadata/ConnectorRegistryDestinationDefinition.py (1)
NormalizationDestinationDefinitionConfig(47-62)
airbyte_cdk/test/models/connector_metadata/ConnectorBreakingChanges.py (1)
airbyte_cdk/test/models/connector_metadata/ConnectorMetadataDefinitionV0.py (13)
DeadlineAction(157-159)StreamBreakingChangeScope(162-171)Config(31-32)Config(45-46)Config(58-59)Config(79-80)Config(89-90)Config(107-108)Config(117-118)Config(137-138)BreakingChangeScope(316-320)VersionBreakingChange(362-384)ConnectorBreakingChanges(406-414)
airbyte_cdk/test/models/connector_metadata/RegistryOverrides.py (4)
airbyte_cdk/test/models/connector_metadata/ActorDefinitionResourceRequirements.py (7)
Config(13-14)Config(33-34)Config(41-42)ResourceRequirements(12-19)JobType(22-29)JobTypeResourceLimit(32-37)ActorDefinitionResourceRequirements(40-48)airbyte_cdk/test/models/connector_metadata/NormalizationDestinationDefinitionConfig.py (1)
NormalizationDestinationDefinitionConfig(9-24)airbyte_cdk/test/models/connector_metadata/ResourceRequirements.py (1)
ResourceRequirements(11-18)airbyte_cdk/test/models/connector_metadata/JobType.py (1)
JobType(9-16)
airbyte_cdk/test/models/connector_metadata/RemoteRegistries.py (1)
airbyte_cdk/test/models/connector_metadata/ConnectorMetadataDefinitionV0.py (7)
PyPi(204-209)Config(31-32)Config(45-46)Config(58-59)Config(79-80)Config(89-90)RemoteRegistries(323-327)
airbyte_cdk/test/models/connector_metadata/ConnectorPackageInfo.py (4)
airbyte_cdk/test/models/connector_metadata/ConnectorRegistryDestinationDefinition.py (1)
ConnectorPackageInfo(217-218)airbyte_cdk/test/models/connector_metadata/ConnectorRegistryReleases.py (1)
ConnectorPackageInfo(199-200)airbyte_cdk/test/models/connector_metadata/ConnectorRegistrySourceDefinition.py (1)
ConnectorPackageInfo(217-218)airbyte_cdk/test/models/connector_metadata/ConnectorRegistryV0.py (1)
ConnectorPackageInfo(217-218)
airbyte_cdk/test/models/connector_metadata/TestConnections.py (2)
airbyte_cdk/test/models/connector_metadata/ConnectorMetadataDefinitionV0.py (6)
TestConnections(57-62)Config(31-32)Config(45-46)Config(58-59)Config(79-80)Config(89-90)airbyte_cdk/test/models/connector_metadata/ConnectorTestSuiteOptions.py (1)
TestConnections(32-37)
airbyte_cdk/test/models/connector_metadata/ConnectorBuildOptions.py (1)
airbyte_cdk/test/models/connector_metadata/ConnectorMetadataDefinitionV0.py (7)
ConnectorBuildOptions(30-34)Config(31-32)Config(45-46)Config(58-59)Config(79-80)Config(89-90)Config(107-108)
airbyte_cdk/test/models/connector_metadata/SourceFileInfo.py (6)
airbyte_cdk/test/models/connector_metadata/ConnectorMetadataDefinitionV0.py (1)
SourceFileInfo(234-239)airbyte_cdk/test/models/connector_metadata/ConnectorRegistryDestinationDefinition.py (1)
SourceFileInfo(182-187)airbyte_cdk/test/models/connector_metadata/ConnectorRegistryReleases.py (1)
SourceFileInfo(164-169)airbyte_cdk/test/models/connector_metadata/ConnectorRegistrySourceDefinition.py (1)
SourceFileInfo(182-187)airbyte_cdk/test/models/connector_metadata/ConnectorRegistryV0.py (1)
SourceFileInfo(182-187)airbyte_cdk/test/models/connector_metadata/GeneratedFields.py (1)
SourceFileInfo(35-40)
airbyte_cdk/test/models/connector_metadata/GitInfo.py (6)
airbyte_cdk/test/models/connector_metadata/ConnectorMetadataDefinitionV0.py (6)
GitInfo(212-231)Config(31-32)Config(45-46)Config(58-59)Config(79-80)Config(89-90)airbyte_cdk/test/models/connector_metadata/ConnectorRegistryDestinationDefinition.py (1)
GitInfo(160-179)airbyte_cdk/test/models/connector_metadata/ConnectorRegistryReleases.py (1)
GitInfo(142-161)airbyte_cdk/test/models/connector_metadata/ConnectorRegistrySourceDefinition.py (1)
GitInfo(160-179)airbyte_cdk/test/models/connector_metadata/ConnectorRegistryV0.py (1)
GitInfo(160-179)airbyte_cdk/test/models/connector_metadata/GeneratedFields.py (1)
GitInfo(13-32)
airbyte_cdk/test/models/connector_metadata/ActorDefinitionResourceRequirements.py (1)
airbyte_cdk/test/models/connector_metadata/ConnectorMetadataDefinitionV0.py (12)
ResourceRequirements(116-123)Config(31-32)Config(45-46)Config(58-59)Config(79-80)Config(89-90)Config(107-108)Config(117-118)Config(137-138)JobType(126-133)JobTypeResourceLimit(308-313)ActorDefinitionResourceRequirements(351-359)
airbyte_cdk/test/models/connector_metadata/ReleaseStage.py (5)
airbyte_cdk/test/models/connector_metadata/ConnectorMetadataDefinitionV0.py (1)
ReleaseStage(65-69)airbyte_cdk/test/models/connector_metadata/ConnectorRegistryDestinationDefinition.py (1)
ReleaseStage(14-18)airbyte_cdk/test/models/connector_metadata/ConnectorRegistryReleases.py (1)
ReleaseStage(59-63)airbyte_cdk/test/models/connector_metadata/ConnectorRegistrySourceDefinition.py (1)
ReleaseStage(21-25)airbyte_cdk/test/models/connector_metadata/ConnectorRegistryV0.py (1)
ReleaseStage(14-18)
airbyte_cdk/test/models/connector_metadata/RolloutConfiguration.py (3)
airbyte_cdk/test/models/connector_metadata/ConnectorMetadataDefinitionV0.py (6)
RolloutConfiguration(136-154)Config(31-32)Config(45-46)Config(58-59)Config(79-80)Config(89-90)airbyte_cdk/test/models/connector_metadata/ConnectorRegistryReleases.py (1)
RolloutConfiguration(14-32)airbyte_cdk/test/models/connector_metadata/ConnectorRegistrySourceDefinition.py (1)
RolloutConfiguration(74-92)
airbyte_cdk/test/models/connector_metadata/ConnectorMetrics.py (2)
airbyte_cdk/test/models/connector_metadata/ConnectorMetadataDefinitionV0.py (9)
ConnectorMetrics(242-245)Usage(248-251)SyncSuccessRate(254-257)ConnectorMetric(260-266)Config(31-32)Config(45-46)Config(58-59)Config(79-80)Config(89-90)airbyte_cdk/test/models/connector_metadata/GeneratedFields.py (4)
ConnectorMetrics(43-46)Usage(49-52)SyncSuccessRate(55-58)ConnectorMetric(61-67)
airbyte_cdk/test/models/connector_metadata/GeneratedFields.py (1)
airbyte_cdk/test/models/connector_metadata/ConnectorMetadataDefinitionV0.py (12)
GitInfo(212-231)Config(31-32)Config(45-46)Config(58-59)Config(79-80)Config(89-90)SourceFileInfo(234-239)ConnectorMetrics(242-245)Usage(248-251)SyncSuccessRate(254-257)ConnectorMetric(260-266)GeneratedFields(330-334)
airbyte_cdk/test/models/connector_metadata/AirbyteInternal.py (3)
airbyte_cdk/test/models/connector_metadata/ConnectorMetadataDefinitionV0.py (9)
Sl(174-178)Ql(181-188)AirbyteInternal(191-201)Config(31-32)Config(45-46)Config(58-59)Config(79-80)Config(89-90)Config(107-108)airbyte_cdk/test/models/connector_metadata/ConnectorRegistryDestinationDefinition.py (3)
Sl(130-134)Ql(137-144)AirbyteInternal(147-157)airbyte_cdk/test/models/connector_metadata/ConnectorRegistrySourceDefinition.py (3)
Sl(130-134)Ql(137-144)AirbyteInternal(147-157)
airbyte_cdk/test/models/connector_metadata/ConnectorRegistryReleases.py (11)
airbyte_cdk/test/models/connector_metadata/ConnectorRegistrySourceDefinition.py (28)
RolloutConfiguration(74-92)DeadlineAction(95-97)StreamBreakingChangeScope(100-109)SourceType(14-18)ReleaseStage(21-25)SupportLevel(28-31)ResourceRequirements(34-41)JobType(44-51)AllowedHosts(54-61)SuggestedStreams(64-71)Sl(130-134)Ql(137-144)AirbyteInternal(147-157)GitInfo(160-179)SourceFileInfo(182-187)ConnectorMetrics(190-193)ConnectorMetric(208-214)ConnectorPackageInfo(217-218)NormalizationDestinationDefinitionConfig(112-127)BreakingChangeScope(229-233)JobTypeResourceLimit(221-226)GeneratedFields(236-240)VersionBreakingChange(254-276)ActorDefinitionResourceRequirements(243-251)ConnectorBreakingChanges(279-287)ConnectorRegistryReleases(344-354)ConnectorRegistrySourceDefinition(290-341)ConnectorRegistryDestinationDefinition(381-433)airbyte_cdk/test/models/connector_metadata/RolloutConfiguration.py (1)
RolloutConfiguration(11-29)airbyte_cdk/test/models/connector_metadata/ActorDefinitionResourceRequirements.py (7)
Config(13-14)Config(33-34)Config(41-42)ResourceRequirements(12-19)JobType(22-29)JobTypeResourceLimit(32-37)ActorDefinitionResourceRequirements(40-48)airbyte_cdk/test/models/connector_metadata/AirbyteInternal.py (4)
Config(30-31)Sl(12-16)Ql(19-26)AirbyteInternal(29-39)airbyte_cdk/test/models/connector_metadata/AllowedHosts.py (2)
Config(12-13)AllowedHosts(11-18)airbyte_cdk/test/models/connector_metadata/ConnectorBreakingChanges.py (8)
Config(19-20)Config(38-39)Config(63-64)DeadlineAction(13-15)StreamBreakingChangeScope(18-27)BreakingChangeScope(30-34)VersionBreakingChange(37-59)ConnectorBreakingChanges(62-70)airbyte_cdk/test/models/connector_metadata/SuggestedStreams.py (1)
SuggestedStreams(11-18)airbyte_cdk/test/models/connector_metadata/GeneratedFields.py (5)
GitInfo(13-32)SourceFileInfo(35-40)ConnectorMetrics(43-46)ConnectorMetric(61-67)GeneratedFields(70-74)airbyte_cdk/test/models/connector_metadata/GitInfo.py (1)
GitInfo(12-31)airbyte_cdk/test/models/connector_metadata/ConnectorPackageInfo.py (1)
ConnectorPackageInfo(11-12)airbyte_cdk/test/models/connector_metadata/NormalizationDestinationDefinitionConfig.py (1)
NormalizationDestinationDefinitionConfig(9-24)
🪛 markdownlint-cli2 (0.18.1)
airbyte_cdk/test/models/connector_metadata/README.md
8-8: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🔇 Additional comments (15)
airbyte_cdk/test/models/connector_metadata/README.md (2)
1-79: The documentation structure and examples are excellent!The README provides clear usage examples, describes the model generation workflow, and properly documents the schema source. The validation examples and field access patterns will be helpful for users.
21-21: The import path in the README is correct—no changes needed.The script confirms the package exists at
airbyte_cdk/test/models/connector_metadata/, which matches exactly what the README imports on line 21. The filesystem structure validates the documentation.The original concern about a path mismatch appears to have been based on PR objectives that differ from the actual implementation. The README accurately reflects the real package structure.
Likely an incorrect or invalid review comment.
airbyte_cdk/test/models/connector_metadata/SecretStore.py (1)
11-21: The model structure looks good!The use of
Extra.forbidis appropriate for security-sensitive secret store configuration, preventing accidental inclusion of unvalidated fields. TheLiteral["GSM"]type for the secret store ensures type safety.Quick check: Is the restriction to only GSM (Google Secret Manager) intentional for the current scope? Just want to confirm this isn't limiting support for other secret stores (AWS Secrets Manager, Azure Key Vault, etc.) that might be needed.
airbyte_cdk/test/models/connector_metadata/ConnectorMetrics.py (1)
30-36: Nice design: Union types provide flexibility!The use of
Union[str, Usage]andUnion[str, SyncSuccessRate]is a good pattern—it allows both string literals and typed enums, providing backwards compatibility while enabling type-safe code when using the enums directly.airbyte_cdk/test/models/connector_metadata/ConnectorIPCOptions.py (2)
12-20: Good IPC serialization and transport options!The enum choices make sense:
- Serialization: JSONL (human-readable), PROTOBUF (efficient), FLATBUFFERS (zero-copy)
- Transport: STDIO (standard streams), SOCKET (network/IPC sockets)
This provides a solid foundation for connector IPC configuration.
23-36: Strict validation is appropriate for IPC config.Using
Extra.forbidin bothDataChannelandConnectorIPCOptionsis the right choice for IPC configuration—it prevents misconfiguration from typos or unsupported fields. Making all DataChannel fields required ensures complete IPC specifications.bin/generate_component_manifest_files.py (1)
181-232: Nice refactoring to extract the common generation flow!The
generate_models_from_schemasfunction nicely consolidates the common logic for generating models from YAML schemas, making it reusable for both declarative components and metadata models. Thepost_processandmetadata_modelsflags provide clear control over the different post-processing paths.airbyte_cdk/test/models/connector_metadata/SuggestedStreams.py (1)
11-18: Model structure looks good!The
SuggestedStreamsmodel is well-defined with a clear description explaining the semantics of missing vs empty arrays. The use ofExtra.allowprovides forward compatibility for additional fields.airbyte_cdk/test/models/connector_metadata/ConnectorTestSuiteOptions.py (1)
52-63: ConnectorTestSuiteOptions model structure looks good.The model appropriately aggregates test suite configuration with required
suitefield and optional lists of secrets and test connections. The use ofExtra.forbidis appropriate for a configuration model.airbyte_cdk/test/models/connector_metadata/GeneratedFields.py (2)
43-46: Flexible typing withAnyfields.The
ConnectorMetricsmodel usesAnyfor theall,cloud, andossfields, which sacrifices type safety for schema flexibility. This is likely intentional to accommodate varying metric structures across environments, but it means invalid data won't be caught at validation time.If metric schemas are known and stable, consider using more specific types. Otherwise, this trade-off seems reasonable for extensibility.
70-74: GeneratedFields aggregation looks good.The model appropriately aggregates various generated metadata fields (git info, source file info, metrics, SBOM URL). The structure makes sense for generated/computed connector metadata.
airbyte_cdk/test/models/connector_metadata/Secret.py (1)
11-33: Confirmed: Systematic duplication of models across auto-generated files. Consider centralizing generation configuration.The duplication you identified is real and extends far beyond
SecretStoreandSecret—GitInfoappears in 8 files,RolloutConfigurationin 7, and the same pattern repeats across the generated models. This suggests the code generator is inlining these shared schemas rather than generating them once and importing them.Your suggestion about configuring
datamodel-codegento generate shared models is the right direction. Since these are auto-generated files, the fix should be in the generation configuration or source schema (not by modifying the generated output directly).Could you clarify: Is the generation configuration set up to inline these common models intentionally, or would this be worth investigating as part of the code generation pipeline improvement?
airbyte_cdk/test/models/connector_metadata/ConnectorBreakingChanges.py (1)
66-66: Regex pattern is intentional, not a limitation.The regex
^\d+\.\d+\.\d+$reflects a deliberate versioning strategy rather than an oversight. The project's publish workflow checks version format with the same pattern: versions matching the strict MAJOR.MINOR.PATCH format are marked as stable releases (IS_PRERELEASE=false), while versions with additional identifiers are handled separately as pre-releases. Breaking changes in this schema are scoped to production releases only, not pre-releases. Since the file is auto-generated from a YAML schema (datamodel-codegen), the pattern is maintained by the build process rather than manual editing anyway.No further action needed here—this is working as designed.
Likely an incorrect or invalid review comment.
airbyte_cdk/test/models/connector_metadata/ConnectorReleases.py (1)
87-91: Confirm semver rule: stable versions only for breakingChanges?Regex ^\d+.\d+.\d+$ excludes pre-releases/build metadata. Is this intentional (i.e., breaking changes only tracked on stable releases), or should RCs be allowed here too, wdyt?
airbyte_cdk/test/models/connector_metadata/ConnectorMetadataDefinitionV0.py (1)
65-76: <!-- [scratchpad_update]
[task verify_duplication 1/1]
[observations added]
- Web search confirms datamodel-codegen has
--reuse-modelflag (reuses models when module has same content)- Duplication is widespread across 8 distinct files + multiple redefinitions in ConnectorRegistry*.py
- Dedicated enum modules exist (ReleaseStage.py, SupportLevel.py, JobType.py) but redefinitions occur in 8 other places
- Review comment mentioned using datamodel-codegen's model reuse feature—this is a real capability
[analysis added]
- The original review comment's suggestion about datamodel-codegen's model reuse is technically valid
- The extent of duplication is more severe than what's in ConnectorMetadataDefinitionV0.py alone—it's a systemic issue
- The concern about "mypy friction" and drift is valid given 8 redundant definitions
- Standalone enum modules appear to exist but are not being imported by dependent files
[decision]
The review comment is essentially correct. The duplication is verified. This should be marked as since the suggestion requires team decision on implementation strategy (datamodel-codegen reuse vs manual consolidation).
[scratchpad_end] -->Verified: Extensive duplication of ReleaseStage/SupportLevel/JobType enums across the connector_metadata module
Your concern is spot-on. The script confirms these enums are redefined across 8+ files instead of being reused:
- ReleaseStage: 6 definitions (standalone module + ConnectorMetadataDefinitionV0, ConnectorRegistryV0, ConnectorRegistryDestinationDefinition, ConnectorRegistryReleases, ConnectorRegistrySourceDefinition)
- SupportLevel: 6 definitions (same pattern)
- JobType: 8 definitions (including ActorDefinitionResourceRequirements and RegistryOverrides)
The original suggestion to enable datamodel-codegen's
--reuse-modelflag to reuse models when a module has the model with the same content, or consolidate into a single shared module, remains a solid approach to eliminate drift and type-checking friction.Consider whether the code generation pipeline should enforce the
--reuse-modeloption or whether refactoring toward explicit imports from the dedicated enum modules (ReleaseStage.py, SupportLevel.py, JobType.py) fits your architecture better—which path appeals to you?
airbyte_cdk/test/models/connector_metadata/generated/AirbyteInternal.py
Outdated
Show resolved
Hide resolved
airbyte_cdk/test/models/connector_metadata/generated/AllowedHosts.py
Outdated
Show resolved
Hide resolved
airbyte_cdk/test/models/connector_metadata/generated/ConnectorBuildOptions.py
Outdated
Show resolved
Hide resolved
airbyte_cdk/test/models/connector_metadata/generated/ConnectorIPCOptions.py
Outdated
Show resolved
Hide resolved
airbyte_cdk/test/models/connector_metadata/generated/ConnectorMetrics.py
Outdated
Show resolved
Hide resolved
airbyte_cdk/test/models/connector_metadata/generated/ConnectorRegistryV0.py
Outdated
Show resolved
Hide resolved
airbyte_cdk/test/models/connector_metadata/generated/ConnectorReleases.py
Outdated
Show resolved
Hide resolved
...yte_cdk/test/models/connector_metadata/generated/NormalizationDestinationDefinitionConfig.py
Outdated
Show resolved
Hide resolved
airbyte_cdk/test/models/connector_metadata/generated/RegistryOverrides.py
Outdated
Show resolved
Hide resolved
airbyte_cdk/test/models/connector_metadata/generated/SupportLevel.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Devin, Great, that's much better, I think. Now, let's once again put the module until a 'generated' submodule breadcrumb. And import into airbyte_cdk/tests/models/init.py:
- the top level manifest file model (I don't know the name)
- the test suites definition model
This gives us most of what we would need to import for our use cases. Imports through the "generated" breadcrumb will still be possible but we can reduce surface area to maintain by only importing the models we need to call or instantiate directly.
…mports - Moved all generated models to connector_metadata/generated/ - Added __init__.py in connector_metadata to export key models - Updated airbyte_cdk/test/models/__init__.py to import: - ConnectorMetadataDefinitionV0 (top-level manifest model) - ConnectorTestSuiteOptions (test suites definition model) - Updated README with new import paths and examples - Updated build script to output to new location This reduces the API surface area while keeping all models accessible through the generated submodule. Co-Authored-By: AJ Steers <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (4)
bin/generate_component_manifest_files.py (3)
17-18: Master branch dependency affects build reproducibility.These URLs point to the
masterbranch, meaning every build fetches the latest schemas. If upstream introduces breaking changes, CDK builds will fail unpredictably.This concern was previously raised. The recommendation was to pin to a specific commit SHA or tag for reproducible builds.
41-77: Multiple reliability concerns remain unaddressed.Previous reviews identified several critical issues:
- Misleading error message (lines 55-59): Claims "Using cached schemas if available" but immediately re-raises, causing build failure with no actual fallback
- No GitHub authentication: Unauthenticated requests hit 60/hour rate limit vs. 5000/hour with auth
- No retry logic: Transient network errors cause immediate build failure
- No fallback mechanism: If GitHub is unavailable, builds always fail
These concerns were raised in earlier reviews and would significantly impact build reliability.
246-257: Missing error handling for schema download.If
download_metadata_schemasfails at line 249 (due to rate limits, network issues, etc.), users will see a cryptic stack trace without actionable guidance.A previous review suggested wrapping this call in try/except to catch specific exceptions and provide helpful error messages (e.g., "Set GITHUB_TOKEN environment variable to avoid rate limits" for 403 errors, "Check network connectivity" for network errors).
airbyte_cdk/test/models/connector_metadata/README.md (1)
8-10: Add language specifier to fenced code block.The code block showing the repository path should specify a language (e.g.,
text) for proper syntax highlighting and markdown lint compliance.This was flagged in a previous review.
🧹 Nitpick comments (1)
bin/generate_component_manifest_files.py (1)
224-231: Clarify mutually exclusive flags, wdyt?The
if/elif/elsestructure meanspost_processandmetadata_modelsare mutually exclusive—if both areTrue, only post-processing runs and metadata post-processing is skipped.Is this intentional? If so, consider adding a docstring note or an assertion at the function start to make the constraint explicit, or using an enum parameter instead of two booleans.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (31)
airbyte_cdk/test/models/connector_metadata/generated/ActorDefinitionResourceRequirements.pyis excluded by!**/generated/**airbyte_cdk/test/models/connector_metadata/generated/AirbyteInternal.pyis excluded by!**/generated/**airbyte_cdk/test/models/connector_metadata/generated/AllowedHosts.pyis excluded by!**/generated/**airbyte_cdk/test/models/connector_metadata/generated/ConnectorBreakingChanges.pyis excluded by!**/generated/**airbyte_cdk/test/models/connector_metadata/generated/ConnectorBuildOptions.pyis excluded by!**/generated/**airbyte_cdk/test/models/connector_metadata/generated/ConnectorIPCOptions.pyis excluded by!**/generated/**airbyte_cdk/test/models/connector_metadata/generated/ConnectorMetadataDefinitionV0.pyis excluded by!**/generated/**airbyte_cdk/test/models/connector_metadata/generated/ConnectorMetrics.pyis excluded by!**/generated/**airbyte_cdk/test/models/connector_metadata/generated/ConnectorPackageInfo.pyis excluded by!**/generated/**airbyte_cdk/test/models/connector_metadata/generated/ConnectorRegistryDestinationDefinition.pyis excluded by!**/generated/**airbyte_cdk/test/models/connector_metadata/generated/ConnectorRegistryReleases.pyis excluded by!**/generated/**airbyte_cdk/test/models/connector_metadata/generated/ConnectorRegistrySourceDefinition.pyis excluded by!**/generated/**airbyte_cdk/test/models/connector_metadata/generated/ConnectorRegistryV0.pyis excluded by!**/generated/**airbyte_cdk/test/models/connector_metadata/generated/ConnectorReleases.pyis excluded by!**/generated/**airbyte_cdk/test/models/connector_metadata/generated/ConnectorTestSuiteOptions.pyis excluded by!**/generated/**airbyte_cdk/test/models/connector_metadata/generated/GeneratedFields.pyis excluded by!**/generated/**airbyte_cdk/test/models/connector_metadata/generated/GitInfo.pyis excluded by!**/generated/**airbyte_cdk/test/models/connector_metadata/generated/JobType.pyis excluded by!**/generated/**airbyte_cdk/test/models/connector_metadata/generated/NormalizationDestinationDefinitionConfig.pyis excluded by!**/generated/**airbyte_cdk/test/models/connector_metadata/generated/RegistryOverrides.pyis excluded by!**/generated/**airbyte_cdk/test/models/connector_metadata/generated/ReleaseStage.pyis excluded by!**/generated/**airbyte_cdk/test/models/connector_metadata/generated/RemoteRegistries.pyis excluded by!**/generated/**airbyte_cdk/test/models/connector_metadata/generated/ResourceRequirements.pyis excluded by!**/generated/**airbyte_cdk/test/models/connector_metadata/generated/RolloutConfiguration.pyis excluded by!**/generated/**airbyte_cdk/test/models/connector_metadata/generated/Secret.pyis excluded by!**/generated/**airbyte_cdk/test/models/connector_metadata/generated/SecretStore.pyis excluded by!**/generated/**airbyte_cdk/test/models/connector_metadata/generated/SourceFileInfo.pyis excluded by!**/generated/**airbyte_cdk/test/models/connector_metadata/generated/SuggestedStreams.pyis excluded by!**/generated/**airbyte_cdk/test/models/connector_metadata/generated/SupportLevel.pyis excluded by!**/generated/**airbyte_cdk/test/models/connector_metadata/generated/TestConnections.pyis excluded by!**/generated/**airbyte_cdk/test/models/connector_metadata/generated/__init__.pyis excluded by!**/generated/**
📒 Files selected for processing (4)
airbyte_cdk/test/models/__init__.py(1 hunks)airbyte_cdk/test/models/connector_metadata/README.md(1 hunks)airbyte_cdk/test/models/connector_metadata/__init__.py(1 hunks)bin/generate_component_manifest_files.py(3 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
airbyte_cdk/test/models/__init__.py (2)
airbyte_cdk/test/models/connector_metadata/generated/ConnectorMetadataDefinitionV0.py (2)
ConnectorMetadataDefinitionV0(490-495)ConnectorTestSuiteOptions(337-348)airbyte_cdk/test/models/connector_metadata/generated/ConnectorTestSuiteOptions.py (1)
ConnectorTestSuiteOptions(52-63)
airbyte_cdk/test/models/connector_metadata/__init__.py (2)
airbyte_cdk/test/models/connector_metadata/generated/ConnectorMetadataDefinitionV0.py (2)
ConnectorMetadataDefinitionV0(490-495)ConnectorTestSuiteOptions(337-348)airbyte_cdk/test/models/connector_metadata/generated/ConnectorTestSuiteOptions.py (1)
ConnectorTestSuiteOptions(52-63)
🪛 markdownlint-cli2 (0.18.1)
airbyte_cdk/test/models/connector_metadata/README.md
8-8: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🔇 Additional comments (7)
bin/generate_component_manifest_files.py (3)
30-32: Good refactoring for reusability!Parameterizing the directory path makes this function reusable for both local and downloaded schemas.
34-38: LGTM!Accepting
yaml_filesas a parameter improves testability and makes the function more flexible for different use cases.
160-179: LGTM!The post-processing logic correctly applies the pydantic.v1 compatibility layer to generated metadata models, consistent with the approach used for declarative component models.
airbyte_cdk/test/models/__init__.py (2)
4-7: LGTM!The imports correctly expose the new metadata models from the
connector_metadatapackage.
11-16: LGTM!The
__all__list is correctly updated to include the new metadata models, maintaining alphabetical ordering.airbyte_cdk/test/models/connector_metadata/README.md (1)
1-92: Excellent documentation!The README is comprehensive and well-structured, with clear usage examples covering validation, field access, and model imports. The examples accurately reflect the package structure.
airbyte_cdk/test/models/connector_metadata/__init__.py (1)
1-7: LGTM!The package initialization correctly re-exports the primary metadata models with proper
__all__definition, making them accessible via theconnector_metadatanamespace.
…tput - Modified build script to generate all models into a single models.py file - Added consolidated JSON schema generation (metadata_schema.json) - Updated imports to reference generated.models module - Removed 30+ individual model files in favor of single-file approach - Updated README to document new structure and outputs - Added GitHub token support to avoid rate limiting Co-Authored-By: AJ Steers <[email protected]>
Co-Authored-By: AJ Steers <[email protected]>
feat: Add metadata models package with dynamic schema download
Summary
This PR adds a new
airbyte_cdk.metadata_modelspackage containing auto-generated Pydantic models for validating connectormetadata.yamlfiles. The models are generated from JSON Schema YAML files maintained in the airbytehq/airbyte repository.Key changes:
airbyte_cdk/metadata_models/package with generated Pydantic models (~3400 lines)bin/generate_component_manifest_files.pyto download schemas from GitHub on-demand during buildpydantic.v1compatibility layer for consistency with existing declarative component modelspy.typedmarker for type hint complianceUsage example:
Review & Testing Checklist for Human
This is a yellow risk PR - the implementation is straightforward but has network dependencies and generated code that need verification:
metadata.yamlfile to ensure the generated models work correctlypoetry run poe buildin CI to ensure it works in that environment (may hit GitHub rate limits)Notes
Requested by: AJ Steers ([email protected]) / @aaronsteers
Devin session: https://app.devin.ai/sessions/9b487ed33f5842c087a1de30d33db888
Known limitations:
Why this approach:
datamodel-code-generatorinfrastructure used for declarative component modelsSummary by CodeRabbit
New Features
Documentation