[DO NOT MERGE] Upload all docs script #30603
Conversation
A few comments! But the only thing holding this up is making sure we are very explicit and thorough with our connector directory exclusions.
@click.argument("connectors-dir", type=click.Path(exists=True, path_type=pathlib.Path))
@click.argument("docs-dir", type=click.Path(exists=True, path_type=pathlib.Path))
@click.argument("bucket-name", type=click.STRING)
def upload_all_docs(connectors_dir: pathlib.Path, docs_dir: pathlib.Path, bucket_name: str):
❗ Let's make it clear this is a dev command by prefixing the command with experimental: experimental_upload_all_docs
@@ -262,3 +262,91 @@ def upload_metadata_to_gcs(
        ),
    ]
)


def upload_all_docs_to_gcs(connectors_dir: Path, docs_dir: Path, bucket_name: str):
📚 Let's write a docstring explaining what it does, why we added it, and whether there are any criteria for removing it
    connector_infos = []

    for connector_dir in connectors_dir.iterdir():
        if connector_dir.is_dir():
💅 Would be great to have a single exclusion criterion
💅 Would be great not to nest so deep
❗ Also I think we're missing certain criteria
def is_valid_connector_dir(connector_dir):
    # Check it is a directory
    # Check the name starts with source- or destination-
    # Check it has a metadata.yaml file
    # Do not allow `-scaffold`
    # Do not allow `-secure` or `-strict-encrypt`
    # What about our third party folder?
    ...

for connector_dir in connectors_dir.iterdir():
    if not is_valid_connector_dir(connector_dir):
        continue
    # ...
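A runnable sketch of such a predicate, under the assumption that the exclusion list is exactly the suffixes named in the review (the third-party-folder question is still open):

```python
from pathlib import Path

METADATA_FILE_NAME = "metadata.yaml"  # matches the constant used in the diff
# Assumed exclusion list, taken from the review comments above
EXCLUDED_SUFFIXES = ("-scaffold", "-secure", "-strict-encrypt")


def is_valid_connector_dir(connector_dir: Path) -> bool:
    """Return True only for real connector directories (sketch; exact criteria TBD)."""
    name = connector_dir.name
    if not connector_dir.is_dir():
        return False
    if not name.startswith(("source-", "destination-")):
        return False
    if name.endswith(EXCLUDED_SUFFIXES):
        return False
    # Open question from the review: should third-party folders be excluded too?
    return (connector_dir / METADATA_FILE_NAME).exists()
```

This keeps all the exclusion criteria in one place and lets the caller stay flat with a single `continue` guard.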
            if connector_type and connector_name:  # Skip folders that don't match the pattern
                metadata_file_path = connector_dir / METADATA_FILE_NAME
                if metadata_file_path.exists():
                    metadata = read_metadata_yaml(metadata_file_path)
💡 (Very optional) If we import ci_connector_ops (example import) we can use Connector and get_all_connectors_in_repo to do a lot of this. Particularly now that the related airbyte-ci PR adds helper properties for getting the doc and in-app doc paths.
                metadata_file_path = connector_dir / METADATA_FILE_NAME
                if metadata_file_path.exists():
                    metadata = read_metadata_yaml(metadata_file_path)
                    doc_path, inapp_doc_path = get_doc_paths(metadata, connector_name)  # 'source' becomes 'sources', 'destination' becomes 'destinations'
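The pluralization noted in that comment could look like the following. This is a hypothetical sketch: the real helper takes the parsed metadata rather than separate arguments, and the docs directory layout here is an assumption:

```python
from pathlib import Path


def get_doc_paths(docs_dir: Path, connector_type: str, connector_name: str) -> tuple:
    """Hypothetical sketch: map a connector to its doc and in-app doc paths."""
    # 'source' -> 'sources', 'destination' -> 'destinations'
    plural_type = connector_type + "s"
    doc_path = docs_dir / plural_type / f"{connector_name}.md"
    inapp_doc_path = docs_dir / plural_type / f"{connector_name}.inapp.md"
    return doc_path, inapp_doc_path
```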
❓ Why did we go with this method instead of just calling commands.upload(metadata_file_path, doc_path, bucket_name) directly?
What
Based on #30410
Draft PR to add a command to upload the metadata for every connector's current version to GCS.
Note: I don't think we should actually merge this PR, since this shouldn't be needed more than once; instead I think we should keep it as a draft, run the command from this branch, and then close the branch. We can always find this PR again on GitHub if we need the script again for any reason.
How
Add a command to metadata_service's commands.py which iterates over all connectors and calls the `upload` method for each to upload their metadata to GCS. For now, I've added a `break` to the loop so it only uploads the metadata for a single connector. Comment this out to upload metadata for all connectors. But when I actually do run it for real on the prod bucket, I plan to uncomment that last section so that it uploads the docs for all connectors.
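The single-connector safeguard described above can be sketched like this (names are assumed for illustration; `upload` stands in for the real upload command):

```python
def upload_all(connector_metadata_paths, bucket_name, upload):
    """Upload each connector's metadata file; the break limits this to one connector."""
    uploaded = []
    for metadata_file_path in connector_metadata_paths:
        upload(metadata_file_path, bucket_name)
        uploaded.append(metadata_file_path)
        break  # safety: only one connector for now; remove to upload all
    return uploaded
```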
Example usage:
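The original example was not captured in this snapshot; an invocation would look roughly like the following, where the entrypoint name and paths are assumptions based on the click arguments in the diff (connectors-dir, docs-dir, bucket-name):

```shell
# Hypothetical invocation; adjust the entrypoint name and paths to the actual CLI.
metadata_service upload-all-docs \
    airbyte-integrations/connectors \
    docs/integrations \
    my-test-bucket
```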