
[DO NOT MERGE] Upload all docs script #30603

Closed
lmossman wants to merge 6 commits into master from lmossman/upload-all-script

Conversation

lmossman
Contributor

@lmossman lmossman commented Sep 20, 2023

What

Based on #30410

Draft PR to add a command to upload the metadata for every connector's current version to GCS.

Note: I don't think we should actually merge this PR, since it shouldn't be needed more than once; instead, I think we should keep it as a draft, run the command from this branch, and then close the branch. We can always find this PR again on GitHub if we need the script for any reason.

How

Add a command to metadata_service's commands.py which iterates over all connectors and calls the upload method for each to upload their metadata to GCS.

For now, I've added a break to the loop so it only uploads the metadata for a single connector. Comment this out to upload metadata for all connectors.

But when I actually run it for real against the prod bucket, I plan to comment out that break so that it uploads the docs for all connectors.
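
For reference, the command is roughly shaped like this (a simplified sketch, not the exact implementation; upload_connector_metadata here is a stand-in for the existing per-connector validation and upload logic in gcs_upload.py):

import pathlib

import click

METADATA_FILE_NAME = "metadata.yaml"


def upload_connector_metadata(metadata_file_path, docs_dir, bucket_name):
    ...  # stand-in for the existing per-connector validation + upload in gcs_upload.py


@click.command()
@click.argument("connectors-dir", type=click.Path(exists=True, path_type=pathlib.Path))
@click.argument("docs-dir", type=click.Path(exists=True, path_type=pathlib.Path))
@click.argument("bucket-name", type=click.STRING)
def upload_all_docs(connectors_dir: pathlib.Path, docs_dir: pathlib.Path, bucket_name: str):
    for connector_dir in sorted(connectors_dir.iterdir()):
        metadata_file_path = connector_dir / METADATA_FILE_NAME
        if not (connector_dir.is_dir() and metadata_file_path.exists()):
            continue  # skip anything that isn't a connector directory
        click.echo(f"~~~~~~ Uploading metadata for {connector_dir.name}")
        upload_connector_metadata(metadata_file_path, docs_dir, bucket_name)
        break  # temporary: remove this break to upload metadata for all connectors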

Example usage:

$ poetry run metadata_service upload-all-metadata "/Users/lakemossman/code/airbyte/" "/Users/lakemossman/code/airbyte/docs/" dev-airbyte-cloud-connector-metadata-service
~~~~~~ Uploading metadata for source-secoda
Running validator: validate_all_tags_are_keyvalue_pairs
Running validator: validate_at_least_one_language_tag
Running validator: validate_major_version_bump_has_breaking_change_entry
Running validator: validate_docs_path_exists
Running validator: validate_metadata_images_in_dockerhub
Checking that the following images are on dockerhub: [('airbyte/source-secoda', '0.1.0')]
Local /Users/lakemossman/code/airbyte/airbyte-integrations/connectors/source-secoda/icon.svg md5_hash: 6QjWZXjaCqp3LDCmCgS3SA==
Remote metadata/airbyte/source-secoda/latest/icon.svg md5_hash: 6QjWZXjaCqp3LDCmCgS3SA==
Local /Users/lakemossman/code/airbyte/airbyte-integrations/connectors/source-secoda/metadata.yaml md5_hash: B8/QzJaHMHoKq6nLVxIBeA==
Remote metadata/airbyte/source-secoda/0.1.0/metadata.yaml md5_hash: B8/QzJaHMHoKq6nLVxIBeA==
Local /Users/lakemossman/code/airbyte/docs/integrations/sources/secoda.md md5_hash: qa3oJHdX3onzbAiWnbVFOA==
Remote metadata/airbyte/source-secoda/0.1.0/doc.md md5_hash: None
Uploading /Users/lakemossman/code/airbyte/docs/integrations/sources/secoda.md to metadata/airbyte/source-secoda/0.1.0/doc.md...
Local /Users/lakemossman/code/airbyte/airbyte-integrations/connectors/source-secoda/metadata.yaml md5_hash: B8/QzJaHMHoKq6nLVxIBeA==
Remote metadata/airbyte/source-secoda/latest/metadata.yaml md5_hash: B8/QzJaHMHoKq6nLVxIBeA==
Local /Users/lakemossman/code/airbyte/docs/integrations/sources/secoda.md md5_hash: qa3oJHdX3onzbAiWnbVFOA==
Remote metadata/airbyte/source-secoda/latest/doc.md md5_hash: None
Uploading /Users/lakemossman/code/airbyte/docs/integrations/sources/secoda.md to metadata/airbyte/source-secoda/latest/doc.md...

@vercel

vercel bot commented Sep 20, 2023

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
airbyte-docs ✅ Ready (Inspect) Visit Preview 💬 Add feedback Oct 6, 2023 9:54pm

@lmossman lmossman requested a review from bnchrch September 20, 2023 01:56
@lmossman lmossman changed the base branch from master to lmossman/upload-versioned-connector-docs September 20, 2023 01:56
@lmossman
Contributor Author

@bnchrch could you take a look at the upload_all_docs command/script that I've added in this draft PR, and let me know if you have any thoughts? This is what I plan to use to upload all docs for all of the current connector versions after we merge #30410

Contributor

@bnchrch bnchrch left a comment


A few comments! But the only thing holding this up is making sure we are very explicit and thorough with our connector directory exclusions.

@click.argument("connectors-dir", type=click.Path(exists=True, path_type=pathlib.Path))
@click.argument("docs-dir", type=click.Path(exists=True, path_type=pathlib.Path))
@click.argument("bucket-name", type=click.STRING)
def upload_all_docs(connectors_dir: pathlib.Path, docs_dir: pathlib.Path, bucket_name: str):
Contributor


❗ Let's make it clear this is a dev command by prefixing the command with experimental: experimental_upload_all_docs
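
e.g. something like this (sketch only; same arguments as today, just renamed):

@click.command(name="experimental-upload-all-docs")
@click.argument("connectors-dir", type=click.Path(exists=True, path_type=pathlib.Path))
@click.argument("docs-dir", type=click.Path(exists=True, path_type=pathlib.Path))
@click.argument("bucket-name", type=click.STRING)
def experimental_upload_all_docs(connectors_dir: pathlib.Path, docs_dir: pathlib.Path, bucket_name: str):
    ...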

@@ -262,3 +262,91 @@ def upload_metadata_to_gcs(
),
]
)

def upload_all_docs_to_gcs(connectors_dir: Path, docs_dir: Path, bucket_name: str):
Contributor


📚 Let's write a docstring explaining what it does, why we added it, and whether there are any criteria for removing it
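
For example (wording is just a sketch based on the PR description):

def upload_all_docs_to_gcs(connectors_dir: Path, docs_dir: Path, bucket_name: str):
    """Upload the metadata and docs for every connector's current version to GCS.

    This is a one-off backfill command added alongside #30410 so that the docs for
    all existing connector versions end up in the bucket. Once it has been run
    against the prod bucket it is no longer needed and can be removed.
    """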

connector_infos = []

for connector_dir in connectors_dir.iterdir():
    if connector_dir.is_dir():
Contributor


💅 would be great to have a single exclusion criterion
💅 would be great not to nest so deeply
❗ also I think we're missing certain criteria

def is_valid_connector_dir(connector_dir):
    # Check if directory
    if not connector_dir.is_dir():
        return False
    # Check starts with source- or destination-
    if not connector_dir.name.startswith(("source-", "destination-")):
        return False
    # Check has metadata.yaml file
    if not (connector_dir / METADATA_FILE_NAME).exists():
        return False
    # Do not allow `-scaffold`, `-secure`, or `-strict-encrypt`
    if connector_dir.name.endswith(("-scaffold", "-secure", "-strict-encrypt")):
        return False
    # What about our third party folder?
    return True

for connector_dir in filter(is_valid_connector_dir, connectors_dir.iterdir()):
    ...

if connector_type and connector_name:  # Skip folders that don't match the pattern
    metadata_file_path = connector_dir / METADATA_FILE_NAME
    if metadata_file_path.exists():
        metadata = read_metadata_yaml(metadata_file_path)
Contributor


💡 (Very optional) if we import ci_connector_ops (example import) we can use Connector and get_all_connectors_in_repo to do a lot of this. Particularly now that the related airbyte-ci PR adds helper properties for getting the doc and in-app doc paths.
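
Roughly (sketch only; the exact import path and the helper property names are assumptions, not the final API):

from ci_connector_ops.utils import Connector, get_all_connectors_in_repo  # example import

for connector in get_all_connectors_in_repo():  # each item is a Connector
    metadata = connector.metadata                              # parsed metadata.yaml
    doc_path = connector.documentation_file_path               # assumed helper property
    inapp_doc_path = connector.inapp_documentation_file_path   # assumed helper property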

metadata_file_path = connector_dir / METADATA_FILE_NAME
if metadata_file_path.exists():
    metadata = read_metadata_yaml(metadata_file_path)
    doc_path, inapp_doc_path = get_doc_paths(metadata, connector_name)  # 'source' becomes 'sources', 'destination' becomes 'destinations'
Contributor


❓ Why did we go with this method instead of just calling commands.upload(metadata_file_path, doc_path, bucket_name) directly?
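
i.e. something like this (sketch; the iterable of per-connector paths is hypothetical, the point is just reusing the existing single-connector upload path):

# Reuse the existing single-connector upload for each connector
for metadata_file_path, doc_path in connector_metadata_and_doc_paths:  # hypothetical iterable
    commands.upload(metadata_file_path, doc_path, bucket_name)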

@lmossman lmossman force-pushed the lmossman/upload-versioned-connector-docs branch from afdf9cb to b7490dd on September 27, 2023 18:01
Base automatically changed from lmossman/upload-versioned-connector-docs to master September 27, 2023 23:40
@lmossman lmossman force-pushed the lmossman/upload-all-script branch from c42c47d to 2058eb9 on September 27, 2023 23:43
@lmossman lmossman force-pushed the lmossman/upload-all-script branch from 2058eb9 to 18c208c on September 27, 2023 23:44
@lmossman lmossman requested a review from bnchrch September 27, 2023 23:46
@lmossman lmossman requested review from bnchrch and removed request for bnchrch September 30, 2023 00:35