Upload versioned connector docs on publish #30410
Conversation
@bnchrch what is the best way to test this change on CI? Is there a slash command I can run to try the publishing step? I assume we have something like this for pre-releases. I've tested it locally by running
One thing I just realized I forgot to address here is images in connector docs - if the markdown docs are pointing at a relative path, e.g. to the
EDIT: I've thought about the images issue a bit more, and have some more thoughts:
The main problem I see with this approach is that if a user is using an old connector version whose docs reference an image which no longer exists on the master branch (e.g. the image was removed or renamed during some later release), then that link won't work and we won't show the image. However, I think this is edge-casey enough that it is acceptable, and preferable to having to manage versioned images as well. At least, this should be an okay placeholder until we do the full refactor to move docs and their associated images into connector folders, at which point it would be easier to just upload the images for that connector to its metadata folder. What do you think @bnchrch?
```diff
 # TEST UPLOAD COMMAND
 @pytest.mark.parametrize(
-    "latest_uploaded, version_uploaded, icon_uploaded",
+    "latest_uploaded, version_uploaded, icon_uploaded, doc_version_uploaded, doc_inapp_version_uploaded, doc_latest_uploaded, doc_inapp_latest_uploaded",
     [
-        (False, False, False),
-        (True, False, False),
-        (False, True, False),
-        (False, False, True),
-        (True, True, False),
-        (True, False, True),
-        (False, True, True),
-        (True, True, True),
+        (False, False, False, False, False, False, False),
+        (True, False, False, False, False, False, False),
+        (False, True, False, False, False, False, False),
+        (False, False, True, False, False, False, False),
+        (True, True, False, False, False, False, False),
+        (True, False, True, False, False, False, False),
+        (False, True, True, False, False, False, False),
+        (True, True, True, False, False, False, False),
+        (True, True, True, True, True, True, True),
     ],
 )
```
Including every permutation of 7 different boolean parameters would be 2^7 = 128 different test cases which felt unnecessary, so I just included one test case where all are set to true
That seems fair!
This might be helpful in this case (I've often found myself wanting the feature in Java):
(From https://docs.pytest.org/en/latest/how-to/parametrize.html#pytest-mark-parametrize-parametrizing-test-functions)
If indeed we are just enumerating all the options
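For reference, if the goal really were to enumerate every combination, `itertools.product` can generate the full cross-product instead of listing tuples by hand. A sketch (the parameter names are taken from the test above; the test body is elided):

```python
import itertools

import pytest

# All 2**7 = 128 combinations of the seven boolean flags.
ALL_COMBINATIONS = list(itertools.product([True, False], repeat=7))


@pytest.mark.parametrize(
    "latest_uploaded, version_uploaded, icon_uploaded, doc_version_uploaded, "
    "doc_inapp_version_uploaded, doc_latest_uploaded, doc_inapp_latest_uploaded",
    ALL_COMBINATIONS,
)
def test_upload(
    latest_uploaded,
    version_uploaded,
    icon_uploaded,
    doc_version_uploaded,
    doc_inapp_version_uploaded,
    doc_latest_uploaded,
    doc_inapp_latest_uploaded,
):
    ...  # test body elided
```

Whether 128 cases are worth the runtime is a judgment call; the single all-true case plus the original eight covers the new code paths with far fewer tests.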
Hey @lmossman let me answer some of these larger questions! Then I'll be back for an in-depth code review later.
So I believe you can test using
Here's an example of the env vars I set:

```sh
export METADATA_SERVICE_GCS_CREDENTIALS=`cat ~/Development/secrets/dev-metadata-service-account.json`
export METADATA_SERVICE_BUCKET_NAME="dev-airbyte-cloud-connector-metadata-service"
export SPEC_CACHE_BUCKET_NAME="io-airbyte-cloud-spec-cache"
export SPEC_CACHE_GCS_CREDENTIALS=`cat ~/Development/secrets/prod-metadata-service-account.json`
export GCP_GSM_CREDENTIALS=`cat ~/Development/secrets/ben-dev-account-dataline-service-key.json`
export DOCKER_HUB_USERNAME="benairbyte"
export DOCKER_HUB_PASSWORD="REDACTED"
```

Afterwards you should be free to run:

```sh
airbyte-ci connectors --name=source-faker test                    # which triggers metadata validate
airbyte-ci connectors --name=source-faker publish --main-release  # which triggers metadata upload
```
So the main question lies with "who is responsible for turning relative paths into absolute URLs?" The FE already does this so it would not be a stretch to do it there again. However I think a good set of principles to follow is
With that in mind I (naively) think that
Where this breaks down / requires extra thought is
|
@bnchrch I just checked and there are actually only 8 connector docs containing images, and all of them are either full github URLs, or they point to the
So that should make the image uploading logic fairly straightforward, i.e.:
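A rough sketch of what that relative-to-absolute rewriting could look like. The base URL and the helper name here are illustrative assumptions, not the actual implementation:

```python
import posixpath
import re

# Hypothetical base URL for illustration; the real one would point at the
# docs folder on the repo's master branch.
GITHUB_RAW_BASE = "https://raw.githubusercontent.com/airbytehq/airbyte/master"


def absolutize_image_links(markdown: str, doc_dir: str) -> str:
    """Rewrite relative markdown image links to absolute GitHub raw URLs."""

    def replace(match):
        alt, path = match.group(1), match.group(2)
        if path.startswith(("http://", "https://")):
            return match.group(0)  # already absolute, leave untouched
        # Resolve the relative path against the directory containing the doc.
        joined = posixpath.normpath(posixpath.join(doc_dir, path))
        return f"![{alt}]({GITHUB_RAW_BASE}/{joined})"

    return re.sub(r"!\[([^\]]*)\]\(([^)\s]+)\)", replace, markdown)
```

For example, an image link `../../.gitbook/assets/logo.png` inside a doc under `docs/integrations/sources` would resolve to `docs/.gitbook/assets/logo.png` under the base URL.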
@lmossman This is great!
The only comment I have that is a "blocker" is the question around whether we should fail if there are no docs.
```python
        color="green"
    )
if metadata_upload_info.doc_inapp_latest_uploaded:
    click.secho(
```
super nit: We're getting a little bit long in the ifs here
I know I started it 😅
But if we created a helper I would not be mad
```python
def log_if_uploaded(metadata_upload_info: MetadataUploadInfo, upload_success_key: str, upload_id_key: str, name: str):
    # MetadataUploadInfo is a dataclass, so look the fields up with getattr
    # rather than subscripting.
    success = getattr(metadata_upload_info, upload_success_key)
    if success:
        path = getattr(metadata_upload_info, upload_id_key)
        click.secho(
            f"The {name} file for {metadata_upload_info.metadata_file_path} was uploaded to {path}.",
            color="green",
        )


log_if_uploaded(metadata_upload_info, "icon_uploaded", "icon_blob_id", "icon")
```
```python
local_doc_path = docs_folder_path / f"{doc_file_name}.md"
local_inapp_doc_path = docs_folder_path / f"{doc_file_name}.inapp.md"

remote_doc_path = get_doc_remote_file_path(metadata.data.dockerRepository, "latest" if latest else metadata.data.dockerImageTag, False)
```
👏 very nice
```python
if local_doc_path.exists():
    doc_uploaded, doc_blob_id = upload_file_if_changed(local_doc_path, bucket, remote_doc_path)
else:
    doc_uploaded, doc_blob_id = False, f"No doc found at {local_doc_path}"
```
❓ Do we want to enforce that they have docs?
Yeah I think this should be enforced, I will add an exception here
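A minimal sketch of what that enforcement could look like. The helper name is hypothetical; the real change would raise inside the upload flow instead of silently recording the miss:

```python
from pathlib import Path


def read_doc_or_fail(local_doc_path: Path) -> str:
    # Enforce that every connector ships documentation: raise instead of
    # silently returning "No doc found at ...".
    if not local_doc_path.exists():
        raise FileNotFoundError(f"Expected a connector doc at {local_doc_path}, but none was found.")
    return local_doc_path.read_text()
```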
Great. Let's run tests on a few GA connectors so that we're sure we didn't cause a regression :)
airbyte-ci/connectors/metadata_service/lib/metadata_service/gcs_upload.py
```diff
@@ -189,4 +234,12 @@ def upload_metadata_to_gcs(
     icon_blob_id=icon_blob_id,
     icon_uploaded=icon_uploaded,
     metadata_file_path=str(metadata_file_path),
+    doc_version_uploaded=doc_version_uploaded,
+    doc_version_blob_id=doc_version_blob_id,
```
I made a comment about "hey we have a lot of ifs in the log function". I think that's a symptom of the MetadataUploadInfo being flat.

@lmossman Do you have the capacity to move this into a more generic `uploaded_files` field? It should go a long way in removing boilerplate.
```python
from dataclasses import dataclass
from typing import List, Optional

# Updated models
# (UploadedFile is declared first so MetadataUploadInfo can reference it.)
@dataclass(frozen=True)
class UploadedFile:
    id: str
    uploaded: bool
    description: str
    blob_id: Optional[str]


@dataclass(frozen=True)
class MetadataUploadInfo:
    metadata_uploaded: bool
    metadata_file_path: str
    uploaded_files: List[UploadedFile]


# Updated upload helpers
def _version_upload(metadata: ConnectorMetadataDefinitionV0, bucket: storage.bucket.Bucket, metadata_file_path: Path) -> UploadedFile:
    version_path = get_metadata_remote_file_path(metadata.data.dockerRepository, metadata.data.dockerImageTag)
    uploaded, blob_id = upload_file_if_changed(metadata_file_path, bucket, version_path, disable_cache=True)
    return UploadedFile(
        id="version_metadata",
        uploaded=uploaded,
        description="versioned metadata file",
        blob_id=blob_id,
    )


# Updated main method
version_metadata_upload = _version_upload(metadata, bucket, metadata_file_path)
return MetadataUploadInfo(
    metadata_uploaded=version_metadata_upload.uploaded or latest_metadata_upload.uploaded,
    metadata_file_path=str(metadata_file_path),
    uploaded_files=[
        version_metadata_upload,
        latest_metadata_upload,
    ],
)

# updated logger
for file in metadata_upload_info.uploaded_files:
    if file.uploaded:
        click.secho(
            f"The {file.description} file for {metadata_upload_info.metadata_file_path} was uploaded to {file.blob_id}.",
            color="green",
        )
```
Also @lmossman I like your approach with the images. Question on that: what path will you put them at in the docs folder? I would prefer a relative path like `../../../../folder1/folder2/folder3/image.jpg` to become
Thoughts?
@bnchrch I've addressed your comments and finally got the tests working after quite a bit of trial and error 😅

The only thing I haven't addressed is the images - I talked with Tim about this and we decided that since the frontend will still need to assume the location of the connector doc file within the docs folder in order to properly resolve relative links to other docs pages, we are just going to go with the frontend-github-url replacement approach that I laid out in this comment, in order to keep the scope of this change small and get to supporting versioned docs more quickly while still having time to complete the other FE projects we have planned for this quarter.

Once connector ops can prioritize a more thorough migration of all doc files along with their images into the connector folders, that uploading logic will hopefully be simpler, so we don't want to invest the effort to make that work for this interim state. Hopefully you are okay with this more minimal approach for now!
Totally ok! "Done, not perfect" perfectly applies here! Re-reviewing now :)
```python
    f"The icon file {metadata_upload_info.metadata_file_path} was uploaded to {metadata_upload_info.icon_blob_id}.", color="green"
)
for file in metadata_upload_info.uploaded_files:
    if file.uploaded:
```
<3
```python
        description="versioned inapp doc",
        blob_id=doc_inapp_version_blob_id,
    ),
    UploadedFile(
```
💅 Ideally we return these from their respective upload functions.
@lmossman Do you think that's a reasonable change we can make?
@bnchrch the reason I did it this way was because in some cases we set the `uploaded` and `blob_id` to `False, None` without calling the upload function. So if `UploadedFile` is returned from the function, then we would need to duplicate the `id` and the `description` in `upload_metadata_to_gcs` as well for those cases where the upload function isn't called, so doing it this way felt DRYer to me. What do you think?
Overall this looks just fine to me!
There's one comment that would be great if we can get it in
But it is not a blocker so approving now 👍
Thanks for all the context sharing and being so open to feedback @lmossman 💎
@bnchrch thanks for the quick review! I tried running
Have you seen that before? I also tried running the
@lmossman Hmm, you're not targeting colima in your CLI by any chance, are you?
@bnchrch I've made the changes that we discussed yesterday, in these commits:
As discussed I will leave it to you to update the MetadataUpload class in airbyte-ci accordingly!
```diff
-click.echo(f"Validating {file_path}...")
+click.echo(f"Validating {metadata_file_path}...")
```
😍 Like this name change
A couple of small changes! Working on the CI side of it now.
```diff
-def validate(file_path: pathlib.Path):
-    file_path = file_path if not file_path.is_dir() else file_path / METADATA_FILE_NAME
+@click.argument("metadata_file_path", type=click.Path(exists=True, path_type=pathlib.Path))
+@click.argument("doc_path", type=click.Path(exists=True, path_type=pathlib.Path))
```
❗️ We should mark this as required
and we need to update the readme with an example https://github.com/airbytehq/airbyte/blob/master/airbyte-ci/connectors/metadata_service/lib/README.md?plain=1#L35
I've marked the `metadata-file-path`, `doc-path`, and `bucket-name` arguments as required for the `validate` and `upload` commands, and I've updated the validate example in the readme.
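For illustration, a minimal sketch of what the required arguments could look like. The command name and body are placeholders; note that `click` arguments are required by default unless given a default, so `required=True` mostly documents the intent explicitly:

```python
import pathlib

import click


@click.command()
@click.argument("metadata_file_path", type=click.Path(exists=True, path_type=pathlib.Path), required=True)
@click.argument("doc_path", type=click.Path(exists=True, path_type=pathlib.Path), required=True)
@click.argument("bucket_name", type=click.STRING, required=True)
def upload(metadata_file_path: pathlib.Path, doc_path: pathlib.Path, bucket_name: str):
    """Upload the metadata file and connector doc to the given bucket."""
    click.echo(f"Uploading {metadata_file_path} and {doc_path} to {bucket_name}...")
```

Invoking the command without arguments then fails fast with a usage error instead of proceeding with missing paths.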
Force-pushed from afdf9cb to b7490dd.
For anyone tagged here - this was caused by CI applying auto-formatting changes to this PR, which for some reason changed a ton of files. Not sure why this happened, but please feel free to ignore this PR.
Co-authored-by: Ella Rohm-Ensing <[email protected]> Co-authored-by: bnchrch <[email protected]>
What
Resolves #29400
This PR adds logic to the metadata_service which uploads connector documentation to GCS along with the other metadata uploads.
This will be utilized in an upcoming airbyte server API endpoint to retrieve docs for specific connector versions, which will be utilized in the UI to show the correct documentation for the connector version actively being used.
How
Adds a `_docs_upload()` function to the `gcs_upload.py` file which does the following:

- Looks for both the `.inapp.md` and `.md` files, and uploads both if they exist
- Is called once with `latest=False` to upload to the versioned path, and once with `latest=True` to upload to the latest path, ensuring that doc files are present in both
- Always uploads to the `latest` path so that the fallback is always present

Note: I decided to keep the connectors docs where they are today, as opposed to moving them next to the `metadata.yaml` files, because Sherif and I decided that we wanted to keep this change as small in scope as possible so that we can complete the other UI doc tasks we have planned for this quarter.

This small change will unlock the ability to have versioned docs in the UI, and once Connector Ops feels the need to prioritize it, they can do the full migration of these docs into the connector folders, updating docusaurus, etc.
🚨 User Impact 🚨
No breaking changes