Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

✨ Source S3: Add handling NoSuchBucket error #31383

Merged
merged 7 commits into from
Oct 24, 2023

Conversation

tolik0
Copy link
Contributor

@tolik0 tolik0 commented Oct 13, 2023

What

Add raising config error in case of wrong bucket name.
Closes:
#25835

How

Added a new type of error to file-based cdk to avoid catching in file-based cdk and just raise AirbyteTracedException config error.

@vercel
Copy link

vercel bot commented Oct 13, 2023

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Ignored Deployment
Name Status Preview Comments Updated (UTC)
airbyte-docs ⬜️ Ignored (Inspect) Visit Preview Oct 24, 2023 7:52pm

@octavia-squidington-iii octavia-squidington-iii added area/connectors Connector related issues area/documentation Improvements or additions to documentation CDK Connector Development Kit labels Oct 13, 2023
@github-actions
Copy link
Contributor

github-actions bot commented Oct 13, 2023

Before Merging a Connector Pull Request

Wow! What a great pull request you have here! 🎉

To merge this PR, ensure the following has been done/considered for each connector added or updated:

  • PR name follows PR naming conventions
  • Breaking changes are considered. If a Breaking Change is being introduced, ensure an Airbyte engineer has created a Breaking Change Plan.
  • Connector version has been incremented in the Dockerfile and metadata.yaml according to our Semantic Versioning for Connectors guidelines
  • You've updated the connector's metadata.yaml file any other relevant changes, including a breakingChanges entry for major version bumps. See metadata.yaml docs
  • Secrets in the connector's spec are annotated with airbyte_secret
  • All documentation files are up to date. (README.md, bootstrap.md, docs.md, etc...)
  • Changelog updated in docs/integrations/<source or destination>/<name>.md with an entry for the new version. See changelog example
  • Migration guide updated in docs/integrations/<source or destination>/<name>-migrations.md with an entry for the new version, if the version is a breaking change. See migration guide example
  • If set, you've ensured the icon is present in the platform-internal repo. (Docs)

If the checklist is complete, but the CI check is failing,

  1. Check for hidden checklists in your PR description

  2. Toggle the github label checklist-action-run on/off to re-run the checklist CI.

@airbyte-oss-build-runner
Copy link
Collaborator

source-s3 test report (commit 75069a5c82) - ❌

⏲️ Total pipeline duration: 28.60s

Step Result
Build source-s3 docker image for platform(s) linux/x86_64
Code format checks
Validate metadata for source-s3
Connector version semver check
Connector version increment check
QA checks

🔗 View the logs here

☁️ View runs for commit in Dagger Cloud

Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=source-s3 test

@girarda girarda requested a review from clnoll October 13, 2023 16:53
@@ -64,6 +64,8 @@ def check_availability_and_parsability(
def _check_list_files(self, stream: "AbstractFileBasedStream") -> List[RemoteFile]:
try:
files = stream.list_files()
except CustomFileBasedSourceException as custom_exc:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tolik0 thanks for this PR, I agree that we want to be able customize the exceptions.

Looking at this code: the error that's relevant to the user is still that there was a problem listing files, so I think we want to keep the CheckAvailabilityError, but want to make sure we can customize the message, e.g. for things like missing bucket.

Since we're already raising CheckAvailabilityError ... from exc, we should get the missing bucket error as part of the stacktrace when we raise the CheckAvailabilityError, right?

If that's obscuring the error too much, instead of raising CustomFileBasedSourceException, WDYT about giving it a "message" attribute, then here we'll raise CheckAvailabilityError(custom_exc.message, stream=stream.name) from exc?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@clnoll Thanks for feedback!
I've removed CustomFileBasedSourceException. Do you think this is better? Should I split this PR into two: one for file-based CDK and another for S3 changes?

@tolik0 tolik0 force-pushed the tolik0/source-s3/wrong-bucket-config-error branch from 9f586f9 to db3ff06 Compare October 18, 2023 22:24
@octavia-squidington-iv octavia-squidington-iv requested a review from a team October 18, 2023 22:25
@tolik0 tolik0 requested a review from clnoll October 18, 2023 22:29
Copy link
Contributor

@clnoll clnoll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @tolik0! I think I'm actually preferring to keep the CustomFileBasedSourceException. I left an example of what I have in mind in a comment.

I think you will have to split this into 2 PRs since the new CDK version won't be available yet when the source-s3 connector is published.

@@ -71,7 +71,8 @@ def _check_list_files(self, stream: "AbstractFileBasedStream") -> List[RemoteFil
try:
files = stream.list_files()
except Exception as exc:
raise CheckAvailabilityError(FileBasedSourceError.ERROR_LISTING_FILES, stream=stream.name) from exc
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what I'd do here is continue to raise the CustomException from your previous version, but catch it and re-raise with a CheckAvailabilityError, like this:

try:
    ...
except CustomFileBasedException as exc:
    raise CheckAvailabilityError(exc.error_message, stream=stream.name) from exc
except Exception as exc:
    ...

This way we can avoid having to do the hasattr check. Plus, I think we'll probably want to use your CustomFileBasedException in other places too as the file-based CDK matures.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the code to use CustomFileBasedException and re-raise as CheckAvailabilityError, removing the hasattr check. Thanks for the suggestion.

yield remote_file
else:
for remote_file in self._page(s3, globs, self.config.bucket, None, seen, logger):
for current_prefix in (prefixes if prefixes else [None]):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this change needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I simplified the code to avoid duplication

) from exc
self.raise_error_listing_files(FileBasedSourceError.ERROR_LISTING_FILES, globs, exc)

def raise_error_listing_files(self, error_message: Union[str, FileBasedSourceError], globs: List[str], exc: Optional[Exception] = None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need a helper method for this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used the helper method to reduce repetitive error-raising code.

@airbyte-oss-build-runner
Copy link
Collaborator

source-s3 test report (commit c60529b0df) - ❌

⏲️ Total pipeline duration: 16mn30s

Step Result
Build source-s3 docker image for platform(s) linux/x86_64
Unit tests
Acceptance tests
Code format checks
Validate metadata for source-s3
Connector version semver check
Connector version increment check
QA checks

🔗 View the logs here

☁️ View runs for commit in Dagger Cloud

Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=source-s3 test

@tolik0 tolik0 force-pushed the tolik0/source-s3/wrong-bucket-config-error branch from c60529b to 3c18702 Compare October 18, 2023 23:59
@airbyte-oss-build-runner
Copy link
Collaborator

source-s3 test report (commit 3c18702032) - ❌

⏲️ Total pipeline duration: 02mn36s

Step Result
Build source-s3 docker image for platform(s) linux/x86_64
Code format checks
Validate metadata for source-s3
Connector version semver check
Connector version increment check
QA checks

🔗 View the logs here

☁️ View runs for commit in Dagger Cloud

Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=source-s3 test

@airbyte-oss-build-runner
Copy link
Collaborator

source-s3 test report (commit 30e354ac81) - ❌

⏲️ Total pipeline duration: 14.54s

Step Result
Build source-s3 docker image for platform(s) linux/x86_64
Code format checks
Validate metadata for source-s3
Connector version semver check
Connector version increment check
QA checks

🔗 View the logs here

☁️ View runs for commit in Dagger Cloud

Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=source-s3 test

Copy link
Contributor

@clnoll clnoll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @tolik0, looks great!

@tolik0 tolik0 force-pushed the tolik0/source-s3/wrong-bucket-config-error branch from 30e354a to 3edc8af Compare October 23, 2023 11:17
@octavia-squidington-iii octavia-squidington-iii removed the CDK Connector Development Kit label Oct 23, 2023
@airbyte-oss-build-runner
Copy link
Collaborator

source-s3 test report (commit 3edc8af363) - ❌

⏲️ Total pipeline duration: 03mn22s

Step Result
Build source-s3 docker image for platform(s) linux/amd64
Code format checks
Validate metadata for source-s3
Connector version semver check
Connector version increment check
QA checks

🔗 View the logs here

☁️ View runs for commit in Dagger Cloud

Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=source-s3 test

@airbyte-oss-build-runner
Copy link
Collaborator

source-s3 test report (commit 0550b77564) - ❌

⏲️ Total pipeline duration: 03mn29s

Step Result
Build source-s3 docker image for platform(s) linux/amd64
Code format checks
Validate metadata for source-s3
Connector version semver check
Connector version increment check
QA checks

🔗 View the logs here

☁️ View runs for commit in Dagger Cloud

Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=source-s3 test

@tolik0 tolik0 force-pushed the tolik0/source-s3/wrong-bucket-config-error branch from 0550b77 to 2c9c7b6 Compare October 24, 2023 19:39
@airbyte-oss-build-runner
Copy link
Collaborator

source-s3 test report (commit fe9d87a2fd) - ✅

⏲️ Total pipeline duration: 17mn56s

Step Result
Build source-s3 docker image for platform(s) linux/amd64
Unit tests
Acceptance tests
Check our base image is used
Code format checks
Validate metadata for source-s3
Connector version semver check
Connector version increment check
QA checks

🔗 View the logs here

☁️ View runs for commit in Dagger Cloud

Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=source-s3 test

@tolik0 tolik0 merged commit 053d08e into master Oct 24, 2023
17 checks passed
@tolik0 tolik0 deleted the tolik0/source-s3/wrong-bucket-config-error branch October 24, 2023 22:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/connectors Connector related issues area/documentation Improvements or additions to documentation connectors/source/s3
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants