Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Connector Ops: Fix third party support in get_all_connectors_in_repo #31166

Merged
merged 4 commits into from
Oct 11, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
72 changes: 55 additions & 17 deletions airbyte-ci/connectors/connector_ops/connector_ops/utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,7 @@

DIFFED_BRANCH = os.environ.get("DIFFED_BRANCH", "origin/master")
OSS_CATALOG_URL = "https://connectors.airbyte.com/files/registries/v0/oss_registry.json"
BASE_AIRBYTE_DOCS_URL = "https://docs.airbyte.com"
CONNECTOR_PATH_PREFIX = "airbyte-integrations/connectors"
SOURCE_CONNECTOR_PATH_PREFIX = CONNECTOR_PATH_PREFIX + "/source-"
DESTINATION_CONNECTOR_PATH_PREFIX = CONNECTOR_PATH_PREFIX + "/destination-"
Expand Down Expand Up @@ -251,7 +252,8 @@ class ConnectorLanguageError(Exception):
class Connector:
"""Utility class to gather metadata about a connector."""

technical_name: str
# The technical name of the connector, e.g. source-google-sheets or third-party/farosai/airbyte-pagerduty-source
_relative_connector_path: str

def _get_type_and_name_from_technical_name(self) -> Tuple[str, str]:
if "-" not in self.technical_name:
Expand All @@ -260,35 +262,53 @@ def _get_type_and_name_from_technical_name(self) -> Tuple[str, str]:
name = self.technical_name[len(_type) + 1 :]
return _type, name

@property
def technical_name(self) -> str:
"""
Return the technical name of the connector from the given _relative_connector_path
e.g. source-google-sheets -> google-sheets or third-party/farosai/airbyte-pagerduty-source -> airbyte-pagerduty-source
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
e.g. source-google-sheets -> google-sheets or third-party/farosai/airbyte-pagerduty-source -> airbyte-pagerduty-source
e.g. source-google-sheets -> source-google-sheets or third-party/farosai/airbyte-pagerduty-source -> airbyte-pagerduty-source

"""
return self._relative_connector_path.split("/")[-1]

@property
def name(self):
return self._get_type_and_name_from_technical_name()[1]

@property
def connector_type(self) -> str:
return self._get_type_and_name_from_technical_name()[0]
return self.metadata["connectorType"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This question is outside the scope of this PR but I'm wondering if we should deserialize metadata with the metadata lib to interact with pydantic models instead of interacting with dict.


@property
def is_third_party(self) -> bool:
return THIRD_PARTY_GLOB in self._relative_connector_path

@property
def documentation_directory(self) -> Path:
if self.has_external_documentation:
return None
return Path(f"./docs/integrations/{self.connector_type}s")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This .docs is one of the reason why airbyte-ci is required to run from the repo root. For more flexibility we shall add a constant like GIT_REPO_ROOT = git.Repo(search_parent_directories=True) and build path from this root.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I would have love to do this inside this PR but it got a bit big so I broke it out into this branch
#31409

Though it definately raises the question of if we would want to do this way....

The alternative is just having airbyte-ci set the working directory to the repo root 🤔


@property
def documentation_file_path(self) -> Path:
readme_file_name = f"{self.name}.md"
return self.documentation_directory / readme_file_name
def relative_documentation_path_str(self) -> str:
documentation_url = self.metadata["documentationUrl"]
relative_documentation_path = documentation_url.replace(BASE_AIRBYTE_DOCS_URL, "")

# strip leading and trailing slashes
relative_documentation_path = relative_documentation_path.strip("/")

return f"./docs/{relative_documentation_path}"

@property
def inapp_documentation_file_path(self) -> Path:
readme_file_name = f"{self.name}.inapp.md"
return self.documentation_directory / readme_file_name
def documentation_file_path(self) -> Path:
return Path(f"{self.relative_documentation_path_str}.md")

@property
def migration_guide_file_name(self) -> str:
return f"{self.name}-migrations.md"
def inapp_documentation_file_path(self) -> Path:
return Path(f"{self.relative_documentation_path_str}.inapp.md")

@property
def migration_guide_file_path(self) -> Path:
return self.documentation_directory / self.migration_guide_file_name
return Path(f"{self.relative_documentation_path_str}-migrations.md")

@property
def icon_path(self) -> Path:
Expand All @@ -297,7 +317,7 @@ def icon_path(self) -> Path:

@property
def code_directory(self) -> Path:
return Path(f"./airbyte-integrations/connectors/{self.technical_name}")
return Path(f"./{CONNECTOR_PATH_PREFIX}/{self._relative_connector_path}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same suggestion as above: build the path from the GIT_REPO_ROOT


@property
def metadata_file_path(self) -> Path:
Expand Down Expand Up @@ -529,6 +549,26 @@ def get_changed_connectors(
return {Connector(get_connector_name_from_path(changed_file)) for changed_file in changed_source_connector_files}


def _get_relative_connector_folder_name_from_metadata_path(metadata_file_path: str) -> str:
"""Get the relative connector folder name from the metadata file path.

Args:
metadata_file_path (Path): Path to the metadata file.

Returns:
str: The relative connector folder name.
"""
# remove CONNECTOR_PATH_PREFIX and anything before
metadata_file_path = metadata_file_path.split(CONNECTOR_PATH_PREFIX)[-1]

# remove metadata.yaml
metadata_file_path = metadata_file_path.replace(METADATA_FILE_NAME, "")

# remove leading and trailing slashes
metadata_file_path = metadata_file_path.strip("/")
return metadata_file_path


def get_all_connectors_in_repo() -> Set[Connector]:
"""Retrieve a set of all Connectors in the repo.
We globe the connectors folder for metadata.yaml files and construct Connectors from the directory name.
Expand All @@ -540,11 +580,9 @@ def get_all_connectors_in_repo() -> Set[Connector]:
repo_path = repo.working_tree_dir

return {
Connector(Path(metadata_file).parent.name)
for metadata_file in glob(f"{repo_path}/airbyte-integrations/connectors/**/metadata.yaml", recursive=True)
# HACK: The Connector util is not good at fetching metadata for third party connectors.
# We want to avoid picking a connector that does not have metadata.
if SCAFFOLD_CONNECTOR_GLOB not in metadata_file and THIRD_PARTY_GLOB not in metadata_file
Connector(_get_relative_connector_folder_name_from_metadata_path(metadata_file))
for metadata_file in glob(f"{repo_path}/{CONNECTOR_PATH_PREFIX}/**/metadata.yaml", recursive=True)
if SCAFFOLD_CONNECTOR_GLOB not in metadata_file
}


Expand Down
2 changes: 1 addition & 1 deletion airbyte-ci/connectors/connector_ops/pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@ build-backend = "poetry.core.masonry.api"

[tool.poetry]
name = "connector_ops"
version = "0.2.5"
version = "0.2.6"
description = "Packaged maintained by the connector operations team to perform CI for connectors"
authors = ["Airbyte <[email protected]>"]

Expand Down
6 changes: 4 additions & 2 deletions airbyte-ci/connectors/connector_ops/tests/test_utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -199,5 +199,7 @@ def test_get_all_gradle_dependencies(with_test_dependencies):
def test_get_all_connectors_in_repo():
all_connectors = utils.get_all_connectors_in_repo()
assert len(all_connectors) > 0
assert all([isinstance(connector, utils.Connector) for connector in all_connectors])
assert all([connector.metadata is not None for connector in all_connectors])
for connector in all_connectors:
assert isinstance(connector, utils.Connector)
assert connector.metadata is not None
assert connector.documentation_file_path.exists()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test would fail if a connector does not have a doc. Do we want this lib test to fail in this scenario? At the moment missing doc would lead to a failing QA check. What do you think about asserting the documention path structure instead of checking its existence?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call. I updated to check if a connector has an airbyte.com documentation url before running this check
#31324

2 changes: 1 addition & 1 deletion airbyte-ci/connectors/pipelines/poetry.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

4 changes: 2 additions & 2 deletions docusaurus/redirects.yml
Original file line number Diff line number Diff line change
Expand Up @@ -5,8 +5,8 @@
to: /operator-guides/upgrading-airbyte
- from: /catalog
to: /understanding-airbyte/airbyte-protocol
- from: /integrations/sources/appstore
to: /integrations/sources/appstore-singer
- from: /integrations/sources/appstore-singer
to: /integrations/sources/appstore
- from:
- /project-overview/security
- /operator-guides/securing-airbyte
Expand Down