[Integration][Gitlab] Added support for gitlab member ingestion #767

Merged: 67 commits, Nov 14, 2024
Changes from 9 commits
Commits
d3dec00
Added support for gitlab user ingestion
mk-armah Jul 3, 2024
b88d530
edited changelog
mk-armah Jul 3, 2024
cfeddb0
edited changelog
mk-armah Jul 3, 2024
8a48e1b
Merge branch 'main' into improvement/gitlab
mk-armah Jul 12, 2024
81905c5
updated group -> member relationship
mk-armah Jul 12, 2024
58665c6
Merge branch 'main' into improvement/gitlab
mk-armah Jul 12, 2024
2917e45
Lint
mk-armah Jul 12, 2024
9820b9a
Added error handling to get_all_group_members
mk-armah Jul 12, 2024
24b3551
Added error handling to get_all_group_members
mk-armah Jul 12, 2024
7d324db
Merge branch 'main' into improvement/gitlab
mk-armah Jul 25, 2024
12ba800
updated blueprints
mk-armah Jul 25, 2024
191292b
Merge branch 'improvement/gitlab' of github.com:port-labs/ocean into …
mk-armah Jul 25, 2024
9f9b706
related service to groups
mk-armah Jul 26, 2024
9df3202
lint
mk-armah Jul 26, 2024
5daceeb
fixed lint issues
mk-armah Jul 26, 2024
4d98b1d
change default behavior for filterBots flag
mk-armah Jul 26, 2024
1b69d29
made the filterBots and publicEmailVisibility flags unset by default,…
mk-armah Jul 26, 2024
2a81da0
Merge branch 'main' into improvement/gitlab
mk-armah Jul 31, 2024
35ee321
Merge branch 'main' into improvement/gitlab
mk-armah Jul 31, 2024
ff737e6
gitlab member webhook development
mk-armah Aug 1, 2024
00617e7
Merge branch 'main' into improvement/gitlab
mk-armah Aug 2, 2024
47e2e24
support for member webhook
mk-armah Aug 2, 2024
5ce70f3
Merge branch 'improvement/gitlab' of github.com:port-labs/ocean into …
mk-armah Aug 2, 2024
e4c1238
Merge branch 'main' into improvement/gitlab
Tankilevitch Aug 4, 2024
3a8e14a
Merge branch 'main' into improvement/gitlab
mk-armah Aug 7, 2024
4457ba3
updated scripts for syncing members
mk-armah Aug 7, 2024
952fac7
Merge branch 'improvement/gitlab' of github.com:port-labs/ocean into …
mk-armah Aug 7, 2024
9721fd9
Update CHANGELOG.md
mk-armah Aug 7, 2024
62a11f8
removed groups and members from spec.yaml
mk-armah Aug 7, 2024
33e9b19
refactored fetch group members function in members resync
mk-armah Aug 7, 2024
d5b4ea1
renamed fetch_group_members function to process_group_members
mk-armah Aug 8, 2024
4882b3f
Merge branch 'main' into improvement/gitlab
mk-armah Aug 11, 2024
7d35201
Merge branch 'main' into improvement/gitlab
mk-armah Aug 12, 2024
b15ba21
bumped ocean version
mk-armah Aug 12, 2024
cad6210
Merge branch 'main' into improvement/gitlab
mk-armah Aug 21, 2024
1458fac
updated resync groups with members
mk-armah Aug 21, 2024
56f7b44
updated resync with groups
mk-armah Aug 21, 2024
b9f7e25
merged main
mk-armah Aug 21, 2024
c534466
removed group hook
mk-armah Aug 21, 2024
386b318
Merge branch 'main' into improvement/gitlab
mk-armah Aug 30, 2024
6ae7156
lint
mk-armah Aug 30, 2024
0a91673
Merge branch 'improvement/gitlab' of https://github.com/port-labs/oce…
mk-armah Aug 30, 2024
44c3b7f
rephrased comments
mk-armah Sep 2, 2024
fe2e7a2
updated group webhook to cater for groupswithmembers kind
mk-armah Sep 5, 2024
68354d1
Merge branch 'main' into improvement/gitlab
mk-armah Nov 6, 2024
0a279ae
remove debug logs
mk-armah Nov 6, 2024
99e3724
test fix
mk-armah Nov 6, 2024
82bfc94
updated webhook logic
mk-armah Nov 8, 2024
c81ab10
revert get_group function, stick with optional response
mk-armah Nov 8, 2024
bb7f146
added support for project members
mk-armah Nov 11, 2024
0056a27
checking for bot members in username works for both webhooks and requ…
mk-armah Nov 11, 2024
be3dc5a
clean unused functions
mk-armah Nov 11, 2024
c1289c7
refactored codebase against DRY
mk-armah Nov 11, 2024
2073632
Merge branch 'main' into improvement/gitlab
mk-armah Nov 11, 2024
56b031c
added tests
mk-armah Nov 12, 2024
936f781
Merge branch 'improvement/gitlab' of https://github.com/port-labs/oce…
mk-armah Nov 12, 2024
7f98500
Update integrations/gitlab/gitlab_integration/git_integration.py
mk-armah Nov 13, 2024
8df02ae
addressed comments
mk-armah Nov 13, 2024
48329c0
all enrichments are performed on object level directly
mk-armah Nov 13, 2024
986e9fa
remove unnecessary comment
mk-armah Nov 13, 2024
5b0d7f9
Merge branch 'main' into improvement/gitlab
mk-armah Nov 13, 2024
57ad257
removed public email querying
mk-armah Nov 13, 2024
632d8c2
Merge branch 'improvement/gitlab' of https://github.com/port-labs/oce…
mk-armah Nov 13, 2024
56aaaef
lint
mk-armah Nov 13, 2024
f42ee30
control resync batch size to avoid hitting rate limits
mk-armah Nov 14, 2024
62db938
Added error handling for group not found
mk-armah Nov 14, 2024
4718261
Update integrations/gitlab/CHANGELOG.md
Tankilevitch Nov 14, 2024
94 changes: 94 additions & 0 deletions integrations/gitlab/.port/resources/blueprints.json
@@ -53,5 +53,99 @@
"calculationProperties": {},
"aggregationProperties": {},
"relations": {}
},
{
"identifier": "member",
"title": "Member",
"icon": "GitLab",
"schema": {
"properties": {
"state": {
"title": "State",
"type": "string",
"icon": "GitLab",
"description": "The current state of the GitLab item (e.g., open, closed)."
},
Review comment (Contributor):
We are talking about a member, not a GitLab item. Please make sure the description is readable and straightforward for users.

"locked": {
"type": "string",
"title": "Locked",
"icon": "GitLab",
"description": "Indicates if the GitLab item is locked."
},
Review comment (Contributor):
How is this locked?

"link": {
"icon": "Link",
"type": "string",
"title": "Link",
"format": "url",
"description": "URL link to the GitLab item."
},
"email": {
"type": "string",
"title": "Email",
"description": "GitLab primary email address.",
"icon": "User",
"format": "user"
},
"publicEmail": {
"type": "string",
"title": "Public Email",
"description": "User's GitLab public email.",
"icon": "User",
"format": "user"
Review comment (Contributor):
In what case will the user have a public email? I think we might want to remove it by default.

}
},
"required": []
},
"mirrorProperties": {},
"calculationProperties": {},
"aggregationProperties": {},
"relations": {}
Tankilevitch marked this conversation as resolved.
},
{
"identifier": "gitlabGroup",
"title": "Group",
"icon": "GitLab",
"schema": {
"properties": {
"visibility": {
"icon": "Lock",
"title": "Visibility",
"type": "string",
"enum": [
"public",
"internal",
"private"
],
"enumColors": {
"public": "red",
"internal": "yellow",
"private": "green"
}
Review comment (Contributor):
Add a description.

},
"url": {
"title": "URL",
"format": "url",
"type": "string",
"icon": "Link"
},
"description": {
"title": "Description",
"type": "string",
"icon": "BlankPage"
}
},
"required": []
},
"mirrorProperties": {},
"calculationProperties": {},
"aggregationProperties": {},
"relations": {
"members": {
"title": "Members",
"target": "member",
"required": false,
"many": true
}
Review comment (Contributor):
This shouldn't be here.

Review reply (Member, Author):
The relationship is group -> member.

}
}
]
20 changes: 20 additions & 0 deletions integrations/gitlab/.port/resources/port-app-config.yaml
@@ -15,3 +15,23 @@ resources:
readme: file://README.md
description: .description
language: .__languages | to_entries | max_by(.value) | .key
- kind: member
selector:
query: 'true'
publicEmailVisibility: 'false'
filterBots: 'false'
port:
entity:
mappings:
identifier: .username
title: .name
blueprint: '"member"'
properties:
state: .state
locked: .locked
link: .web_url
email: .email
publicEmail: .__public_email
Review comment (Contributor):
Let's remove this by default.

relations:
gitlabGroup: '[.__groups[].full_path]'
createdBy: .created_by.username
Review comment (Contributor):
In the blueprint relation you only have gitlabGroup, while in your mapping you have both createdBy and gitlabGroup. I am not sure that who created the user is of interest to users, so let's remove it. Let me know if you think otherwise.

Review comment (Contributor):
What is createdBy? We don't have that kind of relation; please remove it.

2 changes: 2 additions & 0 deletions integrations/gitlab/.port/spec.yaml
@@ -8,6 +8,8 @@ features:
section: Git Providers
resources:
- kind: projects
- kind: members
- kind: groups
configurations:
- name: tokenMapping
required: true
8 changes: 8 additions & 0 deletions integrations/gitlab/CHANGELOG.md
@@ -7,6 +7,14 @@ this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm

<!-- towncrier release notes start -->

0.1.92 (2024-07-12)
===================

### Features

- Added support for gitlab member ingestion (PORT-7708)


0.1.91 (2024-07-10)
===================

10 changes: 9 additions & 1 deletion integrations/gitlab/gitlab_integration/core/async_fetcher.py
@@ -6,7 +6,14 @@
import gitlab.exceptions
from gitlab import GitlabList
from gitlab.base import RESTObject, RESTObjectList
from gitlab.v4.objects import Project, ProjectPipelineJob, ProjectPipeline, Issue, Group
from gitlab.v4.objects import (
Project,
ProjectPipelineJob,
ProjectPipeline,
Issue,
Group,
User,
)
from loguru import logger

from port_ocean.core.models import Entity
@@ -28,6 +35,7 @@ async def fetch_single(
Issue,
Project,
Group,
User,
],
],
*args,
21 changes: 19 additions & 2 deletions integrations/gitlab/gitlab_integration/git_integration.py
@@ -1,4 +1,4 @@
from typing import Dict, Any, Tuple, List, Type
from typing import Dict, Any, Tuple, List, Type, Literal

from gitlab.v4.objects import Project
from loguru import logger
@@ -122,6 +122,18 @@ class GitlabResourceConfig(ResourceConfig):
selector: GitlabSelector


class GitlabMembersResourceConfig(ResourceConfig):
class MembersSelector(Selector):
public_email_visibility: bool | None = Field(
alias="publicEmailVisibility",
Review comment (Contributor):
enrich_with_public_email

default=False,
description="If set to true, the integration will enrich members with public email field. Default value is false",
)

Review comment (Contributor):
Define the class outside of GitlabMembersResourceConfig; that way we will be able to reuse it.

kind: Literal["member"]
selector: MembersSelector


class GitlabPortAppConfig(PortAppConfig):
spec_path: str | List[str] = Field(alias="specPath", default="**/port.yml")
branch: str | None
@@ -131,7 +143,12 @@ class GitlabPortAppConfig(PortAppConfig):
project_visibility_filter: str | None = Field(
alias="projectVisibilityFilter", default=None
)
resources: list[GitlabResourceConfig] = Field(default_factory=list) # type: ignore
filter_bots: bool | None = Field(
alias="filterBots",
default=True,
description="If set to true, bots will be filtered out from the members list. Default value is true",
)
resources: list[GitlabMembersResourceConfig | GitlabResourceConfig] = Field(default_factory=list) # type: ignore


def _get_project_from_cache(project_id: int) -> Project | None:
87 changes: 84 additions & 3 deletions integrations/gitlab/gitlab_integration/gitlab_service.py
@@ -13,6 +13,8 @@
Issue,
Group,
ProjectPipeline,
User,
GroupMember,
Review comment (Contributor):
Which one?

Review reply (Member, Author):
Both; it depends on the context.

GroupMergeRequest,
ProjectPipelineJob,
)
@@ -26,11 +28,13 @@
from port_ocean.core.models import Entity

PROJECTS_CACHE_KEY = "__cache_all_projects"
GROUPS_CACHE_KEY = "__cache_all_groups"
MEMBERS_CACHE_KEY = "__cache_all_members"
Tankilevitch marked this conversation as resolved.

MAX_CONCURRENT_TASKS = 30
Review comment (Contributor):
Why? How can we actually validate and handle it? We want to be able to handle rate limits. Most third parties return rate-limit headers, but we don't call the GitLab API directly; we go through the client.

Here are some notes on the rate limits

Review comment (Contributor):
Actually, I just found this: https://python-gitlab.readthedocs.io/en/stable/api-usage-advanced.html#rate-limits, which means that the GitLab client handles this for us, so we are good 👍
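
For context on the MAX_CONCURRENT_TASKS constant questioned above: a cap like this is commonly enforced with a semaphore. The following is a minimal sketch of that general technique, not the integration's AsyncFetcher. The names bounded_gather, fake_request, and make_job are hypothetical, and the limit of 3 is arbitrary.

```python
import asyncio
from typing import Awaitable, Callable, List

MAX_CONCURRENT_TASKS = 3  # arbitrary for this sketch; the diff uses 30


async def bounded_gather(
    jobs: List[Callable[[], Awaitable[int]]], limit: int
) -> List[int]:
    semaphore = asyncio.Semaphore(limit)

    async def run(job: Callable[[], Awaitable[int]]) -> int:
        # At most `limit` jobs are inside this block at any moment.
        async with semaphore:
            return await job()

    # gather returns results in the same order as `jobs`.
    return await asyncio.gather(*(run(job) for job in jobs))


in_flight = 0
peak_in_flight = 0


async def fake_request(value: int) -> int:
    # Hypothetical stand-in for one API call; records peak concurrency.
    global in_flight, peak_in_flight
    in_flight += 1
    peak_in_flight = max(peak_in_flight, in_flight)
    await asyncio.sleep(0.01)
    in_flight -= 1
    return value * 2


def make_job(value: int) -> Callable[[], Awaitable[int]]:
    return lambda: fake_request(value)


results = asyncio.run(
    bounded_gather([make_job(i) for i in range(10)], MAX_CONCURRENT_TASKS)
)
```

A semaphore only bounds client-side concurrency. As the reviewer notes, server-side rate limiting is handled by the python-gitlab client according to the linked documentation.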


if TYPE_CHECKING:
from gitlab_integration.git_integration import (
GitlabPortAppConfig,
)
from gitlab_integration.git_integration import GitlabPortAppConfig


class GitlabService:
@@ -322,6 +326,15 @@ async def get_group(self, group_id: int) -> Group | None:

async def get_all_groups(self) -> typing.AsyncIterator[List[Group]]:
logger.info("fetching all groups for the token")

cached_groups = event.attributes.setdefault(GROUPS_CACHE_KEY, {}).setdefault(
self.gitlab_client.private_token, {}
)
Review comment (Contributor):
Let's not use this cache; we already have a prebuilt one in Ocean core.


if cached_groups:
yield cached_groups.values()
return

async for groups_batch in AsyncFetcher.fetch_batch(
fetch_func=self.gitlab_client.groups.list,
validation_func=self.should_run_for_group,
@@ -333,6 +346,7 @@ async def get_all_groups(self) -> typing.AsyncIterator[List[Group]]:
logger.info(
f"Queried {len(groups)} groups {[group.path for group in groups]}"
)
cached_groups.update({group.id: group for group in groups})
yield groups

async def get_all_projects(self) -> typing.AsyncIterator[List[Project]]:
@@ -533,6 +547,73 @@ async def get_all_issues(self, group: Group) -> typing.AsyncIterator[List[Issue]
issues: List[Issue] = typing.cast(List[Issue], issues_batch)
yield issues

async def get_all_group_members(
self, group: Group
) -> typing.AsyncIterator[List[GroupMember]]:

try:
port_app_config: GitlabPortAppConfig = typing.cast(
"GitlabPortAppConfig", event.port_app_config
)
filter_bots = port_app_config.filter_bots

def skip_validation(_: User):
return True

def should_run_for_member(member: User):
return not member.username.__contains__("bot")

validation_func = should_run_for_member if filter_bots else skip_validation

logger.info(f"Fetching all members of group {group.name}")
async for users_batch in AsyncFetcher.fetch_batch(
fetch_func=group.members.list,
validation_func=validation_func,
pagination="offset",
order_by="id",
sort="asc",
):
members: List[GroupMember] = typing.cast(List[GroupMember], users_batch)
logger.info(
f"Queried {len(members)} members {[user.username for user in members]} from {group.name}"
)
yield members
except Exception as e:
logger.error(f"Failed to get members for group={group.name}. error={e}")
return
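
The bot filter in get_all_group_members above amounts to choosing a predicate from the filterBots flag. A minimal standalone sketch, assuming a hypothetical FakeMember stand-in for the python-gitlab member object and keeping the diff's substring heuristic (select_members and is_probable_bot are illustrative names, not part of the integration):

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class FakeMember:
    """Hypothetical stand-in for a python-gitlab member object."""
    username: str


def is_probable_bot(member: FakeMember) -> bool:
    # Same substring heuristic as the diff, written with the `in` operator.
    # Caveat: it also matches human usernames such as "abbot".
    return "bot" in member.username


def select_members(
    members: Iterable[FakeMember], filter_bots: bool
) -> List[FakeMember]:
    # Pick the predicate once, then apply it to every member.
    keep: Callable[[FakeMember], bool] = (
        (lambda m: not is_probable_bot(m)) if filter_bots else (lambda m: True)
    )
    return [m for m in members if keep(m)]


members = [FakeMember("alice"), FakeMember("project_42_bot"), FakeMember("abbot")]
filtered = [m.username for m in select_members(members, filter_bots=True)]
unfiltered = [m.username for m in select_members(members, filter_bots=False)]
```

The "abbot" entry shows the cost of a substring match: a human account is dropped along with the bot when the flag is on.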

async def enrich_group_with_members(self, group: Group) -> dict[str, Any]:

Review comment (Contributor):
Redundant line.

group_members = [
member
async for members in self.get_all_group_members(group)
for member in members
]
group_dict: dict[str, Any] = group.asdict()
group_dict.update(
{
"__members": [
{"id": group_member.id, "username": group_member.username}
for group_member in group_members
]
}
)
return group_dict

async def enrich_member_with_public_email(self, member) -> dict[str, Any]:
user: User = await self.get_user(member.id)
member_dict: dict[str, Any] = member.asdict()
member_dict.update({"__public_email": user.public_email})
return member_dict

async def get_user(self, user_id: str) -> User:
logger.info(f"fetching user {user_id}")
user_response = await AsyncFetcher.fetch_single(
self.gitlab_client.users.get, user_id
)
user: User = typing.cast(User, user_response)
return user

def get_entities_diff(
self,
project: Project,
27 changes: 25 additions & 2 deletions integrations/gitlab/gitlab_integration/ocean.py
@@ -12,7 +12,10 @@
WebhookMappingConfig,
)
from gitlab_integration.events.setup import setup_application
from gitlab_integration.git_integration import GitlabResourceConfig
from gitlab_integration.git_integration import (
GitlabResourceConfig,
GitlabMembersResourceConfig,
)
from gitlab_integration.utils import ObjectKind, get_cached_all_services
from port_ocean.context.event import event
from port_ocean.context.ocean import ocean
@@ -108,7 +111,9 @@ async def on_start() -> None:
async def resync_groups(kind: str) -> ASYNC_GENERATOR_RESYNC_TYPE:
for service in get_cached_all_services():
async for groups_batch in service.get_all_groups():
yield [group.asdict() for group in groups_batch]
tasks = [service.enrich_group_with_members(group) for group in groups_batch]
enriched_groups = await asyncio.gather(*tasks)
Review comment (Contributor):
When we do such things, we should put them behind a feature flag, as we wouldn't want to make all those extra requests to get the members if the user didn't actually intend it.

Therefore I suggest leaving the group kind as it was and adding the following kinds:

groupMembers - which will do the exact logic that you have implemented in resync_groups. This kind will allow users to decide whether they want to export the group with or without members.

member / user - which will bring the information about a user, such as the email.

yield enriched_groups
Tankilevitch marked this conversation as resolved.
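
The suggestion in this thread, keeping the plain group kind cheap and paying for member lookups only when a dedicated kind is requested, could look roughly like the sketch below. Everything in it is a hypothetical stand-in (the kind constants, fetch_members, resync, and the dict shapes); only the groupMembers name and the __members key come from the discussion and diff. It illustrates the idea and is not the code that was merged.

```python
import asyncio
from typing import Any, Dict, List

GROUP_KIND = "group"                      # assumed kind name
GROUP_WITH_MEMBERS_KIND = "groupMembers"  # name proposed in the review comment

member_requests = 0  # counts simulated extra API calls


async def fetch_members(group: Dict[str, Any]) -> List[Dict[str, Any]]:
    # Hypothetical stand-in for one members request per group.
    global member_requests
    member_requests += 1
    await asyncio.sleep(0)
    return [{"username": f"user-of-{group['path']}"}]


async def resync(kind: str, groups: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    if kind != GROUP_WITH_MEMBERS_KIND:
        # Plain group kind: no extra requests at all.
        return [dict(group) for group in groups]
    # Members are fetched only when the user opted in via the dedicated kind.
    members = await asyncio.gather(*(fetch_members(g) for g in groups))
    return [{**g, "__members": m} for g, m in zip(groups, members)]


groups = [{"path": "platform"}, {"path": "platform/api"}]
plain = asyncio.run(resync(GROUP_KIND, groups))
requests_after_plain = member_requests
enriched = asyncio.run(resync(GROUP_WITH_MEMBERS_KIND, groups))
```

With this split, a user who maps only the plain kind never triggers a members request, which is the cost the reviewer wanted to make opt-in.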


@ocean.on_resync(ObjectKind.PROJECT)
@@ -207,3 +212,21 @@ async def resync_pipelines(kind: str) -> ASYNC_GENERATOR_RESYNC_TYPE:
{**pipeline.asdict(), "__project": project.asdict()}
for pipeline in pipelines_batch
]


@ocean.on_resync(ObjectKind.MEMBER)
async def resync_members(kind: str) -> ASYNC_GENERATOR_RESYNC_TYPE:
gitlab_resource_config: GitlabMembersResourceConfig = typing.cast(
GitlabMembersResourceConfig, event.resource_config
)
selector = gitlab_resource_config.selector
for service in get_cached_all_services():
for group in service.get_root_groups():
async for members in service.get_all_group_members(group):
if selector.public_email_visibility:
yield [
await service.enrich_member_with_public_email(member)
for member in members
]
else:
yield [member.asdict() for member in members]
Review comment (Contributor):
What happens when the same user is in multiple groups? How would that behave? Will we have to perform repeated upserts?

Maybe in this method ^ we should use members = group.members_all.list(get_all=True), which will return all members and reduce the number of extra requests that we have to perform?
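
On the repeated-upserts question: one way to avoid emitting the same user more than once is to track the usernames already seen across groups. A minimal sketch, assuming members arrive as batches of dicts and that username is the identifier (as in the mapping in this PR); dedupe_batches is a hypothetical helper, not part of the integration.

```python
from typing import Any, Dict, Iterable, Iterator, List, Set

MemberDict = Dict[str, Any]


def dedupe_batches(
    batches: Iterable[List[MemberDict]],
) -> Iterator[List[MemberDict]]:
    seen: Set[str] = set()
    for batch in batches:
        fresh: List[MemberDict] = []
        for member in batch:
            if member["username"] not in seen:
                seen.add(member["username"])
                fresh.append(member)
        if fresh:  # skip batches that contain only already-seen users
            yield fresh


batches = [
    [{"username": "alice"}, {"username": "bob"}],  # members of group A
    [{"username": "bob"}, {"username": "carol"}],  # members of group B
    [{"username": "alice"}],                       # members of group C
]
unique = list(dedupe_batches(batches))
```

The trade-off is memory: the seen set grows with the number of distinct users in the resync.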

Review reply (Member, Author):
My reason for using members as opposed to members_all is that the members_all request returns not only the users in that group but also all inherited and invited members.

This results in all groups having the same members, since members most commonly belong to the parent group - details

Aside from this, the behavior and the way we retrieve members do not differ between members and members_all.

Review comment (Contributor):
OK, I understand what you are saying. So if we use members_all only for the members kind, wouldn't it reduce the number of requests by a lot? We would only have to bring the members for the root groups rather than for the subgroups as well.

Also, just making sure that you have tested subgroups as well. Please confirm.

Review reply (Member, Author; mk-armah, Jul 29, 2024):
Calling /members on root groups returns the same results as /members/all. Calling /members on subgroups returns less data than /members/all, because invited and inherited members are excluded. For root groups the concept of inherited members does not apply, so all members of a root group are returned regardless of which endpoint we call.

I believe the optimization here was getting members from root groups instead of all groups (including subgroups), which would have taken more time.
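
The distinction drawn in this thread can be illustrated with a toy model. The sketch below does not call GitLab. It assumes a hypothetical in-memory hierarchy in which a group's full member list is its direct members plus everything inherited from its ancestors; invited members are left out for brevity, and ToyGroup is an illustrative class, not a python-gitlab object.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class ToyGroup:
    """Hypothetical in-memory group; not a python-gitlab object."""
    path: str
    direct: Set[str] = field(default_factory=set)
    parent: Optional["ToyGroup"] = None

    def members(self) -> Set[str]:
        # Analogue of /members: direct members only.
        return set(self.direct)

    def members_all(self) -> Set[str]:
        # Analogue of /members/all: direct plus inherited from ancestors.
        inherited = self.parent.members_all() if self.parent else set()
        return self.direct | inherited


root = ToyGroup("platform", direct={"alice", "bob"})
sub = ToyGroup("platform/api", direct={"carol"}, parent=root)

root_same = root.members() == root.members_all()
sub_direct = sub.members()
sub_all = sub.members_all()
```

In this model a root group has no ancestors, so both calls agree for it, while the subgroup's full list also contains the members inherited from the root.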

1 change: 1 addition & 0 deletions integrations/gitlab/gitlab_integration/utils.py
@@ -45,3 +45,4 @@ class ObjectKind:
PIPELINE = "pipeline"
PROJECT = "project"
FOLDER = "folder"
MEMBER = "member"
Review comment (Contributor):
Doesn't seem to be used anymore.

2 changes: 1 addition & 1 deletion integrations/gitlab/pyproject.toml
@@ -1,6 +1,6 @@
[tool.poetry]
name = "gitlab"
version = "0.1.91"
version = "0.1.92"
description = "Gitlab integration for Port using Port-Ocean Framework"
authors = ["Yair Siman-Tov <[email protected]>"]
