🎉 Source Slack migration to low code #35477

Merged
merged 95 commits into from
Apr 15, 2024
bd8a4c8
Revert "updated Prerequisites"
midavadim Sep 21, 2023
56daf02
Merge branch 'master' of https://github.com/airbytehq/airbyte
midavadim Sep 27, 2023
8ad4061
Merge branch 'master' of https://github.com/airbytehq/airbyte
midavadim Oct 11, 2023
2cab1ff
Merge branch 'master' of https://github.com/airbytehq/airbyte
midavadim Oct 16, 2023
6aae53d
Merge branch 'master' of https://github.com/airbytehq/airbyte
midavadim Oct 19, 2023
e45624c
Merge branch 'master' of https://github.com/airbytehq/airbyte
midavadim Oct 23, 2023
c923aa9
Merge branch 'master' of https://github.com/airbytehq/airbyte
midavadim Oct 23, 2023
be763e2
Merge branch 'master' of https://github.com/airbytehq/airbyte
midavadim Oct 26, 2023
4883642
Merge branch 'master' of https://github.com/airbytehq/airbyte
midavadim Oct 26, 2023
3e2c793
Merge branch 'master' of https://github.com/airbytehq/airbyte
midavadim Nov 1, 2023
d948037
Merge branch 'master' of https://github.com/airbytehq/airbyte
midavadim Nov 6, 2023
4736e76
Merge branch 'master' of https://github.com/airbytehq/airbyte
midavadim Nov 6, 2023
b4ef214
Merge branch 'master' of https://github.com/airbytehq/airbyte
midavadim Nov 13, 2023
c0376ab
Merge branch 'master' of https://github.com/airbytehq/airbyte
midavadim Nov 14, 2023
ed7dbd7
Merge branch 'master' of https://github.com/airbytehq/airbyte
midavadim Nov 15, 2023
703880f
Merge branch 'master' of https://github.com/airbytehq/airbyte
midavadim Nov 17, 2023
a22e333
Merge branch 'master' of https://github.com/airbytehq/airbyte
midavadim Nov 17, 2023
cb00fc2
Merge branch 'master' of https://github.com/airbytehq/airbyte
midavadim Nov 17, 2023
854f87d
Merge branch 'master' of https://github.com/airbytehq/airbyte
midavadim Nov 20, 2023
c68a752
Merge branch 'master' of https://github.com/airbytehq/airbyte
midavadim Nov 24, 2023
d13e69f
Merge branch 'master' of https://github.com/airbytehq/airbyte
midavadim Dec 1, 2023
9795a7d
Merge branch 'master' of https://github.com/airbytehq/airbyte
midavadim Dec 12, 2023
8358894
Merge branch 'master' of https://github.com/airbytehq/airbyte
midavadim Dec 14, 2023
0b8ee3b
Merge branch 'master' of https://github.com/airbytehq/airbyte
midavadim Jan 5, 2024
7d55bad
Merge branch 'master' of https://github.com/airbytehq/airbyte
midavadim Jan 8, 2024
9fa8a29
Merge branch 'master' of https://github.com/airbytehq/airbyte
midavadim Jan 8, 2024
7cab07a
Merge branch 'master' of https://github.com/airbytehq/airbyte
midavadim Jan 9, 2024
85e72ed
Merge branch 'master' of https://github.com/airbytehq/airbyte
midavadim Jan 15, 2024
bdde638
Merge branch 'master' of https://github.com/airbytehq/airbyte
midavadim Jan 25, 2024
0031711
Merge branch 'master' of https://github.com/airbytehq/airbyte
midavadim Jan 30, 2024
cd2c611
Merge branch 'master' of https://github.com/airbytehq/airbyte
midavadim Feb 7, 2024
8d23ce2
Merge branch 'master' of https://github.com/airbytehq/airbyte
midavadim Feb 12, 2024
162b096
Merge branch 'master' of https://github.com/airbytehq/airbyte
midavadim Feb 15, 2024
d94c4dd
Merge branch 'master' of https://github.com/airbytehq/airbyte
midavadim Feb 20, 2024
d2083b9
low code migration
midavadim Feb 20, 2024
602b92f
added join_channels and channel_filter options
midavadim Feb 20, 2024
e84a1c2
added threads
midavadim Feb 20, 2024
53718a1
inclusive params and lookback_window
midavadim Feb 20, 2024
e7a034f
use_lookback_window handle
midavadim Feb 20, 2024
3788cc2
refactoring
midavadim Feb 21, 2024
44ef205
refactoring
midavadim Feb 21, 2024
21a2e82
added channel_id to threads
midavadim Feb 26, 2024
175970d
added oauth2 support, fixed expected records, added error handler
midavadim Feb 27, 2024
c45a766
cleanup
midavadim Feb 27, 2024
7348e7f
updated auth in manifest, removed custom component
darynaishchenko Mar 13, 2024
563daa1
added selective auth, updated streams implementation
darynaishchenko Mar 19, 2024
7192ba2
updated expected records
darynaishchenko Mar 19, 2024
af74589
updated components
darynaishchenko Mar 19, 2024
3d48905
added migration for legacy config
darynaishchenko Mar 19, 2024
320a675
updated unittests
darynaishchenko Mar 19, 2024
53e93d2
added dependencies
darynaishchenko Mar 19, 2024
f622dcb
delete unused files, added .coveragerc
darynaishchenko Mar 19, 2024
e874d06
updated source.py and run.py
darynaishchenko Mar 19, 2024
5c36ec9
updated abnormal_state
darynaishchenko Mar 19, 2024
a5b26ca
added request param for channels stream
darynaishchenko Mar 19, 2024
42a499f
updated dependencies
darynaishchenko Mar 19, 2024
b567508
bump version, added migration docs
darynaishchenko Mar 19, 2024
af8d4e4
format fix
darynaishchenko Mar 19, 2024
3ad394f
Merge branch 'master' into midavadim/source-slack-to-low-code
darynaishchenko Mar 20, 2024
3d571a3
updated to latest cdk
darynaishchenko Mar 20, 2024
5cebd2d
updated stream slices in custom partition router
darynaishchenko Mar 20, 2024
a898a70
added header for migration guide
darynaishchenko Mar 20, 2024
0c24c5f
updated tags
darynaishchenko Mar 20, 2024
dfd3eac
moved join channels logic to custom retriever for channels stream
darynaishchenko Mar 21, 2024
d0f1b69
Merge branch 'master' into midavadim/source-slack-to-low-code
darynaishchenko Mar 21, 2024
42f8309
refactor code
darynaishchenko Mar 21, 2024
bf4cb5d
fix channel messages transformation
darynaishchenko Mar 21, 2024
b95bbd8
updated expected records
darynaishchenko Mar 22, 2024
47e850c
Merge branch 'master' into midavadim/source-slack-to-low-code
darynaishchenko Mar 22, 2024
2c94e4e
updated cases for check test
darynaishchenko Mar 25, 2024
b89e7f4
updated channel messages and threads streams with correct float_ts value
darynaishchenko Mar 25, 2024
53c3207
added validation for request params in threads stream
darynaishchenko Mar 25, 2024
935e818
updated tests
darynaishchenko Mar 25, 2024
29979bf
Merge branch 'master' into midavadim/source-slack-to-low-code
alafanechere Mar 26, 2024
a1ae4f4
fix channels filter
darynaishchenko Apr 1, 2024
87b1544
poetry update
darynaishchenko Apr 1, 2024
92c5f05
fix requests in threads stream
darynaishchenko Apr 1, 2024
82e288f
format fix
darynaishchenko Apr 1, 2024
a36d6cd
reverted threads stream to python implementation
darynaishchenko Apr 2, 2024
2537e03
updated unit test
darynaishchenko Apr 2, 2024
6ce6404
Merge branch 'master' into midavadim/source-slack-to-low-code
darynaishchenko Apr 2, 2024
f23f55e
update poetry.lock
darynaishchenko Apr 2, 2024
37de0b8
refactor code+updated expected records
darynaishchenko Apr 2, 2024
0e92f78
Merge branch 'master' into midavadim/source-slack-to-low-code
alafanechere Apr 3, 2024
8f967ac
format fix
darynaishchenko Apr 4, 2024
9b48589
updated migration guide
darynaishchenko Apr 4, 2024
f1a317b
add lookback for channel_messages_stream and 503 handling
darynaishchenko Apr 5, 2024
6fadec0
add 500 handling
darynaishchenko Apr 9, 2024
f67b342
updated threads stream parent streams
darynaishchenko Apr 9, 2024
47c6400
updated unit tests
darynaishchenko Apr 9, 2024
3163df5
updated breakingChanges message
darynaishchenko Apr 10, 2024
e3ee23b
updated upgradeDeadline
darynaishchenko Apr 11, 2024
ca452f4
updated docs
darynaishchenko Apr 12, 2024
af613b2
updated changelog
darynaishchenko Apr 15, 2024
deb7918
updated poetry.lock
darynaishchenko Apr 15, 2024
3 changes: 3 additions & 0 deletions airbyte-integrations/connectors/source-slack/.coveragerc
@@ -0,0 +1,3 @@
[run]
omit =
    source_slack/run.py
@@ -6,7 +6,9 @@ acceptance_tests:
       - spec_path: "source_slack/spec.json"
         backward_compatibility_tests_config:
           # edited `min`/`max` > `minimum`/`maximum` for `lookback_window` field
-          disable_for_version: "0.1.26"
+          #disable_for_version: "0.1.26"
+          # slight changes: removed doc url, added new null oauth param
+          disable_for_version: "0.3.10"
   connection:
     tests:
       - config_path: "secrets/config.json"
@@ -9,8 +9,49 @@
   {
     "type": "STREAM",
     "stream": {
-      "stream_state": { "float_ts": 7270247822 },
-      "stream_descriptor": { "name": "channel_messages" }
+      "stream_descriptor": {
+        "name": "channel_messages"
+      },
+      "stream_state": {
+        "states": [
+          {
+            "partition": {
+              "channel_id": "C04LTCM2Y56",
+              "parent_slice": {}
+            },
+            "cursor": {
+              "float_ts": "2534945416"
+            }
+          },
+          {
+            "partition": {
+              "channel": "C04KX3KEZ54",
+              "parent_slice": {}
+            },
+            "cursor": {
+              "float_ts": "2534945416"
+            }
+          },
+          {
+            "partition": {
+              "channel": "C04L3M4PTJ6",
+              "parent_slice": {}
+            },
+            "cursor": {
+              "float_ts": "2534945416"
+            }
+          },
+          {
+            "partition": {
+              "channel": "C04LTCM2Y56",
+              "parent_slice": {}
+            },
+            "cursor": {
+              "float_ts": "2534945416"
+            }
+          }
+        ]
+      }
     }
   }
 ]
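
For orientation, the two state shapes in this hunk can be compared in a minimal Python sketch. The helper name `cursor_for_partition` is hypothetical and is not part of the connector or the CDK; it only illustrates how a cursor might be looked up in the per-partition layout shown above, and why the old single-cursor state carries no per-channel information.

```python
from typing import Any, Mapping, Optional

# Legacy (pre-1.0.0) shape from the removed lines: one cursor for the whole stream.
legacy_state = {"float_ts": 7270247822}

# Per-partition shape from the added lines: one cursor per channel partition.
per_partition_state = {
    "states": [
        {
            "partition": {"channel": "C04KX3KEZ54", "parent_slice": {}},
            "cursor": {"float_ts": "2534945416"},
        },
        {
            "partition": {"channel": "C04L3M4PTJ6", "parent_slice": {}},
            "cursor": {"float_ts": "2534945416"},
        },
    ]
}


def cursor_for_partition(
    state: Mapping[str, Any], partition: Mapping[str, Any]
) -> Optional[Mapping[str, Any]]:
    """Hypothetical helper: return the cursor stored for one partition, if any."""
    for entry in state.get("states", []):
        if entry.get("partition") == partition:
            return entry.get("cursor")
    # The legacy shape has no "states" list, so nothing can be matched.
    return None


wanted = {"channel": "C04KX3KEZ54", "parent_slice": {}}
found = cursor_for_partition(per_partition_state, wanted)
missing = cursor_for_partition(legacy_state, wanted)
```

Because the legacy shape cannot be mapped onto individual channels, a lookup against it finds nothing, which is consistent with the reset that the breaking-change message asks for.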

Large diffs are not rendered by default.

17 changes: 15 additions & 2 deletions airbyte-integrations/connectors/source-slack/metadata.yaml
@@ -10,7 +10,7 @@ data:
   connectorSubtype: api
   connectorType: source
   definitionId: c2281cee-86f9-4a86-bb48-d23286b4c7bd
-  dockerImageTag: 0.4.1
+  dockerImageTag: 1.0.0
   dockerRepository: airbyte/source-slack
   documentationUrl: https://docs.airbyte.com/integrations/sources/slack
   githubIssueLabel: source-slack
@@ -27,6 +27,19 @@ data:
     oss:
       enabled: true
   releaseStage: generally_available
+  releases:
+    breakingChanges:
+      1.0.0:
+        message:
+          The source Slack connector is being migrated from the Python CDK to our declarative low-code CDK.
+          Due to changes in the handling of state format for incremental substreams, this migration constitutes a breaking change for the channel_messages stream.
+          Users will need to reset source configuration, refresh the source schema and reset the channel_messages stream after upgrading.
+          For more information, see our migration documentation for source Slack.
+        upgradeDeadline: "2024-04-29"
+        scopedImpact:
+          - scopeType: stream
+            impactedScopes:
+              - "channel_messages"
   suggestedStreams:
     streams:
       - users
@@ -37,5 +50,5 @@ data:
   supportLevel: certified
   tags:
     - language:python
-    - cdk:python
+    - cdk:low-code
 metadataSpecVersion: "1.0"
138 changes: 76 additions & 62 deletions airbyte-integrations/connectors/source-slack/poetry.lock

Large diffs are not rendered by default.

3 changes: 2 additions & 1 deletion airbyte-integrations/connectors/source-slack/pyproject.toml
@@ -3,7 +3,7 @@ requires = [ "poetry-core>=1.0.0",]
 build-backend = "poetry.core.masonry.api"

 [tool.poetry]
-version = "0.4.1"
+version = "1.0.0"
 name = "source-slack"
 description = "Source implementation for Slack."
 authors = [ "Airbyte <[email protected]>",]
@@ -19,6 +19,7 @@ include = "source_slack"
 python = "^3.9,<3.12"
 pendulum = "==2.1.2"
 airbyte-cdk = "^0"
+freezegun = "^1.4.0"

 [tool.poetry.scripts]
 source-slack = "source_slack.run:run"
@@ -0,0 +1,21 @@
# Copyright (c) 2023 Airbyte, Inc., all rights reserved.

from dataclasses import dataclass
from typing import List

import requests
from airbyte_cdk.sources.declarative.extractors import DpathExtractor
from airbyte_cdk.sources.declarative.types import Record


@dataclass
class ChannelMembersExtractor(DpathExtractor):
Contributor Author

We should properly handle errors from Slack, as was done before:

Collaborator

Could you please give an example that is not covered by this error-handling implementation? It now works the same as in #35477 (comment), except that it raises airbyte_cdk.sources.declarative.exceptions.ReadException instead of AirbyteTracedException.

    """
    Transform response from list of strings to list dicts:
    from: ['aa', 'bb']
    to: [{'member_id': 'aa'}, {'member_id': 'bb'}]
    """

    def extract_records(self, response: requests.Response) -> List[Record]:
Contributor Author

Suggested change
-    def extract_records(self, response: requests.Response) -> List[Record]:
+@dataclass
+class SlackDpathExtractor(DpathExtractor):
+    """
+    Handle error from Slack API:
+    {
+        "body": "{\"ok\":false,\"error\":\"invalid_auth\"}",
+        "status": "200"
+    }
+    """
+    def extract_records(self, response: requests.Response) -> List[Record]:
+        response_body = self.decoder.decode(response)
+        if not response_body.get('ok'):
+            error_message = response_body.get('error')
+            message = f"Request failed with error: {error_message}"
+            if 'invalid_auth' in error_message:
+                raise AirbyteTracedException(
+                    message='Authentication has failed, please update your credentials',
+                    internal_message=message,
+                    failure_type=FailureType.config_error,
+                )
+            else:
+                raise AirbyteTracedException(
+                    message=message,
+                    internal_message=message,
+                    failure_type=FailureType.system_error,
+                )
+        return super().extract_records(response)

        records = super().extract_records(response)
        return [{"member_id": record} for record in records]
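
The extractor and the review suggestion above combine two ideas: treating an HTTP 200 body with `"ok": false` as a failure, and turning a list of member IDs into records. A minimal, CDK-free sketch of both follows. The names `SlackApiError`, `check_slack_body`, and `members_to_records` are hypothetical, and reading the IDs from a `members` key is an assumption about the response body rather than something this diff shows.

```python
from typing import Any, Dict, List, Mapping


class SlackApiError(Exception):
    """Hypothetical error type used only in this sketch."""


def check_slack_body(body: Mapping[str, Any]) -> None:
    """Raise if a decoded Slack body reports a failure despite HTTP 200."""
    if not body.get("ok"):
        # Fall back to a placeholder so the substring check never sees None.
        error = body.get("error") or "unknown_error"
        if "invalid_auth" in error:
            raise SlackApiError("Authentication has failed, please update your credentials")
        raise SlackApiError(f"Request failed with error: {error}")


def members_to_records(body: Mapping[str, Any]) -> List[Dict[str, str]]:
    """Turn ['aa', 'bb'] into [{'member_id': 'aa'}, {'member_id': 'bb'}]."""
    check_slack_body(body)
    # Assumption: the member IDs live under the "members" key.
    return [{"member_id": member} for member in body.get("members", [])]


records = members_to_records({"ok": True, "members": ["aa", "bb"]})
```

The fallback to `"unknown_error"` is a choice made for this sketch; it avoids a `TypeError` when a failed body carries no `error` field.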
@@ -0,0 +1,123 @@
# Copyright (c) 2023 Airbyte, Inc., all rights reserved.

import logging
from functools import partial
from typing import Any, Iterable, List, Mapping, Optional

import requests
from airbyte_cdk.models import SyncMode
from airbyte_cdk.sources.declarative.partition_routers import SinglePartitionRouter
from airbyte_cdk.sources.declarative.retrievers import SimpleRetriever
from airbyte_cdk.sources.declarative.types import Record, StreamSlice
from airbyte_cdk.sources.streams.core import StreamData
from airbyte_cdk.sources.streams.http import HttpStream
from airbyte_cdk.sources.streams.http.auth import TokenAuthenticator

LOGGER = logging.getLogger("airbyte_logger")


class JoinChannelsStream(HttpStream):
    """
    This class is a special stream which joins channels because the Slack API only returns messages from channels this bot is in.
    Its responses should only be logged for debugging reasons, not read as records.
    """

    url_base = "https://slack.com/api/"
    http_method = "POST"
    primary_key = "id"

    def __init__(self, channel_filter: List[str] = None, **kwargs):
        self.channel_filter = channel_filter or []
        super().__init__(**kwargs)

    def path(self, **kwargs) -> str:
        return "conversations.join"

    def parse_response(self, response: requests.Response, stream_slice: Mapping[str, Any] = None, **kwargs) -> Iterable:
        """
        Override to simply indicate that the specific channel was joined successfully.
        This method should not return any data, but should return an empty iterable.
        """
        is_ok = response.json().get("ok", False)
        if is_ok:
            self.logger.info(f"Successfully joined channel: {stream_slice['channel_name']}")
        else:
            self.logger.info(f"Unable to join channel: {stream_slice['channel_name']}. Reason: {response.json()}")
        return []

    def request_body_json(self, stream_slice: Mapping = None, **kwargs) -> Optional[Mapping]:
        if stream_slice:
            return {"channel": stream_slice.get("channel")}

    def next_page_token(self, response: requests.Response) -> Optional[Mapping[str, Any]]:
        """
        The pagination is not applicable to this Service Stream.
        """
        return None


class ChannelsRetriever(SimpleRetriever):
    def __post_init__(self, parameters: Mapping[str, Any]):
        super().__post_init__(parameters)
        self.stream_slicer = SinglePartitionRouter(parameters={})
        self.record_selector.transformations = []

    def should_join_to_channel(self, config: Mapping[str, Any], record: Record) -> bool:
        """
        The `is_member` property indicates whether the API Bot is already assigned / joined to the channel.
        https://api.slack.com/types/conversation#booleans
        """
        return config["join_channels"] and not record.get("is_member")

    def make_join_channel_slice(self, channel: Mapping[str, Any]) -> Mapping[str, Any]:
        channel_id: str = channel.get("id")
        channel_name: str = channel.get("name")
Contributor Author

I suppose channel_name is not needed to join a channel?

Collaborator

No, it is used for logging.

        LOGGER.info(f"Joining Slack Channel: `{channel_name}`")
bazarnov marked this conversation as resolved.
        return {"channel": channel_id, "channel_name": channel_name}

    def join_channels_stream(self, config) -> JoinChannelsStream:
        token = config["credentials"].get("api_token") or config["credentials"].get("access_token")
        authenticator = TokenAuthenticator(token)
        channel_filter = config["channel_filter"]
        return JoinChannelsStream(authenticator=authenticator, channel_filter=channel_filter)

    def join_channel(self, config: Mapping[str, Any], record: Mapping[str, Any]):
        list(
            self.join_channels_stream(config).read_records(
                sync_mode=SyncMode.full_refresh,
                stream_slice=self.make_join_channel_slice(record),
            )
        )

    def read_records(
Contributor Author

Why was the read_records method chosen as the injection point for the join-channel logic?

The main disadvantage is that we had to copy the full body of the original read_records instead of just calling super().read_records(...) and then adding our customization.

A better injection point might be parse_response or a transformation function.

Collaborator

Joining a channel is neither a transformation nor response parsing. We should join a channel as soon as we receive one that has not been joined, as in the Python implementation. This way we perform all API requests in one place and have proper handling if a request fails. Since this logic belongs to neither response parsing nor transformation, we can remove this custom component in the future, once similar logic appears in the low-code CDK.

        self,
        records_schema: Mapping[str, Any],
        stream_slice: Optional[StreamSlice] = None,
    ) -> Iterable[StreamData]:
        _slice = stream_slice or StreamSlice(partition={}, cursor_slice={})  # None-check

        self._paginator.reset()

        most_recent_record_from_slice = None
        record_generator = partial(
            self._parse_records,
            stream_state=self.state or {},
            stream_slice=_slice,
            records_schema=records_schema,
        )

        for stream_data in self._read_pages(record_generator, self.state, _slice):
            # joining channel logic
            if self.should_join_to_channel(self.config, stream_data):
                self.join_channel(self.config, stream_data)

            current_record = self._extract_record(stream_data, _slice)
            if self.cursor and current_record:
                self.cursor.observe(_slice, current_record)

            most_recent_record_from_slice = self._get_most_recent_record(most_recent_record_from_slice, current_record, _slice)
            yield stream_data

        if self.cursor:
            self.cursor.observe(_slice, most_recent_record_from_slice)
        return
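
The review thread on read_records weighs copying the parent method against wrapping it. A minimal, CDK-free sketch of the wrapping idea follows, assuming the parent reader can be treated as a plain iterable of channel records. The names `should_join`, `with_join_side_effect`, and the `join` callback are hypothetical, and whether this approach fits the real SimpleRetriever (for example its cursor observation) is not verified here.

```python
from typing import Any, Callable, Iterable, Iterator, List, Mapping

Record = Mapping[str, Any]


def should_join(config: Mapping[str, Any], record: Record) -> bool:
    """Join only when the option is enabled and the bot is not yet a member."""
    return bool(config.get("join_channels")) and not record.get("is_member")


def with_join_side_effect(
    records: Iterable[Record],
    config: Mapping[str, Any],
    join: Callable[[Record], None],
) -> Iterator[Record]:
    """Yield every record unchanged, calling `join` first where needed."""
    for record in records:
        if should_join(config, record):
            join(record)
        yield record


channels: List[Record] = [
    {"id": "C1", "name": "general", "is_member": True},
    {"id": "C2", "name": "random", "is_member": False},
]
joined: List[Record] = []
emitted = list(with_join_side_effect(channels, {"join_channels": True}, joined.append))
```

In this sketch the join happens before the record is yielded, which mirrors the ordering in the copied read_records body above.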
@@ -0,0 +1,73 @@
# Copyright (c) 2023 Airbyte, Inc., all rights reserved.

import logging
from typing import Any, List, Mapping

from airbyte_cdk import AirbyteEntrypoint
from airbyte_cdk.config_observation import create_connector_config_control_message
from airbyte_cdk.sources.message import InMemoryMessageRepository, MessageRepository
from source_slack import SourceSlack

logger = logging.getLogger("airbyte_logger")


class MigrateLegacyConfig:
    message_repository: MessageRepository = InMemoryMessageRepository()

    @classmethod
    def _should_migrate(cls, config: Mapping[str, Any]) -> bool:
        """
        legacy config:
        {
            "start_date": "2021-07-22T20:00:00Z",
            "end_date": "2021-07-23T20:00:00Z",
            "lookback_window": 1,
            "join_channels": True,
            "channel_filter": ["airbyte-for-beginners", "good-reads"],
            "api_token": "api-token"
        }
        api token should be in the credentials object
        """
        if config.get("api_token") and not config.get("credentials"):
            return True
        return False

    @classmethod
    def _move_token_to_credentials(cls, config: Mapping[str, Any]) -> Mapping[str, Any]:
        api_token = config["api_token"]
        config.update({"credentials": {"api_token": api_token, "option_title": "API Token Credentials"}})
        config.pop("api_token")
        return config

    @classmethod
    def _modify_and_save(cls, config_path: str, source: SourceSlack, config: Mapping[str, Any]) -> Mapping[str, Any]:
        migrated_config = cls._move_token_to_credentials(config)
        # save the config
        source.write_config(migrated_config, config_path)
        return migrated_config

    @classmethod
    def _emit_control_message(cls, migrated_config: Mapping[str, Any]) -> None:
        # add the Airbyte Control Message to message repo
        cls.message_repository.emit_message(create_connector_config_control_message(migrated_config))
        # emit the Airbyte Control Message from message queue to stdout
        for message in cls.message_repository._message_queue:
            print(message.json(exclude_unset=True))

    @classmethod
    def migrate(cls, args: List[str], source: SourceSlack) -> None:
        """
        This method checks the input args, should the config be migrated,
        transform if necessary and emit the CONTROL message.
        """
        # get config path
        config_path = AirbyteEntrypoint(source).extract_config(args)
        # proceed only if `--config` arg is provided
        if config_path:
            # read the existing config
            config = source.read_config(config_path)
            # migration check
            if cls._should_migrate(config):
                cls._emit_control_message(
                    cls._modify_and_save(config_path, source, config),
                )
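
The config transformation performed by this migration can be illustrated in isolation. The sketch below is a standalone approximation, not the connector's code: `should_migrate` and `migrate_legacy_config` are hypothetical names, and unlike `_move_token_to_credentials` above, this version works on a copy instead of mutating its input.

```python
import copy
from typing import Any, Dict, Mapping


def should_migrate(config: Mapping[str, Any]) -> bool:
    """A config is legacy when it has a top-level api_token and no credentials."""
    return bool(config.get("api_token")) and not config.get("credentials")


def migrate_legacy_config(config: Mapping[str, Any]) -> Dict[str, Any]:
    """Return a copy with the top-level api_token moved under credentials."""
    migrated = copy.deepcopy(dict(config))
    if should_migrate(migrated):
        api_token = migrated.pop("api_token")
        migrated["credentials"] = {
            "api_token": api_token,
            "option_title": "API Token Credentials",
        }
    return migrated


legacy = {
    "start_date": "2021-07-22T20:00:00Z",
    "lookback_window": 1,
    "join_channels": True,
    "channel_filter": ["airbyte-for-beginners", "good-reads"],
    "api_token": "api-token",
}
migrated = migrate_legacy_config(legacy)
```

Applying the function to an already migrated config returns an equal config, so running it more than once is harmless in this sketch.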