From d37a80a257da35c690415f22b1448bdcdfd3f6bc Mon Sep 17 00:00:00 2001 From: dat-a-man <98139823+dat-a-man@users.noreply.github.com> Date: Tue, 19 Sep 2023 15:01:03 +0000 Subject: [PATCH 1/5] Added Slack Docs! --- .../dlt-ecosystem/verified-sources/slack.md | 264 ++++++++++++++++++ docs/website/sidebars.js | 1 + 2 files changed, 265 insertions(+) create mode 100644 docs/website/docs/dlt-ecosystem/verified-sources/slack.md diff --git a/docs/website/docs/dlt-ecosystem/verified-sources/slack.md b/docs/website/docs/dlt-ecosystem/verified-sources/slack.md new file mode 100644 index 0000000000..71501e1bd4 --- /dev/null +++ b/docs/website/docs/dlt-ecosystem/verified-sources/slack.md @@ -0,0 +1,264 @@ +--- +title: Slack +description: dlt verified source for Slack API +keywords: [slack api, slack verified source, slack] +--- + +# Slack + +:::info Need help deploying these sources, or figuring out how to run them in your data stack? + +[Join our Slack community](https://dlthub-community.slack.com/join/shared_invite/zt-1slox199h-HAE7EQoXmstkP_bTqal65g) +or [book a call](https://calendar.app.google/kiLhuMsWKpZUpfho6) with our support engineer Adrian. +::: + +[Slack](https://slack.com/) is a popular messaging and collaboration platform for teams and organizations. + +This Slack `dlt` verified source and +[pipeline example](https://github.com/dlt-hub/verified-sources/blob/master/sources/slack_pipeline.py) +loads data using “Slack API” to the destination of your choice. 
+ +Sources and resources that can be loaded using this verified source are: + +| Name | Description | +| --------------------- | -------------------------------------------------------------------------- | +| slack_source | Retrives resources conversations, conversations_history and access_logs | +| channels_resource | Retrives all the channels | +| get_messages_resource | Retrives all the messages for a given channel | +| access_logs | Retrives the access logs | + +## Setup Guide + +### Grab user OAuth token + +To set up the pipeline, create a Slack app in your workspace to obtain a user token for accessing the Slack API. + +1. Navigate to your Slack workspace and click on the name at the top-left. +1. Select Tools > Customize Workspace. +1. From the top-left Menu, choose Configure apps. +1. Click Build (top-right) > Create a New App. +1. Opt for "From scratch", set the "App Name", and pick your target workspace. +1. Confirm with Create App. +1. Navigate to OAuth and Permissions under the Features section. +1. 
Assign the following scopes: + + | Name | Description | + | -------------------- | --------------------------------------------------------------------------------- | + | admin | Administer a workspace | + | channels:history | View messages and other content in public channels | + | groups:history | View messages and other content in private channels (where the app is added) | + | im:history | View messages and other content in direct messages (where the app is added) | + | mpim:history | View messages and other content in group direct messages (where the app is added) | + | channels:read | View basic information about public channels in a workspace | + | groups:read | View basic information about private channels (where the app is added) | + | im:read | View basic information about direct messages (where the app is added) | + | mpim:read | View basic information about group direct messages (where the app is added) | + > Note: These scopes are adjustable; tailor them to your needs. + +1. From "OAuth & Permissions" on the left, add the scopes and copy the User OAuth Token. + + +### Initialize the verified source + +To get started with your data pipeline, follow these steps: + +1. Enter the following command: + + ```bash + dlt init slack duckdb + ``` + + [This command](../../reference/command-line-interface) will initialize + [the pipeline example](https://github.com/dlt-hub/verified-sources/blob/master/sources/slack_pipeline.py) + with Slack as the [source](../../general-usage/source) and + [duckdb](../destinations/duckdb.md) as the [destination](../destinations). + +1. If you'd like to use a different destination, simply replace `duckdb` with the name of your + preferred [destination](../destinations). + +1. After running this command, a new directory will be created with the necessary files and + configuration settings to get started. 
+ +For more information, read the +[Walkthrough: Add a verified source.](../../walkthroughs/add-a-verified-source) + +### Add credentials + +1. In the `.dlt` folder, there's a file called `secrets.toml`. It's where you store sensitive + information securely, like access tokens. Keep this file safe. Here's its format for service + account authentication: + + ```toml + [sources.slack] + access_token = "Please set me up!" # please set me up! + ``` + +1. Copy the user Oauth token you [copied above](#grab-user-oauth-token). + +1. Finally, enter credentials for your chosen destination as per the [docs](../destinations/). + +## Run the pipeline + +1. Before running the pipeline, ensure that you have installed all the necessary dependencies by + running the command: + + ```bash + pip install -r requirements.txt + ``` + +1. You're now ready to run the pipeline! To get started, run the following command: + + ```bash + python3 slack_pipeline.py + ``` + +1. Once the pipeline has finished running, you can verify that everything loaded correctly by using + the following command: + + ```bash + dlt pipeline <pipeline_name> show + ``` + + For example, the `pipeline_name` for the above pipeline example is `slack`; you + may also use any custom name instead. + + For more information, read the [Walkthrough: Run a pipeline](../../walkthroughs/run-a-pipeline). + +## Sources and resources + +`dlt` works on the principle of [sources](../../general-usage/source) and +[resources](../../general-usage/resource). + +### Source `slack_source` + +It retrieves data from Slack's API and returns resources conversations, conversations_history, and access_logs. 
+ +```python +@dlt.source(name="slack", max_table_nesting=2) +def slack_source( + page_size: int = MAX_PAGE_SIZE, + access_token: str = dlt.secrets.value, + start_date: Optional[TAnyDateTime] = DEFAULT_START_DATE, + end_date: Optional[TAnyDateTime] = None, + selected_channels: Optional[List[str]] = dlt.config.value, +) -> Iterable[DltResource]: +``` + +`page_size`: Maximum items per page (default: 1000). + +`access_token`: OAuth token for authentication. + +`start_date`: Range start. (default: January 1, 2000). + +`end_date`: Range end. + +`selected_channels`: Channels to load; defaults to all if unspecified. + +### Resource `channels_resource` + +This function yields all the channels as `dlt` resource. + + +```python +@dlt.resource(name="channels", primary_key="id", write_disposition="replace") +def channels_resource() -> Iterable[TDataItem]: + + yield from channel +yield channels_resource +``` + +### Resource `get_messages_resource` + +This method fetches messages for a specified channel from the Slack API. It creates a resource for each channel in channels. + +```python +def get_messages_resource( + channel_data: Dict[str, Any], + created_at: dlt.sources.incremental[DateTime] = dlt.sources.incremental( + "ts", + initial_value=start_dt, + end_value=end_dt, + allow_external_schedulers=True, + ), +) -> Iterable[TDataItem]: +``` + +`channel_data`: A dictionary detailing a specific channel to determine where messages are fetched from. + +`created_at`: An optional parameter leveraging dlt.sources.incremental to define the timestamp range for message retrieval. Sub-arguments include: + + - `ts`: Timestamp from the Slack API response. + + - `initial_value`: Start of the timestamp range, defaulting to start_dt in slack_source. + + - `end_value`: Timestamp range end, defaulting to end_dt in slack_source. + + - `allow_external_schedulers`: A boolean that, if True, permits external schedulers to manage incremental loading. 
+ +### Resource `access_logs` + +This method retrieves access logs from the Slack API. + +```python +@dlt.resource( + name="access_logs", + selected=False, + primary_key="user_id", + write_disposition="append", +) +# it is not an incremental resource it just has a end_date filter +def logs_resource() -> Iterable[TDataItem]: +``` + +`selected`: A boolean set to False, indicating the resource isn't loaded by default. + +`primary_key`: The unique identifier is "user_id". + +`write_disposition`: Set to "append", allowing new data to join existing data in the destination. +> Note: This resource may not function in the pipeline or tests due to its paid status. An error arises for non-paying accounts. + +### Create your own pipeline + +If you wish to create your own pipelines, you can leverage source and resource methods from this +verified source. + +1. Configure the pipeline by specifying the pipeline name, destination, and dataset as follows: + + ```python + pipeline = dlt.pipeline( + pipeline_name="google_sheets", # Use a custom name if desired + destination="duckdb", # Choose the appropriate destination (e.g., duckdb, redshift, post) + dataset_name="google_spreadsheet_data" # Use a custom name if desired + ) + ``` +1. To load Slack resources from the specified start date: + + ```python + source = slack_source(page_size=1000, start_date=datetime(2023, 9, 1), end_date=datetime(2023, 9, 8)) + + # Enable below to load only 'access_logs', available for paid accounts only. + # source.access_logs.selected = True + + load_info = pipeline.run(source) + print(load_info) + ``` + > Subsequent runs will load only items updated since the previous run. + +1. To load selected Slack resources from the specified start date: + + ```python + # To load data from selected channel. + selected_channels=["Please set me up!"] # Enter the channel name here. 
+ + source = slack_source( + page_size=20, + selected_channels=selected_channels, + start_date=datetime(2023, 9, 1), + end_date=datetime(2023, 9, 8), + ) + + load_info = pipeline.run(source) + print(load_info) + ``` + > It loads data starting from 1st September 2023 to 8th Sep 2023. + diff --git a/docs/website/sidebars.js b/docs/website/sidebars.js index 4b7aa65c26..7b52bf89e8 100644 --- a/docs/website/sidebars.js +++ b/docs/website/sidebars.js @@ -52,6 +52,7 @@ const sidebars = { 'dlt-ecosystem/verified-sources/salesforce', 'dlt-ecosystem/verified-sources/shopify', 'dlt-ecosystem/verified-sources/sql_database', + 'dlt-ecosystem/verified-sources/slack', 'dlt-ecosystem/verified-sources/strapi', 'dlt-ecosystem/verified-sources/stripe', 'dlt-ecosystem/verified-sources/workable', From 166999439398db8ec656a5a7281e9f8cfb3af494 Mon Sep 17 00:00:00 2001 From: AstrakhantsevaAA Date: Fri, 29 Sep 2023 16:57:56 +0200 Subject: [PATCH 2/5] update --- .../dlt-ecosystem/verified-sources/slack.md | 88 +++++++++---------- 1 file changed, 44 insertions(+), 44 deletions(-) diff --git a/docs/website/docs/dlt-ecosystem/verified-sources/slack.md b/docs/website/docs/dlt-ecosystem/verified-sources/slack.md index 71501e1bd4..439cfec6b1 100644 --- a/docs/website/docs/dlt-ecosystem/verified-sources/slack.md +++ b/docs/website/docs/dlt-ecosystem/verified-sources/slack.md @@ -20,12 +20,13 @@ loads data using “Slack API” to the destination of your choice. 
Sources and resources that can be loaded using this verified source are: -| Name | Description | -| --------------------- | -------------------------------------------------------------------------- | -| slack_source | Retrives resources conversations, conversations_history and access_logs | -| channels_resource | Retrives all the channels | -| get_messages_resource | Retrives all the messages for a given channel | -| access_logs | Retrives the access logs | +| Name | Description | +| --------------------- |------------------------------------------------------------------------------------| +| slack_source | Retrives all the Slack data: channels, messages for selected channels, users, logs | +| channels_resource | Retrives all the channels data | +| get_messages_resource | Retrives all the messages for a given channel | +| access_logs | Retrives the access logs | +| users_resource | Retrives all the users info | ## Setup Guide @@ -40,23 +41,25 @@ To set up the pipeline, create a Slack app in your workspace to obtain a user to 1. Opt for "From scratch", set the "App Name", and pick your target workspace. 1. Confirm with Create App. 1. Navigate to OAuth and Permissions under the Features section. -1. 
Assign the following scopes: - - | Name | Description | - | -------------------- | --------------------------------------------------------------------------------- | - | admin | Administer a workspace | - | channels:history | View messages and other content in public channels | - | groups:history | View messages and other content in private channels (where the app is added) | - | im:history | View messages and other content in direct messages (where the app is added) | - | mpim:history | View messages and other content in group direct messages (where the app is added) | - | channels:read | View basic information about public channels in a workspace | - | groups:read | View basic information about private channels (where the app is added) | - | im:read | View basic information about direct messages (where the app is added) | - | mpim:read | View basic information about group direct messages (where the app is added) | +1. Assign the following scopes: + + | Name | Description | + |------------------|-----------------------------------------------------------------------------------| + | admin | Administer a workspace | + | channels:history | View messages and other content in public channels | + | groups:history | View messages and other content in private channels (where the app is added) | + | im:history | View messages and other content in direct messages (where the app is added) | + | mpim:history | View messages and other content in group direct messages (where the app is added) | + | channels:read | View basic information about public channels in a workspace | + | groups:read | View basic information about private channels (where the app is added) | + | im:read | View basic information about direct messages (where the app is added) | + | mpim:read | View basic information about group direct messages (where the app is added) | + | users:read | View people in a workspace | > Note: These scopes are adjustable; tailor them to your needs. 1. 
From "OAuth & Permissions" on the left, add the scopes and copy the User OAuth Token. +> Note: The Slack UI, which is described here, might change. The official guide is available at this [link](https://api.slack.com/start/quickstart). ### Initialize the verified source @@ -85,13 +88,14 @@ For more information, read the ### Add credentials 1. In the `.dlt` folder, there's a file called `secrets.toml`. It's where you store sensitive - information securely, like access tokens. Keep this file safe. Here's its format for service - account authentication: + information securely, like access tokens. Keep this file safe. - ```toml - [sources.slack] - access_token = "Please set me up!" # please set me up! - ``` + Here's its format for service account authentication: + + ```toml + [sources.slack] + access_token = "Please set me up!" # please set me up! + ``` 1. Copy the user Oauth token you [copied above](#grab-user-oauth-token). @@ -109,7 +113,7 @@ For more information, read the 1. You're now ready to run the pipeline! To get started, run the following command: ```bash - python3 slack_pipeline.py + python slack_pipeline.py ``` 1. Once the pipeline has finished running, you can verify that everything loaded correctly by using @@ -129,9 +133,9 @@ For more information, read the `dlt` works on the principle of [sources](../../general-usage/source) and [resources](../../general-usage/resource). -### Source `slack_source` +### Source `slack` -It retrieves data from Slack's API and returns resources conversations, conversations_history, and access_logs. +It retrieves data from Slack's API and fetches Slack Conversations, History, Users info and logs. ```python @dlt.source(name="slack", max_table_nesting=2) @@ -154,22 +158,18 @@ def slack_source( `selected_channels`: Channels to load; defaults to all if unspecified. -### Resource `channels_resource` - -This function yields all the channels as `dlt` resource. 
+### Resource `channels` +This function yields all the channels data as `dlt` resource. ```python @dlt.resource(name="channels", primary_key="id", write_disposition="replace") def channels_resource() -> Iterable[TDataItem]: - - yield from channel -yield channels_resource ``` ### Resource `get_messages_resource` -This method fetches messages for a specified channel from the Slack API. It creates a resource for each channel in channels. +This method fetches messages for a specified channel from the Slack API. It creates a resource for each channel with channel's name. ```python def get_messages_resource( @@ -192,8 +192,8 @@ def get_messages_resource( - `initial_value`: Start of the timestamp range, defaulting to start_dt in slack_source. - `end_value`: Timestamp range end, defaulting to end_dt in slack_source. - - - `allow_external_schedulers`: A boolean that, if True, permits external schedulers to manage incremental loading. + + - `allow_external_schedulers`: A boolean that, if True, permits [external schedulers](../../general-usage/incremental-loading#using-airflow-schedule-for-backfill-and-incremental-loading) to manage incremental loading. ### Resource `access_logs` @@ -217,6 +217,7 @@ def logs_resource() -> Iterable[TDataItem]: `write_disposition`: Set to "append", allowing new data to join existing data in the destination. > Note: This resource may not function in the pipeline or tests due to its paid status. An error arises for non-paying accounts. +## Customization ### Create your own pipeline If you wish to create your own pipelines, you can leverage source and resource methods from this @@ -226,9 +227,9 @@ verified source. 
```python pipeline = dlt.pipeline( - pipeline_name="google_sheets", # Use a custom name if desired + pipeline_name="slack", # Use a custom name if desired destination="duckdb", # Choose the appropriate destination (e.g., duckdb, redshift, post) - dataset_name="google_spreadsheet_data" # Use a custom name if desired + dataset_name="slack_data" # Use a custom name if desired ) ``` 1. To load Slack resources from the specified start date: @@ -239,6 +240,7 @@ verified source. # Enable below to load only 'access_logs', available for paid accounts only. # source.access_logs.selected = True + # It loads data starting from 1st September 2023 to 8th Sep 2023. load_info = pipeline.run(source) print(load_info) ``` @@ -247,8 +249,8 @@ verified source. 1. To load selected Slack resources from the specified start date: ```python - # To load data from selected channel. - selected_channels=["Please set me up!"] # Enter the channel name here. + # To load data from selected channels. + selected_channels=["general", "random"] # Enter the channel names here. source = slack_source( page_size=20, @@ -256,9 +258,7 @@ verified source. start_date=datetime(2023, 9, 1), end_date=datetime(2023, 9, 8), ) - + # It loads data starting from 1st September 2023 to 8th Sep 2023 from the channels: "general" and "random". load_info = pipeline.run(source) print(load_info) ``` - > It loads data starting from 1st September 2023 to 8th Sep 2023. 
- From 101c5e411d33bd21d251d16c6a1accfc68a1da2e Mon Sep 17 00:00:00 2001 From: AstrakhantsevaAA Date: Fri, 29 Sep 2023 17:14:26 +0200 Subject: [PATCH 3/5] add users --- .../dlt-ecosystem/verified-sources/slack.md | 17 +++++++++++++---- 1 file changed, 13 insertions(+), 4 deletions(-) diff --git a/docs/website/docs/dlt-ecosystem/verified-sources/slack.md b/docs/website/docs/dlt-ecosystem/verified-sources/slack.md index 439cfec6b1..dd016275bd 100644 --- a/docs/website/docs/dlt-ecosystem/verified-sources/slack.md +++ b/docs/website/docs/dlt-ecosystem/verified-sources/slack.md @@ -21,12 +21,12 @@ loads data using “Slack API” to the destination of your choice. Sources and resources that can be loaded using this verified source are: | Name | Description | -| --------------------- |------------------------------------------------------------------------------------| -| slack_source | Retrives all the Slack data: channels, messages for selected channels, users, logs | -| channels_resource | Retrives all the channels data | +|-----------------------|------------------------------------------------------------------------------------| +| slack | Retrives all the Slack data: channels, messages for selected channels, users, logs | +| channels | Retrives all the channels data | +| users | Retrives all the users info | | get_messages_resource | Retrives all the messages for a given channel | | access_logs | Retrives the access logs | -| users_resource | Retrives all the users info | ## Setup Guide @@ -167,6 +167,15 @@ This function yields all the channels data as `dlt` resource. def channels_resource() -> Iterable[TDataItem]: ``` +### Resource `users` + +This function yields all the users data as `dlt` resource. + +```python +@dlt.resource(name="users", primary_key="id", write_disposition="replace") +def users_resource() -> Iterable[TDataItem]: +``` + ### Resource `get_messages_resource` This method fetches messages for a specified channel from the Slack API. 
It creates a resource for each channel with channel's name. From 3535f33e303e8c468f2b4b1a103dbb3798036c48 Mon Sep 17 00:00:00 2001 From: AstrakhantsevaAA Date: Fri, 29 Sep 2023 17:20:39 +0200 Subject: [PATCH 4/5] update --- docs/website/docs/dlt-ecosystem/verified-sources/slack.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/website/docs/dlt-ecosystem/verified-sources/slack.md b/docs/website/docs/dlt-ecosystem/verified-sources/slack.md index dd016275bd..392c7b1a27 100644 --- a/docs/website/docs/dlt-ecosystem/verified-sources/slack.md +++ b/docs/website/docs/dlt-ecosystem/verified-sources/slack.md @@ -135,7 +135,7 @@ For more information, read the ### Source `slack` -It retrieves data from Slack's API and fetches Slack Conversations, History, Users info and logs. +It retrieves data from Slack's API and fetches the Slack data such as channels, messages for selected channels, users, logs. ```python @dlt.source(name="slack", max_table_nesting=2) From 324c463750293e090964da62fc6b3a7e648304a1 Mon Sep 17 00:00:00 2001 From: AstrakhantsevaAA Date: Fri, 29 Sep 2023 17:26:43 +0200 Subject: [PATCH 5/5] update --- .../dlt-ecosystem/verified-sources/slack.md | 19 ++++++++++++++++++- 1 file changed, 18 insertions(+), 1 deletion(-) diff --git a/docs/website/docs/dlt-ecosystem/verified-sources/slack.md b/docs/website/docs/dlt-ecosystem/verified-sources/slack.md index 392c7b1a27..f786761390 100644 --- a/docs/website/docs/dlt-ecosystem/verified-sources/slack.md +++ b/docs/website/docs/dlt-ecosystem/verified-sources/slack.md @@ -255,7 +255,7 @@ verified source. ``` > Subsequent runs will load only items updated since the previous run. -1. To load selected Slack resources from the specified start date: +1. To load data from selected Slack channels from the specified start date: ```python # To load data from selected channels. @@ -271,3 +271,20 @@ verified source. load_info = pipeline.run(source) print(load_info) ``` + +1. 
To load only messages from selected Slack resources: + + ```python + # To load data from selected channels. + selected_channels=["general", "random"] # Enter the channel names here. + + source = slack_source( + page_size=20, + selected_channels=selected_channels, + start_date=datetime(2023, 9, 1), + end_date=datetime(2023, 9, 8), + ) + # It loads only messages from the channel "general". + load_info = pipeline.run(source.with_resources("general")) + print(load_info) + ```
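
Reviewer note on the `start_date` / `end_date` examples above: the date window can be pictured with a small stand-alone sketch. It uses neither `dlt` nor the Slack API. The helpers `ts_to_datetime` and `in_window` and the sample messages are hypothetical, and the half-open comparison (start inclusive, end exclusive) is an assumption made for illustration, not a description of how the verified source applies the range. The only fact relied on is that Slack message `ts` values are strings holding Unix epoch seconds.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Iterable, Iterator


def ts_to_datetime(ts: str) -> datetime:
    """Convert a Slack-style `ts` string (epoch seconds) to an aware UTC datetime."""
    return datetime.fromtimestamp(float(ts), tz=timezone.utc)


def in_window(
    messages: Iterable[Dict[str, Any]],
    start: datetime,
    end: datetime,
) -> Iterator[Dict[str, Any]]:
    """Yield messages with start <= ts < end (half-open window, assumed for illustration)."""
    for message in messages:
        if start <= ts_to_datetime(message["ts"]) < end:
            yield message


# Same window as in the docs examples: 1 September 2023 to 8 September 2023 (UTC).
start = datetime(2023, 9, 1, tzinfo=timezone.utc)
end = datetime(2023, 9, 8, tzinfo=timezone.utc)

# Hypothetical sample data, not real Slack output.
sample = [
    {"ts": "1693440000.000100", "text": "before"},  # 2023-08-31 00:00 UTC
    {"ts": "1693785600.000200", "text": "inside"},  # 2023-09-04 00:00 UTC
    {"ts": "1694217600.000300", "text": "after"},   # 2023-09-09 00:00 UTC
]

selected = [m["text"] for m in in_window(sample, start, end)]
print(selected)
```

Only the message dated 4 September falls inside the window. In the real source the equivalent bounds are passed as `initial_value` and `end_value` to `dlt.sources.incremental` on the `ts` field, as shown in the `get_messages_resource` signature.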