Added Slack Docs! #643

Merged
merged 5 commits into from
Sep 29, 2023
264 changes: 264 additions & 0 deletions docs/website/docs/dlt-ecosystem/verified-sources/slack.md
@@ -0,0 +1,264 @@
---
title: Slack
description: dlt verified source for Slack API
keywords: [slack api, slack verified source, slack]
---

# Slack

:::info Need help deploying these sources, or figuring out how to run them in your data stack?

[Join our Slack community](https://dlthub-community.slack.com/join/shared_invite/zt-1slox199h-HAE7EQoXmstkP_bTqal65g)
or [book a call](https://calendar.app.google/kiLhuMsWKpZUpfho6) with our support engineer Adrian.
:::

[Slack](https://slack.com/) is a popular messaging and collaboration platform for teams and organizations.

This Slack `dlt` verified source and
[pipeline example](https://github.com/dlt-hub/verified-sources/blob/master/sources/slack_pipeline.py)
load data from the Slack API to the destination of your choice.

Sources and resources that can be loaded using this verified source are:

| Name | Description |
| --------------------- | -------------------------------------------------------------------------- |
| slack_source          | Retrieves resources conversations, conversations_history and access_logs   |
| channels_resource     | Retrieves all the channels                                                 |
| get_messages_resource | Retrieves all the messages for a given channel                             |

For getting the messages we use the selected channel name.

| access_logs           | Retrieves the access logs                                                  |

The users resource is in review and should be merged soon, I think you can add it too.

## Setup Guide

### Grab user OAuth token

To set up the pipeline, create a Slack app in your workspace to obtain a user token for accessing the Slack API.

1. Navigate to your Slack workspace and click on the name at the top-left.
1. Select Tools > Customize Workspace.
1. From the top-left Menu, choose Configure apps.
1. Click Build (top-right) > Create a New App.
1. Opt for "From scratch", set the "App Name", and pick your target workspace.
1. Confirm with Create App.
1. Navigate to "OAuth & Permissions" under the Features section.
1. Assign the following scopes:

| Name | Description |
| -------------------- | --------------------------------------------------------------------------------- |
| admin | Administer a workspace |
| channels:history | View messages and other content in public channels |
| groups:history | View messages and other content in private channels (where the app is added) |
| im:history | View messages and other content in direct messages (where the app is added) |
| mpim:history | View messages and other content in group direct messages (where the app is added) |
| channels:read | View basic information about public channels in a workspace |
| groups:read | View basic information about private channels (where the app is added) |
| im:read | View basic information about direct messages (where the app is added) |
| mpim:read | View basic information about group direct messages (where the app is added) |
> Note: These scopes are adjustable; tailor them to your needs.

If you add the users resource, add users:read permission too.


1. From "OAuth & Permissions" on the left, add the scopes and copy the User OAuth Token.
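
Before running the pipeline, it can help to confirm that the token carries the scopes listed above. Slack reports a token's granted scopes as a comma-separated string in the `x-oauth-scopes` response header. The sketch below compares such a string against the table. The helper name `missing_scopes` and the sample header value are hypothetical, and no request is made here.

```python
# Scopes from the table above; adjust the set to the resources you load.
REQUIRED_SCOPES = {
    "admin",
    "channels:history", "groups:history", "im:history", "mpim:history",
    "channels:read", "groups:read", "im:read", "mpim:read",
}


def missing_scopes(granted_header: str, required=REQUIRED_SCOPES) -> set:
    """Return the required scopes that are absent from a comma-separated scope string."""
    granted = {s.strip() for s in granted_header.split(",") if s.strip()}
    return set(required) - granted


# Hypothetical header value, as it might be copied from a Slack API response.
sample = "channels:history,channels:read,groups:read"
print(sorted(missing_scopes(sample)))
```

An empty result means every scope in the table is present. If you trim the scope list to your needs, trim `REQUIRED_SCOPES` the same way.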


### Initialize the verified source

To get started with your data pipeline, follow these steps:

1. Enter the following command:

```bash
dlt init slack duckdb
```

[This command](../../reference/command-line-interface) will initialize
[the pipeline example](https://github.com/dlt-hub/verified-sources/blob/master/sources/slack_pipeline.py)
with Slack as the [source](../../general-usage/source) and
[duckdb](../destinations/duckdb.md) as the [destination](../destinations).

1. If you'd like to use a different destination, simply replace `duckdb` with the name of your
preferred [destination](../destinations).

1. After running this command, a new directory will be created with the necessary files and
configuration settings to get started.

For more information, read the
[Walkthrough: Add a verified source.](../../walkthroughs/add-a-verified-source)

### Add credentials

1. In the `.dlt` folder, there's a file called `secrets.toml`. It's where you store sensitive
information securely, like access tokens. Keep this file safe. Here's its format for the Slack
access token:

```toml
[sources.slack]
access_token = "Please set me up!" # Replace with your User OAuth Token.
```

1. Replace the `access_token` value with the User OAuth Token you [copied above](#grab-user-oauth-token).

1. Finally, enter credentials for your chosen destination as per the [docs](../destinations/).
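
As an alternative to `secrets.toml`, `dlt` can also read credentials from environment variables, named by uppercasing the section path and joining the parts with double underscores. The sketch below derives the variable name for the `[sources.slack]` entry shown above. The helper `dlt_env_key` is hypothetical and written only for illustration; it is not part of `dlt`.

```python
def dlt_env_key(*path: str) -> str:
    """Build the environment variable name for a dlt config or secret path."""
    return "__".join(part.upper() for part in path)


# Mirrors `[sources.slack] access_token` from secrets.toml.
key = dlt_env_key("sources", "slack", "access_token")
print(key)  # SOURCES__SLACK__ACCESS_TOKEN
```

Exporting a variable with that name before running the pipeline should have the same effect as filling in `secrets.toml`, which is convenient in CI or containerized deployments.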

## Run the pipeline

1. Before running the pipeline, ensure that you have installed all the necessary dependencies by
running the command:

```bash
pip install -r requirements.txt
```

1. You're now ready to run the pipeline! To get started, run the following command:

```bash
python3 slack_pipeline.py
```

1. Once the pipeline has finished running, you can verify that everything loaded correctly by using
the following command:

```bash
dlt pipeline <pipeline_name> show
```

For example, the `pipeline_name` for the above pipeline example is `slack`; you
may also use any custom name instead.

For more information, read the [Walkthrough: Run a pipeline](../../walkthroughs/run-a-pipeline).

## Sources and resources

`dlt` works on the principle of [sources](../../general-usage/source) and
[resources](../../general-usage/resource).

### Source `slack_source`

It retrieves data from Slack's API and returns resources conversations, conversations_history, and access_logs.

```python
@dlt.source(name="slack", max_table_nesting=2)
def slack_source(
    page_size: int = MAX_PAGE_SIZE,
    access_token: str = dlt.secrets.value,
    start_date: Optional[TAnyDateTime] = DEFAULT_START_DATE,
    end_date: Optional[TAnyDateTime] = None,
    selected_channels: Optional[List[str]] = dlt.config.value,
) -> Iterable[DltResource]:
    ...  # body omitted in this excerpt
```

`page_size`: Maximum items per page (default: 1000).

`access_token`: OAuth token for authentication.

`start_date`: Start of the date range (default: January 1, 2000).

`end_date`: End of the date range (default: `None`).

`selected_channels`: Channels to load; defaults to all if unspecified.
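
To illustrate what `page_size` controls, here is a minimal sketch of cursor-based pagination in the style Slack's API uses, where each response carries a `response_metadata.next_cursor` value until the last page. This is not the verified source's actual implementation. The `paginate` and `fake_fetch` names, the `items` key, and the fake records are assumptions made for the example.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

Page = Dict[str, Any]


def paginate(
    fetch_page: Callable[[Optional[str], int], Page], page_size: int
) -> Iterator[List[Any]]:
    """Yield one batch of items per page until no cursor is returned."""
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor, page_size)
        yield page["items"]
        cursor = page.get("response_metadata", {}).get("next_cursor")
        if not cursor:
            break


# Stand-in for a real API call: serves five fake records in pages.
RECORDS = [{"id": i} for i in range(5)]


def fake_fetch(cursor: Optional[str], limit: int) -> Page:
    start = int(cursor or 0)
    end = start + limit
    next_cursor = str(end) if end < len(RECORDS) else ""
    return {"items": RECORDS[start:end], "response_metadata": {"next_cursor": next_cursor}}


batches = list(paginate(fake_fetch, page_size=2))
print([len(batch) for batch in batches])  # [2, 2, 1]
```

A larger `page_size` means fewer requests per run, which matters when API rate limits apply.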

### Resource `channels_resource`

This function yields all the channels as a `dlt` resource.


```python
@dlt.resource(name="channels", primary_key="id", write_disposition="replace")
def channels_resource() -> Iterable[TDataItem]:
    yield from channel

I think that if you are adding this you should add the lines above it, as the channel variable is defined above, but I am not sure about it.

yield channels_resource
```
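
Since messages are fetched by selected channel name and `selected_channels` defaults to all channels, the selection step can be sketched as a simple filter over the channel list. This is an assumption about the behavior, not the source's actual code. `select_channels` and the sample channel dictionaries are hypothetical.

```python
from typing import Any, Dict, List, Optional


def select_channels(
    channels: List[Dict[str, Any]],
    selected_names: Optional[List[str]] = None,
) -> List[Dict[str, Any]]:
    """Keep channels whose name is selected; keep all when nothing is selected."""
    if not selected_names:
        return list(channels)
    wanted = set(selected_names)
    return [ch for ch in channels if ch.get("name") in wanted]


# Hypothetical channel records with the `id` and `name` fields Slack returns.
channels = [
    {"id": "C001", "name": "general"},
    {"id": "C002", "name": "random"},
]
print(select_channels(channels, ["general"]))  # [{'id': 'C001', 'name': 'general'}]
print(len(select_channels(channels)))          # 2
```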

### Resource `get_messages_resource`

This method fetches messages for a specified channel from the Slack API. It creates a resource for each channel in channels.

```python
def get_messages_resource(
    channel_data: Dict[str, Any],
    created_at: dlt.sources.incremental[DateTime] = dlt.sources.incremental(
        "ts",
        initial_value=start_dt,
        end_value=end_dt,
        allow_external_schedulers=True,
    ),
) -> Iterable[TDataItem]:
    ...  # body omitted in this excerpt
```

`channel_data`: A dictionary detailing a specific channel to determine where messages are fetched from.

`created_at`: An optional parameter that uses `dlt.sources.incremental` to define the timestamp range for message retrieval. Its arguments are:

- `"ts"`: The cursor field, that is, the message timestamp in the Slack API response.

- `initial_value`: Start of the timestamp range, defaulting to `start_dt` in `slack_source`.

- `end_value`: End of the timestamp range, defaulting to `end_dt` in `slack_source`.

- `allow_external_schedulers`: A boolean that, if True, permits external schedulers to manage incremental loading.
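
The timestamp range above has to be translated into request parameters at some point. Slack's `conversations.history` method accepts `oldest` and `latest` bounds expressed as epoch-second strings, so one plausible mapping looks like the sketch below. The helper names `to_slack_ts` and `history_params` and the channel id are hypothetical. The verified source may build its requests differently.

```python
from datetime import datetime, timezone
from typing import Dict, Optional


def to_slack_ts(dt: datetime) -> str:
    """Format a timezone-aware datetime as a Slack-style epoch timestamp string."""
    return f"{dt.timestamp():.6f}"


def history_params(
    channel_id: str, start: datetime, end: Optional[datetime], limit: int
) -> Dict[str, str]:
    """Build query parameters for one `conversations.history` request."""
    params = {"channel": channel_id, "oldest": to_slack_ts(start), "limit": str(limit)}
    if end is not None:
        params["latest"] = to_slack_ts(end)
    return params


params = history_params(
    "C001",  # hypothetical channel id
    start=datetime(2023, 9, 1, tzinfo=timezone.utc),
    end=datetime(2023, 9, 8, tzinfo=timezone.utc),
    limit=1000,
)
print(params)
```

Leaving `end` as `None` omits `latest`, which corresponds to an open-ended range such as the default `end_date`.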

### Resource `access_logs`

This method retrieves access logs from the Slack API.

```python
@dlt.resource(
    name="access_logs",
    selected=False,
    primary_key="user_id",
    write_disposition="append",
)
# This is not an incremental resource; it only applies an end_date filter.
def logs_resource() -> Iterable[TDataItem]:
    ...  # body omitted in this excerpt
```

`selected`: A boolean set to False, indicating the resource isn't loaded by default.

`primary_key`: The unique identifier is "user_id".

`write_disposition`: Set to "append", allowing new data to join existing data in the destination.
> Note: Access logs are available only to paid Slack accounts. On non-paying accounts this resource raises an error, so it may not work in the pipeline or tests.
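
Slack API responses include an `ok` flag and, on failure, an `error` code, so a failure like the one described in the note can be detected before any rows are yielded. The sketch below shows one way to surface it. `SlackApiError` and `ensure_ok` are hypothetical helpers, and the `paid_only` error code in the sample payload is illustrative rather than taken from the verified source.

```python
from typing import Any, Dict


class SlackApiError(Exception):
    """Raised when a Slack response reports `ok: false` (hypothetical helper class)."""


def ensure_ok(response: Dict[str, Any]) -> Dict[str, Any]:
    """Return the response unchanged, or raise with the error code Slack reported."""
    if not response.get("ok", False):
        raise SlackApiError(response.get("error", "unknown_error"))
    return response


# Sample payload; the error code is illustrative.
try:
    ensure_ok({"ok": False, "error": "paid_only"})
except SlackApiError as exc:
    print(f"access_logs skipped: {exc}")
```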

### Create your own pipeline

If you wish to create your own pipelines, you can leverage source and resource methods from this
verified source.

1. Configure the pipeline by specifying the pipeline name, destination, and dataset as follows:

```python
pipeline = dlt.pipeline(
    pipeline_name="slack",  # Use a custom name if desired
    destination="duckdb",  # Choose the appropriate destination (e.g., duckdb, redshift, postgres)
    dataset_name="slack_data"  # Use a custom name if desired
)
```
1. To load Slack resources from the specified start date:

```python
source = slack_source(page_size=1000, start_date=datetime(2023, 9, 1), end_date=datetime(2023, 9, 8))

# Uncomment the line below to also load 'access_logs' (available for paid accounts only).
# source.access_logs.selected = True

load_info = pipeline.run(source)
print(load_info)
```
> Subsequent runs will load only items updated since the previous run.

1. To load data from selected channels within a date range:

```python
# Load data only from the selected channels.
selected_channels = ["Please set me up!"]  # Enter the channel name here.

source = slack_source(
    page_size=20,
    selected_channels=selected_channels,
    start_date=datetime(2023, 9, 1),
    end_date=datetime(2023, 9, 8),
)

load_info = pipeline.run(source)
print(load_info)
```
> This loads data from September 1, 2023 to September 8, 2023.

1 change: 1 addition & 0 deletions docs/website/sidebars.js
@@ -52,6 +52,7 @@ const sidebars = {
'dlt-ecosystem/verified-sources/salesforce',
'dlt-ecosystem/verified-sources/shopify',
'dlt-ecosystem/verified-sources/sql_database',
'dlt-ecosystem/verified-sources/slack',
'dlt-ecosystem/verified-sources/strapi',
'dlt-ecosystem/verified-sources/stripe',
'dlt-ecosystem/verified-sources/workable',