diff --git a/docs/website/docs/dlt-ecosystem/verified-sources/airtable.md b/docs/website/docs/dlt-ecosystem/verified-sources/airtable.md index 480b8f4a7d..6718ff15c2 100644 --- a/docs/website/docs/dlt-ecosystem/verified-sources/airtable.md +++ b/docs/website/docs/dlt-ecosystem/verified-sources/airtable.md @@ -6,7 +6,7 @@ keywords: [airtable api, airtable verified source, airtable] # Airtable -Airtable is a cloud-based platform that merges spreadsheet and database functionalities for easy +[Airtable](https://www.airtable.com/) is a cloud-based platform that merges spreadsheet and database functionalities for easy data management and collaboration. This Airtable `dlt` verified source and @@ -24,8 +24,6 @@ Sources and resources that can be loaded using this verified source are: ### Grab Airtable personal access tokens - - 1. Click your account icon top-right. 1. Choose "Developer Hub" from the dropdown. 1. Select "Personal access token" on the left, then "Create new token". @@ -106,6 +104,8 @@ For more information, read the > Optionally, you can also input "base_id" and "table_names" in the script, as in the pipeline > example. +For more information, read the [General Usage: Credentials.](../../general-usage/credentials) + ## Run the pipeline 1. Before running the pipeline, ensure that you have installed all the necessary dependencies by @@ -118,7 +118,7 @@ For more information, read the 1. You're now ready to run the pipeline! To get started, run the following command: ```bash - python3 airtable_pipeline.py + python airtable_pipeline.py ``` 1. Once the pipeline has finished running, you can verify that everything loaded correctly by using diff --git a/docs/website/docs/dlt-ecosystem/verified-sources/asana.md b/docs/website/docs/dlt-ecosystem/verified-sources/asana.md index 3dda07a106..6a66f9c739 100644 --- a/docs/website/docs/dlt-ecosystem/verified-sources/asana.md +++ b/docs/website/docs/dlt-ecosystem/verified-sources/asana.md @@ -12,7 +12,7 @@ keywords: [asana api, verified source, asana] or [book a call](https://calendar.app.google/kiLhuMsWKpZUpfho6) with our support engineer Adrian. ::: -Asana is a widely used web-based project management and collaboration tool that helps teams stay +[Asana](https://asana.com) is a widely used web-based project management and collaboration tool that helps teams stay organized, focused, and productive. With Asana, team members can easily create, assign, and track tasks, set deadlines, and communicate with each other in real-time. @@ -47,8 +47,8 @@ To get a complete list of sub-endpoints that can be loaded, see 1. This token will be used to configure `.dlt/secrets.toml`, so keep it secure and don't share it with anyone. -More information you can see in the -[Asana official documentation](https://developers.asana.com/docs/authentication). +> Note: The Asana UI, which is described here, might change. +The full guide is available at [this link.](https://developers.asana.com/docs/authentication) ### Initialize the verified source @@ -100,7 +100,7 @@ For more information, read the [General Usage: Credentials.](../../general-usage ``` 1. You're now ready to run the pipeline! To get started, run the following command: ```bash - python3 asana_dlt_pipeline.py + python asana_dlt_pipeline.py ``` 1. 
Once the pipeline has finished running, you can verify that everything loaded correctly by using the following command: diff --git a/docs/website/docs/dlt-ecosystem/verified-sources/chess.md b/docs/website/docs/dlt-ecosystem/verified-sources/chess.md index 2ff2cc3237..e528f57d87 100644 --- a/docs/website/docs/dlt-ecosystem/verified-sources/chess.md +++ b/docs/website/docs/dlt-ecosystem/verified-sources/chess.md @@ -74,7 +74,7 @@ For more information, read the [General Usage: Credentials.](../../general-usage 1. You're now ready to run the pipeline! To get started, run the following command: ```bash - python3 chess_pipeline.py + python chess_pipeline.py ``` 1. Once the pipeline has finished running, you can verify that everything loaded correctly by using diff --git a/docs/website/docs/dlt-ecosystem/verified-sources/facebook_ads.md b/docs/website/docs/dlt-ecosystem/verified-sources/facebook_ads.md index c565095230..db3c6e0b81 100644 --- a/docs/website/docs/dlt-ecosystem/verified-sources/facebook_ads.md +++ b/docs/website/docs/dlt-ecosystem/verified-sources/facebook_ads.md @@ -18,7 +18,7 @@ Facebook and its affiliated apps like Instagram and Messenger. This Facebook `dlt` verified source and [pipeline example](https://github.com/dlt-hub/verified-sources/blob/master/sources/facebook_ads_pipeline.py) -loads data using “Facebook Marketing API” to the destination of your choice. +loads data using [Facebook Marketing API](https://developers.facebook.com/products/marketing-api/) to the destination of your choice. The endpoints that this verified source supports are: @@ -85,13 +85,17 @@ debug_access_token() We highly recommend you to add the token expiration timestamp to get notified a week before token expiration that you need to rotate it. Right now the notifications are sent to logger with error -level. In config.toml / secrets.toml: +level. In `config.toml` / `secrets.toml`: ```toml [sources.facebook_ads] -access_token_expires_at=1688821881 +access_token_expires_at=1688821881... ``` +> Note: The Facebook UI, which is described here, might change. +The full guide is available at [this link.](https://developers.facebook.com/docs/marketing-apis/overview/authentication) + + ### Initialize the verified source To get started with your data pipeline, follow these steps: @@ -150,6 +154,8 @@ For more information, read the 1. Replace the value of the "account id" with the one [copied above](#grab-account-id). +For more information, read the [General Usage: Credentials.](../../general-usage/credentials) + ## Run the pipeline 1. Before running the pipeline, ensure that you have installed all the necessary dependencies by @@ -159,7 +165,7 @@ For more information, read the ``` 1. You're now ready to run the pipeline! To get started, run the following command: ```bash - python3 facebook_ads_pipeline.py + python facebook_ads_pipeline.py ``` 1. Once the pipeline has finished running, you can verify that everything loaded correctly by using the following command: @@ -269,7 +275,7 @@ def facebook_insights_source( ) -> DltResource: ``` -`account_id`: Account id associated with add manager, configured in _config.toml_. +`account_id`: Account id associated with ads manager, configured in _config.toml_. `access_token`: Access token associated with the Business Facebook App, configured in _secrets.toml_. 
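+
+A minimal pipeline script for this source could look like the sketch below. The pipeline, destination,
+and dataset names are placeholders; `account_id` and `access_token` are assumed to be configured in
+`.dlt/config.toml` and `.dlt/secrets.toml` as described above.
+
+```python
+import dlt
+from facebook_ads import facebook_insights_source
+
+# account_id and access_token are read from .dlt/config.toml and .dlt/secrets.toml
+pipeline = dlt.pipeline(
+    pipeline_name="facebook_insights",      # Use a custom name if desired
+    destination="duckdb",                   # Choose the appropriate destination
+    dataset_name="facebook_insights_data",  # Use a custom name if desired
+)
+
+load_info = pipeline.run(facebook_insights_source())
+print(load_info)
+```
+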
diff --git a/docs/website/docs/dlt-ecosystem/verified-sources/github.md b/docs/website/docs/dlt-ecosystem/verified-sources/github.md index f68d872493..539d6131ae 100644 --- a/docs/website/docs/dlt-ecosystem/verified-sources/github.md +++ b/docs/website/docs/dlt-ecosystem/verified-sources/github.md @@ -13,7 +13,7 @@ or [book a call](https://calendar.app.google/kiLhuMsWKpZUpfho6) with our support ::: This verified source can be used to load data on issues or pull requests from any GitHub repository -onto a [destination](../../dlt-ecosystem/destinations) of your choice. +onto a [destination](../../dlt-ecosystem/destinations) of your choice using [GitHub API](https://docs.github.com/en/rest?apiVersion=2022-11-28). Resources that can be loaded using this verified source are: @@ -52,14 +52,13 @@ To get the API token, sign-in to your GitHub account and follow these steps: 1. Copy the token and save it. This is to be added later in the `dlt` configuration. -> You can optionally add API access tokens to avoid making requests as an unauthorized user.\ -> Note: +> You can optionally add API access tokens to avoid making requests as an unauthorized user. > If you wish to load data using the github_reaction source, the access token is mandatory. More information you can see in the [GitHub authentication](https://docs.github.com/en/rest/overview/authenticating-to-the-rest-api?apiVersion=2022-11-28#basic-authentication) and -[Github API token scopes](https://docs.github.com/en/apps/oauth-apps/building-oauth-apps/scopes-for-oauth-apps) +[GitHub API token scopes](https://docs.github.com/en/apps/oauth-apps/building-oauth-apps/scopes-for-oauth-apps) documentations. ### Initialize the verified source @@ -106,7 +105,7 @@ For more information, read the add credentials for your chosen destination, ensuring proper routing of your data to the final destination. -For more information, read the [Walkthrough: Run a pipeline.](../../walkthroughs/run-a-pipeline) +For more information, read the [General Usage: Credentials.](../../general-usage/credentials) ## Run the pipeline @@ -117,7 +116,7 @@ For more information, read the [Walkthrough: Run a pipeline.](../../walkthroughs ``` 1. You're now ready to run the pipeline! To get started, run the following command: ```bash - python3 github_pipeline.py + python github_pipeline.py ``` 1. Once the pipeline has finished running, you can verify that everything loaded correctly by using the following command: @@ -127,6 +126,8 @@ For more information, read the [Walkthrough: Run a pipeline.](../../walkthroughs For example, the `pipeline_name` for the above pipeline example is `github_reactions`, you may also use any custom name instead. +For more information, read the [Walkthrough: Run a pipeline.](../../walkthroughs/run-a-pipeline) + ## Sources and resources `dlt` works on the principle of [sources](../../general-usage/source) and diff --git a/docs/website/docs/dlt-ecosystem/verified-sources/google_analytics.md b/docs/website/docs/dlt-ecosystem/verified-sources/google_analytics.md index 0b063c3bda..cf2a6c7a4a 100644 --- a/docs/website/docs/dlt-ecosystem/verified-sources/google_analytics.md +++ b/docs/website/docs/dlt-ecosystem/verified-sources/google_analytics.md @@ -36,18 +36,18 @@ tokens are preferred when user consent is required, while service account creden suited for server-to-server interactions. You can choose the method of authentication as per your requirement. 
-### Grab google service account credentials +### Grab Google service account credentials -You need to create a GCP service account to get API credentials, if you don't have one. To create -one follow these steps: +You need to create a GCP service account to get API credentials if you don't have one. To create + one, follow these steps: 1. Sign in to [console.cloud.google.com](http://console.cloud.google.com/). -1. [Create a service account](https://cloud.google.com/iam/docs/service-accounts-create#creating) if +1. [Create a service account](https://cloud.google.com/iam/docs/service-accounts-create#creating) if needed. 1. Enable "Google Analytics API", refer - [google documentation](https://support.google.com/googleapi/answer/6158841?hl=en) for + [Google documentation](https://support.google.com/googleapi/answer/6158841?hl=en) for comprehensive instructions on this process. 1. Generate credentials: @@ -60,7 +60,7 @@ one follow these steps: ### Grab google OAuth credentials -You need to create a GCP account to get OAuth credentials, if you don't have one. To create one +You need to create a GCP account to get OAuth credentials if you don't have one. To create one, follow these steps: 1. Ensure your email used for the GCP account has access to the GA4 property. @@ -91,10 +91,10 @@ follow these steps: 1. Add your email as a test user. After configuring "client_id", "client_secret" and "project_id" in "secrets.toml". To generate the -refresh token run the following script from the root folder: +refresh token, run the following script from the root folder: ```bash -python3 google_analytics/setup_script_gcp_oauth.py +python google_analytics/setup_script_gcp_oauth.py ``` Once you have executed the script and completed the authentication, you will receive a "refresh @@ -161,10 +161,10 @@ For more information, read the 1. From the ".json" that you [downloaded earlier](google_analytics.md#grab-google-service-account-credentials), - copy `project_id`, `private_key`, - and `client_email` under `[sources.google_analytics.credentials]`. + copy `project_id`, `private_key`, + and `client_email` under `[sources.google_analytics.credentials]`. -1. Alternatively, if you're using OAuth credentials, replace the the fields and values with those +1. Alternatively, if you're using OAuth credentials, replace the fields and values with those you [grabbed for OAuth credentials](google_analytics.md#grab-google-oauth-credentials). 1. The secrets.toml for OAuth authentication looks like: @@ -177,7 +177,7 @@ For more information, read the project_id = "project_id" # please set me up! ``` -1. Finally, enter credentials for your chosen destination as per the [docs](../destinations/). +1. Finally, enter credentials for your chosen destination as per the [docs](../destinations/). #### Pass `property_id` and `request parameters` @@ -209,6 +209,8 @@ For more information, read the 1. To use queries from `.dlt/config.toml`, run the `simple_load_config()` function in [pipeline example](https://github.com/dlt-hub/verified-sources/blob/master/sources/google_analytics_pipeline.py). +For more information, read the [General Usage: Credentials.](../../general-usage/credentials) + ## Run the pipeline 1. Before running the pipeline, ensure that you have installed all the necessary dependencies by @@ -218,7 +220,7 @@ For more information, read the ``` 1. You're now ready to run the pipeline! To get started, run the following command: ```bash - python3 google_analytics_pipeline.py + python google_analytics_pipeline.py ``` 1. 
Once the pipeline has finished running, you can verify that everything loaded correctly by using the following command: @@ -295,6 +297,7 @@ def metrics_table(metadata: Metadata) -> Iterator[TDataItem]: Similarly, there is a transformer function called `dimensions_table` that populates table called "dimensions" with the data from each dimension. +## Customization ### Create your own pipeline If you wish to create your own pipelines, you can leverage source and resource methods from this @@ -304,9 +307,9 @@ verified source. ```python pipeline = dlt.pipeline( - pipeline_name="google_analytics", # Use a custom name if desired - destination="duckdb", # Choose the appropriate destination (e.g., duckdb, redshift, post) - dataset_name="GA4_data" # Use a custom name if desired + pipeline_name="google_analytics", # Use a custom name if desired + destination="duckdb", # Choose the appropriate destination (e.g., duckdb, redshift, post) + dataset_name="GA4_data" # Use a custom name if desired ) ``` diff --git a/docs/website/docs/dlt-ecosystem/verified-sources/google_sheets.md b/docs/website/docs/dlt-ecosystem/verified-sources/google_sheets.md index aaf823b702..b0ab4ecaa2 100644 --- a/docs/website/docs/dlt-ecosystem/verified-sources/google_sheets.md +++ b/docs/website/docs/dlt-ecosystem/verified-sources/google_sheets.md @@ -41,10 +41,10 @@ OAuth tokens are preferred when user consent is required, while service account better suited for server-to-server interactions. Here we recommend using service account credentials. You can choose the method of authentication as per your requirement. -### Grab google service account credentials +### Grab Google service account credentials -You need to create a GCP service account to get API credentials, if you don't have one. To create -one follow these steps: +You need to create a GCP service account to get API credentials if you don't have one. To create + one, follow these steps: 1. Sign in to [console.cloud.google.com](http://console.cloud.google.com/). @@ -52,7 +52,7 @@ one follow these steps: needed. 1. Enable "Google Sheets API", refer - [google documentation](https://developers.google.com/sheets/api/guides/concepts) for + [Google documentation](https://developers.google.com/sheets/api/guides/concepts) for comprehensive instructions on this process. 1. Generate credentials: @@ -65,7 +65,7 @@ one follow these steps: ### Grab google OAuth credentials -You need to create a GCP account to get OAuth credentials, if you don't have one. To create one +You need to create a GCP account to get OAuth credentials if you don't have one. To create one, follow these steps: 1. Ensure your email used for the GCP account has access to the GA4 property. @@ -98,10 +98,10 @@ follow these steps: 1. Generate `refresh_token`: After configuring "client_id", "client_secret" and "project_id" in "secrets.toml". To generate - the refresh token run the following script from the root folder: + the refresh token, run the following script from the root folder: ```bash - python3 google_sheets/setup_script_gcp_oauth.py + python google_sheets/setup_script_gcp_oauth.py ``` Once you have executed the script and completed the authentication, you will receive a "refresh @@ -115,7 +115,7 @@ follow these steps: To allow the API to access the Google Sheet, open the sheet that you wish to use and do the following: -1. Select the share button on the top left corner. +1. Select the share button in the top left corner. 
![Share_Button](docs_images/Share_button.png) @@ -128,37 +128,37 @@ following: ### Guidelines about headers -Make sure your data has headers and is in a form of well structured table. +Make sure your data has headers and is in the form of well-structured table. -First row of any extracted range should contain headers. Please make sure: +The first row of any extracted range should contain headers. Please make sure: 1. The header names are strings and are unique. -1. That all the columns that you intend to extract have a header. -1. That data starts exactly at the origin of the range - otherwise source will remove padding but it +1. All the columns that you intend to extract have a header. +1. The data starts exactly at the origin of the range - otherwise a source will remove padding, but it is a waste of resources. - > When source detects any problems with headers or table layout it will issue a WARNING in the - > log. Hence we advice to run your pipeline script manually/locally and fix all the problems. + > When a source detects any problems with headers or table layout, it will issue a WARNING in the + > log. Hence, we advise running your pipeline script manually/locally and fixing all the problems. 1. Columns without headers will be removed and not extracted. -1. Columns with headers that does not contain any data will be removed. -1. If there's any problems with reading headers (ie. header is not string or is empty or not +1. Columns with headers that do not contain any data will be removed. +1. If there are any problems with reading headers (i.e. header is not string or is empty or not unique): the headers row will be extracted as data and automatic header names will be used. 1. Empty rows are ignored 1. `dlt` will normalize range names and headers into table and column names - so they may be - different in the database than in google sheets. Prefer small cap names without special + different in the database than in Google Sheets. Prefer small cap names without special characters. ### Guidelines about named ranges -We recommend to to use +We recommend to use [Named Ranges](https://support.google.com/docs/answer/63175?hl=en&co=GENIE.Platform%3DDesktop) to -indicate which data should be extracted from a particular spreadsheet and this is how this source +indicate which data should be extracted from a particular spreadsheet, and this is how this source will work by default - when called without setting any other options. All the named ranges will be converted into tables, named after them and stored in the destination. -1. You can let the spreadsheet users to add and remove tables by just adding/removing the ranges, +1. You can let the spreadsheet users add and remove tables by just adding/removing the ranges, you do not need to configure the pipeline again. -1. You can indicate exactly the fragments of interest and only this data will be retrieved so it is +1. You can indicate exactly the fragments of interest, and only this data will be retrieved, so it is the fastest. 1. You can name database tables by changing the range names. @@ -169,7 +169,7 @@ converted into tables, named after them and stored in the destination. range_names = ["Range_1","Range_2","Sheet1!A1:D10"] ``` -1. You can pass explicit ranges to the google spreadsheet "ranged_names" as: +1. You can pass explicit ranges to the Google Spreadsheet "ranged_names" as: | Name | Example | | ------------ | ----------------------------------------- | @@ -186,8 +186,8 @@ If you are not happy with the workflow above, you can: 1. 
Pass a list of ranges as supported by Google Sheets in range_names. > Note: To retrieve all named ranges with "get_named_ranges" or all sheets with "get_sheets" - > methods, pass an empty `range_names` list as `range_names = []`. Even when you use set - > "get_named_ranges" to false pass the range_names as empty list to get all the sheets with + > methods, pass an empty `range_names` list as `range_names = []`. Even when you use a set + > "get_named_ranges" to false pass the range_names as an empty list to get all the sheets with > "get_sheets" method. ### Initialize the verified source @@ -231,7 +231,7 @@ For more information, read the [downloaded earlier](google_sheets.md#grab-google-service-account-credentials), copy `project_id`, `private_key`, and `client_email` under `[sources.google_sheets.credentials]`. -1. Alternatively, if you're using OAuth credentials, replace the the fields and values with those +1. Alternatively, if you're using OAuth credentials, replace the fields and values with those you [grabbed for OAuth credentials](google_sheets.md#grab-google-oauth-credentials). 1. The secrets.toml for OAuth authentication looks like: @@ -269,9 +269,11 @@ For more information, read the spreadsheet_identifier="1VTtCiYgxjAwcIw7UM1_BSaxC3rzIpr0HwXZwd2OlPD4" ``` -> Note: You have option to pass "range_names" and "spreadsheet_identifier" directly to the +> Note: You have an option to pass "range_names" and "spreadsheet_identifier" directly to the > google_spreadsheet function or in ".dlt/config.toml" +For more information, read the [General Usage: Credentials.](../../general-usage/credentials) + ## Run the pipeline 1. Before running the pipeline, ensure that you have installed all the necessary dependencies by @@ -382,6 +384,7 @@ dlt.resource( case,"spreadsheet_id", means that the records will be merged based on the values in this column. [Read more](https://dlthub.com/docs/general-usage/incremental-loading#merge-incremental_loading). +## Customization ### Create your own pipeline If you wish to create your own pipelines, you can leverage source and resource methods from this diff --git a/docs/website/docs/dlt-ecosystem/verified-sources/hubspot.md b/docs/website/docs/dlt-ecosystem/verified-sources/hubspot.md index 4e5b836a66..c305b5b842 100644 --- a/docs/website/docs/dlt-ecosystem/verified-sources/hubspot.md +++ b/docs/website/docs/dlt-ecosystem/verified-sources/hubspot.md @@ -63,6 +63,11 @@ Follow these steps: 1. Click "Show token" and store it for ".dlt/secrets.toml". + +> Note: The Hubspot UI, which is described here, might change. +The full guide is available at [this link.](https://knowledge.hubspot.com/integrations/how-do-i-get-my-hubspot-api-key) + + ### Initialize the verified source To get started with your data pipeline, follow these steps: @@ -105,6 +110,8 @@ For more information, read the 1. Enter credentials for your chosen destination as per the [docs](../destinations/). +For more information, read the [General Usage: Credentials.](../../general-usage/credentials) + ## Run the pipeline 1. Before running the pipeline, ensure that you have installed all the necessary dependencies by @@ -114,7 +121,7 @@ For more information, read the ``` 1. You're now ready to run the pipeline! To get started, run the following command: ```bash - python3 hubspot_pipeline.py + python hubspot_pipeline.py ``` 1. 
Once the pipeline has finished running, you can verify that everything loaded correctly by using the following command: diff --git a/docs/website/docs/dlt-ecosystem/verified-sources/jira.md b/docs/website/docs/dlt-ecosystem/verified-sources/jira.md index 76d8b1e67e..cbc24d2056 100644 --- a/docs/website/docs/dlt-ecosystem/verified-sources/jira.md +++ b/docs/website/docs/dlt-ecosystem/verified-sources/jira.md @@ -41,6 +41,10 @@ To get a complete list of sub-endpoints that can be loaded, see 1. Safely copy the newly generated access token. +> Note: The Jira UI, which is described here, might change. +The full guide is available at [this link.](https://support.atlassian.com/atlassian-account/docs/manage-api-tokens-for-your-atlassian-account/) + + ### Initialize the verified source To get started with your data pipeline, follow these steps: @@ -93,6 +97,8 @@ For more information, read the add credentials for your chosen destination, ensuring proper routing of your data to the final destination. +For more information, read the [General Usage: Credentials.](../../general-usage/credentials) + ## Run the pipeline 1. Before running the pipeline, ensure that you have installed all the necessary dependencies by @@ -102,7 +108,7 @@ For more information, read the ``` 1. You're now ready to run the pipeline! To get started, run the following command: ```bash - python3 jira_pipeline.py + python jira_pipeline.py ``` 1. Once the pipeline has finished running, you can verify that everything loaded correctly by using the following command: @@ -171,7 +177,8 @@ def issues(jql_queries: List[str]) -> Iterable[TDataItem]: `jql_queries`: Accepts a list of JQL queries. -## Create Your Data Loading Pipeline +## Customization +### Create your own pipeline If you wish to create your own pipelines you can leverage source and resource methods as discussed above. diff --git a/docs/website/docs/dlt-ecosystem/verified-sources/matomo.md b/docs/website/docs/dlt-ecosystem/verified-sources/matomo.md index 4a3527633f..374ec92ab9 100644 --- a/docs/website/docs/dlt-ecosystem/verified-sources/matomo.md +++ b/docs/website/docs/dlt-ecosystem/verified-sources/matomo.md @@ -16,9 +16,9 @@ loads data using “Matomo API” to the destination of your choice. The endpoints that this verified source supports are: | Name | Description | -| ----------------- | ------------------------------------------------------------------------------- | -| matomo_reports | detailed analytics summaries of website traffic, visitor behavior, and more | -| matomo_visits | individual user sessions on your website, pages viewed, visit duration and more | +| ----------------- |---------------------------------------------------------------------------------| +| matomo_reports | Detailed analytics summaries of website traffic, visitor behavior, and more | +| matomo_visits | Individual user sessions on your website, pages viewed, visit duration and more | ## Setup Guide @@ -36,6 +36,9 @@ The endpoints that this verified source supports are: 1. Your Matomo URL is the web address in your browser when logged into Matomo, typically "https://mycompany.matomo.cloud/". Update it in the `.dlt/config.toml`. 1. The site_id is a unique ID for each monitored site in Matomo, found in the URL or via Administration > Measureables > Manage under ID. +> Note: The Matomo UI, which is described here, might change. 
+The full guide is available at [this link.](https://developer.matomo.org/guides/authentication-in-depth) + ### Initialize the verified source To get started with your data pipeline, follow these steps: @@ -95,6 +98,8 @@ For more information, read the 1. To monitor live events on a website, enter the `live_event_site_id` (usually it is same as `site_id`). +For more information, read the [General Usage: Credentials.](../../general-usage/credentials) + ## Run the pipeline 1. Before running the pipeline, ensure that you have installed all the necessary dependencies by @@ -104,7 +109,7 @@ For more information, read the ``` 1. You're now ready to run the pipeline! To get started, run the following command: ```bash - python3 matomo_pipeline.py + python matomo_pipeline.py ``` 1. Once the pipeline has finished running, you can verify that everything loaded correctly by using the following command: @@ -263,14 +268,14 @@ verified source. ```python queries = [ - { - "resource_name": "custom_report_name", - "methods": ["CustomReports.getCustomReport"], - "date": "2023-01-01", - "period": "day", - "extra_params": {"idCustomReport": 1}, #id of the report - }, - ] + { + "resource_name": "custom_report_name", + "methods": ["CustomReports.getCustomReport"], + "date": "2023-01-01", + "period": "day", + "extra_params": {"idCustomReport": 1}, #id of the report + }, + ] site_id = 1 #id of the site for which reports are being loaded diff --git a/docs/website/docs/dlt-ecosystem/verified-sources/mongodb.md b/docs/website/docs/dlt-ecosystem/verified-sources/mongodb.md index bfb7a8d99d..35f042f969 100644 --- a/docs/website/docs/dlt-ecosystem/verified-sources/mongodb.md +++ b/docs/website/docs/dlt-ecosystem/verified-sources/mongodb.md @@ -17,7 +17,7 @@ documents. This MongoDB `dlt` verified source and [pipeline example](https://github.com/dlt-hub/verified-sources/blob/master/sources/mongodb_pipeline.py) -loads data using “MongoDB" source to the destination of your choice. +loads data using "MongoDB" source to the destination of your choice. Sources and resources that can be loaded using this verified source are: @@ -59,7 +59,7 @@ Here are the typical ways to configure MongoDB and their connection URLs: #### Grab `database and collections` -1. To grab "database and collections" you must have MongoDB shell installed. For installation +1. To grab "database and collections" you must have MongoDB shell installed. For installation guidance, refer to [documentation here.](https://www.mongodb.com/docs/mongodb-shell/install/) 1. Modify the example URLs with your credentials (dbuser & passwd) and host details. @@ -146,10 +146,10 @@ For more information, read the connection_url = "mongodb connection_url" # please set me up! ``` -1. Replace the connection_url value with the [previously copied one](#grab-connection_url) to ensure +1. Replace the `connection_url` value with the [previously copied one](#grab-connection_url) to ensure secure access to your MongoDB sources. -1. Next, Follow the [destination documentation](../../dlt-ecosystem/destinations) instructions to +1. Next, follow the [destination documentation](../../dlt-ecosystem/destinations) instructions to add credentials for your chosen destination, ensuring proper routing of your data to the final destination. @@ -169,6 +169,8 @@ For more information, read the 1. Replace the value of the "database" and "collections_names" with the ones [copied above](#grab-database-and-collections). 
+For more information, read the [General Usage: Credentials.](../../general-usage/credentials) + ## Run the pipeline 1. Before running the pipeline, ensure that you have installed all the necessary dependencies by @@ -178,7 +180,7 @@ For more information, read the ``` 1. You're now ready to run the pipeline! To get started, run the following command: ```bash - python3 mongodb_pipeline.py + python mongodb_pipeline.py ``` 1. Once the pipeline has finished running, you can verify that everything loaded correctly by using the following command: @@ -237,6 +239,8 @@ def mongodb_collection( `collection`: Name of the collection to load. + +## Customization ### Create your own pipeline If you wish to create your own pipelines, you can leverage source and resource methods from this @@ -246,9 +250,9 @@ verified source. ```python pipeline = dlt.pipeline( - pipeline_name="mongodb_pipeline", # Use a custom name if desired - destination="duckdb", # Choose the appropriate destination (e.g., duckdb, redshift, post) - dataset_name="mongodb_data" # Use a custom name if desired + pipeline_name="mongodb_pipeline", # Use a custom name if desired + destination="duckdb", # Choose the appropriate destination (e.g., duckdb, redshift, post) + dataset_name="mongodb_data" # Use a custom name if desired ) ``` @@ -281,17 +285,17 @@ verified source. ```python load_data = mongodb_collection( - collection="movies", - incremental=dlt.sources.incremental( - "lastupdated", initial_value=pendulum.DateTime(2020, 9, 10, 0, 0, 0) + collection="movies", + incremental=dlt.sources.incremental( + "lastupdated", initial_value=pendulum.DateTime(2020, 9, 10, 0, 0, 0) )) - + load_info = pipeline.run(load_data, write_disposition="merge") ``` - > The source function "mongodb_collection" loads data from a particular single - > collection, where as source "mongodb" can load data from multiple collections. + > The source function "mongodb_collection" loads data from a particular single + > collection, where as source "mongodb" can load data from multiple collections. > This script configures incremental loading from the "movies" collection based on the > "lastupdated" field, starting from midnight on September 10, 2020. @@ -301,9 +305,9 @@ verified source. # Suitable for tables where new rows are added, but existing rows aren't updated. # Load data from the 'listingsAndReviews' collection in MongoDB, using 'last_scraped' for incremental addition. airbnb = mongodb().with_resources("listingsAndReviews") - + airbnb.listingsAndReviews.apply_hints( - incremental=dlt.sources.incremental("last_scraped") + incremental=dlt.sources.incremental("last_scraped") ) info = pipeline.run(airbnb, write_disposition="append") diff --git a/docs/website/docs/dlt-ecosystem/verified-sources/mux.md b/docs/website/docs/dlt-ecosystem/verified-sources/mux.md index 344a896665..5fb794628e 100644 --- a/docs/website/docs/dlt-ecosystem/verified-sources/mux.md +++ b/docs/website/docs/dlt-ecosystem/verified-sources/mux.md @@ -37,6 +37,9 @@ loads data using “Mux API” to the destination of your choice. 1. Copy the API access token and secret key for later configuration. +> Note: The Mux UI, which is described here, might change. +The full guide is available at [this link.](https://docs.mux.com/guides/system/make-api-requests) + ### Initialize the verified source To get started with your data pipeline, follow these steps: @@ -78,7 +81,9 @@ For more information, read the 1. Replace the API access and secret key with the ones that you [copied above](#grab-credentials). 
This will ensure that this source can access your Mux resources securely. -1. Finally, enter credentials for your chosen destination as per the [docs](../destinations/). +1. Finally, enter credentials for your chosen destination as per the [docs](../destinations/). + +For more information, read the [General Usage: Credentials.](../../general-usage/credentials) ## Run the pipeline @@ -89,7 +94,7 @@ For more information, read the ``` 1. You're now ready to run the pipeline! To get started, run the following command: ```bash - python3 mux_pipeline.py + python mux_pipeline.py ``` 1. Once the pipeline has finished running, you can verify that everything loaded correctly by using the following command: @@ -107,7 +112,7 @@ For more information, read the [Walkthrough: Run a pipeline.](../../walkthroughs [resources](../../general-usage/resource). -### Source `mux_source` +### Source `mux_source` This function yields resources "asset_resource" and "views_resource" to load video assets and views. @@ -152,6 +157,8 @@ def views_resource( The arguments `mux_api_access_token`, `mux_api_secret_key` and `limit` are the same as described [above](#resource-assets_resource) in "asset_resource". + +## Customization ### Create your own pipeline If you wish to create your own pipelines, you can leverage source and resource methods from this diff --git a/docs/website/docs/dlt-ecosystem/verified-sources/notion.md b/docs/website/docs/dlt-ecosystem/verified-sources/notion.md index 079baf7075..358ba8c547 100644 --- a/docs/website/docs/dlt-ecosystem/verified-sources/notion.md +++ b/docs/website/docs/dlt-ecosystem/verified-sources/notion.md @@ -29,6 +29,7 @@ Sources that can be loaded using this verified source are: 1. Click "New Integration" on the left and name it appropriately. 1. Finally, click on "Submit" located at the bottom of the page. + ### Add a connection to the database 1. Open the database that you want to load to the destination. @@ -39,6 +40,10 @@ Sources that can be loaded using this verified source are: 1. From the list of options, select the integration you previously created and click on "Confirm". +> Note: The Notion UI, which is described here, might change. +The full guide is available at [this link.](https://developers.notion.com/docs/authorization) + + ### Initialize the verified source To get started with your data pipeline, follow these steps: @@ -77,12 +82,14 @@ For more information, read the ``` 1. Replace the value of `api_key` with the one that [you copied above](notion.md#grab-credentials). - This will ensure that your data-verified source can access your notion resources securely. + This will ensure that your data-verified source can access your Notion resources securely. 1. Next, follow the instructions in [Destinations](../destinations/duckdb) to add credentials for your chosen destination. This will ensure that your data is properly routed to its final destination. +For more information, read the [General Usage: Credentials.](../../general-usage/credentials) + ## Run the pipeline 1. Before running the pipeline, ensure that you have installed all the necessary dependencies by @@ -92,7 +99,7 @@ For more information, read the ``` 1. You're now ready to run the pipeline! To get started, run the following command: ```bash - python3 notion_pipeline.py + python notion_pipeline.py ``` 1. 
Once the pipeline has finished running, you can verify that everything loaded correctly by using the following command: @@ -131,6 +138,8 @@ def notion_databases( It is important to note that the data is loaded in “replace” mode where the existing data is completely replaced. + +## Customization ### Create your own pipeline If you wish to create your own pipelines, you can leverage source and resource methods from this diff --git a/docs/website/docs/dlt-ecosystem/verified-sources/pipedrive.md b/docs/website/docs/dlt-ecosystem/verified-sources/pipedrive.md index 5b94bb582e..689f8f7808 100644 --- a/docs/website/docs/dlt-ecosystem/verified-sources/pipedrive.md +++ b/docs/website/docs/dlt-ecosystem/verified-sources/pipedrive.md @@ -44,8 +44,8 @@ Sources and resources that can be loaded using this verified source are: 1. Select the API tab. 1. Copy your API token (to be used in the dlt configuration). -You can learn more about Pipedrive API token authentication in the docs -[here](https://pipedrive.readme.io/docs/how-to-find-the-api-token). +> Note: The Pipedrive UI, which is described here, might change. +The full guide is available at [this link.](https://pipedrive.readme.io/docs/how-to-find-the-api-token) ### Initialize the verified source @@ -88,6 +88,8 @@ For more information, read the 1. Finally, enter credentials for your chosen destination as per the [docs](../destinations/). +For more information, read the [General Usage: Credentials.](../../general-usage/credentials) + ## Run the pipeline 1. Before running the pipeline, ensure that you have installed all the necessary dependencies by @@ -97,7 +99,7 @@ For more information, read the ``` 1. You're now ready to run the pipeline! To get started, run the following command: ```bash - python3 pipedrive_pipeline.py + python pipedrive_pipeline.py ``` 1. Once the pipeline has finished running, you can verify that everything loaded correctly by using the following command: @@ -147,8 +149,8 @@ def pipedrive_source( `pipedrive_api_key`: Authentication token for Pipedrive, configured in ".dlt/secrets.toml". -`since_timestamp`: Starting timestamp for incremental loading. By default complete history is loaded -on first run. And new data in subsequent runs. +`since_timestamp`: Starting timestamp for incremental loading. By default, complete history is loaded + on the first run. And new data in subsequent runs. > Note: Incremental loading can be enabled or disabled depending on user prefrences. @@ -221,7 +223,7 @@ entity exists. This updated state is then saved for future pipeline runs. ### Other functions -Similar to the above functions there are following: +Similar to the above functions, there are the following: `custom_fields_mapping`: Transformer function that parses and yields custom fields' mapping in order to be stored in destination by dlt. diff --git a/docs/website/docs/dlt-ecosystem/verified-sources/salesforce.md b/docs/website/docs/dlt-ecosystem/verified-sources/salesforce.md index 47a2fd29e0..c819c8120b 100644 --- a/docs/website/docs/dlt-ecosystem/verified-sources/salesforce.md +++ b/docs/website/docs/dlt-ecosystem/verified-sources/salesforce.md @@ -6,7 +6,7 @@ or [book a call](https://calendar.app.google/kiLhuMsWKpZUpfho6) with our support engineer Adrian. ::: -Salesforce is a cloud platform that streamlines business operations and customer relationship +[Salesforce](https://www.salesforce.com) is a cloud platform that streamlines business operations and customer relationship management, encompassing sales, marketing, and customer service. 
This Salesforce `dlt` verified source and @@ -55,6 +55,10 @@ To obtain the `security_token`, follow these steps: 1. Check your email for the token sent by Salesforce. +> Note: The Salesforce UI, which is described here, might change. +The full guide is available at [this link.](https://developer.salesforce.com/docs/atlas.en-us.api_rest.meta/api_rest/quickstart_oauth.htm) + + ### Initialize the verified source To get started with your data pipeline, follow these steps: @@ -102,6 +106,8 @@ For more information, read the add credentials for your chosen destination, ensuring proper routing of your data to the final destination. +For more information, read the [General Usage: Credentials.](../../general-usage/credentials) + ## Run the pipeline 1. Before running the pipeline, ensure that you have installed all the necessary dependencies by @@ -111,7 +117,7 @@ For more information, read the ``` 1. You're now ready to run the pipeline! To get started, run the following command: ```bash - python3 salesforce_pipeline.py + python salesforce_pipeline.py ``` 1. Once the pipeline has finished running, you can verify that everything loaded correctly by using the following command: @@ -158,13 +164,13 @@ def sf_user() -> Iterator[Dict[str, Any]]: yield from get_records(client, "User") ``` -Besides "sf_user", the there are several resources that use replace mode for data writing to the +Besides "sf_user", there are several resources that use replace mode for data writing to the destination. | user_role() | contact() | lead() | campaign() | product_2() | pricebook_2() | pricebook_entry() | |-------------|-----------|--------|------------|-------------|---------------|-------------------| -The described functions fetch records from endpoints based on their names, e.g., user_role() +The described functions fetch records from endpoints based on their names, e.g. user_role() accesses the "user_role" endpoint. ### Resource `opportunity` (incremental loading): @@ -189,7 +195,7 @@ def opportunity( It is configured to track "SystemModstamp" field in data item returned by "get_records" and then yielded. It will store the newest "SystemModstamp" value in dlt state and make it available in "last_timestamp.last_value" on next pipeline run. -Besides "opportunity", the there are several resources that use replace mode for data writing to the +Besides "opportunity", there are several resources that use replace mode for data writing to the destination. | opportunity_line_item() | opportunity_contact_role() | account() | campaign_member() | task() | event() | @@ -202,7 +208,7 @@ opportunity_line_item() accesses the "opportunity_line_item" endpoint. ### Create your own pipeline -If you wish to create your own pipelines you can leverage source and resource methods as discussed +If you wish to create your own pipelines, you can leverage source and resource methods as discussed above. To create your data pipeline using single loading and diff --git a/docs/website/docs/dlt-ecosystem/verified-sources/shopify.md b/docs/website/docs/dlt-ecosystem/verified-sources/shopify.md index 0d7482f7f7..ee91cdc0ea 100644 --- a/docs/website/docs/dlt-ecosystem/verified-sources/shopify.md +++ b/docs/website/docs/dlt-ecosystem/verified-sources/shopify.md @@ -91,13 +91,15 @@ For more information, read the shop_url = "Please set me up !" # please set me up! ``` -1. Update `shop_url` with the URL of your Shopify store. For example - ”https://shop-123.myshopify.com/%E2%80%9D. +1. Update `shop_url` with the URL of your Shopify store. 
For example, + "https://shop-123.myshopify.com/%E2%80%9D". 1. Next, follow the [destination documentation](../../dlt-ecosystem/destinations) instructions to add credentials for your chosen destination, ensuring proper routing of your data to the final destination. +For more information, read the [General Usage: Credentials.](../../general-usage/credentials) + ## Run the pipeline 1. Before running the pipeline, ensure that you have installed all the necessary dependencies by @@ -107,7 +109,7 @@ For more information, read the ``` 1. You're now ready to run the pipeline! To get started, run the following command: ```bash - python3 shopify_dlt_pipeline.py + python shopify_dlt_pipeline.py ``` 1. Once the pipeline has finished running, you can verify that everything loaded correctly by using the following command: diff --git a/docs/website/docs/dlt-ecosystem/verified-sources/slack.md b/docs/website/docs/dlt-ecosystem/verified-sources/slack.md index f786761390..fd25d7818b 100644 --- a/docs/website/docs/dlt-ecosystem/verified-sources/slack.md +++ b/docs/website/docs/dlt-ecosystem/verified-sources/slack.md @@ -101,6 +101,8 @@ For more information, read the 1. Finally, enter credentials for your chosen destination as per the [docs](../destinations/). +For more information, read the [General Usage: Credentials.](../../general-usage/credentials) + ## Run the pipeline 1. Before running the pipeline, ensure that you have installed all the necessary dependencies by diff --git a/docs/website/docs/dlt-ecosystem/verified-sources/sql_database.md b/docs/website/docs/dlt-ecosystem/verified-sources/sql_database.md index 5f09bde826..f219d7c0be 100644 --- a/docs/website/docs/dlt-ecosystem/verified-sources/sql_database.md +++ b/docs/website/docs/dlt-ecosystem/verified-sources/sql_database.md @@ -62,7 +62,7 @@ connection_url = "connection_string = f"{drivername}://{username}:{password}@{ho Here we use `mysql` and `pymysql` dialect to set up SSL connection to a server. All information taken from the -[SQLAlchemy docs](https://docs.sqlalchemy.org/en/14/dialects/mysql.html#ssl-connections) +[SQLAlchemy docs](https://docs.sqlalchemy.org/en/14/dialects/mysql.html#ssl-connections). 1. To force SSL on the client without a client certificate you may pass the following DSN: @@ -143,10 +143,12 @@ For more information, read the > [pipeline example](https://github.com/dlt-hub/verified-sources/blob/master/sources/sql_database_pipeline.py) > for details. -1. Finally, follow the instructions in [Destinations](../destinations/) to add credentials for your +1. Finally, follow the instructions in [Destinations](../destinations/) to add credentials for your chosen destination. This will ensure that your data is properly routed to its final destination. -## Run the pipeline example +For more information, read the [General Usage: Credentials.](../../general-usage/credentials) + +## Run the pipeline 1. Install the necessary dependencies by running the following command: @@ -157,7 +159,7 @@ For more information, read the 1. Now the verified source can be run by using the command: ```bash - python3 sql_database_pipeline.py + python sql_database_pipeline.py ``` 1. To make sure that everything is loaded as expected, use the command: @@ -166,15 +168,15 @@ For more information, read the dlt pipeline show ``` - For example, the pipeline_name for the above pipeline example is `rfam`, you may also use any - custom name instead) + For example, the pipeline_name for the above pipeline example is `rfam`, you may also use any + custom name instead. 
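+
+A minimal version of such a pipeline script, assuming the database credentials are configured in
+`.dlt/secrets.toml` as described above, could look like this sketch:
+
+```python
+import dlt
+from sql_database import sql_database
+
+pipeline = dlt.pipeline(
+    pipeline_name="rfam",        # Use a custom name if desired
+    destination="duckdb",        # Choose the appropriate destination
+    dataset_name="rfam_data",    # Use a custom name if desired
+)
+
+# Load all tables from the configured database
+load_info = pipeline.run(sql_database())
+print(load_info)
+```
+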
## Sources and resources `dlt` works on the principle of [sources](../../general-usage/source) and [resources](../../general-usage/resource). -### Source `sql_database`: +### Source `sql_database`: This function loads data from an SQL database via SQLAlchemy and auto-creates resources for each table or from a specified list. @@ -226,6 +228,7 @@ def sql_table( `write_disposition`: Can be "merge", "replace", or "append". +## Customization ### Create your own pipeline If you wish to create your own pipelines, you can leverage source and resource methods from this diff --git a/docs/website/docs/dlt-ecosystem/verified-sources/strapi.md b/docs/website/docs/dlt-ecosystem/verified-sources/strapi.md index 9c115165a4..7a9161d380 100644 --- a/docs/website/docs/dlt-ecosystem/verified-sources/strapi.md +++ b/docs/website/docs/dlt-ecosystem/verified-sources/strapi.md @@ -90,6 +90,8 @@ For more information, read the 1. Finally, enter credentials for your chosen destination as per the [docs](../destinations/). +For more information, read the [General Usage: Credentials.](../../general-usage/credentials) + ## Run the pipeline 1. Before running the pipeline, ensure that you have installed all the necessary dependencies by @@ -102,7 +104,7 @@ For more information, read the 1. You're now ready to run the pipeline! To get started, run the following command: ```bash - python3 strapi_pipeline.py + python strapi_pipeline.py ``` > In the provided script, we've included a list with one endpoint, "athletes." Simply add any @@ -145,6 +147,8 @@ def strapi_source( `domain`: Strapi API domain name, defaults to dlt secrets. + +## Customization ### Create your own pipeline If you wish to create your own pipelines, you can leverage source and resource methods from this diff --git a/docs/website/docs/dlt-ecosystem/verified-sources/stripe.md b/docs/website/docs/dlt-ecosystem/verified-sources/stripe.md index 0abd50be88..a25768981f 100644 --- a/docs/website/docs/dlt-ecosystem/verified-sources/stripe.md +++ b/docs/website/docs/dlt-ecosystem/verified-sources/stripe.md @@ -124,6 +124,8 @@ For more information, read the [Walkthrough: Run a pipeline](../../walkthroughs/ `dlt` works on the principle of [sources](../../general-usage/source) and [resources](../../general-usage/resource). +### Default endpoints +You can write your own pipelines to load data to a destination using this verified source. However, it is important to note is how the `ENDPOINTS` and `INCREMENTAL_ENDPOINTS` tuples are defined in `stripe_analytics/settings.py`. ```python @@ -187,13 +189,14 @@ This function loads a dictionary with calculated metrics, including MRR and Chur def metrics_resource() -> Iterable[TDataItem]: ``` - Abrevations MRR and Churn rate are as follows: - Monthly Recurring Revenue (MRR): - Measures the predictable monthly revenue from all active subscriptions. It's the sum of the monthly-normalized subscription amounts. - Churn rate: - Indicates the rate subscribers leave a service over a specific period. Calculated by dividing the number of recent cancellations by the total subscribers from 30 days ago, adjusted for new subscribers. + +## Customization ### Create your own pipeline If you wish to create your own pipelines, you can leverage source and resource methods from this