From 5358ec8059ff0bfd5153edf09f48d650ba125f23 Mon Sep 17 00:00:00 2001
From: dat-a-man <98139823+dat-a-man@users.noreply.github.com>
Date: Thu, 30 Nov 2023 06:36:50 +0000
Subject: [PATCH] Added Personio documentation.

---
 .../verified-sources/personio.md              | 187 ++++++++++++++++++
 docs/website/sidebars.js                      |   1 +
 2 files changed, 188 insertions(+)
 create mode 100644 docs/website/docs/dlt-ecosystem/verified-sources/personio.md

diff --git a/docs/website/docs/dlt-ecosystem/verified-sources/personio.md b/docs/website/docs/dlt-ecosystem/verified-sources/personio.md
new file mode 100644
index 0000000000..db3aa1f4e2
--- /dev/null
+++ b/docs/website/docs/dlt-ecosystem/verified-sources/personio.md
@@ -0,0 +1,187 @@
+# Personio
+
+:::info Need help deploying these sources, or figuring out how to run them in your data stack?
+
+[Join our Slack community](https://join.slack.com/t/dlthub-community/shared_invite/zt-1n5193dbq-rCBmJ6p~ckpSFK4hCF2dYA)
+or [book a call](https://calendar.app.google/kiLhuMsWKpZUpfho6) with our support engineer Adrian.
+:::
+
+Personio is human resources management software that helps businesses streamline HR processes,
+including recruitment, employee data management, and payroll, in one platform.
+
+Our Personio verified source loads data from the Personio API to your preferred
+[destination](../destinations).
+
+:::tip You can check out our pipeline example
+[here](https://github.com/dlt-hub/verified-sources/blob/master/sources/personio_pipeline.py). :::
+
+Resources that can be loaded using this verified source are:
+
+| Name        | Description                                                                              |
+|-------------|------------------------------------------------------------------------------------------|
+| employees   | Retrieves company employee details. (Employees list, absence_entitlement, cost_centers)  |
+| absences    | Retrieves list of various types of employee absences                                     |
+| attendances | Retrieves attendance records for each employee                                           |
+
+## Setup Guide
+
+### Grab credentials
+
+To load data from Personio, you need API credentials (`client_id` and `client_secret`):
+
+1. Sign in to your Personio account, and ensure that your user account has API access rights.
+1. Navigate to Settings > Integrations > API credentials.
+1. Click on "Generate new credentials."
+1. Assign the necessary permissions to the credentials, i.e., read access.
+
+### Initialize the verified source
+
+To get started with your data pipeline, follow these steps:
+
+1. Enter the following command:
+
+   ```bash
+   dlt init personio duckdb
+   ```
+
+   [This command](../../reference/command-line-interface) will initialize
+   [the pipeline example](https://github.com/dlt-hub/verified-sources/blob/master/sources/personio_pipeline.py)
+   with Personio as the [source](../../general-usage/source) and [duckdb](../destinations/duckdb.md)
+   as the [destination](../destinations).
+
+1. If you'd like to use a different destination, simply replace `duckdb` with the name of your
+   preferred [destination](../destinations).
+
+1. After running this command, a new directory will be created with the necessary files and
+   configuration settings to get started.
+
+For more information, read the
+[Walkthrough: Add a verified source.](../../walkthroughs/add-a-verified-source)
+
+### Add credentials
+
+1. In the `.dlt` folder, there's a file called `secrets.toml`. It's where you store sensitive
+   information securely, like access tokens. Keep this file safe. Here's its format for
+   Personio API authentication:
+
+   ```toml
+   # Put your secret values and credentials here
+   # Note: Do not share this file and do not push it to GitHub!
+   [sources.personio]
+   client_id = "papi-********-****-****-****-************" # please set me up!
+   client_secret = "papi-************************************************" # please set me up!
+   ```
+
+1. Replace the values of `client_id` and `client_secret` with the ones that
+   [you copied above](#grab-credentials). This will ensure that your verified source can access
+   your Personio API resources securely.
+
+1. Next, follow the instructions in [Destinations](../destinations/duckdb) to add credentials for
+   your chosen destination. This will ensure that your data is properly routed to its final
+   destination.
+
+For more information, read the [General Usage: Credentials.](../../general-usage/credentials)
+
+## Run the pipeline
+
+1. Before running the pipeline, ensure that you have installed all the necessary dependencies by
+   running the command:
+   ```bash
+   pip install -r requirements.txt
+   ```
+1. You're now ready to run the pipeline! To get started, run the following command:
+   ```bash
+   python personio_pipeline.py
+   ```
+1. Once the pipeline has finished running, you can verify that everything loaded correctly by using
+   the following command:
+   ```bash
+   dlt pipeline <pipeline_name> show
+   ```
+   For example, the `pipeline_name` for the above pipeline example is `personio`; you may also use
+   any custom name instead.
+
+For more information, read the [Walkthrough: Run a pipeline.](../../walkthroughs/run-a-pipeline)
+
+## Sources and resources
+
+`dlt` works on the principle of [sources](../../general-usage/source) and
+[resources](../../general-usage/resource).
+
+### Source `personio_source`
+
+This function initializes the `PersonioAPI` class in "personio/helpers.py" and returns data
+resources such as "employees", "absences", and "attendances".
+
+```python
+@dlt.source(name="personio")
+def personio_source(
+    client_id: str = dlt.secrets.value,
+    client_secret: str = dlt.secrets.value,
+    items_per_page: int = DEFAULT_ITEMS_PER_PAGE,
+) -> Iterable[DltResource]:
+```
+
+`client_id`: Generated ID for API access.
+
+`client_secret`: Generated secret for API access.
+
+`items_per_page`: Maximum number of items per page; defaults to 200.
+
+### Resource `employees`
+
+This resource retrieves data on all the employees in a company.
+
+```python
+    @dlt.resource(primary_key="id", write_disposition="merge")
+    def employees(
+        updated_at: dlt.sources.incremental[
+            pendulum.DateTime
+        ] = dlt.sources.incremental(
+            "last_modified_at", initial_value=None, allow_external_schedulers=True
+        ),
+        items_per_page: int = items_per_page,
+    ) -> Iterable[TDataItem]:
+```
+
+`updated_at`: The saved state of the last `last_modified_at` value. It is used for
+[incremental loading](../../general-usage/incremental-loading).
+
+`items_per_page`: Maximum number of items per page; defaults to 200.
+
+Like the `employees` resource discussed above, the other resources, `absences` and `attendances`,
+load data from the Personio API to your preferred destination.
+
+## Customization
+
+### Create your own pipeline
+
+If you wish to create your own pipelines, you can leverage source and resource methods from this
+verified source.
+
+1. Configure the pipeline by specifying the pipeline name, destination, and dataset as follows:
+
+   ```python
+   pipeline = dlt.pipeline(
+       pipeline_name="personio",  # Use a custom name if desired
+       destination="duckdb",  # Choose the appropriate destination (e.g., duckdb, redshift, postgres)
+       dataset_name="personio_data"  # Use a custom name if desired
+   )
+   ```
+
+   :::note To read more about pipeline configuration, please refer to our
+   [documentation](../../general-usage/pipeline). :::
+
+1. To load employee data:
+
+   ```python
+   load_data = personio_source().with_resources("employees")
+   print(pipeline.run(load_data))
+   ```
+
+1. To load data from all supported endpoints:
+
+   ```python
+   load_data = personio_source().with_resources("employees", "absences", "attendances")
+   print(pipeline.run(load_data))
+   ```
diff --git a/docs/website/sidebars.js b/docs/website/sidebars.js
index 71be6ccaa8..3d731d64f6 100644
--- a/docs/website/sidebars.js
+++ b/docs/website/sidebars.js
@@ -52,6 +52,7 @@ const sidebars = {
             'dlt-ecosystem/verified-sources/mongodb',
             'dlt-ecosystem/verified-sources/mux',
             'dlt-ecosystem/verified-sources/notion',
+            'dlt-ecosystem/verified-sources/personio',
             'dlt-ecosystem/verified-sources/pipedrive',
             'dlt-ecosystem/verified-sources/salesforce',
             'dlt-ecosystem/verified-sources/shopify',