Skip to content

Commit

Permalink
Added Personio documentation. (#798)
Browse files Browse the repository at this point in the history
* ADded perosnio documentation.

* Update

* Updated tip and info.

* Updated secrets.toml

* updated for comments.

* Updated title, description and keywords.

* fix docs titles

---------

Co-authored-by: AstrakhantsevaAA <[email protected]>
  • Loading branch information
dat-a-man and AstrakhantsevaAA authored Dec 4, 2023
1 parent 7a2f9c3 commit 406a104
Show file tree
Hide file tree
Showing 2 changed files with 194 additions and 0 deletions.
193 changes: 193 additions & 0 deletions docs/website/docs/dlt-ecosystem/verified-sources/personio.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,193 @@
---
title: Personio
description: dlt verified source for Personio API
keywords: [personio api, personio verified source, personio]
---

# Personio

:::info Need help deploying these sources, or figuring out how to run them in your data stack?

[Join our Slack community](https://join.slack.com/t/dlthub-community/shared_invite/zt-1n5193dbq-rCBmJ6p~ckpSFK4hCF2dYA)
or [book a call](https://calendar.app.google/kiLhuMsWKpZUpfho6) with our support engineer Adrian.
:::

Personio is a human resources management software that helps businesses streamline HR processes,
including recruitment, employee data management, and payroll, in one platform.

Our [Personio verified](https://github.com/dlt-hub/verified-sources/blob/master/sources/personio) source loads data using Perosnio API to your preferred
[destination](../destinations).

:::tip
You can check out our pipeline example [here](https://github.com/dlt-hub/verified-sources/blob/master/sources/personio_pipeline.py).
:::

Resources that can be loaded using this verified source are:

| Name | Description |
|-------------|------------------------------------------------------------------------------------------|
| employees | Retrieves company employees details. (Employees list, absense_entitlement, cost_centers) |
| absences | Retrieves list of various types of employee absences |
| attendances | Retrieves attendance records for each employee |

## Setup Guide

### Grab credentials

To load data from Personio, you need to obtain API credentials, `client_id` and `client_secret`:

1. Sign in to your Personio account, and ensure that your user account has API access rights.
1. Navigate to Settings > Integrations > API credentials.
1. Click on "Generate new credentials."
1. Assign necessary permissions to credentials, i.e. read access.

:::info
The Personio UI, which is described here, might change. The full guide is available at this [link.](https://developer.personio.de/docs#21-employee-attendance-and-absence-endpoints)
:::

### Initialize the verified source

To get started with your data pipeline, follow these steps:

1. Enter the following command:

```bash
dlt init personio duckdb
```

[This command](../../reference/command-line-interface) will initialize
[the pipeline example](https://github.com/dlt-hub/verified-sources/blob/master/sources/personio_pipeline.py)
with Personio as the [source](../../general-usage/source) and [duckdb](../destinations/duckdb.md)
as the [destination](../destinations).

1. If you'd like to use a different destination, simply replace `duckdb` with the name of your
preferred [destination](../destinations).

1. After running this command, a new directory will be created with the necessary files and
configuration settings to get started.

For more information, read [Add a verified source.](../../walkthroughs/add-a-verified-source)

### Add credentials

1. In the `.dlt` folder, there's a file called `secrets.toml`. It's where you store sensitive
information securely, like access tokens. Keep this file safe. Here's its format for service
account authentication:

```toml
# Put your secret values and credentials here
# Note: Do not share this file and do not push it to GitHub!
[sources.personio]
client_id = "papi-*****" # please set me up!
client_secret = "papi-*****" # please set me up!
```

1. Replace the value of `client_id` and `client_secret` with the one that
[you copied above](#grab-credentials). This will ensure that your data-verified source can access
your Personio API resources securely.

1. Next, follow the instructions in [Destinations](../destinations/duckdb) to add credentials for
your chosen destination. This will ensure that your data is properly routed to its final
destination.

For more information, read [Credentials](../../general-usage/credentials).

## Run the pipeline

1. Before running the pipeline, ensure that you have installed all the necessary dependencies by
running the command:
```bash
pip install -r requirements.txt
```
1. You're now ready to run the pipeline! To get started, run the following command:
```bash
python personio_pipeline.py
```
1. Once the pipeline has finished running, you can verify that everything loaded correctly by using
the following command:
```bash
dlt pipeline <pipeline_name> show
```
For example, the `pipeline_name` for the above pipeline example is `personio`, you may also use
any custom name instead.

For more information, read [Run a pipeline.](../../walkthroughs/run-a-pipeline)

## Sources and resources

`dlt` works on the principle of [sources](../../general-usage/source) and
[resources](../../general-usage/resource).

### Source `personio_source`

This `dlt` source returns data resources like "employees", "absences", and "attendances".

```python
@dlt.source(name="personio")
def personio_source(
client_id: str = dlt.secrets.value,
client_secret: str = dlt.secrets.value,
items_per_page: int = DEFAULT_ITEMS_PER_PAGE,
) -> Iterable[DltResource]:
```

`client_id`: Generated ID for API access.

`client_secret`: Generated secret for API access.

`items_per_page`: Maximum number of items per page, defaults to 200.

### Resource `employees`

This resource retrieves data on all the employees in a company.

```python
@dlt.resource(primary_key="id", write_disposition="merge")
def employees(
updated_at: dlt.sources.incremental[
pendulum.DateTime
] = dlt.sources.incremental(
"last_modified_at", initial_value=None, allow_external_schedulers=True
),
items_per_page: int = items_per_page,
) -> Iterable[TDataItem]:
```

`updated_at`: The saved state of the last 'last_modified_at' value. It is used for
[incremental loading](../../general-usage/incremental-loading).

`items_per_page`: Maximum number of items per page, defaults to 200.

Like the `employees` resource discussed above, other resources `absences` and `attendances` load
data from the Personio API to your preferred destination.

## Customization

### Create your own pipeline

If you wish to create your own pipelines, you can leverage source and resource methods from this
verified source.

1. Configure the [pipeline](../../general-usage/pipeline) by specifying the pipeline name, destination, and dataset as follows:

```python
pipeline = dlt.pipeline(
pipeline_name="personio", # Use a custom name if desired
destination="duckdb", # Choose the appropriate destination (e.g., duckdb, redshift, post)
dataset_name="personio_data" # Use a custom name if desired
)
```

1. To load employee data:

```python
load_data = personio_source().with_resources("employees")
print(pipeline.run(load_data))
```

1. To load data from all supported endpoints:

```python
load_data = personio_source()
print(pipeline.run(load_data))
```
1 change: 1 addition & 0 deletions docs/website/sidebars.js
Original file line number Diff line number Diff line change
Expand Up @@ -52,6 +52,7 @@ const sidebars = {
'dlt-ecosystem/verified-sources/mongodb',
'dlt-ecosystem/verified-sources/mux',
'dlt-ecosystem/verified-sources/notion',
'dlt-ecosystem/verified-sources/personio',
'dlt-ecosystem/verified-sources/pipedrive',
'dlt-ecosystem/verified-sources/salesforce',
'dlt-ecosystem/verified-sources/shopify',
Expand Down

0 comments on commit 406a104

Please sign in to comment.