docs(datahub source): Add urn exclusions to docs (datahub-project#11568)
eboneil authored Oct 9, 2024
1 parent e535d72 commit 732543f
Showing 1 changed file with 24 additions and 0 deletions.
24 changes: 24 additions & 0 deletions metadata-ingestion/docs/sources/datahub/datahub_pre.md
@@ -71,3 +71,27 @@ and [mce-consumer](../../../../metadata-jobs/mce-consumer-job/README.md))
- Increase the number of gms pods to add redundancy and increase resilience to node evictions
* If you are migrating large amounts of data, consider increasing Elasticsearch's
thread count via the `ELASTICSEARCH_THREAD_COUNT` environment variable.

#### Exclusions
You will likely want to exclude some URN types from your ingestion, because they contain instance-specific
metadata such as settings, roles, policies, ingestion sources, and ingestion runs. The following deny list
is a reasonable starting point:

```yaml
source:
  config:
    urn_pattern: # URN pattern to ignore/include in the ingestion
      deny:
        # Ignores all datahub metadata where the urn matches the regex
        - ^urn:li:role.* # Only exclude if you do not want to ingest roles
        - ^urn:li:dataHubRole.* # Only exclude if you do not want to ingest roles
        - ^urn:li:dataHubPolicy.* # Only exclude if you do not want to ingest policies
        - ^urn:li:dataHubIngestionSource.* # Only exclude if you do not want to ingest ingestion sources
        - ^urn:li:dataHubSecret.*
        - ^urn:li:dataHubExecutionRequest.*
        - ^urn:li:dataHubAccessToken.*
        - ^urn:li:dataHubUpgrade.*
        - ^urn:li:inviteToken.*
        - ^urn:li:globalSettings.*
        - ^urn:li:dataHubStepState.*
```
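
To sanity-check a deny list before running a migration, it can help to test the patterns against a few sample URNs. The sketch below is not part of the DataHub source: `is_denied` and the sample URNs are hypothetical, and it assumes each pattern is applied as a plain, case-sensitive regex anchored at the start of the URN. The actual matching behavior of `urn_pattern` may differ (for example in case sensitivity), so treat this only as a quick local check.

```python
import re

# Deny patterns copied from the recipe above.
DENY_PATTERNS = [
    r"^urn:li:role.*",
    r"^urn:li:dataHubRole.*",
    r"^urn:li:dataHubPolicy.*",
    r"^urn:li:dataHubIngestionSource.*",
    r"^urn:li:dataHubSecret.*",
    r"^urn:li:dataHubExecutionRequest.*",
    r"^urn:li:dataHubAccessToken.*",
    r"^urn:li:dataHubUpgrade.*",
    r"^urn:li:inviteToken.*",
    r"^urn:li:globalSettings.*",
    r"^urn:li:dataHubStepState.*",
]


def is_denied(urn, deny_patterns=DENY_PATTERNS):
    """Return True if the URN matches any deny pattern (hypothetical helper).

    Assumption: patterns are matched from the start of the URN with re.match,
    without any case folding.
    """
    return any(re.match(pattern, urn) for pattern in deny_patterns)


if __name__ == "__main__":
    # Illustrative URNs only; substitute URNs from your own instance.
    sample_urns = [
        "urn:li:dataHubPolicy:0",
        "urn:li:globalSettings:0",
        "urn:li:dataset:(urn:li:dataPlatform:hive,db.table,PROD)",
        "urn:li:corpuser:jdoe",
    ]
    for urn in sample_urns:
        status = "excluded" if is_denied(urn) else "ingested"
        print(f"{status}: {urn}")
```

Because each pattern ends in `.*`, it excludes every URN that begins with the given prefix, so `^urn:li:role.*` would also catch any other entity type whose name starts with `role`.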
