Sparv plugin for SBX-specific export of metadata
Prerequisites:
- Sparv pipeline
- Python 3.8 or newer
Option 1: Installation from GitHub with pipx:
pipx inject sparv-pipeline https://github.com/spraakbanken/sparv-sbx-metadata/archive/latest.tar.gz
Option 2: Manual download of plugin and installation in your sparv-pipeline virtual environment:
source [path to sparv-pipeline virtual environment]/bin/activate
pip install [path to the downloaded sparv-sbx-metadata directory]
metadata:
    id: test
    language: swe
    name:
        eng: Test corpus
        swe: Testkorpus
    short_description:
        eng: A small test corpus for testing purposes
        swe: En liten testkorpus för att testa
    description:
        eng: This is an optional longer description that may contain HTML and multiple sentences.
        swe: Detta är en längre beskrivning som kan innehålla HTML och flera meningar.
korp:
    modes:
        - default  # default setting
sbx_metadata:
    xml_export: scrambled  # scrambled/original/false
    stats_export: true  # true/false
    korp: true  # true/false
    ## 'downloads' and 'interface' are not needed for standard corpora
    # downloads:
    #     - url: http://spraakbanken.gu.se/lb/resurser/meningsmangder/gp-test.xml.bz2
    #       type: corpus
    #       format: XML
    #       info: this file contains a scrambled version of the corpus
    #       licence: CC BY 4.0
    #       restriction: attribution
    #     - url: https://svn.spraakdata.gu.se/sb-arkiv/pub/frekvens/gp-test.csv
    #       type: token frequencies
    #       format: CSV
    #       info: ""
    #       licence: CC BY 4.0
    #       restriction: attribution
    # interface:
    #     - access: http://spraakbanken.gu.se/korp/#?corpus=gp-test
    #       licence: CC BY 4.0
    #       restriction: attribution
    ## 'contact_info' is only needed if somebody else is the contact person for the corpus
    # contact_info:
    #     name: Markus Forsberg
    #     email: [email protected]
    #     affiliation:
    #         organisation: Språkbanken
    #         email: [email protected]
    ## Other optional config values (for more info, see https://github.com/spraakbanken/metadata/blob/main/yaml_templates/corpus.yaml)
    # trainingdata: false
    # unlisted: false
    # in_collections: []
    # annotation:
    #     swe: ''
    #     eng: ''
    # keywords: []
    # caveats:
    #     swe: ''
    #     eng: ''
    # references: []
    # intended_uses:
    #     swe: ''
    #     eng: ''
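
The inline comments in the example config list the accepted values for the three sbx_metadata options (xml_export: scrambled/original/false; stats_export and korp: true/false). As a rough illustration only, the sketch below checks an already parsed sbx_metadata section against those values. It is not part of the plugin: the function name check_sbx_metadata is hypothetical, the section is assumed to be available as a plain Python dict, and the plugin may well validate its config differently.

```python
from typing import Any, Dict, List


def check_sbx_metadata(section: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a parsed sbx_metadata section.

    Hypothetical helper: the allowed values are copied from the comments
    in the example config, not from the plugin's source code.
    """
    problems: List[str] = []

    # xml_export accepts the strings "scrambled" / "original" or the boolean false.
    if "xml_export" in section:
        value = section["xml_export"]
        if not (value is False or value in ("scrambled", "original")):
            problems.append(
                f"xml_export: expected scrambled, original or false, got {value!r}"
            )

    # stats_export and korp are plain booleans.
    for key in ("stats_export", "korp"):
        if key in section and not isinstance(section[key], bool):
            problems.append(f"{key}: expected true or false, got {section[key]!r}")

    return problems


example = {"xml_export": "scrambled", "stats_export": True, "korp": True}
print(check_sbx_metadata(example))
```

Options that are absent from the section are skipped rather than reported, on the assumption that they fall back to defaults.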