FoLiA
Maarten Janssen edited this page Aug 17, 2020 · 1 revision
FoLiA (.xml) is a file format developed at Radboud University. It is an XML-based format for transcribing linguistic data in a largely standoff-like fashion.
folia2teitok.pl
Command line options of the tool:
- debug: debugging mode
- output: name of the output file; if empty, output goes to STDOUT
- morerev: More revision statements
- file: filename of the input
- nospace: convert to whitespace-sensitive XML
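As a rough illustration of how these options combine, the sketch below builds an argument list for the script. The option names come from the list above; the `--name=value` syntax, the call through `perl`, and the script sitting in the current directory are assumptions, and `build_command` is a hypothetical helper, not part of TEITOK.

```python
def build_command(infile, output=None, debug=False, morerev=False, nospace=False):
    """Build an argument list for folia2teitok.pl (hypothetical helper).

    Assumes Getopt::Long-style double-dash options; check the script
    itself for the syntax it actually accepts.
    """
    cmd = ["perl", "folia2teitok.pl", f"--file={infile}"]
    if output:
        # Without an output name the script writes to STDOUT.
        cmd.append(f"--output={output}")
    if debug:
        cmd.append("--debug")
    if morerev:
        cmd.append("--morerev")
    if nospace:
        cmd.append("--nospace")
    return cmd


print(build_command("example.folia.xml", output="example.xml", nospace=True))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` to run the conversion.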
The script integrates the various FoLiA parts into a single token-based representation:
- known metadata are converted to the teiHeader
- w elements are converted into tok, with the t as its innerXML
- a space is introduced after each token that does not have a space="no", as a c element. By default, output is non-whitespace-sensitive XML
- corrections are converted to @norm attributes. If the correction introduces tokens, an mtok is introduced
- dependencies are converted to @head and @deprel
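The token and spacing rules above can be sketched in a few lines of Python. This is not the folia2teitok.pl implementation: it is a minimal illustration that covers only the w to tok step and the c spacing element, and it ignores metadata, corrections and dependencies. The sample document, the `words_to_toks` function and the wrapping `s` element are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

FOLIA_NS = "http://ilk.uvt.nl/folia"  # FoLiA XML namespace

# Hypothetical minimal FoLiA fragment, for illustration only.
SAMPLE = """<FoLiA xmlns="http://ilk.uvt.nl/folia">
  <text>
    <s>
      <w><t>Hello</t></w>
      <w space="no"><t>world</t></w>
      <w><t>!</t></w>
    </s>
  </text>
</FoLiA>"""


def words_to_toks(folia_xml):
    """Turn every FoLiA w into a tok, adding a c after it unless space="no"."""
    root = ET.fromstring(folia_xml)
    out = ET.Element("s")  # wrapper element chosen for this sketch
    for w in root.iter(f"{{{FOLIA_NS}}}w"):
        t = w.find(f"{{{FOLIA_NS}}}t")  # first text content of the word
        tok = ET.SubElement(out, "tok")
        tok.text = t.text if t is not None else ""
        if w.get("space") != "no":
            # The space after the token becomes an explicit c element.
            ET.SubElement(out, "c").text = " "
    return out


print(ET.tostring(words_to_toks(SAMPLE), encoding="unicode"))
```

Because the spaces are carried by explicit c elements, the whitespace between elements in the output has no meaning, which matches the non-whitespace-sensitive default described above.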
For the moment, there are several (potential) parts of FoLiA files that are thrown away in the conversion - either because there is no obvious target element for them, or because the conversion has not been completed yet. These include:
- entities
- phonology
- chunking
- timing
- semroles
- statements
- metric
- foreign-data
- suggestion
- morphology
- syntax
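Before converting, it can be useful to check whether a file contains any of the parts listed above, since they will not survive the conversion. The sketch below counts them, assuming each item in the list corresponds to a FoLiA element of the same name; `dropped_parts` and the sample document are hypothetical and not part of TEITOK or FoLiA.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Element names taken from the list above (assumed to match FoLiA tag names).
DROPPED = {
    "entities", "phonology", "chunking", "timing", "semroles", "statements",
    "metric", "foreign-data", "suggestion", "morphology", "syntax",
}

# Hypothetical minimal FoLiA fragment, for illustration only.
SAMPLE = """<FoLiA xmlns="http://ilk.uvt.nl/folia">
  <text>
    <s>
      <w><t>Amsterdam</t></w>
      <entities><entity class="loc"/></entities>
      <syntax/>
    </s>
  </text>
</FoLiA>"""


def dropped_parts(folia_xml):
    """Count elements in the document that the conversion would discard."""
    counts = Counter()
    for el in ET.fromstring(folia_xml).iter():
        local = el.tag.rsplit("}", 1)[-1]  # strip the namespace prefix
        if local in DROPPED:
            counts[local] += 1
    return dict(counts)


print(dropped_parts(SAMPLE))
```

An empty result suggests that none of the listed parts are present in the file.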
No export to FoLiA has been provided yet, nor is one planned for the near future. FoLiA itself provides tools to convert to and from other formats for which a conversion tool exists.