Maarten Janssen edited this page Aug 17, 2020 · 1 revision

FoLiA (.xml) is an XML-based file format developed at Radboud University for transcribing linguistic data in a largely standoff-like fashion.

Import

folia2teitok.pl

Options

Command line options of the tool:

  • debug: debugging mode
  • output: name of the output file; if empty, output goes to STDOUT
  • morerev: More revision statements
  • file: filename of the input
  • nospace: convert to whitespace-sensitive XML
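As an illustration of how these options combine, the sketch below assembles an invocation of the script. It assumes the options are passed in the usual Perl Getopt::Long long-option style (`--file=...`, `--debug`); that syntax is an assumption, not something this page states, and the helper name `build_folia2teitok_command` is hypothetical.

```python
import shlex


def build_folia2teitok_command(infile, output=None, debug=False,
                               morerev=False, nospace=False):
    """Build an argument list for folia2teitok.pl.

    ASSUMPTION: options use Getopt::Long style (--name or --name=value).
    The option names themselves come from the list above.
    """
    cmd = ["perl", "folia2teitok.pl", f"--file={infile}"]
    if output:
        # Without an output file, the script writes to STDOUT.
        cmd.append(f"--output={output}")
    for name, enabled in (("debug", debug),
                          ("morerev", morerev),
                          ("nospace", nospace)):
        if enabled:
            cmd.append(f"--{name}")
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it; pass the list to
    # subprocess.run() once the option syntax has been verified.
    print(shlex.join(build_folia2teitok_command("input.folia.xml",
                                                output="output.xml",
                                                nospace=True)))
```

The command is only built, not executed, so the sketch can be checked against the script's actual option handling before use.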

Specifications

The script integrates the various FoLiA parts into a single token-based representation:

  • known metadata are converted to the teiHeader
  • w elements are converted into tok elements, with the content of t as their innerXML
  • a space is introduced, as a c element, after each token that does not have space="no". By default, the output is non-whitespace-sensitive XML
  • corrections are converted to @norm attributes. If the correction introduces tokens, an mtok is introduced
  • dependencies are converted to @head and @deprel
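The token and dependency rules above can be sketched roughly as follows. This is not the code of folia2teitok.pl, only a minimal Python illustration under several assumptions: the FoLiA namespace is `http://ilk.uvt.nl/folia`, the `c` element contains a single space, `@head` stores the id of the head token, and `@deprel` stores the dependency class. Metadata and corrections are left out, and the function name `folia_sentence_to_toks` is hypothetical.

```python
import xml.etree.ElementTree as ET

FOLIA_NS = "http://ilk.uvt.nl/folia"  # assumed FoLiA namespace
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def q(name):
    """Qualify a FoLiA element name with the namespace."""
    return f"{{{FOLIA_NS}}}{name}"


def folia_sentence_to_toks(sentence):
    """Convert one FoLiA <s> element into an <s> with tok and c children."""
    out = ET.Element("s")
    toks = {}
    # Only direct <w> children are handled; nested words are ignored here.
    for w in sentence.findall(q("w")):
        tok = ET.SubElement(out, "tok", {"id": w.get(XML_ID, "")})
        t = w.find(q("t"))
        tok.text = t.text if t is not None else ""
        toks[tok.get("id")] = tok
        if w.get("space") != "no":
            # ASSUMPTION: the space is stored as <c> </c>.
            ET.SubElement(out, "c").text = " "
    for dep in sentence.iter(q("dependency")):
        head_ref = dep.find(f"{q('hd')}/{q('wref')}")
        dep_ref = dep.find(f"{q('dep')}/{q('wref')}")
        if head_ref is None or dep_ref is None:
            continue
        tok = toks.get(dep_ref.get("id"))
        if tok is not None:
            # ASSUMPTION: the relation is written on the dependent token.
            tok.set("head", head_ref.get("id"))
            tok.set("deprel", dep.get("class", ""))
    return out


SAMPLE = """<FoLiA xmlns="http://ilk.uvt.nl/folia">
  <text>
    <s xml:id="s.1">
      <w xml:id="w.1"><t>Hello</t></w>
      <w xml:id="w.2" space="no"><t>world</t></w>
      <w xml:id="w.3"><t>!</t></w>
      <dependencies>
        <dependency class="vocative">
          <hd><wref id="w.1"/></hd>
          <dep><wref id="w.2"/></dep>
        </dependency>
      </dependencies>
    </s>
  </text>
</FoLiA>"""

if __name__ == "__main__":
    sentence = ET.fromstring(SAMPLE).find(f".//{q('s')}")
    print(ET.tostring(folia_sentence_to_toks(sentence), encoding="unicode"))
```

In the sample, no c element follows "world" because that word carries space="no", while the other two tokens are each followed by one.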

Known issues

At the moment, several parts that may occur in FoLiA files are discarded in the conversion, either because there is no obvious target element for them or because the conversion has not been completed yet. These include:

  • entities
  • phonology
  • chunking
  • timing
  • semroles
  • statements
  • metric
  • foreign-data
  • suggestion
  • morphology
  • syntax
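To see in advance what would be lost, a FoLiA file can be scanned for these parts before converting it. The sketch below counts elements whose local name matches an entry in the list above; treating the list entries as XML element names is an assumption, and the function name `discarded_layers` is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Names taken from the list above; ASSUMED to be element local names.
DISCARDED = {
    "entities", "phonology", "chunking", "timing", "semroles",
    "statements", "metric", "foreign-data", "suggestion",
    "morphology", "syntax",
}


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def discarded_layers(xml_text):
    """Count elements in a FoLiA document that the import would drop."""
    counts = Counter()
    for elem in ET.fromstring(xml_text).iter():
        name = local_name(elem.tag)
        if name in DISCARDED:
            counts[name] += 1
    return counts


if __name__ == "__main__":
    doc = (
        '<FoLiA xmlns="http://ilk.uvt.nl/folia"><text><s>'
        "<w><t>a</t><morphology/></w><entities/><entities/>"
        "</s></text></FoLiA>"
    )
    for name, n in sorted(discarded_layers(doc).items()):
        print(f"{name}: {n}")
```

An empty result suggests that none of the listed parts occur in the file, under the naming assumption stated above.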

Export

No export is provided yet, nor is one planned for the near future. FoLiA itself provides tools to convert to and from other formats for which a conversion tool exists.
