Commit
Continuation of annotation
hennyu committed Jan 24, 2024
1 parent db52a4c commit 1b97762
Showing 1 changed file with 30 additions and 16 deletions.
46 changes: 30 additions & 16 deletions data/JTEI/13_2020-22/jtei-cc-ra-parisse-182-source.xml
@@ -92,19 +92,32 @@
 collect and transcribe spoken language resources, their number is limited and thus corpora
 need to be interoperable and reusable in order to improve research on themes such as
 phonology, prosody, interaction, syntax, and textometry. To help researchers reach this
-goal, CORLI has designed a pair of tools: TEICORPO to assist in the conversion and use of
-spoken language corpora, and TEIMETA for metadata purposes. TEICORPO is based on the
-principle of an underlying common format, namely TEI XML as described in its specification
-for spoken language use (ISO 2016). This tool enables the conversion of transcriptions
-created with alignment software such as CLAN, Transcriber, Praat, or <ptr type="software"
-xml:id="ELAN" target="#ELAN"/><rs type="soft.name" ref="#ELAN">ELAN</rs> as well as
-common file formats (CSV, XLSX, TXT, or DOCX) and the TEI format, which plays the role of
-a lossless pivot format. Backward conversion is possible in many cases, with limitations
-inherent in the destination target format. TEICORPO can run the Treetagger part-of-speech
-tagger and the Stanford CoreNLP tools on TEI files and can export the resulting files to
-textometric tools such as <ptr type="software" xml:id="TXM" target="#TXM"/><rs
-type="soft.name" ref="#TXM">TXM</rs>, Le Trameur, or Iramuteq, making it suitable for
-spoken language corpora editing as well as for various research purposes.</p>
+goal, CORLI has designed a pair of tools: <ptr type="software" xml:id="R1"
+target="#TEICORPO"/><rs type="soft.name" ref="#R1">TEICORPO</rs> to assist in the
+conversion and use of spoken language corpora, and <ptr type="software" xml:id="R2"
+target="#TEIMETA"/><rs type="soft.name" ref="#R2">TEIMETA</rs> for metadata purposes.
+<ptr type="software" xml:id="R3" target="#TEICORPO"/><rs type="soft.name" ref="#R3"
+>TEICORPO</rs> is based on the principle of an underlying common format, namely TEI XML
+as described in its specification for spoken language use (ISO 2016). This tool enables
+the conversion of transcriptions created with alignment software such as <ptr
+type="software" xml:id="R4" target="#CLAN"/><rs type="soft.name" ref="#R4">CLAN</rs>,
+<ptr type="software" xml:id="R5" target="#Transcriber"/><rs type="soft.name" ref="#R5"
+>Transcriber</rs>, <ptr type="software" xml:id="R6" target="#Praat"/><rs
+type="soft.name" ref="#R6">Praat</rs>, or <ptr type="software" xml:id="R7"
+target="#ELAN"/><rs type="soft.name" ref="#R7">ELAN</rs> as well as common file formats
+(CSV, XLSX, TXT, or DOCX) and the TEI format, which plays the role of a lossless pivot
+format. Backward conversion is possible in many cases, with limitations inherent in the
+destination target format. <ptr type="software" xml:id="R8" target="#TEICORPO"/><rs
+type="soft.name" ref="#R8">TEICORPO</rs> can run the <ptr type="software" xml:id="R9"
+target="#Treetagger"/><rs type="soft.name" ref="#R9">Treetagger part-of-speech
+tagger</rs> and the <ptr type="software" xml:id="R10" target="#Stanford-CoreNLP"/><rs
+type="soft.name" ref="#R10">Stanford CoreNLP</rs> tools on TEI files and can export the
+resulting files to textometric tools such as <ptr type="software" xml:id="R11"
+target="#TXM"/><rs type="soft.name" ref="#R11">TXM</rs>, <ptr type="software"
+xml:id="R12" target="#Le-Trameur"/><rs type="soft.name" ref="#R12">Le Trameur</rs>, or
+<ptr type="software" xml:id="R13" target="#Iramuteq"/><rs type="soft.name" ref="#R13"
+>Iramuteq</rs>, making it suitable for spoken language corpora editing as well as for
+various research purposes.</p>
 </div>
 </front>
 <body>
@@ -170,7 +183,8 @@
 limited coverage, even if the corpora involved are very large.</p>
 </div>
 <div xml:id="teicorpo">
-<head>The TEICORPO Approach</head>
+<head>The <ptr type="software" xml:id="R14" target="#TEICORPO"/><rs type="soft.name"
+ref="#R14">TEICORPO</rs> Approach</head>
 <p>The goal of the CORLI consortium is to make it easier to deposit, share, and reuse
 data. With this goal in mind, CORLI has always promoted the use of open public
 repositories and open formats. Our policy is to advocate for the use of a common single
@@ -1427,8 +1441,8 @@
 CLARIN.</title> In Selected Papers from the CLARIN Annual Conference 2016, edited by
 <editor>Lars Borin</editor>, <biblScope unit="page">113–30</biblScope>. Linköping
 Electronic Conference Proceedings 136. Linköping, Sweden: LiU Electronic Press. <ptr
-target="https://ep.liu.se/ecp/article.asp?issue=136&amp;article=009&amp;volume=0"/>; <ptr
-target="https://ep.liu.se/ecp/136/009/ecp17136009.pdf"/>.</bibl>
+target="https://ep.liu.se/ecp/article.asp?issue=136&amp;article=009&amp;volume=0"/>;
+<ptr target="https://ep.liu.se/ecp/136/009/ecp17136009.pdf"/>.</bibl>
 <bibl xml:id="schmidts2010"><author>Schmidt, Thomas</author>, and <author>Wilfried
 Schütte</author>. <date>2010</date>. <title level="a">FOLKER: An Annotation Tool for
 Efficient Transcription of Natural, Multi-party Interaction.</title> In Proceedings of