CHAT
CHAT (.cha) is a file format used by CHILDES / TalkBank. It is a text-based format for transcribing spoken data, which encodes some metadata, the speaker turns, and their alignment to an audio/video file. A good description of the format can be found here. The format is loose about the encoding of the spoken material itself, although it provides several standards, which can be specified in the header using an @Options line.
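To make the structure described above concrete, here is a minimal Python sketch that splits a CHAT file into header lines and speaker turns. It is an illustration only, not how chat2teitok.pl works internally. It assumes that header lines start with `@`, main speaker lines start with `*`, and that the media alignment is stored as `start_end` in milliseconds between two U+0015 control characters; the function name `parse_chat` is hypothetical, and continuation lines and dependent tiers (`%`) are ignored.

```python
import re

# Assumption: alignment "bullets" look like \x15START_END\x15 (milliseconds).
BULLET = re.compile(r"\x15(\d+)_(\d+)\x15")


def parse_chat(text):
    """Split CHAT text into (headers, turns); a simplified, hypothetical reader."""
    headers = []  # list of (name, value) pairs, e.g. ("Options", "heritage")
    turns = []    # list of dicts with speaker, text and optional time span
    for line in text.splitlines():
        if line.startswith("@"):
            # Header line: "@Name:<TAB>value" (value may be empty, e.g. @Begin)
            name, _, value = line[1:].partition(":")
            headers.append((name.strip(), value.strip()))
        elif line.startswith("*"):
            # Main tier: "*SPK:<TAB>utterance", possibly ending in a bullet
            speaker, _, utterance = line[1:].partition(":")
            match = BULLET.search(utterance)
            span = (int(match.group(1)), int(match.group(2))) if match else None
            turns.append({
                "speaker": speaker.strip(),
                "text": BULLET.sub("", utterance).strip(),
                "span_ms": span,
            })
        # Dependent tiers (%...) and continuation lines are skipped in this sketch.
    return headers, turns


sample = (
    "@Begin\n"
    "@Languages:\teng\n"
    "@Options:\theritage\n"
    "*CHI:\thello there . \x150_1500\x15\n"
    "@End\n"
)
headers, turns = parse_chat(sample)
print(dict(headers).get("Options"))
print(turns)
```

Keeping the headers as a list of pairs rather than a dictionary preserves repeated headers such as multiple @Comment lines.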
chat2teitok.pl
Command line options of the tool:
- debug: debugging mode
- output: name of the output file (STDOUT if empty)
- morerev: More revision statements
- file: filename of the input
- options: format of the transcription (overruling what is in @Options)
When using heritage as a format, the tool will not output the transcription as-is, but rather convert symbols to their corresponding TEI code:
- (be)cause => <ex>be</ex>cause
- [//] => <gap type="long"/>
- [/] => <gap type="short"/>
- => <del reason="reformulation">text</del>
- &trunca => <del reason="truncation">trunca</del>
- word@code => <sic n="code">word</sic>
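The symbol conversions listed above can be approximated with a handful of regular-expression substitutions. The following Python sketch is only an illustration of the mapping, not the tool's Perl implementation: the regular expressions, their ordering, and the function name `heritage_to_tei` are assumptions, and the reformulation rule is left out because its source notation is not shown in the list.

```python
import re

# (pattern, replacement) pairs mirroring the documented mapping.
# The exact patterns are assumptions made for this sketch.
RULES = [
    (re.compile(r"\[//\]"), '<gap type="long"/>'),
    (re.compile(r"\[/\]"), '<gap type="short"/>'),
    (re.compile(r"&(\w+)"), r'<del reason="truncation">\1</del>'),
    (re.compile(r"(\w+)@(\w+)"), r'<sic n="\2">\1</sic>'),
    (re.compile(r"\((\w+)\)"), r"<ex>\1</ex>"),
]


def heritage_to_tei(utterance: str) -> str:
    """Apply each substitution in turn to one utterance string."""
    for pattern, replacement in RULES:
        utterance = pattern.sub(replacement, utterance)
    return utterance


print(heritage_to_tei("(be)cause"))     # <ex>be</ex>cause
print(heritage_to_tei("word@code"))     # <sic n="code">word</sic>
print(heritage_to_tei("I [/] I want"))  # I <gap type="short"/> I want
```

A real converter would also need to escape any remaining XML-special characters in the text before inserting these tags.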
The following mapping of metadata is done:
- @Comment => /TEI/teiHeader/notesStmt/note
- @Title => /TEI/teiHeader/fileDesc/titleStmt/title
- @Date => /TEI/teiHeader/fileDesc/titleStmt/date
- @Languages => /TEI/teiHeader/profileDesc/langUsage/language/@ident
- @Language => /TEI/teiHeader/profileDesc/langUsage/language/@ident
- @Location => /TEI/teiHeader/recordingStmt/recording/location
- @Transcriber => /TEI/teiHeader/fileDesc/titleStmt/respStmt/resp[@n="Transcription"]
- @Creator => /TEI/teiHeader/fileDesc/titleStmt/respStmt/resp[@n="Creator"]
- @Types => /TEI/teiHeader/profileDesc/textClass/keywords/term[@type="genre"]
- @Subject => /TEI/teiHeader/profileDesc/textClass/keywords/term[@type="genre"]
- @Publisher => /TEI/teiHeader/fileDesc/publicationStmt/publisher
- @PID => /TEI/teiHeader/fileDesc/publicationStmt/idno[@type="handle"]
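As an illustration of the metadata mapping above, the sketch below stores part of the mapping as a lookup table and builds a skeleton TEI header from parsed CHAT headers using Python's standard `xml.etree.ElementTree`. The helper names `set_by_path` and `build_header` are hypothetical, only a subset of the mapping is included for brevity, and how the actual tool treats headers that share a target (such as @Languages and @Language) is not documented here; in this sketch the later value simply overwrites the earlier one.

```python
import re
import xml.etree.ElementTree as ET

# Subset of the documented header-to-XPath mapping.
HEADER_TO_XPATH = {
    "Comment": "/TEI/teiHeader/notesStmt/note",
    "Title": "/TEI/teiHeader/fileDesc/titleStmt/title",
    "Date": "/TEI/teiHeader/fileDesc/titleStmt/date",
    "Languages": "/TEI/teiHeader/profileDesc/langUsage/language/@ident",
    "Transcriber": '/TEI/teiHeader/fileDesc/titleStmt/respStmt/resp[@n="Transcription"]',
    "Types": '/TEI/teiHeader/profileDesc/textClass/keywords/term[@type="genre"]',
    "PID": '/TEI/teiHeader/fileDesc/publicationStmt/idno[@type="handle"]',
}

# One path step: a tag name with an optional [@attr="value"] predicate.
STEP = re.compile(r'^(\w+)(?:\[@(\w+)="([^"]*)"\])?$')


def set_by_path(root, path, value):
    """Create missing elements along `path` below `root` and store `value`."""
    node = root
    for step in path.strip("/").split("/")[1:]:  # skip the root step (TEI)
        if step.startswith("@"):
            node.set(step[1:], value)  # trailing /@attr: store as attribute
            return
        tag, attr, attr_val = STEP.match(step).groups()
        child = next(
            (c for c in node.findall(tag) if attr is None or c.get(attr) == attr_val),
            None,
        )
        if child is None:
            child = ET.SubElement(node, tag)
            if attr is not None:
                child.set(attr, attr_val)
        node = child
    node.text = value


def build_header(headers):
    """Build a TEI element from a dict of CHAT header names to values."""
    root = ET.Element("TEI")
    for name, value in headers.items():
        path = HEADER_TO_XPATH.get(name)
        if path is not None:
            set_by_path(root, path, value)
    return root


tei = build_header({"Title": "Demo", "Languages": "eng", "Transcriber": "A. Person"})
print(ET.tostring(tei, encoding="unicode"))
```

Keeping the mapping as data rather than code makes it straightforward to extend the table with the remaining headers.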
The newer CA format, which avoids symbols that might conflict with XML, has not been implemented for conversion yet. Where a transcription contains symbols that XML does not like, they are sanitised during the conversion.
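What sanitising looks like in the tool itself is not described here. As a hedged illustration, the usual approach is to escape the characters that are special in XML text content, which in Python can be done with the standard library; the function name `sanitise` is hypothetical.

```python
from xml.sax.saxutils import escape


def sanitise(text: str) -> str:
    """Escape &, < and > so the text can be embedded safely in XML."""
    return escape(text)


print(sanitise("a < b & c > d"))  # a &lt; b &amp; c &gt; d
```

Escaping has to happen on the raw transcription text, before any TEI tags are inserted, otherwise the inserted tags would be escaped as well.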
No export has been provided as of yet.