Maarten Janssen edited this page Aug 23, 2020 · 7 revisions

CHAT (.cha) is a file format used by CHILDES / TalkBank. It is a text-based format for transcribing spoken data, which encodes some metadata, the speaker turns, and their alignment with an audio/video file. A good description of the format can be found here. The format is loose about how the spoken material itself is encoded, although it provides several standard conventions, which can be declared in the header using an @Options line.
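To make this structure concrete, here is a minimal Python sketch (not part of chat2teitok.pl) that splits a CHAT transcript into header fields and speaker turns. It assumes the usual CHAT layout: header lines start with @, main tiers start with * plus a speaker code, dependent tiers start with %, continuation lines start with a tab, and the media alignment is a bullet of the form start_end (in milliseconds) enclosed in U+0015 characters. The function name parse_chat is hypothetical, and repeated headers such as @ID simply overwrite each other here.

```python
import re

# Media bullet (assumed layout): "start_end" in milliseconds between two U+0015 (NAK) characters
BULLET = re.compile("\x15(\\d+)_(\\d+)\x15")


def parse_chat(text):
    """Split a CHAT transcript into a header dict and a list of speaker turns (sketch)."""
    headers, turns = {}, []
    last = None  # first character of the most recent tier: "@", "*" or "%"
    for line in text.splitlines():
        if line.startswith("\t"):
            # A leading tab continues the previous tier; only main-tier text is kept
            if last == "*":
                turns[-1]["raw"] += " " + line.strip()
            continue
        last = line[:1]
        if last == "@":
            # e.g. "@Options:\theritage"; bare lines such as "@Begin" get an empty value
            key, _, value = line[1:].partition(":")
            headers[key.strip()] = value.strip()
        elif last == "*":
            # e.g. "*CHI:\tI want that . <bullet>"
            speaker, _, rest = line[1:].partition(":")
            turns.append({"speaker": speaker, "raw": rest.strip()})
        # "%" dependent tiers (e.g. %mor) are ignored in this sketch
    for turn in turns:
        raw = turn.pop("raw")
        match = BULLET.search(raw)
        turn["start"] = int(match.group(1)) if match else None
        turn["end"] = int(match.group(2)) if match else None
        turn["text"] = BULLET.sub("", raw).strip()
    return headers, turns
```

The @Options value returned in the header dict is what determines which transcription conventions apply to the turn text.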

Import

chat2teitok.pl

Options

Command-line options of the tool:

  • debug: debugging mode
  • output: name of the output file; if empty, the output is written to STDOUT
  • morerev: add more revision statements
  • file: name of the input file
  • options: format of the transcription (overriding the value given in @Options)
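The page lists the option names but not their exact syntax. The sketch below assembles an argument list for calling the script from Python, under the assumption (not confirmed here) that options are passed as --name=value and boolean switches as bare --name. The helper name build_command is hypothetical.

```python
def build_command(infile, output=None, options=None, debug=False, morerev=False):
    """Assemble an argument list for chat2teitok.pl (assumed --name=value syntax)."""
    cmd = ["perl", "chat2teitok.pl", f"--file={infile}"]
    if output:
        cmd.append(f"--output={output}")    # when omitted, the tool writes to STDOUT
    if options:
        cmd.append(f"--options={options}")  # overrides the @Options header line
    if debug:
        cmd.append("--debug")
    if morerev:
        cmd.append("--morerev")
    return cmd
```

If the assumed flag syntax matches the script, the list can be passed directly to subprocess.run(..., check=True).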

Specifications

When heritage is used as the transcription format, the tool does not output the transcription as-is, but converts its symbols to the corresponding TEI elements:

  • (be)cause => <ex>be</ex>cause
  • [//] => <gap type="long"/>
  • [/] => <gap type="short"/>
  • => <del reason="reformulation">text</del>
  • &trunca => <del reason="truncation">trunca</del>
  • word@code => <sic n="code">word</sic>
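The symbol conversions listed above can be approximated with ordered regular-expression substitutions. This is a sketch, not the tool's own code: what counts as a "word" or a "code" (\w+, plus colons in codes) is an assumption. The reformulation rule is left out because its CHAT-side symbol is not given in the list. [//] is handled before [/], and pauses such as (.) are not treated as expansions.

```python
import re

# (pattern, replacement) pairs, applied in order; the patterns are assumptions
RULES = [
    (re.compile(r"\((\w+)\)"), r"<ex>\1</ex>"),                     # (be)cause -> <ex>be</ex>cause
    (re.compile(r"\[//\]"), '<gap type="long"/>'),                   # [//]
    (re.compile(r"\[/\]"), '<gap type="short"/>'),                   # [/]
    (re.compile(r"&(\w+)"), r'<del reason="truncation">\1</del>'),  # &trunca
    (re.compile(r"(\w+)@([\w:]+)"), r'<sic n="\2">\1</sic>'),        # word@code
]


def heritage_to_tei(utterance):
    """Rewrite heritage-style CHAT symbols in one utterance as TEI markup (sketch)."""
    for pattern, replacement in RULES:
        utterance = pattern.sub(replacement, utterance)
    return utterance
```

For example, "I said (be)cause [/] &trunca hola@s" becomes I said <ex>be</ex>cause <gap type="short"/> <del reason="truncation">trunca</del> <sic n="s">hola</sic>.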

The header metadata is mapped as follows:

  • @Comment => /TEI/teiHeader/notesStmt/note
  • @Title => /TEI/teiHeader/fileDesc/titleStmt/title
  • @Date => /TEI/teiHeader/fileDesc/titleStmt/date
  • @Languages => /TEI/teiHeader/profileDesc/langUsage/language/@ident
  • @Language => /TEI/teiHeader/profileDesc/langUsage/language/@ident
  • @Location => /TEI/teiHeader/recordingStmt/recording/location
  • @Transcriber => /TEI/teiHeader/fileDesc/titleStmt/respStmt/resp[@n="Transcription"]
  • @Creator => /TEI/teiHeader/fileDesc/titleStmt/respStmt/resp[@n="Creator"]
  • @Types => /TEI/teiHeader/profileDesc/textClass/keywords/term[@type="genre"]
  • @Subject => /TEI/teiHeader/profileDesc/textClass/keywords/term[@type="genre"]
  • @Publisher => /TEI/teiHeader/fileDesc/publicationStmt/publisher
  • @PID => /TEI/teiHeader/fileDesc/publicationStmt/idno[@type="handle"]
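A mapping like this can be applied by walking each target path and creating any elements that are missing. The sketch below uses Python's xml.etree.ElementTree and copies a subset of the paths from the list above. The helper names place and build_header are hypothetical, only the simple step[@attr="value"] and trailing /@attr forms are handled, and the TEI namespace is omitted for brevity.

```python
import re
import xml.etree.ElementTree as ET

# Subset of the header mapping above (paths copied from the list)
MAPPING = {
    "Title": '/TEI/teiHeader/fileDesc/titleStmt/title',
    "Languages": '/TEI/teiHeader/profileDesc/langUsage/language/@ident',
    "Transcriber": '/TEI/teiHeader/fileDesc/titleStmt/respStmt/resp[@n="Transcription"]',
    "Creator": '/TEI/teiHeader/fileDesc/titleStmt/respStmt/resp[@n="Creator"]',
    "PID": '/TEI/teiHeader/fileDesc/publicationStmt/idno[@type="handle"]',
}

# One path step: a tag name with an optional [@attr="value"] predicate
STEP = re.compile(r'(\w+)(?:\[@(\w+)="([^"]*)"\])?$')


def place(root, path, value):
    """Create (or reuse) the elements along path below root and store value there."""
    steps = path.strip("/").split("/")[1:]  # drop "TEI", which is the root itself
    attr = steps.pop()[1:] if steps[-1].startswith("@") else None
    node = root
    for step in steps:
        tag, key, val = STEP.match(step).groups()
        query = f'{tag}[@{key}="{val}"]' if key else tag
        child = node.find(query)
        if child is None:
            child = ET.SubElement(node, tag, {key: val} if key else {})
        node = child
    if attr:
        node.set(attr, value)  # e.g. language/@ident
    else:
        node.text = value


def build_header(headers):
    """Build a TEI root holding the mapped CHAT header values; unmapped keys are skipped."""
    root = ET.Element("TEI")
    for key, value in headers.items():
        if key in MAPPING:
            place(root, MAPPING[key], value)
    return root
```

Reusing existing elements through find keeps, for instance, Transcriber and Creator under a single respStmt.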

Known issues

The newer CA format, which avoids symbols that might conflict with XML, has not yet been implemented for conversion. Since the format contains symbols that XML does not accept, these are sanitised during the conversion.
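How the tool sanitises these symbols is not documented on this page. One possible approach is sketched below: drop characters that XML 1.0 does not allow at all (the U+0015 media bullets used in CHAT fall into this group), then escape the characters XML reserves in text content. The function name sanitise is hypothetical.

```python
import re
from xml.sax.saxutils import escape

# Anything outside the XML 1.0 Char production (tab, LF, CR and the allowed ranges)
INVALID_XML = re.compile("[^\t\n\r\x20-\ud7ff\ue000-\ufffd\U00010000-\U0010ffff]")


def sanitise(text, replacement=""):
    """Remove characters illegal in XML 1.0, then escape &, < and > (sketch)."""
    return escape(INVALID_XML.sub(replacement, text))
```

Escaping has to happen before any TEI tags are inserted into the text, otherwise the generated markup would be escaped as well.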

Export

No export has been provided as of yet.
