Import ELAN files into R as data.frames

License

MIT (LICENSE.md); a second file, LICENSE, has an unrecognized license type.
relan-package/rELAN

rELAN

Overview

rELAN provides a tool to import ELAN files (.eaf), which are generated by the annotation software ELAN [1], directly into R as data.frames.

Installation

You can install rELAN with:

# install.packages("devtools")
devtools::install_github("relan-package/rELAN")

Usage

The first argument of extract_annotations() is the path to the ELAN file, given as a string. A bare file name is enough if the file is in your working directory. With wide_format = TRUE, the function returns a wide data.frame in which each tier has its own column and rows are merged so that ANNOTATION_VALUEs replace NAs. The wide data.frame contains less data than the default long data.frame.

library(rELAN)

frog_story_annotations <- extract_annotations("ELAN_files/frog_story.eaf")

pear_story_annotations <- extract_annotations("ELAN_files/pear_story.eaf", wide_format = TRUE)

Why rELAN / more information

Until now, one of the most common ways to import ELAN annotation data into R has been a two-step process. First, you use ELAN’s export function to save the ELAN file, which is written in XML, in another format such as tab-delimited text. You then import the exported file into R as a data.frame. Compared with this workflow, rELAN has three advantages:

  1. The import is a single step, so there is no intermediate export file to create or keep track of.
  2. If you need to add, change, or delete annotations, you only need to modify the ELAN file and import it into R again, instead of modifying the ELAN file, exporting the tab-delimited file again, and then importing that file into R.
  3. Importing with rELAN gives you all the information in the ELAN file that concerns the annotations, whereas the tab-delimited text file contains only a limited subset of it.
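
For comparison, the two-step workflow described above might look roughly like the following sketch. It uses only base R. The export file is a small stand-in written to a temporary location so that the example runs on its own, and its column names (tier, begin_ms, end_ms, annotation) are hypothetical; a real export depends on the options chosen in ELAN's export dialog.

```r
# Step 1 happens in ELAN, not in R: export the .eaf file as tab-delimited text.
# Here a stand-in export is written to a temporary file so the sketch is runnable.
# The column layout is hypothetical, not ELAN's actual export layout.
export_path <- tempfile(fileext = ".txt")
writeLines(
  c("tier\tbegin_ms\tend_ms\tannotation",
    "speaker_A\t0\t1200\tthe boy",
    "speaker_A\t1200\t2500\tlooks for the frog"),
  export_path
)

# Step 2: read the exported text file into R as a data.frame.
two_step_annotations <- read.delim(export_path, stringsAsFactors = FALSE)
```

Every change to the annotations means repeating both steps, which is the overhead that the single call to extract_annotations() avoids.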

By default, extract_annotations() produces a long data.frame with all data relating to the annotations. With the argument wide_format = TRUE, you instead get a data.frame with less data, in which each tier has its own column.
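
To illustrate the difference between the two shapes, here is a minimal sketch on toy data using base R's reshape(). It is not rELAN's implementation, and the columns segment and TIER_ID as well as the tier names are assumptions made for the example; only ANNOTATION_VALUE is a name taken from this README.

```r
# Toy long data: one row per annotation. The column names "segment" and
# "TIER_ID" and the tier names are hypothetical, chosen for illustration only.
long_annotations <- data.frame(
  segment = c(1, 1, 2, 2),
  TIER_ID = c("transcription", "translation", "transcription", "translation"),
  ANNOTATION_VALUE = c("der Junge", "the boy", "sucht", "searches"),
  stringsAsFactors = FALSE
)

# Wide data: one row per segment, one column per tier.
wide_annotations <- reshape(
  long_annotations,
  idvar = "segment",
  timevar = "TIER_ID",
  v.names = "ANNOTATION_VALUE",
  direction = "wide"
)

# reshape() prefixes the new columns with "ANNOTATION_VALUE."; drop the prefix
# so that each column is named after its tier.
names(wide_annotations) <- sub("^ANNOTATION_VALUE\\.", "", names(wide_annotations))
```

In the wide shape, annotations that belong together sit side by side in one row, which is convenient for reading, while the long shape keeps one row per annotation and can therefore carry more per-annotation detail.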

While there are other packages available in R and Python for working with ELAN files directly, rELAN stands out for its ability to calculate the time values from reference annotations.
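
Some background on why this matters: in an .eaf file, time-aligned annotations point to time slots (TIME_SLOT_REF1 and TIME_SLOT_REF2), whereas reference annotations carry no time slots of their own and only point to a parent annotation through ANNOTATION_REF. The following base R sketch shows one way such times could be resolved on toy data. It is an illustration under simplifying assumptions, not rELAN's actual code: every reference annotation simply inherits the full interval of its time-aligned ancestor, and the helper resolve_root() and the columns start_ms and end_ms are hypothetical names.

```r
# Time slots as found in the TIME_ORDER element of an .eaf file (milliseconds).
time_slots <- data.frame(
  TIME_SLOT_ID = c("ts1", "ts2"),
  TIME_VALUE = c(0, 1200),
  stringsAsFactors = FALSE
)

# a1 is time-aligned; a2 refers to a1; a3 refers to a2 (toy data).
annotations <- data.frame(
  ANNOTATION_ID = c("a1", "a2", "a3"),
  ANNOTATION_REF = c(NA, "a1", "a2"),
  TIME_SLOT_REF1 = c("ts1", NA, NA),
  TIME_SLOT_REF2 = c("ts2", NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: follow ANNOTATION_REF links upward until an annotation
# without a parent (that is, a time-aligned one) is reached.
resolve_root <- function(id, annotations) {
  parent <- annotations$ANNOTATION_REF[match(id, annotations$ANNOTATION_ID)]
  while (!is.na(parent)) {
    id <- parent
    parent <- annotations$ANNOTATION_REF[match(id, annotations$ANNOTATION_ID)]
  }
  id
}

roots <- vapply(annotations$ANNOTATION_ID, resolve_root, character(1),
                annotations = annotations)
root_rows <- match(roots, annotations$ANNOTATION_ID)

# Look up the time values of the root annotation's two time slots.
annotations$start_ms <- time_slots$TIME_VALUE[
  match(annotations$TIME_SLOT_REF1[root_rows], time_slots$TIME_SLOT_ID)]
annotations$end_ms <- time_slots$TIME_VALUE[
  match(annotations$TIME_SLOT_REF2[root_rows], time_slots$TIME_SLOT_ID)]
```

After this step every row has a start and end time, including the reference annotations a2 and a3, whose rows had no time slot references to begin with.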

Reference

[1] ELAN (Version 6.7) [Computer software]. 2023. Nijmegen: Max Planck Institute for Psycholinguistics, The Language Archive. Retrieved from https://archive.mpi.nl/tla/elan