kim gerdes edited this page Nov 16, 2015 · 13 revisions

This page is a short guide to the Arborator, a collaborative dependency annotation software.

It contains information for

  • annotators
  • validators (the referees between different annotations that decide on the final annotation to be included in the corpus)
  • arborator administrators, who assign the annotation and validation tasks
  • and site administrators (for installation of the software on an apache web server)
The software is designed for use in Mozilla Firefox. Google Chrome works partly. Other browsers are not supported. There is also a simple viewing version of Arborator, useful for integrating dependency graphs in your webpage.

the editor

Most functions are accessible by keyboard and mouse:

shortcuts

  • TAB - shows next sentence. if not yet loaded, loads it. loops through sentences
  • SPACE - edit next word. if no word handled yet, opens first word, loops through words
  • BACKSPACE - edit previous word. if no word handled yet, opens last word, loops through words
  • RETURN - accepts the changed head, function, or category. if SHIFT is held down and a new head is proposed, the existing link is preserved
  • ESC - stops editing, closes open menus
  • CURSOR UP or DOWN - if editing: open function menu. if SHIFT or CTRL is also pressed: open category menu
  • "c" - if editing: open category menu
  • "f" - if editing: open function menu
  • in open menus: first letter of function or category loops through the funcs/cats with that letter
  • ctrl-s - save

mouse

  • dragging the governor onto the dependent token creates a link and opens the function menu. if shift is held down, the existing link is preserved
  • click on a function name or category opens the corresponding menu
  • double-click on a token: opens the table of all features

tree status

each annotator can assign a status to each tree:

  • no tree: shown if no tree has been saved by the user yet. clicking on the "no tree" label saves the currently visible tree
  • todo
  • ok
  • problem
The status can be seen by other users on the project page.

export

the graph of a tree can be exported in

  • pdf
  • ps
  • odg
  • jpg
  • png
  • tiff
  • svg

compare mode

  • validators and administrators have access to the compare mode (if there are at least two different annotations of the sentence):
    • the fruit icon opens a list of the annotations for the sentence.
    • different annotations of the given sentence can be checked
    • clicking on the fruit icon again gives a graphical representation of the different annotations of the sentence.
    • wrong links in the unified tree can be erased and the corrected tree can be saved.

connecting and splitting

Administrators see a special button, a chain symbol, at the far right of the buttons for each sentence. This allows for

  • connecting the sentence with the following sentence into one line. A simple click on the chain button opens an approval dialog.
  • splitting the sentence at any position into two separate lines. To do that, the administrator selects the word after which he or she wants to split the sentence. The word is then highlighted in color. Then, a Ctrl-click on the chain button opens an approval dialog.

the project page

  • shows
    • the assigned texts
      • the annotator can change the text status by clicking on the default status (todo)
      • the default status tags are todo, ok, and problem
      • The status can be seen by administrators on the project page.
    • all the texts of the database and their annotators and validators
    • administrators additionally see a list of all the assignments per user
  • allows searching for words in the database (by simply typing the word) as well as searching for other features (attribute-value pairs separated by a colon without spaces, example: lemma:pouvoir). It is not possible to search for functions. You can mix words and other feature searches: lemma:pouvoir journée
    • supports "google type" search (quotes around multiple words to search for the whole string including spaces, AND (default), OR, *, ...).
    • shows snippets and gives direct access to the sentences and their annotations (only for simple word search, feature search does not provide snippets).
  • allows for different types of conll export:
    • by assignment: the text is exported with all its sentences, a file per user. i.e. all sentences of a text are exported even if the annotator did not save them as his or her own. in that case, the parser's trees are taken
    • by existing trees: all saved trees for a given text are exported, a file per user, non-annotated sentences of a text do not appear in the file.
  • administrators can
    • assign annotators and validators: to assign a person,
      • click on "assign", choose the person,
        • click on "+" to assign a simple annotator (can't see the other annotators' annotations)
        • click on "✓" to assign a validator (can see the other annotators' annotations)
    • erase texts

query

The Arborator also allows for simple queries in the database. In the field in the upper right corner of the project page, google-like queries (space-separated query terms) can be carried out. Hit "Enter" and the system will give back a list with snippets of sentences containing all the query terms.

Additionally, feature queries are possible: colon-separated attribute-value pairs can be included, for example cat:N. "func" or "function" can be used to access function names. This is a valid query: 'agréable cat:I func:para_disfl'

Note, however, that a query with several category or function terms will look for a single word that has all the features. For example, the query cat:N lemma:red will look for words that have both the features cat=N and lemma=red.
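The way such a query splits into plain words and feature pairs can be sketched in a few lines of Python. This is only an illustration of the syntax described above, not the Arborator's actual query code: the function name parse_query is hypothetical, quoted phrases and OR are not handled, and treating "function" as an alias of "func" is an assumption based on the sentence above.

```python
def parse_query(query):
    """Split a query into plain word terms and attribute:value feature pairs."""
    # assumption: "function" and "func" address the same attribute
    aliases = {"function": "func"}
    words, features = [], {}
    for term in query.split():
        attribute, separator, value = term.partition(":")
        if separator and attribute and value:
            features[aliases.get(attribute, attribute)] = value
        else:
            words.append(term)
    return words, features


words, features = parse_query("agréable cat:I func:para_disfl")
print(words)     # the plain word terms
print(features)  # the features that one and the same word has to carry
```

All feature pairs end up in one dictionary, which mirrors the rule that they are checked against a single word.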

account editing

  • At the bottom of each page are links to log out and to edit the user account. The account page provides
    • information on past site access
    • change of password and real name
  • Administrators additionally have a link for user administration which allows
    • To create or invite new users
    • To edit or delete users
    • To edit the main config file and default user

User Administration

  • Users can be simple annotators or validators
  • Annotators can only see their own trees and the trees by users specified in the project.cfg (generally the trees by parser)
  • Validators can see all the trees on the texts for which they are assigned as validators and can use the compare tools.
  • Admins are declared in the user admin pages. They should be given the "Admin Level" of 1 (only the site admin should have 3). Admins can
    • assign annotation and validation tasks
    • see all the trees
    • upload conll files
    • erase texts
    • export texts as conll
Be careful when editing user.ini files by hand: Don't leave temporary files in the user folder!

site administration and installation

The software is written in Python (server side) and Javascript (client side). It is released under the GNU Affero GPL v3 licence. The latest version can be obtained on the Arborator's launchpad page.

To install the Arborator, the whole source dump has to be unzipped to a folder on an apache server.

Arborator uses SQLite, which has to be installed together with the corresponding Python sqlite module (standard in recent versions of Python). The tree transformation for the import options also uses two non-standard modules: nltk.featstruct (you'll have to install nltk; on ubuntu, that's done with sudo apt-get install python-nltk) and jellyfish (for fuzzy matching when importing; install it by typing sudo pip install jellyfish). All other Python modules used are standard: difflib, generators, glob, hashlib, json, optparse, os, random, re, shutil, sqlite3, subprocess, sys, time, traceback, urllib, xml.dom
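Before starting the server, it can be useful to check that the modules named above can be found. The following is a minimal sketch, not part of the Arborator distribution: the function name missing_modules is hypothetical, and it has to be run with the same Python interpreter that the web server uses for the cgi scripts.

```python
import importlib.util

# top-level modules named in the paragraph above
REQUIRED = ["sqlite3", "nltk", "jellyfish"]


def missing_modules(names=REQUIRED):
    """Return the top-level modules from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("missing modules:", ", ".join(missing) if missing else "none")
```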

The Arborator includes various open source scripts and software:

  • Javascript tools
    • JQuery
    • JQueryUI
    • Raphael
    • JQuery.fileupload
  • Python tools
    • logintools
  • Java tools
    • Batik
    • svg2office

making Arborator run locally on an ubuntu machine for test purposes

install apache

either the whole lamp (if you want to use php or mysql):

 sudo apt-get install tasksel
 sudo tasksel install lamp-server

or simply apache alone:

 sudo apt-get install apache2

try to visit http://localhost in your browser

if it doesn't connect try

 sudo /etc/init.d/apache2 restart

linking the folder

suppose you downloaded arborator into /home/me/arborator (i.e. the file index.cgi for example is in this folder)

 sudo ln -s /home/me/arborator /var/www/arborator

make the folder accessible and writable for everyone (not a good idea for a public server!):

 sudo chmod -R a+rw /home/me/arborator

check whether all .cgi files are executable. if not:

 sudo find /home/me/arborator -name "*.cgi" -exec chmod +x {} \;
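To see which .cgi files are affected before changing anything, a small Python sketch like the following can list them. It is an illustration only: the function name non_executable_cgi is hypothetical, and it reports files that have no execute bit at all, which is a slightly coarser test than what a particular web server user may need.

```python
import os
import stat

EXECUTE_BITS = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH


def non_executable_cgi(root):
    """Return sorted paths of *.cgi files under `root` that have no execute bit."""
    found = []
    for folder, _subfolders, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            if name.endswith(".cgi") and not os.stat(path).st_mode & EXECUTE_BITS:
                found.append(path)
    return sorted(found)
```

Calling non_executable_cgi on the folder that contains index.cgi returns an empty list once all scripts are executable.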

apache configuration

now look into /etc/apache2/sites-enabled/. there should be a file for the default server: 000-default

open this file with your favorite editor, for example with dolphin



the information about the /var/www directory should look like this:

 <Directory /var/www/>
 	Options Indexes FollowSymLinks MultiViews
 	Options +FollowSymLinks
 	Options +ExecCGI
 	AddHandler cgi-script .cgi
 	AllowOverride None
 	Order allow,deny
 	Allow from all
 </Directory>

change it, save it, restart apache:

 sudo /etc/init.d/apache2 restart

now going to http://localhost/arborator in your browser should show the arborator start page

creating a new project

  • create a folder with the name of the project in the projects directory
  • copy project.cfg from an example project and edit it:
    • this includes giving the list of categories and functions in separate files. In the editor, functions and categories not in the list will be shown in grey and cannot be assigned.
    • each function and each category can be followed by a tab and a json/css description of how the arrows and categories, respectively, should look
  • make all new folders world readable and writable: the projects folder, the export/cache folder, the user folder, and all their subfolders should be world read-writable. If you edit anything in the users folder, don't leave temporary files (...ini~).
  • to create a new database, go to the root directory of the whole system (arborator) and run:
 python createDB.py name_of_project
  • an image file name_of_project.png can be placed in the project folder. this will be shown on the start page.
  • you should not have to change anything outside of the projects folder.
    • only exception: you may want to add (rhapsodie xml or conll) files into the corpus folder, instead of manually uploading each file individually.
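The first steps of the list above (creating the folder, copying project.cfg, opening the permissions) can be sketched in Python as follows. This is a hedged illustration, not a script from the distribution: the function name create_project_skeleton and its parameters are hypothetical, and the folder name "projects" is taken from the text above. Editing project.cfg and running createDB.py still have to be done afterwards.

```python
import os
import shutil


def create_project_skeleton(arborator_root, name, example_project):
    """Create projects/<name> and copy project.cfg from an example project."""
    projects = os.path.join(arborator_root, "projects")
    target = os.path.join(projects, name)
    os.makedirs(target, exist_ok=True)
    shutil.copy(
        os.path.join(projects, example_project, "project.cfg"),
        os.path.join(target, "project.cfg"),
    )
    # world read-writable: acceptable for a local test setup only
    os.chmod(target, 0o777)
    return target
```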

entering parser data

In a usual setting, the data is automatically pre-parsed and only corrected by the annotators. CoNLL (Malt, 10 or 14) files as well as Rhapsodie XML files can be uploaded into the database from the project page. Warning: for the moment, all existing annotations on texts of the same name are erased when a new file is uploaded!
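Since an upload replaces existing annotations, it can be worth checking a file's layout first. The sketch below counts the tab-separated columns on each token line. It assumes that "10" and "14" refer to the number of columns per token line, that blank lines separate sentences, and that lines starting with "#" are comments; the function name conll_column_counts is hypothetical and not part of the Arborator.

```python
def conll_column_counts(text):
    """Return the set of tab-separated column counts found on token lines."""
    counts = set()
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip sentence separators and (assumed) comment lines
        counts.add(len(line.split("\t")))
    return counts
```

A well-formed file should yield a set with exactly one element; more than one value points to lines with missing or extra tabs.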

  • you can also enter the CoNLL or XML files manually into the database (using a python script) instead of clicking on each link on the site:
look at the last lines of treebankfiles.py. the two essential lines are: sql = SQL(name_of_project) and sql.enterRhapsodie("corpus/xml/example.xml") for xml and sql.enterConll("corpus/conll/example.conll10") for CoNLL (Malt, CoNLL 10 and CoNLL 14 are accepted).
  • If users get their own login by signing up, they have to log on to confirm. Only then is the user added to the database (and thus only then can the user be assigned an annotation task). In case of automatic creation of users (or manual creation by adding ini files), click once on User Administration / edit users so that all users are also included in the database.
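For the manual import described above, a loop over a corpus folder might look like the following sketch. The method names enterConll and enterRhapsodie are the ones quoted from treebankfiles.py; everything else is an assumption: the function name enter_corpus_files is hypothetical, files are recognized by ".xml" and ".conll" in their names, and the sql object is expected to be created as in the last lines of treebankfiles.py.

```python
import os


def enter_corpus_files(sql, corpus_folder):
    """Pass every XML or CoNLL file below corpus_folder to the sql object."""
    entered = []
    for folder, _subfolders, files in os.walk(corpus_folder):
        for name in sorted(files):
            path = os.path.join(folder, name)
            if name.endswith(".xml"):
                sql.enterRhapsodie(path)
            elif ".conll" in name:
                sql.enterConll(path)
            else:
                continue  # ignore files of any other type
            entered.append(path)
    return entered


# hypothetical usage, assuming SQL can be imported from treebankfiles.py:
# from treebankfiles import SQL
# sql = SQL("name_of_project")
# enter_corpus_files(sql, "corpus")
```

Make a backup of the database first: this import may replace existing annotations on texts of the same name, as uploading through the site does.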


further tweaks

The database is a file called arborator.db.sqlite. It's located in the project folder.

  • before doing any work on the database, warn potential annotators by renaming the file xsitemessage.html to sitemessage.html
  • make a backup copy of your database (simply copy the file somewhere else)
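The backup step can be sketched in Python as follows. The file name arborator.db.sqlite comes from the text above; the function name backup_database, the timestamped target name, and the backup folder are assumptions for illustration. Copy the file only while nobody is saving trees.

```python
import os
import shutil
import time


def backup_database(project_folder, backup_folder):
    """Copy arborator.db.sqlite to backup_folder under a timestamped name."""
    source = os.path.join(project_folder, "arborator.db.sqlite")
    os.makedirs(backup_folder, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = os.path.join(backup_folder, "arborator.db.%s.sqlite" % stamp)
    shutil.copy2(source, target)  # copy2 also keeps the modification time
    return target
```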

bulk correction

the distribution contains a script called bulkCorrectDatabase.py

it runs over the whole database and corrects features consistently. however, it's very slow. usually, direct access to the database is faster.

  • the function to call is bulkcorrectDB.
    • it can be called with a list of treeids: bulkcorrectDB("Rhapsodie", [9795])
    • or without it: bulkcorrectDB("Rhapsodie")
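For direct access, a first step is to find out which tables the database file contains. The sketch below uses only Python's standard sqlite3 module and makes no assumption about the Arborator's schema; the function name list_tables is hypothetical. Work on a backup copy, and note that sqlite3.connect creates an empty file if the path does not exist.

```python
import sqlite3


def list_tables(db_path):
    """Return the names of all tables in the SQLite file at db_path."""
    connection = sqlite3.connect(db_path)
    try:
        rows = connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        connection.close()
    return [name for (name,) in rows]
```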

TODO

  • non-destructive upload: integrate users' annotations without deleting existing annotations, even if tokens are slightly different (diff)
  • compare mode: referee, Cohen's kappa
  • importation of a selection of features from the rhapsodie xml format.
  • special save button for validators in order to keep their original annotation if they are also annotators.
  • graphical editor:
    • little problems with the undo manager. e.g. if a menu opens and the same func/cat is chosen, don't register as dirty. something is wrong after saving. undo/redo accessibility is different from dirty/clean: after save, undo/redo should remain accessible (ok)
  • check the simple graphical viewer:
    • put ad hoc colors back in again (currently: project's colors)
  • upload page: make trash file work, make rhapsodie xml work
  • make tiny and fast js compilation
  • redesign: make project choice cookie based (and not form based)