Rabbid is a rapid application development environment for KorAP Corpus Analysis Platform and used in production for the creation and management of collections of textual examples in the area of discourse analysis and discourse lexicography.
Unlike KorAP, Rabbid provides a rather limited set of search operators for small, non-annotated corpora.
! This software is in its early stages and not stable yet! Use it on your own risk!
To fetch the latest version of Rabbid ...
$ git clone https://github.com/KorAP/Rabbid
$ cd Rabbid
To generate the static asset files (scripts, styles, images ...), you need NodeJS > 0.8. For processing Sass, you will need Ruby with the sass gem in addition. This will probably need administration rights.
$ npm install
$ grunt
Rabbid uses the Mojolicious framework, that expects a Perl version of at least 5.10.1. The recommended environment is based on Perlbrew with App::cpanminus.
Some perl modules are not on CPAN yet, so you need to install them from GitHub. The easiest way to do this is using App::cpanminus. This will probably need administration rights.
$ cpanm git://github.com/Akron/DBIx-Oro.git
$ cpanm git://github.com/Akron/Mojolicious-Plugin-Oro.git
$ cpanm git://github.com/Akron/Mojolicious-Plugin-Oro-Viewer.git
$ cpanm git://github.com/Akron/Mojolicious-Plugin-TagHelpers-ContentBlock.git
$ cpanm git://github.com/Akron/Mojolicious-Plugin-Localize.git
$ cpanm git://github.com/KorAP/KorAP-XML-Krill.git
Then install the dependencies as always and run the test suite.
$ SQLITE_ENABLE_FTS3_TOKENIZER=1 cpanm --installdeps .
$ perl Makefile.PL
$ make test
There is no need to install Rabbid on your system, but you have to initialize the database before you can start.
$ perl script/rabbid rabbid_init
First you may want to import the example corpus from t/example/
:
$ perl script/rabbid rabbid_import -c example -d t/example
Rabbid can be deployed like all Mojolicious apps. The easiest way is to start the built-in server:
$ perl script/rabbid daemon
Rabbid will then be available at localhost:3000
in your browser.
The input format of Rabbid is a simplified XHTML document.
The <head />
contains the <title />
of the document,
further meta data fields like doc_id
are given as <meta />
elements. In the body only <p />
elements are of relevance -
they divide the text body into snippets used by Rabbid.
Optional <span />
elements can be used to subdivide long paragraphs
in shorter snippets.
Optional pagebreaks may be given in the form of empty
<br class="pb" data-after="1" />
elements,
with the data-after
attribute
telling the page number following the element.
An example document may look like this:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta charset="utf-8" />
<title>Example 1</title>
<meta name="doc_id" content="5" />
<meta name="author" content="John Doe" />
<meta name="year" content="1919" />
</head>
<body>
<p>This is an example text.</p>
<p>Each paragraph resembles one snippet in Rabbid's view.</p>
<p>
<span>Long paragraphs can be subdivided.</span>
<span>By using the span element, each span makes one snippet.</span>
</p>
<p>The End.</p>
</body>
</html>
The corpus configuration is still experimental. In the configuration file
(e.g. rabbid.conf
), the corpus can be defined with the entry name Corpora
with a short handle and an entry called schema
that lists all metadata
categories to be displayed by the document browser.
Corpora => {
-default => 'example',
example => {
schema => [
author => 'TEXT',
title => 'TEXT',
year => 'INTEGER',
domain => 'TEXT',
genre => 'TEXT',
polDir => 'TEXT',
file => 'TEXT'
]
}
}
The -default
entry can denote a default handle.
Localization of metadata categories can be done in the file rabbid.dict
.
Please consult the documentation for
Mojolicious::Plugin::Localize.
To convert documents to Rabbidml, see
$ perl script/rabbid rabbid_convert
Currently supported input formats include I5 and some Gutenberg Project conventions.
New versions of DBD::SQLite
do not include support
for fulltext search tokenizers by default.
To compile SQLite with support, use
$ SQLITE_ENABLE_FTS3_TOKENIZER=1 cpanm DBD::SQLite --force
ALERTIFY.js is released under the terms of the MIT License. Almond is released under the terms of the BSD License. Jasmine is released under the terms of the MIT License. RequireJS is released under the terms of the BSD License. Font Awesome by Dave Gandy is released under the terms of the SIL OFL 1.1. The Example Corpus is released under the Project Gutenberg License: "This eBook is for the use of anyone anywhere at no cost and with almost no restrictions whatsoever. You may copy it, give it away or re-use it under the terms of the Project Gutenberg License included with this eBook or online at www.gutenberg.net"
Copyright (C) 2015-2017, IDS Mannheim
Author: Nils Diewald,
Ruth Maria Mell
Rabbid is developed as part of the KorAP and Demokratiediskurs 1918-1925 projects at the Institute for the German Language (IDS), member of the Leibniz-Gemeinschaft.
Rabbid is free software published under the BSD-2 License.