Update examples.
sebpuetz authored May 27, 2019
1 parent 97b1d3b commit 912b413
Showing 1 changed file with 24 additions and 5 deletions.
29 changes: 24 additions & 5 deletions README.md
@@ -19,9 +19,9 @@ cargo install --git https://github.com/sebpuetz/lumberjack
* Convert treebank in NEGRA export 4 format to bracketed TueBa V2 format
```bash
lumberjack-conversion --input_file treebank.negra --input_format negra \
- --output_format tueba --output_file treebank.filtered
+ --output_format tueba --output_file treebank.tueba --projectivize
```
- * Retain only root node, NPs and VPs and print to simple bracketed format:
+ * Retain only root node, `NP`s and `PP`s and print to simple bracketed format:
```bash
echo "NP PP" > filter_set.txt
lumberjack-conversion --input_file treebank.simple --input_format simple \
@@ -34,6 +34,25 @@ parent tags of terminals as features.
lumberjack-conversion --input_file treebank.simple --input_format simple\
--output_format conllx --output_file treebank.conll --parent
```
+ * Modifications in the following order:
+
+ 1. Reattach all terminals with part-of-speech starting with `$` to the
+ root node
+ 2. Remove all nonterminals except the root, `S`s, `NP`s, `PP`s and `VP`s
+ 3. Assign unique identifiers based on the closest `S` to terminals
+ 4. Insert nodes with label `label` above terminals that aren't dominated by `NP` or `PP`
+ 5. Annotate label of parent node on terminals.
+ 6. Print to CONLLX format with annotations.
+
+ ```bash
+ echo "S VP NP PP" > filter_set.txt
+ echo "NP PP" > insert_set.txt
+ echo "S" > id_set.txt
+ lumberjack-conversion --input_file treebank.simple --input_format simple\
+ --output_format conllx --insertion_set insert_set.txt \
+ --insertion_label label --id_set id_set.txt --reattach $\
+ --parent parent --output_file treebank.conllx
+ ```

## Usage as rust library:
* read and projectivize trees from NEGRA format and print to simple
@@ -57,11 +76,11 @@ fn print_negra(path: &str) {
* filter non-terminal nodes from trees in a treebank and print to
simple bracketed format:
```rust
- use lumberjack::{io::PTBFormat, TreeOps, util::LabelSet};
+ use lumberjack::{io::PTBFormat, Tree, TreeOps, util::LabelSet};

fn filter_nodes(iter: impl Iterator<Item=Tree>, set: LabelSet) {
for mut tree in iter {
- tree.filter_nonterminals(&set).unwrap();
+ tree.filter_nonterminals(|tree, nt| set.matches(tree[nt].label())).unwrap();
println!("{}", PTBFormat::Simple.tree_to_string(&tree).unwrap());
}
}
@@ -71,7 +90,7 @@ encoded in the features field
```rust
use conllx::graph::Sentence;
use lumberjack::io::Encode;
- use lumberjack::TreeOps;
+ use lumberjack::{Tree, TreeOps, UnaryChains};

fn to_conllx(iter: impl Iterator<Item=Tree>) {
for mut tree in iter {