Almost the same parser as in RobMcH/cyk_parser and phiSgr(CYK_parser) but more in Object oriented fashion.
This is a simple context-free grammar parser, in Python3.
Feel free to use any piece of the code in your own projects.
- Main folder
- cfg.py main file
- README.md this document
- data
- rules.txt the phrase structure grammar rules
- normalized.json a backup of the grammar in json format, for future load
- classes
- grammar.py Implementation of the CFGGrammar class
- parser.py Implementation of the Parser class + some other useful functions
Here example of rules
S -> VP NP
VP -> V NP
NP -> D N
NP -> N PP
...
In this version of the Parser (other as in the original repos), also rules with double outputs are accepted:
V -> 'buy' | 'sell'
S -> VP NP | VP
N.B: Non terminal nodes should be listed in single quotes in the grammar file. The Parser will use this sign to distinguish terminal from non terminal nodes:
V -> 'buy'
**NOT** V -> buy
-
Grammar and Parser are now implemented as classes
-
Each class has some new methods
-
New rule file:
rules_usami.txt
from usami/pcfg -
Minor code improvement (I hope)
-
Parser
- The tree is printed with round parenthesis (for compatibility with some nltk tree tools)
- grammar_from_file and grammar_from_string were collapsed into the new method load_grammar
- Start symbol fixed as "S"
- The parser searches (after the CYK-Algorithm) if there are alternative derivations. If you want only
derivations starting from 'S', you can pass the bool param.
only_s
to the method.to_tree
-
Grammar class
- Deleted option to give a single rule via string input I think it makes things more complicated, and one can still test single rules using the text file
- Possibility to load previously normalized grammar from json
-
work in progress: Draw Tree as svg, see below:
- Added ASCII art - for fun ;)
- Grammar from file, sentence (at the moment, only a single sentence) from file:
python3 cfg.py
- Grammar from file, sentence from input (stdin):
python3 cfg.py
- Default grammar ("data/rules_usami.txt") and sentence from input
python3 cfg.py
The Parser has several boolean parameters:
- output (default
True
) : prints the parsings - only_s (default
False
) : search only for parsings which begin with start symbol "S" - draw (default
True
) : use NLTK to draw trees (these can be saved as .ps files)