amasotti/CFG_Parser

CYK-Parser for CFG Grammars

Almost the same parser as in RobMcH/cyk_parser and phiSgr(CYK_parser), but written in a more object-oriented fashion.

This is a simple context-free grammar parser written in Python 3.

Feel free to use any piece of the code in your own projects.

Structure

  • Main folder
    • cfg.py main file
    • README.md this document
    • data
      • rules.txt the phrase structure grammar rules
      • normalized.json a backup of the grammar in json format, for future load
    • classes
      • grammar.py Implementation of the CFGGrammar class
      • parser.py Implementation of the Parser class + some other useful functions
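
As a rough illustration of what the normalized.json backup makes possible, here is a minimal sketch of saving a grammar to JSON and loading it again. The dictionary layout and the function names save_grammar and load_grammar_json are assumptions made for this example; the actual format written by grammar.py may differ.

```python
import json
import os
import tempfile


def save_grammar(grammar, path):
    """Write a grammar dict (hypothetical layout) to a JSON file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(grammar, fh, indent=2)


def load_grammar_json(path):
    """Read a grammar dict back from a JSON file."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


# Assumed layout: left-hand side -> list of right-hand sides.
grammar = {
    "S": [["NP", "VP"]],
    "NP": [["D", "N"]],
    "V": [["'buy'"], ["'sell'"]],
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "normalized.json")
    save_grammar(grammar, path)
    restored = load_grammar_json(path)

print(restored == grammar)
```

Loading from JSON skips re-reading and re-normalizing the rule file, which is presumably the point of keeping the backup.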

Rules

Here is an example of rules:

S -> VP NP
VP -> V NP
NP -> D N
NP -> N PP
...

Unlike the original repos, this version of the Parser also accepts rules with several alternative right-hand sides, separated by |:

V -> 'buy' | 'sell'
S -> VP NP | VP

N.B.: Terminal nodes should be written in single quotes in the grammar file. The Parser uses the quotes to distinguish terminal from non-terminal nodes:

V -> 'buy'

**NOT**  V -> buy
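
The rule format above can be read with a few lines of Python. This is only a sketch of the idea, not the code in grammar.py: parse_rules and is_terminal are hypothetical helper names, and the sketch assumes one rule per line with alternatives separated by |.

```python
from collections import defaultdict


def parse_rules(text):
    """Parse lines like "V -> 'buy' | 'sell'" into {lhs: [[symbol, ...], ...]}."""
    grammar = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line or "->" not in line:
            continue  # skip blank or malformed lines
        lhs, rhs = line.split("->", 1)
        for alternative in rhs.split("|"):
            symbols = alternative.split()
            if symbols:
                grammar[lhs.strip()].append(symbols)
    return dict(grammar)


def is_terminal(symbol):
    """A symbol wrapped in single quotes counts as a terminal."""
    return len(symbol) >= 2 and symbol.startswith("'") and symbol.endswith("'")


rules = parse_rules("""
S -> VP NP | VP
V -> 'buy' | 'sell'
""")

print(rules)
print(is_terminal("'buy'"), is_terminal("buy"))
```

Each alternative becomes its own right-hand side, so `S -> VP NP | VP` is stored as two rules for S.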

Main modifications

  • Grammar and Parser are now implemented as classes

  • Each class has some new methods

  • New rule file: rules_usami.txt from usami/pcfg

  • Minor code improvement (I hope)

  • Parser

    • The tree is printed with round parentheses (for compatibility with some nltk tree tools)
    • grammar_from_file and grammar_from_string were collapsed into the new method load_grammar
    • Start symbol fixed as "S"
    • After running the CYK algorithm, the parser also looks for alternative derivations. If you only want derivations starting from 'S', pass the boolean parameter only_s to the method .to_tree
  • Grammar class

    • Deleted the option to give a single rule via string input. I think it makes things more complicated, and one can still test single rules using the text file
    • Possibility to load previously normalized grammar from json
  • work in progress: Draw Tree as svg, see below:

tree

  • Added ASCII art - for fun ;)
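
To make the parser notes above more concrete, here is a minimal CYK sketch that prints trees with round parentheses and has an only_s switch. It is not the implementation in parser.py: the function cyk_parse, its arguments, and the toy grammar are assumptions for illustration. It expects a grammar that is already in Chomsky normal form and keeps only one tree per non-terminal and span.

```python
def cyk_parse(words, binary_rules, lexical_rules, only_s=False, start="S"):
    """Return bracketed trees that span the whole sentence.

    binary_rules:  list of (parent, left, right) tuples
    lexical_rules: list of (parent, word) tuples
    """
    n = len(words)
    if n == 0:
        return []

    # chart[i][j] maps a non-terminal to one bracketed tree for words[i..j]
    chart = [[{} for _ in range(n)] for _ in range(n)]

    # Fill the diagonal with the lexical rules.
    for i, word in enumerate(words):
        for parent, terminal in lexical_rules:
            if terminal == word:
                chart[i][i][parent] = f"({parent} {word})"

    # Combine shorter spans into longer ones.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):  # split point
                for parent, left, right in binary_rules:
                    if left in chart[i][k] and right in chart[k + 1][j]:
                        tree = f"({parent} {chart[i][k][left]} {chart[k + 1][j][right]})"
                        chart[i][j].setdefault(parent, tree)

    top = chart[0][n - 1]
    if only_s:
        return [top[start]] if start in top else []
    return list(top.values())


# Toy grammar, made up for this example.
binary = [("S", "NP", "VP"), ("VP", "V", "NP"), ("NP", "D", "N")]
lexical = [("D", "the"), ("N", "dog"), ("N", "cat"), ("V", "sees")]

full = cyk_parse("the dog sees the cat".split(), binary, lexical)
fragment = cyk_parse("the dog".split(), binary, lexical)
fragment_s = cyk_parse("the dog".split(), binary, lexical, only_s=True)

print(full)        # one tree rooted in S
print(fragment)    # a derivation rooted in NP, not S
print(fragment_s)  # empty, because only_s filters out non-S roots
```

With only_s left at False, the input "the dog" still yields an NP tree, which is one way to read "alternative derivations" above; with only_s=True only trees rooted in the start symbol are returned.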

How does it work:

  • Grammar from file, sentence (at the moment, only a single sentence) from file:

python3 cfg.py

  • Grammar from file, sentence from input (stdin):

python3 cfg.py

  • Default grammar ("data/rules_usami.txt") and sentence from input

python3 cfg.py

The Parser has several boolean parameters:

  • output (default True) : prints the parsings
  • only_s (default False) : search only for parsings which begin with start symbol "S"
  • draw (default True) : use NLTK to draw trees (these can be saved as .ps files)
