-
Notifications
You must be signed in to change notification settings - Fork 12
Using the prover
You will need only SWI-Prolog to run the prover.
Go to the main directory and run the following command in order to run the prover on the trial portion of the SICK dataset (included in ccg_sen_d
):
swipl -f prolog/main.pl -l ccg_sen_d/SICK_trial_ccg.pl ccg_sen_d/SICK_trial_sen.pl
*_ccg.pl
contains CCG derivations by the C&C CCG parser (Clark&Curran,2007) while *_sen.pl
lists RTE problems.
Choose SICK_trial_eccg.pl
for the CCG derivations by EasyCCG (Lewis&Steedman).
Now already in the Prolog environment, run
gentail(211). % gui/graphical entailment check
This will trigger a tableau in the XPCE GUI for the SICK-211 problem to verify its gold answer.
If you prefer an HTML representation of the proof, add the html
parameter before gentail/1
:
parList([html]), gentail(211).
This first creates xml/tableau-211-yes-no_align.xml
and then xml/tableau-211-yes-no_align.html
from the latter via xsltproc
.
The file names stand for "a tableau for problem-211 to verify entailment (i.e. yes) with no alignment techniques used".
To reset the parameter list for the next run use parList([])
.
If you want to force the alignment of LLFs, run (because SICK-253 shows the effect of alignment well):
parList([]), gentail(align, 253). % gentail(253) is the same as gentail(no_align, 253)
In the example, the phrase is on top of the mountain
shared by the premise and the hypothesis is treated as an atomic term by the alignment technique.
If you want to get both tableaux for entailment (yes) and contradiction (no) checks, run
gentail_no_answer(211). % or gentail_no_answer(align, 211) to force alignment of LLFs
Be aware not all IDs have corresponding problems in SICK-trial as it is just a portion of SICK. You can try: 4, 24, 105, 116, 119, 185, 197, ... In order to run the prover on several problems without GUI, by listing them or giving an interval, try:
entail_some([4, 24, 105, 116, 119, 185, 197]).
or equivalently
entail_some(1-200).
For each RTE problem, these commands will print the problem ID, the [gold answer], the prover's judgment and a tableau status (open/closed, terminated/limited, #rule applications). In the end, a confusion matrix with some stats is printed. Something like this:
?- entail_some(1-200).
4: [no], no, closed, 17
24: [unknown], unknown, open, 'Ter',175
105: [unknown], unknown, open, 'Ter',56
116: [unknown], unknown, open, 'Ter',27
119: [unknown], unknown, open, 'Ter',205
185: [unknown], unknown, open, 'Ter',71
197: [unknown], unknown, open, 'Ter',68
------------------------------------------------------
Gold\Prover YES NO UNK DEF
------------------------------------------------------
ENTAILMENT: 0 0 0 (0) 0
------------------------------------------------------
CONTRADICTION: 0 1 0 (0) 0
------------------------------------------------------
NEUTRAL: 0 0 6 (0) 0
------------------------------------------------------
Total #problems: 7
Accuracy (pure): 1.00000 (1.00000)
Precision: 1.00000
Recall (pure): 1.00000 (1.00000)
------------ STATS -------------
Total #clTabperPrb: 1
Total #ruleApps for clTab: 17
Average #ruleApps for clTab: 17.00000
true.
Run the prover on all the problems of SICK-trial by:
entail_all.
The above predcates entail_some/1
and entail_all/0
build tableaux both for aligned and non-aligned LLFs.
If you want to terminate the run, type Ctrl+C
and then press a
for abort.
If you have a multi-core CPU, you can use parallel
in parList
to prove problems in a concurrent way.
parList([parallel]), entail_all. % similarly: parList([parallel]), entail_some(1-200).
You can make theorem prover faster by building tableaux only for aligned LLFs (use parList([])
to cancel the parallel
parameter asserted by the previous command):
parList([]), entail_all(align). % similarly: entail_some(align, 1-200)
or even setting the rule application limit (RAL) in order to stop a tableau after the rule application limit is reached. The default RAL is 400.
parList([ral(50)]), entail_all(align). % or combine it with parallel proving
The theorem prover has several parameters (see main.pl
). Part of these parameters control the actual proof procedure, For example:
-
rule application limit (
ral/1
), -
wordnet relations (
wn_ant/0
,wn_sim/0
,wn_der/0
), -
treatment of bare, plural and definite NPs (
the/0
,a2the/0
,s2the/0
,thE/0
,noPl/0
,noThe/0
,noHOQ/0
), -
word senses filtering (
ss/1
), -
rule efficiency criterion (
effCr/1
), - intersectivity of unknown noun modifiers (
allInt/0
), - strong alignment of LLFs (align all,
aall/0
), - check consistency of each sentence
constchck/0
.
To run the theorem prover with the optimal set of parameters for SICK (according to Abzianidze (2016b)), run:
parList([aall, wn_ant, wn_sim, wn_der, constchck, allInt, ral(800), waif('file_with_answers_inside.txt')]), entail_all.
For FraCaS run the command below. fracFilter
guarantees that the problems with undefined gold answers are excluded.
parList([fracFilter, aall, wn_ant, ral(400), waif('fr_answers_all_sections.txt')]), entail_all.
wn_sim
, wn_der
, and constchck
doesn't affect the answers. ral(800)
is decreased to ral(400)
as 400 rule applications seem enough for FraCaS problems.
This decrease also makes the prover faster. allInt
is omitted as it hurts the performance because FraCaS contains many non-intersective adjectives.
If you want to run the prover on certain sections of FraCas, e.g., 1, 2, 5 and 9, run:
parList([fracFilter, aall, wn_ant, ral(400), waif('fr_answers_sec1259.txt')]), findall(X, (sen_id(_,X,'h',_,_), (between(1,80,X); between(81,113,X); between(197,219,X); between(334,346,X))), List), entail_some(List).
The command makes sure that the problem IDs fall in the specific sections, e.g., the Plurals section contains 81,...,113 problems.
After the prover generates all the answers in a file passed by the waif/1
predicate, the evaluation can be done in the following way, depending on the dataset:
./perl/evaluate.pl SICK_dataset/SICK_test_annotated.txt file_with_answers_inside.txt
./perl/evaluate.pl FraCaS_dataset/FraCaS_gold.txt file_with_answers_inside.txt
It is also possible to combine two answer sets of the prover obtained based on different LLFs (e.g., resulted from different CCG parsers).
./perl/combine_two_classifiers.pl SICK_dataset/SICK_test_annotated.txt file_with_answers_inside_1.txt file_with_answers_inside_2.txt
The script also creates the HYBRID_ANSWERS
files that list the problemID-combinedAnswer pairs.
Notice that if you have files with the answers for a part of a dataset, e.g., for the FraCaS sections 1 and 2, the evaluation and combining scripts will still work for them.