
Collocation Grammar #4

Open
LenaHenke opened this issue Sep 10, 2021 · 4 comments
Comments

@LenaHenke

Hi Ke! I am very excited about your toolbox; unfortunately, however, I cannot get it to work for the collocation grammar. Everything works fine for the unigram grammar. I have tried using the input files (train.dat/test.dat) that you provided and get an assertion error in hybrid.py:

line 1333, in model_state_assertion adapted_production_dependent[adapted_production]))
AssertionError: : Word -> a (Word -> Chars, Chars -> Char, Char -> 'a')
: 0 set()
: 5 {Collocation -> i ' l l p u t i t a w a y (Collocation -> Words, Words -> Word Words, Word -> i ' l l p u (Word -> Chars, Chars -> Char Chars, Char -> ...

Unfortunately, I could not figure out how to solve it. Do you maybe have a solution?
Thank you very much for your help in advance!
Best regards,
Lena

@kzhai
Owner

kzhai commented Sep 13, 2021

Hi, Lena,

Thanks for your interest in the package.
I have not kept up with the package for quite a long time. It was originally implemented years ago in Python 2.7 (hence the old nltk version) and later ported to Python 3. During that port, there was a big change in nltk's FreqDist API. The lines reporting the error are likely due to the "block" assertion statement, which I kept around to validate the intermediate data structures and cache format.
One possible quick (and somewhat hacky) fix is to comment out the entire assertion block, lines 1319 to 1345. It runs fine on my end.
I will revisit the internal logic when I have some spare time.
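As an aside on the suggested workaround: rather than editing hybrid.py by hand, Python's `-O` flag strips all `assert` statements at compile time, so running the training script under `python3 -O` would skip the failing block as well. A minimal demonstration of the flag's effect (a generic illustration, not specific to this package):

```shell
# Without -O, the assert fires and the interpreter exits non-zero.
python3 -c "assert False" 2>/dev/null || echo "assertion raised"
# With -O, assert statements are compiled out entirely, so nothing fires.
python3 -O -c "assert False" && echo "assertion skipped"
```

Note that `-O` disables every `assert` in the program, not just the one at lines 1319 to 1345, so commenting out the specific block is the more surgical option.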

Best,
Ke

@LenaHenke
Author

Dear Ke,

Thank you so much for your reply! I have only just seen your suggestion, and it also works perfectly on my own data!

I do, however, have another question about how to apply the model to new input. I basically just want to use the collocation model to make inferences about new sentences. I was hoping there would be a simple function (along the lines of Model.inference(newsentence)); however, from the previously closed issues, I understood that launch_test returns parses of new data. The function itself works for me, but I am unsure whether I understand and apply it correctly, and I would be very grateful for your insights:

(1) I am very new to NLP, and I apologize if this is very basic, but why does the function take truth and training data if I have already used the training data to train the model? The output for the truth data is also exactly the same as my truth input, so I am not sure why I need both inputs. Maybe this is also a misconception on my side about what train.dat and truth.dat should be. Could you possibly clarify this for me?

(2) In my output file for train.dat, each sentence is parsed 10 times (sometimes in different ways). Which of these parses should I consider the final output (i.e., the final/most likely parse given the trained model)?

Thank you very much for your help again!
Lena

@kzhai
Owner

kzhai commented Oct 8, 2021

Hi, Lena,

> (1) I am very new to NLP, and I apologize if this is very basic, but why does the function take truth and training data if I have already used the training data to train the model? The output for the truth data is also exactly the same as my truth input, so I am not sure why I need both inputs. Maybe this is also a misconception on my side about what train.dat and truth.dat should be. Could you possibly clarify this for me?

If I understand your question correctly: the adaptor grammar is an unsupervised model, so it does not need external labels/annotations. If you compare truth and train, train is simply a tokenized version of the truth data.
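To make the truth/train relationship concrete, here is a hedged sketch (my own illustration, not code from the package) of how a truth.dat line for word segmentation is typically turned into a train.dat line: word boundaries are removed and the remaining characters are space-separated, matching the character-level input visible in the assertion trace:

```python
# Illustrative only: convert a truth line ("i'll put it away") into the
# character-tokenized form that an unsupervised segmentation model consumes.
def to_train_line(truth_line: str) -> str:
    # Drop the word boundaries, then separate every remaining
    # character with a single space.
    return " ".join(truth_line.replace(" ", ""))

print(to_train_line("i'll put it away"))
# i ' l l p u t i t a w a y
```

The model's job is then to recover the word boundaries that were stripped out, which is why no separate annotation file is required.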

> (2) In my output file for train.dat, each sentence is parsed 10 times (sometimes in different ways). Which of these parses should I consider the final output (i.e., the final/most likely parse given the trained model)?

Ideally, there should be one dominant parse tree. In some cases, however, there could be two or more; in that case, you may sample over the parse trees, or simply take the most frequent one.
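Taking the most frequent of the sampled parses is a one-liner with `collections.Counter`; the parse strings below are invented purely for illustration:

```python
from collections import Counter

# Ten hypothetical sampled parses of one sentence, as they might be
# emitted across sampling iterations (the bracketed strings are made up).
sampled_parses = (
    ["(Word (Chars d o g))"] * 7
    + ["(Word (Chars d) (Chars o g))"] * 3
)

# Pick the single most frequent parse as the final analysis.
best_parse, count = Counter(sampled_parses).most_common(1)[0]
print(best_parse)  # the dominant parse
print(count)       # how often it appeared: 7 of 10 here
```

Sampling a parse in proportion to its frequency instead would be `random.choices(list(c), weights=c.values())` over the same `Counter`.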

@LenaHenke
Author

Thank you so much, Ke! Your answers helped a lot!
