Enhancing CoNLL-U #76

Open
vcvpaiva opened this issue Mar 15, 2017 · 8 comments

@vcvpaiva
Member

vcvpaiva commented Mar 15, 2017

We want to add the enhanced dependencies at some stage.
The CoreNLP code says:

  • Process multi-word prepositions: Yes (see issue #65, "Multiword functional words from SD")
  • Add prepositions to relation labels: Yes
  • Add prepositions only to nmod relations: No
  • Add coordinating conjunctions to relation labels: Yes
  • Propagate dependents: Yes
  • Add "referent" relations: Yes
  • Add copy nodes for conjoined Ps and PPs: Yes
  • Turn quantificational modifiers into flat MWEs: Yes
  • Add relations between controlling subject and controlled verbs: Yes
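To illustrate what "Add prepositions to relation labels" does, here is a minimal Python sketch that assumes a single CoNLL-U sentence as input. It follows the UD enhanced-dependencies convention of appending the lemma of the `case` dependent (plus any `fixed` dependents, for multi-word prepositions) to the label, giving e.g. obl:on or obl:because_of. This is not CoreNLP's implementation: the helper names parse_conllu and augment_labels are hypothetical, and the sketch only handles nmod and obl, which is a simplification of the settings above.

```python
from typing import Dict, List


def parse_conllu(text: str) -> List[Dict[str, str]]:
    """Parse the columns we need from one CoNLL-U sentence (hypothetical helper)."""
    rows = []
    for line in text.strip().splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue  # skip multiword token ranges and empty nodes
        rows.append({"id": cols[0], "form": cols[1], "lemma": cols[2],
                     "head": cols[6], "deprel": cols[7]})
    return rows


def augment_labels(rows: List[Dict[str, str]]) -> Dict[str, str]:
    """Return token id -> label, with case-marker lemmas added to nmod/obl (a simplification)."""
    children: Dict[str, List[Dict[str, str]]] = {}
    for r in rows:
        children.setdefault(r["head"], []).append(r)
    labels = {}
    for r in rows:
        label = r["deprel"]
        if label in ("nmod", "obl"):
            cases = [c for c in children.get(r["id"], []) if c["deprel"] == "case"]
            if cases:
                case = cases[0]
                # A multi-word preposition is the case word plus its 'fixed' dependents.
                words = [case["lemma"].lower()] + [
                    f["lemma"].lower()
                    for f in children.get(case["id"], [])
                    if f["deprel"] == "fixed"
                ]
                label = f"{label}:{'_'.join(words)}"
        labels[r["id"]] = label
    return labels


# "The cat sat on the mat", built with explicit tabs between the ten columns.
SENTENCE = "\n".join("\t".join(cols) for cols in [
    ("1", "The", "the", "DET", "_", "_", "2", "det", "_", "_"),
    ("2", "cat", "cat", "NOUN", "_", "_", "3", "nsubj", "_", "_"),
    ("3", "sat", "sit", "VERB", "_", "_", "0", "root", "_", "_"),
    ("4", "on", "on", "ADP", "_", "_", "6", "case", "_", "_"),
    ("5", "the", "the", "DET", "_", "_", "6", "det", "_", "_"),
    ("6", "mat", "mat", "NOUN", "_", "_", "3", "obl", "_", "_"),
])

if __name__ == "__main__":
    print(augment_labels(parse_conllu(SENTENCE)))
```

Here token 6 ("mat") ends up labelled obl:on, while labels without a case dependent are left unchanged.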
@vcvpaiva
Member Author

Which quantificational modifiers? For example, "a few" or "several"?

fcbr self-assigned this Mar 22, 2017
@fcbr
Member

fcbr commented Mar 22, 2017

Quantificational patterns:

Three words (a xx of): lot, assortment, number, couple, bunch, handful, litany, sheaf, slew, dozen, series, variety, multitude, wad, clutch, wave, mountain, array, spate, string, ton, range, plethora, heap, sort, form, kind, type, version, bit, pair, triple, total.

Two words (xx of): lots, many, several, plenty, tons, dozens, multitudes, mountains, loads, pairs, tens, hundreds, thousands, millions, billions, trillions

Two words (xx of the, xx of them): some, all, both, neither, everyone, nobody, one, two, three, four, five, six, seven, eight, nine, ten, hundred, thousand, million, billion, trillion
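To make the three pattern families concrete, here is a minimal Python sketch that finds them in a tokenized sentence, using the word lists above. It is not CoreNLP's code: the function name find_quantificational_mwes is hypothetical, matching is a simple left-to-right scan that tries the three-word pattern first, and returning only the "xx of" span for the last family (treating "the"/"them" as required context rather than part of the MWE) is an assumption.

```python
from typing import List, Tuple

# Word lists copied from the comment above.
A_XX_OF = {
    "lot", "assortment", "number", "couple", "bunch", "handful", "litany",
    "sheaf", "slew", "dozen", "series", "variety", "multitude", "wad",
    "clutch", "wave", "mountain", "array", "spate", "string", "ton", "range",
    "plethora", "heap", "sort", "form", "kind", "type", "version", "bit",
    "pair", "triple", "total",
}
XX_OF = {
    "lots", "many", "several", "plenty", "tons", "dozens", "multitudes",
    "mountains", "loads", "pairs", "tens", "hundreds", "thousands",
    "millions", "billions", "trillions",
}
XX_OF_THE = {
    "some", "all", "both", "neither", "everyone", "nobody", "one", "two",
    "three", "four", "five", "six", "seven", "eight", "nine", "ten",
    "hundred", "thousand", "million", "billion", "trillion",
}


def find_quantificational_mwes(tokens: List[str]) -> List[Tuple[int, int, str]]:
    """Return (start, end_exclusive, pattern) spans, scanning left to right."""
    words = [t.lower() for t in tokens]
    n = len(words)
    spans = []
    i = 0
    while i < n:
        # "a xx of", e.g. "a lot of"
        if i + 2 < n and words[i] == "a" and words[i + 1] in A_XX_OF and words[i + 2] == "of":
            spans.append((i, i + 3, "a xx of"))
            i += 3
            continue
        # "xx of", e.g. "dozens of"
        if i + 1 < n and words[i] in XX_OF and words[i + 1] == "of":
            spans.append((i, i + 2, "xx of"))
            i += 2
            continue
        # "xx of the" / "xx of them": only "xx of" is returned (assumption)
        if (i + 2 < n and words[i] in XX_OF_THE and words[i + 1] == "of"
                and words[i + 2] in ("the", "them")):
            spans.append((i, i + 2, "xx of the/them"))
            i += 2
            continue
        i += 1
    return spans


if __name__ == "__main__":
    print(find_quantificational_mwes("A lot of people came".split()))
    print(find_quantificational_mwes("Some of the cats slept".split()))
```

For example, "A lot of people came" yields one three-token span, while "Some of us" yields nothing because the third family requires "the" or "them".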

@arademaker
Member

Is the last pattern two words or three?

@fcbr
Member

fcbr commented Mar 22, 2017

Good news and bad news. The good news is that I found a way to run the Stanford enhanced dependencies on our CoNLL-U files. The bad news is that they rely exclusively on the UD tagset, so I had to run the tool against the files in the expanded.ud directory. The result is in the expanded.ud.enhanced++ directory.

@fcbr
Member

fcbr commented Mar 22, 2017

@arademaker It's three words, of course, but the code classifies them as two-word patterns.

@fcbr
Member

fcbr commented Mar 22, 2017

I have also updated the bosque interface with both new corpora: SICK-UD for the UD version and SICK-UD++ for the enhanced++ UD version.

@arademaker
Member

Weird! Are they counting the words in the pattern, or something else?

@vcvpaiva
Member Author

wow! I don't know how I missed this!
