
Dependency parsing: is there some way to discourage multiple nsubj dependents? #1340

Open
nschneid opened this issue Jan 30, 2024 · 8 comments

@nschneid

A very weird English tree produced by Stanza 1.6.0 in the demo:

My cousin my extremely rude colleague admired last year chewed the chicken enthusiastically.

[image: dependency parse from the demo]

In UD, no word is allowed to have multiple (plain) nsubj dependents. But "admired" has two.

Is there a recommended alternate training or decoding method in Stanza that could avoid this sort of problem?

@AngledLuffa
Collaborator

AngledLuffa commented Jan 30, 2024 via email


stale bot commented Jan 21, 2025

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Jan 21, 2025

stale bot commented Jan 31, 2025

This issue has been automatically closed due to inactivity.

@stale stale bot closed this as completed Jan 31, 2025
@AngledLuffa AngledLuffa reopened this Feb 22, 2025
@stale stale bot removed the stale label Feb 22, 2025
@AngledLuffa
Collaborator

stalebot notwithstanding, i find myself wondering if this is actually possible

so here's a brief summary of what the dependency parser does to find a graph: first it calculates the weights of all possible edges, then it finds the "minimum spanning arborescence" for that graph using the Chu-Liu/Edmonds algorithm.
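As an illustration of that objective (not of Chu-Liu/Edmonds itself, which handles cycle contraction efficiently), here's a toy sketch that brute-forces the highest-scoring arborescence. The score matrix and token names are invented for illustration; biaffine parsers typically maximize edge scores, which is equivalent to minimizing negated costs:

```python
from itertools import product

def _reaches_root(parent, d):
    """Follow head pointers from d; True iff we hit ROOT (0) without a cycle."""
    seen = set()
    while d != 0:
        if d in seen:
            return False
        seen.add(d)
        d = parent[d]
    return True

def best_tree(scores):
    """Exhaustively find the highest-scoring dependency tree.

    scores[h][d] is the score of an edge from head h to dependent d;
    index 0 is the artificial ROOT.  Chu-Liu/Edmonds finds the same
    optimum in polynomial time; this brute force is toy-sized only.
    """
    n = len(scores)  # number of tokens including ROOT
    best, best_heads = float("-inf"), None
    # every non-root token picks one head
    for heads in product(range(n), repeat=n - 1):
        parent = {d: h for d, h in zip(range(1, n), heads)}
        # reject assignments that are not trees rooted at 0
        if not all(_reaches_root(parent, d) for d in parent):
            continue
        total = sum(scores[h][d] for d, h in parent.items())
        if total > best:
            best, best_heads = total, parent
    return best_heads, best

# hypothetical scores for a 3-token sentence plus ROOT
scores = [
    [0, 1, 2, 9],   # ROOT -> token 1, 2, 3
    [0, 0, 1, 1],
    [0, 8, 0, 1],
    [0, 5, 7, 0],
]
heads, total = best_tree(scores)
print(heads, total)   # maps each dependent to its chosen head
```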

we could theoretically special-case nsubj: if the parser produces two (or more) nsubj dependents for a particular node, it systematically replaces the cost of each of those edges except one with infinity, reruns the algorithm for each choice, and whichever rerun scores best becomes the chosen graph.

such constraints would not really generalize to every possible validation error, but having two nsubj is clearly a problem.

reasonable? not worth the effort? any other suggestions on how to approach this, given the brief summary of our dependency parser at the top?
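A minimal sketch of that rerun strategy, using a brute-force stand-in for the real decoder; the score matrix and the `labels` table marking which candidate edges would be labelled nsubj are invented (the real parser scores labels separately):

```python
import copy
from itertools import product

NEG_INF = float("-inf")

def _reaches_root(parent, d):
    seen = set()
    while d != 0:
        if d in seen:
            return False
        seen.add(d)
        d = parent[d]
    return True

def decode(scores):
    """Stand-in for the real Chu-Liu/Edmonds decode: brute-force best tree."""
    n = len(scores)
    best, best_parent = NEG_INF, None
    for heads in product(range(n), repeat=n - 1):
        parent = {d: h for d, h in zip(range(1, n), heads)}
        if not all(_reaches_root(parent, d) for d in parent):
            continue
        total = sum(scores[h][d] for d, h in parent.items())
        if total > best:
            best, best_parent = total, parent
    return best_parent, best

def fix_double_nsubj(scores, labels):
    """If the unconstrained tree gives some head two or more nsubj
    dependents, re-decode once per offending edge with the other nsubj
    edges of that head banned, and keep the best-scoring rerun."""
    parent, _ = decode(scores)
    nsubj_deps = {}
    for d, h in parent.items():
        if labels.get((h, d)) == "nsubj":
            nsubj_deps.setdefault(h, []).append(d)
    offenders = {h: ds for h, ds in nsubj_deps.items() if len(ds) > 1}
    if not offenders:
        return parent
    h, deps = next(iter(offenders.items()))   # one offending head; loop if several
    best, best_parent = NEG_INF, parent
    for keep in deps:                          # keep one nsubj edge, ban the rest
        banned = copy.deepcopy(scores)
        for d in deps:
            if d != keep:
                banned[h][d] = NEG_INF
        cand, score = decode(banned)
        if cand is not None and score > best:
            best, best_parent = score, cand
    return best_parent

# toy example: head 2 initially takes both 1 and 3 as nsubj
scores = [
    [0, 1, 10, 1],
    [0, 0, 0, 4],
    [0, 9, 0, 8],
    [0, 3, 0, 0],
]
labels = {(2, 1): "nsubj", (2, 3): "nsubj"}
fixed = fix_double_nsubj(scores, labels)
print(fixed)   # token 3 is re-attached elsewhere
```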

@nschneid
Author

One way to do this would be to use a decoding algorithm that allows higher-order constraints (that take into account pairs of edges), e.g. via an ILP as in TurboParser.

we could theoretically special case nsubj so that: if the parser produces two (or more) nsubj for a particular node, it systematically replaces each cost except one with infinity, then reruns the algorithm. whichever wins the new search is the chosen graph.

I don't think it is theoretically optimal/sufficient to assign, one at a time, infinite cost to just the nsubj edges predicted in the original parse. Removing one of those edges could force a restructuring of the tree in the next best parse, and then there could be multiple nsubj edges for a different predicate. But this strategy may be a good enough band-aid in practice.
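For comparison, a decoder with the higher-order constraint built in would enforce "at most one nsubj dependent per head" inside the search itself, rather than by rerunning after the fact. A toy exhaustive version under invented scores and edge labels (an ILP decoder would optimize this same constrained objective efficiently):

```python
from itertools import product

def _reaches_root(parent, d):
    seen = set()
    while d != 0:
        if d in seen:
            return False
        seen.add(d)
        d = parent[d]
    return True

def best_constrained_tree(scores, labels):
    """Exhaustive decode enforcing the UD constraint directly:
    no head may take more than one nsubj dependent."""
    n = len(scores)
    best, best_parent = float("-inf"), None
    for heads in product(range(n), repeat=n - 1):
        parent = {d: h for d, h in zip(range(1, n), heads)}
        if not all(_reaches_root(parent, d) for d in parent):
            continue
        # count nsubj edges per head; skip trees that violate the cap
        counts = {}
        for d, h in parent.items():
            if labels.get((h, d)) == "nsubj":
                counts[h] = counts.get(h, 0) + 1
        if any(c > 1 for c in counts.values()):
            continue
        total = sum(scores[h][d] for d, h in parent.items())
        if total > best:
            best, best_parent = total, parent
    return best_parent, best

# hypothetical scores where the unconstrained optimum has two nsubj on head 2
scores = [
    [0, 1, 10, 1],
    [0, 0, 0, 4],
    [0, 9, 0, 8],
    [0, 3, 0, 0],
]
labels = {(2, 1): "nsubj", (2, 3): "nsubj"}
tree, total = best_constrained_tree(scores, labels)
print(tree, total)
```

Because the constraint is checked during the search, the result is guaranteed optimal among valid trees, which the one-edge-at-a-time rerun cannot promise.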

@AngledLuffa
Collaborator

via an ILP as in TurboParser.

Sorry, is that integer linear programming? I'm not at all familiar with TurboParser.

Removing one of those edges could force a restructuring of the tree in the next best parse, and then there could be multiple nsubj edges for a different predicate.

Agreed. It could also be the case that two nodes in the initial parse each have two nsubj; in that case we'd have to go through both nodes anyway. Still, it might adequately handle the most common error case.

@AngledLuffa
Collaborator

actually it's even worse than what i suggested above: by the time of tree decoding, the model has already thrown away information on 2nd-best arcs. there also isn't anything that lets the arcs interact, in the sense that choosing nsubj for one arc makes something else more likely to be obj, for example

if you have a suggestion for a replacement algorithm (one that isn't hugely slower than our current one), i'm happy to look into it. i can also ask around in the theory department next week. i haven't found anything obvious by googling, and this particular problem never came up in my algorithms classes...

@nschneid
Author

nschneid commented Feb 22, 2025

I would not look for the 2nd-best edge - I would parse the entire sentence again, blocking one edge at a time.

The theory people will probably tell you to use an integer linear programming parser instead. https://www.cs.cmu.edu/~nasmith/LSP/ explains different structured prediction algorithms in detail.
