Making sense of the data #1

sagnik · 2024-01-12T23:36:27Z

I really liked this paper and am hoping to use it for a problem I'm interested in. I wanted to clarify a couple of things.

In all_data.csv, you have the open-class words defined as identifiers in the hierarchy. I assume N0.0 means noun hierarchy 0, item 0 in that hierarchy. N0.0 uniquely identifies a token, right?

Let's assume I have a noun hierarchy defined as pig < mammal < animal and a verb hierarchy defined as run < move.
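One way to make the "<" ordering concrete is to store each hierarchy as a list running from most specific to most general, so that "is X a kind of Y?" becomes a membership check plus an index comparison. This is a minimal Python sketch using only the example words above; the dictionary layout and the `is_a` helper are hypothetical, not the dataset's own representation.

```python
# Hypothetical representation: each hierarchy is a list ordered from most
# specific (index 0) to most general, using the example words above.
HIERARCHIES = {
    "noun_0": ["pig", "mammal", "animal"],
    "verb_0": ["run", "move"],
}


def is_a(specific: str, general: str) -> bool:
    """True if `specific` sits at or below `general` within one hierarchy."""
    for words in HIERARCHIES.values():
        if specific in words and general in words:
            return words.index(specific) <= words.index(general)
    return False  # words in different hierarchies (or unknown) are unrelated


print(is_a("pig", "animal"))  # True
print(is_a("animal", "pig"))  # False
print(is_a("pig", "move"))    # False
```

Because each word lives in exactly one list, its hierarchy plus its position in that list is enough to pick it out.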

Now, from the first example, "(S (Q all)(A )(N N0.0)(C )(R )(Neg not)(V V0.0)),(S (Q all)(A rose)(N N0.0)(C )(R that hum)(Neg not)(V V0.0))", I can generate the sentence pair "all pig not run, all rose pig that hum not run".

If the sentence pair is instead "(S (Q all)(A )(N N0.0)(C )(R )(Neg not)(V V0.0)),(S (Q all)(A rose)(N N0.0)(C )(R that hum)(Neg not)(V V0.1))", the generated sentence pair would be "all pig not run, all rose pig that hum not move".
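The mapping in these two examples can be sketched in Python: pull each (S ...) tree out of the pair, read its (Label content) slots in order, drop the empty ones, and swap each nonce token for the word at that position in a lexicon. The `LEXICON` contents, and the assumption that position 0 is the most specific word, come from the example hierarchies above rather than from the dataset; the regular expressions assume the bracket format looks exactly like these examples.

```python
import re

# Hypothetical lexicon: nonce hierarchy -> English words indexed by position Z,
# using the example hierarchies pig < mammal < animal and run < move.
LEXICON = {
    "N0": ["pig", "mammal", "animal"],
    "V0": ["run", "move"],
}

SENTENCE = re.compile(r"\(S ((?:\(\w+ [^()]*\))+)\)")  # one (S ...) tree
SLOT = re.compile(r"\((\w+) ([^()]*)\)")                # one (Label content) slot
NONCE = re.compile(r"^([NV]\d+)\.(\d+)$")               # e.g. N0.0 -> ("N0", "0")


def render(slots: str) -> str:
    """Turn the slots of one (S ...) tree into a space-separated string."""
    words = []
    for _label, content in SLOT.findall(slots):
        content = content.strip()
        if not content:
            continue  # empty slot, e.g. (A ) or (R )
        m = NONCE.match(content)
        if m:
            # Replace the nonce token with the word at that hierarchy position.
            content = LEXICON[m.group(1)][int(m.group(2))]
        words.append(content)
    return " ".join(words)


def render_pair(pair: str) -> list[str]:
    """Render every (S ...) tree found in a comma-separated pair string."""
    return [render(slots) for slots in SENTENCE.findall(pair)]


first = ("(S (Q all)(A )(N N0.0)(C )(R )(Neg not)(V V0.0)),"
         "(S (Q all)(A rose)(N N0.0)(C )(R that hum)(Neg not)(V V0.0))")
second = first[: first.rfind("V0.0")] + "V0.1))"

print(render_pair(first))   # ['all pig not run', 'all rose pig that hum not run']
print(render_pair(second))  # ['all pig not run', 'all rose pig that hum not move']
```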

Is this interpretation correct? If yes, did you define the hierarchy somewhere?

Also, dance < move and run < move would be defined as two different hierarchies (dance and run are at the same level), no?

TIA


emilygoodwin · 2024-01-13T19:48:44Z

Hi, thanks for your comment!

Yes, in the experiments we used nonce words like N0.0, but the English equivalents you provided look right to me.

In terms of defining the hierarchy, I've just copied the relevant bit from the Readme file inside the "dataset" subdirectory. That file also explains the block structure in a bit more detail. Let me know if this isn't answering your question:

All nouns and verbs have the form XY.Z where:
X is a letter representing the part of speech: N for noun or V for verb
Y is a number, representing the hierarchy (between 0 and 99)
Z is a number, representing the position in the hierarchy (between 0 and 5)

The number of hierarchies and words per hierarchy can be set in config.yml.
Function words (relative clauses, quantifiers, modifiers) are normal English words to make things easier to read.
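The XY.Z scheme is regular enough to parse with one regular expression. Here is a minimal Python sketch that splits a token into its three parts and checks them against the ranges above, assuming both bounds are inclusive (100 hierarchies, 6 positions). The `n_hierarchies` and `words_per_hierarchy` parameters are hypothetical stand-ins for whatever the corresponding config.yml settings are called.

```python
import re
from typing import NamedTuple


class OpenClassWord(NamedTuple):
    pos: str        # X: "N" (noun) or "V" (verb)
    hierarchy: int  # Y: which hierarchy
    position: int   # Z: position within that hierarchy


TOKEN = re.compile(r"([NV])(\d+)\.(\d+)")


def parse_token(token: str, n_hierarchies: int = 100,
                words_per_hierarchy: int = 6) -> OpenClassWord:
    """Parse an XY.Z token and check it against the configured sizes.

    Defaults assume the Readme ranges are inclusive (Y in 0..99, Z in 0..5);
    the parameter names are hypothetical, not taken from config.yml.
    """
    m = TOKEN.fullmatch(token)
    if m is None:
        raise ValueError(f"not an open-class token: {token!r}")
    word = OpenClassWord(m.group(1), int(m.group(2)), int(m.group(3)))
    if not 0 <= word.hierarchy < n_hierarchies:
        raise ValueError(f"hierarchy out of range: {token!r}")
    if not 0 <= word.position < words_per_hierarchy:
        raise ValueError(f"position out of range: {token!r}")
    return word


print(parse_token("N0.0"))  # OpenClassWord(pos='N', hierarchy=0, position=0)
```

Under this reading, two tokens share a hierarchy when their `pos` and `hierarchy` fields match, so N0.0 and N0.1 are in the same hierarchy while N0.0 and V0.0 are not.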

sagnik · 2024-01-13T21:42:22Z

In the experiments, the noun token in the sentence was N0.0 (and the verb V0.0) and not something like “pig” or “run”?

emilygoodwin closed this as completed Jan 13, 2024

emilygoodwin reopened this Jan 13, 2024
