Making sense of the data #1

Open
sagnik opened this issue Jan 12, 2024 · 2 comments

Comments

@sagnik

sagnik commented Jan 12, 2024

I really liked this paper and am hoping to use it for a problem I am interested in. I wanted to clarify a couple of things.

In all_data.csv, the open-class words are given as identifiers within a hierarchy. I assume N0.0 means noun hierarchy 0, item 0 in that hierarchy. So N0.0 uniquely identifies a token, right?

Let's assume I have a noun hierarchy defined as pig < mammal < animal and a verb one defined as run < move.

Now, from the first example, "(S (Q all)(A )(N N0.0)(C )(R )(Neg not)(V V0.0)),(S (Q all)(A rose)(N N0.0)(C )(R that hum)(Neg not)(V V0.0))", I can generate the sentence pair: "all pig not run, all rose pig that hum not run".

If the sentence pair is instead "(S (Q all)(A )(N N0.0)(C )(R )(Neg not)(V V0.0)),(S (Q all)(A rose)(N N0.0)(C )(R that hum)(Neg not)(V V0.1))", the generated sentence pair would be "all pig not run, all rose pig that hum not move".

Is this interpretation correct? If yes, did you define the hierarchy somewhere?

Also, dance < move and run < move would be defined as two different hierarchies (dance and run are at the same level), no?
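
To make my reading concrete, here is a minimal sketch of how I would turn a row into English. The `LEXICON` mapping and the `render` / `render_pair` helpers are my own assumptions for illustration, not something from the repo, and the split on `,(S` assumes every tree in a pair starts with `(S`.

```python
import re

# Hypothetical lexicon built from the example hierarchies in this comment
# (pig < mammal < animal, run < move). It is NOT part of the dataset.
LEXICON = {
    "N0.0": "pig", "N0.1": "mammal", "N0.2": "animal",
    "V0.0": "run", "V0.1": "move",
}

# A leaf looks like "(Label content)"; content may be empty or several words.
LEAF = re.compile(r"\((\w+) ([^()]*)\)")


def render(tree, lexicon):
    """Flatten one bracketed tree into a space-separated sentence."""
    words = []
    for _label, content in LEAF.findall(tree):
        for token in content.split():
            # Replace nonce identifiers; leave function words untouched.
            words.append(lexicon.get(token, token))
    return " ".join(words)


def render_pair(pair, lexicon):
    """Split a 'tree,tree' string on the comma before '(S' and render both."""
    first, second = re.split(r",(?=\(S\b)", pair)
    return render(first, lexicon), render(second, lexicon)


pair = ("(S (Q all)(A )(N N0.0)(C )(R )(Neg not)(V V0.0)),"
        "(S (Q all)(A rose)(N N0.0)(C )(R that hum)(Neg not)(V V0.1))")
print(render_pair(pair, LEXICON))
```

With this lexicon, the second pair above comes out as "all pig not run" and "all rose pig that hum not move".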

TIA

@emilygoodwin
Owner

Hi, thanks for your comment!

Yes. In the experiments we used nonce words like N0.0, but the English equivalents you provided look right to me.

In terms of defining the hierarchy, I've just copied the relevant bit from the Readme file inside the "dataset" subdirectory. That file also explains the block structure in a bit more detail. Let me know if this isn't answering your question:

All nouns and verbs have the form XY.Z where:
X is a letter representing the part of speech: N for noun or V for verb
Y is a number, representing the hierarchy (between 0 and 99)
Z is a number, representing the position in the hierarchy (between 0 and 5)

The number of hierarchies and words per hierarchy can be set in config.yml.
Function words (relative clauses, quantifiers, modifiers) are normal English words to make things easier to read.
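
In case it helps, the XY.Z format above can be split apart with a few lines of Python. This is only a sketch: `parse_word` and `OpenClassWord` are hypothetical names that do not exist in the repository, and the code deliberately does not check the ranges, since those depend on config.yml.

```python
import re
from typing import NamedTuple


class OpenClassWord(NamedTuple):
    """Hypothetical container for one parsed identifier such as 'N0.0'."""
    pos: str        # "N" for noun, "V" for verb
    hierarchy: int  # Y: which hierarchy the word belongs to
    position: int   # Z: position of the word inside that hierarchy


# X is a single letter (N or V), Y and Z are integers separated by a dot.
_IDENTIFIER = re.compile(r"([NV])(\d+)\.(\d+)")


def parse_word(token):
    """Parse an identifier of the form XY.Z; raise ValueError otherwise."""
    match = _IDENTIFIER.fullmatch(token)
    if match is None:
        raise ValueError(f"not an open-class identifier: {token!r}")
    pos, hierarchy, position = match.groups()
    return OpenClassWord(pos, int(hierarchy), int(position))


word = parse_word("V0.1")
print(word.pos, word.hierarchy, word.position)
```

Two identifiers belong to the same hierarchy when both `pos` and `hierarchy` match, so N0.0 and N0.1 are related while N0.0 and N1.0 are not.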

@sagnik
Author

sagnik commented Jan 13, 2024

In the experiments, was the noun token in the sentence N0.0 (and the verb V0.0), rather than something like “pig” or “run”?
