Incorporate quantifier example #38
base: main
Conversation
- …eration results for comparison pre-changes
- …rom the universe from which a QuantifierModel was generated. Cleaned up generation scripts.
- …ents as name is based on properties of referent
- …odel dataclass args, successfully generated output
- …tonicityMeasurer (still not implemented)
- …rammar. QuantifierModel initialized only with name string. Minor bug fixes
- …grammar creation of index primitive rules. Added default weight value of 2.0 for primitive indices
- …perators for succinctness
- …enerated expressions - can't seem to get more than two int rules to appear in gen exps
- …ng property for GrammaticalExpression
- … contains for Meaning, folder read of expressions + pkled Universe, generation with inclusive M sizes
- …ial... etc.) into 'generate_expressions.py'
- Learn quant
- … as a function, print grammar upon reading it
- …load grammar import, print grammar in serial generations
Great stuff Chris; very exciting progress! A few smaller things here and there, mostly for cleaning things up, documenting, and things of that sort. Let me know if anything's confusing / unclear :)
src/examples/learn_quant/README.md
Outdated
- `scripts`: a set of scripts for generating `QuantifierModels` and measuring various properties of individual models and sets of models. These are explained in more detail in the [Usage](#usage) section below.
- `outputs`: outputs from the generation routines for creating `QuantifierModel`s and `QuantifierUniverse`s
- `referents.csv`: this file defines the set of points of communication (which will become `Referent`s in ULTK terms).
I think this is out of date now?
src/examples/learn_quant/README.md
Outdated
- `referents.csv`: this file defines the set of points of communication (which will become `Referent`s in ULTK terms).
- `meaning.py`: this file defines the meaning space (a `Universe` in ULTK terms) of referents that are individual models of quantifiers (`QuantifierModel`s)
- `quantifier.py`: defines the subclasses of `ultk`'s `Referent` and `Universe` classes that add additional properties and functionality for modeling quantifier learning
- `grammar.yml`: defines the Language of Thought grammar (an ULTK `Grammar` is created from this file in one line in `grammar.py`) for this domain, using the five semantic features identified in Haspelmath 1997.
"using the five..." is a copy/paste vestige :)
src/examples/learn_quant/README.md
Outdated
This script generates the _shortest_ expression (ULTK `GrammaticalExpression`s) for each possible `Meaning` (set of `Referent`s) in the LoT defined in `grammar.py`. In particular, ULTK provides methods for enumerating all grammatical expressions up to a given depth, with user-provided keys for uniqueness and for comparison in the case of a clash. By setting the former to get the `Meaning` from an expression and the latter to compare along length of the expression, the enumeration method returns a mapping from meanings to shortest expressions which express them.

2. `python -m learn_quant.scripts.generation_text`: generates a `GrammaticalExpression` from the quantifier modeling `Grammar`
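The "uniqueness key plus tie-breaking key" enumeration described above can be sketched independently of ULTK. This is a toy illustration, not the actual ULTK API; all names and the toy boolean grammar are invented for the example:

```python
from itertools import product

def shortest_by_meaning(expressions, meaning_of, length_of):
    """Map each meaning to the shortest expression expressing it.

    `meaning_of` plays the uniqueness-key role; `length_of` resolves clashes."""
    best = {}
    for expr in expressions:
        key = meaning_of(expr)
        if key not in best or length_of(expr) < length_of(best[key]):
            best[key] = expr
    return best

# Toy grammar: two atoms and their conjunctions; "meaning" = truth table.
atoms = ["p", "q"]
exprs = atoms + [f"and({a}, {b})" for a, b in product(atoms, atoms)]
table = {"p": (1, 0), "q": (0, 1), "and(p, p)": (1, 0),
         "and(p, q)": (0, 0), "and(q, p)": (0, 0), "and(q, q)": (0, 1)}
result = shortest_by_meaning(exprs, table.__getitem__, len)
```

Here `result` maps each of the three distinct meanings to its shortest expression, e.g. `(1, 0)` maps to `"p"` rather than `"and(p, p)"`.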
`generation_text` should be `generation_test`.
name: str

@dataclasses.dataclass
class UniverseConfig:
Maybe some comments on the fields, i.e. what they are? Two thoughts: (i) might be good to have better names for `m_size` and `x_size`, and (ii) what exactly is `depth`? That feels like a property of expressions, but does it mean something like max size of a model?
OK, so having read everything now, the point is: `depth` is used by `generate_unique_expressions`, which is part of the grammar, to generate expressions. It's not a property of the `Universe`, so that argument should be moved elsewhere in the config, i.e. to a `GrammarConfig` sub-part or some such.
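One way to carve the config up along the lines suggested above, as a sketch (all field and class names are placeholders, not the PR's actual config):

```python
from dataclasses import dataclass

@dataclass
class UniverseConfig:
    m_size: int  # hypothetical: max size of the set M in a model
    x_size: int  # hypothetical: size of the domain X

@dataclass
class GrammarConfig:
    depth: int  # max expression depth for expression generation

@dataclass
class Config:
    universe: UniverseConfig
    grammar: GrammarConfig

cfg = Config(UniverseConfig(m_size=4, x_size=8), GrammarConfig(depth=3))
```

This keeps `depth` grouped with the grammar-side parameters rather than the universe-side ones.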
@@ -0,0 +1,17 @@
defaults:
chef's kiss!
object.__setattr__(self, 'B', frozenset([i for i, x in enumerate(self.name) if x in ['1','2']]))
object.__setattr__(self, 'M', frozenset([i for i, x in enumerate(self.name) if x in ['0','1','2','3']]))

@classmethod
Nice!
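The quoted snippet derives the index sets from the digit-string name inside a frozen dataclass, which is why `object.__setattr__` is needed. A minimal self-contained sketch of the pattern (the class name here is illustrative, not the PR's actual class):

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class QuantifierModelSketch:
    name: str
    B: frozenset = field(init=False)
    M: frozenset = field(init=False)

    def __post_init__(self):
        # frozen dataclasses forbid plain assignment, hence object.__setattr__
        object.__setattr__(self, "B", frozenset(
            i for i, x in enumerate(self.name) if x in "12"))
        object.__setattr__(self, "M", frozenset(
            i for i, x in enumerate(self.name) if x in "0123"))

m = QuantifierModelSketch("0123")
# m.B is the indices of characters '1'/'2'; m.M those of '0'-'3'.
```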
"B": len(self.B), | ||
} | ||
|
||
def to_numpy(self, quantifier_index=None, in_meaning=False): |
Nice! Add type hints and a docstring.
raise ValueError("quantifier_index must be a one-dimensional one-hot vector.")
else:
appended_value = 0
if in_meaning:
Not convinced this `in_meaning` stuff should be part of this method, versus handled elsewhere in the data processing, i.e. wherever this is being called from. Because when supplying to the model, the label 1/0 is going to be separate from this array anyway.
To be more explicit: the combination of model + quantifier label are inputs (i.e. `x`) to a model, and the truth value (which I think is what `in_meaning` is doing) is the goal/output/`y` value.
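The split the reviewer describes can be sketched like this; names and shapes are illustrative assumptions, not the PR's actual data-processing code:

```python
import numpy as np

def make_training_pair(model_vector: np.ndarray,
                       quantifier_index: np.ndarray,
                       truth_value: bool) -> tuple[np.ndarray, np.ndarray]:
    # x: model representation concatenated with the quantifier one-hot label
    x = np.concatenate([model_vector, quantifier_index])
    # y: the truth value, kept separate as the supervision target
    y = np.array([1.0 if truth_value else 0.0])
    return x, y

x, y = make_training_pair(np.array([0, 1, 1, 0]), np.array([1, 0]), True)
```

This way `to_numpy` only has to produce the model representation, and the label never gets baked into the feature array.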
x_size = max(self.x_size, other.x_size)
return QuantifierUniverse(referents = self.referents + other.referents, prior= self._prior + other._prior, x_size=x_size)

def get_names(self) -> list:
Should be `list[str]`.
return expressions_by_meaning

def generate_expressions(quantifiers_grammar: QuantifierGrammar,
Please run `black` for formatting too :). The repo is set up to do that automatically when PRs are merged, but I'm having a hard time reading this method as-is.
Great progress Chris! A few more things, again mostly minor. There are some details about the monotonicity calculation that I'm now wondering about. Happy to try to figure those out async or wait until our next meeting; let me know!
for expression_id, quantifier_expression in enumerate(expressions):
    print("Calculating monotonicity for: ", quantifier_expression)
    metrics[str(quantifier_expression)]["monotonicity"] = (
        upward_monotonicity_entropy(submembership, membership[:, expression_id])
why is submembership the first argument here?
As opposed to the second you mean? I can switch them?
In the original method `upward_monotonicity_entropy` that this is modified from, the first argument was `all_models`, an array containing all the quantifier models in the universe, not just the submodels. But maybe some other logic has changed there?
Ah I see. I might have been testing something and the code drifted into the test. I'll check it out
return (1.0 - cond_ent / q_ent)[0, 0]

def calculate_monotonicity(universe, expressions, down=False):
type annotations and docstring
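For context, the `1.0 - cond_ent / q_ent` returned above is a normalized entropy-based degree of monotonicity: one minus the conditional entropy of the quantifier's truth value given truth at sub-models, over its unconditional entropy. A hedged sketch of that general recipe (function names, and the notion of "predecessor truth" passed in, are assumptions, not the PR's exact code):

```python
import numpy as np

def entropy(p: np.ndarray) -> float:
    # Shannon entropy in bits, ignoring zero-probability outcomes
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def monotonicity_degree(q_truth: np.ndarray, pred_true: np.ndarray) -> float:
    """1 - H(Q | predecessor truth) / H(Q).

    q_truth[i]: quantifier true at model i.
    pred_true[i]: some sub-model of model i makes the quantifier true."""
    p_q = np.array([np.mean(~q_truth), np.mean(q_truth)])
    q_ent = entropy(p_q)
    if q_ent == 0:
        return 1.0  # a constant quantifier is trivially monotone
    cond_ent = 0.0
    for v in (False, True):
        mask = pred_true == v
        if mask.any():
            p = float(np.mean(q_truth[mask]))
            cond_ent += float(np.mean(mask)) * entropy(np.array([1 - p, p]))
    return 1.0 - cond_ent / q_ent
```

The measure is 1 when sub-model truth fully determines the quantifier's truth value and 0 when the two are independent, which is also why the choice of first argument (all models vs. submodels) discussed in this thread matters.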
src/examples/learn_quant/meaning.py
Outdated
import argparse

def create_universe(m_size, x_size):
type annotations and docstring
src/examples/learn_quant/meaning.py
Outdated
possible_quantifiers = []

for x in combinations_with_replacement([0, 1, 2, 3], r=m_size):
Very minor style point: best to use a nicer name even for something like `x`, e.g. `for combination in ...`.
names_array.append(quantifier_model.name)
truth_matrix = np.array(truth_array)
names_vector = np.array(names_array)
return truth_matrix, names_vector
There's a mismatch here between the type annotation and the actual return type (which is `tuple[np.ndarray, np.ndarray]`). I'd recommend setting up VSCode with `mypy` so that it automatically alerts you to these things and they can be spotted quickly :). Also: why does names need to be an array?
I use the MS one, but don't have any strong/antecedent views here (probably chose it just b/c VSCode is also an MS product, honestly don't remember)
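A minimal illustration of the kind of mismatch mypy flags (function names and signatures here are invented for the example, not the PR's code):

```python
import numpy as np

# Annotated to return a single array but actually returns a tuple:
# mypy reports this; at runtime nothing complains.
def build(truths: list[list[int]], names: list[str]) -> np.ndarray:
    return np.array(truths), np.array(names)

# Corrected annotation matching the real return type.
def build_fixed(truths: list[list[int]],
                names: list[str]) -> tuple[np.ndarray, np.ndarray]:
    return np.array(truths), np.array(names)
```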
return Context(objects, properties, bools)

def get_sub_structures(concept_lattice: Context, name: list[str]) -> set[str]:
Let's walk through the logic of the monotonicity calculation in our next meeting and discuss the `-set(name)` part. In general, the substructure relation in our definition is reflexive, i.e. every structure is a sub-structure of itself; but there might be reasons in the way you have things set up to exclude that.
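A tiny illustration of the reflexivity point, with substructures modeled as plain subsets (a simplification of the lattice-based setup, for exposition only):

```python
from itertools import chain, combinations

def substructures(s: frozenset, reflexive: bool = True) -> set:
    """All subset-substructures of s; reflexive by default, so s is included."""
    subs = {frozenset(c) for c in chain.from_iterable(
        combinations(sorted(s), r) for r in range(len(s) + 1))}
    # Excluding s itself (as `- set(name)` would) makes the relation strict.
    return subs if reflexive else subs - {s}
```

Whether the strict or reflexive version is right depends on how the monotonicity calculation consumes these sets, which is exactly what needs walking through.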
return num_arr[num, :]

def has_true_pred(num_arr, y):
    return np.any(y * num_arr)
Is it true that these `get_preds` and `has_true_pred` are no longer needed, i.e. replaced by your lattice-based methods?
Yeah, but I do wonder if we should avoid the lattice methods because they might be slower than the logic you've written; I'm not yet sure.
The one big issue I foresee is that the methods I have here rely on very strong assumptions about the nature of the quantifiermodels that are no longer true (but were true in Carcassi et al 2021): we're assuming all models are the same size and that there are only two sets (
@@ -0,0 +1,792 @@
"subset_eq(A, A)",1
"subset_eq(A, difference(A, B))",1.0
I think something is slightly off here: I don't think this one should be upward monotone, and I think you're only measuring upward monotonicity right now (correct me if this is wrong). For instance:
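One way to see the point: `subset_eq(A, difference(A, B))` says A and B are disjoint, i.e. the quantifier "no", which is downward rather than upward monotone in B. A sketch (the helper name `no_q` is invented for the example):

```python
# "no A are B" rendered as subset_eq(A, difference(A, B)), i.e. A <= A - B.
def no_q(A: set, B: set) -> bool:
    return A <= (A - B)

A = {0, 1}
# Disjoint case: true. Growing B to a superset can flip it to false,
# so truth is not preserved upward in B; shrinking B preserves it,
# which is the hallmark of downward monotonicity.
true_case = no_q(A, set())
false_case = no_q(A, {1})
```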
This might be connected to the substructure stuff as first argument to the monotonicity calculation?
You are correct, I had not yet worked on getting anything other than upward monotonicity implemented. Thanks for finding that and spelling it out. I'll see if I can figure out how to rectify this.
No description provided.