-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature Generation Rewrite (WIP) #608
base: master
Are you sure you want to change the base?
Conversation
Introduces the 'FeatureBlock' as the main interface for Triage to create features, instead of directly interfacing with collate. - Add FeatureBlock abstract base class. The abstract methods are aimed at providing an easy, flexible interface for feature generators to implement. The concrete methods are aimed at building on those abstract methods to provide a slightly easier, higher-level interface for Experiments, et al to use. - Splits up the bloated architect.FeatureGenerator into a couple different places: the code that generates is now in SpacetimeAggregations in feature_block _generators, and the code that runs the generated queries now lives in concrete methods of the FeatureBlock base class. - ExperimentBase now passes on config to generate_feature_blocks and owns a collection of resulting FeatureBlocks, instead of a FeatureGenerator object. - Instead of a FeatureDictionaryCreator object, the FeatureDictionary class is introduced that knows how to build itself from a collection of FeatureBlocks. The Experiment holds references to these as before. - Continuing the trend of molding collate to more closely fit how Triage is using it and removing unneeded flexibility, the unused Aggregation base class is folded into the SpacetimeAggregation so the latter can directly inherit from FeatureBlock. This way we get to remove/condense many tests, and sunset the collate integration tests because the methods being tested no longer exist and are tested in either SpacetimeAggregation or FeatureBlock. In addition, many of the arguments within SpacetimeAggregation are changed to their FeatureBlock equivalents to more tightly fit as a FeatureBlock subclass. Some of the more generic helper methods within SpacetimeAggregation are moved to FeatureBlock as concrete methods - With feature data now residing at a new config key, the experiment's config version is bumped to v7.
Codecov Report
@@ Coverage Diff @@
## master #608 +/- ##
==========================================
- Coverage 82.18% 81.14% -1.05%
==========================================
Files 93 94 +1
Lines 6271 6136 -135
==========================================
- Hits 5154 4979 -175
- Misses 1117 1157 +40
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not really sure how helpful I can be -- (nudge @nanounanue) -- and it wasn't obvious what code was being moved vs. newly added; so, I commented on a few things, but I didn't really see any problems.
The FeatureQueryRunner that was present at first here is actually unused and I'll remove it. It was the first version of what eventually became the concrete methods of FeatureBlock. I felt that it was too many layers, and that it would be advantageous to allow these new FeatureBlocks to just build themselves. I'm not totally sold on my changes: the pattern of having the Experiment own a bunch of 'verb' components, each totally responsible for doing what its concerned with, is now broken up. Instead control over generating features is now split between a list of things. I'm still unsure if this is a better or worse pattern for the Experiment. |
raise ValueError("value of FeatureDictionary objects represents a list of " | ||
"feature names, and therefore must be iterable") | ||
for feature_name in feature_names: | ||
if not isinstance(feature_name, str): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you like, this check might make (more) sense in FeatureNameList
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. I'm testing this out, and it's a bit of work to actually do right. The same concerns about inheriting from dict apply to inheriting from list, at least once you do what we're doing now and start to change the mutation methods to add something like this in.
The top-rated answer makes a good suggestion; while inheriting from list, I've so far overridden insert
, __setitem__
, and append
and still haven't found the one that hits how we actually use FeatureNameList. I assume it's one of extend
, __add__
, or __iadd__
, but that's quite a bit to do to properly inherit from list
. So implementing the suggestion in there for a typed list looks promising.
Of course, there's also the comment further below that raises the point that inheriting from MutableSequence also means that we have to figure out serialization on our own. This might not be a bad thing to do anyway: as it stands, this class doesn't serialize right and the calling code is forced to listify it. It would be nice to just implement this right in FeatureNameList
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It ended up being that both __add__
and extend
are used by triage
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. Again, doesn't abstractly have to be one way or the other, (nor encapsulated in one place or another). But, good to consider 👍
Note! This is a lot to read! Unfortunately I didn't have a good way of splitting this up into multiple PRs. My suggestion for reading this to start:
Introduces the 'FeatureBlock' as the main interface for Triage to create features, instead of directly interfacing with collate.
Add FeatureBlock abstract base class. The abstract methods are aimed
at providing an easy, flexible interface for feature generators to
implement. The concrete methods are aimed at building on those abstract
methods to provide a slightly easier, higher-level interface for
Experiments, et al to use.
Splits up the bloated architect.FeatureGenerator into a couple
different places: the code that generates is now in SpacetimeAggregations in feature_block _generators, and the code that runs the generated queries now lives in concrete methods of the FeatureBlock base class.
ExperimentBase now passes on config to generate_feature_blocks and
owns a collection of resulting FeatureBlocks, instead of a
FeatureGenerator object.
Instead of a FeatureDictionaryCreator object, the FeatureDictionary
class is introduced that knows how to build itself from a collection of
FeatureBlocks. The Experiment holds references to these as before.
Continuing the trend of molding collate to more closely fit how Triage
is using it and removing unneeded flexibility, the unused Aggregation base class is folded
into the SpacetimeAggregation so the latter can directly inherit from
FeatureBlock. This way we get to remove/condense many tests, and sunset
the collate integration tests because the methods being tested no longer
exist and are tested in either SpacetimeAggregation or FeatureBlock. In
addition, many of the arguments within SpacetimeAggregation are changed
to their FeatureBlock equivalents to more tightly fit as a FeatureBlock
subclass. Some of the more generic helper methods within
SpacetimeAggregation are moved to FeatureBlock as concrete methods
With feature data now residing at a new config key, the experiment's
config version is bumped to v7.
In general I think these changes will make it easier to build feature introspection functionality. It's easier for code to, based on configuration, do stuff like grab feature lists without having to introspect the database structure.