Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make Group sub classable through entry points #3882

Merged
merged 1 commit into from
Apr 9, 2020

Conversation

sphuber
Copy link
Contributor

@sphuber sphuber commented Mar 31, 2020

Fixes #3567

We add the aiida.groups entry point group where sub classes of the
aiida.orm.groups.Group class can be registered. A new metaclass is
used to automatically set the type_string based on the entry point of
the Group sub class. This will make it possible to reload the correct
sub class when reloading from the database.

If the GroupMeta metaclass cannot retrieve the corresponding entry
point of the subclass, a warning is issued that any instances of this
class will not be storable and the __type_string attribute is set to
None. This can be checked by the store method which will make it
fail. We choose to only except in the store method such that it is
still possible to define and instantiate subclasses of Group that have
not yet been registered. This is useful for testing and experimenting.

Since the group type strings are now based on the entry point names, the
existing group type strings in the database have to be migrated:

  • user -> core
  • data.upf.family -> core.upf
  • auto.import -> core.import
  • auto.run -> core.auto

When loading a Group instance from the database, the loader will try
to resolve the type string to the corresponding subclass through the
entry points. If this fails, a warning is issued and we fallback on the
base Group class.

Remaining questions/features to be discussed:

  • Currently, defining a subclass of Group without an entry point will raise. This should probably be turned into a warning. Decision: allow definition of unregistered subclasses while issuing a warning that storing any instances will not be allowed.
  • I think the type_string of the base Group should probably be core instead of core.group, i.e. we should make the entry point name there simply core. This will make it the parent hierarchically to all other groups from the core namespace. This will be useful when querying with wildcards for target subclasses. Decision: use core as type string for the base Group class
  • Regarding backwards compatibility: Make it possible to define a Group subclass and instantiate it without a corresponding entry point? What should the behavior be? It should get the base core type string? Should there be a warning that loading the group from the database will load the Group base class and not the sub class until an entry point is added. Decision: loading instance with type string that cannot be resolved will fall back on base Group class
  • Regarding backwards compatibility: What about the type_string in the Group constructor. Currently this is accepted and directly determines the type_string set in the database. With this PR this value is determined automatically and currently I raise an exception when it is passed. How to make this backwards compatible? Just issue a warning and discard the value? Maybe better then raising, but still not really backwards compatible. I guess these changes are therefore by definition not fully backwards compatible in the strictest sense of the words. Is this a problem we think? Decision: after discussion we found that keeping this argument as to not break backwards compatibility was impossible if we were also to keep the requirement of not allowing to store group instances of unregistered subclasses. The best option, that is currently implemented, is to issue a warning if an explicit type_string is passed and that it will be ignored and the base core type string will be used.

@sphuber
Copy link
Contributor Author

sphuber commented Mar 31, 2020

@ramirezfranciscof there is only one unit test failing tests/tools/graph/test_age.py::TestAiidaGraphExplorer::test_groups and I have no clue why. Could you maybe have a look and see if you can spot why? It creates 4 nodes and groups and stores node 2 and 3 in group 2 and tries to find those two nodes through the AGE. However, it returns all 4 nodes. These are the results of an example run:

NODE_IDS 52249 52250 52251 52252
GROUP_IDS 4008 4009 4010 4011
EXPECTED {52250, 52251}
OBTAINED {52249, 52250, 52251, 52252}

@ltalirz
Copy link
Member

ltalirz commented Mar 31, 2020

Thanks @sphuber , this makes a lot of sense to me!
I think there are many use cases, where one would otherwise use stand-alone scripts to act on specific groups (we already do). Being able to put this into the groups themselves seems more elegant.

Currently, defining a subclass of Group without an entry point will raise. This should probably be turned into a warning.

Yes. In which sense is this different from creating a subclass of, say, Data?I

I think the type_string of the base Group should probably be core

Agreed.

I'm not sure I'm the best person for reviewing this PR, perhaps some of the ones who worked on the GroupPath / Groups in other contexts may be more suitable.

@sphuber
Copy link
Contributor Author

sphuber commented Mar 31, 2020

Yes. In which sense is this different from creating a subclass of, say, Data?I

In how it should work probably there is no difference, however, where the behavior of the meta class in this PR is too strict, the current behavior for Data sub classes is probably a bit too liberal. See for example open issue #3457.

My feeling now is that we should have the following behavior:

  1. When defining or importing a sub class of Group that does not correspond to a known entry point a warning will be issued that when instantiated and stored, it will be stored with the default type string core and so when you load the group, it will return an instance of the base Group class and not the specific subclass.
  2. Instantiating Group sub class with unknown entry point will force type_string='core'.
  3. Loading a Group instance from database with a type string that cannot be resolved to a known entry point will emit a warning and use the base Group class as a fallback.

@greschd
Copy link
Member

greschd commented Mar 31, 2020

I'd actually be in favor of keeping the current behavior (error) w.r.t. classes that don't have a corresponding entry point. What is the point of instantiating an object as a specific Group subclass if, after storing + loading, it will end up being a bare Group anyway?

@sphuber
Copy link
Contributor Author

sphuber commented Mar 31, 2020

I'd actually be in favor of keeping the current behavior (error) w.r.t. classes that don't have a corresponding entry point. What is the point of instantiating an object as a specific Group subclass if, after storing + loading, it will end up being a bare Group anyway?

I agree there that that scenario would not make sense, so best to just not allow it. However, this will currently also raise by just importing the class. Since we still have some problems now and again with entry points being properly loaded, it might be a bit aggressive and annoying when you simply import a class and it excepts straight away, just because the corresponding package is not properly installed or its entry points in any case or not (yet) properly recognized. How about the following:

  • Emitting a warning in the metaclass when the entry point is not found, keeping the class importable
  • Constructing a new instance will raise an exception if no entry point is found
  • Loading an instance from the database with unknown entry point will emit warning and fall back on to Group base class

And now that I have you here @greschd :) what do you think the correct behavior is regarding backwards compatibility of code passing an explicit type_string in the Group constructor. I am not sure if there is any code like this around. I think we have no option but to ignore it, but I guess that technically that is not backwards compatible. What to do here?

@greschd
Copy link
Member

greschd commented Mar 31, 2020

Good point, I agree with that behavior.

Regarding the type_string: Just ignoring it is dangerous, because it will silently break existing code (if any existing code does use that). I think the options are:

  • Explicitly break it, as is done now. This makes sense if there is actually no code using it.
  • Keep the current behavior of type_string (this is essentially just an attribute now, right?) and add a deprecation warning. We can use a different name (type_label, entry_point, or something like that) for the new behavior.

@sphuber
Copy link
Contributor Author

sphuber commented Mar 31, 2020

Good point, I agree with that behavior.

Okay, I will try to implement that

Regarding the type_string: Just ignoring it is dangerous, because it will silently break existing code (if any existing code does use that). I think the options are:

* Explicitly break it, as is done now. This makes sense if there is actually no code using it.

* Keep the current behavior of `type_string` (this is essentially just an attribute now, right?) and add a deprecation warning. We can use a different name (`type_label`, `entry_point`, or something like that) for the new behavior.

No the type_string is an actual database column of the DbGroup database model. We don't need a different name because it simply should never be passed in again. the type_string is now determined by the class itself, which through the metaclass will base this on the corresponding entry point name. The problem also by still accepting the type_string is that we are allowing exactly that what we are forbidding when instantiating an instance of a sub class that is not registered. So then we also run into inconsistencies. However, come to think of it, except for a warning when loading, the code should still work the same.

Just an example to run through it. Currently people can do:

from aiida.orm import Group
group = Group(label='label', type_string='custom.type').store()
group.backend_entity.dbmodel.type_string   # 'custom.type'

If we add these changes, when running this code, at construction they will get a deprecation warning saying that the type_string argument is deprecated. When they load one of the nodes, they will get another warning saying that the aiida.groups:custom.type entry point cannot be found, but it will fall back on the Group class, so the behavior remains the same. At that point they could even implement the CustomTypeGroup(Group) class and add the entry point custom.type = some.path:CustomTypeGroup and the warnings will disappear. I think this is fair enough then, correct?

@greschd
Copy link
Member

greschd commented Mar 31, 2020

So instead of ignoring it, you would still use the explicitly passed type_string to override the automatic one, right?

Yeah, that makes sense to me - probably makes sense for someone else to give his comments on this part, too.

@sphuber
Copy link
Contributor Author

sphuber commented Apr 2, 2020

So instead of ignoring it, you would still use the explicitly passed type_string to override the automatic one, right?

Yeah, that makes sense to me - probably makes sense for someone else to give his comments on this part, too.

The only problem that I realize just now, is that will essentially create new groups with the old type strings. For example, currently one of the types is data.upf.family that with the migration of this PR we will properly change to core.upf. However, if we just keep accepting these custom type strings, people will keep creating groups with incorrect type strings. Maybe it is not so bad however, since they can still be loaded, since the loader will fall back onto base Group class, but still it is a bit inconsistent and messy. But there is no way around this, without breaking backwards compatibility, right? @giovannipizzi what do you think?

Another design problem I am facing is where to put the warning/exception when trying to store a Group with an incorrect type string. To keep in line with the choices for Node sub classes in #3886 we want to keep the failure as late as possible, i.e. in the store. However, in combination with the deprecated type_string this becomes a bit tricky to implement. In the meta class, where the type_string is determined automatically by reverse engineering the entry point, I currently just warn if that fails and set None. Normally then I would just check in the store that the type_string is not None, and therefore the class was properly registered. However, with this explicit type_string being passed in the constructor, I can issue a warning that it is deprecated, but then have to override the type_string with this value. The check in the store will then just pass and we will have to accept it and just store it. However, I feel this nullifies the check a bit. One can then always just pass the type_string in the constructor directly and bypass the check. I don't know what to do, help ✋

@sphuber sphuber force-pushed the fix_3567_group_entry_points branch from 8d2bc61 to 874feac Compare April 2, 2020 14:10
@codecov
Copy link

codecov bot commented Apr 2, 2020

Codecov Report

Merging #3882 into develop will decrease coverage by 0.03%.
The diff coverage is 92.00%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #3882      +/-   ##
===========================================
- Coverage    78.08%   78.05%   -0.04%     
===========================================
  Files          458      460       +2     
  Lines        33916    33957      +41     
===========================================
+ Hits         26485    26505      +20     
- Misses        7431     7452      +21     
Flag Coverage Δ
#django 70.06% <76.00%> (-0.08%) ⬇️
#sqlalchemy 70.91% <80.80%> (-0.06%) ⬇️
Impacted Files Coverage Δ
aiida/orm/implementation/groups.py 72.30% <0.00%> (ø)
aiida/plugins/entry_point.py 88.18% <ø> (ø)
aiida/cmdline/commands/cmd_group.py 78.53% <60.00%> (-11.24%) ⬇️
aiida/orm/nodes/data/upf.py 80.42% <75.00%> (-0.41%) ⬇️
aiida/tools/groups/paths.py 90.50% <83.33%> (+0.14%) ⬆️
aiida/plugins/factories.py 98.87% <87.50%> (-1.13%) ⬇️
aiida/orm/groups.py 91.66% <97.43%> (+4.27%) ⬆️
...s/djsite/db/migrations/0044_dbgroup_type_string.py 100.00% <100.00%> (ø)
aiida/backends/djsite/db/migrations/__init__.py 68.83% <100.00%> (ø)
...tions/versions/bf591f31dd12_dbgroup_type_string.py 100.00% <100.00%> (ø)
... and 13 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b14243e...9c97376. Read the comment docs.

@sphuber sphuber force-pushed the fix_3567_group_entry_points branch 4 times, most recently from c3906c6 to 98db2ad Compare April 5, 2020 15:13
@sphuber
Copy link
Contributor Author

sphuber commented Apr 5, 2020

@ltalirz @greschd @giovannipizzi and @greschd : I have updated the PR with the final behavior as discussed last Friday. In the OP of this PR you will find the descrription of each decision taken. I have added some more tests and documentation. So this should be good for review.

@sphuber sphuber force-pushed the fix_3567_group_entry_points branch 2 times, most recently from 377994f to e1e3463 Compare April 6, 2020 13:50
@yakutovicha yakutovicha self-requested a review April 7, 2020 07:18
yakutovicha
yakutovicha previously approved these changes Apr 7, 2020
Copy link
Contributor

@yakutovicha yakutovicha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @sphuber, very nice work!

Before I forget: there are two typos in the commit message:

If the `GroupMeta` metclass fails cannot retrieve the corresponding

metclass -> metaclass
fails cannot -> cannot

I looked at the discussion, code, tested the Groups behaviour and it all makes total sense to me. Especially the fact that groups with old type strings can still be loaded, but they become a standard Group, which was the case before. Also, the way deprecated type_string argument is handled I find very good.

I also see a potential problem here, if (and there is a big if here) there is a user who has been working with groups of custom type_string, he/she won't be able to add more of such groups. But I think this is acceptable, as the user can make a new group and add nodes from the old one into it.

$ verdi group show test_group
/Users/yakutovicha/Documents/aiida-core/aiida/orm/groups.py:42: UserWarning: could not load entry point `ololo`, falling back onto `Group` base class.
  warnings.warn(message)  # pylint: disable=no-member
-----------------  ----------------
Group label        test_group
Group type_string  ololo
Group description  <no description>
-----------------  ----------------
# Nodes:
  PK  Type    Created
----  ------  ----------
   1  Dict    2m:15s ago

$ verdi group create test_group_new
Success: Group created with PK = 2 and name 'test_group_new'

$ verdi group copy  test_group test_group_new
/Users/yakutovicha/Documents/aiida-core/aiida/orm/groups.py:42: UserWarning: could not load entry point `ololo`, falling back onto `Group` base class.
  warnings.warn(message)  # pylint: disable=no-member
Success: Nodes copied from group<test_group> to group<test_group_new>

$ verdi group show test_group_new
-----------------  ----------------
Group label        test_group_new
Group type_string  core
Group description  <no description>
-----------------  ----------------
# Nodes:
  PK  Type    Created
----  ------  -----------
   1  Dict    22m:16s ago

I am going to approve.

@sphuber sphuber force-pushed the fix_3567_group_entry_points branch 3 times, most recently from a92be21 to b2086b8 Compare April 8, 2020 19:36
@sphuber sphuber requested a review from chrisjsewell April 8, 2020 19:55
aiida/orm/groups.py Outdated Show resolved Hide resolved
aiida/orm/autogroup.py Show resolved Hide resolved
aiida/tools/groups/paths.py Show resolved Hide resolved
aiida/tools/groups/paths.py Outdated Show resolved Hide resolved
aiida/tools/groups/paths.py Outdated Show resolved Hide resolved
aiida/tools/groups/paths.py Outdated Show resolved Hide resolved
We add the `aiida.groups` entry point group where sub classes of the
`aiida.orm.groups.Group` class can be registered. A new metaclass is
used to automatically set the `type_string` based on the entry point of
the `Group` sub class. This will make it possible to reload the correct
sub class when reloading from the database.

If the `GroupMeta` metaclass cannot retrieve the corresponding entry
point of the subclass, a warning is issued that any instances of this
class will not be storable and the `__type_string` attribute is set to
`None`. This can be checked by the `store` method which will make it
fail. We choose to only except in the `store` method such that it is
still possible to define and instantiate subclasses of `Group` that have
not yet been registered. This is useful for testing and experimenting.

Since the group type strings are now based on the entry point names, the
existing group type strings in the database have to be migrated:

 * `user` -> `core`
 * `data.upf.family` -> `core.upf`
 * `auto.import` -> `core.import`
 * `auto.run` -> `core.auto`

When loading a `Group` instance from the database, the loader will try
to resolve the type string to the corresponding subclass through the
entry points. If this fails, a warning is issued and we fallback on the
base `Group` class.
@sphuber sphuber force-pushed the fix_3567_group_entry_points branch from b2086b8 to 9c97376 Compare April 9, 2020 07:46
@sphuber sphuber merged commit d609e5e into aiidateam:develop Apr 9, 2020
@sphuber sphuber deleted the fix_3567_group_entry_points branch April 9, 2020 09:11
bosonie added a commit to siesta-project/aiida_siesta_plugin that referenced this pull request May 19, 2020
…up system.

From aiida 1.2.0 the class `Group` have been modified to implement new features.
This has implications in our plugin because the PseudoFamilies are Groups.
In particular there is now an explicit implementation of PsmlFamily and PsfFamily
as subclasses of the `Group` class. Moreover PsmlFamily and PsfFamily are
registred as entry points making possible to reload the correct
sub class when reloading from the database.
For more info see: aiidateam/aiida-core#3882
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make Group class extensible through plugins
5 participants