`Flow` + `Job` magic methods #369

janosh · 2023-07-17T21:55:46Z

Closes #368.

from pymatgen.core import Lattice, Structure
from atomate2.vasp.flows.mp import MPMetaGGARelax

struct = Structure(
    lattice=Lattice.cubic(3),
    species=("Fe", "Fe"),
    coords=((0, 0, 0), (0.5, 0.5, 0.5)),
)
flow = MPMetaGGARelax().make(structure)

len(flow)
>>> 3
print(flow)
>>> Flow(name='MP Meta-GGA Relax', uuid='d43cd776-fb64-40f0-bacf-1933ee2342b8')
consisting of 3 jobs:
  1. Job(name='MP pre-relax', uuid='99a016af-eb9d-4d7a-8454-85ecf6b59d41')
  2. Job(name='MP meta-GGA Relax', uuid='496006d6-5375-49ce-91a8-cf53483173e1')
  3. Job(name='MP meta-GGA Static', uuid='8fe17841-a536-4d39-9bfa-dd83bc7900a2')

utf · 2023-07-18T09:50:55Z

This is great! Thanks very much.

I notice from the tests that I assumed that job comparison would depend not only on the UUIDs but also on the specific function arguments. I think this is just relying on the default dataclass __eq__() operator. See: https://docs.python.org/3/library/dataclasses.html#module-contents

Personally, I think having stricter equality is better approach but I am open to other thoughts.

davidwaroquiers · 2023-07-18T09:58:44Z

This is indeed nice. Just to comment or push the discussion a bit further, what "should" be the len of a Flow if it is made of other Flows ? Could be the number of "steps" (jobs or flows) or the number of "flattened" jobs. I think it should be made clear what it is. Same for the print, what happens when you'd print a Flow with Flows in it. I guess it would/should print kind of an indented "structure" of the Flow with its subflows. Maybe it would be nice to have a choice on how "deep" you want to print.

Additional thought on the dynamical aspect. The Flow as such only exist at creation time while it will disappear once inserted into the database. A some point it would be nice to be able to somehow have the information of this Flow stored somewhere. And the len of this Flow can evolve in that case (if you have additions, replace, ...).

I'm maybe mixing up different issues here but they are somehow related to each other.

utf · 2023-07-18T10:13:51Z

I think those are very good points @davidwaroquiers. To add to that, some comments:

Flow already has a method iterflow which iterates through the flow in such an order that the job output references can always be resolved. E.g., root nodes of the graph are always returned first.
The __iter__ method introduced here has different behaviour and only iterates through the jobs array (which can also contain other flows). This could be somewhat confusing.
If we change the behaviour of __iter__ and __len__ to be consistent with subflows, then this would make the behaviour of __setitem__ and __getitem__ inconsistent.

Maybe its ok to leave the implementation as is, but I agree that the behaviour should be clarified. It would be nice to add examples of how to use these methods in the docs somewhere.

janosh · 2023-07-18T22:18:08Z

Great comments, thanks @utf @davidwaroquiers.

This is indeed nice. Just to comment or push the discussion a bit further, what "should" be the len of a Flow if it is made of other Flows ? Could be the number of "steps" (jobs or flows) or the number of "flattened" jobs.

I considered returning the flattened length of a Flow but I think the default length should be just the number of items in the jobs list.
I think the user should have to ask for the flattened length explicitly. We could add a property

@property
def flat(self):
    ...

which would recursively unpack subflows. Then people can use len(flow.flat).

Same for the print, what happens when you'd print a Flow with Flows in it.

@davidwaroquiers That's already handled actually. If __str__ is undefined, Python automatically falls back on __repr__ (not vice versa) and since Flow.__repr__ puts every subflow in f"{subflow}" which calls __str__ on it which calls __repr__ on that subflow, it's already recursive. I just tweaked __repr__ to increase indentation for every level down in the subflow hierarchy for added readability. Now this prints:

from jobflow import Flow, Job


def add(a, b):
    return a + b

def get_test_job():
    return Job(add, function_args=(1, 2))

subflow1 = Flow([get_test_job()])
subflow2 = Flow([get_test_job(), get_test_job()])
subsubflow = Flow([get_test_job(), get_test_job()])
subflow3 = Flow([get_test_job(), get_test_job(), subsubflow])
print(Flow([subflow1, subflow2, subflow3]))

Flow(name='Flow', uuid='f7ae544a-7e8d-4976-b768-13e431199fa6') consisting of 3 jobs:
  1. Flow(name='Flow', uuid='9e7b6459-adca-48b1-8c00-b86e15876a38') consisting of 1 jobs:
    1.1. Job(name='add', uuid='0d4a4621-3249-4f94-bdb1-a83910982921')
  2. Flow(name='Flow', uuid='d2b1cdbe-0075-4bba-8686-3851fe81a4b0') consisting of 2 jobs:
    2.1. Job(name='add', uuid='bf687f6b-3e5b-4e48-b579-e1646b0707c7')
    2.2. Job(name='add', uuid='9422bbf9-8c91-4153-a19b-5aad36d96431')
  3. Flow(name='Flow', uuid='b62ebaaf-bea2-489f-8e88-75a2eebfa7a9') consisting of 3 jobs:
    3.1. Job(name='add', uuid='5fae9b50-8051-438e-b295-14298569c527')
    3.2. Job(name='add', uuid='1dbba533-4cc7-4cef-be76-d9d773cfaf80')
    3.3. Flow(name='Flow', uuid='b6526c18-ce3c-48c2-ae26-6207859ccb46') consisting of 2 jobs:
      3.3.1. Job(name='add', uuid='c0c2eb67-65ab-4bcc-a4cc-465dd8898f2c')
      3.3.2. Job(name='add', uuid='42e8f5d2-42d1-4885-a627-13232362447d')

I'll have to get back to the other comments later.

janosh · 2023-07-18T22:19:24Z

Looking at it now, I guess we can drop the consisting of n jobs part.

janosh · 2023-07-29T01:12:19Z

Just had time to take another look at this.

`eq`

I'm a bit torn about Job.__eq__ on the one hand I agree two flows should be different if they have different attributes. On the other hand, what's the point of a UUID if it doesn't uniquely identify a flow? UUID stands for universally unique identifier.

@utf Btw, Jobs are not dataclasses so before they had the default Python behavior of only being equal if they're literally the same object in memory.

jobflow/src/jobflow/core/job.py

Lines 219 to 221 in 9928fdb

    
           class Job(MSONable): 
        
               """

`iterflow`

I actually think it's a plus to have two ways of iterating through a Flow. As you said @utf, we just need to be clear in the docs on how for job in flow.iterflow and for job in flow differ. But I can see use cases for either one. pandas also has many different ways of iterating though a dataframe (df.iterrows, df.items, df.itertuples, for col in df, df.keys and maybe more).

`repr`

@davidwaroquiers The problem with __repr__ kwargs like depth: int is people usually aren't aware of them. They just print the object or return it from a Jupyter cell in which case Python/ipykernel implicitly calls __repr__ on that object without passing any arguments. So I don't think there would be much value in a depth keyword. To use it would mean

print(flow.__repr__(depth=2)

Happy to take further suggestions esp. about __eq__ but for now I'll start adding tests and docs for the implementation as is.

…ic operators

still unfinished

Shallow copy doesn't make sense; jobs aren't allowed to belong to multiple flows

utf · 2023-08-07T16:54:35Z

On the __eq__ point, due to how some of the dynamic behaviour works (specifically the replace functionality), multiple jobs can have the same uuid but different indexs.

So maybe for __eq__ the check could just be whether both uuid and index match.

…for equality

janosh · 2023-08-07T17:00:30Z

So maybe for eq the check could just be whether both uuid and index match.

Oh bad timing. I just changed it to self.__dict__ == other.__dict__. 🤦 Happy to revert in which case this test would need to be changed to expect ==:

jobflow/tests/core/test_job.py

Line 430 in 7a19168

assert test_job != resolved_job

codecov · 2023-08-07T17:01:42Z

Codecov Report

Merging #369 (781d3b5) into main (108d2f4) will increase coverage by 0.00%.
Report is 3 commits behind head on main.
The diff coverage is 100.00%.

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #369   +/-   ##
=======================================
  Coverage   99.86%   99.86%           
=======================================
  Files          19       19           
  Lines        1439     1505   +66     
  Branches      364      374   +10     
=======================================
+ Hits         1437     1503   +66     
  Misses          1        1           
  Partials        1        1

Files Changed	Coverage Δ
src/jobflow/core/reference.py	`100.00% <ø> (ø)`
src/jobflow/core/flow.py	`100.00% <100.00%> (ø)`
src/jobflow/core/job.py	`100.00% <100.00%> (ø)`

janosh · 2023-08-07T17:26:24Z

@utf Other than the __eq__ decision, I think this is in a good state and ready to go.

utf · 2023-08-07T17:31:34Z

Thanks @janosh. I'm ok with the current __eq__ implementation but we could always switch it in a future PR if there is a strong need.

I don't mean to be a stickler, but is it possible to add a few more tests so that the test coverage doesn't downgrade. Part of the issue is that you have changed:

if typing.TYPE_CHECKING

to

if TYPE_CHECKING

in some cases, and that means the codecov is moaning about not checking those lines. The simple fix is to add the latter to the coverage exception list here:

jobflow/pyproject.toml

Lines 111 to 115 in 5602cfd

    
           exclude_lines = [ 
        
               '^\s*@overload( |$)', 
        
               '^\s*assert False(,|$)', 
        
               'if typing.TYPE_CHECKING:', 
        
           ]

But there are a couple of if statements etc that are not currently covered.

janosh · 2023-08-07T17:38:34Z

On it!

janosh · 2023-08-07T18:17:12Z

@utf Coverage is back up to 100%. 🎉

utf · 2023-08-13T10:33:37Z

Brilliant, thank you!

janosh added enhancement New feature or request ux User experience labels Jul 17, 2023

janosh mentioned this pull request Jul 25, 2023

Flow + Job magic methods #368

Closed

janosh added 15 commits August 7, 2023 08:44

implement Flow __len__, __contains__, __iter__, __repr__ and arithmet…

9087529

…ic operators

implement Job implement __contains__, __hash__, __repr__, __eq__

5f105d4

increase Flow.__repr__ indentation for every level of subflow

e0c1555

drop 'consisting of n jobs' from repr

f9d3ece

add Flow.jobs setter

771869d

__setitem__ raise TypeError if value is not a Job or Flow

a92252e

Flow.__sub__ raise ValueError if the value is not in the flow

11a02ce

fix line too long

c3ca465

add test_flow_magic_methods()

2fb76d7

add test_flow_magic_methods_edge_cases()

d38c8e0

still unfinished

add test_flow_repr()

387da32

self.__class__.__name__ -> type(self).__name__

47a0347

use broader typing.Sequence for type hints

763033e

rewrite __copy__ to __deepcopy__

e5aaf06

Shallow copy doesn't make sense; jobs aren't allowed to belong to multiple flows

finish test_flow_magic_methods_edge_cases()

1cf1b51

janosh force-pushed the flow-job-magic-methods branch from aadb604 to 1cf1b51 Compare August 7, 2023 15:46

janosh added 2 commits August 7, 2023 09:46

Merge branch 'main' into flow-job-magic-methods

8d3f400

fix mypy

2094ecd

janosh added 2 commits August 7, 2023 09:56

fix bad import auto-complete

8fd0925

make Job.__eq__ more strict: require self.__dict__ == other.__dict__ …

7a19168

…for equality

janosh added 2 commits August 7, 2023 10:03

revert accidental auto-fix of implicit optional

59584ca

add test_job_magic_methods()

9940df0

coverage ignore if TYPE_CHECKING

e26a924

janosh added 5 commits August 7, 2023 10:45

move test error msg for multiple stores to test_store_inputs()

955c6b8

add test for Flow.__setitem__ raising TypeError

6dd27bd

add test Flow.__setitem__ with single item and __add__ with bad type

1307d30

delete Job.__getitem__

962ad44

test passing single job to @jobs setter

94fdb64

document difference between flow.iterflow() and flow.__iter__()

781d3b5

utf merged commit 10aad48 into main Aug 13, 2023
8 checks passed

janosh deleted the flow-job-magic-methods branch August 13, 2023 15:20

gpetretto mentioned this pull request Aug 15, 2023

Problem with Flow's __deepcopy__ method #398

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`Flow` + `Job` magic methods #369

`Flow` + `Job` magic methods #369

janosh commented Jul 17, 2023

utf commented Jul 18, 2023

davidwaroquiers commented Jul 18, 2023

utf commented Jul 18, 2023

janosh commented Jul 18, 2023

janosh commented Jul 18, 2023

janosh commented Jul 29, 2023

utf commented Aug 7, 2023

janosh commented Aug 7, 2023

codecov bot commented Aug 7, 2023 •

edited

Loading

janosh commented Aug 7, 2023

utf commented Aug 7, 2023

janosh commented Aug 7, 2023

janosh commented Aug 7, 2023

utf commented Aug 13, 2023

Flow + Job magic methods #369

Flow + Job magic methods #369

Conversation

janosh commented Jul 17, 2023

utf commented Jul 18, 2023

davidwaroquiers commented Jul 18, 2023

utf commented Jul 18, 2023

janosh commented Jul 18, 2023

janosh commented Jul 18, 2023

janosh commented Jul 29, 2023

__eq__

iterflow

__repr__

utf commented Aug 7, 2023

janosh commented Aug 7, 2023

codecov bot commented Aug 7, 2023 • edited Loading

Codecov Report

janosh commented Aug 7, 2023

utf commented Aug 7, 2023

janosh commented Aug 7, 2023

janosh commented Aug 7, 2023

utf commented Aug 13, 2023

`Flow` + `Job` magic methods #369

`Flow` + `Job` magic methods #369

`eq`

`iterflow`

`repr`

codecov bot commented Aug 7, 2023 •

edited

Loading