Support Dataset IDs and Image IDs when target is Project #65

Open
will-moore opened this issue Jan 10, 2022 · 1 comment

Comments

@will-moore
Member

will-moore commented Jan 10, 2022

Currently, this CSV works with a Dataset as the target; Image IDs are validated and set to -1 if invalid:

```
# header image,d,l,s
Image,ROI_Area,Channel_Index,Channel_Name
1277,0.0469,1,DAPI
1278,0.142,2,GFP-AuroraB
1279,0.093,3,TRITC-edit
1280,0.112233,4,cy5
1,100.1,5,Invalid-Image-ID
```
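
To make the Dataset behaviour concrete, here is a minimal sketch of "validate and set to -1" in plain Python. It is not the omero-metadata code: `validate_image_ids` and `dataset_image_ids` are hypothetical names, and the set of valid IDs is assumed to have been loaded from the target Dataset beforehand.

```python
def validate_image_ids(csv_ids, valid_image_ids):
    """Return the CSV Image IDs, replacing any ID not in the target with -1.

    `valid_image_ids` is a hypothetical stand-in for the Image IDs that the
    populate code would load from the target Dataset.
    """
    return [iid if iid in valid_image_ids else -1 for iid in csv_ids]


# IDs from the CSV above; 1 is not an image in the Dataset.
dataset_image_ids = {1277, 1278, 1279, 1280}
print(validate_image_ids([1277, 1278, 1279, 1280, 1], dataset_image_ids))
# [1277, 1278, 1279, 1280, -1]
```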

However, if I use a Project as the target and add a Dataset column, this fails (with the same result if the dataset column type in the header is `l`):

```
# header image,dataset,d,l,s
Image,Dataset,ROI_Area,Channel_Index,Channel_Name
1277,211,0.0469,1,DAPI
1278,211,0.142,2,GFP-AuroraB
1279,211,0.093,3,TRITC-edit
```

with:

```
  File "/Users/wmoore/Desktop/METADATA/omero-metadata/src/omero_metadata/populate.py", line 1069, in preprocess_from_handle
    self.preprocess_data(reader)
  File "/Users/wmoore/Desktop/METADATA/omero-metadata/src/omero_metadata/populate.py", line 1167, in preprocess_data
    self.post_process()
  File "/Users/wmoore/Desktop/METADATA/omero-metadata/src/omero_metadata/populate.py", line 1343, in post_process
    did = int(columns_by_name["dataset"].values[i])
```

This occurs during `preprocess_data()`, where we try to find the longest Dataset name so that the Dataset Name column can be created with the correct size (before we populate the table).
Here it is trying to look up the Image name for a row that only has an Image ID. However, to get Image names from a `ProjectWrapper`, we first need the Dataset ID, since images are stored as `self.images_by_id[did][iid] = image`.
But we don't really need to group Image IDs under Dataset IDs, since Image IDs are globally unique.
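
A small sketch of the difference between the two lookup layouts, in plain Python. `Image` is a hypothetical namedtuple standing in for an OMERO image object, the sample names are made up, and `longest_image_name` is a hypothetical helper for sizing a name column; none of this is the omero-metadata implementation.

```python
from collections import namedtuple

# Hypothetical stand-in for an OMERO image: only the fields needed here.
Image = namedtuple("Image", ["id", "name", "dataset_id"])

# Made-up sample data for illustration only.
images = [
    Image(1277, "img-a.tif", 211),
    Image(1278, "img-bb.tif", 211),
    Image(1279, "img-ccc.tif", 211),
]

# Nested layout, as in self.images_by_id[did][iid]: a Dataset ID is
# needed before any image can be found.
nested = {}
for image in images:
    nested.setdefault(image.dataset_id, {})[image.id] = image

# Flat layout: Image IDs are globally unique, so they can be the only key.
flat = {image.id: image for image in images}


def longest_image_name(image_ids, images_by_id):
    """Length of the longest name among the known IDs (0 if none match)."""
    return max(
        (len(images_by_id[iid].name) for iid in image_ids if iid in images_by_id),
        default=0,
    )


# Both layouts hold the same objects, but only the flat one can be
# queried from a row that has an Image ID and no Dataset ID.
print(flat[1277] is nested[211][1277])  # True
print(longest_image_name([1277, 1278, 1279], flat))  # 11
```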

Trying the same with HCS data...

With a Screen...

```
# header image,plate,d,l,s
Image,Plate,ROI_Area,Channel_Index,Channel_Name
3559,251,0.0469,1,DAPI
3560,251,0.142,2,GFP-AuroraB
3563,251,0.093,3,TRITC-edit
```
```
$ omero metadata populate Screen:601 --file test-image-plate-ids.csv -vvv
...
  File "/Users/wmoore/Desktop/METADATA/omero-metadata/src/omero_metadata/populate.py", line 1127, in parse
    self.preprocess_from_handle(f1)
  File "/Users/wmoore/Desktop/METADATA/omero-metadata/src/omero_metadata/populate.py", line 1069, in preprocess_from_handle
    self.preprocess_data(reader)
  File "/Users/wmoore/Desktop/METADATA/omero-metadata/src/omero_metadata/populate.py", line 1167, in preprocess_data
    self.post_process()
  File "/Users/wmoore/Desktop/METADATA/omero-metadata/src/omero_metadata/populate.py", line 1385, in post_process
    pid = columns_by_name['plate'].values[i]
IndexError: list index out of range
```

Or, with the plate column type in the header set to `l`, I get:

```
  File "/Users/wmoore/Desktop/METADATA/omero-metadata/src/omero_metadata/populate.py", line 1167, in preprocess_data
    self.post_process()
  File "/Users/wmoore/Desktop/METADATA/omero-metadata/src/omero_metadata/populate.py", line 1386, in post_process
    iname = self.value_resolver.get_image_name_by_id(iid, pid)
  File "/Users/wmoore/Desktop/METADATA/omero-metadata/src/omero_metadata/populate.py", line 389, in get_image_name_by_id
    return self.wrapper.get_image_name_by_id(iid, pid)
  File "/Users/wmoore/Desktop/METADATA/omero-metadata/src/omero_metadata/populate.py", line 568, in get_image_name_by_id
    raise Exception("Cannot resolve image to plate")
```
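
For the HCS case, one possible shape for the same idea: resolve the image by its ID alone and use the Plate ID, when present, only as a consistency check. This is a standalone sketch loosely modelled on the `get_image_name_by_id(iid, pid)` call in the traceback, not the omero-metadata method; `PlateImage`, the placeholder names, and the choice to raise `ValueError` on a mismatch are all assumptions.

```python
from collections import namedtuple

# Hypothetical stand-in for an image in a plate well.
PlateImage = namedtuple("PlateImage", ["id", "name", "plate_id"])

# Placeholder data keyed only by the (globally unique) Image ID.
images_by_id = {
    3559: PlateImage(3559, "image-3559", 251),
    3560: PlateImage(3560, "image-3560", 251),
}


def get_image_name_by_id(iid, pid=None):
    """Resolve an image name from its ID; `pid` is only a consistency check."""
    image = images_by_id.get(iid)
    if image is None:
        return None  # unknown ID: the caller could flag it, e.g. with -1
    if pid is not None and image.plate_id != pid:
        raise ValueError(f"Image {iid} is not in Plate {pid}")
    return image.name


print(get_image_name_by_id(3559, 251))  # image-3559
print(get_image_name_by_id(3560))       # image-3560, no Plate ID needed
```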

None of this usage of `populate metadata` with Image IDs is documented in the README or covered by the tests yet, so it's hard to know whether any of it is supported.

cc @emilroz @stick @sbesson

@will-moore
Member Author

I'm seeing the same issues when trying to support ROI IDs with a Dataset target (#62).
Currently, the workaround is to not add a Roi Name column when we find a `roi` column if the target class is a Dataset.
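
The workaround condition, written out as a tiny sketch. `name_columns_to_add` is a hypothetical helper, not a function in omero-metadata; header types are assumed to be the lowercase strings from the `# header` line.

```python
def name_columns_to_add(header_types, target_class):
    """Hypothetical helper: extra name columns to create for ID columns.

    Mirrors the workaround described above: when the header contains a
    `roi` column, a Roi Name column is added unless the target is a Dataset.
    """
    extra = []
    if "roi" in header_types and target_class != "Dataset":
        extra.append("Roi Name")
    return extra


print(name_columns_to_add(["roi", "d", "l", "s"], "Dataset"))  # []
print(name_columns_to_add(["roi", "d", "l", "s"], "Project"))  # ['Roi Name']
```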
