This repository has been archived by the owner on Sep 13, 2024. It is now read-only.

Option to output parent/child aligned single cell profiles instead of image/object number #8

Open
gwaybio opened this issue Apr 26, 2021 · 2 comments
Labels
enhancement New feature or request

Comments

@gwaybio
Member

gwaybio commented Apr 26, 2021

Currently cytominer_transport/_generator.py combines objects based on "ImageNumber" and "ObjectNumber".

Instead, we should consider combining objects by their appropriate "Parent_{compartment}" and "Child_{compartment}".
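To make the proposal concrete, here is a minimal sketch of a parent/child join in pandas. It is not the existing `_generator.py` logic. The toy tables, the compartment prefixes, and the helper `align_child_to_parent` are all assumptions for illustration, and it uses only a `Parent_{compartment}` column on the child table.

```python
import pandas as pd

# Toy per-compartment tables. Column names follow the "Parent_{compartment}"
# convention discussed above; the values are invented for illustration.
nuclei = pd.DataFrame({
    "ImageNumber": [1, 1],
    "ObjectNumber": [1, 2],
    "AreaShape_Area": [110.0, 95.0],
})
cells = pd.DataFrame({
    "ImageNumber": [1, 1],
    "ObjectNumber": [1, 2],
    "Parent_Nuclei": [2, 1],  # cell 1 belongs to nucleus 2, cell 2 to nucleus 1
    "AreaShape_Area": [400.0, 520.0],
})


def align_child_to_parent(child, parent, parent_name, child_name):
    """Hypothetical helper: one row per child, joined to its parent's measurements."""
    # Prefix every column with its compartment so the two tables do not collide.
    child_prefixed = child.add_prefix(f"{child_name}_")
    parent_prefixed = parent.add_prefix(f"{parent_name}_")
    return child_prefixed.merge(
        parent_prefixed,
        left_on=[f"{child_name}_ImageNumber", f"{child_name}_Parent_{parent_name}"],
        right_on=[f"{parent_name}_ImageNumber", f"{parent_name}_ObjectNumber"],
        how="inner",  # children without a matching parent are dropped
    )


single_cells = align_child_to_parent(cells, nuclei, "Nuclei", "Cells")
```

With `how="inner"`, orphaned children are dropped silently. Switching to `how="left"` would keep them with missing parent measurements, which is one of the choices the output option would need to make.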

Pros

  • Profilers are increasingly using single cells, rather than aggregated profiles, as the analytical unit
  • If cytominer_transport could output analysis-ready single-cell data, it would save profilers a lot of time (it would essentially obviate the pycytominer step of reading in parquet files, rearranging rows based on the parent/child columns, and writing out a new file)

Cons

  • If an experiment has a large discrepancy in counts between objects (e.g. 20,000 measurements in one object aligned to a total of 500 measurements in another object), the output file will be much larger
    • Although I am not sure this isn't already happening with pandas.concat(axis=1)

concatenated_object_records = pandas.concat(
    [concatenated_object_records, object_records], axis=1
)
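As a quick check on the file-size question, the sketch below shows what `pandas.concat(axis=1)` does when the two frames have different row counts. The frames and column names are invented, and it assumes default integer indexes. Whether the frames in `_generator.py` are indexed the same way is not confirmed here.

```python
import pandas as pd

# Two toy object tables with different numbers of rows (values are invented).
big = pd.DataFrame({"Speckles_Intensity": [0.1, 0.2, 0.3, 0.4, 0.5]})
small = pd.DataFrame({"Nuclei_Area": [100.0, 120.0]})

# axis=1 aligns rows on the index using an outer join by default, so the
# shorter table is padded with NaN up to the length of the longer one.
combined = pd.concat([big, small], axis=1)

n_rows, n_cols = combined.shape
n_padded = int(combined["Nuclei_Area"].isna().sum())
```

Under these assumptions the result has as many rows as the largest table, and rows are paired by index position rather than by any parent/child relationship. So the padding would already be occurring, but the pairing across objects would not be biologically meaningful.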

Notes

@gwaybio gwaybio added the enhancement New feature or request label Apr 26, 2021
@gwaybio
Member Author

gwaybio commented Apr 26, 2021

@bethac07
Member

A complication here is that, when there are multiple objects, there can be many relationship types:

  • entirely uncorrelated (A and B are identified separately and have no parent/child relationship at all)
  • one-or-zero-to-any (large parent objects with zero-to-many smaller child objects in it; some small objects may have no parent)
  • one-to-zero-or-one (an original parent object filtered into 0-or-1 child objects, like Cells into GFPPositiveCells)
  • one-to-one (Nuclei into Cells, then Nuclei and Cells into Cytoplasms, guaranteed that all objects are one-to-one; this is the condition that what you linked supports)

I think the logic to detect and/or support all of them might be rough; we should consider how to handle each of them, and whether or not the user needs to pass a flag or a configuration or something to make it easier to decide.
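As a starting point for discussion, here is a rough sketch of how the four cases above might be told apart from the data alone. The helper `classify_relationship` is hypothetical. It assumes the child table stores the parent's ObjectNumber in a `Parent_{compartment}` column, that 0 means "no parent", and that both tables come from a single image.

```python
import pandas as pd


def classify_relationship(parent, child, parent_col):
    """Hypothetical helper: label the parent/child relationship between two tables.

    Assumes child[parent_col] holds the parent's ObjectNumber, that 0 means
    "no parent", and that both tables come from a single image.
    """
    if parent_col not in child.columns:
        return "uncorrelated"

    # Children that actually point at a parent.
    linked = child.loc[child[parent_col] > 0, parent_col]
    has_orphans = len(linked) < len(child)

    # Number of children per parent, including parents with no children.
    children_per_parent = linked.value_counts().reindex(
        parent["ObjectNumber"], fill_value=0
    )

    if has_orphans or children_per_parent.max() > 1:
        return "one-or-zero-to-any"
    if children_per_parent.min() == 0:
        return "one-to-zero-or-one"
    return "one-to-one"
```

Note that this only describes what one dataset happens to look like. A filtering pipeline in which every parent passes would be labeled one-to-one, which is part of why an explicit user flag or configuration may still be needed.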
