Skip to content
This repository has been archived by the owner on Aug 7, 2024. It is now read-only.

make ontology work with any kind of parental relationships #540

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

jkobject
Copy link

makes it much slower if one chooses something like "part_of" for certain kinds of ontologies (90mns for "part_of" on all tissues)

Copy link

netlify bot commented Feb 20, 2024

Deploy Preview for bionty-base-091d ready!

Name Link
🔨 Latest commit fdd2c4c
🔍 Latest deploy log https://app.netlify.com/sites/bionty-base-091d/deploys/65d4bd1442ee3500098d418e
😎 Deploy Preview https://deploy-preview-540--bionty-base-091d.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@sunnyosun
Copy link
Member

sunnyosun commented Feb 26, 2024

Thank you for this PR @jkobject ! Just that it won't have an effect yet for users who want customized parsing, since we pull the reference parquet files from the s3.

To enable user to define their own relationship types, we need to either enable a local mode for the reference files (the risk is that users won't be able to get new references automatically when it's updated); or we figure something out specifically for lamindb.

@Zethson @falexwolf how do you think we should go about this?

Related issue: laminlabs/bionty#37

@jkobject
Copy link
Author

I am not sure I understand what you mean but on my end doing: bt.Tissue.public().pronto().to_df(include_rel ="part_of") seems to work

@sunnyosun
Copy link
Member

I am not sure I understand what you mean but on my end doing: bt.Tissue.public().pronto().to_df(include_rel ="part_of") seems to work

Yes, this works, but it won't work for canonical accessors such as .from_public() or .from_values() or public.df().

Does accessing the df with pronto().to_df(...) alone already meet what you need?

@jkobject
Copy link
Author

I might be able to rewrite my code but right now it is true that I am using bt.Tissue.filter().df(include=["parents__ontology_id"]) and it wouldn't work for this..

@Zethson
Copy link
Member

Zethson commented Mar 5, 2024

@jkobject thank you for the PR and sorry for chipping in so late. I think I accidentally deleted the Github notification email.

So we could of course upload monstrous .parquet files to our S3 that include all relationships, but this will probably greatly increase the download and parsing times. Maybe there'd be a way to provide partial reading support for a subset of columns of the parquet file then if we store things appropriately.

I'd probably suggest a solution that involves lamindb?

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants