Request to Add 'Normal' Record to bt.Disease #98

Open
Kang-chen opened this issue Aug 20, 2024 · 8 comments

@Kang-chen

Incredible work on this project!

I would like to kindly request the addition of a "normal" record to bt.Disease.
Currently, this record appears to be missing, which could lead to inconsistencies when integrating with other datasets. However, the laminlabs/cellxgene repository does include a "normal" record.
image

This addition could help streamline data integration and ensure that users can effectively merge and analyze datasets without encountering discrepancies.

Thank you for considering this enhancement!

@Zethson Zethson self-assigned this Aug 25, 2024
@Zethson Zethson transferred this issue from laminlabs/cellxgene-lamin Aug 25, 2024
@Zethson
Member

Zethson commented Aug 25, 2024

Dear @Kang-chen

I'm happy that you find our packages useful! I've transferred this to the bionty repository where I think this fits a bit better.

  1. We'll discuss internally whether we want to support default or control values for ontologies out of the box. I find them to be very useful but at the same time we don't want to modify the ontologies for no reason.
  2. You can easily create your own bt.Disease record, with control or normal as the value of the name field, and save it to your database. normal = bt.Disease(name="normal").save() should do the trick.
  3. In this case, we used the https://www.ebi.ac.uk/ols4/ontologies/efo/classes/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FPATO_0000461?lang=en value to seed the information of the Disease record which you have seen above. It is the closest information to a normal Disease record that we could find. However, it's not perfect either. I do see how, for integration purposes, it would be best to have only a single normal or control record which is why I will bring this up for an internal discussion.

Please keep the feedback coming! I hope this is helpful. If not, please don't be afraid to get back to us.

@ck-HN

ck-HN commented Aug 29, 2024

Thank you for the detailed explanation! I appreciate the internal discussion regarding the inclusion of a "normal" record.

I have a follow-up question regarding the precise retrieval of synonyms. Specifically, when I execute a query like bt.Disease.search("normal", field="synonyms", limit=5).df(), the first entry is not the one where I manually added "normal" as a synonym.
image

Could you please clarify how to perform a more precise search for synonyms to ensure that the most relevant records, especially those where "normal" is explicitly listed as a synonym, appear at the top of the results?

Thanks again for your assistance!

@Zethson
Member

Zethson commented Aug 30, 2024

Dear @ck-HN

The issue here is that our search also looks for the query in the description. For the disease cutaneous fibrous histiocytoma, the description contains "morphologic abnormality" twice, and "abnormality" contains the query "normal". This is misleading and not what you're looking for.

We have already noticed that our search does not quite perform as well as our search on the user interface and we will take this as a chance to improve it. I am afraid that it might take a bit of time though as we are currently also working on other tasks.

I'll report back!

@falexwolf
Member

I might be missing something: @Zethson, the search restricted to synonyms should not look into description and I'd be surprised if it did.

My guess is that synonyms also contains normal in the record shown first (disease id=1).

@Zethson
Member

Zethson commented Aug 30, 2024

@falexwolf I had mixed up the column headers 🤦 . Sorry, the hits are indeed in synonyms and not in definition or description. Nevertheless, the synonyms include morphologic abnormality and not normal and therefore it's not as precise as it should be.
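Until the search ranking improves, one possible workaround is to re-rank the hits on the client side so that exact synonyms come first and substring hits such as "abnormality" are dropped. The following is only a minimal sketch in plain Python, not bionty's search implementation. It assumes that synonyms are stored as a single "|"-separated string; the helper names synonym_match_score and rank_by_synonym and the sample records are made up for illustration.

```python
import re


def synonym_match_score(synonyms, query):
    """Return 2 for an exact synonym, 1 for a whole-word hit, 0 otherwise."""
    query = query.strip().lower()
    # Assumption: synonyms are stored as one "|"-separated string.
    entries = [s.strip().lower() for s in (synonyms or "").split("|") if s.strip()]
    if query in entries:
        return 2
    # \b requires a word boundary, so "abnormality" does not match "normal".
    word = re.compile(rf"\b{re.escape(query)}\b")
    if any(word.search(entry) for entry in entries):
        return 1
    return 0


def rank_by_synonym(records, query):
    """Keep only records with a synonym hit, best matches first."""
    scored = [(synonym_match_score(r.get("synonyms"), query), r) for r in records]
    hits = [(score, r) for score, r in scored if score > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)  # stable sort keeps input order on ties
    return [r for _, r in hits]


# Illustrative sample data, not taken from a real instance.
records = [
    {"name": "cutaneous fibrous histiocytoma", "synonyms": "morphologic abnormality"},
    {"name": "record with manual synonym", "synonyms": "normal|control"},
    {"name": "another record", "synonyms": "normal tissue"},
]

ranked = rank_by_synonym(records, "normal")
print([r["name"] for r in ranked])
```

In practice, the records could come from the rows of the DataFrame returned by the search, provided the synonyms column has the format assumed here.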

@falexwolf
Member

That is astonishing. 🤔 abnormality should be ranked lower also in the "simpler" client-side search. Can I reproduce this via lamin load laminlabs/cellxgene and bt.Disease.search("normal", field="synonyms", limit=5).df()?

@ck-HN

ck-HN commented Sep 1, 2024

Hi @Zethson ,

When importing datasets, I often encounter the error: ValueError: cannot assign a synonym that is already associated with a record to a different record. Since I need to ensure that the same synonyms have consistent ontology, I've been using add_synonym.

I understand you have other development plans, but could you suggest a workaround to help me avoid this issue in the meantime? Thanks a lot!

@sunnyosun
Member

Hi @ck-HN,

Using add_synonym is the recommended approach. However, in your case, you seem to be trying to add the same synonym to different records. We don't allow this as it will mess up the synonym standardization.

When you get the error, it prints a list of records that are already associated with this synonym. You need to call remove_synonym on these records before you can associate the synonym with a new record.

Hope this helps!
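The order of operations described above could be sketched roughly as follows. This is a hedged illustration only: ToyRecord is a hypothetical in-memory stand-in so that the snippet runs without a database, and reassign_synonym is an invented helper name. Only the method names add_synonym and remove_synonym come from this thread; with real records you would pass the records listed in the error message.

```python
class ToyRecord:
    """Hypothetical in-memory stand-in for a record; not part of bionty."""

    def __init__(self, name, synonyms=()):
        self.name = name
        self.synonyms = list(synonyms)

    def add_synonym(self, synonym):
        if synonym not in self.synonyms:
            self.synonyms.append(synonym)

    def remove_synonym(self, synonym):
        if synonym in self.synonyms:
            self.synonyms.remove(synonym)


def reassign_synonym(synonym, current_owners, new_owner):
    """Detach the synonym from every record that holds it, then attach it to new_owner."""
    for record in current_owners:
        record.remove_synonym(synonym)
    new_owner.add_synonym(synonym)


old = ToyRecord("record A", ["normal", "control"])
new = ToyRecord("record B")

reassign_synonym("normal", [old], new)
print(old.synonyms, new.synonyms)
```

Removing first and adding second keeps each synonym tied to one record at a time, which is the constraint the error enforces.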


5 participants