Create a uPheno release product for data analysis #932

Open
matentzn opened this issue Mar 7, 2024 · 2 comments

matentzn commented Mar 7, 2024

@pnrobinson requested a uPheno release product that we should add to the uPheno2 release before the end of March. Given this picture:

image

@pnrobinson, I hope I understood correctly that you want the following table:

| taxon | upheno_id | original_phenotype | gene | human_orthologue |
| --- | --- | --- | --- | --- |
| NCBITaxon:10090 | UPHENO:0034327 | MP:0030719 | ncbi.gene:68646 | hgnc:26404 |

Is this correct?

@pnrobinson

@matentzn this table would be exactly what we need!


matentzn commented Mar 9, 2024

First draft

Code to generate Table

```python
import pandas as pd
from neo4j import GraphDatabase

# Connect to the Neo4j database
bolt_url = "ASK_NICO"
driver = GraphDatabase.driver(bolt_url)

# Define the Cypher query: UPHENO grouping <- species phenotype <- gene - human orthologue
query = """
MATCH
(upheno:`biolink:PhenotypicFeature` WHERE upheno.id STARTS WITH "UPHENO:")<-[:`biolink:subclass_of`]-(phenotype:`biolink:PhenotypicFeature`)<-[gena:`biolink:has_phenotype`]-(gene:`biolink:Gene`)-[:`biolink:orthologous_to`]-(human_gene:`biolink:Gene` WHERE "NCBITaxon:9606" IN human_gene.in_taxon)
RETURN
    upheno.id,
    phenotype.id,
    gene.id,
    gena.negated,
    CASE WHEN gene.in_taxon IS NOT NULL AND size(gene.in_taxon) > 0
         THEN REDUCE(s = "", x IN gene.in_taxon | s + x + CASE WHEN x <> gene.in_taxon[size(gene.in_taxon)-1] THEN "|" ELSE "" END)
         ELSE "" END AS gene_in_taxon,
    human_gene.id,
    gena.primary_knowledge_source,
    gena.publications
"""

# Run the query and collect the result rows as plain lists of values
data = []
with driver.session() as session:
    results = session.run(query)
    for record in results:
        data.append(record.values())
driver.close()

# Column names follow the order of the RETURN clause
df = pd.DataFrame(data, columns=["upheno_grouping", "phenotype", "gene", "negated", "taxon", "human_orthologue", "source", "publications"])
df
```
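
The `CASE ... REDUCE` expression in the query flattens the `gene.in_taxon` list into a single pipe-separated string, returning an empty string when the list is missing or empty. For readers less familiar with Cypher, here is a rough Python equivalent. The helper name `join_taxa` is made up for illustration and is not part of the script above. One small difference: the Cypher version decides whether to add a separator by comparing each value with the last element, so it would drop a separator if an earlier value happened to equal the last one; a plain join does not have that edge case.

```python
def join_taxa(taxa, sep="|"):
    """Join a list of taxon CURIEs into one string.

    Hypothetical helper mirroring the CASE/REDUCE in the Cypher query:
    a missing (None) or empty list yields an empty string.
    """
    if not taxa:
        return ""
    return sep.join(taxa)


# A single taxon comes back unchanged; several are pipe-separated.
print(join_taxa(["NCBITaxon:7955"]))
print(join_taxa(["NCBITaxon:10090", "NCBITaxon:9606"]))
print(repr(join_taxa(None)))
```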

Draft result:

| upheno_grouping | phenotype | gene | negated | taxon | human_orthologue | source | publications |
| --- | --- | --- | --- | --- | --- | --- | --- |
| UPHENO:0000508 | ZP:0000606 | ZFIN:ZDB-GENE-040426-1675 | | NCBITaxon:7955 | HGNC:9721 | infores:zfin | ['ZFIN:ZDB-PUB-170311-8'] |
| UPHENO:0000508 | ZP:0000606 | ZFIN:ZDB-GENE-040426-1675 | | NCBITaxon:7955 | HGNC:30262 | infores:zfin | ['ZFIN:ZDB-PUB-170311-8'] |
| UPHENO:0000508 | WBPhenotype:0000848 | WB:WBGene00044068 | | NCBITaxon:6239 | HGNC:12927 | infores:wormbase | ['PMID:16803962'] |
| UPHENO:0000508 | WBPhenotype:0000848 | WB:WBGene00009178 | | NCBITaxon:6239 | HGNC:15664 | infores:wormbase | ['PMID:22073243'] |
| UPHENO:0000508 | WBPhenotype:0000848 | WB:WBGene00009178 | | NCBITaxon:6239 | HGNC:15663 | infores:wormbase | ['PMID:22073243'] |
| UPHENO:0000508 | WBPhenotype:0000848 | WB:WBGene00000914 | | NCBITaxon:6239 | HGNC:9984 | infores:wormbase | ['PMID:29301909'] |
| UPHENO:0000508 | WBPhenotype:0000848 | WB:WBGene00000914 | | NCBITaxon:6239 | HGNC:9983 | infores:wormbase | ['PMID:29301909'] |
| UPHENO:0000508 | WBPhenotype:0000848 | WB:WBGene00000914 | | NCBITaxon:6239 | HGNC:9982 | infores:wormbase | ['PMID:29301909'] |
| UPHENO:0000508 | WBPhenotype:0000848 | WB:WBGene00022620 | | NCBITaxon:6239 | HGNC:20165 | infores:wormbase | ['PMID:25635455'] |
| UPHENO:0000508 | WBPhenotype:0000848 | WB:WBGene00022620 | | NCBITaxon:6239 | HGNC:17407 | infores:wormbase | ['PMID:25635455'] |
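
In the draft result, a gene with several human orthologues appears once per orthologue. If one row per gene/phenotype pair is easier to analyse, the orthologues could be collapsed with pandas. This is only a sketch: it assumes the table is tab-separated with the column names shown above, and it uses four rows copied from the draft result inline rather than reading the real file.

```python
import io

import pandas as pd

# Four rows copied from the draft result (tab-separated; "negated" is empty).
sample_tsv = (
    "upheno_grouping\tphenotype\tgene\tnegated\ttaxon\thuman_orthologue\tsource\tpublications\n"
    "UPHENO:0000508\tZP:0000606\tZFIN:ZDB-GENE-040426-1675\t\tNCBITaxon:7955\tHGNC:9721\tinfores:zfin\t['ZFIN:ZDB-PUB-170311-8']\n"
    "UPHENO:0000508\tZP:0000606\tZFIN:ZDB-GENE-040426-1675\t\tNCBITaxon:7955\tHGNC:30262\tinfores:zfin\t['ZFIN:ZDB-PUB-170311-8']\n"
    "UPHENO:0000508\tWBPhenotype:0000848\tWB:WBGene00009178\t\tNCBITaxon:6239\tHGNC:15664\tinfores:wormbase\t['PMID:22073243']\n"
    "UPHENO:0000508\tWBPhenotype:0000848\tWB:WBGene00009178\t\tNCBITaxon:6239\tHGNC:15663\tinfores:wormbase\t['PMID:22073243']\n"
)

df = pd.read_csv(io.StringIO(sample_tsv), sep="\t")

# One row per (grouping, phenotype, gene, taxon), with the human
# orthologues joined into a single pipe-separated string.
collapsed = df.groupby(
    ["upheno_grouping", "phenotype", "gene", "taxon"], as_index=False
)["human_orthologue"].agg("|".join)

print(collapsed)
```

To run this against the real table, the `io.StringIO(sample_tsv)` argument would be replaced with the path of a locally downloaded copy of the TSV.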

@pnrobinson if this works for you, you can do a first experiment with this table:

https://www.dropbox.com/scl/fi/zbjt48afy4efkbki8szy5/upheno_gene_human_orthologues.tsv?rlkey=yr0vl7ky3ldeaura8kllagubn&dl=0

@kevinschaper did all the heavy lifting, so THANK YOU!
