
February 11, 2021

Tiffany J. Callahan edited this page Oct 30, 2023 · 28 revisions

PKT Human Disease Knowledge Graph Benchmark Builds (v2.0.0)

Build Date: February 11, 2021

Resources:


KG Benchmark Builds can also be obtained from Zenodo:


🗂 For additional information on the KG file types please see the following Wiki page


Required Input Documents


Data


Logs: pkt_builder_phases12_log.log

Metadata
Original Data: downloaded_build_metadata.txt
Processed Data: preprocessed_build_metadata.txt, ontology_cleaning_report.txt
Files (Original Data listed first, followed by Processed Data)
9606.protein.links.v11.0.txt
COMBINED.DEFAULT_NETWORKS.BP_COMBINING.txt
CTD_chem_gene_ixns.tsv
CTD_chem_go_enriched.tsv
CTD_chemicals_diseases.tsv
CTD_genes_pathways.tsv
ChEBI2Reactome_All_Levels.txt
GTEx_Analysis_2017-06-05_v8_RNASeQCv1.1.9_gene_median_tpm.gct
Homo_sapiens.GRCh38.102.entrez.tsv
Homo_sapiens.GRCh38.102.gtf
Homo_sapiens.GRCh38.102.uniprot.tsv
Homo_sapiens.gene_info
ReactomePathways.txt
UniProt2Reactome_All_Levels.txt
chebi_with_imports.owl
clo_with_imports.owl
compath_canonical_pathway_mappings.txt
curated_gene_disease_associations.tsv
disease_mappings.tsv
ext_with_imports.owl
gene_association.reactome
genomic_sequence_ontology_mappings.xlsx
genomic_typing_dict.pkl
go_with_imports.owl
goa_human.gaf
hgnc_complete_set.txt
hp_with_imports.owl
human_pro_classes.html
kegg_reactome.csv
mesh2021.nt
mondo_with_imports.owl
names.tsv
phenotype.hpoa
pr_with_imports.owl
promapping.txt
proteinatlas_search.tsv
pw_with_imports.owl
ro_with_imports.owl
so_with_imports.owl
uniprot-cofactor-catalyst.tab
uniprot_identifier_mapping.tab
variant_summary.txt
vo_with_imports.owl
zooma_tissue_cell_mapping_04JAN2020.xlsx
Processed Data files:
CLINVAR_VARIANT_GENE_DISEASE_PHENOTYPE_EDGES.txt
DISEASE_MONDO_MAP.txt
ENSEMBL_GENE_ENTREZ_GENE_MAP.txt
ENSEMBL_TRANSCRIPT_PROTEIN_ONTOLOGY_MAP.txt
ENTREZ_GENE_ENSEMBL_TRANSCRIPT_MAP.txt
ENTREZ_GENE_PRO_ONTOLOGY_MAP.txt
GENE_SYMBOL_ENSEMBL_TRANSCRIPT_MAP.txt
HPA_GTEX_RNA_GENE_PROTEIN_EDGES.txt
HPA_GTEx_TISSUE_CELL_MAP.txt
HPA_tissues.txt
INVERSE_RELATIONS.txt
MESH_CHEBI_MAP.txt
Merged_gene_rna_protein_identifiers.pkl
PHENOTYPE_HPO_MAP.txt
PheKnowLator_MergedOntologies.owl
REACTOME_PW_GO_MAPPINGS.txt
RELATIONS_LABELS.txt
SO_GENE_TRANSCRIPT_VARIANT_TYPE_MAPPING.txt
STRING_PRO_ONTOLOGY_MAP.txt
UNIPROT_ACCESSION_PRO_ONTOLOGY_MAP.txt
UNIPROT_PROTEIN_CATALYST.txt
UNIPROT_PROTEIN_COFACTOR.txt
chebi_with_imports.owl
clo_with_imports.owl
ensembl_identifier_data_cleaned.txt
ext_with_imports.owl
go_with_imports.owl
hp_with_imports.owl
human_pro.owl
mondo_with_imports.owl
node_metadata_dict.pkl
pr_with_imports.owl
pw_with_imports.owl
ro_with_imports.owl
so_with_imports.owl
subclass_construction_map.pkl
vo_with_imports.owl



Knowledge Graphs



Instance-based Build
Columns, in order: Standard Relations OWL, Standard Relations OWL-NETS, Inverse Relations OWL, Inverse Relations OWL-NETS
Logs (one copy per column)
pkt_build_log.log
subclass_map_log.json
Metadata (one copy per column)
edge_source_metadata.txt
ontology_source_metadata.txt
Data Files
Standard Relations, OWL:
Master_Edge_List_Dict.json
PheKnowLator_MergedOntologies.owl
PheKnowLator_v2.0.0_full_instance_relationsOnly_OWL.nt
PheKnowLator_v2.0.0_full_instance_relationsOnly_OWL_AnnotationsOnly.nt
PheKnowLator_v2.0.0_full_instance_relationsOnly_OWL_LogicOnly.nt
PheKnowLator_v2.0.0_full_instance_relationsOnly_OWL_NetworkxMultiDiGraph.gpickle
PheKnowLator_v2.0.0_full_instance_relationsOnly_OWL_NodeLabels.txt
PheKnowLator_v2.0.0_full_instance_relationsOnly_OWL_Triples_Identifiers.txt
PheKnowLator_v2.0.0_full_instance_relationsOnly_OWL_Triples_Integer_Identifier_Map.json
PheKnowLator_v2.0.0_full_instance_relationsOnly_OWL_Triples_Integers.txt
node_metadata_dict.pkl
Standard Relations, OWL-NETS:
Master_Edge_List_Dict.json
PheKnowLator_MergedOntologies.owl

PheKnowLator_v2.0.0_full_instance_relationsOnly_OWLNETS.nt
PheKnowLator_v2.0.0_full_instance_relationsOnly_OWLNETS_NetworkxMultiDiGraph.gpickle
PheKnowLator_v2.0.0_full_instance_relationsOnly_OWLNETS_NodeLabels.txt
PheKnowLator_v2.0.0_full_instance_relationsOnly_OWLNETS_Triples_Identifiers.txt
PheKnowLator_v2.0.0_full_instance_relationsOnly_OWLNETS_Triples_Integer_Identifier_Map.json
PheKnowLator_v2.0.0_full_instance_relationsOnly_OWLNETS_Triples_Integers.txt
PheKnowLator_v2.0.0_full_instance_relationsOnly_OWLNETS_decoding_dict.pkl

PheKnowLator_v2.0.0_full_instance_relationsOnly_OWLNETS_INSTANCE_purified.nt
PheKnowLator_v2.0.0_full_instance_relationsOnly_OWLNETS_INSTANCE_purified_NetworkxMultiDiGraph.gpickle
PheKnowLator_v2.0.0_full_instance_relationsOnly_OWLNETS_INSTANCE_purified_NodeLabels.txt
PheKnowLator_v2.0.0_full_instance_relationsOnly_OWLNETS_INSTANCE_purified_Triples_Identifiers.txt
PheKnowLator_v2.0.0_full_instance_relationsOnly_OWLNETS_INSTANCE_purified_Triples_Integer_Identifier_Map.json
PheKnowLator_v2.0.0_full_instance_relationsOnly_OWLNETS_INSTANCE_purified_Triples_Integers.txt
PheKnowLator_v2.0.0_full_instance_relationsOnly_OWLNETS_INSTANCE_purified_decoding_dict.pkl

node_metadata_dict.pkl
Inverse Relations, OWL:
Master_Edge_List_Dict.json
PheKnowLator_MergedOntologies.owl
PheKnowLator_v2.0.0_full_instance_inverseRelations_OWL.nt
PheKnowLator_v2.0.0_full_instance_inverseRelations_OWL_AnnotationsOnly.nt
PheKnowLator_v2.0.0_full_instance_inverseRelations_OWL_LogicOnly.nt
PheKnowLator_v2.0.0_full_instance_inverseRelations_OWL_NetworkxMultiDiGraph.gpickle
PheKnowLator_v2.0.0_full_instance_inverseRelations_OWL_NodeLabels.txt
PheKnowLator_v2.0.0_full_instance_inverseRelations_OWL_Triples_Identifiers.txt
PheKnowLator_v2.0.0_full_instance_inverseRelations_OWL_Triples_Integer_Identifier_Map.json
PheKnowLator_v2.0.0_full_instance_inverseRelations_OWL_Triples_Integers.txt
node_metadata_dict.pkl
Inverse Relations, OWL-NETS:
Master_Edge_List_Dict.json
PheKnowLator_MergedOntologies.owl

PheKnowLator_v2.0.0_full_instance_inverseRelations_OWLNETS.nt
PheKnowLator_v2.0.0_full_instance_inverseRelations_OWLNETS_NetworkxMultiDiGraph.gpickle
PheKnowLator_v2.0.0_full_instance_inverseRelations_OWLNETS_NodeLabels.txt
PheKnowLator_v2.0.0_full_instance_inverseRelations_OWLNETS_Triples_Identifiers.txt
PheKnowLator_v2.0.0_full_instance_inverseRelations_OWLNETS_Triples_Integer_Identifier_Map.json
PheKnowLator_v2.0.0_full_instance_inverseRelations_OWLNETS_Triples_Integers.txt
PheKnowLator_v2.0.0_full_instance_inverseRelations_OWLNETS_decoding_dict.pkl

PheKnowLator_v2.0.0_full_instance_inverseRelations_OWLNETS_INSTANCE_purified.nt
PheKnowLator_v2.0.0_full_instance_inverseRelations_OWLNETS_INSTANCE_purified_NetworkxMultiDiGraph.gpickle
PheKnowLator_v2.0.0_full_instance_inverseRelations_OWLNETS_INSTANCE_purified_NodeLabels.txt
PheKnowLator_v2.0.0_full_instance_inverseRelations_OWLNETS_INSTANCE_purified_Triples_Identifiers.txt
PheKnowLator_v2.0.0_full_instance_inverseRelations_OWLNETS_INSTANCE_purified_Triples_Integer_Identifier_Map.json
PheKnowLator_v2.0.0_full_instance_inverseRelations_OWLNETS_INSTANCE_purified_Triples_Integers.txt
PheKnowLator_v2.0.0_full_instance_inverseRelations_OWLNETS_INSTANCE_purified_decoding_dict.pkl

node_metadata_dict.pkl
Class-based Build
Columns, in order: Standard Relations OWL, Standard Relations OWL-NETS, Inverse Relations OWL, Inverse Relations OWL-NETS
Logs (one copy per column)
pkt_build_log.log
subclass_map_log.json
Metadata (one copy per column)
edge_source_metadata.txt
ontology_source_metadata.txt
Data Files
Standard Relations, OWL:
Master_Edge_List_Dict.json
PheKnowLator_MergedOntologies.owl
PheKnowLator_v2.0.0_full_subclass_relationsOnly_OWL.nt
PheKnowLator_v2.0.0_full_subclass_relationsOnly_OWL_AnnotationsOnly.nt
PheKnowLator_v2.0.0_full_subclass_relationsOnly_OWL_LogicOnly.nt
PheKnowLator_v2.0.0_full_subclass_relationsOnly_OWL_NetworkxMultiDiGraph.gpickle
PheKnowLator_v2.0.0_full_subclass_relationsOnly_OWL_NodeLabels.txt
PheKnowLator_v2.0.0_full_subclass_relationsOnly_OWL_Triples_Identifiers.txt
PheKnowLator_v2.0.0_full_subclass_relationsOnly_OWL_Triples_Integer_Identifier_Map.json
PheKnowLator_v2.0.0_full_subclass_relationsOnly_OWL_Triples_Integers.txt
node_metadata_dict.pkl
Standard Relations, OWL-NETS:
Master_Edge_List_Dict.json
PheKnowLator_MergedOntologies.owl

PheKnowLator_v2.0.0_full_subclass_relationsOnly_OWLNETS.nt
PheKnowLator_v2.0.0_full_subclass_relationsOnly_OWLNETS_NetworkxMultiDiGraph.gpickle
PheKnowLator_v2.0.0_full_subclass_relationsOnly_OWLNETS_NodeLabels.txt
PheKnowLator_v2.0.0_full_subclass_relationsOnly_OWLNETS_Triples_Identifiers.txt
PheKnowLator_v2.0.0_full_subclass_relationsOnly_OWLNETS_Triples_Integer_Identifier_Map.json
PheKnowLator_v2.0.0_full_subclass_relationsOnly_OWLNETS_Triples_Integers.txt
PheKnowLator_v2.0.0_full_subclass_relationsOnly_OWLNETS_decoding_dict.pkl

PheKnowLator_v2.0.0_full_subclass_relationsOnly_OWLNETS_SUBCLASS_purified.nt
PheKnowLator_v2.0.0_full_subclass_relationsOnly_OWLNETS_SUBCLASS_purified_NetworkxMultiDiGraph.gpickle
PheKnowLator_v2.0.0_full_subclass_relationsOnly_OWLNETS_SUBCLASS_purified_NodeLabels.txt
PheKnowLator_v2.0.0_full_subclass_relationsOnly_OWLNETS_SUBCLASS_purified_Triples_Identifiers.txt
PheKnowLator_v2.0.0_full_subclass_relationsOnly_OWLNETS_SUBCLASS_purified_Triples_Integer_Identifier_Map.json
PheKnowLator_v2.0.0_full_subclass_relationsOnly_OWLNETS_SUBCLASS_purified_Triples_Integers.txt
PheKnowLator_v2.0.0_full_subclass_relationsOnly_OWLNETS_SUBCLASS_purified_decoding_dict.pkl

node_metadata_dict.pkl
Inverse Relations, OWL:
Master_Edge_List_Dict.json
PheKnowLator_MergedOntologies.owl
PheKnowLator_v2.0.0_full_subclass_inverseRelations_OWL.nt
PheKnowLator_v2.0.0_full_subclass_inverseRelations_OWL_AnnotationsOnly.nt
PheKnowLator_v2.0.0_full_subclass_inverseRelations_OWL_LogicOnly.nt
PheKnowLator_v2.0.0_full_subclass_inverseRelations_OWL_NetworkxMultiDiGraph.gpickle
PheKnowLator_v2.0.0_full_subclass_inverseRelations_OWL_NodeLabels.txt
PheKnowLator_v2.0.0_full_subclass_inverseRelations_OWL_Triples_Identifiers.txt
PheKnowLator_v2.0.0_full_subclass_inverseRelations_OWL_Triples_Integer_Identifier_Map.json
PheKnowLator_v2.0.0_full_subclass_inverseRelations_OWL_Triples_Integers.txt
node_metadata_dict.pkl
Inverse Relations, OWL-NETS:
Master_Edge_List_Dict.json
PheKnowLator_MergedOntologies.owl

PheKnowLator_v2.0.0_full_subclass_inverseRelations_OWLNETS.nt
PheKnowLator_v2.0.0_full_subclass_inverseRelations_OWLNETS_NetworkxMultiDiGraph.gpickle
PheKnowLator_v2.0.0_full_subclass_inverseRelations_OWLNETS_NodeLabels.txt
PheKnowLator_v2.0.0_full_subclass_inverseRelations_OWLNETS_Triples_Identifiers.txt
PheKnowLator_v2.0.0_full_subclass_inverseRelations_OWLNETS_Triples_Integer_Identifier_Map.json
PheKnowLator_v2.0.0_full_subclass_inverseRelations_OWLNETS_Triples_Integers.txt
PheKnowLator_v2.0.0_full_subclass_inverseRelations_OWLNETS_decoding_dict.pkl

PheKnowLator_v2.0.0_full_subclass_inverseRelations_OWLNETS_SUBCLASS_purified.nt
PheKnowLator_v2.0.0_full_subclass_inverseRelations_OWLNETS_SUBCLASS_purified_NetworkxMultiDiGraph.gpickle
PheKnowLator_v2.0.0_full_subclass_inverseRelations_OWLNETS_SUBCLASS_purified_NodeLabels.txt
PheKnowLator_v2.0.0_full_subclass_inverseRelations_OWLNETS_SUBCLASS_purified_Triples_Identifiers.txt
PheKnowLator_v2.0.0_full_subclass_inverseRelations_OWLNETS_SUBCLASS_purified_Triples_Integer_Identifier_Map.json
PheKnowLator_v2.0.0_full_subclass_inverseRelations_OWLNETS_SUBCLASS_purified_Triples_Integers.txt
PheKnowLator_v2.0.0_full_subclass_inverseRelations_OWLNETS_SUBCLASS_purified_decoding_dict.pkl

node_metadata_dict.pkl

Important Build Updates

We provide several types of output, each described briefly below. To create the logic (XXXX_OWL_LogicOnly.nt) and annotation (XXXX_OWL_AnnotationsOnly.nt) subsets of each graph, and to be able to recombine them (XXXX_OWL.nt), we added a namespace to all BNodes (anonymous nodes). You will find two kinds of pkt namespaces within these files:

  1. https://github.com/callahantiff/PheKnowLator/pkt/. This namespace is used for all owl:Class and owl:NamedIndividual objects that are defined from non-ontology data and added in order to integrate non-ontological entities (see here for more information).
  2. https://github.com/callahantiff/PheKnowLator/pkt/bnode/. This namespace is used for all existing BNodes (anonymous nodes) and is applied to these entities before an input graph is subset.
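
Because the second namespace extends the first, a prefix check has to test the longer bnode prefix before the shorter one. The following is a minimal sketch of that check using only the two namespace strings listed above; the helper name `classify_pkt_uri` and its return labels are hypothetical and are not part of the PheKnowLator code base.

```python
# Namespace strings copied from the list above
PKT_NS = 'https://github.com/callahantiff/PheKnowLator/pkt/'
PKT_BNODE_NS = PKT_NS + 'bnode/'


def classify_pkt_uri(uri: str) -> str:
    """Label a URI as 'pkt_bnode', 'pkt_entity', or 'other' (hypothetical helper)."""
    # drop surrounding whitespace and the angle brackets used in N-Triples files
    uri = uri.strip().strip('<>')
    # test the longer bnode prefix first, since it also starts with PKT_NS
    if uri.startswith(PKT_BNODE_NS):
        return 'pkt_bnode'
    if uri.startswith(PKT_NS):
        return 'pkt_entity'
    return 'other'
```

For instance, any URI beginning with https://github.com/callahantiff/PheKnowLator/pkt/bnode/ would be labelled 'pkt_bnode', while an OBO URI such as http://purl.obolibrary.org/obo/UBERON_0000468 would be labelled 'other'.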

To remove the second type of namespace from the BNodes that are part of the original ontologies used in each build, you can run the code shown below:

from pkt.utils import removes_namespace_from_bnodes

# remove bnode namespaces
updated_graph = removes_namespace_from_bnodes(org_graph) 

Please also note that for all builds prior to v3.0.2, 2,008 nodes in the NodeLabels.txt files contain foreign characters. Code is now in place to prevent this error in future builds, and the bad_node_patch.json file provides a fix for prior builds. The file contains a dictionary keyed by entity_uri; each value is an inner dictionary whose keys are label and description/definition and whose values are the updated strings without foreign characters. An example of this dictionary is shown below:

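import json

# load the patch file (assumes bad_node_patch.json has been downloaded to the working directory)
with open('bad_node_patch.json') as f:
    bad_node_patch = json.load(f)
key = '<http://purl.obolibrary.org/obo/UBERON_0000468>'
print(bad_node_patch[key])
# {'label': 'multicellular organism', 'description/definition': 'Anatomical structure that is an individual member of a species and consists of more than one cell.'}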

The code to identify the nodes with erroneous foreign characters is shown below:

import re
import pandas as pd

# path to the downloaded `NodeLabels.txt` file
input_file = 'NodeLabels.txt'

# load data as Pandas DataFrame
nodedf = pd.read_csv(input_file, sep='\t', header=0)

# identify bad nodes and filter DataFrame so it only contains these rows
nodedf['bad'] = nodedf['label'].apply(lambda x: re.search("[\u4e00-\u9FFF]", x) if not pd.isna(x) else None)
nodedf_bad_nodes = nodedf[~pd.isna(nodedf['bad'])].drop_duplicates()
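
Once the affected rows are identified, the patch dictionary can be applied to them. The sketch below is one possible way to do this, not the project's own method: it assumes each row is a plain dictionary with an `entity_uri` key whose value matches the patch keys (angle brackets included), and the helper name `apply_bad_node_patch` and the toy rows are hypothetical.

```python
from typing import Dict, List


def apply_bad_node_patch(rows: List[dict], patch: Dict[str, dict]) -> List[dict]:
    """Return copies of rows with patched strings (hypothetical helper)."""
    patched = []
    for row in rows:
        fixed = dict(row)  # copy so the input rows are left untouched
        # overwrite only the fields that the patch supplies for this entity
        fixed.update(patch.get(row['entity_uri'], {}))
        patched.append(fixed)
    return patched


# toy data standing in for NodeLabels.txt rows and bad_node_patch.json
rows = [
    {'entity_uri': '<http://purl.obolibrary.org/obo/UBERON_0000468>',
     'label': 'placeholder \u751f\u7269',
     'description/definition': 'placeholder'},
    {'entity_uri': '<http://example.org/untouched>',
     'label': 'ok',
     'description/definition': 'ok'},
]
patch = {
    '<http://purl.obolibrary.org/obo/UBERON_0000468>': {
        'label': 'multicellular organism',
        'description/definition': 'Anatomical structure that is an individual '
                                  'member of a species and consists of more than one cell.',
    }
}
fixed_rows = apply_bad_node_patch(rows, patch)
```

With pandas, rows in this shape can be obtained from `nodedf.to_dict('records')`, provided the DataFrame has a column holding the entity URI under that name.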



