February 11, 2021
Resources:
- Associated GitHub Release: https://github.com/callahantiff/PheKnowLator/releases/tag/v2.0.0
- PyPI Release: https://pypi.org/manage/project/pkt-kg/release/2.0.1/
- DockerHub Build: https://hub.docker.com/repository/docker/callahantiff/pheknowlator
KG Benchmark Builds can also be obtained from Zenodo:
- Class Builds
  - Standard Relations
  - Inverse Relations
- Instance Builds
  - Standard Relations
  - Inverse Relations
🗂 For additional information on the KG file types, please see the following Wiki page.
Logs: `pkt_builder_phases12_log.log`
We provide several different types of output, each of which is described briefly below. Please note that in order to create the logic (`XXXX_OWL_LogicOnly.nt`) and annotation (`XXXX_OWL_AnnotationsOnly.nt`) subsets of each graph and be able to combine them (`XXXX_OWL.nt`), we have added a namespace to all `BNode` or anonymous nodes. More specifically, there are two kinds of `pkt` namespaces you will find within these files:
- `https://github.com/callahantiff/PheKnowLator/pkt/`. This namespace is used for all non-ontology data defined as `owl:Class` and `owl:NamedIndividual` objects that are added in order to integrate non-ontological entities (see here for more information).
- `https://github.com/callahantiff/PheKnowLator/pkt/bnode/`. This namespace is used for all existing `BNode` or anonymous nodes and is applied to these entities prior to subsetting an input graph.
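To make the distinction concrete, the sketch below counts the subjects that fall under each namespace in a downloaded build. It assumes the build is an N-Triples file readable with rdflib; the filename is illustrative.

```python
import rdflib

# parse one of the combined build files (illustrative filename)
graph = rdflib.Graph().parse('PheKnowLator_full_instance_inverseRelations_OWL.nt', format='nt')

# the two pkt namespaces described above
pkt_ns = 'https://github.com/callahantiff/PheKnowLator/pkt/'
pkt_bnode_ns = 'https://github.com/callahantiff/PheKnowLator/pkt/bnode/'

# collect subjects that fall under each namespace
bnode_ns_subjects = {s for s in graph.subjects() if str(s).startswith(pkt_bnode_ns)}
pkt_only_subjects = {s for s in graph.subjects()
                     if str(s).startswith(pkt_ns) and not str(s).startswith(pkt_bnode_ns)}
print(len(pkt_only_subjects), len(bnode_ns_subjects))
```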
To remove the second type of namespacing from `BNode`s that are part of the original ontologies used in each build, you can run the code shown below:
```python
from pkt.utils import removes_namespace_from_bnodes

# org_graph is an rdflib Graph parsed from a downloaded build file
# remove the pkt bnode namespaces from its anonymous nodes
updated_graph = removes_namespace_from_bnodes(org_graph)
```
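For context, an end-to-end version of the snippet above might look like the following sketch, assuming the downloaded build is an N-Triples file parsed with rdflib (the filenames are illustrative):

```python
import rdflib
from pkt.utils import removes_namespace_from_bnodes

# parse a downloaded build file into an rdflib Graph (illustrative filename)
org_graph = rdflib.Graph().parse('PheKnowLator_full_class_inverseRelations_OWL.nt', format='nt')

# strip the pkt bnode namespace so the original anonymous nodes are restored
updated_graph = removes_namespace_from_bnodes(org_graph)

# write the updated graph back out as N-Triples (illustrative filename)
updated_graph.serialize(destination='PheKnowLator_updated.nt', format='nt')
```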
Please also note that for all builds prior to `v3.0.2`, there are 2,008 nodes in the `NodeLabels.txt` files that contain foreign characters. While there is now code in place to prevent this error from happening in the future, there is also a solution to account for the prior builds. The `bad_node_patch.json` file contains a dictionary where the outer keys are the `entity_uri` and the outer values are another dictionary whose inner keys are `label` and `description/definition` and whose inner values are the updated strings without foreign characters. An example of this dictionary is shown below:
```python
key = '<http://purl.obolibrary.org/obo/UBERON_0000468>'
print(bad_node_patch[key])
>>> {'label': 'multicellular organism', 'description/definition': 'Anatomical structure that is an individual member of a species and consists of more than one cell.'}
```
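The example above assumes the patch dictionary has already been loaded into `bad_node_patch`; a minimal sketch of doing so, assuming `bad_node_patch.json` has been downloaded locally (the path is illustrative), is:

```python
import json

# load the patch dictionary from the downloaded file
with open('bad_node_patch.json') as f:
    bad_node_patch = json.load(f)
```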
The code to identify the nodes with erroneous foreign characters is shown below:
```python
import re
import pandas as pd

# path to the downloaded `NodeLabels.txt` file
input_file = 'NodeLabels.txt'

# load the data as a Pandas DataFrame
nodedf = pd.read_csv(input_file, sep='\t', header=0)

# flag labels containing foreign (CJK) characters and filter the DataFrame so it only contains these rows
nodedf['bad'] = nodedf['label'].apply(lambda x: re.search("[\u4e00-\u9FFF]", x) if not pd.isna(x) else None)
nodedf_bad_nodes = nodedf[~pd.isna(nodedf['bad'])].drop_duplicates()
```
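Finally, one way to use `bad_node_patch` together with the filtered DataFrame above is sketched below. Note that the `entity_uri` and `description/definition` column names are assumptions, as is the idea that the identifiers are stored with the surrounding angle brackets used in the patch keys; check the header and contents of your `NodeLabels.txt` file before running.

```python
# apply the corrected strings to the rows flagged above
# ('entity_uri' and 'description/definition' column names are assumptions)
for idx, row in nodedf_bad_nodes.iterrows():
    patch = bad_node_patch.get(row['entity_uri'])
    if patch is not None:
        nodedf.loc[idx, 'label'] = patch['label']
        nodedf.loc[idx, 'description/definition'] = patch['description/definition']

# drop the helper column and write the repaired file back out (illustrative filename)
nodedf.drop(columns=['bad']).to_csv('NodeLabels_patched.txt', sep='\t', index=False)
```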