Commit

updated readme
wxwilcke committed Apr 19, 2021
1 parent d529645 commit 214b2d3
Showing 1 changed file with 18 additions and 13 deletions.
31 changes: 18 additions & 13 deletions README.md
@@ -1,34 +1,39 @@
# Relational Graph Convolutional Network for Multimodal Knowledge Graphs

PyTorch implementation of a multimodal relational graph convolutional network (Multimodal R-GCN) for heterogeneous data encoded as knowledge graph, as discussed in our paper [End-to-End Entity Classification on Multimodal Knowledge Graphs](https://arxiv.org/abs/2003.12383) (2020).
PyTorch implementation of a multimodal relational graph convolutional network (MR-GCN) for heterogeneous data encoded as knowledge graph, as introduced in our paper [End-to-End Learning on Multimodal Knowledge Graphs](http://www.semantic-web-journal.net/content/end-end-learning-multimodal-knowledge-graphs) (2021).

By directly reading N-Triples, a common serialization format for knowledge graphs, the MR-GCN can perform node classification and link prediction on any arbitrary knowledge graph that makes use of the RDF data model. To facilitate multimodal learning, the MR-GCN supports 33 different datatypes encompassing six different modalities, including images, natural language, and spatial information, all of which are automatically inferred from the datatype annotations in the graph and processed accordingly.
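
As a rough illustration of how a datatype annotation can be read from an N-Triples statement, the sketch below extracts the datatype IRI of a typed literal with a regular expression. It is not the MR-GCN's own parser: the function name `literal_datatype` and the example triple are hypothetical, and the pattern only covers the simple `"value"^^<datatype>` form.

```python
import re

# Matches a typed literal at the end of an N-Triples statement:
#   "lexical value"^^<datatype IRI> .
TYPED_LITERAL = re.compile(r'"((?:[^"\\]|\\.)*)"\^\^<([^>]+)>\s*\.\s*$')


def literal_datatype(line):
    """Return (value, datatype IRI) for a typed literal, else None."""
    match = TYPED_LITERAL.search(line)
    if match is None:
        return None
    return match.group(1), match.group(2)


# Hypothetical example statement with an xsd:date literal as object.
triple = ('<http://example.org/s> <http://example.org/p> '
          '"2021-04-19"^^<http://www.w3.org/2001/XMLSchema#date> .')
print(literal_datatype(triple))
```

A real pipeline would more likely rely on an RDF library for parsing; the regular expression is used here only to keep the example self-contained.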

## Getting Started

To install this implementation, clone the repository and run:
1) To install, clone the repository and run:

```
python setup.py install
```

Once installed, we must first prepare our dataset by running
Once installed, we must first prepare a dataset by calling `mkdataset` with a configuration file `<dataset>.toml` as argument. For the datasets used in our paper, the configuration files are available in the `./if/` directory. To create a configuration file for a different dataset, simply copy and edit `template.toml`. Note that node classification and link prediction require different options.

2) To prepare a dataset, run

```
python mrgcn/mkdataset.py --config ./if/<name>.toml --output ./data/ -vv
python mrgcn/mkdataset.py --config ./if/<dataset>.toml --output ./data/ -vv
```

This will create a tar file (`<NAME[unix_time]>.tar`) with all data necessary to run subsequent experiments. To include all supported modalities in the dataset, ensure that `include` is set to `true` in the configuration file for all modalities (we can include/exclude these during training as long as they are included here). The original graph is now no longer needed. Note that we must here choose between letting literal values with the same value become one node (`separate_literals = false`) or keep them as as many nodes as there are literals (`separate_literals = true`). We thus need to create two dataset variations per graph if we want to train on both.
This will create a tar file (`<DATASET[unix_time]>.tar`) with all data necessary to run subsequent experiments. To include all supported modalities in the dataset, ensure that `include` is set to `true` in the configuration file for all modalities (we can include/exclude these during training as long as they are included here). The original graph is now no longer needed. Note that we must choose here between letting literals with the same value become one node (`separate_literals = false`) or keeping them separate, with as many nodes as there are unique literals (`separate_literals = true`).
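
To make the two settings concrete, the sketch below counts the literal nodes that result from a small list of (entity, literal value) statements. It only illustrates the idea, under the assumption that `separate_literals = true` yields one node per literal occurrence; the function `literal_nodes` and the example data are hypothetical and are not part of the MR-GCN code.

```python
def literal_nodes(statements, separate_literals):
    """Map each (entity, value) statement to a literal node id."""
    node_ids = {}
    assignment = []
    for index, (entity, value) in enumerate(statements):
        # Shared mode keys nodes by value; separate mode keys them by
        # statement position, so equal values never collapse.
        key = (index, value) if separate_literals else value
        node_id = node_ids.setdefault(key, len(node_ids))
        assignment.append((entity, node_id))
    return assignment, len(node_ids)


statements = [("alice", "Amsterdam"), ("bob", "Amsterdam"), ("carol", "Utrecht")]
_, shared = literal_nodes(statements, separate_literals=False)
_, separate = literal_nodes(statements, separate_literals=True)
print(shared, separate)
```

With shared literals, the two entities that have the value "Amsterdam" become connected through a common node; with separate literals they do not.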

Run the Multimodal R-GCN on the prepared dataset by running:
3) Run the MR-GCN on the prepared dataset by running:

```
python mrgcn/run.py --input ./data/<NAME[unix_date]>.tar --config ./if/<name>.toml -vv
python mrgcn/run.py --input ./data/<DATASET[unix_date]>.tar --config ./if/<dataset>.toml -vv
```

This will report the CE loss and accuracy on the validation set. Use the `--test` flag to report that of the test set.
This will report the CE loss and accuracy on the validation set for node classification, and the MRR and hits@k for link prediction. Use the `--test` flag to report that of the test set.
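
For readers unfamiliar with the link prediction metrics, the sketch below computes MRR and hits@k from a list of ranks using their standard definitions. It assumes each rank is the 1-based position of the correct entity among the scored candidates; the function names and the example ranks are hypothetical, and the repository may compute these metrics differently (for example with filtering).

```python
def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all test triples (ranks start at 1)."""
    return sum(1.0 / rank for rank in ranks) / len(ranks)


def hits_at_k(ranks, k):
    """Fraction of test triples whose correct entity ranks in the top k."""
    return sum(1 for rank in ranks if rank <= k) / len(ranks)


# Hypothetical ranks of the correct entity for four test triples.
ranks = [1, 2, 4, 10]
print(mean_reciprocal_rank(ranks))
print(hits_at_k(ranks, k=3))
```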

## Reproduction

To reproduce our classification experiments we need the configuration files for `AIFB`, `MUTAG`, `BGS`, `AM`, `DMG`, and `SYNTH` as available in the `./if/` folder, and the accompanying graphs which are available [here](https://gitlab.com/wxwilcke/mmkg). Use the version of this repository tagged as [v1.0](https://gitlab.com/wxwilcke/mrgcn/-/tags/v1.0).
To reproduce the experiments of our paper, first acquire the datasets from [here](https://gitlab.com/wxwilcke/mmkg), and use the version of this repository tagged as [v2.0](https://gitlab.com/wxwilcke/mrgcn/-/tags/v2.0). Note that there exists a previous iteration of our paper called [End-to-End Entity Classification on Multimodal Knowledge Graphs](https://arxiv.org/abs/2003.12383) (2020) which only considered node classification and which uses [v1.0](https://gitlab.com/wxwilcke/mrgcn/-/tags/v1.0) of this repository.


## Supported data types

@@ -99,16 +104,16 @@ Images:
- kgbench:base64Image (http://kgbench.info/dt)
```

Note that images are expected to be in binary format and included in the graph.
Note that images are expected to be formatted as binary-encoded strings and included in the graph.
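
Judging by the datatype name `kgbench:base64Image`, a natural reading is that the image bytes are base64-encoded into the literal's string value. The sketch below shows that round trip with Python's standard `base64` module. The helper names and the placeholder bytes are hypothetical, and the exact encoding the MR-GCN expects should be checked against the repository.

```python
import base64


def encode_image(raw_bytes):
    """Encode raw image bytes as an ASCII string usable as a literal value."""
    return base64.b64encode(raw_bytes).decode("ascii")


def decode_image(literal_value):
    """Recover the raw image bytes from a base64-encoded literal value."""
    return base64.b64decode(literal_value)


# Placeholder bytes standing in for the contents of an image file.
raw = b"\x89PNG\r\n\x1a\n" + bytes(16)
literal = encode_image(raw)
assert decode_image(literal) == raw
```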

## Cite

While we await our paper to be accepted, please cite us as follows if you use this code in your own research.

```
@article{wilcke2020mrgcn,
title={End-to-End Entity Classification on Multimodal Knowledge Graphs},
author={Wilcke, WX and Bloem, P and de Boer, V and van’t Veer, RH and van Harmelen, FAH},
year={2020}
title={End-to-End Learning on Multimodal Knowledge Graphs},
author={Wilcke, WX and Bloem, P and de Boer, V and van’t Veer, RH},
year={2021}
}
```
