
An implementation of the EMNLP 2019 paper "Heterogeneous Graph Attention Networks for Semi-supervised Short Text Classification" and its extension "HGAT: Heterogeneous Graph Attention Networks for Semi-supervised Short Text Classification" (TOIS 2021).

Thank you for your interest in our work! 😄

Requirements

  • Anaconda3 (python 3.6)
  • PyTorch 1.3.1
  • gensim 3.6.0

Easy Run

cd ./model/code/
python train.py

You can change the dataset by modifying the variable "dataset = 'example'" at the top of train.py, or by passing command-line arguments (see train.py).

Our datasets can be downloaded from Google Drive. Note: some files were accidentally deleted and have since been restored as closely as possible; we hope they run correctly.

Prepare for your own dataset

The following files are required:

./model/data/YourData/
    ---- YourData.cites                // the adjacencies
    ---- YourData.content.text         // the features of texts
    ---- YourData.content.entity       // the features of entities
    ---- YourData.content.topic        // the features of topics
    ---- train.map                     // the indices of the training nodes
    ---- vali.map                      // the indices of the validation nodes
    ---- test.map                      // the indices of the testing nodes

The formats are as follows:

  • YourData.cites

    Each line contains an edge: "idx1\tidx2\n", e.g. "98 13".

  • YourData.content.text

    Each line contains a node: "idx\t[features]\t[category]\n". Note that [features] is a list of floats delimited by '\t', e.g. "59 1.0 0.5 0.751 0.0 0.659 0.0 computers". For multi-label classification, [category] must be a one-hot vector delimited by spaces, e.g. "59 1.0 0.5 0.751 0.0 0.659 0.0 0 1 1 0 1 0".

  • YourData.content.entity

    Similar to .text; just change [category] to "entity", e.g. "13 0.0 0.0 1.0 0.0 0.0 entity".

  • YourData.content.topic

    Similar to .text; just change [category] to "topic", e.g. "64 0.10 1.21 8.09 0.10 topic".

  • *.map

    Each line contains an index: "idx\n", e.g. "98".

You can see the example in ./model/data/example/*
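For reference, here is a minimal sketch (not part of this repository) of how you might dump your own data into these formats. All node features, labels, edges, and split indices below are made-up placeholders; the .content.topic file (omitted here) follows the same pattern with the category "topic".

```python
import os

data_dir = './model/data/YourData/'
os.makedirs(data_dir, exist_ok=True)

# Toy text nodes: idx -> (feature vector, category label) -- placeholders only.
text_nodes = {0: ([1.0, 0.5, 0.0], 'computers'),
              1: ([0.0, 0.2, 0.8], 'sports')}
# Toy entity nodes: idx -> feature vector (the category is always "entity").
entity_nodes = {2: [0.0, 1.0, 0.0]}
# Toy edges between node indices.
edges = [(0, 2), (1, 2)]

# YourData.content.text: "idx\t[features]\t[category]\n"
with open(os.path.join(data_dir, 'YourData.content.text'), 'w') as f:
    for idx, (feats, label) in text_nodes.items():
        f.write('{}\t{}\t{}\n'.format(idx, '\t'.join(map(str, feats)), label))

# YourData.content.entity: same layout, with the category fixed to "entity"
with open(os.path.join(data_dir, 'YourData.content.entity'), 'w') as f:
    for idx, feats in entity_nodes.items():
        f.write('{}\t{}\tentity\n'.format(idx, '\t'.join(map(str, feats))))

# YourData.cites: one edge per line, "idx1\tidx2\n"
with open(os.path.join(data_dir, 'YourData.cites'), 'w') as f:
    for src, dst in edges:
        f.write('{}\t{}\n'.format(src, dst))

# train.map / vali.map / test.map: one node index per line
for name, split in [('train', [0]), ('vali', [1]), ('test', [1])]:
    with open(os.path.join(data_dir, '{}.map'.format(name)), 'w') as f:
        f.writelines('{}\n'.format(i) for i in split)
```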


A simple data preprocessing script is provided. Running it successfully requires a TagMe account token (my personal token is provided in tagme.py, but it may become invalid in the future), Wikipedia entity descriptions, and a word2vec model containing entity embeddings. You can prepare these yourself or download our files from Google Drive and unzip them into ./data/ .

Then, you should prepare a data file like ./data/example/example.txt, whose format is: "[idx]\t[category]\t[content]\n".
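As a small illustration (the documents and path below are invented placeholders, not files shipped with the repository), generating such a raw file could look like this:

```python
import os

# Made-up documents: (index, category, text content)
docs = [(0, 'computers', 'how to speed up my laptop'),
        (1, 'sports', 'latest football transfer news')]

os.makedirs('./data/YourData/', exist_ok=True)
# Write one document per line: "[idx]\t[category]\t[content]\n"
with open('./data/YourData/YourData.txt', 'w') as f:
    for idx, category, content in docs:
        f.write('{}\t{}\t{}\n'.format(idx, category, content))
```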

Finally, modify the variable "dataset = 'example'" at the top of each of the following scripts and run:

python tagMe.py
python build_network.py
python build_features.py
python build_data.py

Use HGAT as GNN

If you just want to use the HGAT model as a graph neural network, prepare the following files in the format described above:

./model/data/YourData/
    ---- YourData.cites                // the adjacencies
    ---- YourData.content.*            // the features of *, namely node_type1, node_type2, ...
    ---- train.map                     // the indices of the training nodes
    ---- vali.map                      // the indices of the validation nodes
    ---- test.map                      // the indices of the testing nodes

Then modify "load_data()" in ./model/code/utils.py:

type_list = [node_type1, node_type2, ...]
type_have_label = node_type

See the codes for more details.
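For instance, for the example dataset (text, entity, and topic nodes, with labels only on the text nodes), the assignment would look roughly like the sketch below, assuming the node types are referred to by the same strings used in the .content.* file suffixes; check load_data() in utils.py for the exact variable handling.

```python
# Sketch of the edit inside load_data() in ./model/code/utils.py
# (assumes node types are named after the .content.* suffixes):
type_list = ['text', 'entity', 'topic']   # one entry per .content.* file
type_have_label = 'text'                  # the node type that carries labels
```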

Citation

If you make use of the HGAT model in your research, please cite the following in your manuscript:

@article{yang2021hgat,
  author = {Yang, Tianchi and Hu, Linmei and Shi, Chuan and Ji, Houye and Li, Xiaoli and Nie, Liqiang},
  title = {HGAT: Heterogeneous Graph Attention Networks for Semi-Supervised Short Text Classification},
  year = {2021},
  publisher = {Association for Computing Machinery},
  volume = {39},
  number = {3},
  doi = {10.1145/3450352},
  journal = {ACM Transactions on Information Systems},
  month = may,
  articleno = {32},
  numpages = {29},
}

@inproceedings{linmei2019heterogeneous,
  title={Heterogeneous graph attention networks for semi-supervised short text classification},
  author={Linmei, Hu and Yang, Tianchi and Shi, Chuan and Ji, Houye and Li, Xiaoli},
  booktitle={Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)},
  pages={4823--4832},
  year={2019}
}