HyperBERT: Mixing Hypergraph-Aware Layers with Language Models for Node Classification on Text-Attributed Hypergraphs

Paper

This is the official implementation and dataset of the paper "HyperBERT: Mixing Hypergraph-Aware Layers with Language Models for Node Classification on Text-Attributed Hypergraphs", published in Findings of EMNLP 2024.

Data

All preprocessed data is located in the [data/hyperbert](data/hyperbert) directory. This folder contains PyTorch Geometric (PyG) graph files, stored as pickle files, for the benchmark datasets used in HyperBERT. The available datasets are:

cora_co: The Cora co-citation hypergraph.
dblp_a: The DBLP academic publications hypergraph.
imdb: The IMDB movie hypergraph.
pubmed: The PubMed citation hypergraph.

Each dataset is stored in a file named <dataset_name>_pyg.pkl (for example, imdb_pyg.pkl). These PyG graphs include the following key components:

Node attributes:
- title and abstract for the cora_co, dblp_a, and pubmed datasets.
- title (the movie title) for the imdb dataset, along with additional attributes such as actors, year, and runtime.
hyperedge_index: A sparse representation of the hypergraph incidence matrix.
y: A tensor of node labels.
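
To make the hyperedge_index layout concrete, here is a minimal sketch that expands a sparse index into a dense incidence matrix. It assumes the files follow the usual PyG convention for hypergraphs: a 2 x nnz index whose first row holds node ids and whose second row holds hyperedge ids. The function name incidence_from_hyperedge_index and the toy hypergraph are illustrative only and are not part of this repository.

```python
from typing import List, Sequence


def incidence_from_hyperedge_index(
    hyperedge_index: Sequence[Sequence[int]],
    num_nodes: int,
    num_hyperedges: int,
) -> List[List[int]]:
    """Expand a 2 x nnz index into a dense 0/1 incidence matrix.

    Assumed layout (PyG convention): row 0 = node ids, row 1 = hyperedge ids.
    Entry [v][e] of the result is 1 when node v belongs to hyperedge e.
    """
    node_ids, edge_ids = hyperedge_index
    if len(node_ids) != len(edge_ids):
        raise ValueError("both rows of hyperedge_index must have the same length")

    incidence = [[0] * num_hyperedges for _ in range(num_nodes)]
    for node, edge in zip(node_ids, edge_ids):
        incidence[node][edge] = 1
    return incidence


# Toy hypergraph with 4 nodes and 2 hyperedges:
# hyperedge 0 = {0, 1, 2}, hyperedge 1 = {2, 3}.
toy_index = [
    [0, 1, 2, 2, 3],  # node ids
    [0, 0, 0, 1, 1],  # hyperedge ids
]
H = incidence_from_hyperedge_index(toy_index, num_nodes=4, num_hyperedges=2)
for row in H:
    print(row)
```

For a loaded graph, the nested-list form of the index could be obtained with something like data.hyperedge_index.tolist(), assuming the attribute is a PyTorch tensor. A dense matrix is only practical for small hypergraphs; the sparse index is the form intended for training.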

For a hands-on example of how to load these datasets for training or inspection, see the demo_load_hypergraph_datasets.py script. It demonstrates how to use our get_data_loader function to create a PyTorch DataLoader from a preprocessed dataset, which can then be used to train models.
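
For quick inspection without the data loader, a file can also be unpickled directly. The sketch below only relies on the directory and the <dataset_name>_pyg.pkl naming scheme described above; the helper names dataset_path and load_pickled_graph are hypothetical and do not come from this repository, and the signature of get_data_loader is deliberately not shown because it is not documented here.

```python
import pickle
from pathlib import Path

# Directory and dataset names as listed in the Data section.
DATA_DIR = Path("data/hyperbert")
DATASETS = ("cora_co", "dblp_a", "imdb", "pubmed")


def dataset_path(name, data_dir=DATA_DIR):
    """Return the expected pickle path for a dataset, e.g. imdb -> imdb_pyg.pkl."""
    if name not in DATASETS:
        raise ValueError(f"unknown dataset {name!r}; expected one of {DATASETS}")
    return Path(data_dir) / f"{name}_pyg.pkl"


def load_pickled_graph(name, data_dir=DATA_DIR):
    """Unpickle a preprocessed graph (hypothetical helper).

    Unpickling the real files requires torch and torch_geometric to be
    installed, because pickle has to import the classes of the stored objects.
    Only unpickle files from a source you trust.
    """
    with open(dataset_path(name, data_dir), "rb") as handle:
        return pickle.load(handle)


path = dataset_path("imdb")
if path.exists():
    graph = load_pickled_graph("imdb")
    print(type(graph).__name__)
else:
    print(f"{path} not found; run this from the repository root")
```

Once loaded, attributes such as hyperedge_index and y can be inspected on the returned object, assuming they are exposed as regular attributes of the PyG graph.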

Citation

@inproceedings{bazaga-etal-2024-hyperbert,
    title = "{H}yper{BERT}: Mixing Hypergraph-Aware Layers with Language Models for Node Classification on Text-Attributed Hypergraphs",
    author = "Bazaga, Adri{\'a}n  and
      Lio, Pietro  and
      Micklem, Gos",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2024",
    month = nov,
    year = "2024",
    address = "Miami, Florida, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.findings-emnlp.537/",
    doi = "10.18653/v1/2024.findings-emnlp.537"
}

Contact

For feedback, questions, or press inquiries, please contact Adrián Bazaga.
