```
relation_classification/
├─ data/
│  ├─ relation_extraction/
├─ src/
│  ├─ data_generation/
│  ├─ ZS_BERT/
│  │  ├─ Wiki-ZSL/
├─ output/
├─ model/
```
The code for ZS-BERT is adapted from the repository https://github.com/dinobby/ZS-BERT (paper: https://aclanthology.org/2021.naacl-main.272/).

Tested with Python 3.10 and conda, on Ubuntu 22 with an Nvidia GPU:
```bash
conda create -n re python=3.10
conda activate re
python3 -m pip install -r requirements.txt
```
The `nvidia` packages are for Linux machines with GPUs.
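If you want to verify that the GPU setup works, a minimal check (assuming PyTorch is installed via `requirements.txt`):

```python
# Quick sanity check that PyTorch can see the GPU.
import torch

print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
```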
- Activate your Python environment;
- Run `run_train.sh` for full training, or `run_zs_bert_training.sh` for a single model. `relation_classification/` must be your working directory so that all relative paths resolve correctly;
- After training is done, the output is stored in the `saved_models/` directory.
This trains the zero-shot relation extraction model with the transformer `bert-base-multilingual-cased` and the sentence embedder `bert-base-nli-mean-tokens`. Output is stored in the `model/` directory.
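As a rough illustration of how these two models work together in ZS-BERT (the model names are the ones above, and the example descriptions come from the property list further down; the raw cosine matching shown here is a simplification, since the real model learns a projection that aligns the two embedding spaces):

```python
# Simplified sketch of ZS-BERT's two encoders: a contextual encoder for the
# input sentence and a sentence embedder for relation descriptions.
# This is an illustration, not the actual training code.
import torch
from transformers import AutoTokenizer, AutoModel
from sentence_transformers import SentenceTransformer

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
encoder = AutoModel.from_pretrained("bert-base-multilingual-cased")
embedder = SentenceTransformer("bert-base-nli-mean-tokens")

sentence = "Haiti is located in the Caribbean."
descriptions = [
    "continent of which the subject is a part",  # P30
    "country of origin of this item",            # P495
]

# Encode the sentence (CLS vector as a stand-in for ZS-BERT's span features).
inputs = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    sent_vec = encoder(**inputs).last_hidden_state[:, 0]   # shape [1, 768]

# Encode the candidate relation descriptions.
desc_vecs = torch.tensor(embedder.encode(descriptions))    # shape [2, 768]

# ZS-BERT predicts the relation whose description embedding is nearest to the
# (projected) sentence features; raw cosine here only illustrates the idea.
sims = torch.nn.functional.cosine_similarity(sent_vec, desc_vecs)
print(descriptions[int(sims.argmax())])
```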
The `run_train.sh` and `run_zs_bert_training.sh` scripts will automatically pull and store the data in the directory `src/ZS_BERT/Wiki-ZSL`. Data is fetched from the HuggingFace dataset `yiyic/ukp_m5`.
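To peek at the data yourself, something along these lines should work (the dataset name is from this README; split and field layout are assumptions, so inspect the dataset card for the actual schema):

```python
# Peek at the training data hosted on HuggingFace.
from datasets import load_dataset

ds = load_dataset("yiyic/ukp_m5")
print(ds)                              # shows the available splits
first_split = next(iter(ds.values()))  # grab any split without assuming names
print(first_split[0])                  # shows one example's fields
```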
The trained ZS-BERT models, fine-tuned on `bert-base-multilingual-cased` and `xlm-roberta-base` and combined with 4 different sentence transformers (`bert-base-nli-mean-tokens`, `bert-large-nli-mean-tokens`, `xlm-r-bert-base-nli-mean-tokens`, and `xlm-r-100langs-bert-base-nli-mean-tokens`), are uploaded to HuggingFace. Use `bash get_pretrained_weights.sh` to download the weights; these will be saved in the `./pretrained_weights/` directory.
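The downloaded files can then be listed and loaded with standard PyTorch calls, for example (this assumes the checkpoints were saved with `torch.save()`; the exact file names depend on what the script downloads):

```python
# Inspect what get_pretrained_weights.sh downloaded and load one checkpoint.
import os
import torch

files = sorted(os.listdir("./pretrained_weights"))
print(files)

# Load the first checkpoint onto CPU to inspect its contents.
state = torch.load(os.path.join("./pretrained_weights", files[0]),
                   map_location="cpu")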
To run inference:
- Activate your Python environment;
- Run the script `./infer_zsbert.sh` with the corresponding arguments, e.g. `run_inference.sh bert-base-multilingual-cased bert-base-multilingual-cased pretrained_weights 1 2`.

This uses the mBERT base model with the BERT sentence embedder from the `pretrained_weights` directory, with `random_seed=1` and `batch_size=2`. Modify these arguments to suit your needs.
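For reference, the five positional arguments in the example above map to parameters roughly as follows (an illustration of the interface based on the description here, not the actual script code):

```python
# Sketch of how the inference script's positional arguments line up.
import argparse

parser = argparse.ArgumentParser(description="ZS-BERT inference arguments")
parser.add_argument("model_name")             # e.g. bert-base-multilingual-cased
parser.add_argument("sentence_embedder")      # e.g. bert-base-multilingual-cased
parser.add_argument("weights_dir")            # e.g. pretrained_weights
parser.add_argument("random_seed", type=int)  # e.g. 1
parser.add_argument("batch_size", type=int)   # e.g. 2

args = parser.parse_args(
    "bert-base-multilingual-cased bert-base-multilingual-cased "
    "pretrained_weights 1 2".split()
)
print(args)
```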
For the directory `src/data_generation`, see the details in the paper.
ID: LABEL, Description
- P106 : occupation, occupation of a person; see also "field of work" (Property:P101), "position held" (Property:P39)
- P131 : located in the administrative territorial entity, the item is located on the territory of the following administrative entity. Use P276 (location) for specifying locations that are non-administrative places and for items about events
- P17 : country, sovereign state of this item; don't use on humans
- P30 : continent, continent of which the subject is a part
- P31 : instance of, that class of which this subject is a particular example and member
- P36 : capital, primary city of a country, province, state or other type of administrative territorial entity
- P37 : official language, language designated as official by this item
- P39 : position held, subject currently or formerly holds the object position or public office
- P495 : country of origin, country of origin of this item (creative work, food, phrase, product, etc.)
- P1376 : capital of, country, state, department, canton or other administrative division of which the municipality is the governmental seat
- P2341 : indigenous to, area or ethnic group that a language, folk dance, cooking style, food or other cultural expression is found (or was originally found)
- P2936 : language used, language widely used (spoken or written) in this place or at this event
- P361 : part of, object of which the subject is a part (if this subject is already part of object A which is a part of object B, then please only make the subject part of object A). Inverse property of "has part" (P527, see also "has parts of the class" (P2670)).
Note that of these 13 Properties, 4 have no samples in the English Wiki-ZSL train dataset: P1376, P2341, P2936, and P361.
For further details on the distribution of these Properties across the Creole datasets, please see the relation classification analysis.
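To check such distributions yourself, a sketch along these lines counts property occurrences in a split (the dataset name is from this README; the split choice and the `label` field name are assumptions, so consult the dataset card for the actual schema):

```python
# Count how often each relation/property occurs in one split of the dataset.
from collections import Counter
from datasets import load_dataset

ds = load_dataset("yiyic/ukp_m5")
split = next(iter(ds.values()))

# Replace "label" with the actual relation field from the dataset card.
counts = Counter(example.get("label") for example in split)
print(counts.most_common())
```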