This repository contains the code for the paper BETag: Behavior-enhanced Item Tagging with Finetuned Large Language Models.
We plan to make this code open source. However, as we are in the process of applying for a patent related to this work, the repository is temporarily unavailable.
We are committed to completing the process as quickly as possible and will make the repository publicly accessible once the patent process is finalized. Thank you for your understanding and patience.
- Install PyTorch (version >= 2.0) with the appropriate CUDA version for your system.
- Install dependencies using the following command:

```
pip install -e .
```

- Alternatively, you can manually check and install dependencies listed in `pyproject.toml`.
Base tags serve as the foundational representation of products and can be any relevant tags. We provide a script (`base_tags_generation.py`) for generating base tags using an LLM API.

The base tags must be organized in the following format for subsequent BE-finetuning and BETag Generation:

```
Mapping[PID, list[str]]
```
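For concreteness, a base-tag file is simply this mapping serialized as JSON. The sketch below writes a hypothetical example to the path used later in this README; the PIDs and tags are illustrative only.

```python
import json

# Hypothetical PIDs and tags, illustrating the Mapping[PID, list[str]] structure.
base_tags = {
    "B000001": ["laboratory glassware", "borosilicate", "500 ml"],
    "B000002": ["digital microscope", "usb", "1000x magnification"],
}

with open("dataset/amazon.scientific/base_tags.json", "w") as f:
    json.dump(base_tags, f, indent=2, ensure_ascii=False)
```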
To finetune the model on your own dataset, you need:
- Training interaction sequences: A list of interaction sequences (e.g., `list[list[PID]]`); see the sketch after this list.
- Base Tags: A mapping of product IDs (PIDs) to lists of tags (`Mapping[PID, list[Tag]]`), in the same format as described above.
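As an illustration, the interaction file is a JSON list of per-user PID sequences. The PIDs below are hypothetical; the path matches the example configuration shown in the next step.

```python
import json

# Hypothetical interaction sequences: each inner list is one user's ordered
# sequence of product IDs (list[list[PID]]).
inters = [
    ["B000001", "B000002", "B000003"],
    ["B000002", "B000004"],
]

with open("dataset/amazon.scientific/inters.train.json", "w") as f:
    json.dump(inters, f)
```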
- Prepare the dataset and specify paths to your data in a dotenv configuration file. For example:
```
inters_path = dataset/amazon.scientific/inters.train.json
base_tags_path = dataset/amazon.scientific/base_tags.json
...
```
- Run the finetuning script:
```
python beft.py --env path/to/the/.env
```
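The dotenv file only carries paths to your data. Below is a minimal sketch of how such a file could be read; it assumes the python-dotenv package, and the actual loading logic inside `beft.py` may differ.

```python
import json
from dotenv import dotenv_values  # assumes the python-dotenv package

# Read key/value pairs (e.g., inters_path, base_tags_path) from the dotenv file.
config = dotenv_values("path/to/the/.env")

with open(config["inters_path"]) as f:
    inters = json.load(f)        # list[list[PID]]
with open(config["base_tags_path"]) as f:
    base_tags = json.load(f)     # Mapping[PID, list[str]]
```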
- Preprocessed datasets used in the paper are available here.
- Default environment configurations can be found in the `envs.default` directory.
- Finetuned checkpoints are available on Google Drive.
For BETag Generation, interactions are not required. You only need:
- Base Tags: Use the same base tags as in BE-finetuning.
- Checkpoint: Path to the finetuned LLM checkpoint.
- Configure the dotenv file with required paths.
- Run the generation script:
```
python begen.py --env path/to/the/.env
```
The output directory will contain the following files:
- `generation_config.json`: Contains the generation configuration.
- `raw_predict.json`: The raw output of the LLM.
- `raw_betags.json`: Parsed BETags in the format `Mapping[PID, list[list[str]]]`.
  - For each product, the generated tags for each beam are stored separately.
  - Beams are sorted by score, from highest to lowest. The base tags are included as the first beam, resulting in M+1 beams.
  - You can select the top-M beams for each product:

    ```python
    betags = {pid: beams[:TOP_M+1] for pid, beams in raw_betags.items()}
    ```
- You can obtain weighted tags or select the top-K tags via the snippet below (an end-to-end sketch follows this list):

  ```python
  from collections import Counter

  betags = {pid: Counter(sum(beams, [])).most_common(TOP_K) for pid, beams in betags.items()}
  ```
- Generated BETags are available here.
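Putting these steps together, here is a minimal end-to-end sketch of the post-processing described above. The output directory path and the `TOP_M`/`TOP_K` values are illustrative; only `raw_betags.json` and its format come from the generation step.

```python
import json
from collections import Counter

TOP_M = 5   # number of generated beams to keep per product (illustrative)
TOP_K = 10  # number of tags to keep per product (illustrative)

# Hypothetical output directory; use the one configured for begen.py.
with open("output/amazon.scientific/raw_betags.json") as f:
    raw_betags = json.load(f)  # Mapping[PID, list[list[str]]]

# Keep the base-tag beam plus the top-M generated beams (beams are sorted by score).
betags = {pid: beams[:TOP_M + 1] for pid, beams in raw_betags.items()}

# Flatten the kept beams and keep the K most frequent tags as (tag, weight) pairs.
betags = {pid: Counter(sum(beams, [])).most_common(TOP_K) for pid, beams in betags.items()}
```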
The Amazon dataset used in this work is taken from Recformer.
TODO