Dataset & code for DestT5 (NLP for ConvAI, ACL 2023)
If you use this dataset or repository, please cite the following paper:
```bibtex
@inproceedings{glenn2023correcting,
    author    = {Parker Glenn and Parag Pravin Dakle and Preethi Raghavan},
    title     = {Correcting Semantic Parses with Natural Language through Dynamic Schema Encoding},
    booktitle = {Proceedings of the 5th Workshop on NLP for Conversational AI},
    publisher = {Association for Computational Linguistics},
    year      = {2023}
}
```
Below we report the exact-match accuracy (EM%) and execution accuracy (EX%) of DestT5 on the SPLASH dataset, as well as on the auxiliary test sets available in the NLEdit codebase.
| Model | Metric | Seq2Struct (SPLASH) | EditSQL | TaBERT | RAT-SQL | T5-Large |
|---|---|---|---|---|---|---|
| DestT5 (`parkervg/destt5-schema-prediction` with `parkervg/destt5-text2sql`) | EM% | 53.43 | 31.82 | 31.47 | 28.37 | 26.1 |
| | EX% | 56.86 | 40.3 | 28.84 | 36.53 | 30.43 |
The file `data/splash-t5-3vnuv1vf.json` contains 112 annotations for interactive semantic parsing. Each annotation pairs an erroneous parse, produced by `tscholak/3vnuv1vf` on randomly selected Spider examples, with natural language feedback describing how to correct it.
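As a quick sanity check, the annotations can be inspected directly. Below is a minimal sketch; the field names follow the SPLASH release and are an assumption, so check the file itself for the exact schema.

```python
# Minimal sketch for inspecting the feedback annotations.
# The keys below (question, predicted_parse, feedback, gold_parse) are
# assumed SPLASH-style fields; the actual file may differ.
import json

with open("data/splash-t5-3vnuv1vf.json") as f:
    annotations = json.load(f)

print(len(annotations))  # expected: 112
example = annotations[0]
for key in ("question", "predicted_parse", "feedback", "gold_parse"):
    print(f"{key}: {example.get(key)}")
```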
Our codebase is based on the great implementation of Picard. Specifically, we add the following fields to the `DataTrainingArguments` dataclass in `seq2seq/utils/dataset.py` to re-create the experiments described in the paper.
```python
# Excerpt: new fields added to DataTrainingArguments in seq2seq/utils/dataset.py
use_gold_concepts: bool = field(
    default=False,
    metadata={
        "help": "Whether or not to serialize input only with columns/tables/values present in the gold query."
    },
)
use_serialization_file: Optional[List[str]] = field(
    default=None,
    metadata={
        "help": "If specified, points to the output of a T5 concept prediction model. Uses predictions as the serialization for the current text-to-sql model."
    },
)
include_explanation: Optional[bool] = field(
    default=False,
    metadata={
        "help": "Whether to serialize the explanation in SPLASH training."
    },
)
include_question: Optional[bool] = field(
    default=False,
    metadata={
        "help": "Whether to serialize the question in SPLASH training."
    },
)
splash_train_with_spider: Optional[bool] = field(
    default=False,
    metadata={
        "help": "Whether to interleave the Spider train set with the SPLASH train set."
    },
)
shuffle_splash_feedback: Optional[bool] = field(
    default=False,
    metadata={
        "help": "Sanity check that the model is actually using feedback, by running evaluation on the test set with shuffled feedback."
    },
)
shuffle_splash_question: Optional[bool] = field(
    default=False,
    metadata={
        "help": "Sanity check that the model is actually using the question, by running evaluation on the test set with shuffled questions."
    },
)
task_type: Optional[str] = field(
    default="text2sql",
    metadata={"help": "One of text2sql, schema_prediction"},
)
spider_eval_on_splash: Optional[bool] = field(
    default=False,
    metadata={"help": "Whether we're running a Spider model on SPLASH. In that case, only the question is used."},
)
```
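These are plain `dataclasses` fields, so they surface as command-line flags and JSON config keys via HuggingFace's `HfArgumentParser`, the mechanism Picard-style codebases use for configuration. The following is a self-contained sketch of that mechanism, not code from this repo; the two fields shown are copied from above, everything else is illustrative.

```python
# Standalone sketch of how HfArgumentParser exposes dataclass fields as
# CLI flags / config keys. Run e.g. with:
#   python demo_args.py --task_type schema_prediction --use_gold_concepts true
from dataclasses import dataclass, field
from typing import Optional

from transformers import HfArgumentParser


@dataclass
class DataTrainingArguments:
    task_type: Optional[str] = field(
        default="text2sql",
        metadata={"help": "One of text2sql, schema_prediction"},
    )
    use_gold_concepts: bool = field(
        default=False,
        metadata={"help": "Serialize input only with gold-query concepts."},
    )


parser = HfArgumentParser(DataTrainingArguments)
(data_args,) = parser.parse_args_into_dataclasses()
print(data_args.task_type, data_args.use_gold_concepts)
```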
First, clone the repo. This repo uses submodules; initialize and fetch them with the following commands.

```bash
git submodule init
git submodule update
```

Then, create a `destt5` conda env with the following command.

```bash
conda env create --file env.yml
```
This work requires both the Spider and SPLASH datasets. First, download Spider.zip here and place it in `seq2seq/datasets/spider`.
Then, to train DestT5, run the following command.

```bash
python -m seq2seq.run_seq2seq ./seq2seq/configs/question/text2sql-t5-base-schema-generator.json
```
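For quick experimentation outside the training harness, the released checkpoints from the results table can also be loaded directly with `transformers`. A minimal sketch follows; note that the input string shown is an assumed serialization (question, feedback, schema), not the canonical format, which is produced by the dataset code above.

```python
# Hedged inference sketch using the released text-to-sql checkpoint.
# The source string below is an illustrative assumption; consult
# seq2seq/utils/dataset.py for the exact serialization DestT5 was trained on.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("parkervg/destt5-text2sql")
model = AutoModelForSeq2SeqLM.from_pretrained("parkervg/destt5-text2sql")

source = "How many singers are there? | count the rows in singer | singer : singer_id, name"
inputs = tokenizer(source, return_tensors="pt")
outputs = model.generate(**inputs, max_length=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```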