TabSQLify: Enhancing Reasoning Capabilities of LLMs Through Table Decomposition [NAACL 2024]

Method Overview

Abstract

Table reasoning is a challenging task that requires understanding both natural language questions and structured tabular data. Large language models (LLMs) have shown impressive capabilities in natural language understanding and generation, but they often struggle with large tables due to their limited input length. In this paper, we propose TabSQLify, a novel method that leverages text-to-SQL generation to decompose tables into smaller, relevant sub-tables that contain only the information essential for answering questions or verifying statements, before performing the reasoning task. In our comprehensive evaluation on four challenging datasets, our approach demonstrates comparable or superior performance compared to prevailing methods that rely on full tables as input. Moreover, our method can reduce the input context length significantly, making it more scalable and efficient for large-scale table reasoning applications. Our method performs remarkably well on the WikiTQ benchmark, achieving an accuracy of 64.7%. Additionally, on the TabFact benchmark, it achieves a high accuracy of 79.5%. These results surpass other LLM-based baseline models on gpt-3.5-turbo (ChatGPT). TabSQLify can reduce the table size significantly, alleviating the computational load on LLMs when handling large tables without compromising performance.
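The decomposition step described above can be illustrated with a minimal sketch. This is not the repository's implementation: the helper names (`build_table`, `extract_subtable`, `linearize`), the toy table, and the hard-coded SQL string are all assumptions made for illustration. In the actual method the SQL would be generated by an LLM from the question; here it is written by hand so the example runs without any API access.

```python
import sqlite3

def build_table(conn, name, columns, rows):
    """Load a small table into SQLite, storing every column as TEXT."""
    col_defs = ", ".join(f'"{c}" TEXT' for c in columns)
    conn.execute(f'CREATE TABLE "{name}" ({col_defs})')
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(f'INSERT INTO "{name}" VALUES ({placeholders})', rows)

def extract_subtable(conn, sql):
    """Run a SQL query and return (header, rows) of the resulting sub-table."""
    cur = conn.execute(sql)
    header = [d[0] for d in cur.description]
    return header, cur.fetchall()

def linearize(header, rows):
    """Serialize a table as pipe-separated text for inclusion in a prompt."""
    lines = [" | ".join(header)]
    lines += [" | ".join(str(v) for v in row) for row in rows]
    return "\n".join(lines)

# Toy data, invented for this example only.
columns = ["player", "team", "season", "points"]
rows = [
    ("Ana", "Red", "2019", "31"),
    ("Ben", "Blue", "2019", "18"),
    ("Ana", "Red", "2020", "27"),
    ("Cy", "Green", "2020", "40"),
]

conn = sqlite3.connect(":memory:")
build_table(conn, "T", columns, rows)

question = "How many points did Ana score in 2020?"
# Assumption: stands in for the SQL an LLM would generate from the question.
generated_sql = "SELECT player, season, points FROM T WHERE player = 'Ana'"

header, sub_rows = extract_subtable(conn, generated_sql)
sub_table_text = linearize(header, sub_rows)
full_table_text = linearize(columns, rows)

# The sub-table, not the full table, would be placed in the reasoning prompt.
prompt = f"Table:\n{sub_table_text}\n\nQuestion: {question}\nAnswer:"
print(prompt)
print(f"characters: full={len(full_table_text)}, sub={len(sub_table_text)}")
```

The query keeps only the rows and columns relevant to the question, so the text handed to the reasoning step is shorter than the serialized full table. The final answer is still left to the LLM, which reads the sub-table rather than the SQL result being taken as the answer directly.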

Code

Citation

If you want to cite our papers, please use:

@inproceedings{nahid-rafiei-2024-tabsqlify,
    title = "{T}ab{SQL}ify: Enhancing Reasoning Capabilities of {LLM}s Through Table Decomposition",
    author = "Nahid, Md Mahadi Hasan and
      Rafiei, Davood",
    editor = "Duh, Kevin  and
      Gomez, Helena  and
      Bethard, Steven",
    booktitle = "Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)",
    month = jun,
    year = "2024",
    address = "Mexico City, Mexico",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.naacl-long.320",
    doi = "10.18653/v1/2024.naacl-long.320",
    pages = "5725--5737",
    abstract = "Table reasoning is a challenging task that requires understanding both natural language questions and structured tabular data. Large language models (LLMs) have shown impressive capabilities in natural language understanding and generation, but they often struggle with large tables due to their limited input length. In this paper, we propose TabSQLify, a novel method that leverages text-to-SQL generation to decompose tables into smaller and relevant sub-tables, containing only essential information for answering questions or verifying statements, before performing the reasoning task. In our comprehensive evaluation on four challenging datasets, our approach demonstrates comparable or superior performance compared to prevailing methods reliant on full tables as input. Moreover, our method can reduce the input context length significantly, making it more scalable and efficient for large-scale table reasoning applications. Our method performs remarkably well on the WikiTQ benchmark, achieving an accuracy of 64.7{\%}. Additionally, on the TabFact benchmark, it achieves a high accuracy of 79.5{\%}. These results surpass other LLM-based baseline models on gpt-3.5-turbo (chatgpt). TabSQLify can reduce the table size significantly alleviating the computational load on LLMs when handling large tables without compromising performance.",
}

@inproceedings{nahid-rafiei-2024-normtab,
    title = "{N}orm{T}ab: Improving Symbolic Reasoning in {LLM}s Through Tabular Data Normalization",
    author = "Nahid, Md Mahadi Hasan and
      Rafiei, Davood",
    editor = "Al-Onaizan, Yaser  and
      Bansal, Mohit  and
      Chen, Yun-Nung",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2024",
    month = nov,
    year = "2024",
    address = "Miami, Florida, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.findings-emnlp.203",
    pages = "3569--3585",
    abstract = "In recent years, Large Language Models (LLMs) have demonstrated remarkable capabilities in parsing textual data and generating code. However, their performance in tasks involving tabular data, especially those requiring symbolic reasoning, faces challenges due to the structural variance and inconsistency in table cell values often found in web tables. In this paper, we introduce NormTab, a novel framework aimed at enhancing the symbolic reasoning performance of LLMs by normalizing web tables. We study table normalization as a stand-alone, one-time preprocessing step using LLMs to support symbolic reasoning on tabular data. Our experimental evaluation, conducted on challenging web table datasets such as WikiTableQuestion and TabFact, demonstrates that leveraging NormTab significantly improves symbolic reasoning performance, showcasing the importance and effectiveness of web table normalization for enhancing LLM-based symbolic reasoning tasks.",
}

