TAPHSIR: Towards AnaPHoric Ambiguity Detection and ReSolution In Requirements

About TAPHSIR

TAPHSIR is a tool associated with the ICSE'22 technical paper titled "Automated Handling of Anaphoric Ambiguity in Requirements: A Multi-solution Study". TAPHSIR provides a hybrid solution composed of two components. The first component employs machine learning (ML) for detecting anaphoric ambiguity. The second component uses a pre-trained language model, SpanBERT, for anaphora resolution.

The tool was developed at SnT / University of Luxembourg with funding from Luxembourg's National Research Fund (FNR).

What is released?

./artifact/: is the folder that contains our trained ML and SpanBERT models, along with Python scripts needed to run TAPHSIR, e.g., text processing methods.
./Example.txt: is a text file that contains an example from our technical paper. This example is used to illustrate how TAPHSIR can be used (see Usage Example below).
./taphsir.py: is the main Python file that contains the scripts to run TAPHSIR.
./output/: is the folder that will contain the results produced by TAPHSIR. Output Excel files are generated and placed in this folder depending on whether one runs the detection component, the resolution component, or both.
./requirements.txt: is the file listing the libraries needed to run TAPHSIR.

How to use TAPHSIR?

TAPHSIR is implemented in Python 3.8. You can run TAPHSIR using the following steps.

Getting the Project

Clone the GitLab repository TAPHSIR to your local machine, for example using the following command:

git clone https://gitlab.uni.lu/sabualhaija/taphsir

Navigate to the main folder of the project on your local machine:

cd path/to/taphsir/

Follow the instructions in the Installation File for setting up the environment for running TAPHSIR.

Usage Example

Once you set up your environment, you can run TAPHSIR using the following command:

python taphsir.py --doc path-to-doc --mode mode --detection model

where:

mode is a parameter indicating which component of TAPHSIR you want to run. Its value is one of three options: 1 (to run the detection component only), 2 (to run the resolution component only), or 3 (to run both components). Note that 3 is the default value.
model is a parameter indicating which ML model to apply for anaphoric ambiguity detection. Its value can be: LF (to use the ML model pre-trained on language features only), FE (to use the ML model pre-trained on feature embeddings only), or Ensemble (to use the ensemble ML model pre-trained on both sets of features). More details on these models can be found in our technical paper.
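The parameters above can be assembled programmatically. The following is a minimal sketch, not part of TAPHSIR itself: build_command is a hypothetical helper that only checks values against the options listed above and returns the argument list for the documented command. It leaves out --detection when no model is given, since the default model is not stated here.

```python
# Hypothetical helper (not shipped with TAPHSIR): validates the documented
# parameter values and builds the command line described above.
VALID_MODES = {1, 2, 3}
VALID_DETECTION_MODELS = {"LF", "FE", "Ensemble"}


def build_command(doc, mode=3, detection=None):
    """Return the argument list for running taphsir.py on the given document."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}, got {mode!r}")
    command = ["python", "taphsir.py", "--doc", str(doc), "--mode", str(mode)]
    if detection is not None:
        if detection not in VALID_DETECTION_MODELS:
            raise ValueError(
                f"detection must be one of {sorted(VALID_DETECTION_MODELS)}, got {detection!r}"
            )
        command += ["--detection", detection]
    return command


# Detection only, using the ensemble model.
print(" ".join(build_command("Example.txt", mode=1, detection="Ensemble")))
# prints: python taphsir.py --doc Example.txt --mode 1 --detection Ensemble
```

The returned list can be passed to subprocess.run(..., check=True) from the main folder of TAPHSIR once the environment is set up.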

To run TAPHSIR with the default parameters on Example.txt, use the following command (assuming that you have already navigated to the main folder of TAPHSIR on your local machine):

python taphsir.py --doc Example.txt

Output of TAPHSIR

Once the execution is completed, one or two Excel files are generated, depending on the mode parameter, and placed in the folder ./output/. When the tool is run with the default parameters on Example.txt, two output files, detection.xlsx and resolution.xlsx, are generated.
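As a rough illustration, a script can check which output files a run should have produced. This sketch assumes that mode 1 yields only detection.xlsx and mode 2 only resolution.xlsx, which is inferred from the description above rather than stated explicitly; expected_outputs and missing_outputs are hypothetical helper names.

```python
from pathlib import Path

# Assumed mapping from mode to output files. Only the mode 3 case (both files)
# is stated explicitly in this README; modes 1 and 2 are inferred.
FILES_BY_MODE = {
    1: ["detection.xlsx"],
    2: ["resolution.xlsx"],
    3: ["detection.xlsx", "resolution.xlsx"],
}


def expected_outputs(mode=3, output_dir="output"):
    """Return the paths that a run in the given mode is expected to create."""
    return [Path(output_dir) / name for name in FILES_BY_MODE[mode]]


def missing_outputs(mode=3, output_dir="output"):
    """Return the expected output paths that do not exist on disk."""
    return [path for path in expected_outputs(mode, output_dir) if not path.exists()]


print([path.name for path in expected_outputs(3)])
# prints: ['detection.xlsx', 'resolution.xlsx']
```

Calling missing_outputs after a run gives an empty list when every expected file is present.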

The detection.xlsx file contains the following columns:

Column	Content
Id	A unique identifier for each pronoun in the context.
Context	The context where a pronoun is occurring.
Pronoun	The pronoun for which the analysis has been performed.
Detected As	The detection result of TAPHSIR for that pronoun: ambiguous or unambiguous.

The resolution.xlsx file contains the following columns:

Column	Content
Id	A unique identifier for each pronoun in the context.
Context	The context where a pronoun is occurring.
Pronoun	The pronoun for which the analysis has been performed.
Resolved As	The resolution result of TAPHSIR for that pronoun, indicating the antecedent to which the pronoun refers.

Note: in case of multiple pronouns in the same context, the output files will contain one row per pronoun.
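Since both files have one row per pronoun, their results can be viewed side by side. The sketch below is one possible way to do this, assuming that the Id values match across the two files (not confirmed here) and that the rows have already been loaded as dictionaries keyed by column name, for instance with pandas.read_excel(path).to_dict("records"), which needs openpyxl for .xlsx files. merge_results is a hypothetical helper, and the sample rows are invented for illustration; they are not actual TAPHSIR output.

```python
def merge_results(detection_rows, resolution_rows):
    """Attach the 'Resolved As' value to each detection row with the same Id."""
    resolved_by_id = {row["Id"]: row["Resolved As"] for row in resolution_rows}
    merged = []
    for row in detection_rows:
        combined = dict(row)  # copy so the input rows stay unchanged
        # None marks a pronoun that has no matching row in the resolution file.
        combined["Resolved As"] = resolved_by_id.get(row["Id"])
        merged.append(combined)
    return merged


# Invented rows for illustration only (Id format and text are assumptions).
detection = [
    {"Id": "1", "Context": "The system shall notify the operator when it fails.",
     "Pronoun": "it", "Detected As": "unambiguous"},
]
resolution = [
    {"Id": "1", "Context": "The system shall notify the operator when it fails.",
     "Pronoun": "it", "Resolved As": "The system"},
]

for row in merge_results(detection, resolution):
    print(row["Pronoun"], row["Detected As"], row["Resolved As"])
```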

How to cite?

If you wish to use or compare with TAPHSIR, please cite the following paper: Ezzini, S., Abualhaija, S., Arora, C., Sabetzadeh, M. (2022, May). Automated Handling of Anaphoric Ambiguity in Requirements: A Multi-solution Study. In 44th International Conference on Software Engineering (ICSE).
