This project compares three methods for transforming unstructured text in Hebrew legal documents into structured data.
To set up the environment and install the necessary dependencies for this project, follow the steps below:
First, clone the repository from GitHub:
git clone https://github.com/shay681/Constructing-Structured-Database-from-Unstructured-Legal-Documents.git
cd Constructing-Structured-Database-from-Unstructured-Legal-Documents
It is good practice to create a virtual environment to manage dependencies. Use the following commands to create and activate one:
# Create virtual environment
python -m venv venv
# Activate virtual environment
# For Windows:
venv\Scripts\activate
# For macOS/Linux:
source venv/bin/activate
Install the required dependencies using the provided requirements.txt file:
pip install -r requirements.txt
If you're running the models on a GPU, ensure that CUDA is installed. Installation instructions are available on NVIDIA's official website.
Make sure the CUDA version is compatible with your GPU and driver.
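A quick way to confirm that the GPU is visible from Python is sketched below. It assumes PyTorch is among the installed dependencies, which this README does not state explicitly. The helper name `describe_cuda` is illustrative and not part of the project.

```python
import importlib.util


def describe_cuda():
    """Return a small report on whether PyTorch can see a CUDA device."""
    report = {
        "torch_installed": False,
        "cuda_available": False,
        "device_name": None,
    }
    # Assumption: the project runs on PyTorch. If it is not installed,
    # return the default report instead of raising ImportError.
    if importlib.util.find_spec("torch") is None:
        return report

    import torch

    report["torch_installed"] = True
    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        # Name of the first visible GPU (device index 0).
        report["device_name"] = torch.cuda.get_device_name(0)
    return report


if __name__ == "__main__":
    print(describe_cuda())
```

If `cuda_available` is `False` on a machine with an NVIDIA GPU, the usual suspects are a CPU-only PyTorch build or a driver that is too old for the installed CUDA version.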
All datasets that are used in this project can be found here: Datasets
Dataset | Description |
---|---|
Legal_Clauses | Dataset containing legal clauses extracted from Hebrew legal documents |
Precedents | Dataset of legal precedents from Hebrew legal documents |
Inference_Legal_Clauses | Inference dataset for predicting legal clauses in unstructured text |
Inference_Precedents | Inference dataset for predicting legal precedents in unstructured text |
All models that are trained and used in this project can be found here: Models
Model Name | Description |
---|---|
HeBERT_finetuned_Legal_Clauses | HeBERT model fine-tuned on legal clauses |
HeBERT_finetuned_Precedents | HeBERT model fine-tuned on precedents |
Text2Text_Legal_Clauses_finetuned_model | mT5 model fine-tuned on legal clauses |
Text2Text_Precedents_finetuned_model | mT5 model fine-tuned on precedents |
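
As a rough sketch of how the models in the table above might be loaded, the snippet below uses the Hugging Face `transformers` auto classes. Several things here are assumptions rather than facts from this README: that `transformers` is installed, that each model is stored in a directory (or hub repository) whose final path component is the model name from the table, and the helper names `loader_class_name` and `load_model`. Because the README does not say which task head the HeBERT models use, the sketch loads only the bare encoder with `AutoModel`.

```python
from pathlib import PurePath

# Model names are taken from the table above.
# Which loader class fits each one is an assumption based on the
# descriptions (mT5 is a sequence-to-sequence model, HeBERT is an encoder).
SEQ2SEQ_MODELS = {
    "Text2Text_Legal_Clauses_finetuned_model",
    "Text2Text_Precedents_finetuned_model",
}
ENCODER_MODELS = {
    "HeBERT_finetuned_Legal_Clauses",
    "HeBERT_finetuned_Precedents",
}


def loader_class_name(model_name):
    """Map a model name from the table to a transformers auto class name."""
    if model_name in SEQ2SEQ_MODELS:
        return "AutoModelForSeq2SeqLM"
    if model_name in ENCODER_MODELS:
        # Bare encoder only: the fine-tuned task head is not documented here.
        return "AutoModel"
    raise ValueError(f"Unknown model name: {model_name!r}")


def load_model(model_path):
    """Load tokenizer and model from a local directory or hub repository.

    model_path is hypothetical: its last component must be one of the
    model names listed in the table.
    """
    import transformers  # imported lazily so the mapping works without it

    name = PurePath(model_path).name
    model_cls = getattr(transformers, loader_class_name(name))
    tokenizer = transformers.AutoTokenizer.from_pretrained(model_path)
    model = model_cls.from_pretrained(model_path)
    return tokenizer, model
```

Once the task of the HeBERT models is known, `AutoModel` can be swapped for the matching class, for example `AutoModelForTokenClassification` or `AutoModelForSequenceClassification`.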
Hardware used:
- CPU: Intel(R) Xeon(R) CPU E5-2687W v4 @ 3.00GHz
- GPU: NVIDIA TESLA P40-2Q (24GB)
Created by Shay Doner. This is my final project for the M.Sc. in Intelligent Systems at Afeka College in Tel-Aviv. For collaboration inquiries, please contact: [email protected]