Constructing-Structured-Database-from-Unstructured-Legal-Documents

This project compares three methods for transforming unstructured text in Hebrew legal documents into structured data: regular expressions, language-model question answering, and text-to-text generation.
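
To make the goal concrete, here is a minimal sketch of rule-based extraction: a regular expression that turns clause citations in free text into records. The sample sentence, the pattern, and the `extract_clause_refs` helper are illustrative assumptions; they are not taken from the project's notebooks or datasets.

```python
import re

# Hypothetical pattern: "סעיף <number> ל<law name>" (Section <number> of <law name>).
# It only covers this one phrasing and a law name of the form "חוק <one word>".
CLAUSE_PATTERN = re.compile(r"סעיף (\d+) ל(חוק [א-ת]+)")


def extract_clause_refs(text):
    """Return one record per clause citation found in the text."""
    return [
        {"clause": int(number), "law": law}
        for number, law in CLAUSE_PATTERN.findall(text)
    ]


# Illustrative sentence, not taken from the project's datasets.
sample = "התובע מסתמך על סעיף 12 לחוק החוזים ועל סעיף 39 לחוק החוזים."
records = extract_clause_refs(sample)
for record in records:
    print(record)
```

Each record is a plain dictionary, so the output can be written directly to a table or database row. Real legal text uses many more phrasings than a single pattern can cover, which is the kind of limitation a comparison with model-based methods can expose.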

Install & Dependencies

To set up the environment and install the necessary dependencies for this project, follow the steps below:

1. Clone the Repository

First, clone the repository from GitHub:

git clone https://github.com/shay681/Constructing-Structured-Database-from-Unstructured-Legal-Documents.git

cd Constructing-Structured-Database-from-Unstructured-Legal-Documents

2. Create a Virtual Environment (Optional but recommended)

A virtual environment keeps this project's dependencies isolated from the rest of your system. Use the following commands to create and activate one:

# Create virtual environment
python -m venv venv

# Activate virtual environment
# For Windows:
venv\Scripts\activate
# For macOS/Linux:
source venv/bin/activate

3. Install Dependencies

Install the required dependencies using the requirements.txt file provided:

pip install -r requirements.txt

4. Install CUDA (For GPU Acceleration)

If you plan to run the models on a GPU, make sure CUDA is installed. Installation instructions are available on NVIDIA's official website.

Make sure the CUDA version is compatible with your GPU and its driver.
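
After installing, a quick check like the one below can confirm that the GPU is visible. This is a minimal sketch: it assumes PyTorch is the deep-learning backend installed from requirements.txt (the source does not say so explicitly), and `cuda_status` is a hypothetical helper name.

```python
import shutil
import subprocess


def cuda_status():
    """Report whether an NVIDIA driver and a CUDA-enabled PyTorch are visible."""
    status = {"nvidia_smi": False, "torch_cuda": False}

    # nvidia-smi ships with the NVIDIA driver; if it runs cleanly, the driver works.
    smi = shutil.which("nvidia-smi")
    if smi is not None:
        try:
            result = subprocess.run([smi], capture_output=True, text=True, timeout=30)
            status["nvidia_smi"] = result.returncode == 0
        except (OSError, subprocess.TimeoutExpired):
            pass  # Treat a failed or hanging call as "driver not usable".

    # Assumption: PyTorch is installed via requirements.txt.
    try:
        import torch
        status["torch_cuda"] = bool(torch.cuda.is_available())
    except ImportError:
        pass  # PyTorch is not installed; leave the flag as False.

    return status


print(cuda_status())
```

If `nvidia_smi` is true but `torch_cuda` is false, the usual cause is a PyTorch build that does not match the installed CUDA version.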

Datasets

All datasets used in this project can be found here: Datasets

Dataset	Description
Legal_Clauses	Dataset containing legal clauses extracted from Hebrew legal documents
Precedents	Dataset of legal precedents from Hebrew legal documents
Inference_Legal_Clauses	Inference dataset for predicting legal clauses in unstructured text
Inference_Precedents	Inference dataset for predicting legal precedents in unstructured text

Models

All models trained and used in this project can be found here: Models

Model Name	Description
HeBERT_finetuned_Legal_Clauses	HeBERT model fine-tuned on legal clauses
HeBERT_finetuned_Precedents	HeBERT model fine-tuned on precedents
Text2Text_Legal_Clauses_finetuned_model	mT5 model fine-tuned on legal clauses
Text2Text_Precedents_finetuned_model	mT5 model fine-tuned on precedents
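
Loading one of these models might look like the sketch below, which uses the Hugging Face transformers Auto classes. Several things here are assumptions rather than facts from the source: that the HeBERT models carry a question-answering head (suggested only by the notebook name "Language Model Q&A"), that the mT5 models are sequence-to-sequence models, and the `location` argument, a placeholder for wherever the Models link stores the checkpoints (a local directory or a hub namespace). `load_model` is a hypothetical helper.

```python
# Model class per checkpoint, assumed from the model names and notebook titles.
MODEL_CLASSES = {
    "HeBERT_finetuned_Legal_Clauses": "AutoModelForQuestionAnswering",
    "HeBERT_finetuned_Precedents": "AutoModelForQuestionAnswering",
    "Text2Text_Legal_Clauses_finetuned_model": "AutoModelForSeq2SeqLM",
    "Text2Text_Precedents_finetuned_model": "AutoModelForSeq2SeqLM",
}


def load_model(name, location):
    """Return (tokenizer, model) for one of the models listed above.

    `location` is a placeholder for a local directory or hub namespace
    that contains a folder or repository called `name`.
    """
    class_name = MODEL_CLASSES[name]  # Raises KeyError for unknown model names.

    # Imported lazily so the mapping can be inspected without transformers installed.
    import transformers

    path = f"{location}/{name}"
    tokenizer = transformers.AutoTokenizer.from_pretrained(path)
    model = getattr(transformers, class_name).from_pretrained(path)
    return tokenizer, model
```

For example, `load_model("Text2Text_Legal_Clauses_finetuned_model", "./models")` would read the checkpoint from a local `./models` directory, provided the files were downloaded there first.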

Tested Platform

Hardware

CPU: Intel(R) Xeon(R) CPU E5-2687W v4 @ 3.00GHz
GPU: NVIDIA TESLA P40-2Q (24GB)

About Me

Created by Shay Doner. This is my final project for the M.Sc. in Intelligent Systems at Afeka College in Tel Aviv. For collaboration, please contact me by email: [email protected]

