This repository contains the implementation of a research project that restores damaged Sinhala handwritten documents using image processing, machine learning (ML), and natural language processing (NLP) techniques. The project is divided into four modules, each addressing a different aspect of document damage detection and restoration.
Handwritten Sinhala documents preserved in libraries, offices, and archives are prone to damage such as ink blotches, wear and tear, insect attacks, and missing words. Sinhala script poses challenges that English text restoration does not, because of its curved characters and complex structures. This project applies image processing, ML, and NLP techniques to detect, classify, and restore damaged handwritten content, keeping it readable while preserving the original writing style.
- Detects damaged areas in handwritten Sinhala documents.
- Classifies damage by type: ink blotches, wear and tear, blurred text, insect attacks, and peeled-off areas.
- Categorizes each detected damage as either minor (affecting parts of letters) or contextual (missing full words or phrases).
- Estimates missing content based on word spacing, character structure, and paragraph context.
- Restores blurred or bleed-through text to improve readability.
- Reconstructs minor letter-level damages while preserving original handwriting style.
- Ensures restored text blends seamlessly into the document.
- Restores larger missing sections, including entire letters, words, and phrases.
- Ensures the restored text maintains semantic coherence with the surrounding content.
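
The minor/contextual split and the spacing-based estimate of missing content described above can be illustrated with a small sketch. This is not the project's implementation. The `DamageRegion` structure, the function names, and the "smaller than one average character" threshold are all assumptions made for illustration, and the average character width and line height are assumed to be measured elsewhere from undamaged parts of the page.

```python
from dataclasses import dataclass


@dataclass
class DamageRegion:
    """Bounding box of a detected damaged area, in pixels (hypothetical structure)."""
    x: int
    y: int
    width: int
    height: int
    damage_type: str  # e.g. "ink_blotch" or "insect_attack"


def categorize(region: DamageRegion, avg_char_width: float, avg_line_height: float) -> str:
    """Label a region "minor" or "contextual".

    Assumption: a region narrower than one average character and no taller
    than one text line only affects part of a letter; anything larger is
    treated as missing whole letters, words, or phrases.
    """
    if region.width < avg_char_width and region.height <= avg_line_height:
        return "minor"
    return "contextual"


def estimate_missing_chars(region: DamageRegion, avg_char_width: float) -> int:
    """Rough count of characters that would fit across the damaged width."""
    return max(1, round(region.width / avg_char_width))


# Example values are made up for illustration.
small = DamageRegion(x=10, y=20, width=12, height=30, damage_type="ink_blotch")
large = DamageRegion(x=0, y=0, width=120, height=38, damage_type="insect_attack")

small_label = categorize(small, avg_char_width=24.0, avg_line_height=40.0)
large_label = categorize(large, avg_char_width=24.0, avg_line_height=40.0)
large_count = estimate_missing_chars(large, avg_char_width=24.0)
```

A size threshold like this is only a first-pass heuristic. In practice the count would be combined with character structure and paragraph context, as the list above describes, before any text is reconstructed.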