This is an undergraduate research project at the University of Hong Kong, supervised by Prof. Kenneth K.Y. Wong, in which we achieved SOTA performance in terms of MS-SSIM (0.512578) and Local Distortion (7.581896). We tried different Content+3D combinations for multimodal learning and found that RGB+D is the best combination for this task.
Please refer to DewarpNet for how to get started, and make sure the data is loaded correctly. We provide models, loaders, training scripts, and inference scripts for various combinations of Content+3D, as well as our joint training code.
You can download our best model here and our results here. We evaluate the results using the same code as DocUNet, with MATLAB R2022a. Note that in the benchmark set, the 64th sample is upside down; please rotate it back before evaluation (see the snippet below).
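If helpful, a one-off fix in Python (the file path is hypothetical; adapt it to however your results are stored):

```python
from PIL import Image

# Hypothetical path to the dewarped output for benchmark sample 64.
path = "results/64.png"
img = Image.open(path)
img.rotate(180).save(path)  # a 180-degree rotation puts it right side up
```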
- We achieved SOTA performance compared with methods using the same pipeline. Specifically, we improved on the SOTA method by 0.32% and 1.93% in terms of MS-SSIM and LD respectively, using only about 1/3 of the parameters and 79.51% of the GPU memory.
- In document dewarping, we are the first to combine RGB and 3D information for multimodal learning.
- We propose the Adjoint Loss and the Identical Loss so that the model can distinguish between 3D and RGB information.
- For the semantic segmentation task, we use cross-entropy loss.
- For the depth prediction task, we use L1 loss and the ground-truth masked image.
- For the BM prediction model, we input the ground-truth depth and masked image (see the sketch after this list).
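A minimal PyTorch sketch of this per-stage supervision; all tensor shapes and variable names are illustrative, not the repo's actual ones:

```python
import torch
import torch.nn as nn

ce_loss = nn.CrossEntropyLoss()  # semantic segmentation
l1_loss = nn.L1Loss()            # depth (and later BM) regression

# Illustrative shapes: batch of 4, 128x128 inputs.
seg_logits = torch.randn(4, 2, 128, 128)     # per-pixel document/background scores
seg_gt = torch.randint(0, 2, (4, 128, 128))  # ground-truth document mask
loss_seg = ce_loss(seg_logits, seg_gt)

depth_pred = torch.randn(4, 1, 128, 128)
depth_gt = torch.randn(4, 1, 128, 128)
loss_depth = l1_loss(depth_pred, depth_gt)   # depth supervised with L1

# When the BM model is trained on its own, its input is the ground-truth
# masked image concatenated with the ground-truth depth:
masked_img_gt = torch.randn(4, 3, 128, 128)
bm_input = torch.cat([masked_img_gt, depth_gt], dim=1)  # 4-channel input
```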
We trained the BM model for 81 epochs, with batch size 200 and learning rate 0.0001. We halve the learning rate when the validation loss does not decrease for 5 consecutive epochs. We do not use the auxiliary loss here because we found it hurts performance at this stage.
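A sketch of that schedule using PyTorch's `ReduceLROnPlateau`; the placeholder network and the stand-in validation loss below are not from the repo:

```python
import torch
import torch.nn as nn

# Placeholder BM network; the real architecture lives in the repo's model files.
bm_model = nn.Conv2d(4, 2, kernel_size=3, padding=1)
optimizer = torch.optim.Adam(bm_model.parameters(), lr=1e-4)
# Halve the learning rate after 5 epochs without validation improvement.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=5)

for epoch in range(81):
    # ... run one training epoch with batch size 200 ...
    val_loss = 0.0  # stand-in; pass the real validation loss here
    scheduler.step(val_loss)
```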
We then do joint training of the 3 models, i.e., the latter 2 models take the previous models' outputs as their inputs. We minimize all losses together: the cross-entropy loss for semantic segmentation, the L1 loss for depth prediction, the L1 loss for BM prediction, and the auxiliary losses.
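A minimal sketch of one joint step under stated assumptions (single-layer placeholder networks, equal loss weights, and a hard mask between stages, none of which are confirmed to match the repo):

```python
import torch
import torch.nn as nn

# Placeholder single-layer "networks"; the real architectures live in the repo.
seg_net = nn.Conv2d(3, 2, 3, padding=1)    # RGB -> 2-class mask logits
depth_net = nn.Conv2d(3, 1, 3, padding=1)  # masked RGB -> depth
bm_net = nn.Conv2d(4, 2, 3, padding=1)     # masked RGB + depth -> backward map

ce, l1 = nn.CrossEntropyLoss(), nn.L1Loss()

def joint_step(img, seg_gt, depth_gt, bm_gt):
    seg_logits = seg_net(img)
    # argmax is non-differentiable, so in this sketch the segmentation net
    # learns only from its own cross-entropy term, not from downstream losses.
    mask = seg_logits.argmax(1, keepdim=True).float()
    masked_img = img * mask                         # feed downstream stages
    depth = depth_net(masked_img)
    bm = bm_net(torch.cat([masked_img, depth], dim=1))
    # Equal weights are an assumption; the auxiliary (Adjoint/Identical)
    # losses would be added to this sum as well.
    return ce(seg_logits, seg_gt) + l1(depth, depth_gt) + l1(bm, bm_gt)

loss = joint_step(torch.randn(2, 3, 128, 128),
                  torch.randint(0, 2, (2, 128, 128)),
                  torch.randn(2, 1, 128, 128),
                  torch.randn(2, 2, 128, 128))
loss.backward()
```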
Our framework is based on DewarpNet.