RGBD Document Unwarping

This is an undergraduate research project at the University of Hong Kong, supervised by Prof. Kenneth K.Y. Wong. We achieved SOTA performance in terms of MS-SSIM (0.512578) and Local Distortion (7.581896). We tried different Content+3D combinations for multimodal learning and found that RGB+D is the best combination for this task.

How to Use

Please refer to DewarpNet for how to get started, and make sure the data is loaded correctly. We provide different models, loaders, and training and inference scripts for the various Content+3D combinations. We also provide our joint training code.

Best Model and Result

You can download our best model here and our results here. We evaluate the results using the same code as DocUNet; our MATLAB version is 2022a. Note that in the benchmark set, the 64th sample is upside down; please rotate it back before evaluation.
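A quick way to apply this fix (a minimal sketch, assuming the image for sample 64 is a PNG on disk; the path below is an assumption, adjust it to your own file naming):

```python
# Minimal sketch: rotate the upside-down 64th sample back by 180 degrees
# before running the DocUNet evaluation code. The path is a placeholder,
# not the actual file name used in this repository.
from PIL import Image

path = "results/64_1.png"                  # hypothetical image for sample 64
Image.open(path).rotate(180).save(path)    # rotate in place and overwrite
```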

Contribution and Novelty

  1. We achieved SOTA performance compared with methods using the same pipeline. Specifically, we improve on the previous SOTA method by 0.32% and 1.93% in terms of MS-SSIM and LD respectively, using only about 1/3 of the parameters and 79.51% of the GPU memory.
  2. In document dewarping, we are the first to combine RGB and 3D information for multimodal learning.
  3. We propose an Adjoint Loss and an Identical Loss so that the model can distinguish 3D and RGB information.

Proposed Pipeline

[Figure: proposed pipeline]

Adjoint Loss and Identical Loss

[Figure: Adjoint Loss and Identical Loss]

Training Details

First train the three models using ground truth labels

For the semantic segmentation task, we use cross-entropy loss. For the depth prediction task, we use L1 loss and the ground-truth masked image. For the BM prediction model, we input the ground-truth depth and the masked image.
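A minimal sketch of how these per-task losses can be set up in PyTorch (tensor names, shapes, and the two-class mask below are assumptions for illustration, not the repository's actual code):

```python
import torch
import torch.nn as nn

# Illustrative loss setup for the ground-truth training stage.
B, H, W = 4, 128, 128

seg_logits = torch.randn(B, 2, H, W)           # predicted document-mask logits
gt_mask    = torch.randint(0, 2, (B, H, W))    # ground-truth mask (class ids)

depth_pred = torch.randn(B, 1, H, W)           # predicted depth map
gt_depth   = torch.randn(B, 1, H, W)           # ground-truth depth map

bm_pred = torch.randn(B, 2, H, W)              # predicted backward map (BM)
gt_bm   = torch.randn(B, 2, H, W)              # ground-truth backward map

mask = gt_mask.unsqueeze(1).float()            # restrict depth loss to the page

loss_seg   = nn.CrossEntropyLoss()(seg_logits, gt_mask)        # segmentation
loss_depth = nn.L1Loss()(depth_pred * mask, gt_depth * mask)   # masked-depth L1
loss_bm    = nn.L1Loss()(bm_pred, gt_bm)                       # BM regression L1
```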

For ground truth training

We train the BM model for 81 epochs, with batch size = 200 and learning rate = 0.0001. We halve the learning rate when the validation loss does not decrease for 5 consecutive epochs. We do not use the auxiliary losses at this stage because we found that they made the performance worse here.
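This schedule corresponds to PyTorch's ReduceLROnPlateau scheduler; a minimal sketch, assuming an Adam optimizer and a stand-in network (the real BM model and training loop live in the training scripts):

```python
import torch

# Illustrative schedule only: halve the learning rate when the validation
# loss has not decreased for 5 consecutive epochs, as described above.
bm_model  = torch.nn.Linear(10, 10)     # placeholder for the actual BM network
optimizer = torch.optim.Adam(bm_model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=5)

for epoch in range(81):
    # ... train for one epoch, then compute the validation loss ...
    val_loss = 0.0                      # placeholder for the real validation loss
    scheduler.step(val_loss)            # halves the LR after 5 stagnant epochs
```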

After ground truth training

We do joint training for the three models, i.e., the latter two models take the previous models' outputs as their inputs. We minimize all losses together: the cross-entropy loss for semantic segmentation, the L1 loss for depth prediction, the L1 loss for BM prediction, and the auxiliary losses.
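A minimal sketch of the combined objective in this joint stage (the four loss terms stand in for the per-task losses above, and the lambda weights are placeholders, not the values actually used in this project):

```python
import torch

# Illustrative joint objective: the terms below stand in for the segmentation,
# depth, BM, and auxiliary (Adjoint + Identical) losses; the lambda weights
# are placeholders, not the values used in this project.
loss_seg = loss_depth = loss_bm = loss_aux = torch.tensor(1.0, requires_grad=True)
lambda_seg, lambda_depth, lambda_bm, lambda_aux = 1.0, 1.0, 1.0, 1.0

total_loss = (lambda_seg   * loss_seg
              + lambda_depth * loss_depth
              + lambda_bm    * loss_bm
              + lambda_aux   * loss_aux)
# In the real pipeline, a single backward pass propagates gradients through
# all three models at once.
total_loss.backward()
```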

Quantitative Comparison

[Figure: quantitative comparison]

Qualitative Comparison

[Figures: qualitative comparisons]

Ablation Study

[Figure: ablation study]

Acknowledgment

Our framework is based on DewarpNet.
