URGENT HELP: Fine-Tuning: SAST Algorithm - Failed Inference #14488

VishyAnand28 · 2025-01-03T16:18:37Z


Dear Paddle Community,

I recently had some success training with the DB algorithm, and the results were decent. I tried the same with SAST, but the inference results are extremely poor: the model both fails to detect bounding boxes and draws crossed bounding boxes.

Code:
from paddleocr import PaddleOCR

ocr = PaddleOCR(use_gpu=True, det_algorithm='SAST', det_model_dir=r'C:\Users\I011786\Pictures\01_Paddle\04_Paddle_Models\03_detector_pretrained\02_SAST\v4_train\v4_inference', use_angle_cls=True, lang='en')
# ......
img_path = r"............\Pictures\01_Paddle\03_img_test\custom_img_11.jpg"
result = ocr.ocr(img_path, cls=True)  # run the pipeline with the angle classifier enabled

I ran inference on very simple images that I took at random. It performs even worse on complex images, mostly producing crossed bounding boxes. I varied nms_thresh from 0.2 to 0.6, but it did not help. I am unsure what went wrong.
train.log
Config file is attached in the comments.
PFA my train log and picture!

I humbly request guidance from @GreatV @WenmuZhou @LDOUBLEV @MissPenguin @tink2123 @UserWangZz and others.

VishyAnand28 (Author) · 2025-01-03T16:19:11Z

Config File:
Global:
  debug: false
  use_gpu: true
  epoch_num: 500
  log_smooth_window: 20
  print_batch_step: 10
  save_model_dir: /home/jovyan/01_Paddle/04_Paddle_Models/03_detector_pretrained/02_SAST/v4_train
  save_epoch_step: 50
  eval_batch_step:
  - 0
  - 400
  cal_metric_during_train: false
  pretrained_model: /home/jovyan/01_Paddle/04_Paddle_Models/03_detector_pretrained/02_SAST/det_r50_vd_sast_icdar15_v2.0_train/best_accuracy.pdparams
  checkpoints: null
  save_inference_dir: null
  use_visualdl: false
  infer_img: null
  save_res_path: null
Architecture:
  model_type: det
  algorithm: SAST
  Transform: null
  Backbone:
    name: ResNet_SAST
    layers: 50
  Neck:
    name: SASTFPN
    with_cab: true
  Head:
    name: SASTHead
Loss:
  name: SASTLoss
Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Cosine
    learning_rate: 0.0005
    warmup_epoch: 5
  regularizer:
    name: L2
    factor: 0.0001
PostProcess:
  name: SASTPostProcess
  score_thresh: 0.5
  sample_pts_num: 2
  nms_thresh: 0.2
  expand_scale: 1.0
  shrink_ratio_of_width: 0.3
Metric:
  name: DetMetric
  main_indicator: hmean
Train:
  dataset:
    name: SimpleDataSet
    data_dir: /home/jovyan/01_Paddle
    label_file_list:
    - /home/jovyan/01_Paddle/02_paddle_img_det_aug_v1/paddle_det_prep_aug_v1_train.txt
    ratio_list:
    - 1.0
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - DetLabelEncode: null
    - SASTProcessTrain:
        image_shape:
        - 512
        - 512
        min_crop_side_ratio: 0.3
        min_crop_size: 24
        min_text_size: 4
        max_text_size: 512
    - KeepKeys:
        keep_keys:
        - image
        - score_map
        - border_map
        - training_mask
        - tvo_map
        - tco_map
  loader:
    shuffle: true
    drop_last: false
    batch_size_per_card: 16
    num_workers: 4
Eval:
  dataset:
    name: SimpleDataSet
    data_dir: /home/jovyan/01_Paddle
    label_file_list:
    - /home/jovyan/01_Paddle/02_paddle_img_det_aug_v1/paddle_det_prep_aug_v1_eval.txt
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - DetLabelEncode: null
    - DetResizeForTest:
        resize_long: 1536
    - NormalizeImage:
        scale: 1./255.
        mean:
        - 0.485
        - 0.456
        - 0.406
        std:
        - 0.229
        - 0.224
        - 0.225
        order: hwc
    - ToCHWImage: null
    - KeepKeys:
        keep_keys:
        - image
        - shape
        - polys
        - ignore_tags
  loader:
    shuffle: false
    drop_last: false
    batch_size_per_card: 1
    num_workers: 2
profiler_options: null


GreatV (Maintainer) · 2025-01-03T16:20:14Z

It seems like you are encountering issues with the inference results after fine-tuning the SAST algorithm for text detection. Below is a detailed explanation and some potential solutions based on the information you provided and related documents.

1. Issues with SAST Inference Results

The problems you described include:

- Bounding boxes are not being detected properly.
- Cross bounding boxes are being drawn.
- Adjusting nms_thresh did not help.

Potential Reasons for Poor Inference Results:

- Pre-trained Model Mismatch: Ensure that the pre-trained model you are using is compatible with your configuration file. For SAST, you should use the pre-trained model provided for SAST (e.g., ResNet50_vd_ssld_pretrained.pdparams) from the PaddleOCR Detection Models documentation.
- Incorrect Training Configuration: Verify that the configuration file (det_r50_vd_sast_icdar15.yml) is set up correctly for your dataset. Key parameters to check:
  - Backbone and Pretrained Path: Ensure the pretrained_model path points to the ResNet50 pre-trained model.
  - Dataset Annotations: Ensure that your training dataset annotations are in the correct format for SAST. SAST requires both quadrilateral annotations and text-level annotations.
- Data Augmentation: Over-aggressive data augmentation (e.g., rotations, scaling) can sometimes degrade performance, especially for simple images. Review your augmentation settings in the configuration file.
- Overfitting or Underfitting: If your training dataset is small or not diverse, the model may overfit or fail to generalize properly. Check the training logs (train.log) for signs of overfitting, such as a large gap between training and validation loss.

2. Suggested Steps to Debug and Improve Results

Here are some actionable steps to address the issues:

Step 1: Verify the Pre-trained Model

Make sure you downloaded and used the correct pre-trained model:

wget -P ./pretrain_models/ https://paddleocr.bj.bcebos.com/pretrained/ResNet50_vd_ssld_pretrained.pdparams

Ensure that the pretrained_model path in your .yml file matches where this model is saved.
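
A quick way to rule out a path typo before launching training is to check that the weights file actually exists. The sketch below is only an illustration: `resolve_params_path` is a hypothetical helper, not a PaddleOCR API, and it assumes the `pretrained_model` value may be written with or without the `.pdparams` suffix.

```python
import os


def resolve_params_path(pretrained_model: str) -> str:
    """Return the .pdparams file that a `pretrained_model` setting points to.

    Hypothetical helper for illustration. Accepts the value with or
    without the `.pdparams` suffix and raises if the file is missing.
    """
    path = pretrained_model
    if not path.endswith(".pdparams"):
        path += ".pdparams"
    if not os.path.isfile(path):
        raise FileNotFoundError(f"pretrained weights not found: {path}")
    return path


if __name__ == "__main__":
    # Location taken from the wget command above; adjust to your own layout.
    try:
        print(resolve_params_path("./pretrain_models/ResNet50_vd_ssld_pretrained"))
    except FileNotFoundError as err:
        print(err)
```

Running this against the exact string from the .yml file catches the common case where the download landed in a different directory than the one the config references.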

Step 2: Fine-tuning Configuration

Update the configuration file (det_r50_vd_sast_icdar15.yml) to ensure it aligns with your dataset and training requirements:

- Set the pretrained_model path.
- Ensure the batch_size and learning_rate are appropriate for your hardware and dataset size.
- Verify that nms_thresh is set correctly, but note that extreme values (e.g., too low or too high) can worsen results.

Step 3: Debugging Cross Bounding Boxes

Cross bounding boxes often indicate issues with the post-processing step, particularly with the NMS (Non-Maximum Suppression) or thresholding values. Try:

- Adjusting the score_thresh parameter in the configuration file.
- Inspecting the output of the model before NMS to confirm whether the issue originates from the raw predictions or the NMS step.
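
To quantify how many predicted boxes are actually crossed, one option is to flag every quadrilateral whose opposite edges intersect (a "bow-tie" shape). The following is a minimal sketch in plain Python, not PaddleOCR code: `is_self_intersecting` is a hypothetical helper, and it assumes each box is a sequence of four `[x, y]` points in drawing order.

```python
def _cross(o, a, b):
    """Z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def _segments_cross(p1, p2, p3, p4):
    """True if segment p1-p2 properly crosses segment p3-p4."""
    d1 = _cross(p3, p4, p1)
    d2 = _cross(p3, p4, p2)
    d3 = _cross(p1, p2, p3)
    d4 = _cross(p1, p2, p4)
    # Each segment's endpoints must lie strictly on opposite sides of the other.
    return (d1 * d2 < 0) and (d3 * d4 < 0)


def is_self_intersecting(quad):
    """True if a 4-point polygon is a bow-tie, i.e. two opposite edges cross."""
    p0, p1, p2, p3 = quad
    return _segments_cross(p0, p1, p2, p3) or _segments_cross(p1, p2, p3, p0)


if __name__ == "__main__":
    normal_box = [[0, 0], [10, 0], [10, 5], [0, 5]]
    crossed_box = [[0, 0], [10, 0], [0, 5], [10, 5]]  # last two corners swapped
    print(is_self_intersecting(normal_box), is_self_intersecting(crossed_box))
```

With the `result` returned by `ocr.ocr(img_path, cls=True)`, the boxes would typically be `line[0]` for each `line` in `result[0]`; that layout is an assumption and may differ between PaddleOCR versions. If many boxes are flagged, that may point to vertex ordering or the predicted geometry rather than NMS, which would be consistent with nms_thresh changes having no visible effect.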

Step 4: Evaluate Training Data Quality

- Ensure that your training data annotations are accurate, especially the quadrilateral annotations required by SAST.
- Check for any mislabeled or corrupted samples in your dataset.
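
A small script can catch malformed label lines before they reach training. The sketch below assumes the usual PaddleOCR detection label layout, one line per image with the image path, a tab, and a JSON list of `{"transcription": ..., "points": [[x, y], ...]}` entries; `check_label_line` is a hypothetical helper written for this illustration.

```python
import json


def check_label_line(line: str) -> list:
    """Return a list of problems found in one detection label line.

    Assumed format: <image path> TAB <JSON list of
    {"transcription": str, "points": [[x, y], ...]}>.
    """
    parts = line.rstrip("\n").split("\t")
    if len(parts) != 2:
        return ["expected exactly one tab between image path and annotations"]
    try:
        annotations = json.loads(parts[1])
    except json.JSONDecodeError as err:
        return [f"annotation is not valid JSON: {err}"]
    if not isinstance(annotations, list):
        return ["annotation JSON must be a list"]

    problems = []
    for i, ann in enumerate(annotations):
        if not isinstance(ann, dict):
            problems.append(f"box {i}: entry is not a JSON object")
            continue
        if "transcription" not in ann:
            problems.append(f"box {i}: missing 'transcription'")
        points = ann.get("points")
        if not isinstance(points, list) or len(points) < 4:
            problems.append(f"box {i}: needs at least 4 points")
            continue
        for pt in points:
            if not (isinstance(pt, (list, tuple)) and len(pt) == 2):
                problems.append(f"box {i}: malformed point {pt!r}")
                break
    return problems
```

Looping over each line of the train and eval label files and printing any non-empty result, together with the line number, gives a short list of samples to inspect by hand.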

Step 5: Monitor Training Metrics

Inspect your train.log file for:

- Training loss and validation loss trends.
- Whether the model converges properly. If the loss fluctuates significantly, consider reducing the learning rate or using a smaller batch size.
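
Loss trends are easier to judge when averaged per epoch. The sketch below parses a training log under an assumed line format (`epoch: [1/500], global_step: 10, ..., loss: 4.0, ...`); the format, the regular expression, and the helper names are assumptions for illustration and may need adjusting to match the actual train.log.

```python
import re

# Assumed line shape: "... epoch: [1/500], global_step: 10, lr: ..., loss: 4.0, ..."
# "\bloss:" skips prefixed entries such as "score_loss:".
LOSS_RE = re.compile(
    r"epoch: \[(\d+)/\d+\].*?global_step: (\d+).*?\bloss: ([-+0-9.eE]+)"
)


def parse_losses(lines):
    """Yield (epoch, global_step, loss) for every line that matches LOSS_RE."""
    for line in lines:
        match = LOSS_RE.search(line)
        if not match:
            continue
        try:
            loss = float(match.group(3))
        except ValueError:
            continue  # skip values that are not parseable numbers
        yield int(match.group(1)), int(match.group(2)), loss


def mean_loss_per_epoch(lines):
    """Return {epoch: mean total loss} in ascending epoch order."""
    sums, counts = {}, {}
    for epoch, _step, loss in parse_losses(lines):
        sums[epoch] = sums.get(epoch, 0.0) + loss
        counts[epoch] = counts.get(epoch, 0) + 1
    return {epoch: sums[epoch] / counts[epoch] for epoch in sorted(sums)}
```

A possible usage is `with open("train.log", encoding="utf-8") as f: print(mean_loss_per_epoch(f))`. A per-epoch mean that stops decreasing early, or oscillates widely, is the kind of signal this step is looking for.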

Step 6: Test with Pre-trained SAST Model Without Fine-tuning

Run inference using the default pre-trained SAST model on your test images:

ocr = PaddleOCR(use_gpu=True, det_algorithm='SAST', use_angle_cls=True, lang='en')

This can help you determine whether the issue lies with your fine-tuning process or the algorithm itself.

3. Fine-tuning Text Recognition with VisionLAN

Regarding your confusion about VisionLAN:

Pre-trained Model for Fine-tuning:
You can use the pre-trained en_PP-OCRv4_rec model for general text recognition tasks. However, if you specifically want to fine-tune VisionLAN, download the VisionLAN-specific pre-trained model from the link provided in the VisionLAN documentation.

Example for VisionLAN pre-trained model:
```
wget -P ./pretrain_models/ https://paddleocr.bj.bcebos.com/VisionLAN/visionlan_pretrained_model.tar
```
Switching Configuration:
To fine-tune VisionLAN, modify the configuration file to point to the VisionLAN pre-trained model and adjust the dataset path. Ensure the model type in the configuration aligns with VisionLAN.

4. Additional Resources

Fine-tuning Guide: PaddleOCR Fine-tuning Documentation
SAST Configuration Example: SAST Config File
VisionLAN Guide: VisionLAN Documentation

Conclusion

The poor inference results with SAST are likely due to a mismatch in the pre-trained model, incorrect configuration, or issues with your training data. Start by verifying the pre-trained model and configuration file, then debug your dataset and training process. For VisionLAN, ensure you use the correct pre-trained model and configuration for fine-tuning.

Feel free to share additional details (e.g., snippets from your train.log or configuration file) for more specific guidance.

Response generated by feifei-bot | chatgpt-4o-latest


VishyAnand28 (Author) · 2025-01-03T21:50:31Z

@GreatV I tried using the ResNet50_vd_ssld_pretrained.pdparams file, but the validation dataset takes forever to load.

It has been hours, just to load 800 data points. Why is that?
