Questions About Training the SFace Model and Discrepancies in Model Size, Accuracy, and Output Dimensions #288

sayyid-abolfazl · 2025-03-12T05:16:07Z

Hello and thank you for your amazing work on the SFace model!

I am training the SFace model with the code in this repository and have tested it on several datasets. My goal is first to match the accuracy of your pre-trained model, and then to train on my custom dataset. However, my results differ from the official model in several ways, and I would greatly appreciate your guidance on resolving them. Below are my observations and questions:

1. Model Size Discrepancy

The size of my trained SFace model is 5.1 MB, while the official model provided in the repository is 39 MB.
- What could be causing this significant difference in model size?
- Are there specific configurations or components included in the official model that I might be missing?
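
As a rough sanity check on the two file sizes, here is a minimal sketch that converts them into approximate weight counts. It assumes every weight is stored as a 4-byte float32, that "MB" means 10^6 bytes, and that file metadata is negligible; none of these are confirmed for the official model, and `approx_param_count` is a name made up for this example.

```python
# Back-of-the-envelope estimate only: assumes float32 weights (4 bytes each),
# decimal megabytes, and negligible metadata in the saved file.
BYTES_PER_FLOAT32 = 4


def approx_param_count(file_size_mb: float,
                       bytes_per_weight: int = BYTES_PER_FLOAT32) -> int:
    """Estimate how many weights fit in a file of the given size."""
    return round(file_size_mb * 1_000_000 / bytes_per_weight)


mine = approx_param_count(5.1)      # my trained model
official = approx_param_count(39)   # official model from the repository

print(f"my model:       ~{mine:,} weights")
print(f"official model: ~{official:,} weights")
print(f"ratio:          ~{official / mine:.1f}x")
```

Under these assumptions the gap is several-fold, which looks too large to come from the embedding layer alone and may point to a different backbone or to extra tensors (for example optimizer or head state) being stored in the file.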

2. Accuracy Discrepancy

The accuracy of my trained SFace model is significantly lower than the accuracy of your pre-trained model.
- What could be the potential reasons for this gap in performance?
- What steps or adjustments can I take to improve the accuracy to match your pre-trained model?

3. Output Embedding Size Discrepancy

My trained model produces an embedding size of 512, while I noticed that the official SFace model has an embedding size of 128.
- Why is there a difference in the embedding sizes?
- How can I configure my training process to produce an embedding size of 128 instead of 512?

4. Training Configuration Review

Below is the configuration I used for training my model. Could you please review it and let me know if there are any parameters or settings that need to be adjusted to achieve results closer to your pre-trained model?

{
    'SEED': 1337,
    'INPUT_SIZE': [112, 112],
    'EMBEDDING_SIZE': 512,
    'DROP_LAST': True,
    'WEIGHT_DECAY': 0.0005,
    'MOMENTUM': 0.9,
    'GPU_ID': [0],
    'DEVICE': device(type='cuda', index=0),
    'MULTI_GPU': False,
    'NUM_EPOCH': 125,
    'STAGES': [35, 65, 95, 205],
    'LR': 0.1,
    'BATCH_SIZE': 240,
    'DATA_ROOT': '../faces_emore/',
    'EVAL_PATH': '../eval/',
    'BACKBONE_NAME': 'MobileFaceNet',
    'HEAD_NAME': 'SFaceLoss',
    'TARGET': ['cfp_ff', 'cplfw', 'calfw', 'cfp_fp', 'vgg2_fp', 'lfw', 'agedb_30'],
    'BACKBONE_RESUME_ROOT': '',
    'HEAD_RESUME_ROOT': '',
    'WORK_PATH': 'face_empire'
}

parser.add_argument('--param_s', default=64.0, type=float)
parser.add_argument('--param_k', default=80.0, type=float)
parser.add_argument('--param_a', default=0.87, type=float)
parser.add_argument('--param_b', default=1.22, type=float)

If any of the following files need specific changes, please advise:

- sface_torch/config.py
- sface_torch/train_SFace_torch.py
- sface_torch/backbone/model_mobilefacenet.py
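
To make my reading of the schedule explicit, here is a minimal sketch of how I assume 'LR' and 'STAGES' interact: the learning rate is multiplied by a fixed factor each time the epoch index reaches a stage boundary. The tenfold decay (`gamma=0.1`), the zero-based epoch indexing, and both function names are my assumptions, not something I have confirmed in the training script.

```python
# Assumed step schedule: multiply the learning rate by `gamma` at every
# stage boundary. gamma=0.1 and zero-based epochs are assumptions.
def lr_at_epoch(epoch, base_lr=0.1, stages=(35, 65, 95, 205), gamma=0.1):
    """Learning rate in effect during the given epoch."""
    stages_passed = sum(1 for boundary in stages if epoch >= boundary)
    return base_lr * gamma ** stages_passed


def unreachable_stages(stages, num_epoch):
    """Stage boundaries that are never reached within num_epoch epochs."""
    return [boundary for boundary in stages if boundary >= num_epoch]


for epoch in (8, 35, 65, 95, 124):
    print(epoch, lr_at_epoch(epoch))

print("never reached:", unreachable_stages((35, 65, 95, 205), num_epoch=125))
```

If this reading is right, the last entry in 'STAGES' (205) lies beyond 'NUM_EPOCH' (125), so only three decay steps would ever take effect with my configuration.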

5. Training Parameters and Threshold Details

Training Parameters: What specific parameters did you use to train the official SFace model (e.g., learning rate schedules, optimizer settings, data augmentation techniques, etc.)? This would help me align my training process with yours.
Threshold Details: I noticed the use of a cosine threshold, threshold_cosine = 0.363, in some evaluation scripts.
- How was this threshold value determined?
- Is it dataset-specific, or is it a general threshold applicable across different datasets?
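
Part of my confusion is that the thresholds printed during my evaluation runs (around 1.4 to 1.7) are on a different scale from 0.363. The sketch below converts between the two scales, assuming the evaluation thresholds are squared Euclidean distances between L2-normalized embeddings. That assumption is mine and may not match the scripts. For unit vectors the identity ||a - b||^2 = 2 - 2 cos(a, b) holds, so the conversion is a one-liner in each direction.

```python
# Conversion between cosine similarity and squared Euclidean distance.
# Valid only when both embeddings are L2-normalized (unit length).
def cosine_to_sq_l2(cos_sim: float) -> float:
    """Squared L2 distance between two unit vectors with this cosine."""
    return 2.0 - 2.0 * cos_sim


def sq_l2_to_cosine(sq_dist: float) -> float:
    """Cosine similarity between two unit vectors at this squared distance."""
    return 1.0 - sq_dist / 2.0


# threshold_cosine from the evaluation scripts, expressed as a distance
print(cosine_to_sq_l2(0.363))

# a distance threshold such as 1.455, expressed as a cosine similarity
print(sq_l2_to_cosine(1.455))
```

Note that the direction of the comparison flips: a pair counts as a match when cosine similarity is above its threshold, but when distance is below its threshold.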

6. Training Logs for Reference

Below is a sample of my training logs for reference. If you notice anything unusual or suboptimal in the metrics or training behavior, please let me know:

Epoch 8 Batch 185960    Speed: 797.15 samples/s    intra_Loss -25.3453 (-26.1291)    inter_Loss 16.8062 (18.1256)    Wyi 0.4486 (0.4653)    Wj 0.0001 (0.0001)    Prec@1 77.917 (82.729)
Epoch 8 Batch 185980    Speed: 696.71 samples/s    intra_Loss -26.5736 (-26.1947)    inter_Loss 19.4150 (18.5150)    Wyi 0.4811 (0.4683)    Wj 0.0001 (0.0001)    Prec@1 87.500 (82.583)
Epoch 8 Batch 186000    Speed: 709.95 samples/s    intra_Loss -26.4986 (-26.2168)    inter_Loss 18.4467 (18.5980)    Wyi 0.4808 (0.4673)    Wj 0.0001 (0.0001)    Prec@1 86.250 (82.333)
Learning rate 0.100000
Perform Evaluation on ['cfp_ff', 'cplfw', 'calfw', 'cfp_fp', 'vgg2_fp', 'lfw', 'agedb_30'] , and Save Checkpoints...
(14000, 512)
[cfp_ff][186000]XNorm: 102.98364
[cfp_ff][186000]Accuracy-Flip: 0.98029+-0.00629
[cfp_ff][186000]Best-Threshold: 1.45500
(12000, 512)
[cplfw][186000]XNorm: 85.15097
[cplfw][186000]Accuracy-Flip: 0.78867+-0.02125
[cplfw][186000]Best-Threshold: 1.54200
(12000, 512)
[calfw][186000]XNorm: 103.92467
[calfw][186000]Accuracy-Flip: 0.90883+-0.01038
[calfw][186000]Best-Threshold: 1.49800
(14000, 512)
[cfp_fp][186000]XNorm: 86.52919
[cfp_fp][186000]Accuracy-Flip: 0.80686+-0.02192
[cfp_fp][186000]Best-Threshold: 1.68900
(10000, 512)
[vgg2_fp][186000]XNorm: 89.77735
[vgg2_fp][186000]Accuracy-Flip: 0.84040+-0.01292
[vgg2_fp][186000]Best-Threshold: 1.59500
(12000, 512)
[lfw][186000]XNorm: 104.07785
[lfw][186000]Accuracy-Flip: 0.98400+-0.00642
[lfw][186000]Best-Threshold: 1.43000
(12000, 512)
[agedb_30][186000]XNorm: 100.46037
[agedb_30][186000]Accuracy-Flip: 0.89783+-0.01895
[agedb_30][186000]Best-Threshold: 1.57000
highest_acc: [0.9847142857142857, 0.8046666666666666, 0.9238333333333332, 0.8068571428571429, 0.85, 0.9865, 0.9065000000000001]
Epoch 8 Batch 186020    Speed: 56.99 samples/s    intra_Loss -26.4323 (-26.0712)    inter_Loss 19.7517 (18.7060)    Wyi 0.4774 (0.4650)    Wj 0.0001 (0.0001)    Prec@1 85.000 (82.271)
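
For context on the Best-Threshold lines, this is how I assume such a value is chosen: sweep candidate thresholds over the pair distances and keep the one with the highest verification accuracy. The sketch uses made-up toy distances and a single pass over all pairs; the actual evaluation code may use k-fold cross-validation or a different distance, and `best_threshold` is a hypothetical helper name.

```python
# Toy threshold sweep: a pair is predicted "same identity" when its
# distance is below the threshold. All data here is invented for illustration.
def best_threshold(distances, same_labels, candidates):
    """Return (threshold, accuracy) for the best-scoring candidate."""
    best_t, best_acc = None, -1.0
    for t in candidates:
        correct = sum((d < t) == same
                      for d, same in zip(distances, same_labels))
        acc = correct / len(distances)
        if acc > best_acc:  # strict comparison keeps the first best threshold
            best_t, best_acc = t, acc
    return best_t, best_acc


distances = [0.4, 0.9, 1.2, 1.6, 1.9, 2.4]            # toy pair distances
same_labels = [True, True, True, False, False, False]  # toy ground truth
candidates = [i / 100 for i in range(400)]             # 0.00 to 3.99

threshold, accuracy = best_threshold(distances, same_labels, candidates)
print(threshold, accuracy)
```

A threshold picked this way depends on the pairs it was tuned on, which is why I am asking whether 0.363 is expected to transfer across datasets.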

7. Model Conversion to ONNX

I would like to convert my trained SFace model to the ONNX format for deployment.

- Could you please provide guidance on how to properly convert the SFace model to ONNX?
- Are there any specific considerations or steps I should follow to ensure compatibility and performance after conversion?
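
This is the kind of export I have in mind, as a minimal sketch built on PyTorch's `torch.onnx.export`. It assumes the backbone is a regular `torch.nn.Module` that takes a 3-channel 112x112 input (matching 'INPUT_SIZE' in my configuration). The tensor names "data" and "embedding", the opset version, and both helper names are placeholders I picked, not values taken from the official model.

```python
# Sketch of an ONNX export for a trained backbone. Input layout (N, 3, H, W),
# tensor names, and opset version are assumptions, not the official settings.
def dummy_input_shape(input_size=(112, 112), batch=1, channels=3):
    """Shape of the example input used to trace the model."""
    return (batch, channels, input_size[0], input_size[1])


def export_to_onnx(model, onnx_path, input_size=(112, 112), opset_version=11):
    """Export `model` (a torch.nn.Module with loaded weights) to ONNX."""
    import torch  # imported here so dummy_input_shape works without PyTorch

    model.eval()  # freeze batch-norm statistics and disable dropout
    dummy = torch.randn(*dummy_input_shape(input_size))
    torch.onnx.export(
        model,
        dummy,
        onnx_path,
        input_names=["data"],          # placeholder name
        output_names=["embedding"],    # placeholder name
        opset_version=opset_version,
        # let the batch dimension vary at inference time
        dynamic_axes={"data": {0: "batch"}, "embedding": {0: "batch"}},
    )


print(dummy_input_shape((112, 112)))
```

I would call it as `export_to_onnx(backbone, "my_sface.onnx")` after loading the checkpoint, then compare the ONNX output with the PyTorch output on the same input to confirm that the two agree.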

Thank you so much for your time and assistance! I am looking forward to your insights and recommendations.

fengyuentau added the "help wanted" (Extra attention is needed) label on Mar 12, 2025
