Questions About Training the SFace Model and Discrepancies in Model Size, Accuracy, and Output Dimensions #288

Open
sayyid-abolfazl opened this issue Mar 12, 2025 · 0 comments
Labels
help wanted Extra attention is needed

sayyid-abolfazl commented Mar 12, 2025

Hello and thank you for your amazing work on the SFace model!

I am training the SFace model from this repository and have tested it on several datasets. My goal is to first match the accuracy of your pre-trained model and then train on my custom dataset. However, my results differ from the official model in several ways, and I would greatly appreciate your guidance on resolving them. My observations and questions are below:

1. Model Size Discrepancy

  • The size of my trained SFace model is 5.1 MB, while the official model provided in the repository is 39 MB.
    • What could be causing this significant difference in model size?
    • Are there specific configurations or components included in the official model that I might be missing?
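
A rough sanity check on the two file sizes, as a sketch only: it assumes both files hold nothing but float32 weights (4 bytes each) and that "MB" means 10^6 bytes. Neither assumption is confirmed for either file, and `approx_param_count` is a hypothetical helper, not part of the repository.

```python
def approx_param_count(file_size_mb: float, bytes_per_param: int = 4) -> int:
    """Rough parameter count implied by a weights file, ignoring any file overhead.

    Assumes float32 storage (4 bytes per parameter) and MB = 10**6 bytes.
    """
    return round(file_size_mb * 1_000_000 / bytes_per_param)


mine = approx_param_count(5.1)      # my trained model
official = approx_param_count(39)   # official model from the repository

print(f"my model:       ~{mine / 1e6:.2f}M parameters")
print(f"official model: ~{official / 1e6:.2f}M parameters")
print(f"ratio:          ~{official / mine:.1f}x")
```

Under these assumptions the two files differ by several times in parameter count, which would point toward a difference in the network itself (layers or widths) rather than in how the file was saved. Extra content in either file, such as optimizer state or a different dtype, would change this estimate.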

2. Accuracy Discrepancy

  • The accuracy of my trained SFace model is significantly lower than the accuracy of your pre-trained model. For example, the logs in section 6 show 98.40% on LFW but only 78.87% on CPLFW at epoch 8.
    • What could be the potential reasons for this gap in performance?
    • What steps or adjustments can I take to improve the accuracy to match your pre-trained model?

3. Output Embedding Size Discrepancy

  • My trained model produces 512-dimensional embeddings, whereas the official SFace model outputs 128-dimensional embeddings.
    • Why is there a difference in the embedding sizes?
    • How can I configure my training process to produce an embedding size of 128 instead of 512?

4. Training Configuration Review

Below is the configuration I used for training my model. Could you please review it and let me know if there are any parameters or settings that need to be adjusted to achieve results closer to your pre-trained model?

{
    'SEED': 1337,
    'INPUT_SIZE': [112, 112],
    'EMBEDDING_SIZE': 512,
    'DROP_LAST': True,
    'WEIGHT_DECAY': 0.0005,
    'MOMENTUM': 0.9,
    'GPU_ID': [0],
    'DEVICE': device(type='cuda', index=0),
    'MULTI_GPU': False,
    'NUM_EPOCH': 125,
    'STAGES': [35, 65, 95, 205],
    'LR': 0.1,
    'BATCH_SIZE': 240,
    'DATA_ROOT': '../faces_emore/',
    'EVAL_PATH': '../eval/',
    'BACKBONE_NAME': 'MobileFaceNet',
    'HEAD_NAME': 'SFaceLoss',
    'TARGET': ['cfp_ff', 'cplfw', 'calfw', 'cfp_fp', 'vgg2_fp', 'lfw', 'agedb_30'],
    'BACKBONE_RESUME_ROOT': '',
    'HEAD_RESUME_ROOT': '',
    'WORK_PATH': 'face_empire'
}

parser.add_argument('--param_s', default=64.0, type=float)
parser.add_argument('--param_k', default=80.0, type=float)
parser.add_argument('--param_a', default=0.87, type=float)
parser.add_argument('--param_b', default=1.22, type=float)

If any of the following files need specific changes, please advise:

  • sface_torch/config.py
  • sface_torch/train_SFace_torch.py
  • sface_torch/backbone/model_mobilefacenet.py
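
To make the schedule implied by the configuration above easier to check, here is a minimal sketch of the learning rate per epoch. It assumes the training script divides the learning rate by 10 at the start of each epoch listed in 'STAGES' and that epochs are counted from 0; both are assumptions about the script, and `lr_at_epoch` is a hypothetical helper.

```python
def lr_at_epoch(epoch, base_lr=0.1, stages=(35, 65, 95, 205), decay=0.1):
    """Learning rate in effect during `epoch`.

    Assumption: one multiplicative decay step is applied for every
    stage boundary that has been reached (epoch >= stage).
    """
    steps = sum(1 for s in stages if epoch >= s)
    return base_lr * decay ** steps


NUM_EPOCH = 125
STAGES = (35, 65, 95, 205)

for epoch in (8, 35, 65, 95, NUM_EPOCH - 1):
    print(f"epoch {epoch:3d}: lr = {lr_at_epoch(epoch):.6f}")

# Stage boundaries that lie beyond the last training epoch never trigger.
unreached = [s for s in STAGES if s >= NUM_EPOCH]
print("stages never reached:", unreached)
```

With 'NUM_EPOCH' set to 125, the last entry in 'STAGES' (205) is never reached, so under this assumption only three decay steps would occur during training.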

5. Training Parameters and Threshold Details

  • Training Parameters: What specific parameters did you use to train the official SFace model (e.g., learning rate schedules, optimizer settings, data augmentation techniques, etc.)? This would help me align my training process with yours.
  • Threshold Details: I noticed the use of a cosine threshold, threshold_cosine = 0.363, in some evaluation scripts.
    • How was this threshold value determined?
    • Is it dataset-specific, or is it a general threshold applicable across different datasets?
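
For comparing the cosine threshold with distance-style thresholds, a small conversion sketch follows. For unit-length embeddings a and b, the identity ||a - b||^2 = 2 - 2·cos(a, b) holds. Whether the Best-Threshold values in the logs of section 6 are squared Euclidean distances on normalized embeddings is an assumption, not something confirmed by the repository; the function names are hypothetical.

```python
import math


def cosine_to_squared_l2(cos_sim: float) -> float:
    """Squared Euclidean distance between two unit vectors with the given cosine."""
    return 2.0 - 2.0 * cos_sim


def squared_l2_to_cosine(sq_dist: float) -> float:
    """Cosine similarity between two unit vectors at the given squared distance."""
    return 1.0 - sq_dist / 2.0


threshold_cosine = 0.363
sq = cosine_to_squared_l2(threshold_cosine)
print(f"cosine {threshold_cosine} -> squared L2 {sq:.3f}, L2 {math.sqrt(sq):.3f}")

# Assumption: the logged Best-Threshold values are squared L2 distances.
for name, best in (("lfw", 1.430), ("cfp_fp", 1.689)):
    print(f"{name}: squared L2 {best} -> cosine {squared_l2_to_cosine(best):.3f}")
```

If that assumption holds, the per-dataset thresholds in the logs correspond to different cosine values from 0.363, which would suggest the threshold depends on both the model and the dataset it was tuned on.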

6. Training Logs for Reference

Below is a sample of my training logs for reference. If you notice anything unusual or suboptimal in the metrics or training behavior, please let me know:

Epoch 8 Batch 185960    Speed: 797.15 samples/s    intra_Loss -25.3453 (-26.1291)    inter_Loss 16.8062 (18.1256)    Wyi 0.4486 (0.4653)    Wj 0.0001 (0.0001)    Prec@1 77.917 (82.729)
Epoch 8 Batch 185980    Speed: 696.71 samples/s    intra_Loss -26.5736 (-26.1947)    inter_Loss 19.4150 (18.5150)    Wyi 0.4811 (0.4683)    Wj 0.0001 (0.0001)    Prec@1 87.500 (82.583)
Epoch 8 Batch 186000    Speed: 709.95 samples/s    intra_Loss -26.4986 (-26.2168)    inter_Loss 18.4467 (18.5980)    Wyi 0.4808 (0.4673)    Wj 0.0001 (0.0001)    Prec@1 86.250 (82.333)
Learning rate 0.100000
Perform Evaluation on ['cfp_ff', 'cplfw', 'calfw', 'cfp_fp', 'vgg2_fp', 'lfw', 'agedb_30'] , and Save Checkpoints...
(14000, 512)
[cfp_ff][186000]XNorm: 102.98364
[cfp_ff][186000]Accuracy-Flip: 0.98029+-0.00629
[cfp_ff][186000]Best-Threshold: 1.45500
(12000, 512)
[cplfw][186000]XNorm: 85.15097
[cplfw][186000]Accuracy-Flip: 0.78867+-0.02125
[cplfw][186000]Best-Threshold: 1.54200
(12000, 512)
[calfw][186000]XNorm: 103.92467
[calfw][186000]Accuracy-Flip: 0.90883+-0.01038
[calfw][186000]Best-Threshold: 1.49800
(14000, 512)
[cfp_fp][186000]XNorm: 86.52919
[cfp_fp][186000]Accuracy-Flip: 0.80686+-0.02192
[cfp_fp][186000]Best-Threshold: 1.68900
(10000, 512)
[vgg2_fp][186000]XNorm: 89.77735
[vgg2_fp][186000]Accuracy-Flip: 0.84040+-0.01292
[vgg2_fp][186000]Best-Threshold: 1.59500
(12000, 512)
[lfw][186000]XNorm: 104.07785
[lfw][186000]Accuracy-Flip: 0.98400+-0.00642
[lfw][186000]Best-Threshold: 1.43000
(12000, 512)
[agedb_30][186000]XNorm: 100.46037
[agedb_30][186000]Accuracy-Flip: 0.89783+-0.01895
[agedb_30][186000]Best-Threshold: 1.57000
highest_acc: [0.9847142857142857, 0.8046666666666666, 0.9238333333333332, 0.8068571428571429, 0.85, 0.9865, 0.9065000000000001]
Epoch 8 Batch 186020    Speed: 56.99 samples/s    intra_Loss -26.4323 (-26.0712)    inter_Loss 19.7517 (18.7060)    Wyi 0.4774 (0.4650)    Wj 0.0001 (0.0001)    Prec@1 85.000 (82.271)
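
To compare runs more easily, the evaluation lines in logs like the one above can be collected into a table. This is a minimal sketch that assumes the exact line format shown in this log; `parse_accuracy_lines` is a hypothetical helper, not part of the repository.

```python
import re

# Matches lines such as: [lfw][186000]Accuracy-Flip: 0.98400+-0.00642
ACC_RE = re.compile(
    r"\[(?P<name>\w+)\]\[(?P<step>\d+)\]Accuracy-Flip: "
    r"(?P<acc>[\d.]+)\+-(?P<std>[\d.]+)"
)


def parse_accuracy_lines(log_text: str) -> dict:
    """Map dataset name -> (batch step, accuracy, std) for each Accuracy-Flip line."""
    results = {}
    for m in ACC_RE.finditer(log_text):
        results[m.group("name")] = (
            int(m.group("step")),
            float(m.group("acc")),
            float(m.group("std")),
        )
    return results


# Lines copied from the log above.
SAMPLE = """\
[cfp_ff][186000]XNorm: 102.98364
[cfp_ff][186000]Accuracy-Flip: 0.98029+-0.00629
[cfp_ff][186000]Best-Threshold: 1.45500
[lfw][186000]XNorm: 104.07785
[lfw][186000]Accuracy-Flip: 0.98400+-0.00642
[lfw][186000]Best-Threshold: 1.43000
"""

parsed = parse_accuracy_lines(SAMPLE)
for name, (step, acc, std) in parsed.items():
    print(f"{name:8s} step {step}: {acc:.5f} +- {std:.5f}")
```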

7. Model Conversion to ONNX

I would like to convert my trained SFace model to the ONNX format for deployment.

  • Could you please provide guidance on how to properly convert the SFace model to ONNX?
  • Are there any specific considerations or steps I should follow to ensure compatibility and performance after conversion?

Thank you so much for your time and assistance! I am looking forward to your insights and recommendations.


fengyuentau added the "help wanted" (Extra attention is needed) label on Mar 12, 2025