Height Estimation Model

This model combines the SpeechBrain ECAPA-TDNN speaker embedding model with a support vector regression (SVR) model to predict speaker height from audio input. The model was trained on VoxCeleb2 and evaluated on the VoxCeleb2 and TIMIT datasets.

Model Details

Architecture:
- SpeechBrain ECAPA-TDNN embeddings (192-dim) + SVR regressor
- Output: Predicted height in centimeters (continuous value)
Training Data:
- The height data was obtained by querying Wikidata for the height property of the speakers in VoxCeleb1 and VoxCeleb2, then converting the values to centimeters.
- It contains 1715 people with height information across both datasets (VoxCeleb1 and VoxCeleb2), 1621 of whom are present in VoxCeleb2.
- The code and data can be found in src\voxceleb_height_data_collection.
- The original VOXCELEB ENRICHMENT FOR AGE AND GENDER RECOGNITION dataset can be found here.
Performance:
- VoxCeleb2 test set: 6.01 cm Mean Absolute Error (MAE)
- TIMIT test set: 6.02 cm Mean Absolute Error (MAE)
Audio Processing:
- Input format: Any audio file format supported by soundfile
- Automatically converted to: 16 kHz, mono (single channel), 256 Kbps
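
The 256 Kbps figure is consistent with uncompressed PCM at 16 kHz, one channel, and 16 bits per sample. The 16-bit sample width is an assumption here, since it is not stated above. The following is a minimal sketch of that arithmetic, together with a plain-Python downmix that illustrates what "mono" means. Both function names are hypothetical, and the package's actual conversion code may differ.

```python
def pcm_bitrate_kbps(sample_rate_hz: int, bits_per_sample: int, channels: int) -> float:
    """Bit rate of uncompressed PCM audio in kilobits per second."""
    return sample_rate_hz * bits_per_sample * channels / 1000


def downmix_to_mono(frames):
    """Average the channels of each frame to get one sample per frame."""
    return [sum(frame) / len(frame) for frame in frames]


# 16 kHz, one channel, 16-bit samples (the bit depth is an assumption)
print(pcm_bitrate_kbps(16_000, 16, 1))  # 256.0

# Two stereo frames (left, right) collapse to two mono samples
print(downmix_to_mono([(0.5, 0.25), (-1.0, 1.0)]))  # [0.375, 0.0]
```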

Installation

You can install the package directly from GitHub:

pip install git+https://github.com/griko/voice-height-regression.git

Usage

from voice_height_regression import HeightRegressionPipeline

# Load the pipeline
regressor = HeightRegressionPipeline.from_pretrained(
    "griko/height_reg_svr_ecapa_voxceleb"
)

# Single file prediction
result = regressor("path/to/audio.wav")
print(f"Predicted height: {result[0]:.1f} cm")

# Batch prediction
results = regressor(["audio1.wav", "audio2.wav"])
print(f"Predicted heights: {[f'{h:.1f}' for h in results]} cm")
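
To compare batch predictions against known heights, the Mean Absolute Error reported under Performance can be computed as in the sketch below. The helper name and the sample values are hypothetical and for illustration only; they are not output from this model.

```python
def mean_absolute_error(true_cm, predicted_cm):
    """Mean absolute difference between true and predicted heights, in cm."""
    if len(true_cm) != len(predicted_cm):
        raise ValueError("inputs must have the same length")
    if not true_cm:
        raise ValueError("inputs must not be empty")
    return sum(abs(t - p) for t, p in zip(true_cm, predicted_cm)) / len(true_cm)


# Hypothetical values for illustration only, not model output
known_heights = [170.0, 182.0, 165.0]
predictions = [174.0, 180.0, 171.0]
print(f"MAE: {mean_absolute_error(known_heights, predictions):.2f} cm")  # MAE: 4.00 cm
```

In practice, `predictions` would be the list returned by the batch call above, assuming it yields one height per input file in the same order.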

Limitations

The model was trained on celebrity voices from YouTube interviews.
Performance may vary on:
- Different audio qualities
- Different recording conditions
- Multiple simultaneous speakers

Citation

If you use this model in your research, please cite:

TBD

License

This project is licensed under the Apache 2.0 License - see the LICENSE file for details.

Acknowledgments

VoxCeleb2 dataset for providing the training data
SpeechBrain team for their excellent speech processing toolkit
