Egocentric Audio Visual Speaker Localization

The simulated dataset, Ego-AVST, is available on request.

Experiments on Ego-AVST

Split the dataset into chunks so that each chunk is stored in its own subfolder containing the image, audio, ground truth (gt), bounding box (bbox), and depth image.

cd data
Run the function split in split_avst.py.
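
As a rough illustration of the chunking step, the sketch below cuts a multichannel audio array into fixed-length chunks and derives one subfolder name per chunk. The names chunk_audio and chunk_dir_name, the one-second chunk length, the channel count, and the folder naming scheme are all hypothetical; the actual logic lives in the function split in split_avst.py and may differ.

```python
import numpy as np


def chunk_audio(audio, sample_rate, chunk_seconds=1.0):
    """Split audio of shape (channels, samples) into equal-length chunks.

    Trailing samples that do not fill a whole chunk are dropped.
    """
    chunk_len = int(round(sample_rate * chunk_seconds))
    if chunk_len <= 0:
        raise ValueError("chunk length must be positive")
    n_chunks = audio.shape[1] // chunk_len
    return [audio[:, i * chunk_len:(i + 1) * chunk_len] for i in range(n_chunks)]


def chunk_dir_name(video_name, index):
    """Hypothetical naming scheme: one subfolder per chunk."""
    return f"{video_name}_chunk{index:04d}"


# Example: 4 channels, slightly more than 3 seconds at 16 kHz (assumed values).
audio = np.zeros((4, 16000 * 3 + 500))
chunks = chunk_audio(audio, sample_rate=16000, chunk_seconds=1.0)
names = [chunk_dir_name("video01", i) for i in range(len(chunks))]
```

In a layout like this, each name in names would become a subfolder holding the audio chunk together with the matching image, gt, bbox, and depth files.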

Calculate GCC-PHAT (generalized cross-correlation with phase transform) for each audio clip in each chunk

Run the function test in gccphat_avst.py.
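
For readers unfamiliar with GCC-PHAT, the sketch below shows the generic textbook computation for one pair of microphone signals: the cross-power spectrum is normalized to unit magnitude so that only phase (time-delay) information remains, then transformed back to the lag domain. This is not the repository's code; the function name gcc_phat and the parameters max_lag and eps are assumptions for illustration, and the function test in gccphat_avst.py may use different framing, channel pairing, or output shapes.

```python
import numpy as np


def gcc_phat(sig, ref, max_lag=None, eps=1e-12):
    """Return (cc, lag): the GCC-PHAT curve and the lag of its peak in samples.

    cc covers lags -max_shift..+max_shift; a positive lag means `sig`
    is delayed relative to `ref`.
    """
    n = sig.shape[0] + ref.shape[0]          # zero-pad to avoid circular wrap
    sig_f = np.fft.rfft(sig, n=n)
    ref_f = np.fft.rfft(ref, n=n)
    cross = sig_f * np.conj(ref_f)           # cross-power spectrum
    cross /= np.abs(cross) + eps             # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, n=n)

    max_shift = (n - 1) // 2
    if max_lag is not None:
        max_shift = min(int(max_lag), max_shift)
    # Reorder so that index max_shift corresponds to zero lag.
    cc = np.concatenate((cc[n - max_shift:], cc[:max_shift + 1]))
    lag = int(np.argmax(np.abs(cc))) - max_shift
    return cc, lag


# Example: white noise delayed by 5 samples.
rng = np.random.default_rng(0)
ref = rng.standard_normal(1024)
delay = 5
sig = np.concatenate((np.zeros(delay), ref[:-delay]))
cc, lag = gcc_phat(sig, ref, max_lag=20)
```

With more than two microphones, a curve like cc would typically be computed for each microphone pair and stacked as an input feature.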

Train

python main_avst.py

Evaluation

Set args.load_path in main_avst.py to the path of the pretrained model.

Comment out the training code:
trainer.fit(model, data_module)

Uncomment the following code block:
checkpoint = torch.load(args.load_path)
model.load_state_dict(checkpoint['state_dict'])
model.eval()

Experiments on EasyCom

Split the data into a training set and a test set. Sessions 1, 2, and 3 form the test set, and the remaining sessions form the training set. The full video list is in video.txt under the EasyCom folder.
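
The session-based split described above can be sketched as follows. How session numbers are encoded in video.txt is not specified here, so the (session_id, video_name) pairs, the function name split_by_session, and the example video names are hypothetical.

```python
TEST_SESSIONS = {1, 2, 3}  # sessions held out for testing, per the text above


def split_by_session(videos):
    """Partition (session_id, video_name) pairs into train and test name lists."""
    train, test = [], []
    for session_id, name in videos:
        if session_id in TEST_SESSIONS:
            test.append(name)
        else:
            train.append(name)
    return train, test


# Made-up entries; real names would come from video.txt.
videos = [(1, "a"), (2, "b"), (4, "c"), (5, "d")]
train_videos, test_videos = split_by_session(videos)
```

Splitting by whole session, rather than by chunk, keeps clips from the same recording out of both sets at once.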

Split the dataset into chunks so that each chunk is stored in its own subfolder containing the image, audio, and ground truth (gt).

cd data
Run the function split in split.py.

Calculate GCC-PHAT for each audio clip in each chunk

Run the function test in gccphat.py.

Train

python main.py

Evaluation

Set args.load_path in main.py to the path of the pretrained model.

Comment out the training code:
trainer.fit(model, data_module)

Uncomment the following code block:
checkpoint = torch.load(args.load_path)
model.load_state_dict(checkpoint['state_dict'])
model.eval()

Reference

EasyCom Dataset: https://github.com/facebookresearch/EasyComDataset
Transformer Block: https://github.com/jadore801120/attention-is-all-you-need-pytorch
GCC-PHAT Calculation: https://github.com/yinkalario/Two-Stage-Polyphonic-Sound-Event-Detection-and-Localization
Lightning: https://github.com/miracleyoo/pytorch-lightning-template
