-
Split the dataset into chunks so that each chunk is a subfolder containing the image, audio, ground truth (gt), bounding box (bbox), and depth image.
cd data
Use the function split in split_avst.py.
-
Calculate GCC-PHAT for each audio clip in each chunk.
Use the function test in gccphat_avst.py.
-
Train
python main_avst.py
-
Evaluation
Set args.load_path in main_avst.py to the path of the pretrained model.
Comment out the training code:
trainer.fit(model, data_module)
Uncomment the checkpoint-loading code block:
checkpoint = torch.load(args.load_path)
model.load_state_dict(checkpoint['state_dict'])
model.eval()
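For the GCC-PHAT step above, the following is a minimal sketch of how the feature can be computed for one pair of audio channels. It is not the code in gccphat_avst.py: the function name gcc_phat, its parameters, and the use of NumPy are assumptions made for illustration only.

```python
import numpy as np

def gcc_phat(sig, ref, fs=1, max_tau=None):
    """Return (delay of sig relative to ref in seconds, GCC-PHAT curve).

    Hypothetical helper for illustration; not the repository's implementation.
    """
    n = len(sig) + len(ref)                 # zero-pad so the correlation is linear, not circular
    spec_sig = np.fft.rfft(sig, n=n)
    spec_ref = np.fft.rfft(ref, n=n)
    cross = spec_sig * np.conj(spec_ref)    # cross-power spectrum
    cross /= np.abs(cross) + 1e-12          # PHAT weighting: keep the phase, drop the magnitude
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)
    # Reorder so that lags run from -max_shift to +max_shift.
    cc = np.concatenate((cc[n - max_shift:], cc[:max_shift + 1]))
    lag = int(np.argmax(np.abs(cc))) - max_shift
    return lag / float(fs), cc

# Synthetic check: the second channel is the first one delayed by 5 samples.
rng = np.random.default_rng(0)
reference = rng.standard_normal(1024)
delayed = np.concatenate((np.zeros(5), reference))[:1024]
tau, curve = gcc_phat(delayed, reference, fs=1)
print(tau)
```

The returned curve, rather than only the peak delay, is what would typically be stored as an input feature per microphone pair; whether the repository keeps the full curve or a cropped window is not stated here.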
-
Split the data into a training set and a test set. Sessions 1, 2, and 3 form the test set, and the remaining sessions form the training set. The full video list can be found in video.txt under the EasyCom folder.
-
Split the dataset into chunks so that each chunk is a subfolder containing the image, audio, and ground truth (gt).
cd data
Use the function split in split.py.
-
Calculate GCC-PHAT for each audio clip in each chunk.
Use the function test in gccphat.py.
-
Train
python main.py
-
Evaluation
Set args.load_path in main.py to the path of the pretrained model.
Comment out the training code:
trainer.fit(model, data_module)
Uncomment the checkpoint-loading code block:
checkpoint = torch.load(args.load_path)
model.load_state_dict(checkpoint['state_dict'])
model.eval()
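The session-based train/test split described for EasyCom (sessions 1, 2, and 3 for testing, the rest for training) could be sketched as follows. The layout of the entries in video.txt is not shown here, so the sketch assumes each entry contains a token such as Session_4. The helper name split_by_session and the sample entries are hypothetical.

```python
import re

TEST_SESSIONS = {1, 2, 3}  # per the instructions above: sessions 1, 2, 3 form the test set

def split_by_session(video_names):
    """Partition video names into (train, test) lists by session number.

    Assumes each name contains 'Session_<n>' (an assumption about video.txt).
    """
    train, test = [], []
    for name in video_names:
        match = re.search(r"Session[_ ]?(\d+)", name, re.IGNORECASE)
        if match is None:
            raise ValueError(f"no session number found in {name!r}")
        session = int(match.group(1))
        (test if session in TEST_SESSIONS else train).append(name)
    return train, test

# Made-up entries, for illustration only.
names = ["Session_1/clip_a", "Session_4/clip_b", "Session_12/clip_c"]
train_names, test_names = split_by_session(names)
print(train_names, test_names)
```

Splitting by whole session, rather than by individual clip, keeps recordings from the same session out of both sets at once.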
EasyCom Dataset: https://github.com/facebookresearch/EasyComDataset
Transformer Block: https://github.com/jadore801120/attention-is-all-you-need-pytorch
GCCPHAT Calculation: https://github.com/yinkalario/Two-Stage-Polyphonic-Sound-Event-Detection-and-Localization
Lightning: https://github.com/miracleyoo/pytorch-lightning-template