This repository accompanies the paper submission "Object Permanence through Audio-Visual Representations". In this work, we propose a multimodal neural network model that uses a partially observed trajectory and audio to predict the trajectory and final position of a dropped object.
The dataset is available at https://intuitivecomputing.jhu.edu/openscience.html
Pretrained weights for the combined model are provided in multimodal_pretrained_weights.pkl.
partition.txt provides a dictionary of indices we used for training, validation, and testing.
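
The on-disk format of partition.txt is not described here, so the following is only a sketch of how the split indices might be read. It assumes the file holds a single dictionary written either as JSON or as a Python literal, with each value being a list of indices. The function names parse_partition and check_disjoint, and the split names used in the usage note, are hypothetical and not part of this repository.

```python
import ast
import json


def parse_partition(text):
    """Parse the text of a partition file into a dict of index lists.

    Assumption: the text is one dictionary, written as JSON or as a
    Python literal, whose values are iterables of indices.
    """
    try:
        data = json.loads(text)  # try JSON first
    except json.JSONDecodeError:
        data = ast.literal_eval(text)  # fall back to a Python dict literal
    if not isinstance(data, dict):
        raise ValueError("expected a dictionary of indices")
    return {split: list(indices) for split, indices in data.items()}


def check_disjoint(partition):
    """Return True if no index appears in more than one split."""
    seen = set()
    for indices in partition.values():
        current = set(indices)
        if seen & current:
            return False
        seen |= current
    return True
```

To use it, read the file contents (for example with open("partition.txt").read()), pass the text to parse_partition, and print the size of each split. If the splits are keyed by names such as "train", "val", and "test" (an assumption), check_disjoint offers a quick sanity check that they do not overlap.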