We present a simple, modular, and generic method that upsamples coarse 3D models by adding geometric and appearance details. While generative 3D models now exist, they do not yet match the quality of their counterparts in the image and video domains. We demonstrate that existing (pretrained) video models can be directly repurposed for 3D super-resolution, thus sidestepping the shortage of large repositories of high-quality 3D training models. We describe how to repurpose video upsampling models, which are not 3D consistent, and combine them with 3D consolidation to produce 3D-consistent results. As output, we produce high-quality Gaussian Splat models, which are object-centric and effective. Our method is category-agnostic and can be easily incorporated into existing 3D workflows. We evaluate our proposed SuperGaussian on a variety of 3D inputs that are diverse in both complexity and representation (e.g., Gaussian Splats or NeRFs), and demonstrate that our simple method significantly improves the fidelity of the final 3D models.