
2018-08-21

This post introduces two recent ECCV 2018 papers: one proposes a new weakly and semi-supervised framework for semantic segmentation that can handle an essentially unlimited number of labels; the other proposes using a stereo matching network as a proxy to learn depth from synthetic data, with the predicted stereo disparity maps supervising a monocular depth estimation network.

Semantic Segmentation

《ConceptMask: Large-Scale Segmentation from Semantic Concepts》

ECCV 2018

Abstract:Existing works on semantic segmentation typically consider a small number of labels, ranging from tens to a few hundreds. With a large number of labels, training and evaluation of such task become extremely challenging due to correlation between labels and lack of datasets with complete annotations. We formulate semantic segmentation as a problem of image segmentation given a semantic concept, and propose a novel system which can potentially handle an unlimited number of concepts, including objects, parts, stuff, and attributes. We achieve this using a weakly and semi-supervised framework leveraging multiple datasets with different levels of supervision. We first train a deep neural network on a 6M stock image dataset with only image-level labels to learn visual-semantic embedding on 18K concepts. Then, we refine and extend the embedding network to predict an attention map, using a curated dataset with bounding box annotations on 750 concepts. Finally, we train an attention-driven class agnostic segmentation network using an 80-category fully annotated dataset. We perform extensive experiments to validate that the proposed system performs competitively to the state of the art on fully supervised concepts, and is capable of producing accurate segmentations for weakly learned and unseen concepts.

arXiv:https://arxiv.org/abs/1808.06032
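
To make the three-stage pipeline described in the abstract concrete, below is a minimal PyTorch sketch of the idea: a visual-semantic embedding trained with image-level labels, an attention head refined with bounding-box supervision, and an attention-driven class-agnostic segmentation head. The module names, toy backbone, and tensor shapes are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the three-stage ConceptMask-style pipeline.
# All class names, layers, and shapes are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VisualSemanticEmbedding(nn.Module):
    """Stage 1: map images and concept ids into a shared embedding space,
    trained with image-level labels only."""
    def __init__(self, num_concepts=18000, embed_dim=256):
        super().__init__()
        self.backbone = nn.Sequential(          # stand-in for a CNN backbone
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, embed_dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.concept_embed = nn.Embedding(num_concepts, embed_dim)

    def forward(self, image, concept_ids):
        feat = self.backbone(image)                          # B x D x H x W
        img_vec = F.adaptive_avg_pool2d(feat, 1).flatten(1)  # B x D
        txt_vec = self.concept_embed(concept_ids)            # B x D
        # image-level matching score between image and concept
        score = (F.normalize(img_vec, dim=1) * F.normalize(txt_vec, dim=1)).sum(1)
        return score, feat

class AttentionHead(nn.Module):
    """Stage 2: turn per-pixel feature/concept similarity into an attention map,
    refined with bounding-box supervision on a smaller concept set."""
    def __init__(self, embed_dim=256):
        super().__init__()
        self.proj = nn.Conv2d(embed_dim, embed_dim, 1)

    def forward(self, feat, txt_vec):
        feat = F.normalize(self.proj(feat), dim=1)            # B x D x H x W
        txt = F.normalize(txt_vec, dim=1)[:, :, None, None]   # B x D x 1 x 1
        return torch.sigmoid((feat * txt).sum(1, keepdim=True))  # B x 1 x H x W

class ClassAgnosticSegHead(nn.Module):
    """Stage 3: attention-driven, class-agnostic segmentation trained on a
    fully annotated dataset."""
    def __init__(self, embed_dim=256):
        super().__init__()
        self.decoder = nn.Sequential(
            nn.Conv2d(embed_dim + 1, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 1, 1),
        )

    def forward(self, feat, attention):
        return self.decoder(torch.cat([feat, attention], dim=1))  # mask logits

# Illustrative forward pass (random data, hypothetical concept index)
embed, attn_head, seg_head = VisualSemanticEmbedding(), AttentionHead(), ClassAgnosticSegHead()
image, concept = torch.rand(1, 3, 224, 224), torch.tensor([42])
score, feat = embed(image, concept)
attention = attn_head(feat, embed.concept_embed(concept))
mask_logits = seg_head(feat, attention)   # upsample to image size as needed
```

The sketch only shows the data flow; in the paper each stage is trained on a different dataset with its own level of supervision.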

Monocular Depth Estimation

《Learning Monocular Depth by Distilling Cross-domain Stereo Networks》

ECCV 2018

Abstract:Monocular depth estimation aims at estimating a pixelwise depth map for a single image, which has wide applications in scene understanding and autonomous driving. Existing supervised and unsupervised methods face great challenges. Supervised methods require large amounts of depth measurement data, which are generally difficult to obtain, while unsupervised methods are usually limited in estimation accuracy. Synthetic data generated by graphics engines provide a possible solution for collecting large amounts of depth data. However, the large domain gaps between synthetic and realistic data make directly training with them challenging. In this paper, we propose to use the stereo matching network as a proxy to learn depth from synthetic data and use predicted stereo disparity maps for supervising the monocular depth estimation network. Cross-domain synthetic data could be fully utilized in this novel framework. Different strategies are proposed to ensure learned depth perception capability well transferred across different domains. Our extensive experiments show state-of-the-art results of monocular depth estimation on KITTI dataset.

arXiv:https://arxiv.org/abs/1808.06586
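
The core of the method is a distillation step: a stereo matching network trained on synthetic data predicts disparity on real stereo pairs, and those pseudo disparity maps supervise a monocular network that sees only the left image. Below is a minimal PyTorch sketch of that step; the toy networks and the L1 distillation loss are assumptions for illustration, not the authors' code.

```python
# Hypothetical sketch of supervising a monocular network with a stereo proxy.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyStereoNet(nn.Module):
    """Stand-in for a stereo matching network pre-trained on synthetic data."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1),
        )

    def forward(self, left, right):
        return self.net(torch.cat([left, right], dim=1))  # predicted disparity

class TinyMonoNet(nn.Module):
    """Stand-in for the monocular depth (disparity) estimation network."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1),
        )

    def forward(self, left):
        return self.net(left)

stereo = TinyStereoNet().eval()          # frozen proxy teacher
mono = TinyMonoNet()
optimizer = torch.optim.Adam(mono.parameters(), lr=1e-4)

left = torch.rand(2, 3, 128, 256)        # real stereo pair (KITTI-style shapes)
right = torch.rand(2, 3, 128, 256)

with torch.no_grad():
    pseudo_disp = stereo(left, right)    # teacher's disparity used as supervision

optimizer.zero_grad()
pred_disp = mono(left)                   # student sees only the left image
loss = F.l1_loss(pred_disp, pseudo_disp) # distillation loss (L1 as an example)
loss.backward()
optimizer.step()
```

The point of the proxy is that the stereo teacher transfers better across the synthetic-to-real domain gap than a directly trained monocular network, so its disparity predictions on real images become cheap dense supervision.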