Multimodal feature fusion for establishing novel 3D saliency models

1 Dec 2017 – 30 Nov 2019
External identifier
NKFIH KH-126688

The project aims to process data from novel 3D sensors (e.g. Microsoft Kinect, Lidar, MRI, CT), which are available in a wide range of application fields, and to fuse it with 2D image modalities to build saliency models that automatically and efficiently emphasize visually dominant regions. Such models not only narrow the region of interest for subsequent image processing steps, but also make segmentation easier and more efficient in application fields where 3D sensor data is available, e.g. remote sensing, medical imaging, 3D reconstruction and video surveillance systems.


Besides existing 2D sensors (e.g. video and still cameras), a growing range of consumer-grade 3D sensors is now available (e.g. Microsoft Kinect). These sensing technologies capture 3D information that 2D cameras could not provide, and one important 3D data modality is depth. Automatic saliency detection is a fundamental problem in computer vision: it aims to predict where humans look in an image and to locate the image regions that most attract human visual attention. To estimate saliency efficiently, a model should apply low-level features inspired by human vision, e.g. it should mark image regions with distinctive color or contrast as salient. Depth is an important 3D feature in the human visual system, and the aforementioned sensors now make it available to machine vision as well. By fusing color/contrast with depth, saliency can be estimated more accurately, which in turn improves the performance of subsequent image processing steps. 3D data is available in multiple application areas, such as remote sensing (Lidar sensors) and medical imaging (MRI, CT), so elaborating 3D saliency models can improve data processing in those areas.
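The idea of fusing color/contrast with depth can be illustrated with a minimal sketch. It is not the project's model: the function names, the global-contrast measure, the assumption that nearer surfaces are more salient, and the fixed weight `depth_weight` are all hypothetical choices made for illustration only.

```python
import numpy as np


def normalize(m):
    """Rescale a map to [0, 1]; a constant map becomes all zeros."""
    m = m.astype(np.float64)
    span = m.max() - m.min()
    if span == 0:
        return np.zeros_like(m)
    return (m - m.min()) / span


def color_contrast_saliency(image):
    """Global contrast: distance of each pixel's color from the mean color."""
    pixels = image.astype(np.float64)
    mean_color = pixels.reshape(-1, pixels.shape[-1]).mean(axis=0)
    return normalize(np.linalg.norm(pixels - mean_color, axis=-1))


def depth_saliency(depth):
    """Illustrative assumption: smaller depth (nearer surface) is more salient."""
    depth = depth.astype(np.float64)
    return normalize(depth.max() - depth)


def fuse_saliency(image, depth, depth_weight=0.5):
    """Weighted sum of the color and depth cues, rescaled to [0, 1]."""
    color_map = color_contrast_saliency(image)
    depth_map = depth_saliency(depth)
    fused = (1.0 - depth_weight) * color_map + depth_weight * depth_map
    return normalize(fused)


# Toy scene: a grey background with one red pixel that is closer to the sensor.
image = np.full((4, 4, 3), 128, dtype=np.uint8)
image[1, 2] = (255, 0, 0)
depth = np.full((4, 4), 3.0)
depth[1, 2] = 1.0

fused = fuse_saliency(image, depth)
print(np.unravel_index(fused.argmax(), fused.shape))
```

In this toy scene both cues agree on the same pixel, so the fused map peaks there. With real sensor data the two cues often disagree, and the weighting would have to be tuned or learned rather than fixed.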

Steps of salient object detection: (a) original image; (b) texture-based saliency detection; (c) salient point set and initial contour; (d) salient object contour.



