Abstract

This is a collaborative project on higher-level interpretation of the visual world.

When building software tools for image analysis, video surveillance, remote sensing, or medical diagnosis, the usual difficulty is setting the parameters of the processing algorithms. Sometimes the right setting is obvious to the human operator, who simply performs the tuning. At other times it is not so evident, and the problem to be solved must itself be defined first. In most cases the parameters are set by applying experience gained from training data retrieved from previous experiments.

We address the fundamental problem of automatic extraction of visual information from raw sensory data. The main theoretical contribution consists of coherent models for the sensory understanding of the visual world. In particular, we will investigate how human vision works and adapts itself via perceptual learning to solve special tasks such as the interpretation of medical images or the analysis of aerial and satellite images. Mathematical and computational models of these techniques will then be derived and applied to image processing algorithms capable of solving similar tasks. Although many unsupervised algorithms have been proposed in the literature, a fully automatic solution to real-life image processing tasks remains a challenge.

A retrieval system will also provide latent relations between objects, events, scenes and scenarios, and will be able to generate hypotheses about scene objects and events based on prior knowledge and experience. Subjects are identified as the highest hierarchical node in the ontology, and the joint set of extracted subjects will suggest a more accurate query class. Thus, the interaction of the retrieval subsystem with the fusion and inference mechanisms will provide a more reliable scenario description when data are incomplete.

Models of human visual attentional selection, scene understanding and perceptual learning will be investigated, and the relevant knowledge will be applied in the proposed computer vision systems. In addition, the learning mechanisms that lead to highly efficient extraction of task-relevant visual information by human experts will be investigated in the laboratory of one of the consortium partners, using eye tracking and visual psychophysics as experimental techniques. The following specific questions will be addressed: 1. how learning affects fixation patterns and attentional selection processes in complex visual images and cluttered natural scenes; 2. how learning affects the integration and selection of visual features of the target and the suppression of distractor features.

The results of the proposed project are primarily scientific: cooperation between image processing groups and the behavioural sciences is an extraordinary opportunity and of great value for the above goals.

Participants:

§ Tamás Szirányi - Consortium leader (sziranyi@sztaki.hu), MTA SZTAKI, Distributed Events Analysis research group

§ László Czúni (czuni@almos.vein.hu), Pannon (Veszprém) University

§ Zoltán Kató (kato@inf.u-szeged.hu), Institute of Informatics, University of Szeged, Department of Image Processing and Computer Graphics.

§ Zoltán Vidnyánszky (vidnyanszky@digitus.itk.ppke.hu), Pazmany Peter Catholic University (PPKE), Faculty of Information Technology

Introduction

When human observers interpret images, they take into account not only direct observations like color or intensity, but also a priori knowledge about the world. However, such a complex, interacting method is rarely used in image processing systems. State-of-the-art algorithms are mainly bottom-up: they try to extract useful information solely from the observed image data, which is then interpreted. Obviously, low-level image data alone cannot provide reliable information. Furthermore, the extraction of visual information is always task-driven, i.e. a computer vision system must be able to select the task-relevant part of the sensory information and suppress any task-irrelevant information. While human vision is astonishingly accurate and robust in selecting the “right” visual attributes, modern machine vision systems are challenged by the lack of coherent models to drive low-level image processing in extracting task-relevant visual information. How does the human vision system subconsciously choose what to learn and store out of the massive visual data of daily life? To what degree of regularity or compression does our vision system “decide” to process such knowledge in order to achieve efficiency and robustness? How can such activities be modeled or quantified mathematically? Theories of visual perception attempt to answer these questions. The principal goal of the proposed research is to investigate what techniques human vision uses to solve specific image processing tasks and to what degree these methods could be adopted by computer vision systems. In particular, we are interested in developing coherent models capable of automatically selecting and integrating relevant visual attributes in order to extract task-specific visual information from various kinds of images.
When fully developed, the proposed models will be applicable to important problems in a wide range of areas, for example remote sensing, medical imaging, security systems, object-based low bit rate image coding, and object tracking in video.

We see six main factors here:

What are we looking at? – Recognizing the general type of the scene under study;
What is to be detected in detail? – The possible vocabulary of the scene semantics;
How can the presence of other objects be estimated from one detected object? – Scene semantics;
Probabilistic reasoning about objects and their composition from image attributes – global and local parameters;
Retrieving scenes similar to the one under study from an image/video database – How can we search under undefined premises?
Routines of information extraction in human visual perception: how can a human expert set up the system and define the problem for analysis?

The consortium members have a strong background recognized by the international scientific community, as well as broad and thorough experience in industrial projects and international cooperation in the given topic. Within this collaborative project, the partners play complementary roles, so the consortium’s expertise covers a broad range of human and computer vision areas such as segmentation, motion detection, cue integration, shape modeling and representation, object recognition and classification, 3D reconstruction and scene interpretation. The consortium members have been actively involved in various industrial projects:

1. Developing new video compression algorithms for Samsung

2. Research and development for Tateyama (Japan) in video surveillance

3. Developing a restoration software system for archive movie films, made for the Hungarian Film Archive

4. Developing a surveillance software system for the Hungarian Police

5. Developing video tools for National Instruments

6. Developing medical image processing algorithms for GE Healthcare

7. Research and development of tree crown detection algorithms for the Hungarian Forestry Service

The goals of the project have been addressed in recent years in the course of work on the problems listed above. The results of the proposed project are primarily scientific: such cooperation between image processing groups and the behavioural sciences is an extraordinary opportunity and of great value for the above goals. The achieved results will be applied in surveillance systems, medical diagnosis, the design of intelligent city projects, attack prevention in security applications, and automated driving.