This paper describes a visual perception system for a social robot. The central part of this system is an artificial attention mechanism that discriminates the most relevant information from all the visual information perceived by the robot. It is composed by three stages. At the preattentive stage, the concept of saliency is implemented based on ‘proto-objects’ [37]. From these objects, different saliency maps are generated. Then, the semiattentive stage identifies and tracks significant items according to the tasks to accomplish. This tracking process allows to implement the ‘inhibition of return’. Finally, the attentive stage fixes the field of attention to the most relevant object depending on the behaviours to carry out. Three behaviours have been implemented and tested which allow the robot to detect visual landmarks in an initially unknown environment, and to recognize and capture the upper-body motion of people interested in interact with it.