We describe the Vibe, a sensory substitution scheme that converts a video stream into an audio stream in real time. It was initially developed as a research tool for studying the human ability to learn new ways of perceiving the world: the Vibe can give users the ability to learn a kind of ‘vision’ through audition. The video stream is converted into a continuous stereophonic audio signal by means of a kind of retina made of receptive fields. Each receptive field controls a sound source, and the user listens to the mixture of all these sound sources. Compared with existing vision-to-audition sensory substitution devices, the Vibe is highly versatile, in particular because it uses a set of configurable units working in parallel. To demonstrate the validity and interest of this method of vision-to-audition conversion, we report the results of an experiment involving a pointing task towards targets memorised either through visual perception or through their auditory conversion by the Vibe. This article is also an opportunity to set out precisely the general specifications of this scheme, in preparation for its implementation on autonomous, mobile hardware.
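
The conversion principle described above (a retina of receptive fields, each driving one sound source, with all sources mixed into a stereo signal) can be illustrated by a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the Vibe's specification: the rectangular tiling of the image, the use of mean brightness as field activity, the mapping of vertical position to pitch and horizontal position to stereo pan, the sample rate, and all function and parameter names.

```python
import math

SAMPLE_RATE = 8000  # Hz; arbitrary value chosen for this illustration


def make_fields(width, height, cols, rows):
    """Tile the frame with cols x rows rectangular receptive fields (assumed layout)."""
    fields = []
    fw, fh = width // cols, height // rows
    for r in range(rows):
        for c in range(cols):
            pixels = [(x, y)
                      for y in range(r * fh, (r + 1) * fh)
                      for x in range(c * fw, (c + 1) * fw)]
            fields.append({
                "pixels": pixels,
                "cx": (c + 0.5) / cols,  # normalised centre: 0 = left, 1 = right
                "cy": (r + 0.5) / rows,  # normalised centre: 0 = top, 1 = bottom
            })
    return fields


def field_activity(frame, field):
    """Mean brightness (0..1) of the pixels covered by one receptive field."""
    pixels = field["pixels"]
    return sum(frame[y][x] for x, y in pixels) / len(pixels)


def render_stereo(frame, fields, n_samples, f_low=200.0, f_high=2000.0):
    """Mix one sinusoidal source per receptive field into left/right buffers."""
    left = [0.0] * n_samples
    right = [0.0] * n_samples
    for field in fields:
        amp = field_activity(frame, field)
        if amp == 0.0:
            continue  # a dark field contributes no sound
        # Assumed mapping: fields higher in the image get a higher pitch (log scale).
        freq = f_low * (f_high / f_low) ** (1.0 - field["cy"])
        # Assumed mapping: constant-power pan from the horizontal position.
        theta = field["cx"] * math.pi / 2
        gain_l, gain_r = math.cos(theta), math.sin(theta)
        for n in range(n_samples):
            s = amp * math.sin(2 * math.pi * freq * n / SAMPLE_RATE)
            left[n] += gain_l * s
            right[n] += gain_r * s
    scale = 1.0 / len(fields)  # keep the mixture within [-1, 1]
    return [v * scale for v in left], [v * scale for v in right]


# Usage: an 8x8 greyscale frame with a bright patch in the top-right corner.
frame = [[0.0] * 8 for _ in range(8)]
for y in (0, 1):
    for x in (6, 7):
        frame[y][x] = 1.0
fields = make_fields(8, 8, cols=4, rows=4)
left, right = render_stereo(frame, fields, n_samples=256)
```

In this sketch a bright region on the right of the image yields more energy in the right channel than in the left. Because each field is processed independently of the others, the per-field loop reflects the idea of configurable units working in parallel, although the sketch itself runs them sequentially.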