Research Article

A Comparative Analysis of Visual Encoding Models Based on Classification and Segmentation Task-Driven CNNs

Figure 1

Main process of visual encoding. (a) Natural image stimuli; (b) visual processing in the human brain; (c) real fMRI responses acquired by an MRI scanner; (d) CNN features of natural images extracted by a pretrained CNN; (e) predicted voxel responses. When subjects viewed the visual stimuli, the corresponding signals were generated in the visual areas of the brain, and the fMRI responses were recorded by the MRI scanner. A pretrained network was used to extract features of the natural images; the CNN features of each layer were linearly mapped to voxel space, and the feature layer with the best prediction performance was selected as the best encoding feature layer to obtain the predicted voxel responses. The correlation coefficient between the predicted responses and the real responses was then calculated to evaluate the prediction performance of the encoding model.
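The following is a minimal sketch of the layer-wise linear encoding and evaluation step described in the caption, under assumed inputs: `layer_features` is a dictionary mapping each CNN layer name to an (n_images, n_features) array of features from a pretrained network, and `voxel_responses` is an (n_images, n_voxels) array of measured fMRI responses. Ridge regression is used here as one common choice of linear mapping; the original study may use a different regressor or cross-validation scheme.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

def evaluate_layer(features, voxels, alpha=1.0, seed=0):
    """Fit a linear map from one layer's features to voxel responses and
    return the mean prediction correlation across voxels on held-out images."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        features, voxels, test_size=0.2, random_state=seed)
    model = Ridge(alpha=alpha).fit(X_tr, y_tr)
    y_pred = model.predict(X_te)
    # Correlate predicted and real responses voxel by voxel.
    r = np.array([pearsonr(y_pred[:, v], y_te[:, v])[0]
                  for v in range(y_te.shape[1])])
    return np.nanmean(r)

# Synthetic stand-ins for illustration only (hypothetical shapes).
rng = np.random.default_rng(0)
voxel_responses = rng.standard_normal((200, 50))      # 200 images, 50 voxels
layer_features = {f"conv{i}": rng.standard_normal((200, 128)) for i in range(1, 6)}

# Select the feature layer with the best prediction performance.
scores = {name: evaluate_layer(feats, voxel_responses)
          for name, feats in layer_features.items()}
best_layer = max(scores, key=scores.get)
print(f"best encoding layer: {best_layer}, mean r = {scores[best_layer]:.3f}")
```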