Advances in Multimedia

Research Article

[Retracted] Attention Feature Network Extraction Combined with the Generation Algorithm of Multimedia Image Description

The image description is generated from the image features extracted by the attention network.

Input: The image data set and the Wiki text data set.
Output: The image feature description text.
The following steps are taken for each image in the data set:
Step1. The image feature of the first layer is extracted;
Step2. The image feature of this layer is transferred to the first layer of the LSTM for initialization;
Step3. The image feature of the ith layer is extracted;
Step4. The word vector, the hidden state of the previous LSTM layer, and the image feature are input into the next LSTM layer, and the next output word is calculated accordingly;
Step5. The cross-entropy loss “Loss” is calculated, and the parameters are adjusted by feeding this loss back through the network;
Step6. Return to Step3 until the output is <END> or the maximum length of the sentence is reached;
Step7. Return the image description text.
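
The control flow of the steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `step_fn` is a hypothetical stand-in for one LSTM layer, `layer_features` is an assumed list of per-layer image features, and the `<START>` token, the greedy word choice, and the toy vocabulary are all introduced here for illustration. The parameter update in Step5 is omitted because it would need an automatic-differentiation framework; only the cross-entropy computation is shown.

```python
import math

END = "<END>"

def softmax(scores):
    """Turn raw scores into a probability distribution."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, target_index):
    """Cross-entropy loss for one predicted word (Step5)."""
    return -math.log(probs[target_index])

def generate_description(layer_features, step_fn, vocab, max_len=20):
    """Greedy decoding loop that mirrors Step1 to Step7.

    step_fn(word, hidden, feature) -> (scores, new_hidden) is a
    hypothetical placeholder for one LSTM layer.
    """
    hidden = layer_features[0]      # Step1-2: initialize from the first-layer feature
    word = "<START>"                # assumed start token, not named in the source
    words = []
    for i in range(max_len):        # Step6: stop at the maximum length ...
        # Step3: feature of the i-th layer (the last one is reused if we run out)
        feature = layer_features[min(i + 1, len(layer_features) - 1)]
        # Step4: previous word, previous hidden state, and image feature go in
        scores, hidden = step_fn(word, hidden, feature)
        probs = softmax(scores)
        word = vocab[probs.index(max(probs))]
        if word == END:             # ... or when <END> is produced
            break
        words.append(word)
    return " ".join(words)          # Step7: return the description text

# Toy usage: a deterministic stand-in that emits the vocabulary in order.
vocab = ["a", "dog", "runs", END]

def toy_step(word, hidden, feature):
    scores = [1.0 if j == hidden else 0.0 for j in range(len(vocab))]
    return scores, hidden + 1

print(generate_description([0, 0, 0], toy_step, vocab))  # a dog runs
```

Passing the layer as a function keeps the loop independent of any particular network, so the same stopping logic applies whether `step_fn` wraps a real LSTM cell or a test stub like `toy_step`.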