Research Article

[Retracted] Aided Evaluation of Motion Action Based on Attitude Recognition

Table 2

Feature extractors based on convolutional neural network designs.

Name: Description

LeNet [18]: Proposed by LeCun in 1998, it is the first CNN. It is a seven-layer convolutional network dedicated to classifying handwritten digits, and it can classify them robustly despite minor distortion, rotation, and changes in position and scale.

AlexNet: Proposed by Krizhevsky et al., it deepened the CNN and applied many parameter-optimization strategies to enhance the network's learning ability. It is considered the first deep CNN architecture and produced pioneering results on image classification and recognition tasks.

ZefNet: The winner of ILSVRC (the CNN competition) in 2013. It uses deconvolution to visually analyze CNN intermediate feature maps, finding ways to improve the model by analyzing feature behavior, and fine-tunes AlexNet to improve its performance, achieving a Top-5 error rate of only 14.8%. This result was obtained by adjusting AlexNet's hyperparameters while keeping the same structure. To further improve ZefNet's effectiveness and accuracy, more deep learning elements were later added.

GoogleNet: The winner of the 2014 ILSVRC competition, it introduced the new concept of inception blocks into CNNs, integrating multiscale convolutional transformations through split, transform, and merge operations. An inception block encapsulates filters of different sizes (1 × 1, 3 × 3, and 5 × 5) to capture spatial information at different scales (fine-grained and coarse-grained). In addition to improving learning ability, GoogleNet focuses on improving the efficiency of CNN parameters.
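The split-transform-merge idea behind the inception block can be illustrated with a minimal NumPy sketch. This is not the GoogleNet implementation: toy averaging kernels stand in for learned filters, the naive convolution loop is for clarity only, and the branch structure (parallel 1 × 1, 3 × 3, and 5 × 5 filters whose outputs are concatenated along the channel axis) is the only part taken from the description above.

```python
import numpy as np

def conv2d_same(x, k):
    """Naive 'same'-padded 2D cross-correlation on one single-channel map."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

def inception_block(x):
    """Split: three parallel branches with 1x1, 3x3, 5x5 kernels.
    Transform: each branch filters the input at its own scale.
    Merge: concatenate the branch outputs along a channel axis."""
    branches = []
    for k in (1, 3, 5):
        kernel = np.full((k, k), 1.0 / (k * k))  # toy averaging filter
        branches.append(conv2d_same(x, kernel))
    return np.stack(branches, axis=0)  # shape: (num_branches, H, W)

x = np.random.rand(8, 8)
y = inception_block(x)
print(y.shape)  # (3, 8, 8): spatial size preserved, one channel per scale
```

Because every branch uses "same" padding, all outputs share the input's spatial size, which is what makes the channel-wise merge possible.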

VGG [19]: Following the successful application of CNNs to image recognition, Simonyan et al. put forward a simple and effective design principle for CNN architectures. Their architecture, called VGG, follows a modular, layered pattern. VGG reaches a depth of 19 layers, allowing the relationship between depth and representational ability to be studied. VGG replaces 11 × 11 and 5 × 5 filters with stacks of 3 × 3 convolutional layers. Experiments show that stacking 3 × 3 filters achieves the same effective receptive field as larger filters.
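The equivalence claimed above is a matter of receptive-field arithmetic: with stride 1, each additional k × k convolution grows the receptive field by k − 1, so a stack of n 3 × 3 layers covers a (2n + 1) × (2n + 1) region. The helper below is a hypothetical illustration of that arithmetic, not code from the paper.

```python
def stacked_rf(num_layers, kernel=3):
    """Receptive field of a stack of stride-1 conv layers:
    starts at 1 pixel and grows by (kernel - 1) per layer."""
    rf = 1
    for _ in range(num_layers):
        rf += kernel - 1
    return rf

print(stacked_rf(2))  # 5  -> two 3x3 convs see a 5x5 region
print(stacked_rf(5))  # 11 -> five 3x3 convs see an 11x11 region
```

So two stacked 3 × 3 layers match a single 5 × 5 filter's receptive field, and five match an 11 × 11 filter, while using fewer parameters and interleaving more nonlinearities.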