Research Article
Evaluation of Vision Transformers for Traffic Sign Classification
Table 6
Evaluation results on the traffic sign datasets.
| Model | Germany (Training) | Germany (Validation) | Germany (Testing) | India (Training) | India (Validation) | India (Testing) | China (Training) | China (Validation) | China (Testing) |
|---|---|---|---|---|---|---|---|---|---|
| Convolutional neural networks | | | | | | | | | |
| VGG16 | 99.89% | 99.94% | 98.84% | 99.77% | 98.75% | 98.44% | 99.65% | 99.52% | 99.21% |
| ResNet | 99.88% | 99.82% | 98.37% | 99.92% | 99.06% | 97.47% | 99.72% | 99.41% | 99.25% |
| DenseNet | 99.97% | 99.90% | 98.82% | 100.00% | 99.38% | 98.59% | 99.95% | 99.69% | 99.42% |
| MobileNet | 99.87% | 99.56% | 97.41% | 99.77% | 96.83% | 95.98% | 99.70% | 98.40% | 98.05% |
| SqueezeNet | 99.52% | 99.56% | 96.69% | 98.54% | 96.21% | 96.65% | 99.21% | 98.91% | 98.24% |
| ShuffleNet | 98.96% | 98.81% | 95.49% | 99.92% | 98.75% | 99.11% | 98.96% | 98.84% | 95.53% |
| MnasNet | 99.96% | 99.18% | 96.17% | 100.00% | 98.10% | 96.80% | 99.67% | 99.18% | 96.26% |
| Vision Transformers | | | | | | | | | |
| ViT | 98.27% | 98.89% | 83.77% | 98.80% | 96.54% | 97.10% | 94.35% | 94.79% | 93.53% |
| ViT (RealFormer) | 98.45% | 99.19% | 86.03% | 98.67% | 95.94% | 96.65% | 93.62% | 94.21% | 94.22% |
| ViT (Sinkhorn Transformer) | 94.69% | 97.04% | 82.29% | 95.99% | 94.02% | 94.79% | 80.68% | 85.61% | 84.71% |
| ViT (Nyströmformer) | 79.15% | 83.15% | 62.41% | 90.47% | 80.13% | 80.95% | 86.97% | 79.08% | 79.10% |
| TNT | 96.83% | 97.73% | 84.39% | 97.71% | 92.75% | 94.42% | 96.25% | 94.52% | 95.05% |
| Performance gap: CNN (best) - Transformer (best) | | | 12.81% | | | 2.01% | | | 4.37% |
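The performance-gap row is consistent with subtracting the best Transformer testing accuracy from the best CNN testing accuracy on each dataset. For example, on the German dataset the best CNN reaches 98.84% (VGG16) and the best Transformer reaches 86.03% (ViT (RealFormer)), and 98.84 - 86.03 = 12.81. The minimal sketch below reproduces the row under that assumption. The gap is a difference of two percentages, so strictly it is in percentage points. The names `TEST_ACC` and `performance_gap` are hypothetical and only serve this illustration. The values are the testing accuracies transcribed from Table 6.

```python
# Testing accuracies (%) transcribed from Table 6 (hypothetical data structure).
TEST_ACC = {
    "cnn": {
        "VGG16": {"Germany": 98.84, "India": 98.44, "China": 99.21},
        "ResNet": {"Germany": 98.37, "India": 97.47, "China": 99.25},
        "DenseNet": {"Germany": 98.82, "India": 98.59, "China": 99.42},
        "MobileNet": {"Germany": 97.41, "India": 95.98, "China": 98.05},
        "SqueezeNet": {"Germany": 96.69, "India": 96.65, "China": 98.24},
        "ShuffleNet": {"Germany": 95.49, "India": 99.11, "China": 95.53},
        "MnasNet": {"Germany": 96.17, "India": 96.80, "China": 96.26},
    },
    "transformer": {
        "ViT": {"Germany": 83.77, "India": 97.10, "China": 93.53},
        "ViT (RealFormer)": {"Germany": 86.03, "India": 96.65, "China": 94.22},
        "ViT (Sinkhorn Transformer)": {"Germany": 82.29, "India": 94.79, "China": 84.71},
        "ViT (Nyströmformer)": {"Germany": 62.41, "India": 80.95, "China": 79.10},
        "TNT": {"Germany": 84.39, "India": 94.42, "China": 95.05},
    },
}


def performance_gap(test_acc: dict, dataset: str) -> float:
    """Best CNN testing accuracy minus best Transformer testing accuracy,
    in percentage points, rounded to two decimals (assumed definition)."""
    best_cnn = max(scores[dataset] for scores in test_acc["cnn"].values())
    best_vit = max(scores[dataset] for scores in test_acc["transformer"].values())
    return round(best_cnn - best_vit, 2)


if __name__ == "__main__":
    for ds in ("Germany", "India", "China"):
        print(ds, performance_gap(TEST_ACC, ds))
```

Running the script prints one gap per dataset. The values match the 12.81, 2.01 and 4.37 reported in the table, which supports reading the gap as a testing-accuracy comparison.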