Research Article

Is Vehicle Plate Corner Prediction by Vision Transformer Better than CNNs?

Table 1

Different configurations of the proposed ViT model for a fixed input image resolution, generated from combinations of four parameters: patch size, depth, number of attention heads, and embedding dimension. In our experiments, a total of 600 possible configurations can be generated under the configuration constraints.

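| Patch size | Depth | Number of attention heads | Embedding dimension |
|---|---|---|---|
| 13, 26, 52, 104, 208 | 4, 8, 16, 32, 64, 128 | 2 | 4, 8, 16, 32, 64, 128 |
| 13, 26, 52, 104, 208 | 4, 8, 16, 32, 64, 128 | 4 | 8, 16, 32, 64, 128 |
| 13, 26, 52, 104, 208 | 4, 8, 16, 32, 64, 128 | 8 | 16, 32, 64, 128 |
| 13, 26, 52, 104, 208 | 4, 8, 16, 32, 64, 128 | 16 | 32, 64, 128 |
| 13, 26, 52, 104, 208 | 4, 8, 16, 32, 64, 128 | 32 | 64, 128 |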
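As a sanity check on the stated total of 600, the following Python sketch enumerates the grid in Table 1. The rule in `is_valid` is an assumption. The text only mentions "configuration constraints", and the rule used here is inferred from the table rows rather than taken from the source: the embedding dimension must divide evenly across the heads, leaving at least two dimensions per head. Under that assumption, the five patch sizes, six depths, and 20 admissible (heads, embedding dimension) pairs give 5 × 6 × 20 = 600 configurations.

```python
from itertools import product

# Parameter values read from Table 1.
PATCH_SIZES = (13, 26, 52, 104, 208)
DEPTHS = (4, 8, 16, 32, 64, 128)
HEAD_COUNTS = (2, 4, 8, 16, 32)
EMBED_DIMS = (4, 8, 16, 32, 64, 128)


def is_valid(heads: int, embed_dim: int) -> bool:
    """Assumed constraint (inferred from Table 1, not stated in the text):
    embed_dim splits evenly across heads, with at least 2 dimensions per head."""
    return embed_dim % heads == 0 and embed_dim // heads >= 2


def enumerate_configs():
    """Yield every (patch_size, depth, heads, embed_dim) tuple that passes is_valid."""
    for patch, depth, heads, embed_dim in product(PATCH_SIZES, DEPTHS, HEAD_COUNTS, EMBED_DIMS):
        if is_valid(heads, embed_dim):
            yield patch, depth, heads, embed_dim


if __name__ == "__main__":
    print(len(list(enumerate_configs())))  # 600
    # Reproduce the heads -> admissible embedding dimensions rows of Table 1.
    for heads in HEAD_COUNTS:
        print(heads, [e for e in EMBED_DIMS if is_valid(heads, e)])
```

The divisibility check mirrors a common requirement of multi-head attention implementations, which split the embedding vector into equal per-head slices. Whether the authors imposed exactly this lower bound on the per-head size is not confirmed by the source.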