Abstract

Automatic vehicle classification has played a vital role in intelligent transportation systems and visual traffic surveillance systems in recent decades. In countries that imposed a lockdown during the COVID-19 pandemic (mobility restrictions help reduce the spread of the disease), it became especially important to curtail the movement of vehicles as much as possible. For an effective visual traffic surveillance system, it is essential to detect vehicles in images and classify them into different types (e.g., bus, car, and pickup truck). Most existing studies focus only on maximizing the percentage of correct predictions, and the resulting methods have poor real-time performance and consume considerable computing resources. To address the problems of classifying imbalanced data, a new technique for vehicle type classification is proposed in this article. Initially, the data are collected from the Beijing Institute of Technology Vehicle Dataset and the MIOvision Traffic Camera Dataset. Adaptive histogram equalization is then applied to enhance the quality of the collected vehicle images, and the Gaussian mixture model is used to detect vehicles in the preprocessed images. Next, the Steerable Pyramid Transform and the Weber Local Descriptor are employed to extract feature vectors from the detected vehicles. Finally, the extracted features are given as input to an ensemble deep learning technique for vehicle classification. In the simulation phase, the proposed ensemble deep learning technique obtained classification accuracies of 99.13% on the MIOvision Traffic Camera Dataset and 99.28% on the Beijing Institute of Technology Vehicle Dataset. These results compare favorably with standard benchmark techniques on both datasets.

1. Introduction

In recent times, the development of intelligent traffic surveillance systems has become an emerging research topic, because such systems deliver innovative tools to improve driver satisfaction, efficiency, and transportation safety [1]. Automatic vehicle classification plays a crucial role in intelligent traffic surveillance systems, and it supports several applications such as traffic flow analysis, electronic toll collection, and intelligent parking systems [2, 3]. During the COVID-19 outbreak and the associated mobility restrictions, citizens were allowed to leave home only to procure essential goods from groceries or pharmacies. Intelligent traffic surveillance systems can track motorists entering the worst-affected regions from low-risk areas.

Automatic vehicle classification is challenging when the videos are collected from traffic surveillance cameras [4]. Captured traffic surveillance images have low resolution and are subject to varying weather conditions, illumination conditions, and occlusion [5]. In addition, vehicle types exhibit many intraclass and interclass similarities, which affect vehicle classification performance [6]. To address these problems, several machine learning methods and data manipulation techniques have been developed to deal with imbalanced data classification [7–9]. Compared to other objects, vehicles have different structural characteristics, larger intraclass variations, and larger interclass distances; these factors make vehicle detection and classification a challenging task [10], one that a single classifier in the classification stage can hardly handle. Existing research on detection mechanisms has in some cases resulted in efficient identification of incidents, while other mechanisms share the limitations of standard identification approaches [11, 12]. Motivated by these issues and by the need to deal with imbalanced data, this paper proposes a new technique for vehicle type classification.

Initially, the surveillance videos or images are collected from the Beijing Institute of Technology (BIT) Vehicle Dataset and the MIOvision Traffic Camera Dataset (MIO-TCD). The visual quality of the collected vehicle images is then improved with the Adaptive Histogram Equalization (AHE) method, and the Gaussian Mixture Model (GMM) is used to detect vehicles in the preprocessed images. The GMM provides high detection accuracy, adaptation to image content, simplicity of implementation, and fast computation in vehicle detection. After the vehicles are detected, hybrid feature extraction is carried out using the Steerable Pyramid Transform (SPT) and the Weber Local Descriptor (WLD) to extract feature vectors from the detected images. By implementing high-level global descriptors, the semantic gap between the extracted feature vectors is limited, which results in better classification, reduced training time, and fewer overfitting issues. Finally, the ensemble deep learning technique is used to classify the vehicle types: 11 classes in MIO-TCD and 6 classes in the BIT Vehicle Dataset. The performance of the proposed ensemble deep learning technique is analyzed in terms of the False Discovery Rate (FDR), the False Omission Rate (FOR), recall, precision, and accuracy. The simulation results confirm that the proposed technique performs well in vehicle type classification relative to state-of-the-art techniques. One drawback of the ensemble deep learning technique, however, is the vanishing gradient problem, which occurs when a large input space is mapped into a smaller one; this problem can be addressed in future work.

2. Related Work

Liu et al. [13] used Generative Adversarial Nets (GANs) to classify vehicles from traffic surveillance videos. The approach consists of three steps. First, a GAN was trained on a collected traffic dataset to generate adversarial samples for the rare classes. Second, an ensemble-based Convolutional Neural Network (CNN) was trained on the imbalanced dataset, and sample selection was carried out to eliminate the lower-quality adversarial samples. Finally, the selected adversarial samples were used to refine the ensemble model on the augmented dataset. Extensive experiments showed that the GAN approach achieved effective vehicle classification performance on MIO-TCD in terms of the Cohen kappa score, mean recall, precision, and mean precision. However, degradation issues occur in the GAN approach when the deeper networks are about to converge. Fu et al. [14] developed a vehicle classification technique based on a hierarchical multi-SVM (multi-Support Vector Machine) classifier. The foreground objects were first extracted from the surveillance videos, and the hierarchical multi-SVM technique was then applied for vehicle classification. Additionally, a voting-based correction approach was used to track the classified vehicles for the performance evaluation. In this study, a practical system was developed based on the hierarchical multi-SVM technique for robust vehicle classification in heavy traffic scenes. However, the technique is ineffective in practical crowded traffic scenes because of different views, shadows, and heavy occlusion.

Şentaş et al. [15] used the tiny YOLO with the SVM classifier for vehicle detection and classification. In the experimental segment, the performance of the model was validated on the BIT Vehicle Dataset in terms of precision and recall. The results confirmed that the model classifies vehicle types well in real-time-streaming traffic videos. However, SVM is inherently a binary classifier, which was a major limitation of this study. Wang et al. [16] developed a vehicle type classification system based on the faster R-CNN technique. Its performance was evaluated on a real-time dataset containing real scene images captured at crossroads. As a future enhancement, a novel technique is needed to improve the detection of vehicles that are occluded or affected by different illumination conditions, angles, and image scales. Zhuo et al. [17] developed a CNN model for vehicle classification that includes two important steps: pretraining and fine tuning. In the pretraining step, GoogLeNet was trained on the ImageNet Large Scale Visual Recognition Challenge 2012 (ILSVRC2012) dataset to obtain an initial model with connection weights. In the fine tuning step, the initial model was fine-tuned on the vehicle dataset to achieve the final classification. The collected highway surveillance videos include six vehicle categories: van, minibus, truck, bus, car, and motorcycle. In the experimental phase, the performance analysis was carried out on the vehicle dataset in terms of accuracy. However, the CNN model is computationally expensive and suffers from overfitting.

Murugan and Vijaykumar [18] developed a framework for vehicle type classification that includes six main phases: data preprocessing, vehicle detection, vehicle tracking, structural matching, feature extraction, and vehicle classification. After the traffic surveillance videos were collected, data preprocessing was accomplished using noise removal and color conversion. Further, the Otsu thresholding algorithm and background subtraction were used to detect the vehicles. The Kalman filter was then used to track the moving vehicles. Additionally, the log Gabor filter and the Harris corner detector were used to extract the feature vectors, and the obtained features were fed to the Artificial Neural Fuzzy Inference System (ANFIS) for classification of the vehicles. Extensive experiments showed that the framework achieved significant performance in vehicle classification in terms of error rate and accuracy. However, the framework aggravates the dimensionality problem, which increases model complexity. Dong et al. [19] implemented a semisupervised CNN architecture for vehicle type classification. In this architecture, a sparse Laplacian filter was applied to extract rich and discriminative information about the vehicles. In the output layer, a softmax classifier was trained by multitask learning for vehicle type classification. The features learned by the semisupervised CNN architecture were discriminative enough to work well in complex scenes. Extensive experiments were conducted on the BIT Vehicle Dataset and a public dataset to analyze the efficiency of the architecture in terms of classification accuracy. However, the semisupervised CNN architecture includes several layers, so the training process is time consuming.

Hedeya et al. [20] introduced a densely connected single-split super learner and applied variants of it for vehicle type classification on the BIT Vehicle Dataset and MIO-TCD. The model was simple, and it required neither logic reasoning nor hand-crafted features to achieve better vehicle type classification performance. On complex datasets, however, the model suffers from the vanishing gradient problem, which is a major concern in this study. Soon et al. [21] implemented a semisupervised model for vehicle type classification based on the Principal Component Analysis Convolutional Network (PCN). In this model, convolutional filters were used to extract hierarchical and discriminative features. The simulation results showed that the model performed well in real-time applications because of its robustness against noise contamination, illumination conditions, rotation, and translation. However, the PCN model contains a large number of training parameters, which leads to overfitting.

Awang et al. [22] developed the Sparse-Filtered CNN with Layer Skipping (SF-CNNLS) approach for vehicle type classification. In this study, three channels of the SF-CNNLS approach were applied to extract discriminant and rich vehicle features. The global and local features of the vehicles were extracted from the three channels of an image based on their color, brightness, and shape. In the experimental phase, the performance of the SF-CNNLS approach was validated on a benchmark dataset. Finally, the softmax regression classifier was used to classify vehicle types such as truck, minivan, bus, passenger, taxi, car, and SUV. The softmax regression classifier includes higher-level layers; however, embedding lower-resolution vehicle images may cause a loss of vehicle type information. Nasaruddin et al. [23] developed an attention-based approach and a deep CNN technique for lightweight moving vehicle classification. The model performance was validated on a real-time dataset in terms of specificity, precision, and F-score. However, the model performance was limited under circumstances such as baseline, camera jitter, and bad weather. For each of the surveyed papers, the methods, the datasets, and the advantages and disadvantages of the methods in vehicle type classification have been summarized above. To address the stated issues, a new ensemble deep learning technique is proposed in this paper to improve vehicle type classification.

This paper is organized as follows. The Methodology section introduces the two vehicle datasets and their parameters, as well as the data preprocessing techniques and the selected machine learning algorithms. The Experimental Results and Discussion section describes the performance of the ensemble deep learning technique in terms of classification accuracy, provides a comparative analysis between the proposed and existing techniques, and discusses the benefits and weaknesses of the selected models. Finally, the last section presents our conclusions.

3. Methodology

Vehicle type classification has recently become an emerging research area in intelligent traffic systems because of its wide range of applications, which include intelligent parking systems and traffic flow statistics [24]. Many approaches have been developed for vehicle type classification, commonly based on cameras, magnetic induction, and optic fibres [25]. With the extensive use of traffic surveillance cameras, image-based approaches have received great attention in the computer vision community. The flow diagram of the ensemble deep learning technique is given in Figure 1.

3.1. Image Collection

In this research study, the performance of the proposed ensemble deep learning technique is tested on the BIT Vehicle Dataset and MIO-TCD. The BIT Vehicle Dataset comprises 9850 vehicle images with pixel sizes of 1600 × 1200 and 1920 × 1080, which were captured using two different cameras at different places and times. The BIT Vehicle Dataset consists of six vehicle types, namely, sedan, microbus, SUV, minivan, bus, and truck, with 5922, 883, 1392, 476, 558, and 822 images for the corresponding vehicle types [26]. The captured images vary in terms of viewpoint, surface color of the vehicles, scale, position of the vehicles, and illumination conditions. Because of the sizes of the vehicles and the capturing delay, the top and bottom parts of some vehicles are not included in the images. The location of every vehicle is preannotated in the BIT Vehicle Dataset, because some images include one or two vehicles. Sample images of the BIT Vehicle Dataset are given in Figure 2. The BIT Vehicle Dataset link is as follows: https://www.programmersought.com/article/7654351045/.
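
To make the class imbalance in the BIT Vehicle Dataset concrete, the following minimal sketch computes the share of each class and the ratio between the largest and smallest classes from the per-class counts quoted above. The helper names `class_shares` and `imbalance_ratio` are illustrative assumptions and are not part of the original study.

```python
# Per-class image counts for the BIT Vehicle Dataset, as quoted in the text.
bit_counts = {
    "sedan": 5922,
    "microbus": 883,
    "SUV": 1392,
    "minivan": 476,
    "bus": 558,
    "truck": 822,
}


def class_shares(counts):
    """Return each class's share of the total number of labelled vehicles."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def imbalance_ratio(counts):
    """Ratio between the largest and the smallest class."""
    return max(counts.values()) / min(counts.values())


shares = class_shares(bit_counts)
ratio = imbalance_ratio(bit_counts)

for name, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{name:10s} {share:6.1%}")
print(f"largest/smallest class ratio: {ratio:.2f}")
```

The sedan class dominates the dataset, which is the kind of skew that motivates the imbalanced-data treatment discussed in this paper.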

In addition, the MIO-TCD classification dataset comprises 648,959 vehicle images and includes eleven classes: bicycle, articulated truck, motorcycle, nonmotorized vehicle, bus, car, pedestrian, work van, pickup truck, single-unit truck, and background [27]. The data statistics of the MIO-TCD classification dataset are given in Table 1. Every annotated image in the BIT Vehicle Dataset and MIO-TCD is stored in a structured format. Sample images of the MIO-TCD classification dataset are given in Figure 3. The MIO-TCD classification dataset link is as follows: https://github.com/hakimamarouche/MIO-TCD-classification.

3.2. Image Preprocessing and Vehicle Detection

After collecting the vehicle images, the AHE technique is used to enhance the visual quality of the images by altering the global image contrast. Additionally, the AHE technique calculates several histograms and uses them to redistribute the lightness values of the vehicle images, which enhances the local contrast and the definition of edges in every region of a vehicle image. First, a collected image is denoted as $x$, and the number of occurrences of gray level $i$ in the image is denoted as $n_i$ [28]. The probability of an occurrence of gray level $i$ is computed as

$$p_x(i) = p(x = i) = \frac{n_i}{n}, \quad 0 \le i < L,$$

where $L$ is the number of image gray levels (the gray levels range between 0 and 255), $n$ is the total number of image pixels, and $p_x(i)$ is the histogram value of gray level $i$, normalized to $[0, 1]$. Further, the cumulative distribution function (CDF) is computed for $p_x$ as

$$\mathrm{cdf}_x(i) = \sum_{j=0}^{i} p_x(j).$$

Then, a transformation of the form $y = T(x)$ is developed to generate a new image $y$ with flat histogram values. The transformed vehicle images have a linear CDF, which is mathematically stated as

$$\mathrm{cdf}_y(i) = (i + 1)K, \qquad y = T(k) = \mathrm{cdf}_x(k),$$

where $K$ is a constant, $i$ ranges over $[0, L - 1]$, and the variable $k$ is in the range of $[0, L - 1]$. In the AHE technique, a simple transformation is applied to map the pixel values back into their original range, which is mathematically determined as

$$y' = y \cdot \left(\max\{x\} - \min\{x\}\right) + \min\{x\}.$$
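
The histogram, CDF, and remapping steps described above can be sketched in a few lines of NumPy. This is a minimal illustration of global histogram equalization on an 8-bit grayscale image, under the assumption that the output is stretched over the full range of gray levels; the adaptive variant used in this study would apply the same mapping separately within local image regions, and the tile layout for that is not specified here. The function name `equalize_histogram` is an assumption for illustration.

```python
import numpy as np


def equalize_histogram(image, levels=256):
    """Global histogram equalization of an 8-bit grayscale image."""
    image = np.asarray(image, dtype=np.uint8)
    # Number of occurrences of every gray level.
    counts = np.bincount(image.ravel(), minlength=levels)
    # Normalized histogram: probability of each gray level.
    prob = counts / image.size
    # Cumulative distribution function of the gray levels.
    cdf = np.cumsum(prob)
    # The CDF maps each level into [0, 1]; stretch back to [0, levels - 1].
    mapping = np.round(cdf * (levels - 1)).astype(np.uint8)
    return mapping[image]


# Synthetic low-contrast image whose gray levels occupy only [0, 63].
rng = np.random.default_rng(0)
dark = rng.integers(0, 64, size=(32, 32), dtype=np.uint8)
enhanced = equalize_histogram(dark)

print("input range: ", int(dark.min()), int(dark.max()))
print("output range:", int(enhanced.min()), int(enhanced.max()))
```

Because the mapping is built from a cumulative sum, it is monotonic: the ordering of gray levels is preserved while the occupied range is widened.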

After image preprocessing, GMM is applied to detect vehicles from the preprocessed images, $y'$. In the field of vehicle type classification, GMM is used for detecting and recognizing moving objects [29]. GMM is a statistical model that describes the spatial distribution and the properties of the data in the parameter space. It includes a parametric probability density function composed of several Gaussian component functions for detecting vehicles in the images [30], which is mathematically defined in equation (6):

$$p(\mathbf{z}) = \sum_{j=1}^{M} \pi_j \, \mathcal{N}(\mathbf{z} \mid \boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j), \tag{6}$$

where $\mathcal{N}(\mathbf{z} \mid \boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j)$ denotes the bivariate normal distribution with mean vector $\boldsymbol{\mu}_j$, $\pi_j$ denotes the prior probability of the $j$th Gaussian distribution from which the data sample $\mathbf{z}$ is produced, $\boldsymbol{\Sigma}_j$ indicates the covariance matrix, and $M$ is the number of Gaussian components. Sample preprocessed and vehicle-detected images are shown in Figure 4.
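
As a minimal sketch of the mixture density described above, the following NumPy code evaluates a weighted sum of bivariate Gaussian components at a given point. The component weights, means, and covariances are hypothetical values chosen for illustration; how the fitted mixture is thresholded to separate vehicles from the background is not specified in the text and is not shown.

```python
import numpy as np


def gaussian_pdf(x, mean, cov):
    """Density of a multivariate normal distribution at point x."""
    x = np.asarray(x, dtype=float)
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    d = mean.size
    diff = x - mean
    norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(cov))
    # Solve cov @ v = diff instead of forming the explicit inverse.
    exponent = -0.5 * diff @ np.linalg.solve(cov, diff)
    return float(np.exp(exponent) / norm)


def mixture_pdf(x, weights, means, covs):
    """Weighted sum of Gaussian component densities at point x."""
    return sum(w * gaussian_pdf(x, m, c) for w, m, c in zip(weights, means, covs))


# Hypothetical two-component mixture (the weights must sum to one).
weights = [0.7, 0.3]
means = [[0.0, 0.0], [3.0, 3.0]]
covs = [np.eye(2), 0.5 * np.eye(2)]

density_at_origin = mixture_pdf([0.0, 0.0], weights, means, covs)
print(f"mixture density at the origin: {density_at_origin:.6f}")
```

At the origin the second component contributes almost nothing, so the mixture density is close to the first component's weight times the peak value of a standard bivariate normal.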

3.3. Feature Extraction and Vehicle Classification

After vehicle detection, SPT and WLD are combined to extract feature vectors from the detected images, which decreases the risk of overfitting, speeds up the training process, and enhances the data visualization ability. SPT is a linear multiorientation and multiscale image decomposition method, developed to overcome the limitations of orthogonal separable wavelet decomposition [31]. The SPT method first decomposes the detected images into several orientations and then into several scales, based on derivative operators of different directions and sizes, while the orientation bandwidth of the subbands is determined by the number of orientations. The resultant subbands of the SPT method are rotation invariant and translation invariant [32].

In the SPT method, the detected images are decomposed into high- and low-frequency components using the filters H0 and L0. The low-frequency component is then further decomposed into two oriented band-pass components and a lower-frequency component by using the oriented band-pass filters B0 and B1 and the low-pass filter L1. The larger the number of orientations (that is, the higher the derivative degree), the greater the number of pyramid levels produced and the finer the orientation and scale tuning, which means a more robust representation of the images. In the SPT method, the orientation of the filters should satisfy the following conditions:
(i) A linear combination of the filters generates a filter of any orientation
(ii) The filters are rotated copies of one another, so every filter can be obtained by copying and rotating another filter
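
The two steerability conditions above can be checked numerically on the simplest steerable family, the first derivatives of a two-dimensional Gaussian. The sketch below is an illustration of the steering property only, not an implementation of the full pyramid; the filter size, the value of sigma, and the function names are assumptions made for this example.

```python
import numpy as np


def coordinate_grid(size):
    """Integer pixel coordinates centred on the middle of a size x size filter."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    return x, y


def gaussian_derivative_basis(size=9, sigma=1.5):
    """Two basis filters: the x- and y-derivatives of a 2-D Gaussian."""
    x, y = coordinate_grid(size)
    g = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    return -x / sigma ** 2 * g, -y / sigma ** 2 * g


def steer(basis_x, basis_y, theta):
    """Condition (i): synthesize the filter at angle theta from the basis."""
    return np.cos(theta) * basis_x + np.sin(theta) * basis_y


def direct_oriented_filter(theta, size=9, sigma=1.5):
    """Build the oriented filter directly by differentiating along theta."""
    x, y = coordinate_grid(size)
    g = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    u = x * np.cos(theta) + y * np.sin(theta)
    return -u / sigma ** 2 * g


gx, gy = gaussian_derivative_basis()
theta = np.deg2rad(30.0)
steered = steer(gx, gy, theta)

print("steered filter matches direct construction:",
      np.allclose(steered, direct_oriented_filter(theta)))
print("basis filters are rotated copies:", np.allclose(gy, gx.T))
```

For this family two basis filters are enough; higher-order derivative filters need more basis orientations, which is why the number of orientations is a parameter of the decomposition.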

Next, every subband is convolved with the texture descriptor WLD to extract the active features from the images. WLD is a robust local texture descriptor inspired by Weber’s law. It comprises two components, differential excitation and orientation, which extract texture features from the vehicle-detected image. The differential excitation component reflects the changes of the current pixel [33–35] and is computed as

$$\xi(x_c) = \arctan\left[\sum_{i=0}^{p-1} \frac{x_i - x_c}{x_c}\right],$$

where $\xi(x_c)$ is the differential excitation of the current pixel $x_c$, $(x_i - x_c)/x_c$ is the ratio of the intensity difference to the current pixel intensity, $x_i$ is the $i$th neighboring pixel of $x_c$, and $p$ is the number of neighbors. Further, the gradient orientation component of the current pixel is calculated as

$$\theta(x_c) = \arctan\left(\frac{v_s^{11}}{v_s^{10}}\right),$$

where $v_s^{10}$ and $v_s^{11}$ are the outputs of two filters, $f_{10}$ and $f_{11}$, which are used to compute the differences between current and neighborhood image pixels, and $\theta$ is in the range of $[-\pi/2, \pi/2]$. Next, the extracted active feature vectors are fed to the ensemble deep learning technique for vehicle classification.
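
The two WLD components can be illustrated for a single 3 × 3 neighborhood. The sketch below assumes eight neighbors, a vertical difference (pixel below minus pixel above) and a horizontal difference (pixel to the left minus pixel to the right) as the two filter outputs, and a small constant `eps` to avoid division by zero; these choices follow the commonly used formulation of the descriptor and are assumptions rather than details confirmed by this paper.

```python
import numpy as np


def wld_components(patch, eps=1e-6):
    """Differential excitation and orientation for the centre of a 3x3 patch."""
    patch = np.asarray(patch, dtype=float)
    center = patch[1, 1]
    neighbors = np.delete(patch.ravel(), 4)  # the eight surrounding pixels

    # Differential excitation: arctan of the summed relative differences.
    excitation = np.arctan(np.sum(neighbors - center) / (center + eps))

    # Orientation: arctan of the ratio of two directional differences.
    v_vertical = patch[2, 1] - patch[0, 1]    # assumed filter: below minus above
    v_horizontal = patch[1, 0] - patch[1, 2]  # assumed filter: left minus right
    if v_vertical == 0.0:
        orientation = np.sign(v_horizontal) * np.pi / 2.0
    else:
        orientation = np.arctan(v_horizontal / v_vertical)
    return float(excitation), float(orientation)


# A bright ring around a darker centre gives a strong positive excitation.
ring = [[20, 20, 20],
        [20, 10, 20],
        [20, 20, 20]]
print(wld_components(ring))
```

In a full descriptor these two values are computed for every pixel and accumulated into a joint histogram, which then serves as the texture feature vector.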

An ensemble deep learning technique is proposed for vehicle type classification on traffic surveillance videos. The extracted features are fed to the input layer of the ensemble deep learning technique, which reduces the classification bias and the training time. To address the concerns that arise from extremely imbalanced data distributions, hybrid feature extraction (SPT and WLD) is devised in this research. Additionally, in practical applications the size of the minority vehicle classes is reduced to a small number compared to the majority classes to avoid overfitting problems. One of the major advantages of the ResNet-152, ResNet-101, and ResNet-50 models is that they increase network depth while effectively avoiding the negative effects that usually accompany greater depth. The proposed ensemble deep learning technique consists of a set of CNN models that are trained on the balanced dataset with good initialization (pretrained on ImageNet). Finally, the outputs of the ensemble members are combined by a maximum voting policy based on the predictions of the individual models.

As represented in Figure 5, the ensemble deep learning technique includes ResNet-152, ResNet-101, and ResNet-50. The proposed ensemble deep learning technique consists of three key phases: CNN techniques with good initial parameters, fine tuning of network parameters, and averaging models.

The residual networks (ResNets) are easy to optimize with limited training error, and they also achieve higher classification accuracy on large datasets such as the BIT Vehicle Dataset and MIO-TCD. The training error of ResNet-152, ResNet-101, and ResNet-50 on MIO-TCD is shown in Figure 6. As the number of epochs increases, the error percentage gradually decreases for ResNet-152, ResNet-101, and ResNet-50. Pseudocode of the ensemble deep learning technique is given below.

Input: Size of the feature space, training set, size of the feature subspace, feature set, number of feature subspaces, one test sample, and number of classes.
Output: Classification of vehicle types.
Process:
 For each class i = 1 : number of classes
  Label the samples of class i.
  Train on the feature subsets using ResNet-152, ResNet-101, and ResNet-50.
 End for
 Calculate the value of the vote counter for each class from the predictions of the three networks.
 Output the class with the highest counter value.
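
The final voting step of the pseudocode can be sketched as follows. This minimal example assumes that each of the three networks has already produced one class index per test sample; the prediction values below are hypothetical, and the tie-breaking rule (the lowest class index wins) is an assumption, since the text does not state how ties are resolved.

```python
import numpy as np


def majority_vote(predictions, num_classes):
    """Combine per-model class predictions by maximum (majority) voting.

    predictions: integer array of shape (num_models, num_samples).
    Returns one class index per sample; ties go to the lowest class index.
    """
    predictions = np.asarray(predictions)
    # counter[c, s] = number of models that voted for class c on sample s.
    counter = np.stack(
        [(predictions == c).sum(axis=0) for c in range(num_classes)]
    )
    return counter.argmax(axis=0)


# Hypothetical outputs of three fine-tuned networks on four test samples.
resnet50_pred = [0, 2, 1, 5]
resnet101_pred = [0, 2, 3, 4]
resnet152_pred = [1, 2, 3, 3]

votes = majority_vote(
    [resnet50_pred, resnet101_pred, resnet152_pred], num_classes=6
)
print(votes.tolist())
```

With three voters, any class chosen by at least two networks wins outright; a three-way disagreement is the only case in which the tie-breaking rule matters.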

4. Experimental Results and Discussion

In this research, the performance of the proposed ensemble deep learning technique is simulated using MATLAB 2019a software with the following system configuration: operating system: Windows 10 (64 bit); processor: Intel Core i9; hard disk: 3 TB; and RAM: 16 GB. The ensemble deep learning technique is validated by comparing it with several benchmark techniques: the GAN-based deep ensemble technique [13], the tiny YOLO with SVM [15], the semisupervised CNN model [19], PCN [21], and the three channels of SF-CNNLS (TC-SF-CNNLS) approach [22]. The primary goal of this research study is to classify the vehicle types in the BIT Vehicle Dataset and MIO-TCD. The performance of the proposed ensemble deep learning technique is validated using 10-fold cross-validation. Let FP denote false positives, FN false negatives, TP true positives, and TN true negatives. Five performance measures are used to analyze the performance of the proposed ensemble deep learning technique: accuracy, precision, recall, FDR, and FOR [34]. The mathematical expressions of accuracy, precision, recall, FDR, and FOR are

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN},$$

$$\text{Precision} = \frac{TP}{TP + FP},$$

$$\text{Recall} = \frac{TP}{TP + FN},$$

$$\text{FDR} = \frac{FP}{FP + TP},$$

$$\text{FOR} = \frac{FN}{FN + TN}.$$
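
For reference, the five performance measures can be computed directly from the four confusion-matrix counts. The sketch below uses the standard definitions of accuracy, precision, recall, FDR, and FOR; the counts in the example are hypothetical and do not come from the experiments in this paper.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, FDR and FOR from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "fdr": fp / (fp + tp),   # false discovery rate
        "for": fn / (fn + tn),   # false omission rate
    }


# Hypothetical counts for one vehicle class treated one-vs-rest.
metrics = classification_metrics(tp=90, fp=10, fn=5, tn=895)
for name, value in metrics.items():
    print(f"{name:9s} {100.0 * value:6.2f}%")
```

For a multiclass problem such as the 6-class or 11-class settings used here, these counts are typically obtained per class in a one-vs-rest fashion and then averaged; the averaging scheme is not stated in the text.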

4.1. Quantitative Analysis on BIT Vehicle Dataset

Here, the performance of the proposed ensemble deep learning technique is investigated using the BIT Vehicle Dataset, which consists of six vehicle types: sedan, microbus, SUV, minivan, bus, and truck. In this scenario, the performance analysis is carried out with different classifiers, namely the Long Short-Term Memory (LSTM) network, the Multisupport Vector Machine (MSVM), the K-Nearest Neighbor (KNN), the Deep Neural Network (DNN), and the ensemble deep learning technique, each with individual and hybrid feature extraction. As Table 2 shows, the combination of the ensemble deep learning technique with hybrid feature extraction outperformed the other combinations in vehicle type classification in terms of precision, recall, and accuracy. In Table 2, the proposed ensemble deep learning technique achieved a maximum recall of 99.72%, a precision of 98.24%, and an accuracy of 99.28% on the BIT Vehicle Dataset. The graphical comparison of the proposed ensemble deep learning technique on the BIT Vehicle Dataset in terms of precision, recall, and accuracy is shown in Figure 7.

Similarly, in Table 3, the performance of the proposed ensemble deep learning technique is validated in terms of FDR and FOR on the BIT Vehicle Dataset. As Table 3 shows, the combination of the ensemble deep learning technique with hybrid feature extraction achieved a minimum FDR of 3.92% and an FOR of 1.90%, which are lower than those of the other combinations in vehicle type classification. In the BIT Vehicle Dataset, 7,880 vehicle images are used for training, and 1,970 vehicle images are used for testing. The graphical comparison of the proposed ensemble deep learning technique on the BIT Vehicle Dataset in terms of FDR and FOR is shown in Figure 8. In addition, the running time of the proposed ensemble deep learning technique on the BIT Vehicle Dataset is 1.6 seconds per frame.

4.2. Quantitative Analysis on MIO-TCD

Here, MIO-TCD is used to validate the efficiency of the proposed ensemble deep learning technique in terms of precision, recall, accuracy, FDR, and FOR. MIO-TCD includes 648,959 images in 11 classes: single-unit truck, pickup truck, nonmotorized vehicle, car, pedestrian, articulated truck, background, motorcycle, bicycle, work van, and bus. In this scenario, 80% of the images are used for training, and 20% are used for testing. As Table 4 shows, the combination of the ensemble deep learning technique with hybrid feature extraction achieved a maximum precision of 99.12%, a recall of 99.69%, and an accuracy of 99.13% on MIO-TCD. In this article, hybrid feature extraction captures the statistical interactions and extracts the active feature vectors from the vehicle images. The graphical comparison of the proposed ensemble deep learning technique on MIO-TCD in terms of precision, recall, and accuracy is shown in Figure 9.

In Table 5, the proposed ensemble deep learning technique achieved a minimum FDR value of 0.44 and an FOR value of 0.32 compared to the other combinations on MIO-TCD. In this study, the ensemble deep learning technique effectively maximizes the percentage of correct predictions, which reduces misclassification in both dominant and minority vehicle categories. The graphical comparison of the ensemble deep learning technique on MIO-TCD in terms of FDR and FOR is shown in Figure 10. Similarly, the running time of the proposed ensemble deep learning technique on MIO-TCD is 1.44 seconds per frame.

4.3. Comparative Analysis

The comparative analysis between the proposed and existing techniques is given in Table 6. Liu et al. [13] introduced a deep learning technique, namely, GANs, for classifying vehicles in traffic surveillance videos. Extensive experiments showed that the GANs achieved 96.41% precision on MIO-TCD. Additionally, Şentaş et al. [15] used the Tiny YOLO with the SVM classification technique for vehicle detection and classification. The simulation outcome showed that the model obtained 97.9% precision and 99.6% recall on the BIT Vehicle Dataset in vehicle type classification. Dong et al. [19] developed a semisupervised CNN model for vehicle type classification. The semisupervised CNN model used a sparse Laplacian filter to extract rich and discriminative features of the vehicles. The features learned by the CNN model were discriminative and worked effectively in complex scenes. In the experimental phase, the semisupervised CNN model achieved 88.11% accuracy on the BIT Vehicle Dataset.

Soon et al. [21] developed a semisupervised model, namely, PCN for vehicle type classification. The developed PCN model utilized convolutional filters to extract hierarchical and discriminative features of the vehicles for better classification. The simulation results showed that the developed PCN model with the softmax classifier achieved 88.52% classification accuracy, and the PCN model with the SVM classifier achieved 88.39% accuracy on the BIT Vehicle Dataset. Additionally, Awang et al. [22] developed the TC-SF-CNNLS approach for vehicle type classification. In the experimental phase, the developed approach performance was validated on the BIT Vehicle Dataset in terms of recall, precision, and accuracy. The developed TC-SF-CNNLS approach achieved 93.8% accuracy by classifying the vehicle types like truck, minivan, bus, passenger, taxi, car, and SUV.
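
The accuracy figures quoted in this comparison can be turned into absolute margins with a few lines of arithmetic. The sketch below uses only the BIT Vehicle Dataset accuracies reported in the text; the dictionary keys and the helper name `accuracy_margins` are illustrative assumptions.

```python
# Classification accuracies (%) on the BIT Vehicle Dataset, as quoted in the text.
reported_accuracy = {
    "semisupervised CNN [19]": 88.11,
    "PCN + softmax [21]": 88.52,
    "PCN + SVM [21]": 88.39,
    "TC-SF-CNNLS [22]": 93.8,
}
proposed_accuracy = 99.28


def accuracy_margins(proposed, baselines):
    """Absolute gain (percentage points) of the proposed result over each baseline."""
    return {name: round(proposed - acc, 2) for name, acc in baselines.items()}


margins = accuracy_margins(proposed_accuracy, reported_accuracy)
largest_gain = max(margins.values())

for name, gain in margins.items():
    print(f"{name:26s} +{gain:.2f} points")
print(f"largest gain: {largest_gain:.2f} points")
```

The largest margin is obtained against the semisupervised CNN model, the baseline with the lowest reported accuracy on this dataset.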

4.4. Discussion

As previously discussed, feature extraction and classification are integral parts of vehicle type classification. In this research study, hybrid (SPT + WLD) descriptors are used to extract active feature vectors from the vehicle images, which speeds up the training process, reduces the risk of overfitting, and improves the data visualization ability. The effect of hybrid feature extraction on vehicle type classification is shown in Tables 2, 3, 4, and 5. Additionally, a new ensemble deep learning technique is proposed in this paper that learns from the original dataset in order to classify unseen data. In most existing research works, an individual classifier is biased by its fixed set of parameters, and this bias is reduced by developing an ensemble classifier. The performance of the ensemble classifier nevertheless depends on the accuracy of its constituent classifiers, although the ensemble has stronger generalization ability than any individual classifier.

5. Conclusion

In this article, an ensemble deep learning technique is proposed for vehicle type classification, which is primarily used in traffic surveillance systems. During the COVID-19 pandemic, video surveillance has been used for additional purposes across the world. Our application uses a deep learning approach that consists of two major phases in vehicle type classification: feature extraction and classification. In this research, hybrid (SPT + WLD) feature descriptors are applied to extract active feature vectors, which reduces training time, improves classification accuracy, and diminishes overfitting problems in the ensemble deep learning technique. The ensemble deep learning technique classifies 11 classes in MIO-TCD and 6 classes in the BIT Vehicle Dataset. In the experiments, the ensemble deep learning technique achieved better performance in vehicle type classification than the other classification techniques in terms of precision, recall, accuracy, FDR, and FOR. Compared to existing benchmark techniques such as the GAN-based deep ensemble technique, the Tiny YOLO with SVM, the semisupervised CNN model, the TC-SF-CNNLS, and the PCN with a softmax classifier, the proposed technique showed an improvement of up to 11.17% in classification accuracy. In future work, a clustering-based segmentation algorithm will be included in the proposed technique to improve vehicle type detection and classification. In addition, three-dimensional modelling, vehicle tracking, and occlusion handling will be emphasized for an effective intelligent transportation system.

Data Availability

No data were used to support this study.

Conflicts of Interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

Acknowledgments

This publication was created thanks to the support from the Operational Program Integrated Infrastructure for the Project: Identification and possibilities of implementation of new technological measures in transport to achieve safe mobility during a pandemic caused by COVID-19 (ITMS code: 313011AUX5), cofinanced by the European Regional Development Fund.