Research Article

Identifying Incident Causal Factors to Improve Aviation Transportation Safety: Proposing a Deep Learning Approach

Table 6

Summary of the incident reports and their label distribution in the training set before and after oversampling, and in the validation and test sets.

| Causal factor | Train (original) | Train (oversampled) | Validation | Test |
|---|---|---|---|---|
| Human factors (HF) | 87356 (62.8%) | 87356 (25.4%) | 10941 (64.0%) | 16145 (63.4%) |
| Aircraft (AC) | 32690 (23.5%) | 65380 (19.0%) | 3823 (22.4%) | 6620 (26.0%) |
| Company policy (CP) | 5335 (3.8%) | 53350 (15.5%) | 635 (3.7%) | 1047 (4.1%) |
| Procedure (PR) | 5321 (3.8%) | 53210 (15.4%) | 645 (3.7%) | 1004 (4.0%) |
| Weather (WE) | 4979 (3.6%) | 49790 (14.5%) | 623 (3.7%) | 952 (3.7%) |
| Airport (AP) | 3424 (2.5%) | 34240 (10.0%) | 428 (2.4%) | 643 (2.5%) |
| Total | 139105 (100%) | 343326 (100%) | 17095 (100%) | 25451 (100%) |

The validation and test sets are kept as imbalanced as the original training set so that they reflect the true distribution of the data.
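
The effect of oversampling on the training label distribution can be checked with a short calculation. The sketch below is illustrative only: the integer replication factors (1 for HF, 2 for AC, 10 for the remaining classes) are an assumption inferred from the ratio of oversampled to original counts in Table 6, and the names `original`, `factors`, `oversampled_counts`, and `shares` are hypothetical. It does not reproduce the oversampling procedure itself, which is not described in this table.

```python
# Class counts in the original training set, copied from Table 6.
original = {
    "HF": 87356, "AC": 32690, "CP": 5335,
    "PR": 5321, "WE": 4979, "AP": 3424,
}

# Assumption: integer replication factors inferred from the ratio of
# oversampled to original counts in Table 6 (not stated in the text).
factors = {"HF": 1, "AC": 2, "CP": 10, "PR": 10, "WE": 10, "AP": 10}


def oversampled_counts(counts, factors):
    """Multiply each class count by its replication factor."""
    return {label: n * factors[label] for label, n in counts.items()}


def shares(counts):
    """Return each class's percentage of the total, rounded to one decimal."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


oversampled = oversampled_counts(original, factors)

# Print the total and per-class percentages before and after oversampling.
print(sum(original.values()), shares(original))
print(sum(oversampled.values()), shares(oversampled))
```

Under these assumed factors, the computed totals and the HF share match the two training columns of Table 6. The majority class falls from roughly 63% to roughly 25% of the training data, while the validation and test sets are left untouched.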