Research Article

Deep Neural Embedding for Software Vulnerability Discovery: Comparison and Optimization

Table 6

The number of vulnerable and nonvulnerable functions used when fine-tuning the parameters of CodeBERT. In the training and validation sets, we added synthetic data to the original dataset so that vulnerable functions account for 1/10 of the total number of functions.

Dataset          No. of vul. functions (real-world | SARD)    No. of total functions
Training set     8675 (1189 | 7486)                           86759
Validation set   2891 (395 | 2495)                            28919
Test set         399 (399 | 0)                                26425
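
The 1/10 target stated in the caption can be checked directly against the counts in Table 6. The following is a minimal sketch, not part of the original study: the dictionary `SPLITS` and the helper `vulnerable_share` are hypothetical names introduced here for illustration, and the numbers are copied from the table as printed.

```python
# Counts copied from Table 6 as (vulnerable functions, total functions).
# SPLITS and vulnerable_share are illustrative names, not from the paper.
SPLITS = {
    "Training set": (8675, 86759),
    "Validation set": (2891, 28919),
    "Test set": (399, 26425),
}


def vulnerable_share(vulnerable: int, total: int) -> float:
    """Return the fraction of functions in a split that are vulnerable."""
    return vulnerable / total


if __name__ == "__main__":
    for name, (vul, total) in SPLITS.items():
        # Print each split's vulnerable share to four decimal places.
        print(f"{name}: {vulnerable_share(vul, total):.4f}")
```

Under these counts, the training and validation shares come out very close to 0.1, while the test set, which contains no SARD samples, keeps a much lower share of vulnerable functions.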