Research Article

Deep Neural Embedding for Software Vulnerability Discovery: Comparison and Optimization

Table 5

This table lists the numbers of vulnerable and nonvulnerable functions in each dataset. The datasets are derived from 12 open-source projects written in the C programming language and from the Software Assurance Reference Dataset (SARD) project, which contains artificially constructed test cases. In the real-world dataset, vulnerable functions are labeled according to their descriptions in the Common Vulnerabilities and Exposures (CVE) list and the National Vulnerability Database (NVD). The first column gives the data source, the second column lists the datasets (for the real-world data, the individual projects), and the last two columns give the numbers of vulnerable and nonvulnerable functions, respectively.

Data source                        Dataset/collection      No. of functions used/collected
                                                           Vulnerable   Nonvulnerable

Test cases from the SARD projectsC source code samples8371052290
Real-world open-source projects    Asterisk                94           17620
                                   FFmpeg                  249          5549
                                   Httpd                   57           3843
                                   ImageMagick             344          2361
                                   LibPNG                  45           577
                                   LibTIFF                 123          726
                                   OpenSSL                 159          7004
                                   Pidgin                  29           8547
                                   qemu                    143          36063
                                   samba                   26           32819
                                   VLC Player              44           6013
                                   Xen                     670          8913
Total (real-world projects)                                1983         130035
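As a consistency check, the following minimal Python sketch sums the per-project counts of the 12 real-world projects. It reproduces the Total row, which shows that the SARD samples are not included in that total, and it measures how skewed the real-world labels are. The dictionary `REAL_WORLD` and the helpers `totals` and `vulnerable_share` are illustrative names introduced here, not code from the paper.

```python
# Per-project (vulnerable, nonvulnerable) function counts for the 12
# real-world open-source projects, copied from Table 5.
REAL_WORLD = {
    "Asterisk": (94, 17620),
    "FFmpeg": (249, 5549),
    "Httpd": (57, 3843),
    "ImageMagick": (344, 2361),
    "LibPNG": (45, 577),
    "LibTIFF": (123, 726),
    "OpenSSL": (159, 7004),
    "Pidgin": (29, 8547),
    "qemu": (143, 36063),
    "samba": (26, 32819),
    "VLC Player": (44, 6013),
    "Xen": (670, 8913),
}


def totals(counts):
    """Sum the vulnerable and nonvulnerable counts over all projects."""
    vulnerable = sum(v for v, _ in counts.values())
    nonvulnerable = sum(n for _, n in counts.values())
    return vulnerable, nonvulnerable


def vulnerable_share(vulnerable, nonvulnerable):
    """Fraction of functions that are labeled vulnerable."""
    return vulnerable / (vulnerable + nonvulnerable)


vul, nonvul = totals(REAL_WORLD)
print(vul, nonvul)  # prints: 1983 130035
print(f"overall vulnerable share: {vulnerable_share(vul, nonvul):.2%}")

# Per-project shares, from the least to the most vulnerable-heavy project.
for name, (v, n) in sorted(REAL_WORLD.items(), key=lambda kv: vulnerable_share(*kv[1])):
    print(f"{name:<12} {vulnerable_share(v, n):7.2%}")
```

Across the real-world projects, only about 1.5% of functions are labeled vulnerable (roughly one vulnerable function per 66 nonvulnerable ones). The per-project share ranges from under 0.1% (samba) to about 14.5% (LibTIFF), so the class imbalance is worth keeping in mind when comparing detection results across projects.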