Research Article

[Retracted] Software Systems Security Vulnerabilities Management by Exploring the Capabilities of Language Models Using NLP

Algorithm 5

BERT for tokenization and feature creation.
Input: labeled security- and nonsecurity-related text:
(1) Tokenize the text with BERT:
 tokenizer = transformers.BertTokenizer.from_pretrained("bert-base-uncased")
(2) Prepare the data as BERT requires: lowercase the text, tokenize it, split words into word pieces, map each word piece to its index in the BERT vocabulary file, add the special tokens, and add mask and segment tokens to each input
(3) Build the model architecture with TF:
 model.compile(optimizer=tf.optimizers.Adam(learning_rate=2e-5, epsilon=1e-08), loss="binary_crossentropy", metrics=["accuracy"])
(4) Set the maximum sequence length to 250
(5) Build the TF Keras layers and compile the model
(6) Convert the text to BERT input features:
 create_bert_input_features(tokenizer, train_text, max_seq_length)
(7) Split the data set into 50% for training, 10% for validation, and 40% for testing
(8) Define the function that creates the BERT input features (a sketch of the full pipeline is given after this algorithm)
(9) Create feature IDs, feature masks, and feature segments for the training and validation sets
(10) Train and validate the model
(11) Convert the test data into BERT input features
(12) Evaluate model performance on the test data:
 from sklearn.metrics import confusion_matrix, classification_report, accuracy_score
Output:
Accuracy: 91.39%
Precision: 0.92
Recall: 0.91
F1-score: 0.88
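
The algorithm box leaves most of the implementation implicit. The following Python sketch assembles steps (1)-(12) into one runnable pipeline. Only the tokenizer, the compile settings, the 250-token maximum sequence length, the 50/10/40 split, and the sklearn metrics come from the algorithm itself; the placeholder corpus, the encode_plus-based feature builder, the TFBertModel backbone with a sigmoid dense head, the 0.5 decision threshold, and the epoch and batch-size values are assumptions made for illustration, not the authors' exact code.

import numpy as np
import tensorflow as tf
import transformers
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, classification_report, accuracy_score

MAX_SEQ_LENGTH = 250  # step (4)

# Placeholder corpus; replace with the labeled security/nonsecurity text.
texts = ["buffer overflow allows remote code execution", "update the user guide"] * 50
labels = [1, 0] * 50

# Step (1): BERT tokenizer (the uncased model handles lowercasing).
tokenizer = transformers.BertTokenizer.from_pretrained("bert-base-uncased")

# Steps (2) and (8): turn raw texts into input IDs, attention masks, and
# segment IDs, padded/truncated to max_seq_length (assumed implementation).
def create_bert_input_features(tokenizer, texts, max_seq_length):
    ids, masks, segments = [], [], []
    for text in texts:
        enc = tokenizer.encode_plus(text, max_length=max_seq_length,
                                    padding="max_length", truncation=True)
        ids.append(enc["input_ids"])
        masks.append(enc["attention_mask"])
        segments.append(enc["token_type_ids"])
    return [np.array(ids), np.array(masks), np.array(segments)]

# Steps (3) and (5): TF Keras layers around BERT, compiled as in the algorithm.
def build_model(max_seq_length):
    bert = transformers.TFBertModel.from_pretrained("bert-base-uncased")
    in_ids = tf.keras.layers.Input((max_seq_length,), dtype=tf.int32, name="input_ids")
    in_mask = tf.keras.layers.Input((max_seq_length,), dtype=tf.int32, name="attention_mask")
    in_seg = tf.keras.layers.Input((max_seq_length,), dtype=tf.int32, name="token_type_ids")
    pooled = bert(in_ids, attention_mask=in_mask, token_type_ids=in_seg).pooler_output
    out = tf.keras.layers.Dense(1, activation="sigmoid")(pooled)  # assumed binary head
    model = tf.keras.Model([in_ids, in_mask, in_seg], out)
    model.compile(optimizer=tf.optimizers.Adam(learning_rate=2e-5, epsilon=1e-08),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model

# Step (7): 50% training, 10% validation, 40% testing.
X_train, X_rest, y_train, y_rest = train_test_split(texts, labels, train_size=0.5, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, train_size=0.2, random_state=42)

# Steps (6), (9), and (10): build features and train/validate the model.
train_feats = create_bert_input_features(tokenizer, X_train, MAX_SEQ_LENGTH)
val_feats = create_bert_input_features(tokenizer, X_val, MAX_SEQ_LENGTH)
model = build_model(MAX_SEQ_LENGTH)
model.fit(train_feats, np.array(y_train),
          validation_data=(val_feats, np.array(y_val)),
          epochs=3, batch_size=16)  # epochs and batch size are assumptions

# Steps (11) and (12): convert the test data and evaluate.
test_feats = create_bert_input_features(tokenizer, X_test, MAX_SEQ_LENGTH)
preds = (model.predict(test_feats).ravel() >= 0.5).astype(int)
print(accuracy_score(y_test, preds))
print(confusion_matrix(y_test, preds))
print(classification_report(y_test, preds))

Splitting twice reproduces the 50/10/40 partition of step (7): half of the data is held out for training first, and the remaining half is divided 20/80 between validation and test.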