Abstract

Many similarity-based virtual screening approaches assume that molecular fragments unrelated to the biological activity carry the same weight as the important ones. This observation motivated the use of Bayesian networks as an alternative to existing tools for similarity-based virtual screening. In our recent work, the retrieval performance of the Bayesian inference network (BIN) was observed to improve significantly when molecular fragments were reweighted using relevance feedback information. In this paper, a set of active reference structures was used to reweight the fragments of the reference structures. In this approach, higher weights were assigned to those fragments that occur more frequently in the set of active reference structures, while the others were penalized. Simulated virtual screening experiments with MDL Drug Data Report datasets showed that the proposed approach significantly improved the retrieval effectiveness of ligand-based virtual screening, especially when the active molecules being sought had a high degree of structural heterogeneity.

1. Introduction

Virtual screening refers to the use of a computer-based method to process compounds from a library or database of compounds in order to identify and select ones that are likely to possess a desired biological activity, such as the ability to inhibit the action of a particular therapeutic target. The selection of molecules with a virtual screening algorithm should yield a higher proportion of active compounds, as assessed by experiment, relative to a random selection of the same number of molecules [1].

Over recent decades, drug discovery companies have used combinatorial chemistry approaches to create large and diverse libraries of structures, in which large arrays of compounds are formed by combining sets of different types of reagents, called building blocks, in a systematic and repetitive way. These libraries can be used as a source of new potential drugs, since the compounds in the libraries can be tested or screened to identify good drug compounds. Advances in screening technologies such as high-throughput screening (HTS) enable hundreds of thousands of these compounds to be tested in a short time. Computers can be used to aid this process in a number of ways, for example, in the creation of virtual combinatorial libraries, which can be much larger than their real counterparts. There are two broad strategies for screening such libraries: screening compounds against the active site of a target of interest (structure-based methods) and searching for similarity to a known active compound (ligand-based methods). Recently, chemical databases have increasingly been searched computationally rather than experimentally, an approach known as virtual screening [29].

Chemical information systems offer three principal types of searching facility. Early systems provided two types of retrieval mechanism: structure searching and substructure searching. These mechanisms were later complemented by a third access mechanism: similarity searching. There are many studies in the literature associated with the measurement of molecular similarity [10-13]. However, the most common approaches are based on 2D fingerprints, with the similarity between a reference structure and a database structure computed using association coefficients such as the Tanimoto coefficient [1, 14].

Several methods have been used to further optimise the measures of similarity between molecules, including weighting, standardization, and data fusion [15-18].

The Bayesian inference network (BIN) was originally developed for text document retrieval systems [19]. Many studies in information retrieval (IR) have shown that the retrieval effectiveness of a BIN can be improved by fragment reweighting, which is one of the most useful query modification techniques in IR systems [20-22]. In our previous work, the retrieval performance of the BIN was observed to improve significantly when relevance feedback and turbo search screening were used [23].

In this paper, we enhance the screening effectiveness of the BIN by means of a weighting factor. In this approach, a weighting factor is calculated for each fragment of the multireference input query based on the frequency of its occurrence in the set of input reference structures. This weighting factor is then used to calculate a new weight for each fragment of the reference structure.

2. Material and Methods

This study compared the retrieval results obtained using three different similarity-based screening models. The first screening system was based on the Tanimoto (TAN) coefficient, which has been used in ligand-based virtual screening for many years and is now considered a reference standard. The second model was a basic BIN [24] using the Okapi (OKA) weight, which was found to perform best in the experiments reported in [24]; we shall refer to this as the conventional BIN model. The third model, our proposed model, is a BIN based on reweighted fragments, which we shall refer to as the BINRF model. In what follows, we give a brief description of each of these three models.

2.1. Tanimoto-Based Similarity Model

This model used the continuous form of the Tanimoto coefficient, which is applicable to the nonbinary data of the fingerprints. The similarity $S_{K,L}$ between two objects or molecules $K$ and $L$ is given by (1):

$$S_{K,L} = \frac{\sum_{j=1}^{M} w_{jK}\, w_{jL}}{\sum_{j=1}^{M} w_{jK}^{2} + \sum_{j=1}^{M} w_{jL}^{2} - \sum_{j=1}^{M} w_{jK}\, w_{jL}}. \quad (1)$$

For molecules described by continuous variables, the molecular space is defined by an $M \times N$ matrix, where entry $w_{ji}$ is the value of the $j$th fragment ($1 \le j \le M$) in the $i$th molecule ($1 \le i \le N$). The origins of this coefficient can be found in a review paper by Ellis et al. [25].
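For illustration only (the study itself computed fingerprints and similarities with Pipeline Pilot), a minimal Python sketch of the continuous Tanimoto coefficient in (1), applied to two hypothetical count fingerprints, is as follows:

```python
import numpy as np

def tanimoto_continuous(w_k: np.ndarray, w_l: np.ndarray) -> float:
    """Continuous (nonbinary) Tanimoto similarity between two
    count-based fingerprint vectors of equal length M, as in (1)."""
    dot = float(np.dot(w_k, w_l))
    denom = float(np.dot(w_k, w_k) + np.dot(w_l, w_l)) - dot
    return dot / denom if denom else 0.0

# Hypothetical 8-element count fingerprints for two molecules.
w_k = np.array([2, 0, 1, 3, 0, 0, 1, 2], dtype=float)
w_l = np.array([1, 0, 1, 2, 1, 0, 0, 2], dtype=float)
print(tanimoto_continuous(w_k, w_l))  # a value between 0 and 1
```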

2.2. Conventional BIN Model

The conventional BIN model, as shown in Figure 1, is used for molecular similarity searching. It consists of three types of nodes: compound nodes as roots, fragment nodes, and a reference structure node as leaf. The roots of the network are the nodes without parent nodes and the leaves are the nodes without child nodes. Each compound node represents an actual compound in the collection and has one or more fragment nodes as children. Each fragment node has one or more compound nodes as parents and one reference structure node as a child (or more where multiple references are used). Each network node represents a binary variable, taking one of the two values in the set {true, false}. The probability that the reference structure is satisfied given a particular compound is obtained by combining the probabilities associated with each fragment node connected to the reference structure node. This process is repeated for all the compounds in the database.

The resulting probability scores are used to rank the database in response to a bioactive reference structure in the order of decreasing probability of similar bioactivity to the reference structure.

To estimate the probability associating each compound with the reference structure, the probabilities for the fragment and reference nodes must be computed. One particular belief function, called OKA, has been found to give the most effective recall [24]. This function was used to compute the probabilities for the fragment nodes and is given by (2):

$$\mathrm{bel}_{\mathrm{OKA}}(f_i) = \alpha + (1-\alpha) \times \frac{ff_{ij}}{ff_{ij} + 0.5 + 1.5 \times \left(|c_j| / |c_{\mathrm{avg}}|\right)} \times \frac{\log\left((m+0.5)/cf_i\right)}{\log(m+1.0)}, \quad (2)$$

where $\alpha$ is a constant (experiments using the BIN show that the best value is 0.4 [26, 27]), $ff_{ij}$ is the frequency of the $i$th fragment within the $j$th compound or reference structure, $cf_i$ is the number of compounds containing the $i$th fragment, $|c_j|$ is the size (in terms of number of fragments) of the $j$th compound, $|c_{\mathrm{avg}}|$ is the average size of all the compounds in the database, and $m$ is the total number of compounds.
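A minimal sketch of the OKA belief function in (2) is shown below; the numerical values in the example call are illustrative assumptions rather than figures taken from the MDDR experiments.

```python
import math

def bel_oka(ff_ij: float, cf_i: int, c_j: int, c_avg: float,
            m: int, alpha: float = 0.4) -> float:
    """Okapi-style (OKA) belief for fragment i in compound j, as in (2).

    ff_ij : frequency of fragment i in compound j
    cf_i  : number of compounds containing fragment i
    c_j   : size (number of fragments) of compound j
    c_avg : average compound size in the collection
    m     : total number of compounds
    alpha : smoothing constant (0.4 reported as the best value)
    """
    tf_part = ff_ij / (ff_ij + 0.5 + 1.5 * (c_j / c_avg))
    idf_part = math.log((m + 0.5) / cf_i) / math.log(m + 1.0)
    return alpha + (1.0 - alpha) * tf_part * idf_part

# Illustrative call only; the statistics below are made up.
print(bel_oka(ff_ij=3, cf_i=1200, c_j=58, c_avg=62.0, m=102516))
```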

To produce a ranking of the compounds in the collection with respect to a given reference structure, a belief function from InQuery, the SUM operator, was used. If $p_1, p_2, \ldots, p_n$ represent the beliefs at the fragment nodes (the parent nodes of $r$), then the belief at $r$ is given by (3):

$$\mathrm{bel}_{\mathrm{sum}}(r) = \frac{\sum_{i=1}^{n} p_i}{n}, \quad (3)$$

where $n$ is the number of unique fragments assigned to reference structure $r$.
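Building on the bel_oka sketch above, the following illustrates how (2) and (3) combine to rank a collection against the fragments of a reference structure; the data structures and names used here are assumptions made for the example, not the authors' implementation.

```python
def bel_sum(beliefs: list[float]) -> float:
    """SUM operator of (3): the mean of the fragment-node beliefs."""
    return sum(beliefs) / len(beliefs) if beliefs else 0.0

def rank_compounds(compounds: dict[str, dict[int, float]],
                   reference_fragments: set[int],
                   cf: dict[int, int], c_avg: float, m: int):
    """Score every compound against the fragments of the reference
    structure and rank by decreasing belief. `compounds` maps a
    compound id to its {fragment id: frequency} dictionary; cf,
    c_avg, and m are the collection statistics used in (2)."""
    scores = {}
    for cid, frags in compounds.items():
        c_j = len(frags)  # compound size in number of fragments
        beliefs = [bel_oka(frags.get(f, 0.0), cf.get(f, 1), c_j, c_avg, m)
                   for f in reference_fragments]
        scores[cid] = bel_sum(beliefs)
    # Decreasing probability of similar bioactivity to the reference.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```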

2.3. BINRF Model

The difference between the two models (BIN and BINRF) arises from the difference in the type of belief function used to produce the ranking of the compounds in the collection. In the conventional BIN model, the probability of the reference node is computed by summing the probabilities in the fragment nodes connected to the reference node. The fragment nodes participating in the final probability are scored equally (meaning that no extra weight is given to any fragment node). This calculation is conducted using the SUM operator, as described above.

In the BINRF model, a reweighting factor is used to assign a new weight to each fragment. To produce this factor, the occurrence of each fragment in the set of input reference structures is first analysed. The reweighting factor $rwf_i$ is calculated using (4):

$$rwf_i = \frac{F_{f_i}}{\max F}, \quad (4)$$

where $F_{f_i}$ is the frequency of the $i$th fragment in the set of input reference structures and $\max F$ is the maximum fragment frequency in that set.

New weights are then assigned to the fragments based on this factor; the new weight $nw_i$ of the $i$th fragment is given by (5):

$$nw_i = w_i + rwf_i, \quad (5)$$

where $w_i$ is the original frequency of the $i$th fragment in the reference input.

Consequently, the use of (4) and (5) to assign the new weights ensures that higher weights are assigned to those fragments that occur more frequently in the set of input reference structures.
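A sketch of the reweighting in (4) and (5) for a multireference query is given below; taking the original weight $w_i$ to be the summed fragment frequency over the reference structures is an assumption made for illustration.

```python
from collections import Counter

def reweight_fragments(reference_fps: list[dict[int, float]]) -> dict[int, float]:
    """Compute new fragment weights for a multireference query,
    following (4) and (5): fragments occurring in more of the input
    reference structures receive a larger boost."""
    # F_{f_i}: number of input reference structures containing each fragment.
    freq = Counter()
    for fp in reference_fps:
        freq.update(fp.keys())
    max_f = max(freq.values())

    new_weights = {}
    for frag, f_count in freq.items():
        rwf = f_count / max_f                                  # equation (4)
        w = sum(fp.get(frag, 0.0) for fp in reference_fps)     # assumed w_i
        new_weights[frag] = w + rwf                            # equation (5)
    return new_weights

# Three hypothetical reference fingerprints ({fragment id: count}).
refs = [{1: 2, 7: 1, 9: 3}, {1: 1, 9: 2}, {1: 3, 5: 1, 9: 1}]
print(reweight_fragments(refs))  # fragments 1 and 9 receive the largest boost
```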

2.4. Experimental Design

The searches were carried out on the MDL Drug Data Report (MDDR) database. The 102,516 molecules in the MDDR database were converted to Pipeline Pilot ECFC_4 fingerprints and folded to give 1024-element fingerprints [28].
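The study generated ECFC_4 count fingerprints with Pipeline Pilot; for readers without access to that software, a roughly comparable hashed circular count fingerprint folded to 1024 elements can be produced with RDKit, as in the sketch below (this is an illustrative substitute, not the fingerprint implementation used in the paper):

```python
from rdkit import Chem
from rdkit.Chem import AllChem

def folded_count_fingerprint(smiles: str, n_bits: int = 1024):
    """Hashed Morgan count fingerprint of radius 2 (ECFC_4-like),
    folded to n_bits elements; returns a list of integer counts."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles}")
    fp = AllChem.GetHashedMorganFingerprint(mol, radius=2, nBits=n_bits)
    counts = [0] * n_bits
    for bit, count in fp.GetNonzeroElements().items():
        counts[bit] = count
    return counts

print(sum(folded_count_fingerprint("CC(=O)Oc1ccccc1C(=O)O")))  # aspirin
```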

For the screening experiments, three datasets (DS1–DS3) [29] were chosen from the MDDR database. Dataset DS1 contains 11 MDDR activity classes, with some of the classes involving actives that are structurally homogeneous and others involving actives that are structurally heterogeneous (structurally diverse). The DS2 dataset contains 10 homogeneous MDDR activity classes and the DS3 dataset contains 10 heterogeneous MDDR activity classes. Full details of these datasets are given in Tables 1-3. Each row in the tables contains an activity class, the number of molecules belonging to the class, and the class's diversity, which was computed as the mean pairwise Tanimoto similarity calculated across all pairs of molecules in the class using ECFP_6 fingerprints. The pairwise similarity calculations for all datasets were performed using Pipeline Pilot software [28].

For each dataset (DS1–DS3), the screening experiments were conducted with 10 reference structures selected randomly from each activity class, and the similarity measure was used to obtain an activity score for every compound in the dataset. These activity scores were then sorted in descending order, and the recall of active compounds, that is, the percentage of the compounds of the sought activity class retrieved in the top 1% and top 5% of the sorted list, was used as the measure of the performance of each similarity method.
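The recall-based performance measure can be sketched as follows; the scores and the set of actives in the example are hypothetical.

```python
def recall_at_top(scores: dict[str, float], actives: set[str],
                  fraction: float) -> float:
    """Percentage of the active compounds recovered in the top
    `fraction` (e.g. 0.01 or 0.05) of the ranked list."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    cutoff = max(1, int(len(ranked) * fraction))
    retrieved = set(ranked[:cutoff]) & actives
    return 100.0 * len(retrieved) / len(actives)

# Hypothetical scores and actives, for illustration only.
scores = {"c1": 0.91, "c2": 0.40, "c3": 0.88, "c4": 0.10, "c5": 0.75}
actives = {"c1", "c3", "c4"}
print(recall_at_top(scores, actives, 0.40))  # recall in the top 40% of this toy list
```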

3. Results and Discussion

Our goal was to compare the retrieval effectiveness of the different search approaches. In this study, we tested the TAN, BIN, and BINRF models against the MDDR database using three different datasets (DS1–DS3). The results of the searches of DS1–DS3 are presented in Tables 4-6, respectively, using cutoffs at both 1% and 5%.

In these tables, the first column from the left contains the results for the TAN, the second column contains the corresponding results when BIN is used, and the last column of each table contains the corresponding results when BINRF is used.

Each row in the tables lists the recall for the top 1% and 5% of the sorted ranking, averaged over the ten searches for each activity class, and the penultimate row in each table gives the mean value for each similarity method averaged over all of the activity classes in the dataset. The similarity method with the best recall rate in each row is marked with two asterisks (**) and its recall value is boldfaced; any similarity method with an average recall within 1% and 5% of the value for the best similarity method is marked with a single asterisk (*). The bottom row in each table gives the total number of starred (* and **) cells for each similarity method across the full set of activity classes.

Visual inspection of the recall values in Tables 4-6 enables comparisons to be made between the effectiveness of the various search models. However, a more quantitative approach is possible using the Kendall W test of concordance [30].

This test shows whether a set of judges make comparable judgments about the ranking of a set of objects; here, the activity classes were considered the judges and the recall rates of the various search models the objects. The outputs of this test are the value of the Kendall coefficient and the associated significance level, which indicates whether this value of the coefficient could have occurred by chance. If the value is significant (for which we used cutoff values of both 0.01 and 0.05), then it is possible to give an overall ranking of the objects that have been ranked. The results of the Kendall analyses (for DS1–DS3) are reported in Table 7 and describe the top 1% and top 5% rankings for the various weighting functions. In Table 7, the columns show the dataset type, the recall percentage, the value of the coefficient, the associated probability, and the ranking of the methods.
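Kendall's coefficient of concordance can be computed from a recall table as sketched below, with the activity classes acting as judges and the search methods as objects; the recall matrix shown is a hypothetical example rather than data from Tables 4-6, and this simple version does not apply a correction for tied ranks.

```python
import numpy as np
from scipy.stats import rankdata, chi2

def kendall_w(recalls: np.ndarray):
    """Kendall's W for a (judges x objects) matrix of recall values;
    returns (W, p-value) using the chi-square approximation."""
    n_judges, n_objects = recalls.shape
    ranks = np.apply_along_axis(rankdata, 1, recalls)  # rank methods within each class
    rank_sums = ranks.sum(axis=0)
    s = ((rank_sums - rank_sums.mean()) ** 2).sum()
    w = 12.0 * s / (n_judges ** 2 * (n_objects ** 3 - n_objects))
    chi_sq = n_judges * (n_objects - 1) * w
    p_value = chi2.sf(chi_sq, df=n_objects - 1)
    return w, p_value

# Hypothetical recalls: 4 activity classes x 3 methods (TAN, BIN, BINRF).
recalls = np.array([[20.1, 25.3, 30.2],
                    [15.0, 14.2, 22.8],
                    [35.7, 40.1, 41.0],
                    [10.3, 12.9, 18.4]])
print(kendall_w(recalls))
```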

Some of the activity classes, such as the low-diversity activity classes, may contribute disproportionately to the overall mean recall. Therefore, using the mean recall value as the sole evaluation criterion could favour some methods over others. To avoid this bias, the performances of the different methods were further investigated on the basis of the total number of (* and **) cells for each method across the full set of activity classes, as shown in the bottom rows of Tables 4-6. These (* and **) cell counts are also listed in Table 8 (the results shown in the bottom rows of Tables 4-6 form the lower part of Table 8).

Inspection of the DS1 searches in Table 4 shows that BINRF produced the highest mean values when compared with BIN and TAN. In addition, according to the total number of (* and **) cells in Table 4, BINRF is the best performing search across the 11 activity classes in terms of mean recall. Table 7 shows that the value of the Kendall coefficient for the DS1 top 1% and 5% rankings, 0.752, is significant at both the 0.01 and 0.05 levels of statistical significance. Given that the result is significant, we can conclude that the overall rankings of the different procedures are BINRF > BIN > TAN and BINRF > TAN > BIN for the DS1 top 1% and top 5%, respectively.

The good performance of the BINRF method is not restricted to DS1 since it also gives the best results for the top 1% and 5% for DS2 and DS3.

The DS3 searches are of particular interest since they involve the most heterogeneous activity classes in the three datasets used and thus provide a tough test of the effectiveness of a screening method. Hert et al. [29] found that turbo similarity searching (group fusion) offered no advantage over conventional similarity searching for the DS3 activity classes. However, when BINRF is used on this dataset, Tables 6 and 7 show that it gives the best performance of all the methods at both cutoffs.

Visual inspection of the results in Tables 4-8 shows very clearly that reweighting the reference fragments can significantly increase the effectiveness of the BIN method relative to the original searches using TAN and BIN. A particularly striking pattern of behaviour is observed in the DS3 results presented in Table 6, where the degree of enhancement in this more challenging screening task is remarkable.

In conclusion, we have introduced a new technique for enhancing the retrieval effectiveness of a BIN applied to ligand-based virtual screening. Simulated virtual screening experiments with MDDR datasets showed that the techniques described here provide simple ways of enhancing the cost effectiveness of ligand-based virtual screening in chemical databases.

4. Conclusion

In this paper, we further investigated the impact of reweighting fragments on the Bayesian inference network performance for ligand-based virtual screening. Simulated virtual screening experiments with MDL Drug Data Report datasets showed that the proposed approach significantly improved the retrieval effectiveness of ligand-based virtual screening, especially when the active molecules being sought had a high degree of structural heterogeneity. This finding is in line with our previous study, in which the relevance feedback information was used to reweight the fragments. However, it should be pointed out that while using relevance feedback information is limited only by computational cost, using a set of reference structures implies the availability of bioactivities.

Acknowledgment

This work is supported by the Ministry of Higher Education (MOHE) and the Research Management Centre (RMC) at Universiti Teknologi Malaysia (UTM) under the Research University Grant Category (VOT Q.J130000.7128.00H72).