Ligand-Based Virtual Screening Using Bayesian Inference Network and Reweighted Fragments

Ahmed, Ali; Abdo, Ammar; Salim, Naomie

doi:https://doi.org/10.1100/2012/410914

The Scientific World Journal


Research Article | Open Access

Volume 2012 | Article ID 410914 | https://doi.org/10.1100/2012/410914


Ali Ahmed,^1,2 Ammar Abdo,^1,3 and Naomie Salim^1

Academic Editors: G. D. Morse and M. A. Fischl

Received: 28 Oct 2011

Accepted: 11 Dec 2011

Published: 01 May 2012

Abstract

Many similarity-based virtual screening approaches assume that molecular fragments unrelated to the biological activity carry the same weight as the important ones. This assumption motivated the use of Bayesian networks as an alternative to existing tools for similarity-based virtual screening. In our recent work, the retrieval performance of the Bayesian inference network (BIN) was observed to improve significantly when molecular fragments were reweighted using relevance feedback information. In this paper, a set of active reference structures was used to reweight the fragments in the reference structure. In this approach, higher weights were assigned to fragments that occur more frequently in the set of active reference structures, while the others were penalized. Simulated virtual screening experiments with MDL Drug Data Report datasets showed that the proposed approach significantly improved the retrieval effectiveness of ligand-based virtual screening, especially when the active molecules being sought had a high degree of structural heterogeneity.

1. Introduction

Virtual screening refers to the use of a computer-based method to process compounds from a library or database of compounds in order to identify and select ones that are likely to possess a desired biological activity, such as the ability to inhibit the action of a particular therapeutic target. The selection of molecules with a virtual screening algorithm should yield a higher proportion of active compounds, as assessed by experiment, relative to a random selection of the same number of molecules [1].

Over recent decades, drug discovery companies have used combinatorial chemistry approaches to create large and diverse libraries of structures, in which large arrays of compounds are formed by combining sets of different types of reagents, called building blocks, in a systematic and repetitive way. These libraries can serve as a source of new potential drugs, since the compounds in them can be tested, or screened, to find good drug candidates. Technologies such as high-throughput screening (HTS) have increased testing capacity to the point where hundreds of thousands of compounds can be tested in a short time. Computers can aid this process in a number of ways, for example, by creating virtual combinatorial libraries that can be much larger than their real counterparts. There are two broad approaches to screening such libraries: examining the active site of the target of interest, and looking for similarities to a known active compound. Increasingly, chemical databases are searched computationally rather than experimentally, an approach known as virtual screening [2–9].

Chemical information systems offer three principal types of searching facility. Early systems provided two types of retrieval mechanisms: structure searching and substructure searching. These mechanisms were later complemented by another access mechanism: similarity searching. There are many studies in the literature associated with the measurement of molecular similarity [10–13]. However, the most common approaches are based on 2D fingerprints, with the similarity between a reference structure and a database structure computed using association coefficients such as the Tanimoto coefficient [1, 14].

Several methods have been used to further optimise the measures of similarity between molecules, including weighting, standardization, and data fusion [15–18].

The Bayesian inference network (BIN) was originally developed for text document retrieval systems [19]. Many studies in information retrieval (IR) have shown that the retrieval effectiveness of BIN can be improved by fragment reweighting. Fragment reweighting is one of the most useful query modification techniques in IR systems [20–22]. In our previous work, the retrieval performance of the Bayesian inference network was observed to improve significantly when relevance feedback and turbo search screening were used [23].

In this paper, we enhance the screening effectiveness of BIN using a weighting factor. In this approach, a weighting factor is calculated for each fragment of the multireference input query based on its frequency of occurrence in the set of input reference structures. This factor is then used to calculate a new weight for each fragment of the reference structure.

2. Material and Methods

This study compared the retrieval results obtained using three different similarity-based screening models. The first screening system was based on the Tanimoto (TAN) coefficient, which has been used in ligand-based virtual screening for many years and is now considered a reference standard. The second model was a basic BIN [24] using the Okapi (OKA) weight, which was found to perform best in the experiments reported in [24]; we refer to it as the conventional BIN model. The third model, our proposed model, is a BIN based on reweighted fragments, which we refer to as the BINRF model. In what follows, we briefly describe each of these three models.

2.1. Tanimoto-Based Similarity Model

This model used the continuous form of the Tanimoto coefficient, which is applicable to nonbinary fingerprint data. The similarity $S_{KL}$ between molecules $K$ and $L$ is given by (1):

$$S_{KL} = \frac{\sum_{j=1}^{M} w_{jK}\, w_{jL}}{\sum_{j=1}^{M} w_{jK}^{2} + \sum_{j=1}^{M} w_{jL}^{2} - \sum_{j=1}^{M} w_{jK}\, w_{jL}} \tag{1}$$

For molecules described by continuous variables, the molecular space is defined by an $M \times N$ matrix, where entry $w_{ji}$ is the value of the $j$th fragment ($1 \le j \le M$) in the $i$th molecule ($1 \le i \le N$). The origins of this coefficient can be found in a review paper by Ellis et al. [25].
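To make the continuous Tanimoto coefficient concrete, the following Python sketch computes it for two fragment-count vectors of equal length. The function name and the convention of returning 0.0 for two all-zero vectors are illustrative assumptions, not part of the original method description.

```python
from typing import Sequence


def continuous_tanimoto(a: Sequence[float], b: Sequence[float]) -> float:
    """Continuous Tanimoto similarity between two equal-length fragment vectors."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    dot = sum(x * y for x, y in zip(a, b))  # sum_j w_jK * w_jL
    # sum_j w_jK^2 + sum_j w_jL^2 - sum_j w_jK * w_jL
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    # For non-negative vectors the denominator is zero only if both are all-zero;
    # returning 0.0 in that case is a convention chosen for this sketch.
    return dot / denom if denom else 0.0


# Count vectors: 11 / (14 + 11 - 11) = 11/14
print(continuous_tanimoto([2, 0, 1, 3], [1, 1, 0, 3]))
# Binary vectors: 2 shared bits, 3 bits set in each -> 2 / (3 + 3 - 2) = 0.5
print(continuous_tanimoto([1, 0, 1, 1], [1, 1, 0, 1]))
```

For binary (0/1) fingerprints the same expression reduces to the familiar form c / (a + b − c), where c is the number of bits set in both molecules and a and b are the numbers of bits set in each.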

2.2. Conventional BIN Model

The conventional BIN model used in molecular similarity searching is shown in Figure 1. It consists of three types of nodes: compound nodes as roots, fragment nodes, and a reference structure node as leaf. The roots of the network are the nodes without parent nodes, and the leaves are the nodes without child nodes. Each compound node represents an actual compound in the collection and has one or more fragment nodes as children. Each fragment node has one or more compound nodes as parents and one reference structure node as a child (or more where multiple references are used). Each network node is binary, taking one of the two values from the set {true, false}. The probability that the reference structure is satisfied given a particular compound is obtained by computing the probabilities associated with each fragment node connected to the reference structure node. This process is repeated for all the compounds in the database.

The resulting probability scores are used to rank the database in response to a bioactive reference structure in the order of decreasing probability of similar bioactivity to the reference structure.

To estimate the probability associating each compound with the reference structure, the probabilities for the fragment and reference nodes must be computed. One particular belief function, called OKA, has been found to give the most effective recall [24]. This function was used to compute the probabilities for the fragment nodes and is given by (2):

$$bel_{OKA}(f_i) = \alpha + (1-\alpha) \times \frac{ff_{ij}}{ff_{ij} + 0.5 + 1.5 \times \frac{|c_j|}{|C_{avg}|}} \times \frac{\log\left(\frac{m + 0.5}{cf_i}\right)}{\log(m + 1.0)} \tag{2}$$

where $\alpha$ is a constant (experiments using the BIN show that the best value is 0.4 [26, 27]), $ff_{ij}$ is the frequency of the $i$th fragment within the $j$th compound, $cf_i$ is the number of compounds containing the $i$th fragment, $|c_j|$ is the size (in number of fragments) of the $j$th compound, $|C_{avg}|$ is the average size of all the compounds in the database, and $m$ is the total number of compounds.
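A minimal Python sketch of the OKA fragment belief follows. It assumes the Okapi/InQuery form written in the code comment and assumes the fragment statistics (ff, cf, compound sizes, m) have already been counted from the database; the function and parameter names are hypothetical. Because the formula divides one logarithm by another, the choice of logarithm base does not affect the result.

```python
import math


def okapi_belief(ff: float, cf: int, compound_size: float, avg_size: float,
                 m: int, alpha: float = 0.4) -> float:
    """Assumed OKA belief:
    alpha + (1 - alpha) * ff / (ff + 0.5 + 1.5 * |c_j| / |C_avg|)
                        * log((m + 0.5) / cf) / log(m + 1.0)
    """
    if ff == 0:
        # The fragment is absent from the compound, so the tf term vanishes
        # and only the default belief alpha remains (also avoids cf == 0).
        return alpha
    tf_part = ff / (ff + 0.5 + 1.5 * compound_size / avg_size)  # length-normalised frequency
    idf_part = math.log((m + 0.5) / cf) / math.log(m + 1.0)       # normalised inverse frequency
    return alpha + (1.0 - alpha) * tf_part * idf_part


# A fragment seen twice in an average-sized compound and in 10 of 1000 compounds:
print(okapi_belief(ff=2, cf=10, compound_size=20, avg_size=20, m=1000))
```

Rarer fragments (smaller cf) and fragments in smaller-than-average compounds receive higher beliefs, which is the intended effect of the two Okapi factors.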

To produce a ranking of the compounds in the collection with respect to a given reference structure, a belief function from InQuery, the SUM operator, was used. If $p_1, p_2, \ldots, p_n$ represent the beliefs in the fragment nodes (the parent nodes of $r$), then the belief at $r$ is given by (3):

$$bel_{sum}(r) = \frac{1}{n} \sum_{i=1}^{n} p_i \tag{3}$$

where $n$ is the number of unique fragments assigned to reference structure $r$.
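The SUM operator simply averages the fragment-node beliefs. The sketch below ranks a toy collection this way. It assumes per-compound fragment beliefs have already been computed (for example with the OKA function), and it gives fragments absent from a compound the default belief 0.4, an assumption chosen to match the constant α above. All names are hypothetical.

```python
from typing import Dict, Iterable, List


def sum_operator(beliefs: Iterable[float]) -> float:
    """InQuery-style SUM operator: the mean of the parent-node beliefs."""
    values = list(beliefs)
    if not values:
        raise ValueError("reference structure has no fragments")
    return sum(values) / len(values)


def rank_by_sum(compound_beliefs: Dict[str, Dict[str, float]],
                reference_fragments: Iterable[str],
                default_belief: float = 0.4) -> List[str]:
    # n counts *unique* reference fragments, so duplicates are dropped (order kept).
    unique_frags = list(dict.fromkeys(reference_fragments))
    scores = {
        cid: sum_operator(beliefs.get(f, default_belief) for f in unique_frags)
        for cid, beliefs in compound_beliefs.items()
    }
    # Highest belief first; sorted() is stable, so ties keep insertion order.
    return sorted(scores, key=lambda cid: scores[cid], reverse=True)


toy = {"c1": {"f1": 0.7, "f2": 0.6}, "c2": {"f1": 0.5}}
print(rank_by_sum(toy, ["f1", "f2"]))  # c1 scores 0.65, c2 scores 0.45
```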

2.3. BINRF Model

The difference between the two models (BIN and BINRF) arises from the type of belief function used to produce the ranking of the compounds in the collection. In the conventional BIN model, the probability of the reference node is computed by summing the probabilities in the fragment nodes connected to the reference node. The fragment nodes participating in the final probability are scored equally (that is, no extra weight is given to any fragment node). This calculation is conducted using the SUM operator, as described above.

In the BINRF model, a reweighting factor is used to assign a new weight to each fragment. To produce this factor, the occurrence of each fragment in the set of input reference structures is first analysed. The reweighting factor is calculated using (4):

$$rwf_i = \frac{F_{f_i}}{\max F} \tag{4}$$

where $F_{f_i}$ is the frequency of the $i$th fragment in the set of input reference structures and $\max F$ is the maximum fragment frequency in that set.

New weights are then assigned to the fragments based on this factor. The new weight, $nw_i$, of the $i$th fragment is given by (5):

$$nw_i = w_i + rwf_i \tag{5}$$

where $w_i$ is the original frequency of the $i$th fragment in the reference input.

Consequently, (4) and (5) assign higher weights to fragments that occur more frequently in the set of input reference structures.
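The reweighting step can be sketched in Python as follows. Two points are assumptions rather than details given in the text: the fragment frequency F is taken to be the number of input reference structures that contain the fragment, and the new weight is taken to be additive, nw_i = w_i + rwf_i. Fingerprints are represented as hypothetical dictionaries mapping fragment identifiers to counts.

```python
from collections import Counter
from typing import Dict, List


def reweighting_factors(references: List[Dict[str, float]]) -> Dict[str, float]:
    """rwf_i = F_i / max F over the set of input reference structures."""
    # Assumption: F_i = number of reference structures in which fragment i occurs.
    freq: Counter = Counter()
    for ref in references:
        freq.update(frag for frag, count in ref.items() if count > 0)
    if not freq:
        return {}
    max_f = max(freq.values())
    return {frag: f / max_f for frag, f in freq.items()}


def new_weights(reference: Dict[str, float], rwf: Dict[str, float]) -> Dict[str, float]:
    """Assumed additive form: nw_i = w_i + rwf_i (fragments absent from all refs get +0)."""
    return {frag: w + rwf.get(frag, 0.0) for frag, w in reference.items()}


refs = [{"a": 2, "b": 1}, {"a": 1, "c": 3}, {"a": 1, "b": 2}]
rwf = reweighting_factors(refs)           # a: 3/3, b: 2/3, c: 1/3
print(new_weights({"a": 2, "b": 1, "d": 1}, rwf))
```

Fragment "a", present in every reference, gains the full factor of 1, while a fragment found in no reference keeps its original weight, so its relative weight falls.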

2.4. Experimental Design

The searches were carried out on the MDL Drug Data Report (MDDR) database. The 102,516 molecules in the MDDR database were converted to Pipeline Pilot ECFC_4 fingerprints and folded to give 1024-element fingerprints [28].

For the screening experiments, three data sets (DS1–DS3) [29] were chosen from the MDDR database. Dataset DS1 contains 11 MDDR activity classes, with some of the classes involving actives that are structurally homogeneous and others involving actives that are structurally heterogeneous (structurally diverse). The DS2 dataset contains 10 homogeneous MDDR activity classes and the DS3 dataset contains 10 heterogeneous MDDR activity classes. Full details of these datasets are given in Tables 1–3. Each row in the tables contains an activity class, the number of molecules belonging to the class, and the class’s diversity, which was computed as the mean pair-wise Tanimoto similarity calculated across all pairs of molecules in the class using ECFP6. The pair-wise similarity calculations for all datasets were performed using Pipeline Pilot software [28].
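The class-diversity measure described above (the mean pairwise Tanimoto similarity over all pairs of molecules in a class) can be sketched as follows. The sketch assumes each molecule's fingerprint is already available as a set of on-bit indices; generating the ECFP6 fingerprints themselves requires a cheminformatics toolkit, such as the Pipeline Pilot software used in the study, and is not shown. Function names are hypothetical.

```python
from itertools import combinations
from typing import List, Set


def tanimoto_bits(a: Set[int], b: Set[int]) -> float:
    """Binary Tanimoto: shared on-bits divided by on-bits set in either molecule."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def mean_pairwise_similarity(fingerprints: List[Set[int]]) -> float:
    """Mean Tanimoto similarity over all unordered pairs of molecules in a class."""
    pairs = list(combinations(fingerprints, 2))
    if not pairs:
        raise ValueError("need at least two molecules")
    return sum(tanimoto_bits(a, b) for a, b in pairs) / len(pairs)


print(mean_pairwise_similarity([{1, 2}, {2, 3}, {1, 2}]))  # (1/3 + 1 + 1/3) / 3
```

Because this measure is a mean similarity, lower values indicate a more structurally diverse (heterogeneous) class.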

For each dataset (DS1–DS3), ten reference structures were selected at random from each activity class, and the similarity measure was used to compute an activity score for every compound in the database. These activity scores were sorted in descending order, and the recall of active compounds, that is, the percentage of the desired activity class retrieved in the top 1% and 5% of the sorted list, was used as the measure of performance of each similarity method.
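A minimal sketch of this recall measure is shown below: rank all database compounds by score, keep the top 1% or 5%, and report the percentage of the class's actives found there. Rounding the cutoff to the nearest integer and leaving tied scores in their input order are assumptions of this sketch, not details given in the text; whether the reference structure itself is excluded from the count is left to the caller.

```python
from typing import Sequence


def recall_at_top(scores: Sequence[float], is_active: Sequence[bool], fraction: float) -> float:
    """Percentage of actives retrieved in the top `fraction` of the ranked database."""
    n = len(scores)
    if n != len(is_active):
        raise ValueError("scores and labels must have the same length")
    n_actives = sum(1 for a in is_active if a)
    if n_actives == 0:
        raise ValueError("no active compounds to recall")
    cutoff = max(1, round(fraction * n))  # e.g. 1% of 102,516 -> 1025 compounds
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    hits = sum(1 for i in ranked[:cutoff] if is_active[i])
    return 100.0 * hits / n_actives


scores = [1 - i / 100 for i in range(100)]      # compound i has the (i+1)th best score
active = [i in (0, 3, 50) for i in range(100)]  # three actives
print(recall_at_top(scores, active, 0.01), recall_at_top(scores, active, 0.05))
```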

3. Results and Discussion

Our goal was to compare the retrieval effectiveness of different search approaches. In this study, we tested the TAN, BIN, and BINRF models against the MDDR database using three different datasets (DS1–DS3). The results of the searches of DS1–DS3 are presented in Tables 4–6, respectively, using cutoffs of both 1% and 5%.

In these tables, the first column from the left contains the results for the TAN, the second column contains the corresponding results when BIN is used, and the last column of each table contains the corresponding results when BINRF is used.

Each row in the tables lists the recall for the top 1% and 5% of a sorted ranking, averaged over the ten searches for each activity class; the penultimate row in each table gives the mean value for each similarity method, averaged over all the activity classes of a dataset. The similarity method with the best recall in each row is marked with a double asterisk (**), and its recall value is shown in boldface; any similarity method with an average recall within 1% and 5% of the value for the best similarity method is marked with a single asterisk (*). The bottom row of each table gives the total number of (* and **) cells for each similarity method across the full set of activity classes.

Visual inspection of the recall values in Tables 4–6 enables comparisons to be made between the effectiveness of the various search models. However, a more quantitative approach is possible using the Kendall test of concordance [30].

This test shows whether a set of judges make comparable judgments about the ranking of a set of objects; here, the activity classes were considered the judges and the recall rates of the various search models the objects. The outputs of this test are the value of the Kendall coefficient and the associated significance level, which indicates whether this value of the coefficient could have occurred by chance. If the value is significant (for which we used cutoff values of both 0.01 and 0.05), then it is possible to give an overall ranking of the objects. The results of the Kendall analyses for DS1–DS3 are reported in Table 7 and describe the top 1% and top 5% rankings for the various search models. In Table 7, the columns show the dataset, the recall percentage (top 1% or top 5%), the value of the coefficient, the associated probability, and the ranking of the methods.
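A minimal sketch of Kendall's coefficient of concordance W for this setting is given below, with activity classes as judges (rows) and search models as objects (columns). Within each class, models are ranked so that rank 1 goes to the highest recall, and tied recalls share the mean of their ranks; the correction for tied ranks in W itself is omitted. The statistic m(n − 1)W is the usual large-sample chi-square approximation with n − 1 degrees of freedom; with only three models it is rough, so this sketch should not be read as the exact procedure behind Table 7. Function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank 1 = highest value; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # 0-based positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kendall_w(recalls: Sequence[Sequence[float]]) -> Tuple[float, float, List[float]]:
    """recalls[judge][object]; returns (W, chi-square approximation, rank totals)."""
    m, n = len(recalls), len(recalls[0])
    rank_rows = [average_ranks(row) for row in recalls]
    totals = [sum(row[j] for row in rank_rows) for j in range(n)]
    mean_total = m * (n + 1) / 2
    s = sum((t - mean_total) ** 2 for t in totals)
    w = 12 * s / (m ** 2 * (n ** 3 - n))
    chi2 = m * (n - 1) * w
    return w, chi2, totals


# Two classes that rank three models identically -> perfect concordance.
print(kendall_w([[30, 20, 10], [25, 15, 5]]))
```

The lowest rank total identifies the model ranked best overall, which is how an ordering such as BINRF > BIN > TAN would be read off once W is found to be significant.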

Some activity classes, such as low-diversity ones, may contribute disproportionately to the overall value of mean recall. Using the mean recall value as the evaluation criterion could therefore favour some methods over others. To avoid this bias, the performance of the different methods has been further investigated based on the total number of (* and **) cells for each method across the full set of activity classes, as shown in the bottom rows of Tables 4–6. These (* and **) cell results are also listed in Table 8 (the bottom rows of Tables 4–6 form the lower part of Table 8).

Inspection of the DS1 search in Table 4 shows that BINRF produced the highest mean recall values compared with BIN and TAN. In addition, BINRF has the largest total number of (* and **) cells across the 11 activity classes in Table 4. Table 7 shows that the value of the Kendall coefficient for DS1 at the top 1% and 5%, 0.752, is significant at the 0.01 and 0.05 levels of statistical significance. Given that the result is significant, we can conclude that the overall rankings of the different procedures are BINRF > BIN > TAN and BINRF > TAN > BIN for the DS1 top 1% and 5%, respectively.

The good performance of the BINRF method is not restricted to DS1: it also gives the best results for the top 1% and 5% for DS2 and DS3.

The DS3 searches are of particular interest since they involve the most heterogeneous activity classes in the three datasets used, and thus provide a tough test of the effectiveness of a screening method. Hert et al. [29] found that turbo similarity searching (TSS) with group fusion offered no advantage over conventional similarity searching for the DS3 activity classes. However, when BINRF is used on this dataset, Tables 6 and 7 show that it gives the best performance of all the methods at both cutoffs.

Visual inspection of the results in Tables 4–8 shows very clearly that reweighting the reference fragments can significantly increase the effectiveness of the BIN method, relative to both the TAN and the conventional BIN searches. The DS3 results in Table 6 are particularly striking: the degree of enhancement in this more challenging screening task is remarkable.

In conclusion, we have introduced a new technique for improving retrieval effectiveness when applying a BIN to ligand-based virtual screening. Simulated virtual screening experiments with MDDR datasets showed that the proposed technique provides a simple way of enhancing the cost effectiveness of ligand-based virtual screening in chemical databases.

4. Conclusion

In this paper, we further investigated the impact of reweighting fragments on the Bayesian inference network performance for ligand-based virtual screening. Simulated virtual screening experiments with MDL Drug Data Report datasets showed that the proposed approach significantly improved the retrieval effectiveness of ligand-based virtual screening, especially when the active molecules being sought had a high degree of structural heterogeneity. This finding is in line with our previous study, in which relevance feedback information was used to reweight the fragments. However, it should be noted that, whereas the use of relevance feedback information is limited only by computational cost, the use of a set of reference structures requires that the bioactivities of several compounds be known.

Acknowledgment

This work is supported by the Ministry of Higher Education (MOHE) and the Research Management Centre (RMC) at Universiti Teknologi Malaysia (UTM) under the Research University Grant Category (VOT Q.J130000.7128.00H72).

References

[1] M. A. Johnson and G. M. Maggiora, Concepts and Application of Molecular Similarity, John Wiley & Sons, New York, NY, USA, 1990.
[2] P. Willett, J. M. Barnard, and G. M. Downs, “Chemical similarity searching,” Journal of Chemical Information and Computer Sciences, vol. 38, no. 6, pp. 983–996, 1998.
[3] W. P. Walters, M. T. Stahl, and M. A. Murcko, “Virtual screening—an overview,” Drug Discovery Today, vol. 3, no. 4, pp. 160–178, 1998.
[4] B. Waszkowycz, T. D. J. Perkins, R. A. Sykes, and J. Li, “Large-scale virtual screening for discovering leads in the postgenomic era,” IBM Systems Journal, vol. 40, no. 2, pp. 360–378, 2001.
[5] M. A. Miller, “Chemical database techniques in drug discovery,” Nature Reviews Drug Discovery, vol. 1, no. 3, pp. 220–227, 2002.
[6] H. Eckert and J. Bajorath, “Molecular similarity analysis in virtual screening: foundations, limitations and novel approaches,” Drug Discovery Today, vol. 12, no. 5-6, pp. 225–233, 2007.
[7] R. P. Sheridan, “Chemical similarity searches: when is complexity justified?” Expert Opinion on Drug Discovery, vol. 2, no. 4, pp. 423–430, 2007.
[8] H. Geppert, M. Vogt, and J. Bajorath, “Current trends in ligand-based virtual screening: molecular representations, data mining methods, new application areas, and performance evaluation,” Journal of Chemical Information and Modeling, vol. 50, no. 2, pp. 205–216, 2010.
[9] R. P. Sheridan and S. K. Kearsley, “Why do we need so many chemical similarity search methods?” Drug Discovery Today, vol. 7, no. 17, pp. 903–911, 2002.
[10] N. Nikolova and J. Jaworska, “Approaches to measure chemical similarity—a Review,” QSAR & Combinatorial Science, vol. 22, no. 9-10, pp. 1006–1026, 2003.
[11] A. Bender and R. C. Glen, “Molecular similarity: a key technique in molecular informatics,” Organic and Biomolecular Chemistry, vol. 2, no. 22, pp. 3204–3218, 2004.
[12] A. G. Maldonado, J. P. Doucet, M. Petitjean, and B. T. Fan, “Molecular similarity and diversity in chemoinformatics: from theory to applications,” Molecular Diversity, vol. 10, no. 1, pp. 39–79, 2006.
[13] A. R. Leach and V. J. Gillet, An Introduction to Chemoinformatics, Springer, 2007.
[14] P. Willett, “Enhancing the effectiveness of ligand-based virtual screening using data fusion,” QSAR & Combinatorial Science, vol. 25, no. 12, pp. 1143–1152, 2006.
[15] L. Hodes, “Clustering a large number of compounds. 1. Establishing the method on an initial sample,” Journal of Chemical Information and Computer Sciences, vol. 29, pp. 66–71, 1989.
[16] P. Willett and V. Winterman, “A comparison of some measures for the determination of inter-molecular structural similarity measures of inter-molecular structural similarity,” Quantitative Structure-Activity Relationships, vol. 5, no. 1, pp. 18–25, 1986.
[17] P. A. Bath, C. A. Morris, and P. Willett, “Effect of standardization on fragment-based measures of structural similarity,” Journal of Chemometrics, vol. 7, pp. 543–550, 1993.
[18] J. D. Holliday, C. Y. Hu, and P. Willett, “Grouping of coefficients for the calculation of inter-molecular similarity and dissimilarity using 2D fragment bit-strings,” Combinatorial Chemistry & High Throughput Screening, vol. 5, no. 2, pp. 155–166, 2002.
[19] H. Turtle and W. B. Croft, “Evaluation of an inference network-based retrieval model,” ACM Transactions on Information Systems, vol. 9, pp. 187–222, 1991.
[20] D. Haines and W. B. Croft, “Relevance feedback and inference networks,” in Proceedings of the 16th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '93), pp. 2–11, July 1993.
[21] L. M. De Campos, J. M. Fernández-Luna, and J. F. Huete, “Implementing relevance feedback in the Bayesian network retrieval model,” Journal of the American Society for Information Science & Technology, vol. 54, no. 4, pp. 302–313, 2003.
[22] J. Xin and J. S. Jin, “Relevance feedback for content-based image retrieval using Bayesian network,” in Proceedings of the Pan-Sydney Area Workshop on Visual Information Processing (VIP '05), pp. 91–94, 2004.
[23] A. Abdo, N. Salim, and A. Ahmed, “Implementing relevance feedback in ligand-based virtual screening using Bayesian inference network,” Journal of Biomolecular Screening, vol. 16, no. 9, pp. 1081–1088, 2011.
[24] A. Abdo and N. Salim, “New fragment weighting scheme for the Bayesian inference network in ligand-based virtual screening,” Journal of Chemical Information and Modeling, vol. 51, no. 1, pp. 25–32, 2011.
[25] D. Ellis, J. F. Hines, and P. Willett, “Measuring the degree of similarity between objects in text retrieval systems,” Perspectives in Information Management, vol. 3, pp. 128–149, 1993.
[26] A. Abdo and N. Salim, “Similarity-based virtual screening with a Bayesian inference network,” ChemMedChem, vol. 4, no. 2, pp. 210–218, 2009.
[27] B. Chen, C. Mueller, and P. Willett, “Evaluation of a Bayesian inference network for ligand-based virtual screening,” Journal of Cheminformatics, vol. 1, article 5, 2009.
[28] Pipeline Pilot, Accelrys Software, San Diego, Calif, USA, 2008.
[29] J. Hert, P. Willett, D. J. Wilton et al., “New methods for ligand-based virtual screening: use of data fusion and machine learning to enhance the effectiveness of similarity searching,” Journal of Chemical Information and Modeling, vol. 46, no. 2, pp. 462–470, 2006.
[30] S. Siegel and N. J. Castellan, Nonparametric Statistics for the Behavioral Sciences, McGraw-Hill, 1988.

Copyright

Copyright © 2012 Ali Ahmed et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
