The virtual screening of a database of molecules is based on explicit three-dimensional molecular superpositions. The torsional flexibility of the database molecules is taken fully into account, and an arbitrary number of conformation-dependent molecular features may be considered. A fragmentation-reassembly approach is utilized, which allows for an efficient sampling of the conformational space. A fast clique-based pattern-matching algorithm generates alignments of pairs of adjacent molecular fragments on the (rigid) query molecule that are subsequently reassembled to complete database molecules. Using conventional molecular features (hydrogen bond donors and acceptors, charges, and hydrophobic groups), it is possible to rapidly produce accurate alignments of medium-sized drug-like molecules. Examples with a test database containing a diverse set of 1780 drug-like molecules (including all conformers) show that average query processing times of the order of 0.1 seconds per molecule can be achieved on a PC, depending on the size of the query molecule.
|
1. A computer-implemented method of identifying at least a portion of a molecule indexed in a database, the molecule having both a conformation and at least one physico-chemical property similar to those of a query molecule, the database including at least one conformation representation for each of a plurality of indexed molecules, wherein each of the indexed molecules is partitioned into a set of overlapping fragments, and in which at least one conformation representation is associated with each fragment, and in which for each of the fragment conformation representations, at least 3 computed features of interest of the fragment have been provided, each of said at least 3 computed features corresponding to a different, single point in 3-D physical space and representing at least one given physico-chemical property, the method comprising:
a) providing a conformation of a query molecule that includes at least 3 computed features, each of which corresponds to a different, single point in 3-D physical space and corresponds to at least one physico-chemical property of interest, wherein at least one of the computed features of the query molecule is not located at an atomic site;
b) using a computer for pattern matching said at least 3 computed features of the query molecule with corresponding computed features of fragments given by fragment conformation representations in the database to identify fragments having patterns of computed features similar to patterns of computed features in portions of the query molecule, wherein said pattern matching includes i) first matching just feature type and ii) then matching feature geometry;
c) using a computer for assembling geometrically compatible overlapping fragment conformations corresponding to fragments identified by said pattern matching into at least one three-dimensional molecular structure that corresponds to at least a portion of at least one of the indexed molecules, in order to identify at least a portion of a molecule in the database having patterns of computed features similar to patterns of computed features in the query molecule; and
d) using a computer to present, to a user, results generated by steps a), b), and c), so that the user is able to view and analyze the results.
24. A product comprising a computer-readable medium that tangibly embodies a computer executable program thereon, the program designed for identifying at least a portion of a molecule indexed in a database, the molecule having both a conformation and at least one physico-chemical property similar to those of a query molecule provided by a user of the program, the database including at least one conformation representation of each of the indexed molecules, wherein each of the indexed molecules is partitioned into a set of overlapping fragments, and in which at least one conformation representation is associated with each fragment, and in which for each of the fragment conformation representations, at least 3 computed features of interest of the fragment have been provided, each of said at least 3 computed features corresponding to a different, single point in 3-D physical space and representing at least one given physico-chemical property, the program having instructions that cause a computer to implement the following steps when the program is executed on the computer, in response to a user of the program providing a conformation of the query molecule that includes at least 3 computed features, each of which corresponds to a different, single point in 3-D physical space and corresponds to at least one physico-chemical property of interest:
a) pattern matching said at least 3 computed features of the query molecule with corresponding computed features of fragments given by fragment conformation representations in the database to identify fragments having patterns of computed features similar to patterns of computed features in portions of the query molecule, wherein said pattern matching includes i) first matching just feature type and ii) then matching feature geometry; and
b) assembling geometrically compatible overlapping fragment conformations corresponding to fragments identified by said pattern matching into at least one three-dimensional molecular structure that corresponds to at least a portion of at least one of the indexed molecules, in order to identify at least a portion of a molecule in the database having patterns of computed features similar to patterns of computed features in the query molecule, wherein at least one of the computed features of the query molecule is not located at an atomic site.
34. A computer-implemented method of identifying at least a portion of a molecule indexed in a database, the molecule having both a conformation and at least one physico-chemical property similar to those of a query molecule, the database including at least one conformation representation of each of the indexed molecules, wherein each of the indexed molecules is partitioned into a set of overlapping fragments, and in which at least one conformation representation is associated with each fragment, and in which for each of the fragment conformation representations, at least 3 computed features of interest of the fragment have been provided, each of said at least 3 computed features corresponding to a different, single point in 3-D physical space and representing at least one given physico-chemical property, the method comprising:
a) providing a conformation of a query molecule that includes at least 3 computed features, each of which corresponds to a different, single point in 3-D physical space and corresponds to at least one physico-chemical property of interest;
b) using a computer for pattern matching said at least 3 computed features of the query molecule with corresponding computed features of fragments given by fragment conformation representations in the database to identify fragments having patterns of computed features similar to patterns of computed features in portions of the query molecule, wherein said pattern matching includes i) first matching just feature type and ii) then matching feature geometry;
c) using a computer for assembling geometrically compatible overlapping fragment conformations corresponding to fragments identified by said pattern matching into at least one three-dimensional molecular structure that corresponds to at least a portion of at least one of the indexed molecules, in order to identify at least a portion of a molecule in the database having patterns of computed features similar to patterns of computed features in the query molecule,
wherein conformations corresponding to the conformation representations associated with the fragments are selected by avoiding processing two or more overlapping fragment conformations during a database query if those overlapping fragment conformations are not sufficiently distinct from each other by a predetermined measure; and
d) using a computer to present, to a user, results generated by steps a), b), and c), so that the user is able to view and analyze the results.
36. A product comprising a computer-readable medium that tangibly embodies a computer executable program thereon, the program designed for identifying at least a portion of a molecule indexed in a database, the molecule having both a conformation and at least one physico-chemical property similar to those of a query molecule provided by a user of the program, the database including at least one conformation representation of each of the indexed molecules, wherein each of the indexed molecules is partitioned into a set of overlapping fragments, and in which at least one conformation representation is associated with each fragment, and in which for each of the fragment conformation representations, at least 3 computed features of interest of the fragment have been provided, each of said at least 3 computed features corresponding to a different, single point in 3-D physical space and representing at least one given physico-chemical property, the program having instructions that cause a computer to implement the following steps when the program is executed on the computer, in response to a user of the program providing a conformation of the query molecule that includes at least 3 computed features, each of which corresponds to a different, single point in 3-D physical space and corresponds to at least one physico-chemical property of interest:
a) pattern matching said at least 3 computed features of the query molecule with corresponding computed features of fragments given by fragment conformation representations in the database to identify fragments having patterns of computed features similar to patterns of computed features in portions of the query molecule, wherein said pattern matching includes i) first matching just feature type and ii) then matching feature geometry; and
b) assembling geometrically compatible overlapping fragment conformations corresponding to fragments identified by said pattern matching into at least one three-dimensional molecular structure that corresponds to at least a portion of at least one of the indexed molecules, in order to identify at least a portion of a molecule in the database having patterns of computed features similar to patterns of computed features in the query molecule, wherein conformations corresponding to the conformation representations associated with the fragments are selected by avoiding processing two or more overlapping fragment conformations during a database query if those overlapping fragment conformations are not sufficiently distinct from each other by a predetermined measure.
2. The method of
3. The method of
4. The method of
5. The method of
6. The method of
7. The method of
8. The method of
9. The method of
10. The method of
11. The method of
12. The method of
13. The method of
14. The method of
15. The method of
16. The method of
i) determining that distances between computed features of the query molecule and distances between corresponding computed features of a pair of adjacent fragments given by fragment conformation representations in the database are within a predetermined tolerance, and
ii) determining that pairs of computed features of the query molecule and corresponding pairs of computed features of said pair of adjacent fragments can be aligned to within a predetermined angular tolerance.
17. The method of
18. The method of
19. The method of
matching computed features are grouped into sets based on geometrical considerations, each of these sets defining a respective unique alignment of a fragment onto the query molecule, and
fragments identified by said pattern matching are identified by said respective alignments.
20. The method of
21. The method of
22. The method of
23. The method of
25. The product of
26. The product of
27. The product of
i) determining that distances between computed features of the query molecule and distances between corresponding computed features of a pair of adjacent fragments given by fragment conformation representations in the database are within a predetermined tolerance, and
ii) determining that pairs of computed features of the query molecule and corresponding pairs of computed features of said pair of adjacent fragments can be aligned to within a predetermined angular tolerance.
28. The product of
29. The product of
30. The product of
31. The product of
matching computed features are grouped into sets based on geometrical considerations, each of these sets defining a respective unique alignment of a fragment onto the query molecule, and
fragments identified by said pattern matching are identified by said respective alignments.
32. The product of
33. The product of
using the computer to present, to a user, results generated by steps a) and b).
35. The method of
37. The product of
38. The product of
using the computer to present, to a user, results generated by steps a) and b).
|
The invention relates to a method of searching a database to identify at least one compound that has both a conformation and a physico-chemical property similar to those of a query molecule.
The virtual screening of compounds in databases is an important tool in modern drug design. Traditionally, two-dimensional or pharmacophore-based methods, which are very fast but have only limited accuracy, have been used for this purpose. (See, for example, H. Matter et al., in Combinatorial Organic Chemistry, John Wiley & Sons, New York, N.Y., 1999; C. Humblet et al., Annual Reports in Medicinal Chemistry, vol. 28, Chapter VI, Topics in Drug Design and Discovery, Academic Press, London, pp. 275-284, 1993; P. Willet, J. Mol. Recognition, vol. 8, p. 290, 1995; and R. D. Brown et al., J. Chem. Inf. Comput. Sci., vol. 36, p. 572, 1996.)
Three-dimensional molecular superposition methods have been successfully utilized to determine binding geometries relative to a reference molecule. (See, for example, G. Klebe et al., J. Comput. Aided Mol. Des., vol. 8, p. 751, 1994; S. K. Kearsley, et al., J. Comput. Aided Mol. Des., vol. 8, p. 565, 1994; G. Klebe et al., J. Comput. Aided Mol. Des., vol. 13, p. 35, 1999; C. Lemmen et al., J. Comput. Aided Mol. Des., vol. 11, p. 357, 1997; C. Lemmen, J. Med. Chem., vol. 41, p. 4502, 1998; C. Lemmen et al., J. Comput. Aided Mol. Des., vol. 12, p. 491, 1998; M. D. Miller et al., J. Med. Chem., vol. 42, p. 1505, 1999; J. A. Grant et al., J. Comput. Chem., vol. 17, p. 1653, 1996; C. McMartin et al., J. Comput. Aided Mol. Des., vol. 9, p. 237; and S. Handschuh et al., J. Chem. Inf. Comput. Sci., vol. 38, p. 220, 1998.) These 3-D methods play an important role in 3D-QSAR (Quantitative Structure-Activity Relationships) applications, pharmacophore elucidation, and receptor modeling—situations in which structural data of the target protein is not available. The variety of methodologies used for molecular superposition has recently been extensively reviewed (see Lemmen et al., J. Comput. Aided Mol. Des., vol. 14, p. 215, 2000), and an application of existing superposition methods to virtual database screening has been reported (see C. Lemmen et al., Perspectives in Drug Discovery and Design, vol. 20, p. 43, 2000).
Of course, the use of molecular superposition to determine the binding capability of possible ligands has its limitations. The underlying assumption is that other ligands will have the same overall binding mode as the reference molecule. Also, the bound conformation of the reference molecule has to be known, which is generally true only if crystallographic information about the corresponding protein-ligand complex is available. Therefore, in practical applications, the reference molecule should have a non-flexible structure, or its bound conformation has to be inferred using other methods, e.g., deduced from simultaneous, flexible alignments within a set of ligands that are known to be active. (See, for example, S. K. Kearsley et al., J. Comput. Chem., vol. 11, p. 1187, 1990; R. Diamond, Protein Sci., vol. 1, p. 1279, 1992; J. Mestres et al., J. Mol. Graph., vol. 15, p. 114, 1997; D. A. Cosgrove et al., J. Comput. Aided Mol. Des., vol. 14, p. 573, 2000; P. Labute et al., J. Med. Chem., vol. 44, p. 1483, 2001; and J. E. J. Mills et al., J. Comput. Aided Mol. Des., vol. 15, p. 81, 2001.) The bound conformation of the reference molecule can also be determined from distance constraints obtained in NMR (NOE) experiments (see G. C. K. Roberts, Drug Discovery Today, vol. 5, p. 230, 2000).
A field-based similarity search system and method have been described in which there is no screening of the conformation space, leading to a high computational load (see M. C. Pitman et al., J. Comput. Aided Mol. Des., vol. 15, p. 587, 2001). In this approach, all candidate molecules must first be assembled before they are scored. U.S. Pat. Nos. 5,787,279 and 5,752,019 to Rigoutsos describe other search systems and methods in which the search is based on atom types, and pose-clustering (a particular method of clustering of coordinate systems arbitrarily located and rotated in space) is used to generate alignments (hypotheses).
There is still a need for an accurate, efficient method of performing a database search of molecules that fully accounts for the three-dimensional structure and conformational flexibility of the molecules in the database.
Highly efficient methods are described herein for searching a database for molecules having a conformation and at least one physico-chemical property similar to those of a reference or query molecule. The methods herein have been used to screen a database of about 1800 molecules within a few minutes on a low-end PC, while producing accurate 3D molecular alignments. Preferred methods rely on conformational flexibility and clique-search algorithms to generate explicit three-dimensional superpositions. These methods are faster than other methods of comparable accuracy used to produce conformations suitable for molecular superpositions. The methods herein are both fast and accurate, as opposed to field-based approaches that tend to be slow and do not deliver the desired accuracy. One advantage of preferred methods herein is that a coarse-graining approach is used to remove redundant conformations from the conformational space, thereby avoiding unnecessary computational load. Another advantage of preferred methods herein is that candidate molecules (or fragments) need not be assembled before they are scored, but can be first scored on a directed graph basis without expensive computation. The fragmentation-reassembly procedure disclosed herein is used to significantly speed up query processing.
Yet another advantage of preferred methods herein is that molecules are represented using point-like features, including features known to be important in protein-ligand binding, e.g., hydrogen bond donors, acceptors, charges and hydrophobic regions. Furthermore, more sophisticated feature schemes may be used, such as using certain molecular surface properties, since features may explicitly depend on the molecular conformation. Using point-like features is by its nature different from using field-based features, leading to substantially improved performance.
In order to enable fast query processing during database searches, as much computational work as possible is preferably done in a database pre-processing step in which physico-chemical features are calculated from molecular properties. During preprocessing, millions of conformations may be explored by uniform sampling of dihedral angles, with many of them being discarded because of internal steric clashes or because the generated conformational change is too small relative to a conformation already stored in the database.
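The pre-processing idea described above (uniform dihedral sampling followed by rejection of clashing and redundant conformations) can be sketched as follows. This is only an illustration under stated assumptions: the names `sample_conformations`, `has_clash`, and `torsion_distance` are hypothetical, the clash test is supplied by the caller, and the distinctness measure used here (largest circular difference between torsion angles) is a stand-in for whatever geometric measure an actual implementation would use.

```python
from itertools import product

def circular_diff(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360
    return min(d, 360 - d)

def torsion_distance(conf_a, conf_b):
    """Assumed distinctness measure: largest per-torsion angular difference."""
    return max(circular_diff(a, b) for a, b in zip(conf_a, conf_b))

def sample_conformations(n_torsions, delta_phi, has_clash, min_distance):
    """Enumerate a uniform torsion grid and keep only usable conformations.

    has_clash    -- caller-supplied predicate (hypothetical steric test)
    min_distance -- conformations closer than this to a stored one are dropped
    """
    grid = range(0, 360, delta_phi)
    kept = []
    for conf in product(grid, repeat=n_torsions):
        if has_clash(conf):
            continue  # discard: internal steric clash
        if any(torsion_distance(conf, k) < min_distance for k in kept):
            continue  # discard: too similar to a stored conformation
        kept.append(conf)
    return kept

# One rotatable bond sampled every 60 degrees, keeping only conformations
# that differ by at least 90 degrees from those already stored.
kept = sample_conformations(1, 60, lambda conf: False, 90)
```

With these toy settings, three of the six grid points survive; a larger `min_distance` gives a coarser and smaller conformation set.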
In preferred methods herein, the virtual screening of a database of molecules is based on explicit three-dimensional molecular superpositions that take the torsional flexibility of the database molecules fully into account. Molecules are represented by an arbitrary number of point-like features that are allowed to explicitly depend on the conformation of molecular fragments. An extensive database pre-processing step is used during which features are pre-calculated and stored in a lookup table in order to minimize computer time needed for a database query. Using a conventional molecular feature definition (hydrogen bond donors and acceptors, charges, and hydrophobic groups), the methods are able to rapidly produce accurate alignments for a test set of pairs of medium-sized drug-like molecules, which are known to bind to the same receptor. Using a test database containing a diverse set of 1728 drug-like molecules from the National Cancer Institute (NCI) diversity set as well as 52 dihydrofolate reductase actives, it has been demonstrated that average query processing times of the order of 0.1 seconds per molecule (including all conformers) can be achieved on a PC. The preferred algorithm is naturally parallelizable (i.e., all database molecules are processed independently of each other), suggesting that methods herein can be scaled to run on a farm of 50 or more PCs, permitting a database of several hundred thousand molecules to be searched in a few minutes.
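The scaling estimate above is simple arithmetic and can be checked with a short sketch. The function name `query_time_seconds` is hypothetical, and the calculation assumes ideal parallel speed-up (no communication or load-balancing overhead); 300,000 is used as one possible reading of "several hundred thousand" molecules.

```python
def query_time_seconds(n_molecules, seconds_per_molecule=0.1, n_machines=1):
    """Wall-clock estimate assuming molecules are processed independently
    and the work is split evenly across machines (idealized)."""
    return n_molecules * seconds_per_molecule / n_machines

single_pc = query_time_seconds(1780)                 # test database, one PC
farm = query_time_seconds(300_000, n_machines=50)    # assumed database size
print(f"{single_pc / 60:.1f} min on one PC, {farm / 60:.1f} min on 50 PCs")
```

Under these assumptions the test database takes about 3 minutes on one PC, and a 300,000-molecule database takes about 10 minutes on 50 PCs.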
One aspect of the invention is a method of identifying at least a portion of a molecule indexed in a database, with the database preferably being populated as part of a pre-processing procedure. The molecule has both a conformation and at least one physico-chemical property similar to those of a query molecule provided by a user of the method. The database includes at least one conformation representation of each of the indexed molecules, in which each of the indexed molecules is partitioned into a set of overlapping fragments, and in which at least one conformation representation is associated with each fragment. For each of the fragment conformation representations, at least 3 features of interest of the fragment are provided, and each of said at least 3 features represents at least one given physico-chemical property and is associated with a respective point in 3-D space. The method includes providing a conformation of a query molecule that includes at least 3 features, each of which corresponds to at least one physico-chemical property of interest and is associated with a respective point in 3-D space. The method further includes pattern matching features of the query molecule with corresponding features of fragments given by fragment conformation representations in the database to identify fragments having patterns of features similar to patterns of features in portions of the query molecule, in which the pattern matching includes i) first matching just feature type and ii) then matching feature geometry. The method also includes assembling geometrically compatible overlapping fragment conformations corresponding to fragments identified by the pattern matching into at least one three-dimensional molecular structure that corresponds to at least a portion of at least one of the indexed molecules, in order to identify at least a portion of a molecule in the database having patterns of features similar to patterns of features in the query molecule.
In a preferred method, this three-dimensional molecular structure corresponds to at least one entire indexed molecule. The method may further comprise, for each one of the overlapping fragments, calculating a set of conformations by sampling torsional angles corresponding to rotatable bonds of said one of the overlapping fragments. The method may further comprise using a graph-based algorithm for at least one of the pattern matching and the assembling. In a preferred method, the fragment conformations include conformations corresponding to fragments that have at least 20 atoms. In addition, the overlapping fragments may be constructed from pairs of adjacent non-overlapping fragments separated by one rotatable bond. The assembling may be based on the detection of directed paths in a directed graph having vertices representing the adjacent non-overlapping fragments, and may include scoring of the three-dimensional molecular structures without using their three-dimensional coordinates. In a preferred method, the conformations corresponding to the associated conformation representations are selected by avoiding processing two or more overlapping fragment conformations during a database query if those overlapping fragment conformations are not sufficiently distinct from each other by a predetermined measure. In preferred methods, the pattern matching includes identifying feature pairs, in which each feature pair includes a feature in the query molecule and a corresponding feature in a database molecule. Also, the pattern matching is preferably based on clique-detection in a distance-compatibility graph and does not require knowledge of local coordinate systems. 
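The two-stage pattern matching mentioned above (feature type first, then geometry, via clique detection in a distance-compatibility graph) can be illustrated with a minimal sketch. It is not the implementation described in this document: the function names, the tolerance value, and the use of a plain Bron-Kerbosch enumeration are assumptions made for illustration. Each graph node pairs a query feature with a fragment feature of the same type; two nodes are connected when the corresponding inter-feature distances agree within the tolerance, so no local coordinate system is needed.

```python
from itertools import combinations
from math import dist

def correspondence_graph(query, fragment, tol=0.5):
    """Nodes: (i, j) pairs with equal feature type (stage 1).
    Edges: pairs of nodes whose inter-feature distances agree (stage 2)."""
    nodes = [(i, j) for i, (tq, _) in enumerate(query)
             for j, (tf, _) in enumerate(fragment) if tq == tf]
    adj = {n: set() for n in nodes}
    for (i, j), (k, l) in combinations(nodes, 2):
        if i == k or j == l:
            continue  # a feature may be matched only once
        dq = dist(query[i][1], query[k][1])
        df = dist(fragment[j][1], fragment[l][1])
        if abs(dq - df) <= tol:
            adj[(i, j)].add((k, l))
            adj[(k, l)].add((i, j))
    return adj

def maximal_cliques(adj):
    """Basic Bron-Kerbosch enumeration of maximal cliques (no pivoting)."""
    cliques = []
    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}
    expand(frozenset(), set(adj), set())
    return cliques

# Toy data: the fragment is a translated, reordered copy of the query.
query = [("donor", (0, 0, 0)), ("acceptor", (3, 0, 0)),
         ("hydrophobic", (0, 4, 0)), ("donor", (0, 0, 5))]
fragment = [("acceptor", (13, 10, 10)), ("donor", (10, 10, 15)),
            ("hydrophobic", (10, 14, 10)), ("donor", (10, 10, 10))]
best = max(maximal_cliques(correspondence_graph(query, fragment)), key=len)
```

Each clique is a set of mutually consistent feature pairs and hence a candidate alignment. Note that distance compatibility alone does not distinguish a pattern from its mirror image, so a real implementation would need an additional check when the alignment is computed.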
The features of the fragments may represent at least one of the following physico-chemical properties: hydrogen bond donors, hydrogen bond acceptors, hydrophobic regions, aromatic rings, acid groups, and base groups; at least some of the features of the query molecule may be located somewhere other than at atomic sites. The method may include calculating said at least 3 features of the query molecule, calculating the features of the fragments of the indexed molecules, and partitioning the indexed molecules by an automated procedure using an algorithm that minimizes the number of conformations to be stored in the database.
A preferred embodiment of the invention is a product comprising a computer-readable medium that tangibly embodies a computer executable program thereon. The program is designed for identifying at least a portion of a molecule indexed in a database, in which the molecule has both a conformation and at least one physico-chemical property similar to those of a query molecule provided by a user of the program. The database includes at least one conformation representation of each of the indexed molecules, in which each of the indexed molecules is partitioned into a set of overlapping fragments, and in which at least one conformation representation is associated with each fragment, and in which for each of the fragment conformation representations, at least 3 features of interest of the fragment have been provided, each of said at least 3 features representing at least one given physico-chemical property and being associated with a respective point in 3-D space. In response to a user of the program providing a conformation of the query molecule that includes at least 3 features, each of which corresponds to at least one physico-chemical property of interest and is associated with a respective point in 3-D space, the program has instructions for pattern matching features of the query molecule with corresponding features of fragments given by fragment conformation representations in the database to identify fragments having patterns of features similar to patterns of features in portions of the query molecule, in which the pattern matching includes i) first matching just feature type and ii) then matching feature geometry. 
The program further includes instructions for assembling geometrically compatible overlapping fragment conformations corresponding to fragments identified by the pattern matching into at least one three-dimensional molecular structure that corresponds to at least a portion of at least one of the indexed molecules, in order to identify at least a portion of a molecule in the database having patterns of features similar to patterns of features in the query molecule.
A preferred similarity search algorithm is described, followed by a discussion of working examples designed to test the capabilities of preferred methods with regard to computational performance and the accuracy of molecular alignments. The working examples presented in section 2.2 are mutual alignments of medium-sized molecules taken from the FlexS™ benchmark test set (see C. Lemmen et al. on the Internet website identified by the concatenation of “http://” and “www.biosolveit.de”, and C. Lemmen, J. Med. Chem., vol. 41, p. 4502, 1998). In section 2.3, methods are presented in which molecules similar to a given query molecule are found within a large, diverse set of 1780 molecules. For this purpose, a database was prepared that contained a diverse set of 1728 compounds as well as a smaller set of 52 molecules that are active towards the same receptor as a selected query molecule, which was included in the smaller set.
1. Algorithm
The preferred algorithm for determining whether two molecules are similar entails representing a molecule by a set of features located at certain points in space. A feature of a molecule (or a portion of a molecule) is one or more properties associated with a single point in 3-dimensional space that may include directional or orientational information related to that property or properties. This point is not necessarily coincident with an atomic site, and may, for example, lie on the surface of the molecule. There are various rotationally invariant types of features corresponding to different local physico-chemical properties of the molecule. These can be atom-based entities like hydrogen bond donors and acceptors or charges, as well as local properties, which are not necessarily related to atoms or functional groups. For instance, a feature may describe local geometry or quantities that have been derived from a continuous field.
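One possible in-memory representation of such point-like features is sketched below. The class and field names (`Feature`, `kind`, `position`, `direction`) and the helper `index_by_kind` are hypothetical and are not taken from this document; the sketch only shows that a feature carries a type, a single 3-D point that need not be an atomic site, and optional directional information.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]

@dataclass(frozen=True)
class Feature:
    kind: str                          # e.g. "donor", "acceptor", "hydrophobic"
    position: Vec3                     # single point in 3-D space
    direction: Optional[Vec3] = None   # optional orientation information

def index_by_kind(features):
    """Group features by type so type matching can precede geometry matching."""
    table = defaultdict(list)
    for f in features:
        table[f.kind].append(f)
    return dict(table)

features = [
    Feature("donor", (0.0, 0.0, 0.0), direction=(1.0, 0.0, 0.0)),
    Feature("hydrophobic", (2.5, 1.0, 0.3)),   # e.g. a ring centroid, not an atom
    Feature("donor", (4.0, 0.0, 1.0)),
]
by_kind = index_by_kind(features)
```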
Specifically, a molecule D (or more generally, a portion of the molecule) is said to be similar to a given conformation of a query molecule Q (which is treated as rigid) if a conformation of D exists that can be aligned with Q such that a certain number of features on Q and D match with respect to feature type and location within a certain tolerance. The number of matching features (subsequently referred to as “votes”) is determined along with a Carbo score (see, for example, R. Carbo et al., Int. J. Quant. Chem., vol. 17, p. 1185, 1980). The Carbo score characterizes the overall molecular overlap and is used to construct a final similarity score (see section 2.1) that measures the similarity between Q and D. The goal is to find those molecules D in a database that can be aligned to a given conformation of a query molecule Q such that the similarity score is larger than a given value.
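For orientation, the Carbo index of two densities ρ_Q and ρ_D is commonly written as C = Z_QD / sqrt(Z_QQ · Z_DD), where Z_AB is the overlap integral of ρ_A and ρ_B over all space. The sketch below evaluates it for a deliberately simple density model that is an assumption of this example, not something stated here: each molecule is a sum of identical spherical Gaussians exp(-α r²) centered on its atoms, for which the pairwise overlap integral has the closed form (π / 2α)^(3/2) · exp(-α d² / 2). The names `overlap` and `carbo_score` and the value of `alpha` are hypothetical.

```python
from math import exp, pi, sqrt

def overlap(mol_a, mol_b, alpha=0.5):
    """Overlap integral of two sums of identical spherical Gaussians."""
    prefactor = (pi / (2 * alpha)) ** 1.5
    total = 0.0
    for a in mol_a:
        for b in mol_b:
            d2 = sum((p - q) ** 2 for p, q in zip(a, b))
            total += prefactor * exp(-0.5 * alpha * d2)
    return total

def carbo_score(q, d, alpha=0.5):
    """Carbo index: 1 for identical densities, near 0 for no overlap."""
    return overlap(q, d, alpha) / sqrt(overlap(q, q, alpha) * overlap(d, d, alpha))

# Toy coordinates (angstroms) for an aligned pair of "molecules".
q = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
d_shifted = [(x + 0.5, y, z) for x, y, z in q]
d_far = [(x + 50.0, y, z) for x, y, z in q]
scores = (carbo_score(q, q), carbo_score(q, d_shifted), carbo_score(q, d_far))
```

The score depends on the relative placement of the two molecules, which is why it is evaluated on a specific alignment rather than on the molecules in isolation.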
A preferred similarity search procedure is illustrated in
When querying the database, entries are retrieved from the lookup table 50 for each feature 60 associated with the given conformation of the query molecule Q (see
The size of the molecular fragments is determined by a trade-off between the total number of conformations that must be processed during a database query and the selectivity of single pattern matches. Fragment pairs that are too large will generally exhibit too many internal conformations. On the other hand, fragment pairs that are too small will have only a few feature points and therefore produce many unspecific placements on the query molecule, so that a meaningful assembly of complete molecules is not possible. Molecules are typically divided into 1 to 3 fragment pairs, each of which contains between 20 and 40 atoms and about 10 feature points. The method does not require any special base fragment from which an incremental assembly procedure is started.
Various steps of the preferred similarity search algorithm are now described in greater detail.
1.1 Fragmentation
Fragmentation of a molecule is preferably an automated process that results in the identification of a set of rotatable bonds where the molecule is cut. This set of rotatable bonds is determined such that the total number of fragment pair conformations is minimized given the minimum fragment pair size and the sampling resolution Δφ of the dihedral angles. The algorithm used for automatic partitioning of a molecule is discussed with reference to the molecule 70 shown in
First, all those bonds 74 that are rotatable are determined. A distinction is made between those bonds that are expected to be more or less freely rotatable (e.g., C—C single bonds between carbon atoms) and those expected to exist in two (cis/trans) conformations corresponding to Δφ=0° and 180° (e.g., C═C double bonds). The criteria used to determine whether a particular bond is a) considered to be rotatable or b) may exist in two different states (“cis/trans”) are listed in
Bonds which connect to a terminal group that is nearly rotationally symmetric (CH3, CX3 (X═F, Cl), SO3−, PO3−) are considered here to be non-rotatable. Special cases include hydroxyl, SH, and carboxyl groups which are treated as being rigid, i.e., the dihedral angle is not expanded explicitly, but features (see section 1.3) on these groups are defined in a way that their inherent flexibility is taken into account. Ring conformations are also considered rigid.
The algorithm treats the molecule in question as a “graph” and determines a path P (denoted by the solid line in
The parameter ε should be larger than 0 in order to ensure stability of the algorithm, so here ε is chosen to be 0.1. In the limit ε→0, the length of the path P corresponds to the number of rotatable bonds along P. The procedure described above minimizes the number of rotatable bonds not contained in the path P, and therefore maximizes the number of “cuttable” bonds. This works well for molecules that have few branches. Finally, an exhaustive search of all possible fragmentations involving cuts at the rotatable bonds along P is performed to find the optimal set of rotatable bonds to be cut such that the total number of fragment pair conformations (to be stored in the database) is minimized given the minimum fragment pair size and the sampling resolution Δφ of the dihedral angles. The minimum fragment pair size is chosen to ensure meaningful extended pattern matches in the 3D pattern matching procedure (see section 1.4). A minimum fragment pair size of 25-30 atoms works well, resulting in fragment pairs built from fragments that typically contain more than one rotatable bond (as opposed to the methods disclosed in U.S. Pat. Nos. 5,787,279 and 5,752,019 to Rigoutsos, in which the fragments are constrained to be rigid, i.e., to not have rotatable bonds).
1.2 Sampling of Conformational Space
Molecular conformations are generated by uniformly sampling the dihedral angles of all rotatable bonds with a certain angular resolution Δφ. In the preprocessing step this is done for all fragments and all fragment pairs of the database molecules, and a bump check is performed that removes conformations for which the distance between any two non-bonded atoms is smaller than 0.65 times the sum of the respective atomic van der Waals radii (as taken from A. Bondi, J. Phys. Chem., vol. 68, p. 441, 1964). This avoids the possibility of near coincident atoms in the conformation.
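The uniform dihedral expansion and the bump check described above can be illustrated with a minimal sketch. The function names, the data layout (element symbols, coordinate tuples, index pairs for bonds), and the choice to exclude only directly bonded pairs are assumptions made for illustration; the radii are the Bondi values for four common elements.

```python
import itertools
import math

# Bondi van der Waals radii in angstroms for a few common elements.
BONDI_RADII = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52}


def dihedral_grid(n_rotatable, step_deg):
    """Return an iterator over all dihedral-angle combinations on a uniform grid."""
    angles = [k * step_deg for k in range(int(round(360 / step_deg)))]
    return itertools.product(angles, repeat=n_rotatable)


def passes_bump_check(elements, coords, bonded_pairs, scale=0.65):
    """Return False if any non-bonded atom pair is closer than
    scale * (sum of the two van der Waals radii)."""
    # Assumption: only directly bonded pairs are exempt from the check.
    bonded = {frozenset(pair) for pair in bonded_pairs}
    for i, j in itertools.combinations(range(len(coords)), 2):
        if frozenset((i, j)) in bonded:
            continue
        limit = scale * (BONDI_RADII[elements[i]] + BONDI_RADII[elements[j]])
        if math.dist(coords[i], coords[j]) < limit:
            return False
    return True
```

With Δφ=60° each rotatable bond contributes 6 grid values, so a fragment with two rotatable bonds yields 36 raw conformations before the bump check is applied.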
Since the number of stored conformations should be as small as possible in order to minimize the query processing time, a subsequent “coarse graining” step is performed to ensure that the mutual RMSD (after minimization with respect to translations and rotations) of all pairs of conformations is larger than a certain threshold μ. This is done by scanning through all conformations generated by the angular expansion procedure and removing all those whose RMSD with respect to any previously encountered conformation is smaller than μ. Since the RMSD represents a metric in the space of conformations, this can be done efficiently by making use of the triangle inequality. The parameter μ should reflect the expected accuracy (in terms of the RMSD) of the final candidate alignments on the query molecule; a value of 1 Å has been found to work well. In order to ensure that all fragment pairs are built from the same set of fragment conformations, the coarse graining procedure is performed first on the fragment level and then again after sampling the dihedral angle of the connecting bond on a fragment pair.
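The coarse graining step can be sketched as a greedy filter. The sketch below assumes a caller-supplied `distance` function that behaves as a metric (for molecules this would be the RMSD after optimal superposition, which is not implemented here), and it uses the first retained conformation as a single pivot for the triangle-inequality shortcut. Both choices are illustrative assumptions, not the stored procedure itself.

```python
def coarse_grain(conformations, distance, mu):
    """Keep a subset of conformations whose mutual distance is at least mu."""
    kept = []        # retained conformations
    pivot_dist = []  # distance of each retained conformation to the pivot kept[0]
    for conf in conformations:
        if not kept:
            kept.append(conf)
            pivot_dist.append(0.0)
            continue
        d_pivot = distance(conf, kept[0])
        redundant = False
        for other, d_other in zip(kept, pivot_dist):
            # Triangle inequality: d(conf, other) >= |d_pivot - d_other|.
            # If this lower bound already reaches mu, skip the explicit evaluation.
            if abs(d_pivot - d_other) >= mu:
                continue
            if distance(conf, other) < mu:
                redundant = True
                break
        if not redundant:
            kept.append(conf)
            pivot_dist.append(d_pivot)
    return kept
```

The result depends on the scan order, which is consistent with the description of removing every conformation that is too close to a previously encountered one.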
A complication of the strategy described above arises from the fact that conformations of fragments and fragment pairs are actually sampled instead of complete molecules. It may happen that fragment conformations, which are very close in conformational space and thus fall below the RMSD threshold μ, lead to conformations of the complete molecule that are very far apart. In order to deal with that problem, it is advantageous to approximate RMSDs of the complete molecule while sampling conformations on the fragment or fragment pair level. This is done by attaching dummy atoms (or dummy weights) 80, 82 to respective connecting bonds 84, 86 at respective distances of l, l′ from the fragment 88 (see
1.3 Feature Definition
The query molecule, as well as the fragment pair conformations in the database, is represented by a set of features reflecting local physico-chemical properties of the molecules. These features are explicitly calculated and stored for each fragment pair conformation. A feature F can be written as a tuple
F=(I, K, x, p) (1)
where I is an integer representing the feature type, K is a reference to the fragment pair (not present for query features), x is the location of the feature in Cartesian space (in the fragment pair or query molecule coordinate system), and p is an optional unit vector that may represent any directional or orientational information of the feature, e.g., the direction of a hydrogen bond. All fragment pair features are stored in a lookup table as a pair (index, value list), in which I serves as the lookup index, and (K, x, p) represents an entry in the value list.
“Hydrophobic atoms” as defined in
The information about the group size, i.e., the number of carbon atoms contained within the group, is passed along through the lookup table. In order to make feature matches between two hydrophobic groups more specific, only groups are considered for pattern matching in which the difference in the number of atoms is less than 3. Note that the above definition of hydrophobic groups is inherently conformation-dependent and cannot be derived from molecular topology information alone.
The vector p attached to a feature represents either the approximate direction of a fictitious hydrogen bond or, in the case of a benzene ring, the normal to the ring plane. In the latter case, the sign of the vector is not used during processing of the query.
As in the fragmentation and conformational expansion steps (see sections 1.1, 1.2), OH, SH, carboxyl (COO−), amine oxide, SO3−, and PO3− groups are treated in a special way. Since the dihedral angle at an OH or SH group is not expanded, both an acceptor and a donor feature are placed on the O or S atom with the vector p pointing in the direction of the bond away from the non-hydrogen atom connected to O or S. The inherent flexibility of a hydroxyl or SH group is accounted for by choosing appropriate angular tolerances (see eq. (7)) in the pattern matching procedure, as discussed below. Carboxyl groups are treated in a similar way. They are treated here as being deprotonated (even if a hydrogen atom is explicitly present in the input file) and therefore carry an acid feature and a hydrogen bond acceptor feature. Flexibility is accounted for here by the choice of the angular tolerance of the attached vector (see below). PO3− and SO3− groups only carry a charge (acid) feature on the phosphorus or sulfur atom and no features on the oxygen atoms.
1.4 Lookup Table and Pattern Matching
When a database query is launched, lookup table entries are retrieved for each query molecule feature resulting in a list of feature correspondences {c1, c2, . . . }. A feature correspondence c is the match of a single feature on the query molecule with a feature on a fragment pair, i.e., both the query feature and the fragment pair feature have the same feature type (or lookup index). c can be expressed as
c=(xq, xd, pq, pd, K) (2)
in which xq and xd are, respectively, the feature locations on the query molecule and a fragment pair of a particular database molecule, pq and pd are the corresponding unit vectors encoding directional information, and K is the fragment pair index. Different molecules in the database are preferably processed separately, so that fragment pair conformations belonging to only one molecule at a time are considered.
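The feature tuple of eq. (1), the lookup table indexed by feature type, and the feature correspondences of eq. (2) can be sketched as follows. The class name, field names, and function names are hypothetical and chosen only for illustration; the sketch holds everything in memory and ignores the per-molecule processing mentioned above.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Feature:
    type_index: int                   # I: feature type, used as lookup index
    pair_ref: Optional[int]           # K: fragment pair reference (None for query features)
    position: Vec3                    # x: feature location
    direction: Optional[Vec3] = None  # p: optional unit vector


def build_lookup_table(features):
    """Store each fragment pair feature as (K, x, p) under its type index I."""
    table = defaultdict(list)
    for f in features:
        table[f.type_index].append((f.pair_ref, f.position, f.direction))
    return table


def feature_correspondences(query_features, table):
    """Return one tuple (xq, xd, pq, pd, K) per query feature and stored
    feature that share the same feature type."""
    result = []
    for q in query_features:
        for pair_ref, xd, pd in table.get(q.type_index, []):
            result.append((q.position, xd, q.direction, pd, pair_ref))
    return result
```

Indexing by feature type means that a query feature only ever meets stored features of its own type, so the size of the correspondence list shrinks as the number of distinct feature types grows.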
In order to detect common feature patterns on the query molecule and on a particular fragment pair conformation, it is advantageous to find sets of feature correspondences
cj=(xqj, xdj, pqj, pdj, Kj), j=1, . . . , V, (3)
in which V is the number of votes that can be matched simultaneously within a certain geometric tolerance by aligning the fragment pair onto the query molecule. This alignment can be expressed by a rotation matrix R and a translation vector t applied to the fragment pair. R and t are determined by the condition that
|xqj−R xdj−t| and ∠(pqj, R pdj) (4)
are small given a certain tolerance for all j=1, . . . , V. The alignment of the fragment pair defined by the rotation R and translation t then represents a hypothesis with V votes.
The pattern matching problem described above can be treated within a clique detection approach by mapping the feature correspondences onto the vertices of a graph G, in which two vertices are connected by an edge if the corresponding feature correspondences are pair-wise compatible. Alignments with V votes then correspond to cliques with V vertices in the graph G. Two feature correspondences (indexed by i=1, 2 in the discussion that follows)
ci=(xqi, xdi, pqi, pdi, Ki) (5)
are defined to be compatible if the distances between the features are the same on the query molecule and on the fragment pair within a tolerance ε=ε1+ε2,
| |xq1−xq2| − |xd1−xd2| | < ε, (6)
and a pair-wise alignment dq∥R dd (where R is a rotation matrix) of the difference vectors dq=xq1−xq2 and dd=xd1−xd2 exists, such that
∠(pqi, R pdi)<γi, (7)
in which γi denotes an angular tolerance. The parameters ε1, ε2, γ1, and γ2 are independently assigned to the two feature correspondences c1 and c2. It can be shown that eq. (7) is equivalent to the condition
Δφ1+Δφ2≧|φQ−φD| (8)
in which
Δφi=2 arcsin(√λi) (9)
and
λi=[cos(aQi−aDi)−cos γi]/(2 sin aQi sin aDi) (10)
with φQ, φD, aQi and aDi defined as shown in
Preferred methods herein use an incremental approach to compute the list of all cliques in the graph G ordered by the number of votes. The actual fragment pair alignments or hypotheses—represented by the rotation R and translation t—are then determined by an RMSD minimization procedure applied to the feature locations. While pair-wise distance compatibility of feature correspondences ensures the distance compatibility of the whole feature pattern (except in situations with mirror symmetry), this is not the case for the pair-wise angular compatibility of the attached vectors pqj and pdj. Therefore, after clique-detection a post-processing step is applied that removes all feature correspondences in the clique where ∠(pqj, R pdj)≧γj and reduces the number of votes V accordingly. Note that at least 3 votes are needed for a unique alignment. For this reason all cliques with V<3 are discarded.
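The clique formulation can be illustrated with a small sketch. It is not the incremental approach referred to above: it substitutes a plain Bron-Kerbosch enumeration of maximal cliques, checks only the distance condition of eq. (6), and omits the angular condition and the RMSD fit. Correspondences are reduced to pairs (xq, xd), and the rule that a feature may be used only once per clique is an added assumption.

```python
import itertools
import math


def distance_compatible(c1, c2, eps1=0.4, eps2=0.4):
    """Eq. (6): inter-feature distances on query and fragment pair
    agree within the tolerance eps1 + eps2."""
    (xq1, xd1), (xq2, xd2) = c1, c2
    return abs(math.dist(xq1, xq2) - math.dist(xd1, xd2)) < eps1 + eps2


def maximal_cliques(n, adjacent):
    """Bron-Kerbosch enumeration of all maximal cliques (no pivoting)."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(sorted(r))
            return
        for v in list(p):
            expand(r | {v}, p & adjacent[v], x & adjacent[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(range(n)), set())
    return cliques


def hypotheses_by_votes(correspondences, min_votes=3):
    """Return index lists of compatible correspondences, most votes first."""
    n = len(correspondences)
    adjacent = {i: set() for i in range(n)}
    for i, j in itertools.combinations(range(n), 2):
        ci, cj = correspondences[i], correspondences[j]
        # Assumption: a query or stored feature appears at most once per clique.
        if ci[0] == cj[0] or ci[1] == cj[1]:
            continue
        if distance_compatible(ci, cj):
            adjacent[i].add(j)
            adjacent[j].add(i)
    cliques = [c for c in maximal_cliques(n, adjacent) if len(c) >= min_votes]
    return sorted(cliques, key=len, reverse=True)
```

Cliques with fewer than 3 vertices are dropped because, as noted above, at least 3 votes are needed for a unique alignment.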
In order to limit the workload in the subsequent processing steps not all hypotheses are passed to the assembly procedure. For each fragment pair conformation, hypotheses that pass an additional shape screen (see section 1.2) are ordered with respect to votes, and put into “buckets”, Bk, containing all hypotheses with the same number of votes, Vk. All n buckets Bk with the highest number of votes (V1=Vmax; V2=Vmax−1; . . . ; Vn=Vmax−n+1), in which Vmax is the maximal number of votes, and
Σk=1,…,n−1 mk + mn/2 ≤ p, (11)
are handed over to the assembly, where mk is the number of hypotheses in Bk. This ensures that the total number of hypotheses per fragment pair conformation is always smaller than 2p. Unless stated otherwise, p=5 in the examples described below.
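The bucket selection of eq. (11) can be sketched as follows. The sketch assumes that n is the largest bucket count for which eq. (11) still holds and that no bucket is passed on if even the top bucket violates the condition; the function name and the representation of hypotheses by their vote counts are illustrative.

```python
from collections import Counter


def select_buckets(votes, p=5):
    """Return the vote counts of the buckets handed to the assembly.

    votes holds one integer per hypothesis of a fragment pair conformation.
    Buckets are visited from Vmax downwards (Vmax, Vmax-1, ...) and accepted
    while sum(m_k for k < n) + m_n / 2 <= p.
    """
    if not votes:
        return []
    counts = Counter(votes)  # m_k for each vote count
    lowest = min(counts)
    selected = []
    passed = 0               # hypotheses in the buckets accepted so far
    v = max(counts)
    while v >= lowest:
        m = counts.get(v, 0)
        if passed + m / 2 > p:
            break
        selected.append(v)
        passed += m
        v -= 1
    return selected
```

Because the left-hand side of eq. (11) never decreases as further buckets are added, stopping at the first violation yields the largest admissible n.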
1.5 Assembly Algorithm
After lookup and pattern matching, each stored fragment pair conformation results in zero, one, or a plurality of “hypotheses”, i.e., fragment pairs that are aligned to the query molecule. A hypothesis can be written as an ordered pair (f, g) of fragments in which f={x1, x2, . . . , xn} and g={y1, y2, . . . , yn} are atomic coordinates of the two fragment conformations in the fragment pair aligned to the query molecule. f and g can be expressed in terms of the rotation matrices Rf and Rg that define the rotation of the aligned fragments relative to the fragments f0 and g0 stored in the database,
f=Rf(f0−x0)+x; g=Rg(g0−y0)+y (12)
in which x0 (y0) is the center of geometry of the stored coordinates f0 (g0), and x (y) is the center of geometry of the fragment f (g) aligned onto the query molecule Q. Rf and Rg can be calculated from the rotation R obtained in the pattern matching step and the known rotations and translations R0f,g that map the fragment coordinates stored in the database onto fragment pair coordinates.
Rf=R*R0f; Rg=R*R0g (13)
These concepts are outlined schematically in
The goal now is to find all candidates, i.e., superpositions of a complete database molecule with the query molecule, which are consistent with the hypotheses. This can be viewed as looking for sets of hypotheses
H1=(f1, f2), H2=(f′2, f3), . . . , HN−1=(f′N−1, fN) (14)
in which the coordinates of overlapping fragments are sufficiently close with respect to a certain metric d (defined below). Here N is the number of fragments present in the molecule, and fi and f′i stand for the same fragment conformation, but with different coordinates according to the alignments of the hypotheses Hi−1 and Hi. For the assembly procedure, the following metric d in the space of the coordinate representations of a fragment conformation f is used:
d(f, f′)=½|yf−yf′|+(a/2) arccos[½(Tr(Rf′ RfT)−1)] (15)
d consists of a translational part and a rotational part. The translational part is simply the distance between the centers of geometry of f and f′, while the rotational part represents the angle of rotation between f and f′. The parameter a determines the relative weight of both contributions and is chosen to be a=3 Å (see M. C. Pitman et al., J. Comput. Aided Mol. Des., vol. 15, p. 587, 2001). Note that explicit coordinates are not needed in order to evaluate d.
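As a minimal sketch of eq. (15), the metric can be evaluated from the two centers of geometry and the two rotation matrices alone. The function names and the nested-list matrix layout are assumptions for illustration; the rotation angle is taken in radians and recovered from the trace of the relative rotation.

```python
import math


def matmul_transpose(a, b):
    """Return the product a * b^T for 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[j][k] for k in range(3)) for j in range(3)]
            for i in range(3)]


def rotation_angle(r):
    """Rotation angle (radians) of a rotation matrix: arccos((Tr r - 1) / 2)."""
    c = (r[0][0] + r[1][1] + r[2][2] - 1.0) / 2.0
    return math.acos(max(-1.0, min(1.0, c)))  # clamp against round-off


def fragment_distance(center_f, rot_f, center_fp, rot_fp, a=3.0):
    """Metric d(f, f') of eq. (15): half the distance of the centers of
    geometry plus a/2 times the relative rotation angle."""
    translational = 0.5 * math.dist(center_f, center_fp)
    rotational = 0.5 * a * rotation_angle(matmul_transpose(rot_fp, rot_f))
    return translational + rotational
```

With a=3 Å, a relative rotation of 90° between two copies of a fragment that share a center of geometry gives d≈2.36 Å, which is above the working cutoff δ=2 Å used in the assembly.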
The candidate assembly can be expressed in terms of a graph representation, in which hypotheses form the vertices of a directed graph G and two hypotheses are connected by a directed edge (f, g)→(g′, h) if the distance (in terms of the metric d, see eq. (15)) of the overlapping fragment g, g′ is smaller than a cutoff parameter δ,
d(g, g′)<δ (16)
i.e., (f, g) and (g′, h) are pair-wise compatible. A good working value of δ is 2 Å.
A candidate then corresponds to a directed path of length N in the graph G. In the assembly procedure, all such paths in G are enumerated, which can be done in a straightforward way using recursion. Since the number of votes every fragment receives as a result of “matching” is known, the total number of votes for a candidate can be calculated by adding up all votes along the path. Votes of overlapping fragments are weighted by a factor of ½ to avoid counting them twice.
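The recursive path enumeration and the vote bookkeeping can be sketched as follows. The data layout is hypothetical: `levels[i]` lists the hypotheses of the i-th fragment pair along the chain, each hypothesis is a tuple (label, votes of first fragment, votes of second fragment), and `compatible` stands in for the test of eq. (16).

```python
def total_votes(path):
    """Add up fragment votes along a path; fragments shared by two adjacent
    hypotheses are weighted by 1/2 so that they are not counted twice."""
    total = 0.0
    last = len(path) - 1
    for i, (_, votes_first, votes_second) in enumerate(path):
        total += votes_first * (1.0 if i == 0 else 0.5)
        total += votes_second * (1.0 if i == last else 0.5)
    return total


def enumerate_candidates(levels, compatible):
    """Return (votes, labels) for every chain of pair-wise compatible
    hypotheses, one per fragment pair, ordered by decreasing votes."""
    if not levels:
        return []
    candidates = []

    def extend(path):
        depth = len(path)
        if depth == len(levels):
            candidates.append((total_votes(path), [h[0] for h in path]))
            return
        for hyp in levels[depth]:
            if not path or compatible(path[-1], hyp):
                extend(path + [hyp])

    extend([])
    candidates.sort(key=lambda c: c[0], reverse=True)
    return candidates
```

In a full implementation `compatible` would evaluate the metric d on the overlapping fragment of the two hypotheses and compare it with the cutoff δ.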
The result of this graph-based assembly procedure is a list of candidates, which is ordered by the number of votes, and in which each candidate is represented by a path in G. In order to obtain the explicit coordinate representation of a candidate, fragments are explicitly rotated and translated. This is done by starting from the first fragment f1 and then “adding” the other fragments successively. All rotations and translations needed in this process can be calculated straightforwardly from the rotation matrices and centers of geometry, which were used above to calculate the distance d of adjacent hypotheses. The process of merging pair-wise compatible hypotheses is outlined schematically in
After the assembly of a candidate in Cartesian space, a rigid alignment to the query molecule is performed using RMSD minimization with respect to the matching feature points. The explicit calculation of candidate coordinates is an expensive procedure. Thus, it is advantageous to do an explicit assembly only for the highest scoring candidates that do not fail a bump check (see section 1.2) and pass an additional shape screen (see section 2). For this set of candidates, which represent the final result of the query, the Carbo score is also calculated. (See R. Carbo et al., Int. J. Quant. Chem., vol. 17, p. 1185, 1980; in preferred implementations of the invention the Carbo function is used with three-dimensional Gaussian densities exp(−(r−r0)²/(2τ²)), in which r0 is an atom center and τ=1 Å, as described in M. C. Pitman et al., J. Comput. Aided Mol. Des., vol. 15, p. 587, 2001.)
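A minimal sketch of a Carbo score with Gaussian atom densities is given below. It assumes that the molecular density is an unweighted sum of identical Gaussians, one per atom, so that the overlap integral of two such Gaussians with centers a and b is proportional to exp(−|a−b|²/(4τ²)); the constant prefactor cancels in the ratio. The function names are illustrative.

```python
import math


def gaussian_overlap(atoms_a, atoms_b, tau=1.0):
    """Overlap integral of two sums of atom-centered Gaussians
    exp(-(r - r0)^2 / (2 tau^2)), up to a constant factor."""
    return sum(
        math.exp(-math.dist(a, b) ** 2 / (4.0 * tau ** 2))
        for a in atoms_a
        for b in atoms_b
    )


def carbo_score(atoms_q, atoms_d, tau=1.0):
    """Carbo index: overlap(Q, D) / sqrt(overlap(Q, Q) * overlap(D, D))."""
    qd = gaussian_overlap(atoms_q, atoms_d, tau)
    qq = gaussian_overlap(atoms_q, atoms_q, tau)
    dd = gaussian_overlap(atoms_d, atoms_d, tau)
    return qd / math.sqrt(qq * dd)
```

The score equals 1 for two identical, perfectly superimposed molecules and approaches 0 when the two densities do not overlap.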
2. Examples
Two types of examples are presented that are designed to assess the performance of the algorithm (in terms of CPU time), the quality of molecular alignments, the influence of the sampling resolution of the conformational space, and the sensitivity and selectivity of the molecular similarity search method. The first type is a mutual alignment of pairs of molecules that bind to the same receptor, and in which conformations of the bound states are known (see section 2.2). In a second type of example, queries are performed against a database of 1780 compounds comprising 52 known dihydrofolate reductase inhibitors (methotrexate, dihydrofolate (which served as the query molecule), and 50 molecules from the dataset of Crippen et al., J. Med. Chem., vol. 23, p. 599, 1980—in particular, compounds 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 15, 16, 17, 18, 19, 21, 22, 23, 24, 25, 29, 30, 31, 32, 34, 35, 36, 38, 39, 41, 42,43, 45, 46, 47, 48, 49, 50, 52, 54, 56, 57, 60, 62, 63, 64, 65, 66 from this reference—see
Parameters used in these examples are εi=0.4 Å for the distance tolerance (eq. (6)), except for hydrophobic groups, for which εi=0.7 Å if the matching groups both contain only one atom and εi=1.5 Å otherwise. The angular tolerance (eq. (7)) is γi=50°, except for the special cases of carboxyl and hydroxyl-like groups, where γ was chosen to be 90° to reflect the internal flexibility. The cutoff parameter for the assembly of adjacent hypotheses (eq. (16)) was set to δ=2 Å. The shape screen imposed before and after assembly removes all hypotheses and candidates that contain an atom farther than 4 Å away from any atom of the query molecule.
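The shape screen described above amounts to a nearest-neighbor distance test, which can be sketched as follows. The function name and the coordinate-tuple layout are assumptions for illustration; a brute-force search over all atom pairs is used for clarity.

```python
import math


def passes_shape_screen(candidate_atoms, query_atoms, cutoff=4.0):
    """Return True if every atom of the hypothesis or candidate lies within
    cutoff (in angstroms) of at least one atom of the query molecule."""
    return all(
        min(math.dist(c, q) for q in query_atoms) <= cutoff
        for c in candidate_atoms
    )
```

A hypothesis is rejected as soon as one of its atoms has no query atom within 4 Å.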
2.1 Scoring
In the examples herein, a generalized scoring scheme is used (as opposed to just counting votes) to allow for different weights of feature correspondences depending upon the type of feature involved. This was done in order to better balance the importance of the various functional groups in the calculation of the score. For instance, a match between two carbonyl groups (>C═O), each with two hydrogen bond acceptors, is taken to have a weight different from that of a match between two independent hydrogen bond acceptors. Similarly, a match between two carbonyl groups is preferably weighted higher than a match between a carbonyl group and some single hydrogen bond acceptor. All weights were chosen to be equal to 1 except for double tetrahedral and double planar acceptors, and hydroxyl-like donors and acceptors, for which the weight was set equal to 0.86. The “votes score” is then defined as
votes score = (Σfeature correspondences wq wd) / [(Σ wq²)^(1/2) (Σ wd²)^(1/2)] (17)
Here wq and wd are the weights of features on the query molecule Q and the candidate D under consideration, and the sums in the denominator run over all features on Q and D. Note that the votes score as defined above is a number between zero and one. The introduction of feature type-dependent weights requires the additional information about a specific feature type to be passed along in the lookup table. This also allows for a feature type-dependent definition of distance and angular tolerances used in the pattern-matching algorithm (see Eqs. (6) and (7)).
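Eq. (17) can be evaluated directly once the feature weights are known. In the sketch below the function name and argument layout are hypothetical: `matched_pairs` holds one (wq, wd) pair per feature correspondence of the alignment, and the two weight lists cover all features on Q and D.

```python
import math


def votes_score(matched_pairs, query_weights, candidate_weights):
    """Votes score of eq. (17): weighted matches divided by the product of
    the root sums of squared weights on the query and on the candidate."""
    numerator = sum(wq * wd for wq, wd in matched_pairs)
    norm_q = math.sqrt(sum(w * w for w in query_weights))
    norm_d = math.sqrt(sum(w * w for w in candidate_weights))
    return numerator / (norm_q * norm_d)
```

For example, with unit weights, 3 matches between a query with 4 features and a candidate with 9 features give 3/(2·3)=0.5.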
2.2 Molecular Superposition
Mutual superpositions have been performed for a test set of 19 molecules (ligands) known to bind to one of 7 different receptors (proteins) for which the crystal structure of the protein-ligand complex is available, and from which the “correct” molecular alignment can be inferred (see C. Lemmen et al., J. Med. Chem., vol. 41, p. 4502, 1998). These molecules (listed in
In the next step, databases were prepared, each of which contains fragment pair conformations of one molecule from the test set. During the conformational sampling procedure, to ensure that the molecular conformation that corresponds to the crystal structure was not contained in the database, random offsets in the dihedral angle expansion were added. This ensures that self-queries, for which the query and database molecules are the same, are non-trivial, and can be used as a test of how well the conformational space is being sampled. All dihedral angles were sampled with a resolution of Δφ=60°, except for the case of 7cpa (a carboxypeptidase A inhibitor), for which a correct self-alignment (<2.4 Å RMSD) could only be achieved with Δφ=30°. The molecular partitioning scheme (see section 1.1) was optimized for each molecule in the sense that the number of stored fragment pair conformations was made as small as possible, while the fragment pairs were still large enough to allow for a complete assembly. Databases generated this way were queried with all molecules in the test set that bind to the same receptor as the database molecule. As seen in
It is reasonable to expect that larger molecular structures require smaller values of Δφ in order to avoid the possibility that important molecular conformations are missed. As mentioned above, the largest molecule considered here (7cpa with 74 atoms) is the only structure that required setting Δφ=30° in order to obtain the self-alignment correctly.
The results of
The ligands 1dwc and 1dwd are of similar size (71 and 69 atoms, respectively), have functionally similar structure elements (guanidine vs. benzamidine), but have backbones of rather different topologies. Convincing mutual alignments were found between 1dwd and 1dwc with minimum RMSDs of 1.93 and 2.28 Å, which are not simple substructure matches.
All 4 ligands considered are structurally very similar and include a stretched, flexible hydrocarbon chain capped with two 5-ring heterocyclic termini. The ligands fall into two groups representing binding modes, which differ by a reverse orientation of the entire molecules. Excellent alignments (<1.7 Å) were found between ligands having the same binding mode.
For the case in which 2rs3 was stored in the database, votes and Carbo score for this alignment were 1.0 and 0.93, respectively, the RMSD with respect to the crystal structure was 2.16 Å, and the query processing took 1.8 s CPU time.
The chemical structures of these two ligands consist of an N-heterocyclic system bound to a sugar phosphate backbone. The alignments found show a substructure match on the backbone and a non-trivial match involving a reverse orientation of the 5-rings present in both structures (purine in 4fbp and imidazole in t0039) with an RMSD of <0.6 Å.
For the case in which 4fbp was stored in the database, votes and Carbo score for this alignment were 0.63 and 0.96, the RMSD with respect to the crystal structure was 0.57 Å, and the query processing took 0.2 s CPU time.
The ligands dihydrofolate (1dfh) and methotrexate (4dfr) have a pteridine substructure in common that participates in the ligands' binding modes. The correct relative orientation of these substructures resulting from the match of the characteristic hydrogen bond donor/acceptor pattern was found. The overall minimum RMSDs range between 1.54 and 2.30 Å. These relatively large values reflect the fact that all features on the molecule, including those on the hydrophilic tail, are assumed to be of equal importance. The method therefore interpolates between the pteridine ring pattern match and the matches of the benzene ring and the carboxyl groups for the overall superposition, while in reality the alignment of the molecules is determined mainly by the pteridine ring match.
For the case in which 1dfh was stored in the database, votes and Carbo score for this alignment were 0.70 and 0.87, the RMSD with respect to the crystal structure was 1.73 Å, and the query processing took 0.1 s CPU time.
These 4 ligands all possess a similar binding mode, but are very different in size. All queries against 1tlp (especially the self-query) suffer a performance toll from a high load of hypotheses caused by the sugar substructure (which is only present in 1tlp) with its many hydrogen bond donors and acceptors. The (non-self) queries of the smallest ligand 2tmn (26 atoms) against the larger molecules pose a problem because of the size mismatch—no candidates are produced because of missing hypotheses.
For the case in which 3tmn was stored in the database, votes and Carbo score for this alignment were 0.58 and 0.76, the RMSD with respect to the crystal structure was 1.36 Å, and the query processing took 0.1 s CPU time.
The 3 ligands 1cbx, 6cpa, and 7cpa are of very different size (25, 58, and 74 atoms) but all are chemically and structurally similar, with 1cbx being a substructure of 6cpa, and 6cpa being a substructure of 7cpa. Again, no candidates are found in queries of the smaller ligands against the larger ones because of the size mismatch. Queries of the bigger ligands 6cpa, and 7cpa suffer a performance toll from a high load of hypotheses caused by matches on unspecific regions of the molecules such as amide and benzyl groups.
The query of the largest structure 7cpa against 6cpa produces excellent alignments with a best minimum RMSD of 1.39 Å. In this case, 6cpa was stored in the database, and votes and Carbo score for this alignment were 0.67 and 0.81, the RMSD with respect to the crystal structure was 1.60 Å, and the query processing took 10.6 s CPU time.
The ligands 1ela and 1ele are both of a peptidic nature and possess similar binding modes, with 1ele being primarily a substructure of 1ela with minor variations in backbone substituents. All alignments (self and mutual) obtained are excellent with minimum RMSDs ranging from 0.89 to 1.22 Å.
For the case in which 1ela was stored in the database, votes and Carbo score for this alignment were 0.83 and 0.92, the RMSD with respect to the crystal structure was 1.98 Å, and the query processing took 2.2 s CPU time.
An inherent asymmetry in mutual molecular alignments is exhibited with respect to the size of the structures. The query molecule should be larger than or approximately equal in size to the database molecule; otherwise certain candidates cannot be assembled since hypotheses are missing. Substructures or approximate substructures are found only if the substructure itself is the database molecule. This becomes apparent in Examples 5 and 6 above. The CPU times for query processing (9th column of
2.3 Database Searches
Queries have been performed against the test database of 1780 molecules described above. The general structure of the molecules taken from Crippen et al. (J. Med. Chem. Vol. 23, p. 599, 1980) is shown in
All the database molecules were partitioned by an automatic procedure based on the partitioning algorithm described in section 1.1. The total number of fragment pair conformations, which were explicitly stored, was 32872. Database preprocessing, which includes the generation and storage of all fragment pair conformations from an arbitrary starting conformation for each molecule, as well as feature computation and creation of fFLASH's central lookup table, took about 35 minutes.
The size distribution of the molecules selected from the NCI database (i.e., the number of atoms per molecule) is given in
A quantitative analysis of the score plot (see
2.4 Architecture
In preferred embodiments of the invention, there is provided media encoded with executable program code to effect any of the methods described herein. This code contains executable instructions that may reside, for example, in the random access memory (RAM) of a processor, or on a hard drive or optical drive of a processor. The instructions may be stored on a magnetic or optical disk or diskette, a disk drive, magnetic tape, electronic read-only memory, or other appropriate data storage device. For example, see the computer program product 200 shown in
Preferred computer architecture is now discussed in connection with
The client 320 includes a graphical user interface (“GUI”, implemented in Java) that is run on a personal computer, such as a Win32 PC. The graphical user interface of the client 320 is used to populate databases, to issue queries, and to visualize results. Alternatively the client and server can be installed on a stand-alone PC or laptop.
The communication protocol between the client 320 and the database 350 is JDBC/SQL, and the protocol between the client 320 and the dispatcher 324 is ASCII streams over TCP/IP sockets. The dispatcher 324 issues requests to the applications 310 and the shell scripts 330 using system calls, and the applications 310 communicate with the database 350 using CLI/SQL.
2.5 Software Workflow
A preferred preprocessing methodology is shown in
A preferred query methodology is outlined in
3.0 Discussion
Although the methods herein are described with respect to a linear chain-like fragmentation approach which generally seems to be sufficient for medium-sized, drug-like molecules, the methods may be more generally used to handle closures of large ring structures, and, more importantly, be applied to branched molecules. In section 2.2, the issue of asymmetry between query and database molecules with respect to their size was discussed. This asymmetry can be reduced by allowing for the assembly of partial candidates and appending the missing fragment “tails” in an arbitrary conformation. Likewise, the sensitivity of the method can be increased by bridging gaps within a candidate if certain fragment pairs do not get enough support in the pattern matching procedure.
In other preferred methods of the invention, the user may actively manipulate features on the query molecule by introducing weights, or switching features on and off entirely according to their importance in the receptor-ligand interaction, e.g., with the aid of a monitor on which molecules and features are displayed. This way, the user may distinguish between parts of the query molecule that are solvent-exposed, and parts that bind to the protein. The resulting molecular alignments can be still further improved by post-processing the candidates, i.e., by relaxing the conformational constraints and performing a continuous optimization of the RMSD of feature matches. In still other preferred methods, a receptor model instead of a single conformation of a query molecule may be used in order to allow for docking-type applications (see, for example, J. S. Mason et al., J. Med. Chem., vol. 42, p. 3251, 1999; and J. E. Eksterowicz et al., J. Mol. Graphics Modeling, vol. 20, p. 469, 2002).
Methods have been presented herein that make use of a relatively simple, atom type-based feature definition, giving good results for the set of medium-sized drug-like molecules considered herein. In addition, tests with larger molecules (HIV protease inhibitors) performed with the benchmark FlexS-77 dataset collected by C. Lemmen et al. (see the Internet website identified by the concatenation of “http://” and “www.biosolveit.de” and J. Med. Chem., vol. 41, p. 4502, 1998) found correct alignments in some cases; however, local shape similarity, e.g., the packing of hydrophobic side-chains, plays an important role for these structures, which is not properly resolved in the atom-based feature definition presented here. This suggests that other, more appropriate feature schemes, which also encode local shape properties, should be used in those situations. This is, in principle, possible since features are explicitly allowed to depend on fragment pair conformations, a capability used in the “geometric” definition of hydrophobic regions.
A feature scheme that distinguishes between (atom type-based) features in a locally flat or non-flat environment may be constructed. This increases the selectivity of the molecular similarity search method, since there is now a larger set of feature indices, and therefore fewer feature correspondences. A convenient side effect is that query-processing times are reduced by a factor of 2-3, because the workload for the clique detection algorithm is smaller. Using this feature scheme, it was possible to reproduce all but 4 alignments reported in
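The reduction in feature correspondences obtained from a finer feature scheme can be illustrated with a small counting sketch. It assumes that a correspondence is any pairing of a query feature with a database feature carrying the same feature index; the labels, the "flat"/"nonflat" flags, and the function `count_correspondences` are hypothetical, and the way in which local flatness would be determined is not shown.

```python
from collections import Counter

def count_correspondences(query_labels, db_labels):
    """Count query/database feature pairs that share the same label."""
    query_counts = Counter(query_labels)
    db_counts = Counter(db_labels)
    return sum(n * db_counts[label] for label, n in query_counts.items())

# Refined scheme: (atom-type-based feature, local environment) pairs.
query_refined = [
    ("acceptor", "flat"), ("acceptor", "nonflat"), ("donor", "flat"),
    ("hydrophobic", "flat"), ("hydrophobic", "nonflat"),
]
db_refined = [
    ("acceptor", "flat"), ("acceptor", "flat"), ("donor", "flat"),
    ("hydrophobic", "flat"), ("hydrophobic", "nonflat"),
    ("hydrophobic", "nonflat"),
]

# Coarse scheme: the same features with the environment flag dropped.
query_coarse = [feature for feature, _ in query_refined]
db_coarse = [feature for feature, _ in db_refined]

coarse_pairs = count_correspondences(query_coarse, db_coarse)
refined_pairs = count_correspondences(query_refined, db_refined)
```

For these hypothetical features, the coarse scheme yields 11 candidate correspondences and the refined scheme yields 6. In a clique-based matching, each correspondence is a candidate node of the graph to be searched, so a smaller count means less work for the clique detection step.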
The invention may be embodied in other specific forms without departing from its spirit or essential characteristics. The described embodiments are to be considered in all respects only as illustrative and not restrictive. The scope of the invention is therefore indicated by the appended claims rather than the foregoing description. All changes within the meaning and range of equivalency of the claims are to be embraced within that scope.
Kraemer, Andreas, Rice, Julia Elizabeth, Horn, Hans Werner
Patent | Priority | Assignee | Title |
5510240, | Jun 19 1991 | Aventis Pharmaceuticals Inc | Method of screening a peptide library |
5752019, | Dec 22 1995 | International Business Machines Corporation; IBM Corporation | System and method for confirmationally-flexible molecular identification |
5787279, | Dec 22 1995 | International Business Machines Corporation | System and method for conformationally-flexible molecular recognition |
6346290, | Oct 18 1994 | Intermolecular, Inc | Combinatorial synthesis of novel materials |
6376253, | Sep 02 1997 | Martek Biosciences Corporation | 13C, 15N, 2H labeled proteins for NMR structure determinations and their preparation |
6395480, | Feb 01 1999 | MDS Sciex | Computer program and database structure for detecting molecular binding events |
6399389, | Jun 28 1996 | Caliper Technologies Corp. | High throughput screening assay systems in microscale fluidic devices |
6410331, | Oct 18 1994 | Intermolecular, Inc | Combinatorial screening of inorganic and organometallic materials |
6420179, | Oct 18 1994 | Intermolecular, Inc | Combinatorial sythesis of organometallic materials |
6485905, | Feb 02 1998 | DH TECHNOLOGIES DEVELOPMENT PTE LTD | Bio-assay device |
20030215877, | |||
WO9201933, |
Executed on | Assignor | Assignee | Conveyance | Frame | Reel | Doc |
Jun 12 2003 | HORN, HANS WERNER | International Business Machines Corporation | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 014230 | /0036 | |
Jun 12 2003 | KRAEMER, ANDREAS | International Business Machines Corporation | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 014230 | /0036 | |
Jun 13 2003 | International Business Machines Corporation | (assignment on the face of the patent) | / | |||
Jun 13 2003 | RICE, JULIA ELIZABETH | International Business Machines Corporation | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 014230 | /0036 |
Date | Maintenance Fee Events |
Mar 25 2010 | ASPN: Payor Number Assigned. |
Oct 11 2013 | REM: Maintenance Fee Reminder Mailed. |
Jan 30 2014 | M1551: Payment of Maintenance Fee, 4th Year, Large Entity. |
Jan 30 2014 | M1554: Surcharge for Late Payment, Large Entity. |
Oct 16 2017 | REM: Maintenance Fee Reminder Mailed. |
Feb 27 2018 | M1552: Payment of Maintenance Fee, 8th Year, Large Entity. |
Feb 27 2018 | M1555: 7.5 yr surcharge - late pmt w/in 6 mo, Large Entity. |
Oct 18 2021 | REM: Maintenance Fee Reminder Mailed. |
Apr 04 2022 | EXP: Patent Expired for Failure to Pay Maintenance Fees. |
Date | Maintenance Schedule |
Mar 02 2013 | 4 years fee payment window open |
Sep 02 2013 | 6 months grace period start (w surcharge) |
Mar 02 2014 | patent expiry (for year 4) |
Mar 02 2016 | 2 years to revive unintentionally abandoned end. (for year 4) |
Mar 02 2017 | 8 years fee payment window open |
Sep 02 2017 | 6 months grace period start (w surcharge) |
Mar 02 2018 | patent expiry (for year 8) |
Mar 02 2020 | 2 years to revive unintentionally abandoned end. (for year 8) |
Mar 02 2021 | 12 years fee payment window open |
Sep 02 2021 | 6 months grace period start (w surcharge) |
Mar 02 2022 | patent expiry (for year 12) |
Mar 02 2024 | 2 years to revive unintentionally abandoned end. (for year 12) |