Molecular Informatics utilises many ideas and concepts to find
relationships between molecules. The concept of similarity,
where molecules may be grouped according to their biological
effects or physicochemical properties, has found extensive use
in drug discovery. Areas of particular interest have been
lead discovery and compound optimisation. For example,
in designing libraries of compounds for lead generation, one
approach is to design sets of compounds ‘similar’ to known
active compounds in the hope that alternative molecular
structures are found that maintain the properties required while
enhancing, for example, patentability or medicinal chemistry opportunities,
or even achieving optimised pharmacokinetic profiles. Thus
the practical importance of the concept of molecular similarity
has grown dramatically in recent years. The predominant users
are pharmaceutical companies, employing similarity methods
in a wide range of applications, e.g. virtual screening, estimation
of absorption, distribution, metabolism, excretion and toxicity
(ADME/Tox) and prediction of physicochemical properties
(solubility, partitioning etc.). In this perspective, we discuss the
representation of molecular structure (descriptors), methods
of comparing structures and how these relate to measured
properties. This leads to the concept of molecular similarity,
its various definitions and uses and how these have evolved
in recent years. Here, we wish to evaluate and in some cases challenge accepted views and uses of molecular similarity.
Molecular similarity, as a paradigm, contains many implicit and explicit assumptions, in particular with respect to the prediction
of the binding and efficacy of molecules at biological receptors.
The fundamental observation is that molecular similarity has
a context which both defines and limits its use. The key issues
of solvation effects, heterogeneity of binding sites and the
fundamental problem of the form of similarity measure to use are addressed.
A DYNAMIC AREA:
Molecular similarity is a dynamic and evolving area of
research and has been regularly reviewed. Johnson and
Maggiora1 and Dean6 wrote comprehensive books in this area.
Recently, books by Leach and Gillet and by Gasteiger have included
sections on molecular similarity. Recent general reviews
of molecular similarity are given by Willett et al., Walters et al., Gillet et al. and Bajorath; a good critique, particularly
of the misuse of similarity measures, is given by Nikolova
and Jaworska. A justification for the large number of molecular
similarity methods is given by Sheridan and Kearsley.7
Bajorath discusses the role of similarity in the integration of
in silico and in vitro screening, while Johnson et al. attempt
to characterize similarity methods (at least those known at that
time). Some caveats of molecular similarity, such as different
mechanisms of action and target-dependent similarity, are discussed
by Kubinyi. Finally, the reader is referred to Tversky,
who describes early approaches to similarity in psychological
testing which have been adopted by later researchers to describe
similarity in molecules. Of interest here is that similarity assessments
are influenced by ‘context, perspective, choice alternatives
and expertise’. The choice of features, transformations
and structural descriptions to describe entities (molecules in our
case) will govern the predictions made by similarity models as
much as do the model’s mechanisms for comparing and integrating
these representations.
The fundamental observation that we can derive from these
facts is that similarity has a context. Two vials of a yellow compound
may be very similar in colour (absorption spectrum)
but wildly different in biological activity. How far the context
of a particular similarity argument can be taken (the ‘neighbourhood
effect’) also depends on the discontinuities found in
receptor–ligand interactions; clearly, the similarities studied are
seldom linear and often have major discontinuities.
In this perspective we have reviewed progress in molecular similarity
methods and applications and highlighted some of the
more challenging problems and assumptions.
Molecular similarity is extensively and successfully used in
the drug discovery context, often to compare molecules in the
absence of other mechanistic information (a partial exception
is the docking applications described above). Most importantly,
similarity has a context. One has to be aware that similarity
defined on molecules alone, in the absence of the medium in
which they act, is an incomplete description, so great care has to
be taken to use descriptors that are appropriate.
The discontinuous nature of biological effects such as ligand–
receptor binding means that linear regression techniques are
only appropriate for QSAR and related applications if a linear
relationship between feature space and activity exists. In general,
it is more appropriate to use nonparametric or non-linear
regression techniques. Electrostatic effects and
their discontinuous relationship with solvation energies are a
case in point.
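The point about linear versus nonparametric regression can be made concrete with a small sketch. The data below are hypothetical (a single descriptor whose associated activity jumps from 0 to 1 at a threshold), and k-nearest-neighbour averaging is used only as one simple representative of a nonparametric technique; neither is taken from the studies discussed here. The comparison is in-sample and is meant as an illustration, not a validation protocol.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def knn_predict(xs, ys, x0, k=3):
    """Nonparametric prediction: mean activity of the k nearest training points."""
    nearest = sorted(zip(xs, ys), key=lambda pair: abs(pair[0] - x0))[:k]
    return sum(y for _, y in nearest) / k


def mean_squared_error(predicted, observed):
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)


# Hypothetical data: activity is discontinuous in the descriptor value.
xs = [i / 20 for i in range(21)]
ys = [0.0 if x < 0.5 else 1.0 for x in xs]

slope, intercept = fit_line(xs, ys)
linear_mse = mean_squared_error([slope * x + intercept for x in xs], ys)
knn_mse = mean_squared_error([knn_predict(xs, ys, x) for x in xs], ys)

print(f"linear fit MSE: {linear_mse:.4f}")
print(f"3-nearest-neighbour MSE: {knn_mse:.4f}")
```

On this toy step function the straight line is forced to smear the jump across the whole descriptor range, whereas the neighbour-based estimate only errs close to the discontinuity.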
Back-projectable descriptors (compared to descriptors without this property) possess better interpretability and will probably have more widespread use in the future.
Binary bit strings in combination with similarity coefficients show preferences with respect to bit density (and thus the size of the molecule) as well as combinatorial preferences, and one should be aware of these when applying similarity methods.
Applications of machine learning methods in computer-aided molecular design will certainly gain importance in the future, particularly with the incorporation of heuristics that improve performance.
As understanding of the chemistry and biology of drug action improves and a greater ability to model the underlying mechanisms appears, the need for ‘similarity’ approaches will diminish.
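The bit-density preference of binary bit strings noted above can be illustrated with a minimal sketch. It assumes the Tanimoto coefficient as the similarity coefficient and uses randomly generated, unrelated fingerprints; the fingerprint length, the two densities and the helper names are all hypothetical choices made for illustration, not values from any particular fingerprint scheme.

```python
import random


def tanimoto(a, b):
    """Tanimoto coefficient for fingerprints given as sets of on-bit positions."""
    a, b = set(a), set(b)
    union = len(a | b)
    if union == 0:
        return 1.0  # convention chosen here: two empty fingerprints count as identical
    return len(a & b) / union


def random_fingerprint(n_bits, density, rng):
    """Hypothetical fingerprint: each bit is set independently with probability `density`."""
    return {i for i in range(n_bits) if rng.random() < density}


def mean_random_similarity(n_bits, density, n_pairs=200, seed=0):
    """Average Tanimoto similarity between pairs of unrelated random fingerprints."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_pairs):
        first = random_fingerprint(n_bits, density, rng)
        second = random_fingerprint(n_bits, density, rng)
        total += tanimoto(first, second)
    return total / n_pairs


sparse = mean_random_similarity(1024, 0.05)  # few bits set, as for small molecules
dense = mean_random_similarity(1024, 0.50)   # many bits set, as for large molecules

print(f"mean chance similarity, sparse fingerprints: {sparse:.3f}")
print(f"mean chance similarity, dense fingerprints: {dense:.3f}")
```

Even though both sets of fingerprints are unrelated by construction, the denser ones score markedly higher on average, which is why a fixed similarity threshold can mean different things for small and large molecules.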