General Structure Search
Introduction
In the simplest case two molecular structures are compared. An exact structure
search determines whether they have the same topology or not.
Chemists are more often interested in substructure search, that is, whether one
molecular structure contains the other one as a substructure or not. By
definition, the examined molecule is called a target, the structure we are
looking for is called a query, and a target molecule matching the query structure
is called a hit (Table 1).
Table 1 Exact structure search, substructure search
Columns: query | target | hit (exact structure search) | hit (substructure search)
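The difference between the two search types can be illustrated with a small graph-matching sketch. This is not JChem's implementation. The `Mol` class, `find_match`, `is_substructure` and `is_exact` are hypothetical names introduced here, hydrogens are left implicit, and atoms are compared by element symbol only.

```python
from typing import Dict, Iterable, Optional, Tuple


class Mol:
    """Toy molecular graph: element symbols plus bonds with an order."""

    def __init__(self, atoms: Iterable[str], bonds: Iterable[Tuple[int, int, int]]):
        self.atoms = list(atoms)
        # Bonds are undirected, so the key is an unordered pair of atom indices.
        self.bonds = {frozenset((a, b)): order for a, b, order in bonds}

    def bond(self, i: int, j: int) -> Optional[int]:
        return self.bonds.get(frozenset((i, j)))


def find_match(query: Mol, target: Mol) -> Optional[Dict[int, int]]:
    """Backtracking search for a query-atom -> target-atom mapping, or None."""
    mapping: Dict[int, int] = {}
    used = set()

    def extend(q: int) -> bool:
        if q == len(query.atoms):
            return True
        for t, symbol in enumerate(target.atoms):
            if t in used or symbol != query.atoms[q]:
                continue
            # Every query bond to an already mapped atom must exist in the
            # target with the same order. Extra target bonds are allowed.
            if all(
                query.bond(p, q) is None
                or target.bond(mapping[p], t) == query.bond(p, q)
                for p in range(q)
            ):
                mapping[q] = t
                used.add(t)
                if extend(q + 1):
                    return True
                del mapping[q]
                used.discard(t)
        return False

    return dict(mapping) if extend(0) else None


def is_substructure(query: Mol, target: Mol) -> bool:
    return find_match(query, target) is not None


def is_exact(query: Mol, target: Mol) -> bool:
    # Same atom and bond counts plus a substructure match imply the same topology.
    return (
        len(query.atoms) == len(target.atoms)
        and len(query.bonds) == len(target.bonds)
        and is_substructure(query, target)
    )


ethanol = Mol(["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])
ethanol_reordered = Mol(["O", "C", "C"], [(0, 1, 1), (1, 2, 1)])
dimethyl_ether = Mol(["C", "O", "C"], [(0, 1, 1), (1, 2, 1)])
c_o = Mol(["C", "O"], [(0, 1, 1)])
```

With these toy molecules, `c_o` is a substructure hit in `ethanol` but not an exact hit, `ethanol_reordered` is an exact hit regardless of atom order, and `dimethyl_ether` is not a hit for `ethanol` in either search.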
Atom lists, not lists
The type of a query atom can be defined with a custom atom list. If
the type of the corresponding atom in the target molecular structure is a member
of the list, it is considered a matching atom (Table 2). Not lists
specify atoms to be excluded (Table 3).
Table 2 Atom lists
Table 3 Atom not lists
Generic atoms
Atom lists and not lists are a practical solution when the number of
included or excluded atoms is small. When that number is larger, generic atom
types avoid long atom lists (Table 4). JChem handles two generic atom types,
Any and Hetero (the set of available generic atoms will be
extended soon).
Any any (any atom except hydrogen)
Q hetero (any atom except hydrogen and carbon)
Table 4 Generic query atoms
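The atom-level rules described above (atom lists, not lists, and the two generic atoms) can be sketched as a single predicate. The `QueryAtom` class and `atom_matches` function are hypothetical names used only for illustration and do not reflect JChem's API. How a not list treats hydrogen is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class QueryAtom:
    kind: str                               # "list", "not_list", "any" or "hetero"
    symbols: FrozenSet[str] = frozenset()   # used by "list" and "not_list" only


def atom_matches(query: QueryAtom, target_symbol: str) -> bool:
    """Decide whether a target atom satisfies a query atom."""
    if query.kind == "list":
        # Atom list: the target atom type must be a member of the list.
        return target_symbol in query.symbols
    if query.kind == "not_list":
        # Not list: the target atom type must not be in the list.
        # Assumption: anything outside the list matches, hydrogen included.
        return target_symbol not in query.symbols
    if query.kind == "any":
        # Any: any atom except hydrogen.
        return target_symbol != "H"
    if query.kind == "hetero":
        # Hetero: any atom except hydrogen and carbon.
        return target_symbol not in {"H", "C"}
    raise ValueError(f"unknown query atom kind: {query.kind}")


halogens = QueryAtom("list", frozenset({"F", "Cl", "Br", "I"}))
not_n_or_o = QueryAtom("not_list", frozenset({"N", "O"}))
any_atom = QueryAtom("any")
hetero = QueryAtom("hetero")
```

Here `atom_matches(halogens, "Cl")` is true, `atom_matches(not_n_or_o, "N")` is false, and `hetero` accepts nitrogen but rejects carbon and hydrogen.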
Reaction Search
Simple query structure in reaction
Structures to the left of the reaction arrow are reactants (starting materials),
structures to the right of the reaction arrow are products, and molecules drawn
just above or below the arrow are agents (ingredients). Corresponding atoms
whose bonds change (are created, destroyed or modified) are marked with map
numbers in both the reactants and the products (Table 5).
Searching for a substructure in a reaction equation does not differ from
the classical substructure search process described above. A match in any
reaction component (reactant, agent or product) is a hit. This is not the
case when the query itself is a reaction.
Table 5 Searching for simple query structure in reaction
Query structure in reaction components
Reaction queries are not necessarily complete reactions. A reaction query
sometimes contains reactants only. In this case, the search engine retrieves
reactions whose reactants match the given structure. When only a
product is specified in the query, the reactions returned are those that
contain matching products (Table 6).
Table 6 Searching for query structure in reaction components
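The two query modes described above, a plain structure matched against any component and a partial reaction query matched role by role, can be sketched as follows. This is an illustrative model, not JChem code. Structures are stood in for by sets of fragment labels and `toy_contains` is a placeholder for a real substructure test. All names are hypothetical, and the sketch does not enforce that two query structures hit different target components.

```python
from typing import Callable, Dict, FrozenSet, List

Structure = FrozenSet[str]              # toy stand-in for a molecular structure
Reaction = Dict[str, List[Structure]]   # role -> components of that role
Contains = Callable[[Structure, Structure], bool]


def toy_contains(query: Structure, target: Structure) -> bool:
    """Placeholder for substructure search: label-set inclusion."""
    return query <= target


def structure_query_hits(query: Structure, reaction: Reaction,
                         contains: Contains = toy_contains) -> bool:
    """A plain structure query hits if it matches any component of any role."""
    return any(contains(query, component)
               for components in reaction.values()
               for component in components)


def reaction_query_hits(query: Reaction, reaction: Reaction,
                        contains: Contains = toy_contains) -> bool:
    """Each query structure must match a target component of the same role.

    Roles missing from the query (for example no products) are unconstrained.
    """
    for role, query_structures in query.items():
        targets = reaction.get(role, [])
        for structure in query_structures:
            if not any(contains(structure, t) for t in targets):
                return False
    return True


esterification: Reaction = {
    "reactants": [frozenset({"carboxylic acid"}), frozenset({"alcohol"})],
    "agents": [frozenset({"acid catalyst"})],
    "products": [frozenset({"ester"}), frozenset({"water"})],
}
ester = frozenset({"ester"})
```

In this model `structure_query_hits(ester, esterification)` is true because the ester appears among the products, `reaction_query_hits({"products": [ester]}, esterification)` is also true, and `reaction_query_hits({"reactants": [ester]}, esterification)` is false.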
Atom maps
When a reaction query has mapped atoms, the reaction center of a matching
reaction is mapped correspondingly. Although the actual values of the map
numbers might differ between the query and the target, the hit atoms have to
be paired exactly as they are in the query (Table 7).
Table 7 Searching by mapped reaction queries
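One way to read the pairing rule is sketched below: for every map number used on both sides of the query, the hit atom on the reactant side and the hit atom on the product side must carry the same map number in the target, whatever that number is. The function name and the dictionary-based data layout are assumptions made for this example only.

```python
from typing import Dict


def maps_are_paired(
    query_reactant_maps: Dict[int, int],   # query reactant atom -> query map number
    query_product_maps: Dict[int, int],    # query product atom -> query map number
    target_reactant_maps: Dict[int, int],  # target reactant atom -> target map number
    target_product_maps: Dict[int, int],   # target product atom -> target map number
    reactant_hit: Dict[int, int],          # query reactant atom -> matched target atom
    product_hit: Dict[int, int],           # query product atom -> matched target atom
) -> bool:
    """Check that atoms paired by map number in the query stay paired in the hit."""
    # Target map number found under each query map number, per side.
    reactant_side = {
        q_map: target_reactant_maps.get(reactant_hit[q_atom])
        for q_atom, q_map in query_reactant_maps.items()
    }
    product_side = {
        q_map: target_product_maps.get(product_hit[q_atom])
        for q_atom, q_map in query_product_maps.items()
    }
    for q_map in reactant_side.keys() & product_side.keys():
        t_map = reactant_side[q_map]
        # An unmapped target atom cannot be paired with anything.
        if t_map is None or t_map != product_side[q_map]:
            return False
    return True


# Query maps its atoms 1 and 2; the target uses 5 and 7 for the same pairing.
query_r = {0: 1, 1: 2}
query_p = {0: 1, 1: 2}
target_r = {0: 5, 1: 7}
target_p = {0: 7, 1: 5}
```

With `reactant_hit = {0: 0, 1: 1}`, the product hit `{0: 1, 1: 0}` keeps the pairing intact even though the numbers differ from the query, while the product hit `{0: 0, 1: 1}` swaps the partners and is rejected.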
Component identification
A query structure occasionally consists of several disconnected fragments. Since
these fragments belong to a single reaction component in the query, their
corresponding hits must belong to a single component as well. Two separate
components of a reaction query must match two separate components of the target
reaction (Table 13).
Table 13 Component identification during reaction search
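The component rule can be sketched as a consistency check on where each query fragment was matched. This is a simplified reading of the text above, with a hypothetical function name and data layout: fragments of one query component must all land in one target component, and distinct query components must land in distinct target components.

```python
from typing import Dict, Hashable


def components_consistent(
    fragment_component: Dict[Hashable, Hashable],      # query fragment -> query component
    fragment_hit_component: Dict[Hashable, Hashable],  # query fragment -> target component hit
) -> bool:
    """Check that fragment hits respect the component grouping of the query."""
    assigned: Dict[Hashable, Hashable] = {}  # query component -> target component
    for fragment, query_component in fragment_component.items():
        target_component = fragment_hit_component[fragment]
        # All fragments of one query component must share a target component.
        if assigned.setdefault(query_component, target_component) != target_component:
            return False
    # Distinct query components must map to distinct target components.
    return len(set(assigned.values())) == len(assigned)


# Two fragments drawn as one reactant component, plus one product component.
grouping = {"f0": "R1", "f1": "R1", "f2": "P1"}
```

For example, the hits `{"f0": 0, "f1": 0, "f2": 2}` are consistent with `grouping`, whereas `{"f0": 0, "f1": 1, "f2": 2}` splits one query component across two target components and is rejected.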