Our general approach raises several issues that should be addressed. We have
organized these issues into a set of questions and answers: questions of
methodology and scope, and questions dealing with the prospects for success.
In addition, issues of timeliness and the role of the Center are considered.
The most basic questions are methodological and focus on why we choose
to supplement experiments with our particular computational approaches.
Why combine computational ligand docking with experimental screening
to find ligands? Computation can complement experimental screening.
Screening experiments are relatively expensive, require considerable logistical
expertise, and are not always successful on their own. They often have
significant false-positive and false-negative rates, and cannot be applied
on a genome-wide scale. Even medium-scale screening is unavailable to
most academic laboratories. Computational screening, once set up, is much
less expensive to run and to maintain. It can be readily expanded to screen
virtual libraries and can greatly assist experimental design. Computational
screening also has issues of accuracy, including both false positives
and false negatives, which we hope to improve as part of this proposal.
Why use computational ligand docking in addition to experimental
analyses to help characterize the structure and energetics of complexes? Computation
can complement structural experiments. It is possible to model alternative
physical conditions (e.g., pH, salt, solution versus crystal
states) that might not be readily available to experiment. Computation
can also be used to bridge the gap between the conditions required for
structural measurements and those present in the biologically active state.
Further, we are learning from the structural genomics projects that it
is genuinely difficult to co-crystallize ligand-protein complexes for
crystallography or find the right solubility conditions for NMR spectroscopy.
Computation may be the only route to structure for many protein-ligand
complexes.
Why use computational docking to elucidate macromolecular complexes? As
noted above, structures of arbitrary single proteins can be difficult to
determine routinely by experiment. Even if the structural genomics initiative
meets its goal, it can only cover a small fraction of the cell's complement
of proteins. Expanding the task to the very large number of potential complexes
is exponentially more difficult, and is beyond any current experimental
technology, notwithstanding the striking experimental successes with a
number of complexes. Our approach will be to try the best available computational
technology to see how successful it is, first in reproducing known complexes
and second, in generating testable hypotheses for complexes whose structures
are not known. This approach, combined with electron microscopy and chemical
crosslinking, should be particularly powerful for multidomain proteins, which
have proved especially difficult to crystallize.
Why use protein structure models, when even docking against high-resolution
crystallographic structures of proteins can be difficult? There
is no hope of having high-resolution crystallographic structures
of all human and pathogen proteins and complexes, under all relevant
conditions. The only practical manner of exploring ligand-protein
interactions for most systems is to use comparative protein structure
models. We present clear results in Section 5.2 in Core 1&2 that
docking against comparative models based on as little as 30% sequence
identity can be useful. We will develop new combined modeling and
docking technologies to further improve the utility of comparative
models.
Why develop automated protein structure modeling and docking, when
the results are frequently problematic even for experts? Automation
is essential if the results are to be used by a broad community of researchers
who are not computational experts. Even expert users may prefer an automated
method for reasons of efficiency and scope. Automation also encourages
good programming practice that results in robust, efficient, and flexible
methods. Automation allows rigorous testing of the methods, based on statistically
significant test sets and in a blind manner, because it obviates the need
for manual intervention. Automation is also a prerequisite for large-scale
calculations (see the next question). Finally, automation makes the pipeline
available as a research resource on the web.
The second set of questions deals with the scope of the proposal and the
choice of specific projects.
Why aim for large-scale protein structure modeling and docking,
when the results are frequently inaccurate even when obtained by experts who focus on
a single system? There are a number of applications of protein
structure modeling that require computation across an entire network
of genes, an entire genome, or even sets of genomes. For example,
development of drug candidates that are specific to a particular human
protein, or that broadly inhibit a particular set of pathogens but
none of the human proteins, would be enabled by accurate genome-wide
protein structure modeling and ligand docking calculations. In addition,
a large-scale calculation enables a search for "golden nuggets" among
the massive output even when much of it is not highly accurate, as
long as there are means to assess the results. Testing of the pipeline
and assessment of its output are major components of our proposal.
Why is the Center a genomics project? A major rationale
for our proposal is dealing with macromolecular interactions on a genomic
scale. Thus, along with the enabling computer science, software engineering,
and computational chemistry, we also include computational biology and bioinformatics.
In fact, our effort is to combine the best from all these fields.
Lastly, we come to the general questions.
Why is it timely to have a computational chemistry effort on a
genomic scale? For the first time, there is a confluence of
massive amounts of varied data generated by genome sequencing, structural
biology (including structural genomics), and functional genomics; powerful computing
provided by cluster architectures; and improvements in
the computational methods that contribute to the pipeline. None of these
existed in the past.
Why support a large center instead of individual research projects? There
is a special need for this project because it is challenging, interdisciplinary,
and complex both scientifically and technologically. The success of the
proposal depends on our ability to join forces to integrate significant
scientific and software engineering efforts, and to secure a substantial
computer cluster with more than 2,000 computational nodes. These goals cannot
be met without significant direct support. Moreover, the optimization of
the whole pipeline is best achieved by the optimization of the individual
modules in the context of the remaining modules, not on their own. Most
of the investigators are already performing research that is directly
relevant to the pipeline, but an additional effort is needed to put it together
and reap the benefits of integration. If funded, the Center would
greatly facilitate and encourage closer and more productive cooperation among
all the investigators in the Center.
Why will we succeed where others have not? Our Center includes
expert investigators who have already been working on the individual aspects
of the overall pipeline with some success, and have recently begun
integrating the modules. UCSF is an exemplary research and educational
institution with a history of achievement in the areas of the proposed research
and provides an environment in which the proposed system can be tested, improved,
and applied to important biological problems.