
 

A Case for Launching the Center

 

Our general approach raises several issues that should be addressed. We have organized them as a set of questions and answers: questions of methodology and scope, and questions about the prospects for success. We also consider the timeliness of the effort and the role of the Center.

The most basic questions are methodological and focus on why we choose to supplement experiments with our particular computational approaches.

Why combine computational ligand docking with experimental screening to find ligands? Computation can complement experimental screening. Screening experiments are relatively expensive, require considerable logistical expertise, and are not always successful on their own. They often have significant false-positive and false-negative rates, and cannot be applied on a genome-wide scale. Even medium-scale screening is unavailable to most academic laboratories. Computational screening, once set up, is much less expensive to run and to maintain. It can be readily expanded to screen virtual libraries and can greatly assist experimental design. Computational screening has its own accuracy problems, including both false positives and false negatives, which we hope to reduce as part of this proposal.
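As a minimal illustration of the two error rates mentioned above, the following Python sketch computes them for a single screen, whether experimental or computational. It assumes a library for which the true binders are already known (for example, a benchmark set); the function name, argument names, and compound identifiers are hypothetical and are not part of the proposed pipeline.

```python
from typing import Dict, Set


def screening_error_rates(
    predicted_hits: Set[str],
    true_binders: Set[str],
    library: Set[str],
) -> Dict[str, float]:
    """Return false-positive and false-negative rates for one screen.

    All arguments are sets of compound identifiers (hypothetical names).
    Only compounds present in `library` are counted.
    """
    binders = true_binders & library      # known binders that were screened
    non_binders = library - true_binders  # screened compounds that do not bind
    hits = predicted_hits & library       # hits reported by the screen

    false_positives = hits & non_binders  # reported as hits, but do not bind
    false_negatives = binders - hits      # true binders the screen missed

    # Guard against empty denominators so that the sketch never divides by zero.
    fpr = len(false_positives) / len(non_binders) if non_binders else 0.0
    fnr = len(false_negatives) / len(binders) if binders else 0.0
    return {"false_positive_rate": fpr, "false_negative_rate": fnr}


if __name__ == "__main__":
    library = {f"c{i}" for i in range(10)}   # ten screened compounds
    true_binders = {"c0", "c1", "c2", "c3"}  # assumed ground truth
    predicted_hits = {"c0", "c1", "c4"}      # output of the screen
    print(screening_error_rates(predicted_hits, true_binders, library))
```

In this toy case the screen misses two of the four binders and reports one of the six non-binders as a hit. Comparing such rates for an experimental and a computational screen of the same library is one simple way to see where the two approaches complement each other.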

Why use computational ligand docking in addition to experimental analyses to help characterize the structure and energetics of complexes? Computation can complement structural experiments. It is possible to model alternative physical conditions (e.g., pH, salt, solution versus crystal states) that might not be readily accessible to experiment. Computation can also be used to bridge the gap between the conditions required for structural measurements and those present in the biologically active state. Further, we are learning from the structural genomics projects that it is genuinely difficult to co-crystallize ligand-protein complexes for crystallography or to find the right solubility conditions for NMR spectroscopy. Computation may be the only route to structure for many protein-ligand complexes.

Why use computational docking to elucidate macromolecular complexes? As noted above, structures of arbitrary single proteins can be difficult to determine routinely by experiment. Even if the structural genomics initiative meets its goal, it can only cover a small fraction of the cell's complement of proteins. Expanding the task to the very large number of potential complexes is exponentially more difficult and is beyond any current experimental technology, notwithstanding the striking experimental successes with a number of complexes. Our approach will be to apply the best available computational technology and see how successful it is, first in reproducing known complexes and second in generating testable hypotheses for complexes whose structures are not known. This approach, combined with electron microscopy and chemical crosslinking, should be particularly powerful for multidomain proteins, which have proved especially difficult to crystallize.

Why use protein structure models, when even docking against high-resolution crystallographic structures of proteins can be difficult? There is no hope of having high-resolution crystallographic structures of all human and pathogen proteins and complexes, under all relevant conditions. The only practical way to explore ligand-protein interactions for most systems is to use comparative protein structure models. We present clear results in Section 5.2 in Core 1&2 that docking against comparative models based on as little as 30% sequence identity can be useful. We will develop new combined modeling and docking technologies to further improve the utility of comparative models.

Why develop automated protein structure modeling and docking, when the results are frequently problematic even for experts? Automation is essential if the results are to be used by a broad community of researchers who are not computational experts. Even expert users may prefer an automated method for reasons of efficiency and scope. Automation also encourages good programming practice that results in robust, efficient, and flexible methods. Because it obviates the need for manual intervention, automation allows rigorous, blind testing of the methods on statistically significant test sets. Automation is also a prerequisite for large-scale calculations (see the next question). Finally, automation makes the pipeline available as a research resource on the web.

The second set of questions deals with the scope of the proposal and the choice of specific projects.

Why aim for large-scale protein structure modeling and docking, when the results are frequently inaccurate even when produced by experts who focus on a single system? A number of applications of protein structure modeling require computation across an entire network of genes, an entire genome, or even sets of genomes. For example, the development of drug candidates that are specific to a particular human protein, or that broadly inhibit a particular set of pathogens but not any of the human proteins, would be enabled by accurate genome-wide protein structure modeling and ligand docking calculations. In addition, a large-scale calculation enables a search for "golden nuggets" among the massive output even when much of it is not highly accurate, as long as there are means to assess the results. Testing of the pipeline and assessment of its output are major components of our proposal.

Why is the Center a genomics project? A major rationale for our proposal is to deal with macromolecular interactions on a genomic scale. Thus, along with the enabling computer science, software engineering, and computational chemistry, we also include computational biology and bioinformatics. In fact, our aim is to combine the best from all these fields.

Lastly, we come to the general questions.

Why is it timely to have a computational chemistry effort on a genomic scale? For the first time, there is a confluence of massive amounts of varied data generated by genome sequencing, structural biology and structural genomics, and functional genomics; powerful computing provided by cluster architectures; and improvements in the computational methods that contribute to the pipeline. None of these existed in the past.

Why support a large center instead of individual research projects? There is a special need for this project because it is challenging, interdisciplinary, and complex both scientifically and technologically. The success of the proposal depends on our ability to join forces to integrate significant scientific and software engineering efforts, and to secure a substantial computer cluster with more than 2,000 computational nodes. These goals cannot be met without significant direct support. Moreover, the whole pipeline is best optimized by optimizing the individual modules in the context of the remaining modules, not on their own. Most of the investigators are already performing research that is directly relevant to the pipeline, but an additional effort is needed to put it together and reap the benefits of the integration. If the Center is funded, it will greatly facilitate and encourage closer and more productive cooperation among all the investigators in the Center.

Why will we succeed where others have not? Our Center includes expert investigators who have already been working on the individual aspects of the overall pipeline with some success, and have recently even begun integrating the modules. UCSF is an exemplary research and educational institution with a history of achievement in the areas of the proposed research, and it provides an environment in which the proposed system can be tested, improved, and applied to important biological problems.

 

 


Copyright 2003-2004 CCPR, webmaster