The Center will have a major impact on the way biological research is done,
from basic academic studies to applied research at medical frontiers. We
outline these prospects under five headings: (i) improving access to protein
structure modeling and docking, (ii) amplifying structural reasoning in
biology, (iii) integrating experiment and computation, (iv) providing new
viewpoints deriving from large-scale interaction maps, and (v) facilitating
health-related research.
Biologists working at the molecular level almost always benefit from knowing
the structures and ligands of their proteins. The automated pipeline tools
will bring modeling, docking, and virtual screening capabilities and results
to a large community of biologists who would otherwise be shut out of these
technologies. We anticipate that these biologists will apply the pipeline
to questions of vital interest to them, in areas where they are the
experts.
We will maintain a database of models of proteins, protein-ligand complexes,
and protein-protein complexes, keeping it up to date with both the databases
of known protein sequences, structures, and interactions and the current
version of our software. We will also interface our central database with
other major biological resources on the Internet. These links will be
bidirectional whenever possible. A typical user may first visit one of the
primary resources, such as the UCSC Genome Browser, SwissProt, ENTREZ, the
Protein Information Resource (PIR), or the Protein Data Bank (PDB), and then
be guided to our resource for "structural biology" information about their
protein, protein family, assembly, or network (see Letters of Collaboration
or Support from D. Haussler, R. Apweiler, E. Koonin, K. Wu, and H. Berman,
respectively).
To maximize the utility of our Center to the biomedical community, we will
provide publicly accessible web-based services for automated protein structure
modeling, protein-ligand docking, and functional annotation on demand. Researchers
will thus be able to apply our tools to problems for which no pre-computed
results are available in our databases.
A ready correspondence between sequence and structure for many gene products
will allow biologists to develop structural insight into the functioning
of their biological systems, at levels spanning individual proteins, protein
families, protein complexes, networks, and genomes. For example, the protein-protein
docking pipeline will support structural proteomics (18,19) and Phase II of
the Protein Structure Initiative (e.g., Driving Biological Project 2 in
Core 3). In addition, our modeling of the functional impact of non-synonymous
single nucleotide polymorphisms (SNPs) in drug response genes will use structural
insights to help characterize the variation in drug response between
individuals (Driving Biological Project 3 in Core 3).
Neither theory nor experiment works well in isolation. A major premise of
our project is that experimental data will empower theory and that theory
will aid in designing decisive experiments. We see the interplay between
theory and experiment at all levels of the proposal. Comparative protein
structure modeling is, after all, based on the body of experimentally
determined structures. Modeling in all forms can be greatly aided by experimentally
derived restraints (e.g., the electron microscopy maps and crosslinking
experiments proposed in Driving Biological Project 2). Structure-based design
has assisted many drug design projects (e.g., Driving Biological Project 1).
A computational approach to predicting the functional consequences of point
mutations in proteins is informed by experiment, yet it extends the reach of
experiment by making predictions for the full set of alternative mutations
(e.g., Driving Biological Project 3).
Our software system will be unique in its applicability at genomic
scale, opening doors to many new questions and applications. Even a partial
map of the interactions of proteins and ligands supports inferences that
cannot be drawn from individual structures alone. Therefore, we are also
developing new applications made possible by a comprehensive map of
protein-ligand interactions.
Examples of large-scale applications include (i) functional annotation
of the structures determined by the NIH Protein Structure Initiative and
their homologs (Section D.19 in Cores 1 and 2); (ii) identification of
small ligands that allow control and modification of protein signaling
circuits (Driving Biological Project 2); (iii) the development of drug
candidates that are specific to a particular human protein, or that
broadly inhibit a particular set of pathogens but not human proteins
(Driving Biological Project 1); and (iv) identification of new drug
targets and potential drug toxicity problems by docking the ~3,000
approved small-molecule drugs against the ~15,000 human proteins of
known or modeled structure (on the order of 45 million docking runs).
A large-scale calculation makes it possible to search the massive output
for "golden nuggets" even when much of that output is not highly accurate,
provided there are reliable means to assess the results. For this reason,
testing the individual algorithms and the pipeline as a whole, and assessing
its output, are major components of our proposal.
We expect that the pipeline and its applications will have a direct impact
on health-related issues. For example, the software pipeline will facilitate
the discovery of new drug targets and better leads for drug discovery, as
we hope to demonstrate through drug discovery for parasitic diseases of the
developing world (Driving Biological Project 1). A second direct application
is our modeling of the functional impact of SNPs in drug response genes to
elucidate the variation in drug response between individuals (Driving
Biological Project 3).
At a higher level, annotating protein function by describing the
interactions of proteins with other molecules is one of the major problems
in biology and medicine. Because comprehensive functional annotation by
experiment is intractable, computational approaches such as those proposed
here are the only practical option.