The standard of practice for computational studies of molecular interactions
can be significantly enhanced if we can overcome three scientific and technical
hurdles.
The first of these hurdles is the assessment of the strength of molecular
interactions. There is a wide variety of such assessments, including those
based on atomic force fields, empirical potentials, statistical recursion
approaches, and machine-learning strategies. As a team, we have had experience
with these protocols and recognize the strengths and weaknesses of each.
Scientific developments in this area will continue to be funded by R01-level
grants to individual investigators. The pipeline, by enforcing standard
software protocols, will allow competitive evaluation of such scoring schemes
against a wide domain of test suites.
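As an illustration of what such a competitive evaluation could look like, the following is a minimal sketch, not the pipeline's actual code. It assumes that every scoring scheme is exposed as a function from a complex to a number (higher meaning stronger predicted binding) and that each test suite pairs complexes with measured affinities. The names Complex, ScoringFunction, pairwise_ranking_accuracy, and evaluate, the feature keys, and the toy data are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical representation: a complex is an opaque dictionary of features.
Complex = Dict[str, float]
# Assumed convention: a higher score means stronger predicted binding.
ScoringFunction = Callable[[Complex], float]
# A test suite pairs each complex with a measured affinity (higher = tighter).
TestSuite = List[Tuple[Complex, float]]


def pairwise_ranking_accuracy(score: ScoringFunction, suite: TestSuite) -> float:
    """Fraction of complex pairs whose predicted order matches the measured order."""
    correct = 0
    total = 0
    for i in range(len(suite)):
        for j in range(i + 1, len(suite)):
            (complex_i, affinity_i), (complex_j, affinity_j) = suite[i], suite[j]
            if affinity_i == affinity_j:
                continue  # tied measurements carry no ranking information
            total += 1
            if (score(complex_i) - score(complex_j)) * (affinity_i - affinity_j) > 0:
                correct += 1
    return correct / total if total else 0.0


def evaluate(
    schemes: Dict[str, ScoringFunction], suites: Dict[str, TestSuite]
) -> Dict[str, Dict[str, float]]:
    """Run every scoring scheme against every test suite under one protocol."""
    return {
        scheme_name: {
            suite_name: pairwise_ranking_accuracy(fn, suite)
            for suite_name, suite in suites.items()
        }
        for scheme_name, fn in schemes.items()
    }


# Toy data, invented purely for illustration.
toy_suite: TestSuite = [
    ({"contacts": 10.0, "clashes": 0.0}, 9.0),
    ({"contacts": 5.0, "clashes": 1.0}, 6.0),
    ({"contacts": 8.0, "clashes": 4.0}, 4.0),
]
schemes: Dict[str, ScoringFunction] = {
    "contacts_only": lambda c: c["contacts"],
    "contacts_minus_clashes": lambda c: c["contacts"] - 2.0 * c["clashes"],
}
results = evaluate(schemes, {"toy": toy_suite})
```

Because every scheme is called through the same signature and judged by the same metric, results from different investigators become directly comparable. The ranking metric used here is only one possible choice.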
The second scientific issue is the sampling problem, corresponding to the
exploration of molecular geometry, including the configurational and the
conformational degrees of freedom for both protein and ligand. Enough of
the geometric space must be searched to produce protein structures and docked
complexes close to the physically correct solutions without wasting time
in unproductive searches. The best sampling programs generate hypotheses
about alternative binding modes for experimental testing. Again, our pipeline
is a testbed for programs that address these concerns.
The third major issue is software engineering of the pipeline and associated
databases. All modules of the pipeline need to be automated and the interfaces
between them have to be specified so that the output of one module provides
the input for another. The efficiency, robustness, and accuracy of the pipeline
as a whole need to be tested and improved. Massive amounts of input data
as well as intermediate and final results need to be stored in a flexible
database. These data need to be accessible via a sophisticated,
general, flexible, and convenient graphical user interface on the web. A
large cluster computer and its software environment need to be adapted to
the needs of the pipeline. Finally, our central database of protein structure
models and their interactions needs to reflect the growth of the input databases
and improvements in our software.
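The requirement that the output of one module provides the input for another can be made concrete with a minimal sketch, given here under stated assumptions rather than as the pipeline's actual design. Each module declares the record type it consumes and the type it produces, and the chain is validated before any computation starts. The record types ProteinSequence, StructureModel, and DockedComplex, the Stage wrapper, and the placeholder module bodies are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


# Hypothetical record types passed between pipeline modules.
@dataclass(frozen=True)
class ProteinSequence:
    residues: str


@dataclass(frozen=True)
class StructureModel:
    sequence: ProteinSequence
    n_atoms: int


@dataclass(frozen=True)
class DockedComplex:
    model: StructureModel
    ligand: str
    score: float


@dataclass(frozen=True)
class Stage:
    """A pipeline module together with its declared interface."""
    name: str
    input_type: type
    output_type: type
    run: Callable[[Any], Any]


def validate(stages: List[Stage]) -> None:
    """Check that each module's output type is accepted by the next module."""
    for upstream, downstream in zip(stages, stages[1:]):
        if not issubclass(upstream.output_type, downstream.input_type):
            raise TypeError(
                f"{upstream.name} produces {upstream.output_type.__name__}, "
                f"but {downstream.name} expects {downstream.input_type.__name__}"
            )


def run_pipeline(stages: List[Stage], data: Any) -> Any:
    """Validate the chain, then pass data through each module in order."""
    validate(stages)
    for stage in stages:
        if not isinstance(data, stage.input_type):
            raise TypeError(f"{stage.name} received {type(data).__name__}")
        data = stage.run(data)
        if not isinstance(data, stage.output_type):
            raise TypeError(f"{stage.name} returned {type(data).__name__}")
    return data


# Placeholder modules: real modeling and docking programs would go here.
build_model = Stage(
    "build_model", ProteinSequence, StructureModel,
    lambda seq: StructureModel(seq, n_atoms=8 * len(seq.residues)),
)
dock_ligand = Stage(
    "dock_ligand", StructureModel, DockedComplex,
    lambda model: DockedComplex(model, ligand="LIG", score=-0.1 * model.n_atoms),
)

result = run_pipeline([build_model, dock_ligand], ProteinSequence("ACDE"))
```

Declaring the interfaces explicitly means that a wrongly ordered or incompatible chain of modules fails immediately with a clear message instead of partway through a long automated run.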
In all these areas, our group of investigators has the necessary expertise.
Typically, in each area, we have developed ground-breaking prototype software
packages and have a track record of distributing them to the scientific community.
We believe that the technical challenges can be met within the time frame
and funding levels of this proposal, in combination with our existing support.