Origins of the Center
The
core of the proposed Center is already in place at UCSF. In
early 2003, the majority of the investigators presently
involved in the proposed Center began to integrate their largely
existing software modules to create, test, and apply a software
pipeline for automated large-scale protein structure modeling and
ligand docking (Cores 1 and 2).
The pipeline included comparative protein structure modeling
(M. Jacobson, A. Sali), identifying putative binding sites on proteins
(P. Babbitt, F. Cohen, A. Sali), building of virtual ligand libraries
(B. Shoichet), protein-ligand docking (K. Dill, M. Jacobson, I.
Kuntz, B. Shoichet), and depositing results into a central database
with a graphical user interface (T. Ferrin, A. Sali). It was
to be tested, illustrated, and justified by several new genome-wide
computational applications of protein structure modeling and ligand
docking (M. Jacobson, A. Sali, B. Shoichet, C. Voigt). It turned
out that each of the participating faculty had a natural niche in the
overall project and that between ourselves we covered all of the needed
components well.
We proceeded by organizing this
web site and monthly group meetings at which we and our students,
postdocs, and programmers discussed specific scientific and technical
issues of the pipeline design.
We also made an effort to obtain funding for the integration,
in addition to the existing and new funding for the individual
modules developed by the investigators. First, we secured
an award from the California Institute for Quantitative Biomedical
Research. This support allowed us to hire a postdoc, Niu
Huang, to work on integration issues. Second, we secured a computer
hardware gift from Intel and an IBM SUR award.
More importantly, our current relationships
with IBM and Intel are a springboard for an expanded collaboration
that will involve IBM and Intel contributions to our software environment
and software development and, we hope, additional hardware
resources.
These events were catalyzed by
two larger developments at UCSF. First, under the leadership of David
Agard, Marvin Cassman, and Kathy Giacomini, the root group of the
UCSF senior computational chemists and biologists (i.e.,
P. Babbitt, F. Cohen, K. Dill, T. Ferrin, and I. Kuntz) was
expanded by
five new faculty in 2002-03 (i.e., M. Jacobson, T. Kortemme,
A. Sali, B. Shoichet, and C. Voigt), with at least one more faculty
position to be filled in 2004. Second, all of the investigators moved
into contiguous space in Genentech Hall on the new Mission Bay campus
of UCSF in February 2003. We now share our computing infrastructure,
group meetings, administrative support, and other facilities.
The recent NIH RFA for National Centers in Biomedical Computing provided
us with an opportunity to significantly increase the scope of our plans.
First, we now aim to characterize protein-protein interactions, in addition
to protein-ligand interactions. This expansion was enabled partly
by our recent recruitment of Tanja Kortemme as an Assistant Professor
and by establishing a collaboration with David Baker at the University of
Washington, Seattle.
Second, we reinforced our own expertise
in building software, databases, and user interfaces for biologists
by including bona fide computer
scientists. Thus, Ben Rosen (Dept. of Computer Science at UC San
Diego), who has been collaborating with Ken Dill since 1996, will
be improving global optimization methods that are essential in virtually
all the steps of our system. To ensure that our
information navigation, databases, and user interfaces exploit the
relevant recent advances in computer science and pathway analysis,
we recruited Marti Hearst (Dept. of Computer Science at UC Berkeley),
and Bruce Conklin (Gladstone Institutes). In addition, our existing
collaboration with Intel and IBM computer scientists and software
engineers will contribute significantly to the hardware and software
environments for the development and execution of our pipeline.
Third, we added a significant experimental
component that will test the benefits of our new system and provide
feedback for its further improvement. The current three Driving Biological
Projects (DBP) are based on our existing collaborations with experimentalists
and respond to the needs of the Center. They involve drug discovery
(J. McKerrow, K. Guy, and J. DeRisi), protein-protein
interactions (D. Agard, W. Lim, and C. Voigt), and single
nucleotide polymorphisms in membrane transporters (K. Giacomini,
D. Kroetz, and J. Rine).
Many of the contributions of the individual
investigators in the Center are already funded by their own grants.
However, integrating these projects will take considerable effort and
resources. The proposed budget focuses on efficiently integrating the
components, validating their performance in the context of the pipeline,
and maximizing the utility of the whole system to the end user via service, training, and
dissemination.
Computing Resources of the Center
Enormous hardware resources are needed for the development, testing,
and especially application of our pipeline for large-scale protein
structure modeling, protein-ligand docking, and protein-protein
docking. This hardware in turn needs significant computer server
space, power supply, and air-conditioning. The Genentech Hall
computer server room shared by the UCSF investigators on this
proposal can house a cluster with approximately 2,000 CPUs of
the Intel Pentium type, with the corresponding amount of file
server and networking capacity. The larger QB3 computer server
room, also available mostly to the investigators on this proposal,
will be able to house approximately 4,000 CPUs of the Intel Pentium
type when it is finished in January 2005.
Our existing computing capacity includes 900 Intel Pentium CPUs
housed in Genentech Hall, and another 650 CPUs in the D. Baker
group at the University of Washington, Seattle. Based on the funding
already in hand, we can count on just over 1,900 CPUs by the time
the National Centers for Biomedical Computing grants are awarded.
This number is already quite significant and certainly sufficient
for our initial development and testing efforts as proposed.
However, it is clear that additional computing power will be
needed in the future years, primarily for the production runs
of the pipeline and for serving our collaborators and the community.
We aim to have at our disposal at least 10,000 CPUs when our pipeline
reaches a relatively mature stage in Year 3 of the Center.
We are committed to procuring sufficient computing resources for our goals.
We are pursuing several avenues in parallel. They
include a yearly $400,000 hardware budget in the present proposal,
a $600,000 hardware budget for the concurrently submitted NSF MRI
proposal ("Acquisition of a cluster computer for bioimaging and
computational biology", A. Sali, Principal Investigator), a mutual
intent to apply jointly with Intel for a $1,000,000 UC Discovery
Grant (see Letter of Collaboration by Tim Mattson of Intel), a collaboration
with the team of computational biologists at Lawrence Livermore
National Laboratory led by Rod Balhorn that involves sharing of
the LLNL
computing resources (see Letter of Collaboration by K. Fidelis),
an intent by the UCSF Development Office to seek private funds
for our computing hardware (see Letter of Support by M. Bishop),
and our intent to apply for future funding opportunities at
NIH, NSF, DOE, and other government and private funding agencies.
In addition, we are especially interested in IBM's Bluegene
architecture. Because the Bluegene CPUs are more compact and
require less power to run, the server rooms are capable of
housing 5-10 times more Bluegene CPUs than the current Intel
CPUs. This fact is one of the motivating factors for expanding
our collaboration with IBM while not sacrificing a more traditional
growth curve associated with continued updates of the current
Intel-based clusters. On this basis, we are confident that
we will have achieved our goal of at least 10,000 CPUs by the
end of 2006, if the current Center proposal is funded.
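The capacity figures quoted in this section can be cross-checked with a short calculation. The sketch below is only an illustration: it uses the approximate numbers stated above, all function and constant names are hypothetical, and it assumes the Bluegene density factor applies uniformly to both server rooms.

```python
# Back-of-the-envelope check using only figures quoted in the text above.
# All names are hypothetical; this is not part of the Center's software.

GOAL_CPUS = 10_000            # target for the mature pipeline
GENENTECH_HALL_ROOM = 2_000   # approx. Intel Pentium-type CPUs the room can house
QB3_ROOM = 4_000              # approx. capacity once the room is finished
EXISTING_UCSF = 900           # CPUs already housed in Genentech Hall
EXISTING_UW = 650             # CPUs in the D. Baker group (off site)
BLUEGENE_DENSITY = (5, 10)    # "5-10 times more" CPUs in the same rooms


def intel_room_capacity() -> int:
    """Combined capacity of both server rooms for Intel-type CPUs."""
    return GENENTECH_HALL_ROOM + QB3_ROOM


def existing_cpus() -> int:
    """CPUs available today across UCSF and the University of Washington."""
    return EXISTING_UCSF + EXISTING_UW


def bluegene_room_capacity() -> tuple[int, int]:
    """Low and high room capacity, assuming uniform Bluegene density."""
    low, high = BLUEGENE_DENSITY
    base = intel_room_capacity()
    return base * low, base * high


def shortfall(available: int, goal: int = GOAL_CPUS) -> int:
    """CPUs still missing relative to the goal (never negative)."""
    return max(goal - available, 0)


if __name__ == "__main__":
    print("Existing CPUs:", existing_cpus())
    print("Intel-type room capacity:", intel_room_capacity())
    print("Bluegene room capacity range:", bluegene_room_capacity())
    print("Shortfall from existing CPUs:", shortfall(existing_cpus()))
    print("Shortfall at full Intel-type capacity:",
          shortfall(intel_room_capacity()))
```

On these figures, the two rooms filled only with Intel-type CPUs would hold about 6,000 CPUs, which is one way to read why denser hardware and shared off-site resources both figure in the plan to reach 10,000.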
Environment of the Center
Most of the Center activities will be located at the Mission
Bay campus of UCSF. The majority of the faculty at UCSF
involved in the Center will be in two contiguous buildings,
Genentech Hall and the QB3 building, which will house the cores
of the Center and the research facilities, including our computers.
The QB3 building will be ready for occupancy in February 2005,
and will primarily house faculty engaged in computational biology,
imaging, and protein engineering. It is expected that more
than 10 new faculty will be recruited in these areas over the
next three years. Genentech Hall is already occupied, with
a focus on biochemistry, biophysics, and chemical biology.
The two buildings will be physically connected on every floor,
ensuring mutual access to facilities and opportunities for
close collaborations. Genentech Hall and QB3 will themselves be
in the midst of a larger development that will soon house the
Gladstone and Gallo Institutes as well as a Cancer Center and
other research institutions. Finally, close ties exist between
the Mission Bay campus and the Medical School, primarily located
at the Parnassus Campus of UCSF. Therefore, clinical and basic
research efforts remain closely linked.
The UCSF Mission Bay campus is also home to the Resource for
Biocomputing, Visualization, and Informatics (RBVI), an NIH
NCRR Biomedical Technology Resource Center led by Tom Ferrin,
an investigator on the current proposal. The RBVI creates and
applies novel software tools for solving such problems as gene
characterization and interpretation, drug design, variation
in drug response, protein engineering, biomaterials design,
and prediction of protein functions. The RBVI provides access
for scientists to state-of-the-art computer hardware and software,
and provides training in the form of web-based tutorial modules
and hands-on workshops. It distributes several of its own software
packages via its web site.
There is a strong synergy between the RBVI and the software
development projects described in this proposal. The
RBVI's core Chimera and Data Sharing (DASH) research and development
projects provide crucial tools and background for the proposed
Center. The RBVI employs several programmers with BSc, MS, and
PhD degrees in computer science or related engineering fields,
and hence brings considerable expertise to the overall project
in the area of software design, development, and deployment.
This network of activity in biology, medicine, and computational
sciences ensures that the developments of the Center take place
within a context where both fundamental biology and medical
applications will influence the direction of the software and
hardware developments by the Center.
In addition to the immediate environment at UCSF, we will also
be exposed to and inter-linked with the broad biology community
in the US and elsewhere, as described in Cores 4-7.