Investigators and Environment

Origins of the Center

The core of the proposed Center is already in place at UCSF. In early 2003, most of the investigators now involved in the proposed Center began to integrate their largely existing software modules to create, test, and apply a software pipeline for automated large-scale protein structure modeling and ligand docking (Core 1&2). The pipeline included comparative protein structure modeling (M. Jacobson, A. Sali), identification of putative binding sites on proteins (P. Babbitt, F. Cohen, A. Sali), building of virtual ligand libraries (B. Shoichet), protein-ligand docking (K. Dill, M. Jacobson, I. Kuntz, B. Shoichet), and deposition of results into a central database with a graphical user interface (T. Ferrin, A. Sali). The pipeline was to be tested, illustrated, and justified by several new genome-wide computational applications of protein structure modeling and ligand docking (M. Jacobson, A. Sali, B. Shoichet, C. Voigt). It turned out that each of the participating faculty had a natural niche in the overall project and that among ourselves we covered all of the needed components well.

We proceeded by setting up this web site and organizing monthly group meetings at which we and our students, postdocs, and programmers discussed specific scientific and technical issues of the pipeline design.


We also sought funding for the integration itself, in addition to the existing and new funding for the individual modules developed by the investigators. First, we secured an award from the California Institute for Quantitative Biomedical Research (QB3). This support allowed us to hire a postdoc, Niu Huang, to work on integration issues. Second, we obtained a computer hardware gift from Intel and an IBM Shared University Research (SUR) award.

More importantly, our current relationships with IBM and Intel are a springboard for an expanded collaboration in which IBM and Intel will contribute to our software environment and software development, and hopefully also provide additional hardware resources.

These events were catalyzed by two larger developments at UCSF. First, under the leadership of David Agard, Marvin Cassman, and Kathy Giacomini, the original group of senior UCSF computational chemists and biologists (i.e., P. Babbitt, F. Cohen, K. Dill, T. Ferrin, and I. Kuntz) was expanded by five new faculty members in 2002-03 (i.e., M. Jacobson, T. Kortemme, A. Sali, B. Shoichet, and C. Voigt), with at least one more faculty position to be filled in 2004. Second, all of the investigators moved into contiguous space in Genentech Hall on the new Mission Bay campus of UCSF in February 2003. We now share our computing infrastructure, group meetings, administrative support, and other facilities.

The recent NIH RFA for National Centers for Biomedical Computing gave us an opportunity to significantly increase the scope of our plans. First, we now aim to characterize protein-protein interactions in addition to protein-ligand interactions. This expansion was made possible partly by our recent recruitment of Tanja Kortemme as an Assistant Professor and by a new collaboration with David Baker at the University of Washington, Seattle.

Second, we reinforced our own expertise in building software, databases, and user interfaces for biologists by including bona fide computer scientists. Ben Rosen (Dept. of Computer Science, UC San Diego), who has been collaborating with Ken Dill since 1996, will improve the global optimization methods that are essential to virtually every step of our system. To ensure that our information navigation, databases, and user interfaces exploit relevant recent advances in computer science and pathway analysis, we recruited Marti Hearst (Dept. of Computer Science, UC Berkeley) and Bruce Conklin (Gladstone Institutes). In addition, our existing collaboration with computer scientists and software engineers at Intel and IBM will contribute significantly to the hardware and software environments for developing and executing our pipeline.

Third, we added a significant experimental component that will test the benefits of our new system and provide feedback for its further improvement. The three current Driving Biological Projects (DBPs) are based on our existing collaborations with experimentalists and respond to the needs of the Center. They address drug discovery (J. McKerrow, K. Guy, and J. DeRisi), protein-protein interactions (D. Agard, W. Lim, and C. Voigt), and single nucleotide polymorphisms in membrane transporters (K. Giacomini, D. Kroetz, and J. Rine).

Many of the contributions of the individual investigators in the Center are already funded by their own grants. However, integrating these projects will take considerable effort and resources. The proposed budget therefore focuses on efficiently integrating the components, validating their performance in the context of the pipeline, and maximizing the utility of the whole system to the end user through service, training, and dissemination.

Computing Resources of the Center

Enormous hardware resources are needed for the development, testing, and especially the application of our pipeline for large-scale protein structure modeling, protein-ligand docking, and protein-protein docking. This hardware in turn requires substantial server-room space, power, and air conditioning. The Genentech Hall server room shared by the UCSF investigators on this proposal can house a cluster of approximately 2,000 Intel Pentium-class CPUs, with corresponding file-server and networking capacity. The larger QB3 server room, also available primarily to the investigators on this proposal, will be able to house approximately 4,000 Intel Pentium-class CPUs when it is completed in January 2005.

Our existing computing capacity includes 900 Intel Pentium CPUs housed in Genentech Hall and another 650 CPUs in the D. Baker group at the University of Washington, Seattle. Based on funding already in hand, we can count on just over 1,900 CPUs by the time the National Centers for Biomedical Computing grants are awarded. This capacity is already substantial and is sufficient for the initial development and testing proposed here.

However, additional computing power will clearly be needed in future years, primarily for production runs of the pipeline and for serving our collaborators and the community. We aim to have at least 10,000 CPUs at our disposal when our pipeline reaches a relatively mature stage in Year 3 of the Center.

We are committed to procuring sufficient computing capacity for our goals and are pursuing several avenues in parallel. They include a yearly $400,000 hardware budget in the present proposal; a $600,000 hardware budget in the concurrently submitted NSF MRI proposal ("Acquisition of a cluster computer for bioimaging and computational biology", A. Sali, Principal Investigator); a planned joint application with Intel for a $1,000,000 UC Discovery Grant (see Letter of Collaboration by Tim Mattson of Intel); a collaboration with the team of computational biologists at Lawrence Livermore National Laboratory (LLNL) led by Rod Balhorn that involves sharing LLNL computing resources (see Letter of Collaboration by K. Fidelis); a planned effort by the UCSF Development Office to seek private funds for our computing hardware (see Letter of Support by M. Bishop); and our plans to apply for future funding opportunities at NIH, NSF, DOE, and other government and private funding agencies.

In addition, we are especially interested in IBM's Blue Gene architecture. Because Blue Gene CPUs are more compact and require less power, our server rooms can house 5-10 times more Blue Gene CPUs than current Intel CPUs. This is one of our motivations for expanding our collaboration with IBM, while maintaining the more traditional growth curve associated with continued upgrades of our current Intel-based clusters. On this basis, we are confident that, if the current Center proposal is funded, we will reach our goal of at least 10,000 CPUs by the end of 2006.

Environment of the Center

Most Center activities will be located at the Mission Bay campus of UCSF. Most of the UCSF faculty involved in the Center will be in two contiguous buildings, Genentech Hall and the QB3 building, which will house the cores of the Center and its research facilities, including our computers. The QB3 building will be ready for occupancy in February 2005 and will primarily house faculty engaged in computational biology, imaging, and protein engineering. More than 10 new faculty members are expected to be recruited in these areas over the next three years. Genentech Hall is already occupied, with a focus on biochemistry, biophysics, and chemical biology. The two buildings will be physically connected on every floor, ensuring mutual access to facilities and opportunities for close collaboration. Genentech Hall and QB3 will themselves be in the midst of a larger development that will soon house the Gladstone and Gallo Institutes as well as a Cancer Center and other research institutions. Finally, close ties exist between the Mission Bay campus and the Medical School, located primarily at the Parnassus campus of UCSF. Therefore, clinical and basic research efforts remain closely linked.

The UCSF Mission Bay campus is also home to the Resource for Biocomputing, Visualization, and Informatics (RBVI), an NIH NCRR Biomedical Technology Resource Center led by Tom Ferrin, an investigator on the current proposal. The RBVI creates and applies novel software tools for problems such as gene characterization and interpretation, drug design, variation in drug response, protein engineering, biomaterials design, and prediction of protein function. The RBVI provides scientists with access to state-of-the-art computer hardware and software, and offers training in the form of web-based tutorial modules and hands-on workshops. It distributes several of its own software packages via its web site.

There is a strong synergy between the RBVI and the software development projects described in this proposal. The RBVI's core Chimera and Data Sharing (DASH) research and development projects provide crucial tools and background for the proposed Center. The RBVI employs several programmers with BSc, MS, and PhD degrees in computer science or related engineering fields, who bring considerable expertise in software design, development, and deployment to the overall project.

This network of activity in biology, medicine, and the computational sciences ensures that both fundamental biology and medical applications will shape the direction of the Center's software and hardware development.

Beyond the immediate environment at UCSF, we will also be connected with the broader biology community in the US and elsewhere, as described in Cores 4-7.

Copyright 2003-2004 CCPR, webmaster