UNIVERSITY OF CHICAGO, THE
In two previous stages of this project, both funded by the National Institute of General Medical Sciences and carried out successfully, we developed GeneWays, a fully automated system that efficiently distills information about molecular interactions from an extremely large number of full-text biomedical articles. The next logical stage of the project is to turn this system from a computational-laboratory prototype into a practical, useful, and even indispensable tool that researchers can use to solve complex problems in experimental medicine and biology. The central hypothesis of our work on GeneWays has been that our computational tools will generate biological predictions of sufficiently high quality that the biomedical community will invest in serious experimental validation. Specifically, we propose the following:
1. We will significantly improve the precision and recall of the GeneWays system.
2. We will develop and implement a probabilistic belief-network formalism: a belief-graph relative of the Bayesian network formalism that allows us to place and update beliefs on both the vertices and the edges of the graph, for probabilistic reasoning over the large collection of facts in the GeneWays database. We will also develop and implement a coordinated collection of methods for computing and updating beliefs on individual nodes and edges of the belief graph.
3. We will develop and implement a mathematical framework for incorporating pathway information into a genetic-linkage analysis formalism in such a way that each piece of pathway knowledge carries a specified degree of confidence.
4. We will process a large collection of texts, including open-access biomedical journals, PubMed abstracts, and the GeneWays corpus, to build a comprehensive GeneHighWays database, which we will make easily and freely accessible to academic researchers through a web interface.
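To make the idea of placing beliefs on both vertices and edges of a graph concrete, here is a minimal sketch. It is not the GeneWays formalism. The class name `BeliefGraph`, its methods, and the example node names are hypothetical. It also swaps in two simple assumptions of its own: a noisy-OR update, which treats each text mention of an interaction as independent evidence with a given reliability, and an independence assumption when it combines beliefs along a path.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


@dataclass
class BeliefGraph:
    """Hypothetical toy belief graph: every vertex and directed edge carries a belief in [0, 1]."""

    # Belief that a vertex (e.g. an extracted gene/protein entity) is correct.
    node_belief: Dict[str, float] = field(default_factory=dict)
    # Belief that a directed edge (e.g. "GeneA activates GeneB") is true.
    edge_belief: Dict[Edge, float] = field(default_factory=dict)

    def set_node(self, node: str, belief: float) -> None:
        if not 0.0 <= belief <= 1.0:
            raise ValueError("belief must lie in [0, 1]")
        self.node_belief[node] = belief

    def add_evidence(self, src: str, dst: str, reliability: float) -> float:
        """Assumed noisy-OR update: each mention independently supports the edge
        with probability `reliability`; returns the updated edge belief."""
        if not 0.0 <= reliability <= 1.0:
            raise ValueError("reliability must lie in [0, 1]")
        prior = self.edge_belief.get((src, dst), 0.0)
        updated = 1.0 - (1.0 - prior) * (1.0 - reliability)
        self.edge_belief[(src, dst)] = updated
        return updated

    def path_belief(self, path: List[str]) -> float:
        """Belief that every vertex and edge on `path` holds, assuming independence.
        Vertices without a recorded belief default to 1.0; unseen edges to 0.0."""
        belief = 1.0
        for node in path:
            belief *= self.node_belief.get(node, 1.0)
        for src, dst in zip(path, path[1:]):
            belief *= self.edge_belief.get((src, dst), 0.0)
        return belief


if __name__ == "__main__":
    g = BeliefGraph()
    g.add_evidence("GeneA", "GeneB", 0.5)  # first mention
    g.add_evidence("GeneA", "GeneB", 0.5)  # second mention raises the belief to 0.75
    g.add_evidence("GeneB", "GeneC", 0.8)
    g.set_node("GeneB", 0.9)
    print(g.path_belief(["GeneA", "GeneB", "GeneC"]))  # roughly 0.54
```

Noisy-OR is chosen here only because it is a simple, monotone way to let repeated independent mentions raise confidence in a fact. A real belief-network formalism would also need to model dependencies between sources and between facts.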
We will evaluate the new version of the GeneWays system and the GeneHighWays database for data quality, the performance of the mathematical methods, and the quality of the interface.