Grants - AWARD SUMMARY


CALIFORNIA INSTITUTE OF TECHNOLOGY


This supplement is to accelerate Aim 1C in the original application (1C. We will efficiently curate information derived from the literature and user submission), thereby helping achieve Aim 1 (Increase Database Content), and, in a new Aim (Aim 3; described in this document), to pilot extension of the methods and tools to curation at other databases. Reasonably comprehensive, well-structured databases are critical to modern biomedical research, but only a few groups have such resources. While many model organism researchers have access to such databases, these projects struggle to keep up with the expanding literature. As part of the parent grant, WormBase has begun to use automation and semi-automation in its curation pipeline. This automation will be accelerated within WormBase and extended to other model organism databases (MODs). As part of this project, we will compare data models and curation strategies at each of eleven database projects (including Mouse Genome Database, FlyBase, WormBase, Saccharomyces Genome Database, ZebraFish Information Network, and the UniProtKB), prioritize the development of tools according to joint needs and opportunities, and implement automated curation pipelines at a few sites. A generic curation workflow includes paper identification (triage), first-pass curation (indexing data types), and retrieval or extraction of facts related to specific data types; this workflow is well suited for automation. Various statistical NLP methods to classify and index papers will be investigated as training sets are developed; a Support Vector Machine (SVM) approach is promising, but Hidden Markov Models and Conditional Random Fields will also be evaluated. The Textpresso Search Engine has been adapted by many MODs, at least in pilot form, and will be used in some curation tasks for identifying individual sentences with relevant facts.
Curators will evaluate the outcome of the search results by analyzing true and false positives and negatives using standard metrics of recall and precision. Their evaluations will serve as the basis for improved recall and precision. All data extracted from papers will be available in freely accessible component databases; annotated training sets will be freely available; all software will be open source and freely available for anonymous download.
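
The recall and precision metrics named above can be illustrated with a minimal sketch. The function name and the counts below are hypothetical and are not taken from the award record or from WormBase software; the sketch only shows how the two standard metrics follow from curator-judged true positives, false positives, and false negatives.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from curator-judged counts.

    tp: retrieved items the curator confirmed as relevant
    fp: retrieved items the curator rejected
    fn: relevant items the search failed to retrieve
    """
    # Precision: fraction of retrieved items that were relevant.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: fraction of relevant items that were retrieved.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts, for illustration only.
p, r = precision_recall(tp=80, fp=20, fn=40)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.80 recall=0.67
```

True negatives do not enter either formula, which is why these two metrics are commonly used when relevant sentences or papers are rare relative to the whole corpus.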



AWARD OVERVIEW

Award Number 3P41HG002223-10S1
Funding Agency Department of Health and Human Services
Total Award Amount $989,492
Award Date 09/29/2009
Project Status Completed
Jobs Reported 0.00
Project Location - City Pasadena
Project Location - State CA
Project Location - Zip 91125-0001
Project Location - Country US
Congressional District 29

Recipient Information (Grants)

Recipient Name CALIFORNIA INSTITUTE OF TECHNOLOGY
Recipient DUNS Number 009584210
Recipient Address 1200 E CALIFORNIA BLVD
Recipient City PASADENA
Recipient State California
Recipient Zip 91125-0001
Recipient Congressional District 29
Recipient Country USA
Required to Report Top 5 Highly Compensated Officials No

Projects and Jobs Information

Project Title WORMBASE: A CORE DATA RESOURCE FOR C. ELEGANS AND OTHER NEMATODES
Project Status Completed
Final Project Report Submitted Yes
Project Activities Description Allergy & Immunological Diseases Research
Quarterly Activities/Project Description Research has been completed; no additional funds will be expended.
Jobs Created 0.00
Description of Jobs Created The following types of jobs were created or retained as a result of this award: Postdoctoral/Associate Researcher.


Purchaser Information (Grants)

Contracting Office ID Not Reported
Contracting Office Name Not Available
Contracting Office Region Not Available
TAS Major Program 75-0906

Award Information

Award Date 09/29/2009
Award Number 3P41HG002223-10S1
Order Number
Award Type Grants
Funding Agency ID 75
Funding Agency Name Department of Health and Human Services
Funding Office Name Not Available
Awarding Agency ID 75
Awarding Agency Name Department of Health and Human Services
Amount of Award $989,492
Funds Invoiced/Received $989,492
Expenditure Amount $989,492
Infrastructure Expenditure Amount $0
Infrastructure Purpose and Rationale Not Reported
Infrastructure Point of Contact Name Not Reported
Infrastructure Point of Contact Email Not Reported
Infrastructure Point of Contact Phone Not Reported
Infrastructure Point of Contact Address Not Reported
Infrastructure Point of Contact City Not Reported
Infrastructure Point of Contact State Not Reported
Infrastructure Point of Contact Zip Not Reported

Product or Service Information (Grants)

Primary Activity Code H02.02
Activity Description Allergy & Immunological Diseases Research

Sub-Awards Information

Sub-awards to Organizations 0
Sub-award Amounts to Organizations $0
Sub-Awards to Individuals 0
Sub-Award Amounts to Individuals $0
Number of Sub-awards less than $25,000/award 0
Amount of Sub-awards less than $25,000/award $0
Number of payments to vendors greater than $25,000 0
Total Amount of payments to vendors greater than $25,000/award $0
Number of payments to vendors less than $25,000/award 10
Total Amount of payments to vendors less than $25,000/award $1,651







Project Location Detail

Location Information
Latitude, Longitude 34° 8' 9", -118° 7' 38"
Congressional District 29
Address 1 1200 E. California Blvd.
Address 2
City Pasadena
County Los Angeles
State CA
Zip 91125-0001