Text-mining assisted regulatory annotation
BACKGROUND: Decoding transcriptional regulatory networks and the genomic cis-regulatory logic implemented in their control nodes is a fundamental challenge in genome biology. High-throughput computational and experimental analyses of regulatory networks and sequences rely heavily on positive control...
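Main authors: | Aerts, Stein; Haeussler, Maximilian; van Vooren, Steven; Griffith, Obi L; Hulpiau, Paco; Jones, Steven JM; Montgomery, Stephen B; Bergman, Casey M |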
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2008 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2374703/ https://www.ncbi.nlm.nih.gov/pubmed/18271954 http://dx.doi.org/10.1186/gb-2008-9-2-r31 |
_version_ | 1782154516092682240 |
---|---|
author | Aerts, Stein; Haeussler, Maximilian; van Vooren, Steven; Griffith, Obi L; Hulpiau, Paco; Jones, Steven JM; Montgomery, Stephen B; Bergman, Casey M |
author_sort | Aerts, Stein |
collection | PubMed |
description | BACKGROUND: Decoding transcriptional regulatory networks and the genomic cis-regulatory logic implemented in their control nodes is a fundamental challenge in genome biology. High-throughput computational and experimental analyses of regulatory networks and sequences rely heavily on positive control data from prior small-scale experiments, but the vast majority of previously discovered regulatory data remains locked in the biomedical literature. RESULTS: We develop text-mining strategies to identify relevant publications and extract sequence information to assist the regulatory annotation process. Using a vector space model to identify Medline abstracts from papers likely to have high cis-regulatory content, we demonstrate that document relevance ranking can assist the curation of transcriptional regulatory networks and estimate that, minimally, 30,000 papers harbor unannotated cis-regulatory data. In addition, we show that DNA sequences can be extracted from primary text with high cis-regulatory content and mapped to genome sequences as a means of identifying the location, organism and target gene information that is critical to the cis-regulatory annotation process. CONCLUSION: Our results demonstrate that text-mining technologies can be successfully integrated with genome annotation systems, thereby increasing the availability of annotated cis-regulatory data needed to catalyze advances in the field of gene regulation. |
format | Text |
id | pubmed-2374703 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2008 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-2374703 2008-05-09 Text-mining assisted regulatory annotation. Aerts, Stein; Haeussler, Maximilian; van Vooren, Steven; Griffith, Obi L; Hulpiau, Paco; Jones, Steven JM; Montgomery, Stephen B; Bergman, Casey M. Genome Biol. Research. BioMed Central 2008 2008-02-13 /pmc/articles/PMC2374703/ /pubmed/18271954 http://dx.doi.org/10.1186/gb-2008-9-2-r31 Text en Copyright © 2008 Aerts et al.; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Text-mining assisted regulatory annotation |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2374703/ https://www.ncbi.nlm.nih.gov/pubmed/18271954 http://dx.doi.org/10.1186/gb-2008-9-2-r31 |