A method for automatically extracting infectious disease-related primers and probes from the literature
BACKGROUND: Primer and probe sequences are the main components of nucleic acid-based detection systems. Biologists use primers and probes for different tasks, some related to the diagnosis and prescription of infectious diseases. The biological literature is the main information source for empirical...
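Main authors: | García-Remesal, Miguel; Cuevas, Alejandro; López-Alonso, Victoria; López-Campos, Guillermo; de la Calle, Guillermo; de la Iglesia, Diana; Pérez-Rey, David; Crespo, José; Martín-Sánchez, Fernando; Maojo, Víctor |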
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2010 |
Subjects: | Methodology Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2923139/ https://www.ncbi.nlm.nih.gov/pubmed/20682041 http://dx.doi.org/10.1186/1471-2105-11-410 |
_version_ | 1782185483029184512 |
---|---|
author | García-Remesal, Miguel; Cuevas, Alejandro; López-Alonso, Victoria; López-Campos, Guillermo; de la Calle, Guillermo; de la Iglesia, Diana; Pérez-Rey, David; Crespo, José; Martín-Sánchez, Fernando; Maojo, Víctor |
author_facet | García-Remesal, Miguel; Cuevas, Alejandro; López-Alonso, Victoria; López-Campos, Guillermo; de la Calle, Guillermo; de la Iglesia, Diana; Pérez-Rey, David; Crespo, José; Martín-Sánchez, Fernando; Maojo, Víctor |
author_sort | García-Remesal, Miguel |
collection | PubMed |
description | BACKGROUND: Primer and probe sequences are the main components of nucleic acid-based detection systems. Biologists use primers and probes for different tasks, some related to the diagnosis and prescription of infectious diseases. The biological literature is the main information source for empirically validated primer and probe sequences. Therefore, it is becoming increasingly important for researchers to navigate this important information. In this paper, we present a four-phase method for extracting and annotating primer/probe sequences from the literature. These phases are: (1) convert each document into a tree of paper sections, (2) detect the candidate sequences using a set of finite state machine-based recognizers, (3) refine problem sequences using a rule-based expert system, and (4) annotate the extracted sequences with their related organism/gene information. RESULTS: We tested our approach using a test set composed of 297 manuscripts. The extracted sequences and their organism/gene annotations were manually evaluated by a panel of molecular biologists. The results of the evaluation show that our approach is suitable for automatically extracting DNA sequences, achieving precision/recall rates of 97.98% and 95.77%, respectively. In addition, 76.66% of the detected sequences were correctly annotated with their organism name. The system also provided correct gene-related information for 46.18% of the sequences assigned a correct organism name. CONCLUSIONS: We believe that the proposed method can facilitate routine tasks for biomedical researchers using molecular methods to diagnose and prescribe different infectious diseases. In addition, the proposed method can be expanded to detect and extract other biological sequences from the literature. The extracted information can also be used to readily update available primer/probe databases or to create new databases from scratch. |
format | Text |
id | pubmed-2923139 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2010 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-29231392010-08-18 A method for automatically extracting infectious disease-related primers and probes from the literature García-Remesal, Miguel Cuevas, Alejandro López-Alonso, Victoria López-Campos, Guillermo de la Calle, Guillermo de la Iglesia, Diana Pérez-Rey, David Crespo, José Martín-Sánchez, Fernando Maojo, Víctor BMC Bioinformatics Methodology Article BACKGROUND: Primer and probe sequences are the main components of nucleic acid-based detection systems. Biologists use primers and probes for different tasks, some related to the diagnosis and prescription of infectious diseases. The biological literature is the main information source for empirically validated primer and probe sequences. Therefore, it is becoming increasingly important for researchers to navigate this important information. In this paper, we present a four-phase method for extracting and annotating primer/probe sequences from the literature. These phases are: (1) convert each document into a tree of paper sections, (2) detect the candidate sequences using a set of finite state machine-based recognizers, (3) refine problem sequences using a rule-based expert system, and (4) annotate the extracted sequences with their related organism/gene information. RESULTS: We tested our approach using a test set composed of 297 manuscripts. The extracted sequences and their organism/gene annotations were manually evaluated by a panel of molecular biologists. The results of the evaluation show that our approach is suitable for automatically extracting DNA sequences, achieving precision/recall rates of 97.98% and 95.77%, respectively. In addition, 76.66% of the detected sequences were correctly annotated with their organism name. The system also provided correct gene-related information for 46.18% of the sequences assigned a correct organism name. 
CONCLUSIONS: We believe that the proposed method can facilitate routine tasks for biomedical researchers using molecular methods to diagnose and prescribe different infectious diseases. In addition, the proposed method can be expanded to detect and extract other biological sequences from the literature. The extracted information can also be used to readily update available primer/probe databases or to create new databases from scratch. BioMed Central 2010-08-03 /pmc/articles/PMC2923139/ /pubmed/20682041 http://dx.doi.org/10.1186/1471-2105-11-410 Text en Copyright ©2010 García-Remesal et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Methodology Article García-Remesal, Miguel Cuevas, Alejandro López-Alonso, Victoria López-Campos, Guillermo de la Calle, Guillermo de la Iglesia, Diana Pérez-Rey, David Crespo, José Martín-Sánchez, Fernando Maojo, Víctor A method for automatically extracting infectious disease-related primers and probes from the literature |
title | A method for automatically extracting infectious disease-related primers and probes from the literature |
title_full | A method for automatically extracting infectious disease-related primers and probes from the literature |
title_fullStr | A method for automatically extracting infectious disease-related primers and probes from the literature |
title_full_unstemmed | A method for automatically extracting infectious disease-related primers and probes from the literature |
title_short | A method for automatically extracting infectious disease-related primers and probes from the literature |
title_sort | method for automatically extracting infectious disease-related primers and probes from the literature |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2923139/ https://www.ncbi.nlm.nih.gov/pubmed/20682041 http://dx.doi.org/10.1186/1471-2105-11-410 |
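
The candidate-detection phase and the precision/recall figures mentioned in the description can be illustrated with a minimal sketch. This is not the authors' implementation: a regular expression stands in for their finite state machine-based recognizers, and the alphabet (`IUPAC_BASES`), the length threshold (`MIN_LENGTH`), and both function names are assumptions made only for illustration.

```python
import re

# Assumption: candidates are uninterrupted runs of IUPAC nucleotide codes.
# The recognizers in the paper may accept a different alphabet or layout.
IUPAC_BASES = "ACGTURYSWKMBDHVN"
# Hypothetical minimum length used to skip short uppercase tokens such as "DNA".
MIN_LENGTH = 15

# A regular expression is equivalent to a finite automaton; the lookarounds
# reject runs that are embedded in a longer alphabetic word.
_CANDIDATE = re.compile(
    r"(?<![A-Za-z])[%s]{%d,}(?![A-Za-z])" % (IUPAC_BASES, MIN_LENGTH)
)


def find_candidates(text):
    """Return every candidate sequence found in text, in order of appearance."""
    return [match.group(0) for match in _CANDIDATE.finditer(text)]


def precision_recall(extracted, gold):
    """Compute precision and recall of extracted items against a gold set."""
    extracted_set, gold_set = set(extracted), set(gold)
    true_positives = len(extracted_set & gold_set)
    precision = true_positives / len(extracted_set) if extracted_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    return precision, recall


if __name__ == "__main__":
    sentence = "The forward primer 5'-AGAGTTTGATCCTGGCTCAG-3' was used with DNA polymerase."
    print(find_candidates(sentence))
```

A real recognizer would also need to handle sequences broken by spaces, hyphens, or line wraps, which is presumably where the refinement phase in the description comes in; this sketch makes no attempt at that.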