A method for automatically extracting infectious disease-related primers and probes from the literature

BACKGROUND: Primer and probe sequences are the main components of nucleic acid-based detection systems. Biologists use primers and probes for different tasks, some related to the diagnosis and prescription of infectious diseases. The biological literature is the main information source for empirical...

Bibliographic Details
Main authors: García-Remesal, Miguel; Cuevas, Alejandro; López-Alonso, Victoria; López-Campos, Guillermo; de la Calle, Guillermo; de la Iglesia, Diana; Pérez-Rey, David; Crespo, José; Martín-Sánchez, Fernando; Maojo, Víctor
Format: Text
Language: English
Published: BioMed Central 2010
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2923139/
https://www.ncbi.nlm.nih.gov/pubmed/20682041
http://dx.doi.org/10.1186/1471-2105-11-410
_version_ 1782185483029184512
author García-Remesal, Miguel
Cuevas, Alejandro
López-Alonso, Victoria
López-Campos, Guillermo
de la Calle, Guillermo
de la Iglesia, Diana
Pérez-Rey, David
Crespo, José
Martín-Sánchez, Fernando
Maojo, Víctor
author_sort García-Remesal, Miguel
collection PubMed
description BACKGROUND: Primer and probe sequences are the main components of nucleic acid-based detection systems. Biologists use primers and probes for different tasks, some related to the diagnosis and prescription of infectious diseases. The biological literature is the main information source for empirically validated primer and probe sequences. Therefore, it is becoming increasingly important for researchers to navigate this important information. In this paper, we present a four-phase method for extracting and annotating primer/probe sequences from the literature. These phases are: (1) convert each document into a tree of paper sections, (2) detect the candidate sequences using a set of finite state machine-based recognizers, (3) refine problem sequences using a rule-based expert system, and (4) annotate the extracted sequences with their related organism/gene information. RESULTS: We tested our approach using a test set composed of 297 manuscripts. The extracted sequences and their organism/gene annotations were manually evaluated by a panel of molecular biologists. The results of the evaluation show that our approach is suitable for automatically extracting DNA sequences, achieving precision/recall rates of 97.98% and 95.77%, respectively. In addition, 76.66% of the detected sequences were correctly annotated with their organism name. The system also provided correct gene-related information for 46.18% of the sequences assigned a correct organism name. CONCLUSIONS: We believe that the proposed method can facilitate routine tasks for biomedical researchers using molecular methods to diagnose and prescribe different infectious diseases. In addition, the proposed method can be expanded to detect and extract other biological sequences from the literature. The extracted information can also be used to readily update available primer/probe databases or to create new databases from scratch.
format Text
id pubmed-2923139
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2923139 2010-08-18 A method for automatically extracting infectious disease-related primers and probes from the literature García-Remesal, Miguel Cuevas, Alejandro López-Alonso, Victoria López-Campos, Guillermo de la Calle, Guillermo de la Iglesia, Diana Pérez-Rey, David Crespo, José Martín-Sánchez, Fernando Maojo, Víctor BMC Bioinformatics Methodology Article BioMed Central 2010-08-03 /pmc/articles/PMC2923139/ /pubmed/20682041 http://dx.doi.org/10.1186/1471-2105-11-410 Text en Copyright ©2010 García-Remesal et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title A method for automatically extracting infectious disease-related primers and probes from the literature
title_sort method for automatically extracting infectious disease-related primers and probes from the literature
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2923139/
https://www.ncbi.nlm.nih.gov/pubmed/20682041
http://dx.doi.org/10.1186/1471-2105-11-410
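
Phase (2) of the method described in the abstract detects candidate sequences with finite state machine-based recognizers. The record does not reproduce those recognizers, so the following is only a minimal illustrative sketch of the general idea, not the authors' implementation. The function name `find_candidate_sequences`, the two-state design, the accepted alphabet, the separator set, and the minimum length of 15 are all assumptions made for this example.

```python
from typing import List

# All three constants are assumptions for illustration, not values from the paper.
NUCLEOTIDES = set("ACGTU")  # plain bases only; IUPAC ambiguity codes are ignored
SEPARATORS = set(" -")      # blanks or hyphens that may split a printed sequence
MIN_LENGTH = 15             # shorter runs are treated as noise


def find_candidate_sequences(text: str, min_length: int = MIN_LENGTH) -> List[str]:
    """Scan text with a two-state machine (OUTSIDE / INSIDE a sequence)
    and return the candidate nucleotide sequences that are long enough."""
    candidates: List[str] = []
    bases: List[str] = []   # bases collected during the current run
    in_sequence = False     # False = OUTSIDE state, True = INSIDE state

    def flush() -> None:
        # Keep the current run only if it reaches the minimum length.
        if len(bases) >= min_length:
            candidates.append("".join(bases))
        bases.clear()

    for ch in text:
        if ch in NUCLEOTIDES:
            # A base either starts a run or extends the current one.
            bases.append(ch)
            in_sequence = True
        elif in_sequence and ch in SEPARATORS:
            # Separators inside a run are skipped without ending it.
            continue
        else:
            # Any other character ends the run.
            flush()
            in_sequence = False
    flush()  # handle a run that reaches the end of the text
    return candidates


if __name__ == "__main__":
    sample = "Forward primer 5'-ATGGCTAGCTAGGCTA ACGTAGC-3' was used."
    print(find_candidate_sequences(sample))
```

A recognizer this simple will occasionally absorb a stray uppercase letter that precedes a sequence, which is the kind of problem case a later refinement step, such as the rule-based phase (3) mentioned in the abstract, would be expected to handle.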