
Integrating Overlapping Structures and Background Information of Words Significantly Improves Biological Sequence Comparison


Bibliographic Details
Main authors: Dai, Qi; Li, Lihua; Liu, Xiaoqing; Yao, Yuhua; Zhao, Fukun; Zhang, Michael
Format: Online, Article, Text
Language: English
Published: Public Library of Science 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3213098/
https://www.ncbi.nlm.nih.gov/pubmed/22102867
http://dx.doi.org/10.1371/journal.pone.0026779
collection PubMed
description Word-based models have achieved promising results in sequence comparison. However, the overlapping structures and the background information of words are important statistical properties of words in biological sequences, and how to use them to improve sequence comparison remains an open problem. This paper proposes a new statistical method that integrates the overlapping structures and the background information of words in biological sequences. To assess how well this integration works for sequence comparison, the proposed model was tested in two sets of evaluation experiments. The first, based on receiver operating characteristic (ROC) curve analysis, applies the method to discriminating functionally related regulatory sequences from unrelated sequences, and introns from exons. The second evaluates the method, using the F-measure, on clustering Hepatitis E virus genotypes. The results demonstrate that integrating the overlapping structures and the background information of words significantly improves biological sequence comparison, and that the proposed method outperforms existing models.
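The record does not give the authors' formulas. As a rough illustration of what a word-based comparison with a background correction can look like, here is a minimal Python sketch. It is not the method of Dai et al. It assumes uppercase DNA over the alphabet `ACGT`, counts overlapping k-words, standardizes each count against its expectation under a zero-order (independent bases) background model estimated from the sequence itself, and compares the standardized vectors with a cosine distance. All function names are hypothetical.

```python
from collections import Counter
from itertools import product
from math import sqrt

ALPHABET = "ACGT"  # assumption: uppercase DNA sequences


def overlapping_counts(seq: str, k: int) -> Counter:
    """Count every length-k word, sliding one base at a time so occurrences may overlap."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def background_expected(seq: str, k: int) -> dict:
    """Expected count of each k-word under a zero-order (independent bases)
    background model whose base frequencies are estimated from seq itself.
    This background model is an illustrative assumption, not the paper's."""
    n = len(seq)
    base_freq = {b: seq.count(b) / n for b in ALPHABET}
    positions = n - k + 1
    expected = {}
    for letters in product(ALPHABET, repeat=k):
        p = 1.0
        for b in letters:
            p *= base_freq[b]
        expected["".join(letters)] = positions * p
    return expected


def standardized_vector(seq: str, k: int) -> list:
    """(observed - expected) / sqrt(expected) for every k-word, in sorted word order."""
    counts = overlapping_counts(seq, k)
    expected = background_expected(seq, k)
    vec = []
    for word in sorted(expected):
        e = expected[word]
        # Words the background model says cannot occur contribute 0.
        vec.append((counts[word] - e) / sqrt(e) if e > 0 else 0.0)
    return vec


def cosine_distance(u: list, v: list) -> float:
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


if __name__ == "__main__":
    s1 = "ACGTTGCAACGGTACCGTAG"
    s2 = "GGGCCCATATATGCGC"
    d = cosine_distance(standardized_vector(s1, 2), standardized_vector(s2, 2))
    print(f"distance between s1 and s2 (k=2): {d:.3f}")
```

The plain `(observed - expected) / sqrt(expected)` standardization ignores the extra count variance of self-overlapping words such as `AAA`, whose occurrences cluster. Accounting for overlap structure of this kind is presumably the sort of issue the abstract refers to, so a faithful reproduction of the paper would need its actual statistic.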
id pubmed-3213098
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-3213098 (2011-11-18). PLoS One, Research Article. Public Library of Science, 2011-11-10. Text, en. Dai et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
topic Research Article