A Novel Algorithm for Finding Interspersed Repeat Regions

The analysis of repeats in DNA sequences is an important subject in bioinformatics. In this paper, we propose a novel projection-assemble algorithm to find unknown interspersed repeats in DNA sequences. The algorithm employs a random projection algorithm to obtain a candidate fragment set, and exh...

Bibliographic Details
Main authors: Li, Dongdong; Wang, Zhengzhi; Ni, Qingshan
Format: Online Article Text
Language: English
Published: Elsevier 2004
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5172473/
https://www.ncbi.nlm.nih.gov/pubmed/15862119
http://dx.doi.org/10.1016/S1672-0229(04)02024-8
collection PubMed
description The analysis of repeats in DNA sequences is an important subject in bioinformatics. In this paper, we propose a novel projection-assemble algorithm to find unknown interspersed repeats in DNA sequences. The algorithm uses a random projection algorithm to obtain a candidate fragment set, then an exhaustive search over each pair of fragments in that set to find potential linkages, and finally assembles the linked fragments. The complexity of the projection-assemble algorithm is nearly linear in the length of the genome sequence, and its memory usage is limited by the hardware. We tested the algorithm on both simulated and real biological data, and the results show that it is efficient. Using this algorithm, we found an unlabeled repeat region that occurs five times in the Escherichia coli genome, with a length of more than 5,000 bp and a mismatch probability of less than 4%.
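The three stages named in the description (random projection, pairwise search, assembly) can be illustrated with a minimal sketch. This is not the authors' implementation: the record gives no parameters or pseudocode, so the function names `candidate_pairs` and `assemble`, the window length, the projection size `k`, the number of trials, and the mismatch threshold are all assumptions chosen for illustration. The assembly step here is a simple merge of overlapping hits on the same diagonal, swapped in as a stand-in for whatever linkage rule the paper uses.

```python
import random
from collections import defaultdict
from itertools import combinations


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def candidate_pairs(seq, window=12, k=8, max_mismatch=1, trials=20, seed=0):
    """Return sorted (i, j) window starts whose windows differ by <= max_mismatch.

    All parameter values are illustrative assumptions, not taken from the paper.
    """
    rng = random.Random(seed)
    pairs = set()
    for _ in range(trials):
        # Random projection: keep only k of the window's positions as the hash key.
        positions = sorted(rng.sample(range(window), k))
        buckets = defaultdict(list)
        for i in range(len(seq) - window + 1):
            key = "".join(seq[i + p] for p in positions)
            buckets[key].append(i)
        # Exhaustive pairwise comparison, restricted to windows sharing a bucket.
        for starts in buckets.values():
            for i, j in combinations(starts, 2):
                if j - i >= window and hamming(seq[i:i + window], seq[j:j + window]) <= max_mismatch:
                    pairs.add((i, j))
    return sorted(pairs)


def assemble(pairs, window):
    """Merge overlapping pairs on the same diagonal (equal j - i) into regions.

    Each region is (start_i, start_j, length).
    """
    regions = []
    for i, j in sorted(pairs, key=lambda p: (p[1] - p[0], p[0])):
        last = regions[-1] if regions else None
        if last and last[1] - last[0] == j - i and i <= last[0] + last[2]:
            last[2] = i + window - last[0]  # extend the current region
        else:
            regions.append([i, j, window])
    return [tuple(r) for r in regions]


# Toy input: an 18-base fragment repeated twice, separated by a 10-base spacer.
repeat = "ACGTTGCAAGCTTAGGCT"
seq = repeat + "T" * 10 + repeat
regions = assemble(candidate_pairs(seq), window=12)
```

On this toy input the two exact copies always share a bucket, so the merged region starting at positions 0 and 28 with length 18 appears in `regions`. Copies that contain mismatches are only caught when a trial's projection happens to skip the mismatching positions, which is why the sketch repeats the projection several times.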
format Online
Article
Text
id pubmed-5172473
institution National Center for Biotechnology Information
language English
publishDate 2004
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-5172473 2016-12-23 A Novel Algorithm for Finding Interspersed Repeat Regions. Li, Dongdong; Wang, Zhengzhi; Ni, Qingshan. Genomics Proteomics Bioinformatics. Letter. Elsevier 2004-08 2016-11-28 /pmc/articles/PMC5172473/ /pubmed/15862119 http://dx.doi.org/10.1016/S1672-0229(04)02024-8 Text en. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
topic Letter