A Novel Algorithm for Finding Interspersed Repeat Regions
The analysis of repeats in DNA sequences is an important subject in bioinformatics. In this paper, we propose a novel projection-assemble algorithm to find unknown interspersed repeats in DNA sequences. The algorithm employs a random projection algorithm to obtain a candidate fragment set, and exh...
Main authors: | Li, Dongdong; Wang, Zhengzhi; Ni, Qingshan |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Elsevier 2004 |
Subjects: | Letter |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5172473/ https://www.ncbi.nlm.nih.gov/pubmed/15862119 http://dx.doi.org/10.1016/S1672-0229(04)02024-8 |
_version_ | 1782484134805897216 |
---|---|
author | Li, Dongdong Wang, Zhengzhi Ni, Qingshan |
collection | PubMed |
description | The analysis of repeats in DNA sequences is an important subject in bioinformatics. In this paper, we propose a novel projection-assemble algorithm to find unknown interspersed repeats in DNA sequences. The algorithm uses a random projection algorithm to obtain a candidate fragment set, then exhaustively compares each pair of fragments in that set to find potential linkages, and finally assembles the linked fragments. The complexity of the projection-assemble algorithm is nearly linear in the length of the genome sequence, and its memory usage is limited by the hardware. We tested the algorithm on both simulated and real biological data, and the results show that it is efficient. Using this algorithm, we found an unlabeled repeat region that occurs five times in the Escherichia coli genome, with a length of more than 5,000 bp and a mismatch probability of less than 4%. |
format | Online Article Text |
id | pubmed-5172473 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2004 |
publisher | Elsevier |
record_format | MEDLINE/PubMed |
spelling | pubmed-5172473 2016-12-23 A Novel Algorithm for Finding Interspersed Repeat Regions Li, Dongdong Wang, Zhengzhi Ni, Qingshan Genomics Proteomics Bioinformatics Letter Elsevier 2004-08 2016-11-28 /pmc/articles/PMC5172473/ /pubmed/15862119 http://dx.doi.org/10.1016/S1672-0229(04)02024-8 Text en http://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). |
title | A Novel Algorithm for Finding Interspersed Repeat Regions |
topic | Letter |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5172473/ https://www.ncbi.nlm.nih.gov/pubmed/15862119 http://dx.doi.org/10.1016/S1672-0229(04)02024-8 |
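The abstract in this record describes a random projection step that produces candidate fragments, followed by a pairwise comparison of those candidates. The record gives no implementation details, so the following is only a minimal Python sketch of the general random projection idea, not the authors' algorithm. The function names (`hamming`, `candidate_pairs`) and all parameters (`l`, `k`, `rounds`, `max_mismatch`, `seed`) are hypothetical, and the assembly step is not shown.

```python
import random
from collections import defaultdict
from itertools import combinations


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def candidate_pairs(seq, l=12, k=8, rounds=10, max_mismatch=1, seed=0):
    """Return start-position pairs (i, j), i < j, of similar length-l fragments.

    Assumed scheme: in each round, pick k of the l positions at random,
    bucket every length-l window by the characters at those positions,
    then compare only windows that share a bucket.
    """
    rng = random.Random(seed)
    found = set()
    for _ in range(rounds):
        positions = sorted(rng.sample(range(l), k))
        buckets = defaultdict(list)
        for i in range(len(seq) - l + 1):
            key = "".join(seq[i + p] for p in positions)
            buckets[key].append(i)
        # Exhaustive pairwise check, restricted to windows in the same bucket.
        for starts in buckets.values():
            for i, j in combinations(starts, 2):
                # Skip overlapping windows; keep pairs within the mismatch bound.
                if j - i >= l and hamming(seq[i:i + l], seq[j:j + l]) <= max_mismatch:
                    found.add((i, j))
    return found


if __name__ == "__main__":
    unit = "ACGTTGCAAGCT"
    sequence = unit + "TTTTTTTT" + unit  # exact repeat at positions 0 and 20
    print(sorted(candidate_pairs(sequence)))
```

Windows that differ at only a few positions fall into the same bucket whenever the sampled positions happen to avoid the mismatches, which is why several rounds are used. The pairwise check inside a bucket is quadratic in the bucket size, so a long low-complexity region would need extra handling in practice.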