
A clustering method for repeat analysis in DNA sequences

BACKGROUND: A computational system for analysis of the repetitive structure of genomic sequences is described. The method uses suffix trees to organize and search the input sequences; this data structure has been used previously for efficient computation of exact and degenerate repeats. RESULTS: The...


Bibliographic Details
Main authors:	Volfovsky, Natalia; Haas, Brian J; Salzberg, Steven L
Format:	Text
Language:	English
Published:	BioMed Central 2001
Subjects:	Research
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC55324/ https://www.ncbi.nlm.nih.gov/pubmed/11532211

author	Volfovsky, Natalia; Haas, Brian J; Salzberg, Steven L
collection	PubMed
description	BACKGROUND: A computational system for analysis of the repetitive structure of genomic sequences is described. The method uses suffix trees to organize and search the input sequences; this data structure has been used previously for efficient computation of exact and degenerate repeats. RESULTS: The resulting software tool collects all repeat classes and outputs summary statistics as well as a multi-FASTA file (a single file containing multiple sequences) that can be used as the target of searches. Its use is demonstrated here on several complete microbial genomes, the entire Arabidopsis thaliana genome, and a large collection of rice bacterial artificial chromosome end sequences. CONCLUSIONS: We propose a new clustering method for analysis of the repeat data captured in suffix trees. This method has been incorporated into a system that can find repeats in individual genome sequences or sets of sequences, and that can organize those repeats into classes. It quickly and accurately creates repeat databases from small and large genomes. The associated software (RepeatFinder) should prove helpful in the analysis of repeat structure for both complete and partial genome sequences.
format	Text
id	pubmed-55324
institution	National Center for Biotechnology Information
language	English
publishDate	2001
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-55324 2001-09-10 A clustering method for repeat analysis in DNA sequences Volfovsky, Natalia; Haas, Brian J; Salzberg, Steven L Genome Biol Research BioMed Central 2001 2001-08-01 /pmc/articles/PMC55324/ /pubmed/11532211 Text en Copyright © 2001 Volfovsky et al., licensee BioMed Central Ltd
title	A clustering method for repeat analysis in DNA sequences
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC55324/ https://www.ncbi.nlm.nih.gov/pubmed/11532211
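
The abstract describes finding exact repeats with a suffix-based index and then organizing them into classes. The sketch below illustrates that general idea only; it is not the RepeatFinder implementation. It swaps in a naive suffix array for the suffix tree, and its grouping rule (occurrences that share a common prefix of at least `min_len` bases fall into one class) is a simplifying assumption rather than the paper's clustering method. The names `suffix_array`, `lcp`, `repeat_classes`, and `min_len` are hypothetical.

```python
def suffix_array(seq):
    # Naive O(n^2 log n) construction; adequate for a toy input only.
    return sorted(range(len(seq)), key=lambda i: seq[i:])


def lcp(seq, i, j):
    # Length of the longest common prefix of the suffixes starting at i and j.
    n = 0
    while i + n < len(seq) and j + n < len(seq) and seq[i + n] == seq[j + n]:
        n += 1
    return n


def repeat_classes(seq, min_len):
    """Group start positions of exact repeats of length >= min_len.

    Assumption for illustration: two positions belong to the same class
    when their suffixes are neighbours in the suffix array and share at
    least min_len leading bases. Classes are built with union-find.
    """
    sa = suffix_array(seq)
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Suffixes sharing a prefix are contiguous in the suffix array, so
    # checking adjacent pairs is enough to connect every occurrence.
    for a, b in zip(sa, sa[1:]):
        if lcp(seq, a, b) >= min_len:
            parent[find(a)] = find(b)

    groups = {}
    for pos in list(parent):
        groups.setdefault(find(pos), []).append(pos)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    # Three copies of GATTACA separated by short spacers.
    print(repeat_classes("GATTACACCGATTACATGGATTACA", 7))
```

With a lower `min_len`, shifted copies of the same repeat (for example the occurrences starting one base later) form their own classes in this sketch; merging overlapping occurrences is left out to keep the example short.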