
Discovery and assembly of repeat family pseudomolecules from sparse genomic sequence data using the Assisted Automated Assembler of Repeat Families (AAARF) algorithm

Bibliographic Details
Main authors:	DeBarry, Jeremy D; Liu, Renyi; Bennetzen, Jeffrey L
Format:	Text
Language:	English
Published:	BioMed Central, 2008
Subjects:	Software
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2412881/ https://www.ncbi.nlm.nih.gov/pubmed/18474116 http://dx.doi.org/10.1186/1471-2105-9-235

author	DeBarry, Jeremy D; Liu, Renyi; Bennetzen, Jeffrey L
collection	PubMed
description	BACKGROUND: Higher eukaryotic genomes are typically large, complex and filled with both genes and multiple classes of repetitive DNA. The repetitive DNAs, primarily transposable elements, are a rapidly evolving genome component that can provide the raw material for novel selected functions and also indicate the mechanisms and history of genome evolution in any ancestral lineage. Despite their abundance, universality and significance, studies of genomic repeat content have been largely limited to analyses of the repeats in fully sequenced genomes. RESULTS: To facilitate a broader range of repeat analyses, we developed the Assisted Automated Assembler of Repeat Families (AAARF) algorithm. This program, written in Perl and with numerous adjustable parameters, identifies sequence overlaps in small shotgun sequence datasets and walks them out to create long pseudomolecules representing the most abundant repeats in any genome. Testing in maize indicated that it found and assembled all of the major repeats into one or more pseudomolecules, including coverage of the major Long Terminal Repeat (LTR) retrotransposon families. Both Sanger and 454 sequence datasets were suitable. CONCLUSION: These results indicate that hundreds of higher eukaryotic genomes can now be efficiently characterized for the nature, abundance and evolution of their major repetitive DNA components.
format	Text
id	pubmed-2412881
institution	National Center for Biotechnology Information
language	English
publishDate	2008
publisher	BioMed Central
record_format	MEDLINE/PubMed
journal	BMC Bioinformatics
published	2008-05-13
rights	Copyright © 2008 DeBarry et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Discovery and assembly of repeat family pseudomolecules from sparse genomic sequence data using the Assisted Automated Assembler of Repeat Families (AAARF) algorithm
topic	Software
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2412881/ https://www.ncbi.nlm.nih.gov/pubmed/18474116 http://dx.doi.org/10.1186/1471-2105-9-235
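The abstract describes AAARF's core idea only at a high level: find overlaps among shotgun reads and "walk them out" into long pseudomolecules that represent the most abundant repeats. The Python sketch that follows illustrates that idea under loudly stated simplifying assumptions: overlaps are exact suffix/prefix matches of at least `min_overlap` bases, the seed is the read with the most overlap partners (a crude stand-in for repeat abundance), and the pseudomolecule is extended greedily in both directions. The function names and the `min_overlap` parameter are hypothetical. This is not AAARF's Perl implementation or its actual parameter set.

```python
from collections import Counter
from typing import Callable, List, Set, Tuple

# Hypothetical illustration only: a greedy overlap-extension sketch,
# not the AAARF Perl code. Overlaps are exact (no mismatches allowed).


def overlap_len(left: str, right: str, min_overlap: int) -> int:
    """Length of the longest suffix of `left` equal to a prefix of `right`.

    Returns 0 when no exact overlap of at least `min_overlap` bases exists.
    """
    for n in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:n]):
            return n
    return 0


def overlap_degree(reads: List[str], min_overlap: int) -> Counter:
    """Count overlap partners per read (assumed proxy for repeat abundance)."""
    degree: Counter = Counter()
    for i, a in enumerate(reads):
        for j, b in enumerate(reads):
            if i != j and (overlap_len(a, b, min_overlap) or overlap_len(b, a, min_overlap)):
                degree[i] += 1
    return degree


def best_partner(reads: List[str], used: Set[int],
                 score: Callable[[str], int]) -> Tuple[int, int]:
    """Return (overlap, index) of the unused read with the largest overlap."""
    return max(((score(r), i) for i, r in enumerate(reads) if i not in used),
               default=(0, -1))


def walk_out(reads: List[str], min_overlap: int = 20) -> Tuple[str, Set[int]]:
    """Greedily extend the most-connected read into one pseudomolecule."""
    degree = overlap_degree(reads, min_overlap)
    seed = max(range(len(reads)), key=lambda i: degree[i])
    contig, used = reads[seed], {seed}
    extended = True
    while extended:
        extended = False
        # Right end: contig suffix == read prefix, so append the read's overhang.
        n, i = best_partner(reads, used, lambda r: overlap_len(contig, r, min_overlap))
        if n:
            contig += reads[i][n:]
            used.add(i)
            extended = True
        # Left end: read suffix == contig prefix, so prepend the read's overhang.
        n, i = best_partner(reads, used, lambda r: overlap_len(r, contig, min_overlap))
        if n:
            contig = reads[i][:len(reads[i]) - n] + contig
            used.add(i)
            extended = True
    return contig, used


if __name__ == "__main__":
    import random

    random.seed(0)
    source = "".join(random.choice("ACGT") for _ in range(300))
    reads = [source[s:s + 50] for s in range(0, 251, 10)]  # tiled toy reads
    random.shuffle(reads)
    contig, used = walk_out(reads)
    print(len(contig), contig == source)
```

The all-versus-all overlap scan is quadratic in the number of reads, so a realistic tool would need an indexed similarity search. Real repeat copies also differ by mutations, which is why exact matching here is a deliberate simplification rather than a faithful model of repeat assembly.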
