
The DAWGPAWS pipeline for the annotation of genes and transposable elements in plant genomes

BACKGROUND: High quality annotation of the genes and transposable elements in complex genomes requires a human-curated integration of multiple sources of computational evidence. These evidences include results from a diversity of ab initio prediction programs as well as homology-based searches. Most...


Bibliographic Details
Main Authors: Estill, James C; Bennetzen, Jeffrey L
Format: Text
Language: English
Published: BioMed Central 2009
Subjects: Software
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2705364/
https://www.ncbi.nlm.nih.gov/pubmed/19545381
http://dx.doi.org/10.1186/1746-4811-5-8
author Estill, James C
Bennetzen, Jeffrey L
collection PubMed
description BACKGROUND: High quality annotation of the genes and transposable elements in complex genomes requires a human-curated integration of multiple sources of computational evidence. These evidences include results from a diversity of ab initio prediction programs as well as homology-based searches. Most of these programs operate on a single contiguous sequence at a time, and the results are generated in a diverse array of readable formats that must be translated to a standardized file format. These translated results must then be concatenated into a single source, and then presented in an integrated form for human curation. RESULTS: We have designed, implemented, and assessed a Perl-based workflow named DAWGPAWS for the generation of computational results for human curation of the genes and transposable elements in plant genomes. The use of DAWGPAWS was found to accelerate annotation of 80–200 kb wheat DNA inserts in bacterial artificial chromosome (BAC) vectors by approximately twenty-fold and to also significantly improve the quality of the annotation in terms of completeness and accuracy. CONCLUSION: The DAWGPAWS genome annotation pipeline fills an important need in the annotation of plant genomes by generating computational evidences in a high throughput manner, translating these results to a common file format, and facilitating the human curation of these computational results. We have verified the value of DAWGPAWS by using this pipeline to annotate the genes and transposable elements in 220 BAC insertions from the hexaploid wheat genome (Triticum aestivum L.). DAWGPAWS can be applied to annotation efforts in other plant genomes with minor modifications of program-specific configuration files, and the modular design of the workflow facilitates integration into existing pipelines.
format Text
id pubmed-2705364
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2705364 2009-07-03 The DAWGPAWS pipeline for the annotation of genes and transposable elements in plant genomes Estill, James C Bennetzen, Jeffrey L Plant Methods Software
BioMed Central 2009-06-19 /pmc/articles/PMC2705364/ /pubmed/19545381 http://dx.doi.org/10.1186/1746-4811-5-8 Text en Copyright © 2009 Estill and Bennetzen; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title The DAWGPAWS pipeline for the annotation of genes and transposable elements in plant genomes
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2705364/
https://www.ncbi.nlm.nih.gov/pubmed/19545381
http://dx.doi.org/10.1186/1746-4811-5-8