AGOUTI: improving genome assembly and annotation using transcriptome data

Bibliographic Details
Main authors: Zhang, Simo V., Zhuo, Luting, Hahn, Matthew W.
Format: Online article, text
Language: English
Published: BioMed Central, 2016
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4952227/
https://www.ncbi.nlm.nih.gov/pubmed/27435057
http://dx.doi.org/10.1186/s13742-016-0136-3
Collection: PubMed

Abstract

BACKGROUND: Genomes sequenced using short-read, next-generation sequencing technologies can have many errors and may be fragmented into thousands of small contigs. These incomplete and fragmented assemblies lead to errors in gene identification, such that single genes spread across multiple contigs are annotated as separate gene models. Such biases can confound inferences about the number and identity of genes within species, as well as gene gain and loss between species.

RESULTS: We present AGOUTI (Annotated Genome Optimization Using Transcriptome Information), a tool that uses RNA sequencing data to simultaneously combine contigs into scaffolds and fragmented gene models into single models. We show that AGOUTI improves both the contiguity of genome assemblies and the accuracy of gene annotation, providing updated versions of each as output. Running AGOUTI on both simulated and real datasets, we show that it is highly accurate and that it achieves greater accuracy and contiguity when compared with other existing methods.

CONCLUSION: AGOUTI is a powerful and effective scaffolder and, unlike most scaffolders, is expected to be more effective in larger genomes because of the commensurate increase in intron length. AGOUTI is able to scaffold thousands of contigs while simultaneously reducing the number of gene models by hundreds or thousands. The software is available free of charge under the MIT license.

ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13742-016-0136-3) contains supplementary material, which is available to authorized users.
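The abstract describes transcript-based scaffolding only at a high level. The Python sketch below illustrates the general idea under stated assumptions; it is not AGOUTI's implementation. It assumes the RNA-seq evidence consists of paired-end reads whose two mates map to gene fragments on different contigs. The input format (one tuple per read pair giving each mate's contig and gene model), the `min_support` threshold, greedy strongest-link-first joining, and the function names `count_links` and `build_scaffolds` are all hypothetical. Contig orientation and gap sizes are ignored.

```python
from collections import Counter, defaultdict


def count_links(pairs):
    """Count read pairs linking gene fragments on two different contigs.

    Each pair is (contig_a, gene_a, contig_b, gene_b), giving the contig and
    gene model hit by each mate (a hypothetical input format).
    """
    links = Counter()
    for ca, ga, cb, gb in pairs:
        if ca == cb:
            continue  # both mates on one contig: no evidence for a join
        # Canonical ordering so A-B and B-A links are counted together.
        key = tuple(sorted([(ca, ga), (cb, gb)]))
        links[key] += 1
    return links


def build_scaffolds(contigs, links, min_support=3):
    """Greedily join contigs into linear chains, strongest links first.

    Returns (scaffolds, merged_genes): each scaffold is an ordered list of
    contig names; each merged_genes entry is a pair of gene fragments that
    an accepted join unites. Orientation and gap sizes are ignored.
    """
    parent = {}

    def find(x):
        # Union-find root lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    neighbors = defaultdict(list)
    merged_genes = []
    for ((ca, ga), (cb, gb)), n in sorted(links.items(), key=lambda kv: -kv[1]):
        if n < min_support:
            continue  # too few read pairs to trust this link
        if len(neighbors[ca]) >= 2 or len(neighbors[cb]) >= 2:
            continue  # a linear contig has only two ends to join
        if find(ca) == find(cb):
            continue  # contigs already share a scaffold; joining makes a cycle
        parent[find(ca)] = find(cb)
        neighbors[ca].append(cb)
        neighbors[cb].append(ca)
        merged_genes.append((ga, gb))

    # Walk each chain from an end (a contig with fewer than two neighbours).
    scaffolds, seen = [], set()
    for c in contigs:
        if c in seen or len(neighbors[c]) == 2:
            continue
        chain, prev, cur = [], None, c
        while cur is not None:
            chain.append(cur)
            seen.add(cur)
            nxt = [m for m in neighbors[cur] if m != prev]
            prev, cur = cur, (nxt[0] if nxt else None)
        scaffolds.append(chain)
    return scaffolds, merged_genes


if __name__ == "__main__":
    pairs = (
        [("c1", "g1", "c2", "g2")] * 4
        + [("c3", "g3", "c2", "g2")] * 3
        + [("c1", "g1", "c4", "g4")]      # single read pair: below threshold
        + [("c1", "g1", "c1", "g1")] * 5  # same-contig pairs are ignored
    )
    scaffolds, merged = build_scaffolds(["c1", "c2", "c3", "c4"], count_links(pairs))
    print(scaffolds)  # [['c1', 'c2', 'c3'], ['c4']]
    print(merged)     # [('g1', 'g2'), ('g2', 'g3')]
```

The sketch joins the best-supported links first and rejects any join that would give a contig more than two neighbours or close a cycle, so every scaffold stays a simple linear chain. Each accepted join also records which two gene fragments become one model. A real scaffolder would additionally have to check mate orientation and the consistency of the gene models it merges.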

Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed

Journal: Gigascience, Technical Note. BioMed Central, published 2016-07-19.

© The Author(s) 2016. Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.