
Assessing De Novo transcriptome assembly metrics for consistency and utility

BACKGROUND: Transcriptome sequencing and assembly represent a great resource for the study of non-model species, and many metrics have been used to evaluate and compare these assemblies. Unfortunately, it is still unclear which of these metrics accurately reflect assembly quality. RESULTS: We simula...


Bibliographic Details
Main Authors: O’Neil, Shawn T; Emrich, Scott J
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3733778/
https://www.ncbi.nlm.nih.gov/pubmed/23837739
http://dx.doi.org/10.1186/1471-2164-14-465
author O’Neil, Shawn T
Emrich, Scott J
collection PubMed
description BACKGROUND: Transcriptome sequencing and assembly represent a great resource for the study of non-model species, and many metrics have been used to evaluate and compare these assemblies. Unfortunately, it is still unclear which of these metrics accurately reflect assembly quality. RESULTS: We simulated sequencing transcripts of Drosophila melanogaster. By assembling these simulated reads using both a “perfect” and a modern transcriptome assembler while varying read length and sequencing depth, we evaluated quality metrics to determine whether they 1) revealed perfect assemblies to be of higher quality, and 2) revealed perfect assemblies to be more complete as data quantity increased. Several commonly used metrics were not consistent with these expectations, including average contig coverage and length, though they became consistent when singletons were included in the analysis. We found several annotation-based metrics to be consistent and informative, including contig reciprocal best hit count and contig unique annotation count. Finally, we evaluated a number of novel metrics such as reverse annotation count, contig collapse factor, and the ortholog hit ratio, discovering that each assesses assembly quality in unique ways. CONCLUSIONS: Although much attention has been given to transcriptome assembly, little research has focused on determining how best to evaluate assemblies, particularly in light of the variety of options available for read length and sequencing depth. Our results provide an important review of these metrics and give researchers tools to produce the highest quality transcriptome assemblies.
format Online
Article
Text
id pubmed-3733778
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3733778 2013-08-06 Assessing De Novo transcriptome assembly metrics for consistency and utility O’Neil, Shawn T Emrich, Scott J BMC Genomics Research Article BioMed Central 2013-07-09 /pmc/articles/PMC3733778/ /pubmed/23837739 http://dx.doi.org/10.1186/1471-2164-14-465 Text en Copyright © 2013 O’Neil and Emrich; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Assessing De Novo transcriptome assembly metrics for consistency and utility
topic Research Article