GenomeQC: a quality assessment tool for genome assemblies and gene structure annotations

BACKGROUND: Genome assemblies are foundational for understanding the biology of a species. They provide a physical framework for mapping additional sequences, thereby enabling characterization of, for example, genomic diversity and differences in gene expression across individuals and tissue types....

Bibliographic Details
Main authors: Manchanda, Nancy, Portwood, John L., Woodhouse, Margaret R., Seetharam, Arun S., Lawrence-Dill, Carolyn J., Andorf, Carson M., Hufford, Matthew B.
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7053122/
https://www.ncbi.nlm.nih.gov/pubmed/32122303
http://dx.doi.org/10.1186/s12864-020-6568-2
_version_ 1783502978896887808
author Manchanda, Nancy
Portwood, John L.
Woodhouse, Margaret R.
Seetharam, Arun S.
Lawrence-Dill, Carolyn J.
Andorf, Carson M.
Hufford, Matthew B.
author_facet Manchanda, Nancy
Portwood, John L.
Woodhouse, Margaret R.
Seetharam, Arun S.
Lawrence-Dill, Carolyn J.
Andorf, Carson M.
Hufford, Matthew B.
author_sort Manchanda, Nancy
collection PubMed
description BACKGROUND: Genome assemblies are foundational for understanding the biology of a species. They provide a physical framework for mapping additional sequences, thereby enabling characterization of, for example, genomic diversity and differences in gene expression across individuals and tissue types. Quality metrics for genome assemblies gauge both the completeness and contiguity of an assembly and help provide confidence in downstream biological insights. To compare quality across multiple assemblies, a set of common metrics is typically calculated and then compared to one or more gold standard reference genomes. While several tools exist for calculating individual metrics, applications providing comprehensive evaluations of multiple assembly features are, perhaps surprisingly, lacking. Here, we describe a new toolkit that integrates multiple metrics to characterize both assembly and gene annotation quality in a way that enables comparison across multiple assemblies and assembly types. RESULTS: Our application, named GenomeQC, is an easy-to-use and interactive web framework that integrates various quantitative measures to characterize genome assemblies and annotations. GenomeQC provides researchers with a comprehensive summary of these statistics and allows for benchmarking against gold standard reference assemblies. CONCLUSIONS: The GenomeQC web application is implemented in R/Shiny version 1.5.9 and Python 3.6 and is freely available at https://genomeqc.maizegdb.org/ under the GPL license. All source code and a containerized version of the GenomeQC pipeline are available in the GitHub repository https://github.com/HuffordLab/GenomeQC.
format Online
Article
Text
id pubmed-7053122
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-7053122 2020-03-10 GenomeQC: a quality assessment tool for genome assemblies and gene structure annotations Manchanda, Nancy Portwood, John L. Woodhouse, Margaret R. Seetharam, Arun S. Lawrence-Dill, Carolyn J. Andorf, Carson M. Hufford, Matthew B. BMC Genomics Software BACKGROUND: Genome assemblies are foundational for understanding the biology of a species. They provide a physical framework for mapping additional sequences, thereby enabling characterization of, for example, genomic diversity and differences in gene expression across individuals and tissue types. Quality metrics for genome assemblies gauge both the completeness and contiguity of an assembly and help provide confidence in downstream biological insights. To compare quality across multiple assemblies, a set of common metrics is typically calculated and then compared to one or more gold standard reference genomes. While several tools exist for calculating individual metrics, applications providing comprehensive evaluations of multiple assembly features are, perhaps surprisingly, lacking. Here, we describe a new toolkit that integrates multiple metrics to characterize both assembly and gene annotation quality in a way that enables comparison across multiple assemblies and assembly types. RESULTS: Our application, named GenomeQC, is an easy-to-use and interactive web framework that integrates various quantitative measures to characterize genome assemblies and annotations. GenomeQC provides researchers with a comprehensive summary of these statistics and allows for benchmarking against gold standard reference assemblies. CONCLUSIONS: The GenomeQC web application is implemented in R/Shiny version 1.5.9 and Python 3.6 and is freely available at https://genomeqc.maizegdb.org/ under the GPL license. All source code and a containerized version of the GenomeQC pipeline are available in the GitHub repository https://github.com/HuffordLab/GenomeQC.
BioMed Central 2020-03-02 /pmc/articles/PMC7053122/ /pubmed/32122303 http://dx.doi.org/10.1186/s12864-020-6568-2 Text en © The Author(s). 2020 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Software
Manchanda, Nancy
Portwood, John L.
Woodhouse, Margaret R.
Seetharam, Arun S.
Lawrence-Dill, Carolyn J.
Andorf, Carson M.
Hufford, Matthew B.
GenomeQC: a quality assessment tool for genome assemblies and gene structure annotations
title GenomeQC: a quality assessment tool for genome assemblies and gene structure annotations
title_full GenomeQC: a quality assessment tool for genome assemblies and gene structure annotations
title_fullStr GenomeQC: a quality assessment tool for genome assemblies and gene structure annotations
title_full_unstemmed GenomeQC: a quality assessment tool for genome assemblies and gene structure annotations
title_short GenomeQC: a quality assessment tool for genome assemblies and gene structure annotations
title_sort genomeqc: a quality assessment tool for genome assemblies and gene structure annotations
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7053122/
https://www.ncbi.nlm.nih.gov/pubmed/32122303
http://dx.doi.org/10.1186/s12864-020-6568-2
work_keys_str_mv AT manchandanancy genomeqcaqualityassessmenttoolforgenomeassembliesandgenestructureannotations
AT portwoodjohnl genomeqcaqualityassessmenttoolforgenomeassembliesandgenestructureannotations
AT woodhousemargaretr genomeqcaqualityassessmenttoolforgenomeassembliesandgenestructureannotations
AT seetharamaruns genomeqcaqualityassessmenttoolforgenomeassembliesandgenestructureannotations
AT lawrencedillcarolynj genomeqcaqualityassessmenttoolforgenomeassembliesandgenestructureannotations
AT andorfcarsonm genomeqcaqualityassessmenttoolforgenomeassembliesandgenestructureannotations
AT huffordmatthewb genomeqcaqualityassessmenttoolforgenomeassembliesandgenestructureannotations