
Using multiple reference genomes to identify and resolve annotation inconsistencies

BACKGROUND: Advances in sequencing technologies have led to the release of reference genomes and annotations for multiple individuals within more well-studied systems. While each of these new genome assemblies shares significant portions of synteny between each other, the annotated structure of gene...


Bibliographic Details
Main Authors:	Monnahan, Patrick J., Michno, Jean-Michel, O’Connor, Christine, Brohammer, Alex B., Springer, Nathan M., McGaugh, Suzanne E., Hirsch, Candice N.
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2020
Subjects:	Methodology Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7140576/ https://www.ncbi.nlm.nih.gov/pubmed/32264824 http://dx.doi.org/10.1186/s12864-020-6696-8

author	Monnahan, Patrick J.; Michno, Jean-Michel; O’Connor, Christine; Brohammer, Alex B.; Springer, Nathan M.; McGaugh, Suzanne E.; Hirsch, Candice N.
author_sort	Monnahan, Patrick J.
collection	PubMed
description	BACKGROUND: Advances in sequencing technologies have led to the release of reference genomes and annotations for multiple individuals within more well-studied systems. While each of these new genome assemblies shares significant portions of synteny between each other, the annotated structure of gene models within these regions can differ. Of particular concern are split-gene misannotations, in which a single gene is incorrectly annotated as two distinct genes or two genes are incorrectly annotated as a single gene. These misannotations can have major impacts on functional prediction, estimates of expression, and many downstream analyses. RESULTS: We developed a high-throughput method based on pairwise comparisons of annotations that detects potential split-gene misannotations and quantifies support for whether the genes should be merged into a single gene model. We demonstrated the utility of our method using gene annotations of three reference genomes from maize (B73, PH207, and W22), a difficult system from an annotation perspective due to the size and complexity of the genome. On average, we found several hundred of these potential split-gene misannotations in each pairwise comparison, corresponding to 3–5% of gene models across annotations. To determine which state (i.e. one gene or multiple genes) is biologically supported, we utilized RNAseq data from 10 tissues throughout development along with a novel metric and simulation framework. The methods we have developed require minimal human interaction and can be applied to future assemblies to aid in annotation efforts. CONCLUSIONS: Split-gene misannotations occur at appreciable frequency in maize annotations. We have developed a method to easily identify and correct these misannotations. Importantly, this method is generic in that it can utilize any type of short-read expression data. Failure to account for split-gene misannotations has serious consequences for biological inference, particularly for expression-based analyses.
format	Online Article Text
id	pubmed-7140576
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-7140576 2020-04-14 Using multiple reference genomes to identify and resolve annotation inconsistencies Monnahan, Patrick J. Michno, Jean-Michel O’Connor, Christine Brohammer, Alex B. Springer, Nathan M. McGaugh, Suzanne E. Hirsch, Candice N. BMC Genomics Methodology Article BACKGROUND: Advances in sequencing technologies have led to the release of reference genomes and annotations for multiple individuals within more well-studied systems. While each of these new genome assemblies shares significant portions of synteny between each other, the annotated structure of gene models within these regions can differ. Of particular concern are split-gene misannotations, in which a single gene is incorrectly annotated as two distinct genes or two genes are incorrectly annotated as a single gene. These misannotations can have major impacts on functional prediction, estimates of expression, and many downstream analyses. RESULTS: We developed a high-throughput method based on pairwise comparisons of annotations that detects potential split-gene misannotations and quantifies support for whether the genes should be merged into a single gene model. We demonstrated the utility of our method using gene annotations of three reference genomes from maize (B73, PH207, and W22), a difficult system from an annotation perspective due to the size and complexity of the genome. On average, we found several hundred of these potential split-gene misannotations in each pairwise comparison, corresponding to 3–5% of gene models across annotations. To determine which state (i.e. one gene or multiple genes) is biologically supported, we utilized RNAseq data from 10 tissues throughout development along with a novel metric and simulation framework. The methods we have developed require minimal human interaction and can be applied to future assemblies to aid in annotation efforts. CONCLUSIONS: Split-gene misannotations occur at appreciable frequency in maize annotations. We have developed a method to easily identify and correct these misannotations. Importantly, this method is generic in that it can utilize any type of short-read expression data. Failure to account for split-gene misannotations has serious consequences for biological inference, particularly for expression-based analyses. BioMed Central 2020-04-08 /pmc/articles/PMC7140576/ /pubmed/32264824 http://dx.doi.org/10.1186/s12864-020-6696-8 Text en © The Author(s). 2020 Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title	Using multiple reference genomes to identify and resolve annotation inconsistencies
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7140576/ https://www.ncbi.nlm.nih.gov/pubmed/32264824 http://dx.doi.org/10.1186/s12864-020-6696-8
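
The pairwise-comparison step summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the input format (a list of homologous gene pairs plus the gene order along one annotation), the function name find_split_gene_candidates, and the adjacency criterion are all assumptions made for illustration. The sketch flags a gene in one annotation that matches two or more neighbouring genes in the other; it does not cover the RNAseq-based metric or the simulation framework that the abstract describes for deciding which state is biologically supported.

```python
from collections import defaultdict


def find_split_gene_candidates(homolog_pairs, gene_order_b):
    """Flag genes in annotation A that match 2+ adjacent genes in annotation B.

    homolog_pairs: iterable of (gene_a, gene_b) tuples (hypothetical input,
        e.g. derived from pairwise sequence alignment).
    gene_order_b: list of unique gene IDs from annotation B in chromosomal order.
    Returns a dict mapping gene_a to its adjacent B genes, in order.
    """
    # Collect every B gene that each A gene matches.
    hits = defaultdict(set)
    for gene_a, gene_b in homolog_pairs:
        hits[gene_a].add(gene_b)

    # Index of each B gene along the chromosome.
    position = {gene: i for i, gene in enumerate(gene_order_b)}

    candidates = {}
    for gene_a, genes_b in hits.items():
        # A one-to-one match is not a split-gene candidate.
        if len(genes_b) < 2:
            continue
        # Skip matches to genes whose position is unknown.
        if not all(g in position for g in genes_b):
            continue
        ordered = sorted(genes_b, key=position.__getitem__)
        first, last = position[ordered[0]], position[ordered[-1]]
        # Assumed criterion: the matched B genes occupy consecutive positions.
        if last - first == len(ordered) - 1:
            candidates[gene_a] = ordered
    return candidates


# Toy data with made-up gene IDs.
pairs = [("A1", "B1"), ("A1", "B2"), ("A2", "B3"), ("A3", "B4"), ("A3", "B6")]
order_b = ["B1", "B2", "B3", "B4", "B5", "B6"]
result = find_split_gene_candidates(pairs, order_b)
print(result)
```

In the toy data, A1 matches the neighbouring genes B1 and B2 and is reported as a candidate, while A3 matches B4 and B6, which are separated by B5, and is not.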


Similar Items