Welcome to the big leaves: Best practices for improving genome annotation in non‐model plant genomes

Bibliographic Details
Main authors:	Vuruputoor, Vidya S.; Monyak, Daniel; Fetter, Karl C.; Webster, Cynthia; Bhattarai, Akriti; Shrestha, Bikash; Zaman, Sumaira; Bennett, Jeremy; McEvoy, Susan L.; Caballero, Madison; Wegrzyn, Jill L.
Format:	Online Article Text
Language:	English
Published:	John Wiley and Sons Inc., 2023
Subjects:	Application Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10439824/ https://www.ncbi.nlm.nih.gov/pubmed/37601314 http://dx.doi.org/10.1002/aps3.11533

Full Description

PREMISE: Robust standards to evaluate quality and completeness are lacking in eukaryotic structural genome annotation, as genome annotation software is developed using model organisms and typically lacks benchmarking to comprehensively evaluate the quality and accuracy of the final predictions. The annotation of plant genomes is particularly challenging due to their large sizes, abundant transposable elements, and variable ploidies. This study investigates the impact of genome quality, complexity, sequence read input, and method on protein‐coding gene predictions.

METHODS: The impact of repeat masking, long‐read and short‐read inputs, and de novo and genome‐guided protein evidence was examined in the context of the popular BRAKER and MAKER workflows for five plant genomes. The annotations were benchmarked for structural traits and sequence similarity.

RESULTS: Benchmarks that reflect gene structures, reciprocal similarity search alignments, and mono‐exonic/multi‐exonic gene counts provide a more complete view of annotation accuracy. Transcripts derived from RNA‐read alignments alone are not sufficient for genome annotation. Gene prediction workflows that combine evidence‐based and ab initio approaches are recommended, and a combination of short and long reads can improve genome annotation. Adding protein evidence from de novo assemblies, genome‐guided transcriptome assemblies, or full‐length proteins from OrthoDB generates more putative false positives as implemented in the current workflows. Post‐processing with functional and structural filters is highly recommended.

DISCUSSION: While the annotation of non‐model plant genomes remains complex, this study provides recommendations for inputs and methodological approaches. We discuss a set of best practices to generate an optimal plant genome annotation and present a more robust set of metrics to evaluate the resulting predictions.

Published in Appl Plant Sci (John Wiley and Sons Inc.), 2023-08-08.

© 2023 The Authors. Applications in Plant Sciences published by Wiley Periodicals LLC on behalf of Botanical Society of America. This is an open access article under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits use, distribution and reproduction in any medium, provided the original work is properly cited.