Phylogenetic approaches to identifying fragments of the same gene, with application to the wheat genome

Bibliographic Details
Main authors: Piližota, Ivana; Train, Clément-Marie; Altenhoff, Adrian; Redestig, Henning; Dessimoz, Christophe
Format: Online Article Text
Language: English
Published: Oxford University Press 2019
Subjects: Original Papers
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6449756/
https://www.ncbi.nlm.nih.gov/pubmed/30184069
http://dx.doi.org/10.1093/bioinformatics/bty772
author Piližota, Ivana
Train, Clément-Marie
Altenhoff, Adrian
Redestig, Henning
Dessimoz, Christophe
collection PubMed
description MOTIVATION: As the time and cost of sequencing decrease, the number of available genomes and transcriptomes rapidly increases. Yet the quality of the assemblies and the gene annotations varies considerably and often remains poor, affecting downstream analyses. This is particularly true when fragments of the same gene are annotated as distinct genes, which may cause them to be mistaken for paralogs. RESULTS: In this study, we introduce two novel phylogenetic tests to infer non-overlapping or partially overlapping genes that are in fact parts of the same gene. One approach collapses branches with low bootstrap support and the other computes a likelihood ratio test. We extensively validated these methods by (i) introducing and recovering fragmentation on the bread wheat, Triticum aestivum cv. Chinese Spring, chromosome 3B; (ii) applying the methods to the low-quality 3B assembly and validating predictions against the high-quality 3B assembly; and (iii) comparing the performance of the proposed methods to that of existing methods, namely Ensembl Compara and ESPRIT. Application of this combination to a draft shotgun assembly of the entire bread wheat genome revealed 1221 pairs of genes that are highly likely to be fragments of the same gene. Our approach demonstrates the power of fine-grained evolutionary inferences across multiple species to improve genome assemblies and annotations. AVAILABILITY AND IMPLEMENTATION: An open source software tool is available at https://github.com/DessimozLab/esprit2. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-6449756
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6449756 2019-04-09 Phylogenetic approaches to identifying fragments of the same gene, with application to the wheat genome Piližota, Ivana Train, Clément-Marie Altenhoff, Adrian Redestig, Henning Dessimoz, Christophe Bioinformatics Original Papers
Oxford University Press 2019-04-01 2018-09-01 /pmc/articles/PMC6449756/ /pubmed/30184069 http://dx.doi.org/10.1093/bioinformatics/bty772 Text en © The Author(s) 2018. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Phylogenetic approaches to identifying fragments of the same gene, with application to the wheat genome
topic Original Papers