
Analysis of Plant Pan-Genomes and Transcriptomes with GET_HOMOLOGUES-EST, a Clustering Solution for Sequences of the Same Species

The pan-genome of a species is defined as the union of all the genes and non-coding sequences found in all its individuals. However, constructing a pan-genome for plants with large genomes is daunting both in sequencing cost and the scale of the required computational analysis. A more affordable alt...


Bibliographic Details
Main authors: Contreras-Moreira, Bruno; Cantalapiedra, Carlos P.; García-Pereira, María J.; Gordon, Sean P.; Vogel, John P.; Igartua, Ernesto; Casas, Ana M.; Vinuesa, Pablo
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2017
Subjects: Plant Science
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5306281/
https://www.ncbi.nlm.nih.gov/pubmed/28261241
http://dx.doi.org/10.3389/fpls.2017.00184
author Contreras-Moreira, Bruno
Cantalapiedra, Carlos P.
García-Pereira, María J.
Gordon, Sean P.
Vogel, John P.
Igartua, Ernesto
Casas, Ana M.
Vinuesa, Pablo
collection PubMed
description The pan-genome of a species is defined as the union of all the genes and non-coding sequences found in all its individuals. However, constructing a pan-genome for plants with large genomes is daunting both in sequencing cost and the scale of the required computational analysis. A more affordable alternative is to focus on the genic repertoire by using transcriptomic data. Here, the software GET_HOMOLOGUES-EST was benchmarked with genomic and RNA-seq data of 19 Arabidopsis thaliana ecotypes and then applied to the analysis of transcripts from 16 Hordeum vulgare genotypes. The goal was to sample their pan-genomes and classify sequences as core, if detected in all accessions, or accessory, when absent in some of them. The resulting sequence clusters were used to simulate pan-genome growth, and to compile Average Nucleotide Identity matrices that summarize intra-species variation. Although transcripts were found to under-estimate pan-genome size by at least 10%, we concluded that clusters of expressed sequences can recapitulate phylogeny and reproduce two properties observed in A. thaliana gene models: accessory loci show lower expression and higher non-synonymous substitution rates than core genes. Finally, accessory sequences were observed to preferentially encode transposon components in both species, plus disease resistance genes in cultivated barleys, and a variety of protein domains from other families that appear frequently associated with presence/absence variation in the literature. These results demonstrate that pan-genome analyses are useful to explore germplasm diversity.
format Online
Article
Text
id pubmed-5306281
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-5306281 2017-03-03 Front Plant Sci Plant Science Frontiers Media S.A. 2017-02-14 /pmc/articles/PMC5306281/ /pubmed/28261241 http://dx.doi.org/10.3389/fpls.2017.00184 Text en
Copyright © 2017 Contreras-Moreira, Cantalapiedra, García-Pereira, Gordon, Vogel, Igartua, Casas and Vinuesa. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Analysis of Plant Pan-Genomes and Transcriptomes with GET_HOMOLOGUES-EST, a Clustering Solution for Sequences of the Same Species
topic Plant Science