Analysis of Plant Pan-Genomes and Transcriptomes with GET_HOMOLOGUES-EST, a Clustering Solution for Sequences of the Same Species
The pan-genome of a species is defined as the union of all the genes and non-coding sequences found in all its individuals. However, constructing a pan-genome for plants with large genomes is daunting both in sequencing cost and the scale of the required computational analysis. A more affordable alt...
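Main authors: | Contreras-Moreira, Bruno; Cantalapiedra, Carlos P.; García-Pereira, María J.; Gordon, Sean P.; Vogel, John P.; Igartua, Ernesto; Casas, Ana M.; Vinuesa, Pablo |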
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2017 |
Subjects: | Plant Science |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5306281/ https://www.ncbi.nlm.nih.gov/pubmed/28261241 http://dx.doi.org/10.3389/fpls.2017.00184 |
_version_ | 1782507168512081920 |
---|---|
author | Contreras-Moreira, Bruno; Cantalapiedra, Carlos P.; García-Pereira, María J.; Gordon, Sean P.; Vogel, John P.; Igartua, Ernesto; Casas, Ana M.; Vinuesa, Pablo |
author_sort | Contreras-Moreira, Bruno |
collection | PubMed |
description | The pan-genome of a species is defined as the union of all the genes and non-coding sequences found in all its individuals. However, constructing a pan-genome for plants with large genomes is daunting both in sequencing cost and the scale of the required computational analysis. A more affordable alternative is to focus on the genic repertoire by using transcriptomic data. Here, the software GET_HOMOLOGUES-EST was benchmarked with genomic and RNA-seq data of 19 Arabidopsis thaliana ecotypes and then applied to the analysis of transcripts from 16 Hordeum vulgare genotypes. The goal was to sample their pan-genomes and classify sequences as core, if detected in all accessions, or accessory, when absent in some of them. The resulting sequence clusters were used to simulate pan-genome growth, and to compile Average Nucleotide Identity matrices that summarize intra-species variation. Although transcripts were found to under-estimate pan-genome size by at least 10%, we concluded that clusters of expressed sequences can recapitulate phylogeny and reproduce two properties observed in A. thaliana gene models: accessory loci show lower expression and higher non-synonymous substitution rates than core genes. Finally, accessory sequences were observed to preferentially encode transposon components in both species, plus disease resistance genes in cultivated barleys, and a variety of protein domains from other families that appear frequently associated with presence/absence variation in the literature. These results demonstrate that pan-genome analyses are useful to explore germplasm diversity. |
format | Online Article Text |
id | pubmed-5306281 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-5306281 2017-03-03 Front Plant Sci Plant Science Frontiers Media S.A. 2017-02-14 /pmc/articles/PMC5306281/ /pubmed/28261241 http://dx.doi.org/10.3389/fpls.2017.00184 Text en Copyright © 2017 Contreras-Moreira, Cantalapiedra, García-Pereira, Gordon, Vogel, Igartua, Casas and Vinuesa. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
title | Analysis of Plant Pan-Genomes and Transcriptomes with GET_HOMOLOGUES-EST, a Clustering Solution for Sequences of the Same Species |
topic | Plant Science |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5306281/ https://www.ncbi.nlm.nih.gov/pubmed/28261241 http://dx.doi.org/10.3389/fpls.2017.00184 |
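
The description field above states that sequences are classified as core if detected in all accessions, or accessory when absent in some of them, and that the resulting clusters were used to simulate pan-genome growth. A minimal sketch of those two ideas follows. It is not the GET_HOMOLOGUES-EST implementation: the function names, the cluster names, and the toy accessions are all hypothetical, and the growth curve is simply averaged over random orderings of the accessions.

```python
import random


def classify_clusters(clusters, accessions):
    """Label a cluster 'core' if it contains every accession, else 'accessory'."""
    all_accessions = set(accessions)
    return {
        name: "core" if all_accessions <= set(members) else "accessory"
        for name, members in clusters.items()
    }


def pan_genome_growth(clusters, accessions, n_permutations=10, seed=0):
    """Mean number of distinct clusters seen as accessions are added one by one.

    Accessions are added in random order; the curve is averaged over
    n_permutations shuffles (an assumption made for this illustration).
    """
    rng = random.Random(seed)
    totals = [0] * len(accessions)
    for _ in range(n_permutations):
        order = list(accessions)
        rng.shuffle(order)
        seen = set()
        for i, accession in enumerate(order):
            seen.update(
                name for name, members in clusters.items() if accession in members
            )
            totals[i] += len(seen)
    return [total / n_permutations for total in totals]


# Hypothetical toy data: each cluster maps to the accessions where it was detected.
accessions = ["acc1", "acc2", "acc3"]
clusters = {
    "cluster_A": {"acc1", "acc2", "acc3"},
    "cluster_B": {"acc1", "acc3"},
    "cluster_C": {"acc2"},
}

labels = classify_clusters(clusters, accessions)
growth = pan_genome_growth(clusters, accessions)
print(labels)
print(growth)
```

In this toy input only `cluster_A` is present in all three accessions, so it is the only core cluster; the last point of the growth curve equals the total number of clusters because every accession has been added by then.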