Utilization of Tissue Ploidy Level Variation in de Novo Transcriptome Assembly of Pinus sylvestris

Compared to angiosperms, gymnosperms lag behind in the availability of assembled and annotated genomes. Most genomic analyses in gymnosperms, especially conifer tree species, rely on the use of de novo assembled transcriptomes. However, the level of allelic redundancy and transcript fragmentation in...

Bibliographic Details
Main authors: Ojeda, Dario I., Mattila, Tiina M., Ruttink, Tom, Kujala, Sonja T., Kärkkäinen, Katri, Verta, Jukka-Pekka, Pyhäjärvi, Tanja
Format: Online Article Text
Language: English
Published: Genetics Society of America 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6778806/
https://www.ncbi.nlm.nih.gov/pubmed/31427456
http://dx.doi.org/10.1534/g3.119.400357
author Ojeda, Dario I.
Mattila, Tiina M.
Ruttink, Tom
Kujala, Sonja T.
Kärkkäinen, Katri
Verta, Jukka-Pekka
Pyhäjärvi, Tanja
collection PubMed
description Compared to angiosperms, gymnosperms lag behind in the availability of assembled and annotated genomes. Most genomic analyses in gymnosperms, especially conifer tree species, rely on the use of de novo assembled transcriptomes. However, the level of allelic redundancy and transcript fragmentation in these assembled transcriptomes, and their effect on downstream applications have not been fully investigated. Here, we assessed three assembly strategies for short-reads data, including the utility of haploid megagametophyte tissue during de novo assembly as single-allele guides, for six individuals and five different tissues in Pinus sylvestris. We then contrasted haploid and diploid tissue genotype calls obtained from the assembled transcriptomes to evaluate the extent of paralog mapping. The use of the haploid tissue during assembly increased its completeness without reducing the number of assembled transcripts. Our results suggest that current strategies that rely on available genomic resources as guidance to minimize allelic redundancy are less effective than the application of strategies that cluster redundant assembled transcripts. The strategy yielding the lowest levels of allelic redundancy among the assembled transcriptomes assessed here was the generation of SuperTranscripts with Lace followed by CD-HIT clustering. However, we still observed some levels of heterozygosity (multiple gene fragments per transcript reflecting allelic redundancy) in this assembled transcriptome on the haploid tissue, indicating that further filtering is required before using these assemblies for downstream applications. We discuss the influence of allelic redundancy when these reference transcriptomes are used to select regions for probe design of exome capture baits and for estimation of population genetic diversity.
format Online
Article
Text
id pubmed-6778806
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Genetics Society of America
record_format MEDLINE/PubMed
spelling pubmed-6778806 2019-10-26 Utilization of Tissue Ploidy Level Variation in de Novo Transcriptome Assembly of Pinus sylvestris Ojeda, Dario I. Mattila, Tiina M. Ruttink, Tom Kujala, Sonja T. Kärkkäinen, Katri Verta, Jukka-Pekka Pyhäjärvi, Tanja G3 (Bethesda) Investigations Genetics Society of America 2019-10-26 /pmc/articles/PMC6778806/ /pubmed/31427456 http://dx.doi.org/10.1534/g3.119.400357 Text en Copyright © 2019 Ojeda et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Utilization of Tissue Ploidy Level Variation in de Novo Transcriptome Assembly of Pinus sylvestris
topic Investigations
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6778806/
https://www.ncbi.nlm.nih.gov/pubmed/31427456
http://dx.doi.org/10.1534/g3.119.400357