De novo transcriptome assembly, annotation and comparison of four ecological and evolutionary model salmonid fish species

BACKGROUND: Salmonid fishes exhibit high levels of phenotypic and ecological variation and are thus ideal model systems for studying evolutionary processes of adaptive divergence and speciation. Furthermore, salmonids are of major interest in fisheries, aquaculture, and conservation research. Improv...

Bibliographic Details
Main authors: Carruthers, Madeleine; Yurchenko, Andrey A.; Augley, Julian J.; Adams, Colin E.; Herzyk, Pawel; Elmer, Kathryn R.
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5759245/
https://www.ncbi.nlm.nih.gov/pubmed/29310597
http://dx.doi.org/10.1186/s12864-017-4379-x
_version_ 1783291161850413056
author Carruthers, Madeleine
Yurchenko, Andrey A.
Augley, Julian J.
Adams, Colin E.
Herzyk, Pawel
Elmer, Kathryn R.
collection PubMed
description BACKGROUND: Salmonid fishes exhibit high levels of phenotypic and ecological variation and are thus ideal model systems for studying evolutionary processes of adaptive divergence and speciation. Furthermore, salmonids are of major interest in fisheries, aquaculture, and conservation research. Improving understanding of the genetic mechanisms underlying traits in these species would significantly progress research in these fields. Here we generate high quality de novo transcriptomes for four salmonid species: Atlantic salmon (Salmo salar), brown trout (Salmo trutta), Arctic charr (Salvelinus alpinus), and European whitefish (Coregonus lavaretus). All species except Atlantic salmon have no reference genome publicly available and few if any genomic studies to date. RESULTS: We used paired-end RNA-seq on Illumina to generate high coverage sequencing of multiple individuals, yielding between 180 and 210 M reads per species. After initial assembly, strict filtering was used to remove duplicated, redundant, and low confidence transcripts. The final assemblies consisted of 36,505 protein-coding transcripts for Atlantic salmon, 35,736 for brown trout, 33,126 for Arctic charr, and 33,697 for European whitefish and are made publicly available. Assembly completeness was assessed using three approaches, all of which supported high quality of the assemblies: 1) ~78% of Actinopterygian single-copy orthologs were successfully captured in our assemblies, 2) orthogroup inference identified high overlap in the protein sequences present across all four species (40% shared across all four and 84% shared by at least two), and 3) comparison with the published Atlantic salmon genome suggests that our assemblies represent well covered (~98%) protein-coding transcriptomes. Thorough comparison of the generated assemblies found that 84-90% of transcripts in each assembly were orthologous with at least one of the other three species. 
We also identified 34-37% of transcripts in each assembly as paralogs. We further compare completeness and annotation statistics of our new assemblies to available related species. CONCLUSION: New, high-confidence protein-coding transcriptomes were generated for four ecologically and economically important species of salmonids. This offers a high quality pipeline for such complex genomes, represents a valuable contribution to the existing genomic resources for these species and provides robust tools for future investigation of gene expression and sequence evolution in these and other salmonid species. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12864-017-4379-x) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-5759245
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5759245 2018-01-10 De novo transcriptome assembly, annotation and comparison of four ecological and evolutionary model salmonid fish species Carruthers, Madeleine Yurchenko, Andrey A. Augley, Julian J. Adams, Colin E. Herzyk, Pawel Elmer, Kathryn R. BMC Genomics Research Article BioMed Central 2018-01-08 /pmc/articles/PMC5759245/ /pubmed/29310597 http://dx.doi.org/10.1186/s12864-017-4379-x Text en © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title De novo transcriptome assembly, annotation and comparison of four ecological and evolutionary model salmonid fish species
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5759245/
https://www.ncbi.nlm.nih.gov/pubmed/29310597
http://dx.doi.org/10.1186/s12864-017-4379-x