Evaluating the effect of reference genome divergence on the analysis of empirical RADseq datasets
The advent of high‐throughput sequencing (HTS) has made genomic‐level analyses feasible for nonmodel organisms. A critical step of many HTS pipelines involves aligning reads to a reference genome to identify variants. Despite recent initiatives, only a fraction of species has publicly available re...
Main author: | Bohling, Justin |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | John Wiley and Sons Inc. 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7391306/ https://www.ncbi.nlm.nih.gov/pubmed/32760550 http://dx.doi.org/10.1002/ece3.6483 |
_version_ | 1783564608488865792 |
---|---|
author | Bohling, Justin |
collection | PubMed |
description | The advent of high‐throughput sequencing (HTS) has made genomic‐level analyses feasible for nonmodel organisms. A critical step of many HTS pipelines involves aligning reads to a reference genome to identify variants. Despite recent initiatives, only a fraction of species has publicly available reference genomes. Therefore, a common practice is to align reads to the genome of an organism related to the target species; however, this could affect read alignment and bias genotyping. In this study, I conducted an experiment using empirical RADseq datasets generated for two species of salmonids (Actinopterygii; Teleostei; Salmonidae) to address these questions. There are currently reference genomes for six salmonids of varying phylogenetic distance. I aligned the RADseq data to all six genomes and identified variants with several different genotypers, which were then fed into population genetic analyses. Increasing phylogenetic distance between target species and reference genome reduced both the proportion of reads that successfully aligned and mapping quality. Reference genome also influenced the number of SNPs that were generated and depth at those SNPs, although the effect varied by genotyper. Inferences of population structure were mixed: increasing reference genome divergence reduced estimates of differentiation, but similar patterns of population relationships were found across scenarios. These findings reveal how the choice of reference genome can influence the output of bioinformatic pipelines. They also emphasize the need to identify best practices and guidelines for the burgeoning field of biodiversity genomics. |
format | Online Article Text |
id | pubmed-7391306 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | John Wiley and Sons Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-7391306 2020-08-04 Evaluating the effect of reference genome divergence on the analysis of empirical RADseq datasets Bohling, Justin Ecol Evol Original Research John Wiley and Sons Inc. 2020-06-28 /pmc/articles/PMC7391306/ /pubmed/32760550 http://dx.doi.org/10.1002/ece3.6483 Text en Published 2020. This article is a U.S. Government work and is in the public domain in the USA. This is an open access article under the terms of the http://creativecommons.org/licenses/by/4.0/ License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Original Research Bohling, Justin Evaluating the effect of reference genome divergence on the analysis of empirical RADseq datasets |
title | Evaluating the effect of reference genome divergence on the analysis of empirical RADseq datasets |
topic | Original Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7391306/ https://www.ncbi.nlm.nih.gov/pubmed/32760550 http://dx.doi.org/10.1002/ece3.6483 |