PMERGE: Computational filtering of paralogous sequences from RAD‐seq data
Restriction‐site associated DNA sequencing (RAD‐seq) can identify and score thousands of genetic markers from a group of samples for population‐genetics studies. One challenge of de novo RAD‐seq analysis is to distinguish paralogous sequence variants (PSVs) from true single‐nucleotide polymorphisms...
Main authors: | Nadukkalam Ravindran, Praveen; Bentzen, Paul; Bradbury, Ian R.; Beiko, Robert G. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | John Wiley and Sons Inc. 2018 |
Subjects: | Original Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6065343/ https://www.ncbi.nlm.nih.gov/pubmed/30073062 http://dx.doi.org/10.1002/ece3.4219 |
_version_ | 1783342846659526656 |
---|---|
author | Nadukkalam Ravindran, Praveen; Bentzen, Paul; Bradbury, Ian R.; Beiko, Robert G. |
author_facet | Nadukkalam Ravindran, Praveen; Bentzen, Paul; Bradbury, Ian R.; Beiko, Robert G. |
author_sort | Nadukkalam Ravindran, Praveen |
collection | PubMed |
description | Restriction‐site associated DNA sequencing (RAD‐seq) can identify and score thousands of genetic markers from a group of samples for population‐genetics studies. One challenge of de novo RAD‐seq analysis is to distinguish paralogous sequence variants (PSVs) from true single‐nucleotide polymorphisms (SNPs) associated with orthologous loci. In the absence of a reference genome, it is difficult to differentiate true SNPs from PSVs, and their impact on downstream analysis remains unclear. Here, we introduce a network‐based approach, PMERGE, that connects fragments based on their DNA sequence similarity to identify probable PSVs. Applying our method to de novo RAD‐seq data from 150 Atlantic salmon (Salmo salar) samples collected from 15 locations across the Southern Newfoundland coast allowed the identification of 87% of total PSVs identified through alignment to the Atlantic salmon genome. Removal of these paralogs altered the inferred population structure, highlighting the potential impact of filtering in RAD‐seq analysis. PMERGE was also applied to a green crab (Carcinus maenas) data set consisting of 242 samples from 11 different locations, where it identified and removed the majority of paralogous loci (62%). The PMERGE software can be run as part of the widely used Stacks analysis package. |
format | Online Article Text |
id | pubmed-6065343 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | John Wiley and Sons Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-6065343 2018-08-02 PMERGE: Computational filtering of paralogous sequences from RAD‐seq data Nadukkalam Ravindran, Praveen; Bentzen, Paul; Bradbury, Ian R.; Beiko, Robert G. Ecol Evol Original Research John Wiley and Sons Inc. 2018-06-11 /pmc/articles/PMC6065343/ /pubmed/30073062 http://dx.doi.org/10.1002/ece3.4219 Text en © 2018 The Authors. Ecology and Evolution published by John Wiley & Sons Ltd. This is an open access article under the terms of the http://creativecommons.org/licenses/by/4.0/ License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Original Research Nadukkalam Ravindran, Praveen Bentzen, Paul Bradbury, Ian R. Beiko, Robert G. PMERGE: Computational filtering of paralogous sequences from RAD‐seq data |
title | PMERGE: Computational filtering of paralogous sequences from RAD‐seq data |
title_full | PMERGE: Computational filtering of paralogous sequences from RAD‐seq data |
title_fullStr | PMERGE: Computational filtering of paralogous sequences from RAD‐seq data |
title_full_unstemmed | PMERGE: Computational filtering of paralogous sequences from RAD‐seq data |
title_short | PMERGE: Computational filtering of paralogous sequences from RAD‐seq data |
title_sort | pmerge: computational filtering of paralogous sequences from rad‐seq data |
topic | Original Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6065343/ https://www.ncbi.nlm.nih.gov/pubmed/30073062 http://dx.doi.org/10.1002/ece3.4219 |
work_keys_str_mv | AT nadukkalamravindranpraveen pmergecomputationalfilteringofparalogoussequencesfromradseqdata AT bentzenpaul pmergecomputationalfilteringofparalogoussequencesfromradseqdata AT bradburyianr pmergecomputationalfilteringofparalogoussequencesfromradseqdata AT beikorobertg pmergecomputationalfilteringofparalogoussequencesfromradseqdata |