Cargando…
Defining Loci in Restriction-Based Reduced Representation Genomic Data from Nonmodel Species: Sources of Bias and Diagnostics for Optimal Clustering
Next generation sequencing holds great promise for applications of phylogeography, landscape genetics, and population genomics in wild populations of nonmodel species, but the robustness of inferences hinges on careful experimental design and effective bioinformatic removal of predictable artifacts....
Autores principales: | , , |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
Hindawi Publishing Corporation
2014
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4095725/ https://www.ncbi.nlm.nih.gov/pubmed/25057498 http://dx.doi.org/10.1155/2014/675158 |
_version_ | 1782326085042569216 |
---|---|
author | Ilut, Daniel C. Nydam, Marie L. Hare, Matthew P. |
author_facet | Ilut, Daniel C. Nydam, Marie L. Hare, Matthew P. |
author_sort | Ilut, Daniel C. |
collection | PubMed |
description | Next generation sequencing holds great promise for applications of phylogeography, landscape genetics, and population genomics in wild populations of nonmodel species, but the robustness of inferences hinges on careful experimental design and effective bioinformatic removal of predictable artifacts. Addressing this issue, we use published genomes from a tunicate, stickleback, and soybean to illustrate the potential for bioinformatic artifacts and introduce a protocol to minimize two sources of error expected from similarity-based de-novo clustering of stacked reads: the splitting of alleles into different clusters, which creates false homozygosity, and the grouping of paralogs into the same cluster, which creates false heterozygosity. We present an empirical application focused on Ciona savignyi, a tunicate with very high SNP heterozygosity (~0.05), because high diversity challenges the computational efficiency of most existing nonmodel pipelines while also potentially exacerbating paralog artifacts. The simulated and empirical data illustrate the advantages of using higher sequence difference clustering thresholds than is typical and demonstrate the utility of our protocol for efficiently identifying an optimum threshold from data without prior knowledge of heterozygosity. The empirical Ciona savignyi data also highlight null alleles as a potentially large source of false homozygosity in restriction-based reduced representation genomic data. |
format | Online Article Text |
id | pubmed-4095725 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | Hindawi Publishing Corporation |
record_format | MEDLINE/PubMed |
spelling | pubmed-40957252014-07-23 Defining Loci in Restriction-Based Reduced Representation Genomic Data from Nonmodel Species: Sources of Bias and Diagnostics for Optimal Clustering Ilut, Daniel C. Nydam, Marie L. Hare, Matthew P. Biomed Res Int Research Article Next generation sequencing holds great promise for applications of phylogeography, landscape genetics, and population genomics in wild populations of nonmodel species, but the robustness of inferences hinges on careful experimental design and effective bioinformatic removal of predictable artifacts. Addressing this issue, we use published genomes from a tunicate, stickleback, and soybean to illustrate the potential for bioinformatic artifacts and introduce a protocol to minimize two sources of error expected from similarity-based de-novo clustering of stacked reads: the splitting of alleles into different clusters, which creates false homozygosity, and the grouping of paralogs into the same cluster, which creates false heterozygosity. We present an empirical application focused on Ciona savignyi, a tunicate with very high SNP heterozygosity (~0.05), because high diversity challenges the computational efficiency of most existing nonmodel pipelines while also potentially exacerbating paralog artifacts. The simulated and empirical data illustrate the advantages of using higher sequence difference clustering thresholds than is typical and demonstrate the utility of our protocol for efficiently identifying an optimum threshold from data without prior knowledge of heterozygosity. The empirical Ciona savignyi data also highlight null alleles as a potentially large source of false homozygosity in restriction-based reduced representation genomic data. Hindawi Publishing Corporation 2014 2014-06-25 /pmc/articles/PMC4095725/ /pubmed/25057498 http://dx.doi.org/10.1155/2014/675158 Text en Copyright © 2014 Daniel C. Ilut et al. https://creativecommons.org/licenses/by/3.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Ilut, Daniel C. Nydam, Marie L. Hare, Matthew P. Defining Loci in Restriction-Based Reduced Representation Genomic Data from Nonmodel Species: Sources of Bias and Diagnostics for Optimal Clustering |
title | Defining Loci in Restriction-Based Reduced Representation Genomic Data from Nonmodel Species: Sources of Bias and Diagnostics for Optimal Clustering |
title_full | Defining Loci in Restriction-Based Reduced Representation Genomic Data from Nonmodel Species: Sources of Bias and Diagnostics for Optimal Clustering |
title_fullStr | Defining Loci in Restriction-Based Reduced Representation Genomic Data from Nonmodel Species: Sources of Bias and Diagnostics for Optimal Clustering |
title_full_unstemmed | Defining Loci in Restriction-Based Reduced Representation Genomic Data from Nonmodel Species: Sources of Bias and Diagnostics for Optimal Clustering |
title_short | Defining Loci in Restriction-Based Reduced Representation Genomic Data from Nonmodel Species: Sources of Bias and Diagnostics for Optimal Clustering |
title_sort | defining loci in restriction-based reduced representation genomic data from nonmodel species: sources of bias and diagnostics for optimal clustering |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4095725/ https://www.ncbi.nlm.nih.gov/pubmed/25057498 http://dx.doi.org/10.1155/2014/675158 |
work_keys_str_mv | AT ilutdanielc defininglociinrestrictionbasedreducedrepresentationgenomicdatafromnonmodelspeciessourcesofbiasanddiagnosticsforoptimalclustering AT nydammariel defininglociinrestrictionbasedreducedrepresentationgenomicdatafromnonmodelspeciessourcesofbiasanddiagnosticsforoptimalclustering AT harematthewp defininglociinrestrictionbasedreducedrepresentationgenomicdatafromnonmodelspeciessourcesofbiasanddiagnosticsforoptimalclustering |