Simple regression models as a threshold for selecting AFLP loci with reduced error rates
BACKGROUND: Amplified fragment length polymorphism is a popular DNA marker technique that has applications in multiple fields of study. Technological improvements and decreasing costs have dramatically increased the number of markers that can be generated in an amplified fragment length polymorphism...
Main authors: | Price, David L; Casler, Michael D |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2012 |
Subjects: | Methodology Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3534328/ https://www.ncbi.nlm.nih.gov/pubmed/23072295 http://dx.doi.org/10.1186/1471-2105-13-268 |
_version_ | 1782475315587579904 |
---|---|
author | Price, David L Casler, Michael D |
author_facet | Price, David L Casler, Michael D |
author_sort | Price, David L |
collection | PubMed |
description | BACKGROUND: Amplified fragment length polymorphism is a popular DNA marker technique that has applications in multiple fields of study. Technological improvements and decreasing costs have dramatically increased the number of markers that can be generated in an amplified fragment length polymorphism experiment. As datasets increase in size, the number of genotyping errors also increases. Error within a DNA marker dataset can result in reduced statistical power, incorrect conclusions, and decreased reproducibility. It is essential that error within a dataset be recognized and reduced where possible, while still balancing the need for genomic diversity. RESULTS: Using simple regression with a second-degree polynomial term, a model was fit to describe the relationship between locus-specific error rate and the frequency of present alleles. This model was then used to set a moving error rate threshold that varied based on the frequency of present alleles at a given locus. Loci with error rates greater than the threshold were removed from further analyses. This method of selecting loci is advantageous, as it accounts for differences in error rate between loci of varying frequencies of present alleles. An example using this method to select loci is demonstrated in an amplified fragment length polymorphism dataset generated from the North American prairie species big bluestem. Within this dataset the error rate was reduced from 12.5% to 8.8% by removal of loci with error rates greater than the defined threshold. By repeating the method on selected loci, the error rate was further reduced to 5.9%. This reduction in error resulted in a substantial increase in the amount of genetic variation attributable to regional and population variation. CONCLUSIONS: This paper demonstrates a logical and computationally simple method for selecting loci with a reduced error rate. 
In the context of a genetic diversity study, this method resulted in an increased ability to detect differences between populations. Further application of this locus selection method, in addition to error-reducing methodological precautions, will result in amplified fragment length polymorphism datasets with reduced error rates. This reduction in error rate should result in greater power to detect differences and increased reproducibility. |
format | Online Article Text |
id | pubmed-3534328 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2012 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-35343282013-01-03 Simple regression models as a threshold for selecting AFLP loci with reduced error rates Price, David L Casler, Michael D BMC Bioinformatics Methodology Article BACKGROUND: Amplified fragment length polymorphism is a popular DNA marker technique that has applications in multiple fields of study. Technological improvements and decreasing costs have dramatically increased the number of markers that can be generated in an amplified fragment length polymorphism experiment. As datasets increase in size, the number of genotyping errors also increases. Error within a DNA marker dataset can result in reduced statistical power, incorrect conclusions, and decreased reproducibility. It is essential that error within a dataset be recognized and reduced where possible, while still balancing the need for genomic diversity. RESULTS: Using simple regression with a second-degree polynomial term, a model was fit to describe the relationship between locus-specific error rate and the frequency of present alleles. This model was then used to set a moving error rate threshold that varied based on the frequency of present alleles at a given locus. Loci with error rates greater than the threshold were removed from further analyses. This method of selecting loci is advantageous, as it accounts for differences in error rate between loci of varying frequencies of present alleles. An example using this method to select loci is demonstrated in an amplified fragment length polymorphism dataset generated from the North American prairie species big bluestem. Within this dataset the error rate was reduced from 12.5% to 8.8% by removal of loci with error rates greater than the defined threshold. By repeating the method on selected loci, the error rate was further reduced to 5.9%. This reduction in error resulted in a substantial increase in the amount of genetic variation attributable to regional and population variation. 
CONCLUSIONS: This paper demonstrates a logical and computationally simple method for selecting loci with a reduced error rate. In the context of a genetic diversity study, this method resulted in an increased ability to detect differences between populations. Further application of this locus selection method, in addition to error-reducing methodological precautions, will result in amplified fragment length polymorphism datasets with reduced error rates. This reduction in error rate should result in greater power to detect differences and increased reproducibility. BioMed Central 2012-10-16 /pmc/articles/PMC3534328/ /pubmed/23072295 http://dx.doi.org/10.1186/1471-2105-13-268 Text en Copyright ©2012 Price and Casler; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Methodology Article Price, David L Casler, Michael D Simple regression models as a threshold for selecting AFLP loci with reduced error rates |
title | Simple regression models as a threshold for selecting AFLP loci with reduced error rates |
title_full | Simple regression models as a threshold for selecting AFLP loci with reduced error rates |
title_fullStr | Simple regression models as a threshold for selecting AFLP loci with reduced error rates |
title_full_unstemmed | Simple regression models as a threshold for selecting AFLP loci with reduced error rates |
title_short | Simple regression models as a threshold for selecting AFLP loci with reduced error rates |
title_sort | simple regression models as a threshold for selecting aflp loci with reduced error rates |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3534328/ https://www.ncbi.nlm.nih.gov/pubmed/23072295 http://dx.doi.org/10.1186/1471-2105-13-268 |
work_keys_str_mv | AT pricedavidl simpleregressionmodelsasathresholdforselectingaflplociwithreducederrorrates AT caslermichaeld simpleregressionmodelsasathresholdforselectingaflplociwithreducederrorrates |
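
The locus selection procedure summarized in the description field (fit a regression with a second-degree polynomial term of locus-specific error rate on the frequency of present alleles, then drop loci whose error rate exceeds the moving threshold) can be sketched roughly as follows. This is an illustrative sketch, not the authors' code: it assumes the threshold is the fitted curve itself, and the function name `select_loci` and the simulated data are hypothetical.

```python
import numpy as np


def select_loci(freq, err, degree=2):
    """Return a boolean mask of loci to keep.

    freq: frequency of present alleles per locus
    err:  locus-specific error rate per locus
    Assumption: the moving threshold equals the fitted polynomial curve.
    """
    # Least-squares fit of error rate on allele frequency (second-degree by default).
    coeffs = np.polyfit(freq, err, degree)
    # Threshold evaluated at each locus's own frequency of present alleles.
    threshold = np.polyval(coeffs, freq)
    # Loci with error rates greater than the threshold are removed.
    return err <= threshold


# Hypothetical data for illustration only (not taken from the article).
rng = np.random.default_rng(0)
freq = rng.uniform(0.05, 0.95, size=200)
err = np.clip(0.5 * freq * (1 - freq) + rng.normal(0, 0.02, size=200), 0, 1)

# First pass over all loci.
keep = select_loci(freq, err)

# Second pass: repeat the method on the loci selected in the first pass.
keep2 = select_loci(freq[keep], err[keep])

print("mean error, all loci:   ", err.mean())
print("mean error, first pass: ", err[keep].mean())
print("mean error, second pass:", err[keep][keep2].mean())
```

Because the threshold moves with the frequency of present alleles, a locus is judged only against loci of similar frequency, which is the property the description highlights. How strictly to set the threshold relative to the fitted curve is a design choice this sketch does not settle.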