Evaluating the impact of scoring parameters on the structure of intra-specific genetic variation using RawGeno, an R package for automating AFLP scoring

BACKGROUND: Since the transfer and application of modern sequencing technologies to the analysis of amplified fragment-length polymorphisms (AFLP), evolutionary biologists have included an increasing number of samples and markers in their studies. Although justified in this context, the use of autom...

Bibliographic Details
Main authors: Arrigo, Nils, Tuszynski, Jarek W, Ehrich, Dorothee, Gerdes, Tommy, Alvarez, Nadir
Format: Text
Language: English
Published: BioMed Central 2009
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2656475/
https://www.ncbi.nlm.nih.gov/pubmed/19171029
http://dx.doi.org/10.1186/1471-2105-10-33
_version_ 1782165508134535168
author Arrigo, Nils
Tuszynski, Jarek W
Ehrich, Dorothee
Gerdes, Tommy
Alvarez, Nadir
author_sort Arrigo, Nils
collection PubMed
description BACKGROUND: Since the transfer and application of modern sequencing technologies to the analysis of amplified fragment-length polymorphisms (AFLP), evolutionary biologists have included an increasing number of samples and markers in their studies. Although justified in this context, the use of automated scoring procedures may result in technical biases that weaken the power and reliability of further analyses.
RESULTS: Using a new scoring algorithm, RawGeno, we show that scoring errors, in particular "bin oversplitting" (i.e. when variant sizes of the same AFLP marker are not considered as homologous) and "technical homoplasy" (i.e. when two AFLP markers that differ slightly in size are mistakenly considered as being homologous), induce a loss of discriminatory power, decrease the robustness of results and, in extreme cases, introduce erroneous information in genetic structure analyses. In the present study, we evaluate several descriptive statistics that can be used to optimize the scoring of the AFLP analysis, and we describe a new statistic, the information content per bin (I(bin)), that represents a valuable estimator during the optimization process. This statistic can be computed at any stage of the AFLP analysis without requiring the inclusion of replicated samples. Finally, we show that downstream analyses are not equally sensitive to scoring errors. Indeed, although a reasonable amount of flexibility is allowed during the optimization of the scoring procedure without causing considerable changes in the detection of genetic structure patterns, notable discrepancies are observed when estimating genetic diversities from differently scored datasets.
CONCLUSION: Our algorithm appears to perform as well as a commercial program in automating AFLP scoring, at least in the context of population genetics or phylogeographic studies. To our knowledge, RawGeno is the only freely available public-domain software for fully automated AFLP scoring, from electropherogram files to user-defined working binary matrices. RawGeno was implemented in an R CRAN package (with a user-friendly GUI) and can be found at .
format Text
id pubmed-2656475
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-26564752009-03-17 Evaluating the impact of scoring parameters on the structure of intra-specific genetic variation using RawGeno, an R package for automating AFLP scoring Arrigo, Nils Tuszynski, Jarek W Ehrich, Dorothee Gerdes, Tommy Alvarez, Nadir BMC Bioinformatics Research Article BioMed Central 2009-01-26 /pmc/articles/PMC2656475/ /pubmed/19171029 http://dx.doi.org/10.1186/1471-2105-10-33 Text en Copyright © 2009 Arrigo et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Evaluating the impact of scoring parameters on the structure of intra-specific genetic variation using RawGeno, an R package for automating AFLP scoring
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2656475/
https://www.ncbi.nlm.nih.gov/pubmed/19171029
http://dx.doi.org/10.1186/1471-2105-10-33
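
The two scoring errors named in the abstract, "bin oversplitting" and "technical homoplasy", can be illustrated with a toy example. The following is a minimal sketch in Python and is not RawGeno's algorithm, which this record does not describe. The function names `bin_peaks` and `to_binary_matrix`, the fixed-width binning rule, and all peak sizes are assumptions made purely for illustration.

```python
from typing import Dict, List, Tuple

Peak = Tuple[str, float]  # (sample id, fragment size in base pairs)


def bin_peaks(peaks: List[Peak], max_width: float) -> List[List[Peak]]:
    """Group size-sorted peaks into bins no wider than max_width (hypothetical rule)."""
    bins: List[List[Peak]] = []
    for peak in sorted(peaks, key=lambda p: p[1]):
        # A peak joins the current bin if it lies within max_width of the bin's first peak.
        if bins and peak[1] - bins[-1][0][1] <= max_width:
            bins[-1].append(peak)
        else:
            bins.append([peak])
    return bins


def to_binary_matrix(peaks: List[Peak], max_width: float) -> Dict[str, List[int]]:
    """Presence/absence (1/0) of each bin in each sample."""
    bins = bin_peaks(peaks, max_width)
    samples = sorted({sample for sample, _ in peaks})
    return {
        s: [int(any(sample == s for sample, _ in b)) for b in bins]
        for s in samples
    }


# Invented data: one marker near 100 bp (three size variants) and a
# second, distinct marker near 101.3 bp.
peaks = [
    ("s1", 99.8), ("s2", 100.0), ("s3", 100.3),  # marker A
    ("s1", 101.2), ("s4", 101.4),                # marker B
]

too_narrow = bin_peaks(peaks, max_width=0.1)  # oversplitting: every variant gets its own bin
balanced = bin_peaks(peaks, max_width=0.6)    # the two markers are recovered
too_wide = bin_peaks(peaks, max_width=2.0)    # homoplasy: both markers share one bin

print(len(too_narrow), len(balanced), len(too_wide))  # prints: 5 2 1
print(to_binary_matrix(peaks, max_width=0.6))
```

With the narrow width, size variants of the same marker are scored as separate markers. With the wide width, sample s1 contributes two different fragments to a single bin, so distinct markers are treated as homologous. Only the intermediate width yields a binary matrix with one column per marker.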