Statistical modeling of STR capillary electrophoresis signal

Bibliographic Details
Main authors: Karkar, Slim; Alfonse, Lauren E.; Grgicak, Catherine M.; Lun, Desmond S.
Format: Online, Article, Text
Language: English
Published: BioMed Central, 2019
Subjects: Research
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6886162/
https://www.ncbi.nlm.nih.gov/pubmed/31787097
http://dx.doi.org/10.1186/s12859-019-3074-0

Abstract:
BACKGROUND: To isolate an individual's genotype from a sample of biological material, most laboratories use PCR and capillary electrophoresis (CE) to construct a genetic profile based on polymorphic loci known as short tandem repeats (STRs). The resulting profile consists of a CE signal that contains information about the length and number of STR units amplified. For samples collected from the environment, interpretation of the signal can be challenging because information about the quality and quantity of the DNA is often limited. The signal can be further complicated by noise and by PCR artifacts such as stutter, which can mask or mimic biological alleles. Because manual interpretation methods cannot comprehensively account for such nuances, it would be valuable to develop a signal model that effectively characterizes the various components of STR signal without a priori knowledge of the quantity or quality of the DNA.
RESULTS: First, we seek to mathematically characterize the quality of the profile by measuring changes in the signal with respect to amplicon size. Next, we examine the noise, allele, and stutter components of the signal and develop distinct models for each. Using cross-validation and model selection, we identify a model that can be effectively used for downstream interpretation. Finally, we show an implementation of the model in NOCIt, a software system that calculates the a posteriori probability distribution on the number of contributors.
CONCLUSION: The model was selected using a large, diverse set of DNA samples obtained from 144 different laboratory conditions, with DNA amounts ranging from a single copy of DNA to hundreds of copies and profile quality ranging from pristine to highly degraded. Implemented in NOCIt, the model enables a probabilistic approach to estimating the number of contributors to complex environmental samples.
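As a purely illustrative sketch of the first RESULTS step (measuring how the signal changes with amplicon size), one simple assumption is that log peak height falls off roughly linearly with amplicon size, so an ordinary least-squares slope can serve as a rough quality indicator. This is not the model selected in the paper. The function name `degradation_slope`, the log-linear form, and the `(size_bp, height_rfu)` input format are hypothetical choices made for this example.

```python
import math


def degradation_slope(peaks):
    """Hypothetical sketch: fit log(height) = a + b * size by ordinary least squares.

    peaks: list of (amplicon_size_bp, peak_height_rfu) pairs for allelic peaks.
    Returns (a, b). Under this assumed log-linear form, a more negative slope b
    means the signal drops faster as amplicon size grows, as expected when the
    DNA is degraded; b near 0 corresponds to a roughly flat (pristine) profile.
    """
    if len(peaks) < 2:
        raise ValueError("need at least two peaks")
    xs = [float(size) for size, _ in peaks]
    ys = [math.log(height) for _, height in peaks]  # heights must be > 0
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("amplicon sizes must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


if __name__ == "__main__":
    # Synthetic, made-up peaks that decay exponentially with size.
    demo = [(s, 2000.0 * math.exp(-0.01 * s)) for s in (100, 150, 200, 250, 300)]
    print(degradation_slope(demo))
```

Working on the log scale turns a multiplicative decay in peak height into a straight line, which keeps the fit to a closed-form two-parameter regression; a real signal model would also need to account for noise, stutter, and the number of contributors, which this sketch ignores.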
Record ID: pubmed-6886162 (MEDLINE/PubMed record, National Center for Biotechnology Information)
Journal: BMC Bioinformatics (Research)
Published online: 2019-12-02, BioMed Central
Copyright and license: © The Author(s) 2019. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.