Inferring ethnicity from mitochondrial DNA sequence

BACKGROUND: The assignment of DNA samples to coarse population groups can be a useful but difficult task. One such example is the inference of coarse ethnic groupings for forensic applications. Ethnicity plays an important role in forensic investigation and can be inferred with the help of genetic m...

Bibliographic Details
Main Authors: Lee, Chih; Măndoiu, Ion I; Nelson, Craig E
Format: Text
Language: English
Published: BioMed Central 2011
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3090759/
https://www.ncbi.nlm.nih.gov/pubmed/21554759
http://dx.doi.org/10.1186/1753-6561-5-S2-S11
author Lee, Chih
Măndoiu, Ion I
Nelson, Craig E
collection PubMed
description BACKGROUND: The assignment of DNA samples to coarse population groups can be a useful but difficult task. One such example is the inference of coarse ethnic groupings for forensic applications. Ethnicity plays an important role in forensic investigation and can be inferred with the help of genetic markers. Because it is maternally inherited, present in high copy number, and robustly persistent in degraded samples, mitochondrial DNA may be useful for inferring coarse ethnicity. In this study, we compare the performance of methods for inferring ethnicity from the sequence of the hypervariable region of the mitochondrial genome. RESULTS: We present the results of comprehensive experiments conducted on datasets extracted from the mtDNA population database, showing that ethnicity inference based on support vector machines (SVM) achieves an overall accuracy of 80-90%, consistently outperforming nearest neighbor and discriminant analysis methods previously proposed in the literature. We also evaluate methods of handling missing data and characterize the most informative segments of the hypervariable region of the mitochondrial genome. CONCLUSIONS: Support vector machines can be used to infer coarse ethnicity from a small region of mitochondrial DNA sequence with surprisingly high accuracy. In the presence of missing data, utilizing only the regions common to the training sequences and a test sequence proves to be the best strategy. Given these results, SVM algorithms are likely to also be useful in other DNA sequence classification applications.
format Text
id pubmed-3090759
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3090759 2011-05-28 Inferring ethnicity from mitochondrial DNA sequence Lee, Chih; Măndoiu, Ion I; Nelson, Craig E BMC Proc Proceedings BioMed Central 2011-05-28 /pmc/articles/PMC3090759/ /pubmed/21554759 http://dx.doi.org/10.1186/1753-6561-5-S2-S11 Text en Copyright ©2011 Lee et al; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Inferring ethnicity from mitochondrial DNA sequence
topic Proceedings
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3090759/
https://www.ncbi.nlm.nih.gov/pubmed/21554759
http://dx.doi.org/10.1186/1753-6561-5-S2-S11