
Identifying large sets of unrelated individuals and unrelated markers

BACKGROUND: Genetic analyses in large sample populations are important for better understanding the variation between populations, for designing conservation programs, and for detecting rare mutations that may be risk factors for a variety of diseases, among other reasons. However, these analyses f...


Bibliographic Details
Main authors: Abraham, Kuruvilla Joseph, Diaz, Clara
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3995366/
https://www.ncbi.nlm.nih.gov/pubmed/24635884
http://dx.doi.org/10.1186/1751-0473-9-6
_version_ 1782312868787519488
author Abraham, Kuruvilla Joseph
Diaz, Clara
author_facet Abraham, Kuruvilla Joseph
Diaz, Clara
author_sort Abraham, Kuruvilla Joseph
collection PubMed
description BACKGROUND: Genetic analyses in large sample populations are important for better understanding the variation between populations, for designing conservation programs, and for detecting rare mutations that may be risk factors for a variety of diseases, among other reasons. However, these analyses frequently assume that the participating individuals or animals are mutually unrelated, which may not be the case in large samples and can lead to erroneous conclusions. To retain as much data as possible while minimizing the risk of false positives, it is useful to identify a large subset of relatively unrelated individuals in the population. This can be done using a heuristic for finding a large set of independent nodes in an undirected graph. We describe a fast randomized heuristic for this purpose. The same methodology can also be used to identify a suitable set of markers for analyzing population stratification, and in other instances where a rapid heuristic for maximal independent sets in large graphs is needed. RESULTS: We present FastIndep, a fast randomized heuristic algorithm for finding a maximal independent set of nodes in an arbitrary undirected graph, along with an efficient implementation in C++. On a 64-bit Linux or MacOS platform the execution time is a few minutes, even with a graph of several thousand nodes. The algorithm can discover multiple solutions of the same cardinality. FastIndep can be used to discover unlinked markers and unrelated individuals in populations. CONCLUSIONS: The methods presented here provide a quick and efficient way to identify sets of unrelated individuals in large populations and unlinked markers in marker panels. The C++ source code and instructions, along with utilities for generating the input files in the appropriate format, are available at http://taurus.ansci.iastate.edu/wiki/people/jabr/Joseph_Abraham.html
format Online
Article
Text
id pubmed-3995366
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3995366 2014-05-07 Identifying large sets of unrelated individuals and unrelated markers Abraham, Kuruvilla Joseph Diaz, Clara Source Code Biol Med Methodology BioMed Central 2014-03-17 /pmc/articles/PMC3995366/ /pubmed/24635884 http://dx.doi.org/10.1186/1751-0473-9-6 Text en Copyright © 2014 Abraham and Diaz; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.
spellingShingle Methodology
Abraham, Kuruvilla Joseph
Diaz, Clara
Identifying large sets of unrelated individuals and unrelated markers
title Identifying large sets of unrelated individuals and unrelated markers
title_full Identifying large sets of unrelated individuals and unrelated markers
title_fullStr Identifying large sets of unrelated individuals and unrelated markers
title_full_unstemmed Identifying large sets of unrelated individuals and unrelated markers
title_short Identifying large sets of unrelated individuals and unrelated markers
title_sort identifying large sets of unrelated individuals and unrelated markers
topic Methodology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3995366/
https://www.ncbi.nlm.nih.gov/pubmed/24635884
http://dx.doi.org/10.1186/1751-0473-9-6
work_keys_str_mv AT abrahamkuruvillajoseph identifyinglargesetsofunrelatedindividualsandunrelatedmarkers
AT diazclara identifyinglargesetsofunrelatedindividualsandunrelatedmarkers
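
Illustrative note: the abstract describes finding a large set of unrelated individuals as a search for a large independent set in an undirected graph, using a randomized heuristic. The Python sketch below shows one generic way such a heuristic can work (random node order, greedy selection, several restarts). It is not the FastIndep algorithm or its C++ implementation, whose details this record does not give. The function names, the `restarts` and `seed` parameters, and the idea of drawing an edge between any two individuals whose relatedness exceeds some threshold are assumptions made for illustration.

```python
import random


def random_maximal_independent_set(adjacency, rng):
    """Build one maximal independent set by visiting nodes in random order.

    A node is kept only if none of its neighbours has been kept already,
    so the result is independent, and it is maximal because every rejected
    node has at least one neighbour in the result.
    """
    order = list(adjacency)
    rng.shuffle(order)
    chosen = set()
    for node in order:
        if adjacency[node].isdisjoint(chosen):
            chosen.add(node)
    return chosen


def largest_independent_set(nodes, edges, restarts=200, seed=0):
    """Repeat the randomized construction and keep the largest set found.

    `restarts` and `seed` are hypothetical tuning knobs for this sketch.
    """
    adjacency = {node: set() for node in nodes}
    for a, b in edges:
        if a != b:  # ignore self-loops
            adjacency[a].add(b)
            adjacency[b].add(a)
    rng = random.Random(seed)
    best = set()
    for _ in range(restarts):
        candidate = random_maximal_independent_set(adjacency, rng)
        if len(candidate) > len(best):
            best = candidate
    return best


def is_independent(edges, chosen):
    """Check that no edge has both endpoints inside `chosen`."""
    return all(not (a in chosen and b in chosen) for a, b in edges)


# Toy example (made-up data): an edge marks a pair of individuals
# assumed to be too closely related to analyse together.
individuals = ["A", "B", "C", "D", "E"]
related_pairs = [("A", "B"), ("B", "C"), ("C", "D")]
unrelated = largest_independent_set(individuals, related_pairs)
print(sorted(unrelated))
```

The same sketch applies to markers if nodes are markers and edges join pairs treated as linked. Different random orders can return different sets of the same size, which matches the abstract's remark that multiple solutions of the same cardinality may exist.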