Identifying large sets of unrelated individuals and unrelated markers
BACKGROUND: Genetic analyses in large sample populations are important for a better understanding of the variation between populations, for designing conservation programs, and for detecting rare mutations which may be risk factors for a variety of diseases, among other reasons. However, these analyses f...
Main authors: | Abraham, Kuruvilla Joseph; Diaz, Clara |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2014 |
Subjects: | Methodology |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3995366/ https://www.ncbi.nlm.nih.gov/pubmed/24635884 http://dx.doi.org/10.1186/1751-0473-9-6 |
_version_ | 1782312868787519488 |
---|---|
author | Abraham, Kuruvilla Joseph; Diaz, Clara |
author_facet | Abraham, Kuruvilla Joseph; Diaz, Clara |
author_sort | Abraham, Kuruvilla Joseph |
collection | PubMed |
description | BACKGROUND: Genetic analyses in large sample populations are important for a better understanding of the variation between populations, for designing conservation programs, and for detecting rare mutations which may be risk factors for a variety of diseases, among other reasons. However, these analyses frequently assume that the participating individuals or animals are mutually unrelated, which may not be the case in large samples, leading to erroneous conclusions. In order to retain as much data as possible while minimizing the risk of false positives, it is useful to identify a large subset of relatively unrelated individuals in the population. This can be done using a heuristic for finding a large set of independent nodes in an undirected graph. We describe a fast randomized heuristic for this purpose. The same methodology can also be used for identifying a suitable set of markers for analyzing population stratification, and in other instances where a rapid heuristic for maximal independent sets in large graphs is needed. RESULTS: We present FastIndep, a fast random heuristic algorithm for finding a maximal independent set of nodes in an arbitrary undirected graph, along with an efficient implementation in C++. On a 64-bit Linux or MacOS platform the execution time is a few minutes, even with a graph of several thousand nodes. The algorithm can discover multiple solutions of the same cardinality. FastIndep can be used to discover unlinked markers and unrelated individuals in populations. CONCLUSIONS: The methods presented here provide a quick and efficient way of identifying sets of unrelated individuals in large populations and unlinked markers in marker panels. The C++ source code and instructions, along with utilities for generating the input files in the appropriate format, are available at http://taurus.ansci.iastate.edu/wiki/people/jabr/Joseph_Abraham.html |
format | Online Article Text |
id | pubmed-3995366 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-3995366 2014-05-07 Identifying large sets of unrelated individuals and unrelated markers Abraham, Kuruvilla Joseph; Diaz, Clara Source Code Biol Med Methodology BioMed Central 2014-03-17 /pmc/articles/PMC3995366/ /pubmed/24635884 http://dx.doi.org/10.1186/1751-0473-9-6 Text en Copyright © 2014 Abraham and Diaz; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. |
spellingShingle | Methodology Abraham, Kuruvilla Joseph Diaz, Clara Identifying large sets of unrelated individuals and unrelated markers |
title | Identifying large sets of unrelated individuals and unrelated markers |
topic | Methodology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3995366/ https://www.ncbi.nlm.nih.gov/pubmed/24635884 http://dx.doi.org/10.1186/1751-0473-9-6 |
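
The description above frames the selection of unrelated individuals (or unlinked markers) as finding a large independent set in an undirected graph, using a fast randomized heuristic. The record does not give the details of FastIndep, so the following is only a minimal sketch of one generic randomized approach (random-order greedy selection with restarts), not the authors' algorithm. All names here (`Graph`, `make_graph`, `random_greedy_mis`, `best_of_restarts`, `is_independent`) are hypothetical, and the assumption that an edge joins two individuals whose relatedness exceeds some chosen threshold is ours.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Undirected graph as adjacency lists; nodes are numbered 0..n-1.
// Assumption: an edge means "these two individuals (or markers) are related".
using Graph = std::vector<std::vector<int>>;

// Build a graph from an edge list (hypothetical helper).
Graph make_graph(int n, const std::vector<std::pair<int, int>>& edges) {
    Graph g(n);
    for (const auto& e : edges) {
        assert(e.first >= 0 && e.first < n && e.second >= 0 && e.second < n);
        if (e.first == e.second) continue;  // ignore self-loops
        g[e.first].push_back(e.second);
        g[e.second].push_back(e.first);
    }
    return g;
}

// One random greedy pass: visit nodes in random order and keep a node
// only if none of its neighbours has been kept already. The result is a
// maximal independent set (no further node can be added), though not
// necessarily one of maximum size.
std::vector<int> random_greedy_mis(const Graph& g, std::mt19937& rng) {
    std::vector<int> order(g.size());
    std::iota(order.begin(), order.end(), 0);
    std::shuffle(order.begin(), order.end(), rng);

    std::vector<char> blocked(g.size(), 0);
    std::vector<int> chosen;
    for (int v : order) {
        if (blocked[v]) continue;
        chosen.push_back(v);
        blocked[v] = 1;
        for (int u : g[v]) blocked[u] = 1;  // neighbours can no longer be picked
    }
    std::sort(chosen.begin(), chosen.end());
    return chosen;
}

// Repeat the random pass and keep the largest set seen.
std::vector<int> best_of_restarts(const Graph& g, int restarts, unsigned seed) {
    std::mt19937 rng(seed);
    std::vector<int> best;
    for (int i = 0; i < restarts; ++i) {
        std::vector<int> candidate = random_greedy_mis(g, rng);
        if (candidate.size() > best.size()) best = std::move(candidate);
    }
    return best;
}

// Check that no two members of `set` share an edge.
bool is_independent(const Graph& g, const std::vector<int>& set) {
    std::vector<char> in_set(g.size(), 0);
    for (int v : set) in_set[v] = 1;
    for (int v : set)
        for (int u : g[v])
            if (in_set[u]) return false;
    return true;
}
```

A caller would build the graph once, for example `Graph g = make_graph(5, {{0,1},{1,2},{2,3},{3,4}});`, and then call `best_of_restarts(g, 100, 42u)`. Because different random orders can yield different sets of the same size, varying the seed is one simple way to see several alternative solutions, in the spirit of the multiple equal-cardinality solutions the description mentions.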