Inferring population structure and relationship using minimal independent evolutionary markers in Y-chromosome: a hybrid approach of recursive feature selection for hierarchical clustering

Bibliographic details
Main authors: Srivastava, Amit Kumar, Chopra, Rupali, Ali, Shafat, Aggarwal, Shweta, Vig, Lovekesh, Koul Bamezai, Rameshwar Nath
Format: Online Article Text
Language: English
Published: Oxford University Press 2014
Subjects: Methods Online
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4150763/
https://www.ncbi.nlm.nih.gov/pubmed/25030906
http://dx.doi.org/10.1093/nar/gku585
collection PubMed
description The flood of evolutionary markers generated by the Human Genome Project and the 1000 Genomes Project has made it necessary to prune redundant and dependent variables. Various computational tools based on machine-learning and data-mining methods, such as feature selection and feature extraction, have been proposed to escape the curse of dimensionality in large datasets. However, evolutionary studies, which rely primarily on sequentially evolved variations, have so far not benefited from such advances. Here, we present a novel approach of recursive feature selection for hierarchical clustering of Y-chromosomal SNPs/haplogroups that selects a minimal set of independent markers sufficient to infer population structure as precisely as a larger number of evolutionary markers does. To validate the applicability of our approach, we designed a MALDI-TOF mass spectrometry-based assay that accommodates the independent Y-chromosomal markers in a single multiplex, and used it to genotype two geographically distinct Indian populations. An analysis of 105 worldwide populations showed that 15 independent variations/markers were optimal for defining population structure parameters such as F_ST, molecular variance and correlation-based relationships. Adding further randomly selected markers had a negligible effect on these parameters (close to zero, i.e. 1 × 10⁻³). The approach proved efficient for tracing complex population structures and deriving relationships among worldwide populations in a cost-effective and expedient manner.
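The abstract cites F_ST as one of the population-structure parameters used to compare marker sets. As an illustration only, the sketch below computes Nei's G_ST, a widely used F_ST analogue for haplotype or haplogroup frequencies, defined as (H_T − H_S) / H_T. The function names, the equal weighting of populations and the omission of small-sample corrections are assumptions of this sketch. The study's own estimator (for example an AMOVA-based F_ST) may differ.

```python
from typing import Sequence


def gene_diversity(freqs: Sequence[float]) -> float:
    """Nei's gene (haplogroup) diversity h = 1 - sum(p_i^2).

    Assumption: the small-sample correction n / (n - 1) is omitted.
    """
    return 1.0 - sum(p * p for p in freqs)


def nei_gst(pop_freqs: Sequence[Sequence[float]]) -> float:
    """Nei's G_ST = (H_T - H_S) / H_T from haplogroup frequency vectors.

    pop_freqs[k][i] is the frequency of haplogroup i in population k.
    Every vector must list the same haplogroups in the same order and sum
    to 1. Populations are weighted equally (an assumption; weighting by
    sample size is another common choice).
    """
    if not pop_freqs:
        raise ValueError("need at least one population")
    n_pops = len(pop_freqs)
    n_haps = len(pop_freqs[0])
    for freqs in pop_freqs:
        if len(freqs) != n_haps:
            raise ValueError("all populations must share the same haplogroup list")
        if abs(sum(freqs) - 1.0) > 1e-9:
            raise ValueError("haplogroup frequencies must sum to 1")

    # H_S: mean diversity within each population.
    h_s = sum(gene_diversity(f) for f in pop_freqs) / n_pops

    # H_T: diversity of the pooled (mean) frequency vector.
    pooled = [sum(f[i] for f in pop_freqs) / n_pops for i in range(n_haps)]
    h_t = gene_diversity(pooled)

    if h_t == 0.0:
        return 0.0  # every population fixed for one haplogroup: nothing to differentiate
    return (h_t - h_s) / h_t
```

For instance, two populations fixed for different haplogroups give a G_ST of 1.0, whereas two populations with identical frequency vectors give 0.0. Comparing G_ST computed from haplogroups defined by a reduced marker set against that from the full marker set is one simple way to check whether the dropped markers carried any additional structure.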
id pubmed-4150763
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
source Nucleic Acids Res, Methods Online; Oxford University Press; dates recorded 2014-07-16, 2014-09-02 and 2014-12-01
rights © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com