A novel nonlinear dimension reduction approach to infer population structure for low-coverage sequencing data

Bibliographic Details
Main authors: Zhang, Miao; Liu, Yiwen; Zhou, Hua; Watkins, Joseph; Zhou, Jin
Format: Online article (text)
Language: English
Published: BioMed Central, 2021
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8236193/
https://www.ncbi.nlm.nih.gov/pubmed/34174829
http://dx.doi.org/10.1186/s12859-021-04265-7
Description
Summary:
BACKGROUND: Low-depth sequencing allows researchers to increase sample size at the expense of lower accuracy. To incorporate this uncertainty while maintaining statistical power, we introduce MCPCA_PopGen to analyze the population structure of low-depth sequencing data.
RESULTS: The method optimizes the choice of nonlinear transformations of genotype dosages to maximize the Ky Fan norm of the covariance matrix. The transformation accounts for the uncertainty in distinguishing heterozygotes from common homozygotes at loci with a rare allele, and is closer to linear when both alleles are common.
CONCLUSIONS: We apply MCPCA_PopGen to samples from two indigenous Siberian populations and accurately reveal hidden population structure using only a single chromosome. The MCPCA_PopGen package is available at https://github.com/yiwenstat/MCPCA_PopGen.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-021-04265-7.
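The objective named in the summary can be illustrated with a minimal NumPy sketch. It only evaluates the objective for a given set of per-locus transformations; it does not perform the optimization, and it is not the MCPCA_PopGen implementation (see the linked repository for that). Assumptions made here for illustration: each locus's transformation is a hypothetical lookup table `maps[j]` over dosages {0, 1, 2}; transformed columns are centered and scaled to unit variance so the norm cannot be inflated by rescaling; and the function names `transform_dosages`, `standardize`, `ky_fan_norm`, and `objective` are hypothetical.

```python
import numpy as np

def transform_dosages(G, maps):
    """Apply a per-locus transformation to genotype dosages (hypothetical helper).

    G    : (n, p) integer array of dosages in {0, 1, 2}
    maps : (p, 3) array; maps[j, g] is the value assigned to dosage g at locus j
    Returns X with X[i, j] = maps[j, G[i, j]].
    """
    G = np.asarray(G)
    maps = np.asarray(maps, dtype=float)
    # np.arange(p) broadcasts against G, selecting row j of maps for column j of G.
    return maps[np.arange(G.shape[1]), G]

def standardize(X):
    # Assumed constraint: zero mean and unit (population) variance per column.
    Xc = X - X.mean(axis=0)
    return Xc / Xc.std(axis=0)

def ky_fan_norm(C, k):
    # Ky Fan k-norm: the sum of the k largest singular values of C.
    # For a symmetric positive semidefinite matrix such as a covariance matrix,
    # this equals the sum of its k largest eigenvalues.
    s = np.linalg.svd(C, compute_uv=False)  # singular values, descending order
    return float(s[:k].sum())

def objective(G, maps, k):
    # Covariance of the standardized, transformed dosages (i.e. a correlation matrix).
    Z = standardize(transform_dosages(G, maps))
    C = Z.T @ Z / Z.shape[0]
    return ky_fan_norm(C, k)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    G = rng.integers(0, 3, size=(100, 20))      # simulated dosages, not real data
    linear = np.tile([0.0, 1.0, 2.0], (20, 1))  # the usual additive coding
    print(objective(G, linear, k=2))
```

With the linear coding, this reduces to the sum of the top-k eigenvalues of the ordinary standardized-genotype correlation matrix, which is what standard PCA captures. An optimizer in the spirit of the summary would search over `maps` (for example, moving the heterozygote value toward the common homozygote at rare-allele loci) to increase this quantity.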