ERStruct: a fast Python package for inferring the number of top principal components from whole genome sequencing data

Bibliographic Details
Main authors:	Yang, Jinghan, Xu, Yuyang, Yao, Minhao, Wang, Gao, Liu, Zhonghua
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2023
Subjects:	Software
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10155328/ https://www.ncbi.nlm.nih.gov/pubmed/37131141 http://dx.doi.org/10.1186/s12859-023-05305-0

Description
Summary:	BACKGROUND: Large-scale multi-ethnic DNA sequencing data is increasingly available owing to decreasing cost of modern sequencing technologies. Inference of the population structure with such sequencing data is fundamentally important. However, the ultra-dimensionality and complicated linkage disequilibrium patterns across the whole genome make it challenging to infer population structure using traditional principal component analysis based methods and software. RESULTS: We present the ERStruct Python Package, which enables the inference of population structure using whole-genome sequencing data. By leveraging parallel computing and GPU acceleration, our package achieves significant improvements in the speed of matrix operations for large-scale data. Additionally, our package features adaptive data splitting capabilities to facilitate computation on GPUs with limited memory. CONCLUSION: Our Python package ERStruct is an efficient and user-friendly tool for estimating the number of top informative principal components that capture population structure from whole genome sequencing data.
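
Illustrative note: the summary mentions splitting data so that large matrix operations fit in limited memory. The sketch below shows one generic way this idea can work, using NumPy on the CPU. It is not the ERStruct API: the function names `gram_matrix_in_chunks` and `top_eigenvalues`, the `chunk_size` parameter, and the simulated genotype matrix are all assumptions made for illustration.

```python
import numpy as np


def gram_matrix_in_chunks(genotypes, chunk_size):
    """Accumulate the n x n matrix (X_c @ X_c.T) / p over blocks of markers.

    Hypothetical helper, not part of ERStruct. `genotypes` is an
    (individuals x markers) array; X_c is the column-centered matrix.
    Only one block of columns is centered and multiplied at a time.
    """
    n, p = genotypes.shape
    col_means = genotypes.mean(axis=0)
    gram = np.zeros((n, n), dtype=np.float64)
    for start in range(0, p, chunk_size):
        stop = start + chunk_size
        block = genotypes[:, start:stop] - col_means[start:stop]
        gram += block @ block.T
    return gram / p


def top_eigenvalues(gram, k):
    """Return the k largest eigenvalues of a symmetric matrix, descending."""
    eigvals = np.linalg.eigvalsh(gram)  # eigvalsh returns ascending order
    return eigvals[::-1][:k]


# Simulated data (an assumption): 20 individuals, 1000 markers coded 0/1/2.
rng = np.random.default_rng(0)
genotypes = rng.integers(0, 3, size=(20, 1000)).astype(np.float64)

chunked = gram_matrix_in_chunks(genotypes, chunk_size=128)

# Reference computation without splitting, for comparison.
centered = genotypes - genotypes.mean(axis=0)
full = centered @ centered.T / genotypes.shape[1]

leading = top_eigenvalues(chunked, k=3)
```

Because the product over all markers is a sum of per-block products, the chunked result matches the unsplit computation up to floating-point error; a smaller `chunk_size` trades speed for a lower peak memory footprint.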
