Multi-allelic positional Burrows-Wheeler transform

Bibliographic Details
Main authors: Naseri, Ardalan; Zhi, Degui; Zhang, Shaojie
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6551244/
https://www.ncbi.nlm.nih.gov/pubmed/31167638
http://dx.doi.org/10.1186/s12859-019-2821-6
Description
Summary: BACKGROUND: Recent advances in whole-genome sequencing and SNP array technology have led to the generation of a large amount of genotype data. Large volumes of genotype data will require faster and more efficient methods for storing and searching the data. Positional Burrows-Wheeler Transform (PBWT) provides an appropriate data structure for bi-allelic data. With increasing sample sizes, more multi-allelic sites are expected to be observed. Hence, there is a need to handle multi-allelic genotype data. RESULTS: In this paper, we introduce a multi-allelic version of the Positional Burrows-Wheeler Transform (mPBWT) based on the bi-allelic version for compression and searching. The time complexity for constructing the data structure and searching within a panel containing t-allelic sites increases by a factor of t. CONCLUSION: Considering the small value for the possible alleles t, the time increase for the multi-allelic PBWT will be negligible and comparable to the bi-allelic version of PBWT.
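
To make the idea in the summary concrete, the following is a minimal sketch of how the positional prefix arrays of a PBWT might be built when each site can carry up to t alleles. It is not the authors' implementation: the function name `mpbwt_prefix_arrays`, the `panel` layout (one list of integer alleles per haplotype) and the encoding of alleles as 0..t-1 are assumptions made for illustration. The sketch covers only the ordering step (a stable bucket sort by allele at each site) and leaves out divergence arrays, compression and searching.

```python
from typing import List, Sequence


def mpbwt_prefix_arrays(panel: Sequence[Sequence[int]], t: int) -> List[List[int]]:
    """Build positional prefix arrays for a multi-allelic haplotype panel.

    panel[i][k] is the allele (an integer in 0..t-1, an assumed encoding)
    of haplotype i at site k. The returned list holds one ordering per
    prefix length: arrays[k] lists haplotype indices sorted by their
    reversed prefix over sites 0..k-1.
    """
    m = len(panel)
    n = len(panel[0]) if m else 0

    order = list(range(m))  # before any site is read, keep input order
    arrays = [order]

    for k in range(n):
        # One bucket per possible allele; the bi-allelic PBWT uses two.
        buckets: List[List[int]] = [[] for _ in range(t)]

        # Walk haplotypes in the current order so that ties on the allele
        # at site k keep the order given by the earlier sites (stable sort).
        for hap in order:
            buckets[panel[hap][k]].append(hap)

        # Concatenate the buckets in allele order to get the next ordering.
        order = [hap for bucket in buckets for hap in bucket]
        arrays.append(order)

    return arrays


if __name__ == "__main__":
    # Toy panel: 4 haplotypes, 3 sites, up to 3 alleles per site.
    toy_panel = [
        [0, 1, 2],
        [2, 0, 1],
        [1, 1, 0],
        [0, 2, 2],
    ]
    for k, a in enumerate(mpbwt_prefix_arrays(toy_panel, 3)):
        print(k, a)
```

In this sketch the only place the number of alleles enters is the list of t buckets allocated at each site; with t = 2 it reduces to the usual two-bucket update of the bi-allelic PBWT.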