Application of a superword array in genome assembly

Bibliographic Details
Main Authors: Huang, Xiaoqiu; Yang, Shiaw-Pyng; Chinwalla, Asif T.; Hillier, LaDeana W.; Minx, Patrick; Mardis, Elaine R.; Wilson, Richard K.
Format: Text
Language: English
Published: Oxford University Press 2006
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1325203/
https://www.ncbi.nlm.nih.gov/pubmed/16397298
http://dx.doi.org/10.1093/nar/gkj419
Description
Summary: We introduce a data structure called a superword array for quickly finding matches between DNA sequences. The superword array possesses some desirable features of the lookup table and the suffix array. We describe simple algorithms for constructing and using a superword array to find pairs of sequences that share a unique superword. The algorithms are implemented in a genome assembly program called PCAP.REP for computation of overlaps between reads. Experimental results produced by PCAP.REP and PCAP on a whole-genome dataset show that PCAP.REP produced a more accurate and contiguous assembly than PCAP.
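
The summary does not spell out the data structure, so the following Python sketch is only a rough illustration of the general idea, not the PCAP.REP implementation. It assumes that a superword is a run of `v` consecutive words of length `w`, that the array is simply the list of read positions sorted by superword, and that a superword counts as "unique" when it occurs at most `max_occurrences` times across all reads. The function names and all three parameters are hypothetical.

```python
from itertools import combinations, groupby


def build_superword_array(reads, w=2, v=3):
    """Return (superword, read_index, offset) tuples sorted by superword.

    Assumption: a superword is v consecutive words of length w, i.e. a
    substring of length w * v. Plain sorting is used here for brevity.
    """
    length = w * v
    entries = []
    for read_index, seq in enumerate(reads):
        for offset in range(len(seq) - length + 1):
            entries.append((seq[offset:offset + length], read_index, offset))
    entries.sort()
    return entries


def pairs_sharing_unique_superword(reads, w=2, v=3, max_occurrences=2):
    """Return the set of read-index pairs that share a 'unique' superword.

    Assumption: 'unique' means the superword occurs at most
    max_occurrences times in total, which filters out repeats.
    """
    pairs = set()
    entries = build_superword_array(reads, w, v)
    # Equal superwords are adjacent in the sorted array, so one pass suffices.
    for _, block in groupby(entries, key=lambda entry: entry[0]):
        block = list(block)
        if len(block) > max_occurrences:
            continue  # too frequent: likely a repeat, skip it
        read_ids = sorted({read_index for _, read_index, _ in block})
        pairs.update(combinations(read_ids, 2))
    return pairs


reads = ["ACGTACGGTCAAGT", "TTACGGTCAAGTCC", "GGGGGGGGGGGGGG"]
candidate_pairs = pairs_sharing_unique_superword(reads)
```

In this toy input the first two reads share several superwords that each occur exactly twice, so they are reported as a candidate pair, while the third read contains only a highly repeated superword and is skipped. Such candidate pairs would then be passed to an overlap computation, which is outside the scope of this sketch.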