Application of a superword array in genome assembly

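We introduce a data structure called a superword array for quickly finding matches between DNA sequences. The superword array possesses some desirable features of the lookup table and suffix array. We describe simple algorithms for constructing and using a superword array to find pairs of sequences that share a unique superword. The algorithms are implemented in a genome assembly program called PCAP.REP for computation of overlaps between reads. Experimental results produced by PCAP.REP and PCAP on a whole-genome dataset show that PCAP.REP produced a more accurate and contiguous assembly than PCAP.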

Bibliographic Details
Main authors: Huang, Xiaoqiu, Yang, Shiaw-Pyng, Chinwalla, Asif T., Hillier, LaDeana W., Minx, Patrick, Mardis, Elaine R., Wilson, Richard K.
Format: Text
Language: English
Published: Oxford University Press 2006
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1325203/
https://www.ncbi.nlm.nih.gov/pubmed/16397298
http://dx.doi.org/10.1093/nar/gkj419
author Huang, Xiaoqiu
Yang, Shiaw-Pyng
Chinwalla, Asif T.
Hillier, LaDeana W.
Minx, Patrick
Mardis, Elaine R.
Wilson, Richard K.
collection PubMed
description We introduce a data structure called a superword array for quickly finding matches between DNA sequences. The superword array possesses some desirable features of the lookup table and suffix array. We describe simple algorithms for constructing and using a superword array to find pairs of sequences that share a unique superword. The algorithms are implemented in a genome assembly program called PCAP.REP for computation of overlaps between reads. Experimental results produced by PCAP.REP and PCAP on a whole-genome dataset show that PCAP.REP produced a more accurate and contiguous assembly than PCAP.
format Text
id pubmed-1325203
institution National Center for Biotechnology Information
language English
publishDate 2006
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-1325203 2006-01-10 Application of a superword array in genome assembly Huang, Xiaoqiu Yang, Shiaw-Pyng Chinwalla, Asif T. Hillier, LaDeana W. Minx, Patrick Mardis, Elaine R. Wilson, Richard K. Nucleic Acids Res Article We introduce a data structure called a superword array for quickly finding matches between DNA sequences. The superword array possesses some desirable features of the lookup table and suffix array. We describe simple algorithms for constructing and using a superword array to find pairs of sequences that share a unique superword. The algorithms are implemented in a genome assembly program called PCAP.REP for computation of overlaps between reads. Experimental results produced by PCAP.REP and PCAP on a whole-genome dataset show that PCAP.REP produced a more accurate and contiguous assembly than PCAP. Oxford University Press 2006 2006-01-05 /pmc/articles/PMC1325203/ /pubmed/16397298 http://dx.doi.org/10.1093/nar/gkj419 Text en © The Author 2006. Published by Oxford University Press. All rights reserved
title Application of a superword array in genome assembly
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1325203/
https://www.ncbi.nlm.nih.gov/pubmed/16397298
http://dx.doi.org/10.1093/nar/gkj419