A bioinformatic filter for improved base-call accuracy and polymorphism detection using the Affymetrix GeneChip® whole-genome resequencing platform

Bibliographic Details
Main Authors: Pandya, Gagan A., Holmes, Michael H., Sunkara, Sirisha, Sparks, Andrew, Bai, Yun, Verratti, Kathleen, Saeed, Kelly, Venepally, Pratap, Jarrahi, Behnam, Fleischmann, Robert D., Peterson, Scott N.
Format: Text
Language: English
Published: Oxford University Press 2007
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2175352/
https://www.ncbi.nlm.nih.gov/pubmed/18006572
http://dx.doi.org/10.1093/nar/gkm918
Description
Summary: DNA resequencing arrays enable rapid acquisition of high-quality sequence data. This technology represents a promising platform for rapid high-resolution genotyping of microorganisms. Traditional array-based resequencing methods have relied on the use of specific PCR-amplified fragments from the query samples as hybridization targets. While this specificity in the target DNA population reduces the potential for artifacts caused by cross-hybridization, the subsampling of the query genome limits the sequence coverage that can be obtained and therefore reduces the technique's resolution as a genotyping method. We have developed and validated an Affymetrix Inc. GeneChip® array-based, whole-genome resequencing platform for Francisella tularensis, the causative agent of tularemia. A set of bioinformatic filters that targeted systematic base-calling errors caused by cross-hybridization between the whole-genome sample and the array probes and by deletions in the sample DNA relative to the chip reference sequence were developed. Our approach eliminated 91% of the false-positive single-nucleotide polymorphism calls identified in the SCHU S4 query sample, at the cost of 10.7% of the true positives, yielding a total base-calling accuracy of 99.992%.