
Optimal Probe Length Varies for Targets with High Sequence Variation: Implications for Probe Library Design for Resequencing Highly Variable Genes

BACKGROUND: Sequencing by hybridisation is an effective method for obtaining large amounts of DNA sequence information at low cost. The efficiency of SBH depends on the design of the probe library to provide the maximum information for minimum cost. Long probes provide a higher probability of non-re...


Bibliographic Details
Main Authors: Haslam, Niall J., Whiteford, Nava E., Weber, Gerald, Prügel-Bennett, Adam, Essex, Jonathan W., Neylon, Cameron
Format: Text
Language: English
Published: Public Library of Science 2008
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2430613/
https://www.ncbi.nlm.nih.gov/pubmed/18563203
http://dx.doi.org/10.1371/journal.pone.0002500
_version_ 1782156417047724032
author Haslam, Niall J.
Whiteford, Nava E.
Weber, Gerald
Prügel-Bennett, Adam
Essex, Jonathan W.
Neylon, Cameron
author_facet Haslam, Niall J.
Whiteford, Nava E.
Weber, Gerald
Prügel-Bennett, Adam
Essex, Jonathan W.
Neylon, Cameron
author_sort Haslam, Niall J.
collection PubMed
description BACKGROUND: Sequencing by hybridisation (SBH) is an effective method for obtaining large amounts of DNA sequence information at low cost. The efficiency of SBH depends on the design of the probe library to provide the maximum information for minimum cost. Long probes provide a higher probability of non-repeated sequences but lead to an increase in the number of probes required, whereas short probes may not provide unique sequence information due to repeated sequences. We have investigated the effect of probe length, use of reference sequences, and thermal filtering on the design of probe libraries for several highly variable target DNA sequences. RESULTS: We designed overlapping probe libraries for a range of highly variable drug target genes based on known sequence information and developed a formal terminology to describe probe library design. We find that for some targets these libraries can provide good coverage of a previously unseen target, whereas for others the coverage is less than 30%. The optimal probe length varies from as short as 12 nt to as long as 19 nt and depends on the sequence, its variability, and the stringency of thermal filtering. It cannot be determined from inspection of an example gene sequence. CONCLUSIONS: Optimal probe length and the optimal number of reference sequences used to design a probe library are highly target specific for highly variable sequencing targets. The optimum design cannot be determined simply by inspection of input sequences or of alignments but only by detailed analysis of each specific target. For highly variable sequences, shorter probes can in some cases provide better information than longer probes. Probe library design would benefit from a general-purpose tool for analysing these issues. The formal terminology developed here and the analysis approaches it is used to describe will contribute to the development of such tools.
format Text
id pubmed-2430613
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-2430613 2008-06-19 Optimal Probe Length Varies for Targets with High Sequence Variation: Implications for Probe Library Design for Resequencing Highly Variable Genes Haslam, Niall J. Whiteford, Nava E. Weber, Gerald Prügel-Bennett, Adam Essex, Jonathan W. Neylon, Cameron PLoS One Research Article BACKGROUND: Sequencing by hybridisation (SBH) is an effective method for obtaining large amounts of DNA sequence information at low cost. The efficiency of SBH depends on the design of the probe library to provide the maximum information for minimum cost. Long probes provide a higher probability of non-repeated sequences but lead to an increase in the number of probes required, whereas short probes may not provide unique sequence information due to repeated sequences. We have investigated the effect of probe length, use of reference sequences, and thermal filtering on the design of probe libraries for several highly variable target DNA sequences. RESULTS: We designed overlapping probe libraries for a range of highly variable drug target genes based on known sequence information and developed a formal terminology to describe probe library design. We find that for some targets these libraries can provide good coverage of a previously unseen target, whereas for others the coverage is less than 30%. The optimal probe length varies from as short as 12 nt to as long as 19 nt and depends on the sequence, its variability, and the stringency of thermal filtering. It cannot be determined from inspection of an example gene sequence. CONCLUSIONS: Optimal probe length and the optimal number of reference sequences used to design a probe library are highly target specific for highly variable sequencing targets. The optimum design cannot be determined simply by inspection of input sequences or of alignments but only by detailed analysis of each specific target. For highly variable sequences, shorter probes can in some cases provide better information than longer probes.
Probe library design would benefit from a general purpose tool for analysing these issues. The formal terminology developed here and the analysis approaches it is used to describe will contribute to the development of such tools. Public Library of Science 2008-06-18 /pmc/articles/PMC2430613/ /pubmed/18563203 http://dx.doi.org/10.1371/journal.pone.0002500 Text en Haslam et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
spellingShingle Research Article
Haslam, Niall J.
Whiteford, Nava E.
Weber, Gerald
Prügel-Bennett, Adam
Essex, Jonathan W.
Neylon, Cameron
Optimal Probe Length Varies for Targets with High Sequence Variation: Implications for Probe Library Design for Resequencing Highly Variable Genes
title Optimal Probe Length Varies for Targets with High Sequence Variation: Implications for Probe Library Design for Resequencing Highly Variable Genes
title_full Optimal Probe Length Varies for Targets with High Sequence Variation: Implications for Probe Library Design for Resequencing Highly Variable Genes
title_fullStr Optimal Probe Length Varies for Targets with High Sequence Variation: Implications for Probe Library Design for Resequencing Highly Variable Genes
title_full_unstemmed Optimal Probe Length Varies for Targets with High Sequence Variation: Implications for Probe Library Design for Resequencing Highly Variable Genes
title_short Optimal Probe Length Varies for Targets with High Sequence Variation: Implications for Probe Library Design for Resequencing Highly Variable Genes
title_sort optimal probe length varies for targets with high sequence variation: implications for probe library design for resequencing highly variable genes
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2430613/
https://www.ncbi.nlm.nih.gov/pubmed/18563203
http://dx.doi.org/10.1371/journal.pone.0002500