Draft versus finished sequence data for DNA and protein diagnostic signature development

Sequencing pathogen genomes is costly, demanding careful allocation of limited sequencing resources. We built a computational Sequencing Analysis Pipeline (SAP) to guide decisions regarding the amount of genomic sequencing necessary to develop high-quality diagnostic DNA and protein signatures. SAP...

Bibliographic Details
Main authors: Gardner, Shea N., Lam, Marisa W., Smith, Jason R., Torres, Clinton L., Slezak, Tom R.
Format: Text
Language: English
Published: Oxford University Press 2005
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1266063/
https://www.ncbi.nlm.nih.gov/pubmed/16243783
http://dx.doi.org/10.1093/nar/gki896
author Gardner, Shea N.
Lam, Marisa W.
Smith, Jason R.
Torres, Clinton L.
Slezak, Tom R.
collection PubMed
description Sequencing pathogen genomes is costly, demanding careful allocation of limited sequencing resources. We built a computational Sequencing Analysis Pipeline (SAP) to guide decisions regarding the amount of genomic sequencing necessary to develop high-quality diagnostic DNA and protein signatures. SAP uses simulations to estimate the number of target genomes and close phylogenetic relatives (near neighbors or NNs) to sequence. We use SAP to assess whether draft data are sufficient or finished sequencing is required using Marburg and variola virus sequences. Simulations indicate that intermediate to high-quality draft with error rates of 10^−3 to 10^−5 (∼8× coverage) of target organisms is suitable for DNA signature prediction. Low-quality draft with error rates of ∼1% (3× to 6× coverage) of target isolates is inadequate for DNA signature prediction, although low-quality draft of NNs is sufficient, as long as the target genomes are of high quality. For protein signature prediction, sequencing errors in target genomes substantially reduce the detection of amino acid sequence conservation, even if the draft is of high quality. In summary, high-quality draft of target and low-quality draft of NNs appears to be a cost-effective investment for DNA signature prediction, but may lead to underestimation of predicted protein signatures.
format Text
id pubmed-1266063
institution National Center for Biotechnology Information
language English
publishDate 2005
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-1266063 2005-10-28 Nucleic Acids Res Article Oxford University Press 2005 2005-10-20 /pmc/articles/PMC1266063/ /pubmed/16243783 http://dx.doi.org/10.1093/nar/gki896 Text en © The Author 2005. Published by Oxford University Press. All rights reserved
title Draft versus finished sequence data for DNA and protein diagnostic signature development
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1266063/
https://www.ncbi.nlm.nih.gov/pubmed/16243783
http://dx.doi.org/10.1093/nar/gki896
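
The error rates quoted in the abstract can be put in perspective with a back-of-the-envelope calculation. The sketch below is not SAP and is not taken from the paper. It assumes independent, substitution-only errors and a hypothetical 100-base candidate region, and estimates how often such a region would be read without any error at each quoted per-base error rate. All function names and the region length are illustrative assumptions.

```python
import random


def error_free_probability(error_rate: float, length: int) -> float:
    """Chance that a region of `length` bases contains no error,
    assuming each base is wrong independently with `error_rate`."""
    return (1.0 - error_rate) ** length


def add_errors(seq: str, error_rate: float, rng: random.Random) -> str:
    """Return a copy of `seq` in which each base is replaced by a
    different base with probability `error_rate` (substitutions only;
    real draft sequence also has insertions and deletions)."""
    out = []
    for base in seq:
        if rng.random() < error_rate:
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            out.append(base)
    return "".join(out)


def simulated_intact_fraction(error_rate: float, length: int,
                              trials: int = 2000, seed: int = 0) -> float:
    """Fraction of simulated reads of a random reference region that
    come back identical to the reference."""
    rng = random.Random(seed)
    reference = "".join(rng.choice("ACGT") for _ in range(length))
    intact = sum(add_errors(reference, error_rate, rng) == reference
                 for _ in range(trials))
    return intact / trials


if __name__ == "__main__":
    region_length = 100  # hypothetical candidate region size, not from the paper
    for rate in (1e-2, 1e-3, 1e-5):  # per-base error rates quoted in the abstract
        analytic = error_free_probability(rate, region_length)
        simulated = simulated_intact_fraction(rate, region_length)
        print(f"error rate {rate:g}: analytic {analytic:.4f}, simulated {simulated:.4f}")
```

Under these assumptions the closed-form value and the simulated fraction should agree closely. The gap between the 1% rate and the 10^−3 to 10^−5 range gives a rough sense of why low-quality draft of target genomes could obscure conserved regions, although the paper's own simulations are the authority on that conclusion.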