Draft versus finished sequence data for DNA and protein diagnostic signature development
Sequencing pathogen genomes is costly, demanding careful allocation of limited sequencing resources. We built a computational Sequencing Analysis Pipeline (SAP) to guide decisions regarding the amount of genomic sequencing necessary to develop high-quality diagnostic DNA and protein signatures. SAP...
Main Authors: | Gardner, Shea N.; Lam, Marisa W.; Smith, Jason R.; Torres, Clinton L.; Slezak, Tom R. |
---|---|
Format: | Text |
Language: | English |
Published: | Oxford University Press, 2005 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1266063/ https://www.ncbi.nlm.nih.gov/pubmed/16243783 http://dx.doi.org/10.1093/nar/gki896 |
_version_ | 1782125924281483264 |
---|---|
author | Gardner, Shea N.; Lam, Marisa W.; Smith, Jason R.; Torres, Clinton L.; Slezak, Tom R. |
author_facet | Gardner, Shea N.; Lam, Marisa W.; Smith, Jason R.; Torres, Clinton L.; Slezak, Tom R. |
author_sort | Gardner, Shea N. |
collection | PubMed |
description | Sequencing pathogen genomes is costly, demanding careful allocation of limited sequencing resources. We built a computational Sequencing Analysis Pipeline (SAP) to guide decisions regarding the amount of genomic sequencing necessary to develop high-quality diagnostic DNA and protein signatures. SAP uses simulations to estimate the number of target genomes and close phylogenetic relatives (near neighbors or NNs) to sequence. We use SAP to assess whether draft data are sufficient or finished sequencing is required using Marburg and variola virus sequences. Simulations indicate that intermediate to high-quality draft with error rates of 10⁻³ to 10⁻⁵ (∼8× coverage) of target organisms is suitable for DNA signature prediction. Low-quality draft with error rates of ∼1% (3× to 6× coverage) of target isolates is inadequate for DNA signature prediction, although low-quality draft of NNs is sufficient, as long as the target genomes are of high quality. For protein signature prediction, sequencing errors in target genomes substantially reduce the detection of amino acid sequence conservation, even if the draft is of high quality. In summary, high-quality draft of target and low-quality draft of NNs appears to be a cost-effective investment for DNA signature prediction, but may lead to underestimation of predicted protein signatures. |
format | Text |
id | pubmed-1266063 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2005 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-1266063 2005-10-28 Draft versus finished sequence data for DNA and protein diagnostic signature development Gardner, Shea N. Lam, Marisa W. Smith, Jason R. Torres, Clinton L. Slezak, Tom R. Nucleic Acids Res Article Sequencing pathogen genomes is costly, demanding careful allocation of limited sequencing resources. We built a computational Sequencing Analysis Pipeline (SAP) to guide decisions regarding the amount of genomic sequencing necessary to develop high-quality diagnostic DNA and protein signatures. SAP uses simulations to estimate the number of target genomes and close phylogenetic relatives (near neighbors or NNs) to sequence. We use SAP to assess whether draft data are sufficient or finished sequencing is required using Marburg and variola virus sequences. Simulations indicate that intermediate to high-quality draft with error rates of 10⁻³ to 10⁻⁵ (∼8× coverage) of target organisms is suitable for DNA signature prediction. Low-quality draft with error rates of ∼1% (3× to 6× coverage) of target isolates is inadequate for DNA signature prediction, although low-quality draft of NNs is sufficient, as long as the target genomes are of high quality. For protein signature prediction, sequencing errors in target genomes substantially reduce the detection of amino acid sequence conservation, even if the draft is of high quality. In summary, high-quality draft of target and low-quality draft of NNs appears to be a cost-effective investment for DNA signature prediction, but may lead to underestimation of predicted protein signatures. Oxford University Press 2005 2005-10-20 /pmc/articles/PMC1266063/ /pubmed/16243783 http://dx.doi.org/10.1093/nar/gki896 Text en © The Author 2005. Published by Oxford University Press. All rights reserved |
spellingShingle | Article Gardner, Shea N. Lam, Marisa W. Smith, Jason R. Torres, Clinton L. Slezak, Tom R. Draft versus finished sequence data for DNA and protein diagnostic signature development |
title | Draft versus finished sequence data for DNA and protein diagnostic signature development |
title_sort | draft versus finished sequence data for dna and protein diagnostic signature development |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1266063/ https://www.ncbi.nlm.nih.gov/pubmed/16243783 http://dx.doi.org/10.1093/nar/gki896 |