
ChromatoGate: A Tool for Detecting Base Mis-Calls in Multiple Sequence Alignments by Semi-Automatic Chromatogram Inspection

Automated DNA sequencers generate chromatograms that contain raw sequencing data. They also generate data that translates the chromatograms into molecular sequences of A, C, G, T, or N (undetermined) characters. Since chromatogram translation programs frequently introduce errors, a manual inspection...


Bibliographic Details
Main Authors: Alachiotis, Nikolaos; Vogiatzi, Emmanouella; Pavlidis, Pavlos; Stamatakis, Alexandros
Format: Online Article Text
Language: English
Published: Research Network of Computational and Structural Biotechnology (RNCSB) Organization 2013
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3962156/
https://www.ncbi.nlm.nih.gov/pubmed/24688709
http://dx.doi.org/10.5936/csbj.201303001
_version_ 1782308392874803200
author Alachiotis, Nikolaos
Vogiatzi, Emmanouella
Pavlidis, Pavlos
Stamatakis, Alexandros
collection PubMed
description Automated DNA sequencers generate chromatograms that contain raw sequencing data. They also generate data that translates the chromatograms into molecular sequences of A, C, G, T, or N (undetermined) characters. Since chromatogram translation programs frequently introduce errors, a manual inspection of the generated sequence data is required. As sequence numbers and lengths increase, visual inspection and manual correction of chromatograms and corresponding sequences on a per-peak and per-nucleotide basis become an error-prone, time-consuming, and tedious process. Here, we introduce ChromatoGate (CG), an open-source software tool that accelerates and partially automates the inspection of chromatograms and the detection of sequencing errors for bidirectional sequencing runs. To provide users full control over the error correction process, a fully automated error correction algorithm has not been implemented. Initially, the program scans a given multiple sequence alignment (MSA) for potential sequencing errors, assuming that each polymorphic site in the alignment may be attributed to a sequencing error with a certain probability. The guided MSA assembly procedure in ChromatoGate detects chromatogram peaks of all characters in an alignment that lead to polymorphic sites, given a user-defined threshold. The threshold value represents the sensitivity of the sequencing error detection mechanism. After this pre-filtering, the user only needs to inspect a small number of peaks in every chromatogram to correct sequencing errors. Finally, we show that correcting sequencing errors is important, because population genetic and phylogenetic inferences can be misled by MSAs with uncorrected mis-calls. Our experiments indicate that estimates of population mutation rates can be affected two- to three-fold by uncorrected errors.
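The pre-filtering idea in the description (scan an MSA for polymorphic sites and flag the characters that cause them, subject to a threshold) can be illustrated with a minimal sketch. This is not ChromatoGate's implementation: the function name `flag_suspect_sites`, the plain list-of-strings alignment, and the reading of the threshold as a maximum within-column base frequency are all assumptions made for illustration, since the record does not say how the tool defines its threshold.

```python
from collections import Counter


def flag_suspect_sites(alignment, threshold):
    """Return (column, sequence_index, base) triples worth inspecting.

    Hypothetical helper, not part of ChromatoGate. A column counts as
    polymorphic if it holds more than one distinct base (gaps and N are
    ignored). A base is flagged when its frequency within the column is
    at most `threshold`, on the assumption that rare variants are the
    likeliest mis-calls.
    """
    if len({len(seq) for seq in alignment}) > 1:
        raise ValueError("all aligned sequences must have the same length")

    flagged = []
    n_columns = len(alignment[0]) if alignment else 0
    for col in range(n_columns):
        column = [seq[col].upper() for seq in alignment]
        counts = Counter(base for base in column if base in "ACGT")
        if len(counts) < 2:
            continue  # monomorphic column: nothing to inspect
        total = sum(counts.values())
        for idx, base in enumerate(column):
            if base in counts and counts[base] / total <= threshold:
                flagged.append((col, idx, base))
    return flagged


# Toy alignment: the third sequence differs from the rest at column 3.
msa = [
    "ACGTACGT",
    "ACGTACGT",
    "ACGAACGT",
    "ACGTACGT",
]
suspects = flag_suspect_sites(msa, threshold=0.25)
print(suspects)  # → [(3, 2, 'A')]
```

Each flagged triple points to one chromatogram peak that a user would then inspect by hand. Under this assumed definition, lowering the threshold flags fewer peaks and raising it flags more, which is one way to read the "sensitivity" mentioned in the description.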
format Online
Article
Text
id pubmed-3962156
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Research Network of Computational and Structural Biotechnology (RNCSB) Organization
record_format MEDLINE/PubMed
spelling pubmed-3962156 2014-03-31 ChromatoGate: A Tool for Detecting Base Mis-Calls in Multiple Sequence Alignments by Semi-Automatic Chromatogram Inspection Alachiotis, Nikolaos Vogiatzi, Emmanouella Pavlidis, Pavlos Stamatakis, Alexandros Comput Struct Biotechnol J Software Article Research Network of Computational and Structural Biotechnology (RNCSB) Organization 2013-05-08 /pmc/articles/PMC3962156/ /pubmed/24688709 http://dx.doi.org/10.5936/csbj.201303001 Text en © Alachiotis et al. http://creativecommons.org/licenses/by/3.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly cited.
title ChromatoGate: A Tool for Detecting Base Mis-Calls in Multiple Sequence Alignments by Semi-Automatic Chromatogram Inspection
topic Software Article