An optimized procedure greatly improves EST vector contamination removal

Bibliographic Details
Main authors: Chen, Yi-An, Lin, Chang-Chun, Wang, Chin-Di, Wu, Huan-Bin, Hwang, Pei-Ing
Format: Text
Language: English
Published: BioMed Central 2007
Subjects: Methodology Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2194723/
https://www.ncbi.nlm.nih.gov/pubmed/17997864
http://dx.doi.org/10.1186/1471-2164-8-416
collection PubMed
description BACKGROUND: The enormous amount of sequence data available in public-domain databases has been a gold mine for researchers exploring various themes in the life sciences, so the quality of these data is of serious concern. Removal of vector contamination is one of the most important steps in obtaining accurate sequence data that contain only the cDNA insert from the basecalls output by an automatic DNA sequencer. Popular bioinformatics programs for vector trimming include LUCY, cross_match and SeqClean. RESULTS: In a recent study in which SeqClean was used to remove vector contamination from our test set of EST data, compiled through various library construction systems, a significant number of errors remained after preliminary trimming. These errors were later almost completely corrected simply by using a re-linearized form of the cloning vector as the reference against which the target ESTs were compared. The modified trimming procedure for SeqClean was also compared with the trimming efficiency of the other two popular programs, LUCY2 and cross_match. SeqClean with a re-linearized form of the cloning vector significantly outperformed the other two programs under all tested conditions, while the performance of the other two programs was not affected by the modified procedure. Vector contamination in dbEST was also investigated: 2203 of the 48212 ESTs sampled from dbEST (2007-04-18 freeze) were found to match sequences in UNIVEC. CONCLUSION: Vector contamination remains a serious concern for the data quality of public sequence databases. Based on the results presented here, we recommend our modified SeqClean procedure to all researchers for removing vector sequence from EST or genomic sequences.
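The abstract does not say where the circular cloning vector is re-opened, so the following is only a minimal sketch under a stated assumption: "re-linearizing" is taken to mean rotating the circular vector sequence so that a different position becomes the start of the linear reference string. One plausible reason this matters is that an alignment-based trimmer compares reads against a linear string, so vector sequence that spans the point where the circle was opened can only match as two separate pieces. The function names `relinearize` and `contamination_rate`, the toy 8-bp vector, and the cut index are hypothetical illustrations, not SeqClean code. The last line reuses the dbEST counts quoted above.

```python
def relinearize(circular_seq: str, cut: int) -> str:
    """Return the linear form of a circular sequence opened at index `cut`.

    Hypothetical helper: rotation changes only where the circle is opened,
    not the molecule itself. Choosing `cut` (for example, relative to the
    cloning site) is an assumption, not something stated in the abstract.
    """
    n = len(circular_seq)
    if n == 0:
        raise ValueError("empty vector sequence")
    cut %= n  # accept any integer position on the circle
    return circular_seq[cut:] + circular_seq[:cut]


def contamination_rate(contaminated: int, sampled: int) -> float:
    """Hypothetical helper: fraction of sampled ESTs that matched a vector database."""
    if sampled <= 0:
        raise ValueError("sampled must be positive")
    return contaminated / sampled


if __name__ == "__main__":
    # Toy 8-bp "vector". The fragment spans the original opening point
    # (last 3 bases "GCA" + first 3 bases "ACG"), so it is not a contiguous
    # substring of the original linear form...
    vector = "ACGTTGCA"
    fragment = "GCAACG"
    print(fragment in vector)                  # False
    # ...but it is once the circle is re-opened at index 3.
    print(fragment in relinearize(vector, 3))  # True

    # dbEST counts from the abstract: 2203 of 48212 sampled ESTs.
    print(f"{contamination_rate(2203, 48212):.2%}")  # 4.57%
```

In this sketch the rotation is done with plain string slicing, and the modulo lets callers pass any position on the circle, including negative offsets.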
id pubmed-2194723
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal BMC Genomics, Methodology Article, published 2007-11-13
rights Copyright © 2007 Chen et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Methodology Article