An optimized procedure greatly improves EST vector contamination removal
BACKGROUND: The enormous amount of sequence data available in the public domain database has been a gold mine for researchers exploring various themes in life sciences, and hence the quality of such data is of serious concern to researchers. Removal of vector contamination is one of the most signifi...
Main authors: | Chen, Yi-An; Lin, Chang-Chun; Wang, Chin-Di; Wu, Huan-Bin; Hwang, Pei-Ing |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2007 |
Subjects: | Methodology Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2194723/ https://www.ncbi.nlm.nih.gov/pubmed/17997864 http://dx.doi.org/10.1186/1471-2164-8-416 |
_version_ | 1782147682396012544 |
---|---|
author | Chen, Yi-An Lin, Chang-Chun Wang, Chin-Di Wu, Huan-Bin Hwang, Pei-Ing |
author_sort | Chen, Yi-An |
collection | PubMed |
description | BACKGROUND: The enormous amount of sequence data available in public domain databases has been a gold mine for researchers exploring various themes in the life sciences, so the quality of such data is a serious concern. Removing vector contamination is one of the most important steps in obtaining accurate sequence data that contain only the cDNA insert from the basecalls output by an automatic DNA sequencer. Popular bioinformatics programs for vector trimming include LUCY, cross_match and SeqClean. RESULTS: In a recent study, SeqClean was used to remove vector contamination from our test set of EST data compiled through various library construction systems; however, a significant number of errors remained after preliminary trimming. These errors were almost completely corrected simply by comparing the target ESTs against a re-linearized form of the cloning vector. The modified trimming procedure for SeqClean was also compared with the other two popular programs, LUCY2 and cross_match. SeqClean with a re-linearized form of the cloning vector significantly outperformed the other two programs in all tested conditions, while the performance of the other two programs was not influenced by the modified procedure. Vector contamination in dbEST was also investigated: 2203 of the 48212 ESTs sampled from dbEST (2007-04-18 freeze) matched sequences in UNIVEC. CONCLUSION: Vector contamination remains a serious concern for data quality in public sequence databases. Based on the results presented here, we suggest that our modified SeqClean procedure be recommended to all researchers for vector removal from EST or genomic sequences. |
format | Text |
id | pubmed-2194723 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2007 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-2194723 2008-01-12 An optimized procedure greatly improves EST vector contamination removal Chen, Yi-An Lin, Chang-Chun Wang, Chin-Di Wu, Huan-Bin Hwang, Pei-Ing BMC Genomics Methodology Article BioMed Central 2007-11-13 /pmc/articles/PMC2194723/ /pubmed/17997864 http://dx.doi.org/10.1186/1471-2164-8-416 Text en Copyright © 2007 Chen et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | An optimized procedure greatly improves EST vector contamination removal |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2194723/ https://www.ncbi.nlm.nih.gov/pubmed/17997864 http://dx.doi.org/10.1186/1471-2164-8-416 |
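The abstract attributes the improvement to comparing ESTs against a "re-linearized form of the cloning vector" but this record does not spell out the operation. One plausible reading, offered here only as an assumption, is that the circular vector sequence is rotated so that the linear record begins and ends at the cloning site, which puts the insert-flanking vector sequence at the termini. The sketch below illustrates that rotation in Python. The function names `relinearize` and `find_site` and the toy sequence are hypothetical, and nothing here is taken from SeqClean or from the authors' scripts.

```python
def relinearize(vector_seq: str, cut_index: int) -> str:
    """Rotate a circular vector sequence so the linear form starts at cut_index.

    Assumption: the input is one full copy of the circular vector, written
    from an arbitrary origin. No bases are added or lost by the rotation.
    """
    if not vector_seq:
        raise ValueError("vector sequence is empty")
    cut = cut_index % len(vector_seq)
    return vector_seq[cut:] + vector_seq[:cut]


def find_site(vector_seq: str, site: str) -> int:
    """Return the 0-based start of `site` in the circular sequence, or -1."""
    seq = vector_seq.upper()
    site = site.upper()
    # Append the first len(site) - 1 bases so that a site spanning the
    # arbitrary origin of the linear record is still found.
    wrapped = seq + seq[: len(site) - 1]
    return wrapped.find(site)


# Toy example (hypothetical sequence, not a real vector).
toy_vector = "AAACCCGAATTCGGGTTT"
site_start = find_site(toy_vector, "GAATTC")  # EcoRI recognition site
cut_position = site_start + 1                 # EcoRI cuts between G and AATTC
relinearized = relinearize(toy_vector, cut_position)
```

The rotated sequence could then be written to a FASTA file and supplied to the trimming program as its vector database in place of the original record. How the cloning site is chosen for a given library is left open here, since it depends on the construction system used.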