Joining Illumina paired-end reads for classifying phylogenetic marker sequences

BACKGROUND: Illumina sequencing of a marker gene is popular in metagenomic studies. However, Illumina paired-end (PE) reads sometimes cannot be merged into single reads for subsequent analysis. When mergeable PE reads are limited, one can simply use only first reads for taxonomy annotation, but that...

Bibliographic Details
Main authors:	Liu, Tsunglin, Chen, Chen-Yu, Chen-Deng, An, Chen, Yi-Lin, Wang, Jiu-Yao, Hou, Yung-I, Lin, Min-Ching
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2020
Subjects:	Methodology Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7071698/ https://www.ncbi.nlm.nih.gov/pubmed/32171248 http://dx.doi.org/10.1186/s12859-020-3445-6

_version_	1783506261753462784
author	Liu, Tsunglin Chen, Chen-Yu Chen-Deng, An Chen, Yi-Lin Wang, Jiu-Yao Hou, Yung-I Lin, Min-Ching
author_facet	Liu, Tsunglin Chen, Chen-Yu Chen-Deng, An Chen, Yi-Lin Wang, Jiu-Yao Hou, Yung-I Lin, Min-Ching
author_sort	Liu, Tsunglin
collection	PubMed
description	BACKGROUND: Illumina sequencing of a marker gene is popular in metagenomic studies. However, Illumina paired-end (PE) reads sometimes cannot be merged into single reads for subsequent analysis. When mergeable PE reads are limited, one can simply use only first reads for taxonomy annotation, but that wastes information in the second reads. Presumably, including second reads should improve taxonomy annotation. However, a rigorous investigation of how best to do this and how much can be gained has not been reported. RESULTS: We evaluated two methods of joining as opposed to merging PE reads into single reads for taxonomy annotation using simulated data with sequencing errors. Our rigorous evaluation involved several top classifiers (RDP classifier, SINTAX, and two alignment-based methods) and realistic benchmark datasets. For most classifiers, read joining ameliorated the impact of sequencing errors and improved the accuracy of taxonomy predictions. For alignment-based top-hit classifiers, rearranging the reference sequences is recommended to avoid improper alignments of joined reads. For word-counting classifiers, joined reads could be compared to the original reference for classification. We also applied read joining to our own real MiSeq PE data of nasal microbiota of asthmatic children. Before joining, trimming low quality bases was necessary for optimizing taxonomy annotation and sequence clustering. We then showed that read joining increased the amount of effective data for taxonomy annotation. Using these joined trimmed reads, we were able to identify two promising bacterial genera that might be associated with asthma exacerbation. CONCLUSIONS: When mergeable PE reads are limited, joining them into single reads for taxonomy annotation is always recommended. Reference sequences may need to be rearranged accordingly depending on the classifier. Read joining also relaxes the constraint on primer selection, and thus may unleash the full capacity of Illumina PE data for taxonomy annotation. Our work provides guidance for fully utilizing PE data of a marker gene when mergeable reads are limited.
format	Online Article Text
id	pubmed-7071698
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-7071698 2020-03-18 Joining Illumina paired-end reads for classifying phylogenetic marker sequences Liu, Tsunglin Chen, Chen-Yu Chen-Deng, An Chen, Yi-Lin Wang, Jiu-Yao Hou, Yung-I Lin, Min-Ching BMC Bioinformatics Methodology Article BACKGROUND: Illumina sequencing of a marker gene is popular in metagenomic studies. However, Illumina paired-end (PE) reads sometimes cannot be merged into single reads for subsequent analysis. When mergeable PE reads are limited, one can simply use only first reads for taxonomy annotation, but that wastes information in the second reads. Presumably, including second reads should improve taxonomy annotation. However, a rigorous investigation of how best to do this and how much can be gained has not been reported. RESULTS: We evaluated two methods of joining as opposed to merging PE reads into single reads for taxonomy annotation using simulated data with sequencing errors. Our rigorous evaluation involved several top classifiers (RDP classifier, SINTAX, and two alignment-based methods) and realistic benchmark datasets. For most classifiers, read joining ameliorated the impact of sequencing errors and improved the accuracy of taxonomy predictions. For alignment-based top-hit classifiers, rearranging the reference sequences is recommended to avoid improper alignments of joined reads. For word-counting classifiers, joined reads could be compared to the original reference for classification. We also applied read joining to our own real MiSeq PE data of nasal microbiota of asthmatic children. Before joining, trimming low quality bases was necessary for optimizing taxonomy annotation and sequence clustering. We then showed that read joining increased the amount of effective data for taxonomy annotation. Using these joined trimmed reads, we were able to identify two promising bacterial genera that might be associated with asthma exacerbation. CONCLUSIONS: When mergeable PE reads are limited, joining them into single reads for taxonomy annotation is always recommended. Reference sequences may need to be rearranged accordingly depending on the classifier. Read joining also relaxes the constraint on primer selection, and thus may unleash the full capacity of Illumina PE data for taxonomy annotation. Our work provides guidance for fully utilizing PE data of a marker gene when mergeable reads are limited. BioMed Central 2020-03-14 /pmc/articles/PMC7071698/ /pubmed/32171248 http://dx.doi.org/10.1186/s12859-020-3445-6 Text en © The Author(s). 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle	Methodology Article Liu, Tsunglin Chen, Chen-Yu Chen-Deng, An Chen, Yi-Lin Wang, Jiu-Yao Hou, Yung-I Lin, Min-Ching Joining Illumina paired-end reads for classifying phylogenetic marker sequences
title	Joining Illumina paired-end reads for classifying phylogenetic marker sequences
title_full	Joining Illumina paired-end reads for classifying phylogenetic marker sequences
title_fullStr	Joining Illumina paired-end reads for classifying phylogenetic marker sequences
title_full_unstemmed	Joining Illumina paired-end reads for classifying phylogenetic marker sequences
title_short	Joining Illumina paired-end reads for classifying phylogenetic marker sequences
title_sort	joining illumina paired-end reads for classifying phylogenetic marker sequences
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7071698/ https://www.ncbi.nlm.nih.gov/pubmed/32171248 http://dx.doi.org/10.1186/s12859-020-3445-6
work_keys_str_mv	AT liutsunglin joiningilluminapairedendreadsforclassifyingphylogeneticmarkersequences AT chenchenyu joiningilluminapairedendreadsforclassifyingphylogeneticmarkersequences AT chendengan joiningilluminapairedendreadsforclassifyingphylogeneticmarkersequences AT chenyilin joiningilluminapairedendreadsforclassifyingphylogeneticmarkersequences AT wangjiuyao joiningilluminapairedendreadsforclassifyingphylogeneticmarkersequences AT houyungi joiningilluminapairedendreadsforclassifyingphylogeneticmarkersequences AT linminching joiningilluminapairedendreadsforclassifyingphylogeneticmarkersequences
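
The abstract distinguishes joining PE reads from merging them but does not spell out the two joining methods that were evaluated. As a rough illustration only, one plausible form of joining is to concatenate the first read with the reverse complement of the second read, so that both parts lie on the same strand. The function names and the optional `spacer` argument below are hypothetical and are not taken from the article or from any library.

```python
# Minimal sketch of joining (not merging) a read pair.
# Assumption: "joining" means plain concatenation of read 1 with the
# reverse complement of read 2; the article's exact methods may differ.

# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def join_pair(read1: str, read2: str, spacer: str = "") -> str:
    """Concatenate read 1 and the reverse complement of read 2.

    No overlap is required or searched for; `spacer` is a hypothetical
    option for placing characters (e.g. "N") between the two parts.
    """
    return read1 + spacer + reverse_complement(read2)


if __name__ == "__main__":
    print(join_pair("ACGT", "AACC"))        # ACGTGGTT
    print(join_pair("ACGT", "AACC", "N"))   # ACGTNGGTT
```

Unlike merging, this needs no overlap between the two reads, which is why it remains applicable when mergeable PE reads are limited.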
