
A software pipeline for processing and identification of fungal ITS sequences

BACKGROUND: Fungi from environmental samples are typically identified to species level through DNA sequencing of the nuclear ribosomal internal transcribed spacer (ITS) region for use in BLAST-based similarity searches in the International Nucleotide Sequence Databases. These searches are time-consu...


Bibliographic Details
Main Authors: Nilsson, R Henrik; Bok, Gunilla; Ryberg, Martin; Kristiansson, Erik; Hallenberg, Nils
Format: Text
Language: English
Published: BioMed Central 2009
Subjects: Software Review
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2649129/
https://www.ncbi.nlm.nih.gov/pubmed/19146660
http://dx.doi.org/10.1186/1751-0473-4-1
author Nilsson, R Henrik
Bok, Gunilla
Ryberg, Martin
Kristiansson, Erik
Hallenberg, Nils
collection PubMed
description BACKGROUND: Fungi from environmental samples are typically identified to species level through DNA sequencing of the nuclear ribosomal internal transcribed spacer (ITS) region for use in BLAST-based similarity searches in the International Nucleotide Sequence Databases. These searches are time-consuming and regularly require a significant amount of manual intervention and complementary analyses. We here present software – in the form of an identification pipeline for large sets of fungal ITS sequences – developed to automate the BLAST process and several additional analysis steps. The performance of the pipeline was evaluated on a dataset of 350 ITS sequences from fungi growing as epiphytes on building material. RESULTS: The pipeline was written in Perl and uses a local installation of NCBI-BLAST for the similarity searches of the query sequences. The variable subregion ITS2 of the ITS region is extracted from the sequences and used for additional searches of higher sensitivity. Multiple alignments of each query sequence and its closest matches are computed, and query sequences sharing at least 50% of their best matches are clustered to facilitate the evaluation of hypothetically conspecific groups. The pipeline proved to speed up the processing, as well as enhance the resolution, of the evaluation dataset considerably, and the fungi were found to belong chiefly to the Ascomycota, with Penicillium and Aspergillus as the two most common genera. The ITS2 was found to indicate a different taxonomic affiliation than did the complete ITS region for 10% of the query sequences, though this figure is likely to vary with the taxonomic scope of the query sequences. CONCLUSION: The present software readily assigns large sets of fungal query sequences to their respective best matches in the international sequence databases and places them in a larger biological context. 
The output is highly structured to be easy to process, although it still needs to be inspected and possibly corrected for the impact of the incomplete and sometimes erroneously annotated fungal entries in these databases. The open source pipeline is available for UNIX-type platforms, and updated releases of the target database are made available biweekly. The pipeline is easily modified to operate on other molecular regions and organism groups.
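The clustering rule mentioned in the description (query sequences sharing at least 50% of their best matches are grouped) can be illustrated with a short Python sketch. This is not the authors' Perl code. The record does not say how the shared fraction is measured, so the sketch assumes it is the overlap divided by the size of the smaller hit list, and that groups are formed by single linkage. The function names and the accession labels are hypothetical.

```python
from itertools import combinations


def shared_fraction(hits_a, hits_b):
    """Fraction of the smaller hit list that also occurs in the other list.

    Assumption: the record does not define the measure; this is one plausible choice.
    """
    a, b = set(hits_a), set(hits_b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def cluster_queries(best_matches, threshold=0.5):
    """Group query IDs whose best-match sets overlap by at least `threshold`.

    Uses a small union-find structure, so grouping is single linkage:
    two queries end up together if a chain of qualifying pairs connects them.
    """
    parent = {q: q for q in best_matches}

    def find(q):
        # Follow parent links to the cluster root, compressing the path on the way.
        while parent[q] != q:
            parent[q] = parent[parent[q]]
            q = parent[q]
        return q

    for q1, q2 in combinations(best_matches, 2):
        if shared_fraction(best_matches[q1], best_matches[q2]) >= threshold:
            parent[find(q1)] = find(q2)

    clusters = {}
    for q in best_matches:
        clusters.setdefault(find(q), []).append(q)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical best-match lists (placeholder accession labels, not real data).
example = {
    "query1": ["acc_A", "acc_B", "acc_C", "acc_D"],
    "query2": ["acc_B", "acc_C", "acc_E", "acc_F"],
    "query3": ["acc_X", "acc_Y"],
}
print(cluster_queries(example))  # [['query1', 'query2'], ['query3']]
```

Here query1 and query2 share two of four best matches, which meets the 50% threshold, while query3 shares none and stays on its own. A stricter definition of the shared fraction (for example, relative to the larger list) would yield smaller groups.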
format Text
id pubmed-2649129
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2649129 2009-02-28 A software pipeline for processing and identification of fungal ITS sequences Nilsson, R Henrik Bok, Gunilla Ryberg, Martin Kristiansson, Erik Hallenberg, Nils Source Code Biol Med Software Review BioMed Central 2009-01-15 /pmc/articles/PMC2649129/ /pubmed/19146660 http://dx.doi.org/10.1186/1751-0473-4-1 Text en Copyright © 2009 Nilsson et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title A software pipeline for processing and identification of fungal ITS sequences
topic Software Review
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2649129/
https://www.ncbi.nlm.nih.gov/pubmed/19146660
http://dx.doi.org/10.1186/1751-0473-4-1