
CaPSID: A bioinformatics platform for computational pathogen sequence identification in human genomes and transcriptomes

Bibliographic Details
Main authors: Borozan, Ivan, Wilson, Shane, Blanchette, Paola, Laflamme, Philippe, Watt, Stuart N, Krzyzanowski, Paul M, Sircoulomb, Fabrice, Rottapel, Robert, Branton, Philip E, Ferretti, Vincent
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects: Software
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3464663/
https://www.ncbi.nlm.nih.gov/pubmed/22901030
http://dx.doi.org/10.1186/1471-2105-13-206
author Borozan, Ivan
Wilson, Shane
Blanchette, Paola
Laflamme, Philippe
Watt, Stuart N
Krzyzanowski, Paul M
Sircoulomb, Fabrice
Rottapel, Robert
Branton, Philip E
Ferretti, Vincent
collection PubMed
description BACKGROUND: It is now well established that nearly 20% of human cancers are caused by infectious agents, and the list of human oncogenic pathogens will grow in the future for a variety of cancer types. Whole tumor transcriptome and genome sequencing by next-generation sequencing technologies presents an unparalleled opportunity for pathogen detection and discovery in human tissues but requires development of new genome-wide bioinformatics tools. RESULTS: Here we present CaPSID (Computational Pathogen Sequence IDentification), a comprehensive bioinformatics platform for identifying, querying and visualizing both exogenous and endogenous pathogen nucleotide sequences in tumor genomes and transcriptomes. CaPSID includes a scalable, high-performance database for data storage and a web application that integrates the genome browser JBrowse. CaPSID also provides useful metrics for sequence analysis of pre-aligned BAM files, such as gene and genome coverage, and is optimized to run efficiently on multiprocessor computers with low memory usage. CONCLUSIONS: To demonstrate the usefulness and efficiency of CaPSID, we carried out a comprehensive analysis of both a simulated dataset and transcriptome samples from ovarian cancer. CaPSID correctly identified all of the human and pathogen sequences in the simulated dataset, while in the ovarian dataset CaPSID’s predictions were successfully validated in vitro.
format Online
Article
Text
id pubmed-3464663
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3464663 2012-10-05 BMC Bioinformatics Software BioMed Central 2012-08-17 /pmc/articles/PMC3464663/ /pubmed/22901030 http://dx.doi.org/10.1186/1471-2105-13-206 Text en Copyright ©2012 Borozan et al.; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title CaPSID: A bioinformatics platform for computational pathogen sequence identification in human genomes and transcriptomes
topic Software