CaPSID: A bioinformatics platform for computational pathogen sequence identification in human genomes and transcriptomes
BACKGROUND: It is now well established that nearly 20% of human cancers are caused by infectious agents, and the list of human oncogenic pathogens will grow in the future for a variety of cancer types. Whole tumor transcriptome and genome sequencing by next-generation sequencing technologies present...
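Main Authors: | Borozan, Ivan; Wilson, Shane; Blanchette, Paola; Laflamme, Philippe; Watt, Stuart N; Krzyzanowski, Paul M; Sircoulomb, Fabrice; Rottapel, Robert; Branton, Philip E; Ferretti, Vincent |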
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2012 |
Subjects: | Software |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3464663/ https://www.ncbi.nlm.nih.gov/pubmed/22901030 http://dx.doi.org/10.1186/1471-2105-13-206 |
_version_ | 1782245448059191296 |
---|---|
author | Borozan, Ivan Wilson, Shane Blanchette, Paola Laflamme, Philippe Watt, Stuart N Krzyzanowski, Paul M Sircoulomb, Fabrice Rottapel, Robert Branton, Philip E Ferretti, Vincent |
author_facet | Borozan, Ivan Wilson, Shane Blanchette, Paola Laflamme, Philippe Watt, Stuart N Krzyzanowski, Paul M Sircoulomb, Fabrice Rottapel, Robert Branton, Philip E Ferretti, Vincent |
author_sort | Borozan, Ivan |
collection | PubMed |
description | BACKGROUND: It is now well established that nearly 20% of human cancers are caused by infectious agents, and the list of human oncogenic pathogens will grow in the future for a variety of cancer types. Whole tumor transcriptome and genome sequencing by next-generation sequencing technologies presents an unparalleled opportunity for pathogen detection and discovery in human tissues but requires development of new genome-wide bioinformatics tools. RESULTS: Here we present CaPSID (Computational Pathogen Sequence IDentification), a comprehensive bioinformatics platform for identifying, querying and visualizing both exogenous and endogenous pathogen nucleotide sequences in tumor genomes and transcriptomes. CaPSID includes a scalable, high performance database for data storage and a web application that integrates the genome browser JBrowse. CaPSID also provides useful metrics for sequence analysis of pre-aligned BAM files, such as gene and genome coverage, and is optimized to run efficiently on multiprocessor computers with low memory usage. CONCLUSIONS: To demonstrate the usefulness and efficiency of CaPSID, we carried out a comprehensive analysis of both a simulated dataset and transcriptome samples from ovarian cancer. CaPSID correctly identified all of the human and pathogen sequences in the simulated dataset, while in the ovarian dataset CaPSID’s predictions were successfully validated in vitro. |
format | Online Article Text |
id | pubmed-3464663 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2012 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-3464663 2012-10-05 CaPSID: A bioinformatics platform for computational pathogen sequence identification in human genomes and transcriptomes Borozan, Ivan Wilson, Shane Blanchette, Paola Laflamme, Philippe Watt, Stuart N Krzyzanowski, Paul M Sircoulomb, Fabrice Rottapel, Robert Branton, Philip E Ferretti, Vincent BMC Bioinformatics Software BACKGROUND: It is now well established that nearly 20% of human cancers are caused by infectious agents, and the list of human oncogenic pathogens will grow in the future for a variety of cancer types. Whole tumor transcriptome and genome sequencing by next-generation sequencing technologies presents an unparalleled opportunity for pathogen detection and discovery in human tissues but requires development of new genome-wide bioinformatics tools. RESULTS: Here we present CaPSID (Computational Pathogen Sequence IDentification), a comprehensive bioinformatics platform for identifying, querying and visualizing both exogenous and endogenous pathogen nucleotide sequences in tumor genomes and transcriptomes. CaPSID includes a scalable, high performance database for data storage and a web application that integrates the genome browser JBrowse. CaPSID also provides useful metrics for sequence analysis of pre-aligned BAM files, such as gene and genome coverage, and is optimized to run efficiently on multiprocessor computers with low memory usage. CONCLUSIONS: To demonstrate the usefulness and efficiency of CaPSID, we carried out a comprehensive analysis of both a simulated dataset and transcriptome samples from ovarian cancer. CaPSID correctly identified all of the human and pathogen sequences in the simulated dataset, while in the ovarian dataset CaPSID’s predictions were successfully validated in vitro. BioMed Central 2012-08-17 /pmc/articles/PMC3464663/ /pubmed/22901030 http://dx.doi.org/10.1186/1471-2105-13-206 Text en Copyright ©2012 Borozan et al.; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Software Borozan, Ivan Wilson, Shane Blanchette, Paola Laflamme, Philippe Watt, Stuart N Krzyzanowski, Paul M Sircoulomb, Fabrice Rottapel, Robert Branton, Philip E Ferretti, Vincent CaPSID: A bioinformatics platform for computational pathogen sequence identification in human genomes and transcriptomes |
title | CaPSID: A bioinformatics platform for computational pathogen sequence identification in human genomes and transcriptomes |
title_full | CaPSID: A bioinformatics platform for computational pathogen sequence identification in human genomes and transcriptomes |
title_fullStr | CaPSID: A bioinformatics platform for computational pathogen sequence identification in human genomes and transcriptomes |
title_full_unstemmed | CaPSID: A bioinformatics platform for computational pathogen sequence identification in human genomes and transcriptomes |
title_short | CaPSID: A bioinformatics platform for computational pathogen sequence identification in human genomes and transcriptomes |
title_sort | capsid: a bioinformatics platform for computational pathogen sequence identification in human genomes and transcriptomes |
topic | Software |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3464663/ https://www.ncbi.nlm.nih.gov/pubmed/22901030 http://dx.doi.org/10.1186/1471-2105-13-206 |
work_keys_str_mv | AT borozanivan capsidabioinformaticsplatformforcomputationalpathogensequenceidentificationinhumangenomesandtranscriptomes AT wilsonshane capsidabioinformaticsplatformforcomputationalpathogensequenceidentificationinhumangenomesandtranscriptomes AT blanchettepaola capsidabioinformaticsplatformforcomputationalpathogensequenceidentificationinhumangenomesandtranscriptomes AT laflammephilippe capsidabioinformaticsplatformforcomputationalpathogensequenceidentificationinhumangenomesandtranscriptomes AT wattstuartn capsidabioinformaticsplatformforcomputationalpathogensequenceidentificationinhumangenomesandtranscriptomes AT krzyzanowskipaulm capsidabioinformaticsplatformforcomputationalpathogensequenceidentificationinhumangenomesandtranscriptomes AT sircoulombfabrice capsidabioinformaticsplatformforcomputationalpathogensequenceidentificationinhumangenomesandtranscriptomes AT rottapelrobert capsidabioinformaticsplatformforcomputationalpathogensequenceidentificationinhumangenomesandtranscriptomes AT brantonphilipe capsidabioinformaticsplatformforcomputationalpathogensequenceidentificationinhumangenomesandtranscriptomes AT ferrettivincent capsidabioinformaticsplatformforcomputationalpathogensequenceidentificationinhumangenomesandtranscriptomes |