Identification of Known and Novel Recurrent Viral Sequences in Data from Multiple Patients and Multiple Cancers
Virus discovery from high throughput sequencing data often follows a bottom-up approach where taxonomic annotation takes place prior to association to disease. Albeit effective in some cases, the approach fails to detect novel pathogens and remote variants not present in reference databases. We have...
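Main authors: | Friis-Nielsen, Jens; Kjartansdóttir, Kristín Rós; Mollerup, Sarah; Asplund, Maria; Mourier, Tobias; Jensen, Randi Holm; Hansen, Thomas Arn; Rey-Iglesia, Alba; Richter, Stine Raith; Nielsen, Ida Broman; Alquezar-Planas, David E.; Olsen, Pernille V. S.; Vinner, Lasse; Fridholm, Helena; Nielsen, Lars Peter; Willerslev, Eske; Sicheritz-Pontén, Thomas; Lund, Ole; Hansen, Anders Johannes; Izarzugaza, Jose M. G.; Brunak, Søren |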
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2016 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4776208/ https://www.ncbi.nlm.nih.gov/pubmed/26907326 http://dx.doi.org/10.3390/v8020053 |
_version_ | 1782419114214555648 |
---|---|
author | Friis-Nielsen, Jens Kjartansdóttir, Kristín Rós Mollerup, Sarah Asplund, Maria Mourier, Tobias Jensen, Randi Holm Hansen, Thomas Arn Rey-Iglesia, Alba Richter, Stine Raith Nielsen, Ida Broman Alquezar-Planas, David E. Olsen, Pernille V. S. Vinner, Lasse Fridholm, Helena Nielsen, Lars Peter Willerslev, Eske Sicheritz-Pontén, Thomas Lund, Ole Hansen, Anders Johannes Izarzugaza, Jose M. G. Brunak, Søren |
author_facet | Friis-Nielsen, Jens Kjartansdóttir, Kristín Rós Mollerup, Sarah Asplund, Maria Mourier, Tobias Jensen, Randi Holm Hansen, Thomas Arn Rey-Iglesia, Alba Richter, Stine Raith Nielsen, Ida Broman Alquezar-Planas, David E. Olsen, Pernille V. S. Vinner, Lasse Fridholm, Helena Nielsen, Lars Peter Willerslev, Eske Sicheritz-Pontén, Thomas Lund, Ole Hansen, Anders Johannes Izarzugaza, Jose M. G. Brunak, Søren |
author_sort | Friis-Nielsen, Jens |
collection | PubMed |
description | Virus discovery from high throughput sequencing data often follows a bottom-up approach where taxonomic annotation takes place prior to association to disease. Albeit effective in some cases, the approach fails to detect novel pathogens and remote variants not present in reference databases. We have developed a species independent pipeline that utilises sequence clustering for the identification of nucleotide sequences that co-occur across multiple sequencing data instances. We applied the workflow to 686 sequencing libraries from 252 cancer samples of different cancer and tissue types, 32 non-template controls, and 24 test samples. Recurrent sequences were statistically associated to biological, methodological or technical features with the aim to identify novel pathogens or plausible contaminants that may associate to a particular kit or method. We provide examples of identified inhabitants of the healthy tissue flora as well as experimental contaminants. Unmapped sequences that co-occur with high statistical significance potentially represent the unknown sequence space where novel pathogens can be identified. |
format | Online Article Text |
id | pubmed-4776208 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-47762082016-03-09 Identification of Known and Novel Recurrent Viral Sequences in Data from Multiple Patients and Multiple Cancers Friis-Nielsen, Jens Kjartansdóttir, Kristín Rós Mollerup, Sarah Asplund, Maria Mourier, Tobias Jensen, Randi Holm Hansen, Thomas Arn Rey-Iglesia, Alba Richter, Stine Raith Nielsen, Ida Broman Alquezar-Planas, David E. Olsen, Pernille V. S. Vinner, Lasse Fridholm, Helena Nielsen, Lars Peter Willerslev, Eske Sicheritz-Pontén, Thomas Lund, Ole Hansen, Anders Johannes Izarzugaza, Jose M. G. Brunak, Søren Viruses Article Virus discovery from high throughput sequencing data often follows a bottom-up approach where taxonomic annotation takes place prior to association to disease. Albeit effective in some cases, the approach fails to detect novel pathogens and remote variants not present in reference databases. We have developed a species independent pipeline that utilises sequence clustering for the identification of nucleotide sequences that co-occur across multiple sequencing data instances. We applied the workflow to 686 sequencing libraries from 252 cancer samples of different cancer and tissue types, 32 non-template controls, and 24 test samples. Recurrent sequences were statistically associated to biological, methodological or technical features with the aim to identify novel pathogens or plausible contaminants that may associate to a particular kit or method. We provide examples of identified inhabitants of the healthy tissue flora as well as experimental contaminants. Unmapped sequences that co-occur with high statistical significance potentially represent the unknown sequence space where novel pathogens can be identified. MDPI 2016-02-19 /pmc/articles/PMC4776208/ /pubmed/26907326 http://dx.doi.org/10.3390/v8020053 Text en © 2016 by the authors; licensee MDPI, Basel, Switzerland. 
This article is an open access article distributed under the terms and conditions of the Creative Commons by Attribution (CC-BY) license (http://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Friis-Nielsen, Jens Kjartansdóttir, Kristín Rós Mollerup, Sarah Asplund, Maria Mourier, Tobias Jensen, Randi Holm Hansen, Thomas Arn Rey-Iglesia, Alba Richter, Stine Raith Nielsen, Ida Broman Alquezar-Planas, David E. Olsen, Pernille V. S. Vinner, Lasse Fridholm, Helena Nielsen, Lars Peter Willerslev, Eske Sicheritz-Pontén, Thomas Lund, Ole Hansen, Anders Johannes Izarzugaza, Jose M. G. Brunak, Søren Identification of Known and Novel Recurrent Viral Sequences in Data from Multiple Patients and Multiple Cancers |
title | Identification of Known and Novel Recurrent Viral Sequences in Data from Multiple Patients and Multiple Cancers |
title_full | Identification of Known and Novel Recurrent Viral Sequences in Data from Multiple Patients and Multiple Cancers |
title_fullStr | Identification of Known and Novel Recurrent Viral Sequences in Data from Multiple Patients and Multiple Cancers |
title_full_unstemmed | Identification of Known and Novel Recurrent Viral Sequences in Data from Multiple Patients and Multiple Cancers |
title_short | Identification of Known and Novel Recurrent Viral Sequences in Data from Multiple Patients and Multiple Cancers |
title_sort | identification of known and novel recurrent viral sequences in data from multiple patients and multiple cancers |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4776208/ https://www.ncbi.nlm.nih.gov/pubmed/26907326 http://dx.doi.org/10.3390/v8020053 |
work_keys_str_mv | AT friisnielsenjens identificationofknownandnovelrecurrentviralsequencesindatafrommultiplepatientsandmultiplecancers AT kjartansdottirkristinros identificationofknownandnovelrecurrentviralsequencesindatafrommultiplepatientsandmultiplecancers AT mollerupsarah identificationofknownandnovelrecurrentviralsequencesindatafrommultiplepatientsandmultiplecancers AT asplundmaria identificationofknownandnovelrecurrentviralsequencesindatafrommultiplepatientsandmultiplecancers AT mouriertobias identificationofknownandnovelrecurrentviralsequencesindatafrommultiplepatientsandmultiplecancers AT jensenrandiholm identificationofknownandnovelrecurrentviralsequencesindatafrommultiplepatientsandmultiplecancers AT hansenthomasarn identificationofknownandnovelrecurrentviralsequencesindatafrommultiplepatientsandmultiplecancers AT reyiglesiaalba identificationofknownandnovelrecurrentviralsequencesindatafrommultiplepatientsandmultiplecancers AT richterstineraith identificationofknownandnovelrecurrentviralsequencesindatafrommultiplepatientsandmultiplecancers AT nielsenidabroman identificationofknownandnovelrecurrentviralsequencesindatafrommultiplepatientsandmultiplecancers AT alquezarplanasdavide identificationofknownandnovelrecurrentviralsequencesindatafrommultiplepatientsandmultiplecancers AT olsenpernillevs identificationofknownandnovelrecurrentviralsequencesindatafrommultiplepatientsandmultiplecancers AT vinnerlasse identificationofknownandnovelrecurrentviralsequencesindatafrommultiplepatientsandmultiplecancers AT fridholmhelena identificationofknownandnovelrecurrentviralsequencesindatafrommultiplepatientsandmultiplecancers AT nielsenlarspeter identificationofknownandnovelrecurrentviralsequencesindatafrommultiplepatientsandmultiplecancers AT willersleveske identificationofknownandnovelrecurrentviralsequencesindatafrommultiplepatientsandmultiplecancers AT sicheritzpontenthomas 
identificationofknownandnovelrecurrentviralsequencesindatafrommultiplepatientsandmultiplecancers AT lundole identificationofknownandnovelrecurrentviralsequencesindatafrommultiplepatientsandmultiplecancers AT hansenandersjohannes identificationofknownandnovelrecurrentviralsequencesindatafrommultiplepatientsandmultiplecancers AT izarzugazajosemg identificationofknownandnovelrecurrentviralsequencesindatafrommultiplepatientsandmultiplecancers AT brunaksøren identificationofknownandnovelrecurrentviralsequencesindatafrommultiplepatientsandmultiplecancers |