Combinatorial analysis and algorithms for quasispecies reconstruction using next-generation sequencing

Bibliographic Details
Main authors: Prosperi, Mattia CF; Prosperi, Luciano; Bruselles, Alessandro; Abbate, Isabella; Rozera, Gabriella; Vincenti, Donatella; Solmone, Maria Carmela; Capobianchi, Maria Rosaria; Ulivi, Giovanni
Format: Text
Language: English
Published: BioMed Central 2011
Subjects: Methodology Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3022557/
https://www.ncbi.nlm.nih.gov/pubmed/21208435
http://dx.doi.org/10.1186/1471-2105-12-5
collection PubMed
description BACKGROUND: Next-generation sequencing (NGS) offers a unique opportunity for high-throughput genomics and has the potential to replace Sanger sequencing in many fields, including de novo sequencing, re-sequencing, metagenomics, and characterisation of infectious pathogens such as viral quasispecies. Although methodologies and software for whole-genome assembly and genome variation analysis have been developed and refined for NGS data, reconstructing a viral quasispecies from NGS data remains a challenge. Such a reconstruction would be useful for analysing intra-host evolutionary pathways in relation to immune responses and antiretroviral therapy exposures. Here we introduce a set of formulae for the combinatorial analysis of a quasispecies, given an NGS re-sequencing experiment, and an algorithm for quasispecies reconstruction. We require that sequenced fragments are aligned against a reference genome, and that the reference genome is partitioned into a set of sliding windows (amplicons). The reconstruction algorithm is based on combinations of multinomial distributions and is designed to minimise the reconstruction of false variants, called in-silico recombinants.
RESULTS: The reconstruction algorithm was applied to error-free simulated data and reconstructed a high percentage of true variants, even at low genetic diversity, where the chance of obtaining in-silico recombinants is high. Results on empirical NGS data from patients infected with hepatitis B virus confirmed its ability to characterise different viral variants from distinct patients.
CONCLUSIONS: The combinatorial analysis provided a description of the difficulty of reconstructing a quasispecies, given a particular amplicon partition and a measure of population diversity. The reconstruction algorithm showed good performance on both simulated and real data, even in the presence of sequencing errors.
id pubmed-3022557
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-3022557 2011-01-21 Combinatorial analysis and algorithms for quasispecies reconstruction using next-generation sequencing. Prosperi, Mattia CF; Prosperi, Luciano; Bruselles, Alessandro; Abbate, Isabella; Rozera, Gabriella; Vincenti, Donatella; Solmone, Maria Carmela; Capobianchi, Maria Rosaria; Ulivi, Giovanni. BMC Bioinformatics, Methodology Article. BioMed Central 2011-01-05 /pmc/articles/PMC3022557/ /pubmed/21208435 http://dx.doi.org/10.1186/1471-2105-12-5 Text en. Copyright ©2011 Prosperi et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Methodology Article
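The description attributes the difficulty of reconstruction to in-silico recombinants: false variants that arise when fragments from different windows of the amplicon partition are chained incorrectly. The Python sketch below illustrates that combinatorial effect only. It is not the authors' formulae or their multinomial-based algorithm. It assumes error-free, gap-free variants of equal length and sliding windows that tile the genome end to end. The function names `window_starts` and `count_reconstructions` are hypothetical. For each window, the sketch keeps the distinct fragments and links fragments in consecutive windows when they agree on the overlap. It then counts the complete chains with dynamic programming. Chains that do not correspond to a true variant are potential in-silico recombinants.

```python
from typing import Dict, List, Sequence, Tuple


def window_starts(genome_len: int, window_len: int, step: int) -> List[int]:
    """Hypothetical helper: start positions of sliding windows tiling the genome."""
    if not 0 < step <= window_len <= genome_len:
        raise ValueError("need 0 < step <= window_len <= genome_len")
    if (genome_len - window_len) % step != 0:
        raise ValueError("the last window must end at the last genome position")
    return list(range(0, genome_len - window_len + 1, step))


def count_reconstructions(
    variants: Sequence[str], window_len: int, step: int
) -> Tuple[int, int]:
    """Hypothetical: return (candidate full-length sequences, in-silico recombinants).

    Assumes error-free, aligned, equal-length variants (no indels).
    """
    if not variants:
        raise ValueError("at least one variant is required")
    genome_len = len(variants[0])
    if any(len(v) != genome_len for v in variants):
        raise ValueError("variants must be aligned to the same length")
    starts = window_starts(genome_len, window_len, step)
    overlap = window_len - step
    # Distinct fragments observed in each window (read multiplicities ignored).
    windows = [sorted({v[s:s + window_len] for v in variants}) for s in starts]

    # ways[f] = number of chains over the windows so far that end with fragment f.
    ways: Dict[str, int] = {f: 1 for f in windows[0]}
    for fragments in windows[1:]:
        # f's last `overlap` bases (f[step:]) must equal g's first `overlap` bases.
        ways = {
            g: sum(n for f, n in ways.items() if f[step:] == g[:overlap])
            for g in fragments
        }
    candidates = sum(ways.values())
    # Each distinct true variant spells one chain; every other chain is a recombinant.
    return candidates, candidates - len(set(variants))


if __name__ == "__main__":
    pair = ["ACGTAC", "TCCTAG"]
    print(count_reconstructions(pair, window_len=3, step=3))  # (4, 2)
    print(count_reconstructions(pair, window_len=4, step=2))  # (2, 0)
    fewer_sites = ["ACGTAC", "TCGTAG"]
    print(count_reconstructions(fewer_sites, window_len=4, step=2))  # (4, 2)
```

In the first pair, the two variants differ at position 2 (G versus C). That position falls inside the overlap of the 4-base windows, so only the true variants can be chained. In the second pair, the variants differ only at the ends, and the overlap (`GT`) is identical in both. Overlapping windows therefore cannot resolve the linkage, and two recombinants remain. This is consistent with the remark above that low genetic diversity raises the chance of in-silico recombinants.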