Easy and accurate reconstruction of whole HIV genomes from short-read sequence data with shiver

Bibliographic Details
Main authors: Wymant, Chris; Blanquart, François; Golubchik, Tanya; Gall, Astrid; Bakker, Margreet; Bezemer, Daniela; Croucher, Nicholas J; Hall, Matthew; Hillebregt, Mariska; Ong, Swee Hoe; Ratmann, Oliver; Albert, Jan; Bannert, Norbert; Fellay, Jacques; Fransen, Katrien; Gourlay, Annabelle; Grabowski, M Kate; Gunsenheimer-Bartmeyer, Barbara; Günthard, Huldrych F; Kivelä, Pia; Kouyos, Roger; Laeyendecker, Oliver; Liitsola, Kirsi; Meyer, Laurence; Porter, Kholoud; Ristola, Matti; van Sighem, Ard; Berkhout, Ben; Cornelissen, Marion; Kellam, Paul; Reiss, Peter; Fraser, Christophe
Format: Online Article Text
Language: English
Published: Oxford University Press 2018
Subjects: Resources
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5961307/
https://www.ncbi.nlm.nih.gov/pubmed/29876136
http://dx.doi.org/10.1093/ve/vey007
collection PubMed
description Studying the evolution of viruses and their molecular epidemiology relies on accurate viral sequence data, so that small differences between similar viruses can be meaningfully interpreted. Despite its higher throughput and more detailed minority variant data, next-generation sequencing has yet to be widely adopted for HIV. The difficulty of accurately reconstructing the consensus sequence of a quasispecies from reads (short fragments of DNA) in the presence of large between- and within-host diversity, including frequent indels, may have presented a barrier. In particular, mapping (aligning) reads to a reference sequence leads to biased loss of information; this bias can distort epidemiological and evolutionary conclusions. De novo assembly avoids this bias by aligning the reads to themselves, producing a set of sequences called contigs. However, contigs provide only a partial summary of the reads, misassembly may result in their having an incorrect structure, and no information is available in parts of the genome where contigs could not be assembled. To address these problems, we developed the tool shiver to pre-process reads for quality and contamination, then map them to a reference tailored to the sample using corrected contigs supplemented with the user’s choice of existing reference sequences. Run with two commands per sample, it can easily be used for large heterogeneous data sets. We used shiver to reconstruct the consensus sequence and minority variant information from paired-end short-read whole-genome data produced with the Illumina platform, for sixty-five existing publicly available samples and fifty new samples. We show the systematic superiority of mapping to shiver’s constructed reference compared with mapping the same reads to the closest of 3,249 real references: median values of 13 bases called differently and more accurately, 0 bases called differently and less accurately, and 205 bases of missing sequence recovered.
We also successfully applied shiver to whole-genome samples of Hepatitis C Virus and Respiratory Syncytial Virus. shiver is publicly available from https://github.com/ChrisHIV/shiver.
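The abstract describes two steps that are easier to picture in code: building a reference tailored to the sample by supplementing contigs with an existing reference, and calling a consensus sequence, with minority variant frequencies, from the mapped reads. The Python sketch below is not shiver's code and does not reproduce its interface or parameters; it is a simplified illustration under stated assumptions. The function names `tailored_reference` and `call_consensus`, the input formats (contigs already aligned to one existing reference, with '-' for gaps; per-position base counts from the mapped reads), the `min_coverage` threshold and the `?` symbol for insufficient coverage are all hypothetical choices for this example. Contig correction, read trimming and read mapping are not shown.

```python
from collections import Counter


def tailored_reference(aligned_contigs, aligned_ref):
    """Hypothetical helper: fill the gaps between contigs with a reference.

    All arguments are rows of one multiple alignment: equal-length strings
    with '-' for alignment gaps. Within the span from a contig's first to
    its last non-gap character, that contig's characters are used (a '-'
    there counts as a deletion); everywhere else the reference is used.
    Where contigs overlap, the earlier contig in the list wins.
    """
    n = len(aligned_ref)
    if any(len(contig) != n for contig in aligned_contigs):
        raise ValueError("all rows must have the same alignment length")
    source = [aligned_ref] * n  # the row that supplies each column
    for contig in reversed(aligned_contigs):  # earlier contigs overwrite later ones
        covered = [i for i, base in enumerate(contig) if base != "-"]
        if covered:
            for i in range(covered[0], covered[-1] + 1):
                source[i] = contig
    # Drop gap characters so the result is an ungapped sequence.
    return "".join(source[i][i] for i in range(n) if source[i][i] != "-")


def call_consensus(pileup, min_coverage, missing="?"):
    """Hypothetical helper: majority-rule consensus plus base frequencies.

    pileup holds one Counter per reference position, mapping each observed
    base ('A', 'C', 'G', 'T', or '-' for a deletion) to its read count.
    Positions covered by fewer than min_coverage reads get `missing`.
    """
    consensus, frequencies = [], []
    for counts in pileup:
        depth = sum(counts.values())
        if depth < min_coverage:
            consensus.append(missing)
            frequencies.append({})
            continue
        base = counts.most_common(1)[0][0]  # most frequent base at this position
        if base != "-":  # a majority deletion contributes no base
            consensus.append(base)
        # Keep every base's frequency: this is the minority variant information.
        frequencies.append({b: k / depth for b, k in counts.items()})
    return "".join(consensus), frequencies


ref = tailored_reference(["--ACGT-A----", "----------AA"], "TTACCTGAGGCC")
print(ref)  # TTACGTAGGAA

pileup = [Counter(A=30, G=5), Counter(C=3), Counter({"-": 20, "T": 2}), Counter(T=40)]
seq, freqs = call_consensus(pileup, min_coverage=15)
print(seq)  # A?T
print(round(freqs[0]["G"], 3))  # 0.143
```

Using the reference only outside each contig's span keeps the sample's own assembled sequence wherever assembly succeeded, while still giving reads a template to map to where it did not, which is the gap in contig-only summaries that the abstract points out.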
id pubmed-5961307
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
Journal: Virus Evol (Resources section), Oxford University Press, 2018-05-18
License: © The Author(s) 2018. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Resources