Highly accurate long-read HiFi sequencing data for five complex genomes

The PacBio® HiFi sequencing method yields highly accurate long-read sequencing datasets with read lengths averaging 10–25 kb and accuracies greater than 99.5%. These accurate long reads can be used to improve results for complex applications such as single nucleotide and structural variant detecti...


Bibliographic Details
Main authors: Hon, Ting, Mars, Kristin, Young, Greg, Tsai, Yu-Chih, Karalius, Joseph W., Landolin, Jane M., Maurer, Nicholas, Kudrna, David, Hardigan, Michael A., Steiner, Cynthia C., Knapp, Steven J., Ware, Doreen, Shapiro, Beth, Peluso, Paul, Rank, David R.
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7673114/
https://www.ncbi.nlm.nih.gov/pubmed/33203859
http://dx.doi.org/10.1038/s41597-020-00743-4
author Hon, Ting
Mars, Kristin
Young, Greg
Tsai, Yu-Chih
Karalius, Joseph W.
Landolin, Jane M.
Maurer, Nicholas
Kudrna, David
Hardigan, Michael A.
Steiner, Cynthia C.
Knapp, Steven J.
Ware, Doreen
Shapiro, Beth
Peluso, Paul
Rank, David R.
collection PubMed
description The PacBio® HiFi sequencing method yields highly accurate long-read sequencing datasets with read lengths averaging 10–25 kb and accuracies greater than 99.5%. These accurate long reads can be used to improve results for complex applications such as single nucleotide and structural variant detection, genome assembly, assembly of difficult polyploid or highly repetitive genomes, and assembly of metagenomes. Currently, there is a need for sample data sets to both evaluate the benefits of these long accurate reads as well as for development of bioinformatic tools including genome assemblers, variant callers, and haplotyping algorithms. We present deep coverage HiFi datasets for five complex samples including the two inbred model genomes Mus musculus and Zea mays, as well as two complex genomes, octoploid Fragaria × ananassa and the diploid anuran Rana muscosa. Additionally, we release sequence data from a mock metagenome community. The datasets reported here can be used without restriction to develop new algorithms and explore complex genome structure and evolution. Data were generated on the PacBio Sequel II System.
format Online
Article
Text
id pubmed-7673114
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-7673114 2020-11-20 Highly accurate long-read HiFi sequencing data for five complex genomes Sci Data Data Descriptor
Nature Publishing Group UK 2020-11-17 /pmc/articles/PMC7673114/ /pubmed/33203859 http://dx.doi.org/10.1038/s41597-020-00743-4 Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver http://creativecommons.org/publicdomain/zero/1.0/ applies to the metadata files associated with this article.
title Highly accurate long-read HiFi sequencing data for five complex genomes
topic Data Descriptor