Long-read sequence assembly: a technical evaluation in barley

Bibliographic Details
Main authors: Mascher, Martin; Wicker, Thomas; Jenkins, Jerry; Plott, Christopher; Lux, Thomas; Koh, Chu Shin; Ens, Jennifer; Gundlach, Heidrun; Boston, Lori B; Tulpová, Zuzana; Holden, Samuel; Hernández-Pinzón, Inmaculada; Scholz, Uwe; Mayer, Klaus F X; Spannagl, Manuel; Pozniak, Curtis J; Sharpe, Andrew G; Šimková, Hana; Moscou, Matthew J; Grimwood, Jane; Schmutz, Jeremy; Stein, Nils
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8290290/
https://www.ncbi.nlm.nih.gov/pubmed/33710295
http://dx.doi.org/10.1093/plcell/koab077
Description
Summary: Sequence assembly of large and repeat-rich plant genomes has been challenging, requiring substantial computational resources and often several complementary sequence assembly and genome mapping approaches. The recent development of fast and accurate long-read sequencing by circular consensus sequencing (CCS) on the PacBio platform may greatly increase the scope of plant pan-genome projects. Here, we compare current long-read sequencing platforms regarding their ability to rapidly generate contiguous sequence assemblies in pan-genome studies of barley (Hordeum vulgare). Most long-read assemblies are clearly superior to the current barley reference sequence based on short reads. Assemblies derived from accurate long reads excel in most metrics, but the CCS approach was the most cost-effective strategy for assembling tens of barley genomes. A downsampling analysis indicated that 20-fold CCS coverage can yield very good sequence assemblies, while even five-fold CCS data may capture the complete sequence of most genes. We present an updated reference genome assembly for barley with near-complete representation of the repeat-rich intergenic space. Long-read assembly can underpin the construction of accurate and complete sequences of multiple genomes of a species to build pan-genome infrastructures in Triticeae crops and their wild relatives.