
Long-read sequencing of 111 rice genomes reveals significantly larger pan-genomes

The concept of pan-genome, which is the collection of all genomes from a population, has shown a great potential in genomics study, especially for crop sciences. The rice pan-genome constructed from the second-generation sequencing (SGS) data is about 270 Mb larger than Nipponbare, the rice referenc...

Bibliographic Details
Main Authors: Zhang, Fan; Xue, Hongzhang; Dong, Xiaorui; Li, Min; Zheng, Xiaoming; Li, Zhikang; Xu, Jianlong; Wang, Wensheng; Wei, Chaochun
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory Press 2022
Subjects: Research
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9104699/
https://www.ncbi.nlm.nih.gov/pubmed/35396275
http://dx.doi.org/10.1101/gr.276015.121
_version_ 1784707858010472448
author Zhang, Fan
Xue, Hongzhang
Dong, Xiaorui
Li, Min
Zheng, Xiaoming
Li, Zhikang
Xu, Jianlong
Wang, Wensheng
Wei, Chaochun
author_facet Zhang, Fan
Xue, Hongzhang
Dong, Xiaorui
Li, Min
Zheng, Xiaoming
Li, Zhikang
Xu, Jianlong
Wang, Wensheng
Wei, Chaochun
author_sort Zhang, Fan
collection PubMed
description The concept of pan-genome, which is the collection of all genomes from a population, has shown a great potential in genomics study, especially for crop sciences. The rice pan-genome constructed from the second-generation sequencing (SGS) data is about 270 Mb larger than Nipponbare, the rice reference genome (NipRG), but it is still disadvantaged by incompleteness and loss of genomic contexts. The third-generation sequencing (TGS) with long reads can help to construct better pan-genomes. In this paper, we report a high-quality rice pan-genome construction method by introducing a series of new steps to deal with the long-read data, including unmapped sequence block filtering, redundancy removing, and sequence block elongating. Compared to NipRG, the long-read sequencing-based pan-genome constructed from 105 rice accessions, which contains 604 Mb novel sequences, is much more comprehensive than the one constructed from ∼3000 rice genomes sequenced with short reads. The repetitive sequences are the main components of novel sequences, which partially explain the differences between the pan-genomes based on TGS and SGS. Adding six wild rice accessions, there are about 879 Mb novel sequences and 19,000 novel genes in the rice pan-genome in total. In addition, we have created high-quality reference genomes for all representative rice populations, including five gapless reference genomes. This study has made significant progress in our understanding of the rice pan-genome, and this pan-genome construction method for long-read data can be applied to accelerate a broad range of genomics studies.
format Online
Article
Text
id pubmed-9104699
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Cold Spring Harbor Laboratory Press
record_format MEDLINE/PubMed
spelling pubmed-91046992022-11-01 Long-read sequencing of 111 rice genomes reveals significantly larger pan-genomes Zhang, Fan Xue, Hongzhang Dong, Xiaorui Li, Min Zheng, Xiaoming Li, Zhikang Xu, Jianlong Wang, Wensheng Wei, Chaochun Genome Res Research The concept of pan-genome, which is the collection of all genomes from a population, has shown a great potential in genomics study, especially for crop sciences. The rice pan-genome constructed from the second-generation sequencing (SGS) data is about 270 Mb larger than Nipponbare, the rice reference genome (NipRG), but it is still disadvantaged by incompleteness and loss of genomic contexts. The third-generation sequencing (TGS) with long reads can help to construct better pan-genomes. In this paper, we report a high-quality rice pan-genome construction method by introducing a series of new steps to deal with the long-read data, including unmapped sequence block filtering, redundancy removing, and sequence block elongating. Compared to NipRG, the long-read sequencing-based pan-genome constructed from 105 rice accessions, which contains 604 Mb novel sequences, is much more comprehensive than the one constructed from ∼3000 rice genomes sequenced with short reads. The repetitive sequences are the main components of novel sequences, which partially explain the differences between the pan-genomes based on TGS and SGS. Adding six wild rice accessions, there are about 879 Mb novel sequences and 19,000 novel genes in the rice pan-genome in total. In addition, we have created high-quality reference genomes for all representative rice populations, including five gapless reference genomes. This study has made significant progress in our understanding of the rice pan-genome, and this pan-genome construction method for long-read data can be applied to accelerate a broad range of genomics studies. 
Cold Spring Harbor Laboratory Press 2022-05 /pmc/articles/PMC9104699/ /pubmed/35396275 http://dx.doi.org/10.1101/gr.276015.121 Text en © 2022 Zhang et al.; Published by Cold Spring Harbor Laboratory Press https://creativecommons.org/licenses/by-nc/4.0/ This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see https://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/ (https://creativecommons.org/licenses/by-nc/4.0/).
spellingShingle Research
Zhang, Fan
Xue, Hongzhang
Dong, Xiaorui
Li, Min
Zheng, Xiaoming
Li, Zhikang
Xu, Jianlong
Wang, Wensheng
Wei, Chaochun
Long-read sequencing of 111 rice genomes reveals significantly larger pan-genomes
title Long-read sequencing of 111 rice genomes reveals significantly larger pan-genomes
title_full Long-read sequencing of 111 rice genomes reveals significantly larger pan-genomes
title_fullStr Long-read sequencing of 111 rice genomes reveals significantly larger pan-genomes
title_full_unstemmed Long-read sequencing of 111 rice genomes reveals significantly larger pan-genomes
title_short Long-read sequencing of 111 rice genomes reveals significantly larger pan-genomes
title_sort long-read sequencing of 111 rice genomes reveals significantly larger pan-genomes
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9104699/
https://www.ncbi.nlm.nih.gov/pubmed/35396275
http://dx.doi.org/10.1101/gr.276015.121
work_keys_str_mv AT zhangfan longreadsequencingof111ricegenomesrevealssignificantlylargerpangenomes
AT xuehongzhang longreadsequencingof111ricegenomesrevealssignificantlylargerpangenomes
AT dongxiaorui longreadsequencingof111ricegenomesrevealssignificantlylargerpangenomes
AT limin longreadsequencingof111ricegenomesrevealssignificantlylargerpangenomes
AT zhengxiaoming longreadsequencingof111ricegenomesrevealssignificantlylargerpangenomes
AT lizhikang longreadsequencingof111ricegenomesrevealssignificantlylargerpangenomes
AT xujianlong longreadsequencingof111ricegenomesrevealssignificantlylargerpangenomes
AT wangwensheng longreadsequencingof111ricegenomesrevealssignificantlylargerpangenomes
AT weichaochun longreadsequencingof111ricegenomesrevealssignificantlylargerpangenomes