High-quality Arabidopsis thaliana Genome Assembly with Nanopore and HiFi Long Reads

Arabidopsis thaliana is an important and long-established model species for plant molecular biology, genetics, epigenetics, and genomics. However, the latest version of the reference genome still contains a significant number of missing segments. Here, we report a high-quality and almost complete Col-...

Bibliographic Details
Main Authors: Wang, Bo, Yang, Xiaofei, Jia, Yanyan, Xu, Yu, Jia, Peng, Dang, Ningxin, Wang, Songbo, Xu, Tun, Zhao, Xixi, Gao, Shenghan, Dong, Quanbin, Ye, Kai
Format: Online Article Text
Language: English
Published: Elsevier 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9510872/
https://www.ncbi.nlm.nih.gov/pubmed/34487862
http://dx.doi.org/10.1016/j.gpb.2021.08.003
author Wang, Bo
Yang, Xiaofei
Jia, Yanyan
Xu, Yu
Jia, Peng
Dang, Ningxin
Wang, Songbo
Xu, Tun
Zhao, Xixi
Gao, Shenghan
Dong, Quanbin
Ye, Kai
collection PubMed
description Arabidopsis thaliana is an important and long-established model species for plant molecular biology, genetics, epigenetics, and genomics. However, the latest version of the reference genome still contains a significant number of missing segments. Here, we report a high-quality and almost complete Col-0 genome assembly with two gaps (named Col-XJTU), produced by combining Oxford Nanopore Technologies ultra-long reads, Pacific Biosciences high-fidelity long reads, and Hi-C data. The total genome assembly size is 133,725,193 bp, introducing 14.6 Mb of novel sequences compared to the TAIR10.1 reference genome. All five chromosomes of the Col-XJTU assembly are highly accurate, with consensus quality (QV) scores > 60 (ranging from 62 to 68), which are higher than those of the TAIR10.1 reference (ranging from 45 to 52). We completely resolved chromosome (Chr) 3 and Chr5 in a telomere-to-telomere manner. Chr4 was completely resolved except for the nucleolar organizing regions, which comprise long repetitive DNA fragments. The Chr1 centromere (CEN1), reportedly around 9 Mb in length, is particularly challenging to assemble due to the presence of tens of thousands of CEN180 satellite repeats. Using cutting-edge sequencing data and novel computational approaches, we assembled a 3.8-Mb-long CEN1 and a 3.5-Mb-long CEN2. We also investigated the structure and epigenetics of centromeres. Four clusters of CEN180 monomers were detected, and the centromere-specific histone H3-like protein (CENH3) exhibited a strong preference for CEN180 Cluster 3. Moreover, we observed hypomethylation patterns in CENH3-enriched regions. We believe that this high-quality genome assembly, Col-XJTU, will serve as a valuable reference to better understand the global pattern of centromeric polymorphisms, as well as the genetic and epigenetic features in plants.
format Online
Article
Text
id pubmed-9510872
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-9510872 2022-09-27 High-quality Arabidopsis thaliana Genome Assembly with Nanopore and HiFi Long Reads Genomics Proteomics Bioinformatics Original Research Elsevier 2022-02 2021-09-03 /pmc/articles/PMC9510872/ /pubmed/34487862 http://dx.doi.org/10.1016/j.gpb.2021.08.003 Text en © 2022 The Authors https://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
title High-quality Arabidopsis thaliana Genome Assembly with Nanopore and HiFi Long Reads
topic Original Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9510872/
https://www.ncbi.nlm.nih.gov/pubmed/34487862
http://dx.doi.org/10.1016/j.gpb.2021.08.003