High-quality Arabidopsis thaliana Genome Assembly with Nanopore and HiFi Long Reads
Arabidopsis thaliana is an important and long-established model species for plant molecular biology, genetics, epigenetics, and genomics. However, the latest version of the reference genome still contains a significant number of missing segments. Here, we report a high-quality and almost complete Col-...
Main Authors: | Wang, Bo; Yang, Xiaofei; Jia, Yanyan; Xu, Yu; Jia, Peng; Dang, Ningxin; Wang, Songbo; Xu, Tun; Zhao, Xixi; Gao, Shenghan; Dong, Quanbin; Ye, Kai |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Elsevier 2022 |
Subjects: | Original Research |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9510872/ https://www.ncbi.nlm.nih.gov/pubmed/34487862 http://dx.doi.org/10.1016/j.gpb.2021.08.003 |
_version_ | 1784797537754939392 |
---|---|
author | Wang, Bo; Yang, Xiaofei; Jia, Yanyan; Xu, Yu; Jia, Peng; Dang, Ningxin; Wang, Songbo; Xu, Tun; Zhao, Xixi; Gao, Shenghan; Dong, Quanbin; Ye, Kai |
author_facet | Wang, Bo; Yang, Xiaofei; Jia, Yanyan; Xu, Yu; Jia, Peng; Dang, Ningxin; Wang, Songbo; Xu, Tun; Zhao, Xixi; Gao, Shenghan; Dong, Quanbin; Ye, Kai |
author_sort | Wang, Bo |
collection | PubMed |
description | Arabidopsis thaliana is an important and long-established model species for plant molecular biology, genetics, epigenetics, and genomics. However, the latest version of the reference genome still contains a significant number of missing segments. Here, we report a high-quality and almost complete Col-0 genome assembly with two gaps (named Col-XJTU), produced by combining Oxford Nanopore Technologies ultra-long reads, Pacific Biosciences high-fidelity long reads, and Hi-C data. The total genome assembly size is 133,725,193 bp, introducing 14.6 Mb of novel sequences compared to the TAIR10.1 reference genome. All five chromosomes of the Col-XJTU assembly are highly accurate, with consensus quality (QV) scores > 60 (ranging from 62 to 68), which are higher than those of the TAIR10.1 reference (ranging from 45 to 52). We completely resolved chromosome (Chr) 3 and Chr5 in a telomere-to-telomere manner. Chr4 was completely resolved except for the nucleolar organizing regions, which comprise long repetitive DNA fragments. The Chr1 centromere (CEN1), reportedly around 9 Mb in length, is particularly challenging to assemble due to the presence of tens of thousands of CEN180 satellite repeats. Using cutting-edge sequencing data and novel computational approaches, we assembled a 3.8-Mb-long CEN1 and a 3.5-Mb-long CEN2. We also investigated the structure and epigenetics of centromeres. Four clusters of CEN180 monomers were detected, and the centromere-specific histone H3-like protein (CENH3) exhibited a strong preference for CEN180 Cluster 3. Moreover, we observed hypomethylation patterns in CENH3-enriched regions. We believe that this high-quality genome assembly, Col-XJTU, will serve as a valuable reference to better understand the global pattern of centromeric polymorphisms, as well as the genetic and epigenetic features in plants. |
format | Online Article Text |
id | pubmed-9510872 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Elsevier |
record_format | MEDLINE/PubMed |
spelling | pubmed-9510872 2022-09-27 High-quality Arabidopsis thaliana Genome Assembly with Nanopore and HiFi Long Reads Wang, Bo Yang, Xiaofei Jia, Yanyan Xu, Yu Jia, Peng Dang, Ningxin Wang, Songbo Xu, Tun Zhao, Xixi Gao, Shenghan Dong, Quanbin Ye, Kai Genomics Proteomics Bioinformatics Original Research Arabidopsis thaliana is an important and long-established model species for plant molecular biology, genetics, epigenetics, and genomics. However, the latest version of reference genome still contains a significant number of missing segments. Here, we reported a high-quality and almost complete Col-0 genome assembly with two gaps (named Col-XJTU) by combining the Oxford Nanopore Technologies ultra-long reads, Pacific Biosciences high-fidelity long reads, and Hi-C data. The total genome assembly size is 133,725,193 bp, introducing 14.6 Mb of novel sequences compared to the TAIR10.1 reference genome. All five chromosomes of the Col-XJTU assembly are highly accurate with consensus quality (QV) scores > 60 (ranging from 62 to 68), which are higher than those of the TAIR10.1 reference (ranging from 45 to 52). We completely resolved chromosome (Chr) 3 and Chr5 in a telomere-to-telomere manner. Chr4 was completely resolved except the nucleolar organizing regions, which comprise long repetitive DNA fragments. The Chr1 centromere (CEN1), reportedly around 9 Mb in length, is particularly challenging to assemble due to the presence of tens of thousands of CEN180 satellite repeats. Using the cutting-edge sequencing data and novel computational approaches, we assembled a 3.8-Mb-long CEN1 and a 3.5-Mb-long CEN2. We also investigated the structure and epigenetics of centromeres. Four clusters of CEN180 monomers were detected, and the centromere-specific histone H3-like protein (CENH3) exhibited a strong preference for CEN180 Cluster 3. Moreover, we observed hypomethylation patterns in CENH3-enriched regions. We believe that this high-quality genome assembly, Col-XJTU, would serve as a valuable reference to better understand the global pattern of centromeric polymorphisms, as well as the genetic and epigenetic features in plants. Elsevier 2022-02 2021-09-03 /pmc/articles/PMC9510872/ /pubmed/34487862 http://dx.doi.org/10.1016/j.gpb.2021.08.003 Text en © 2022 The Authors https://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Original Research Wang, Bo Yang, Xiaofei Jia, Yanyan Xu, Yu Jia, Peng Dang, Ningxin Wang, Songbo Xu, Tun Zhao, Xixi Gao, Shenghan Dong, Quanbin Ye, Kai High-quality Arabidopsis thaliana Genome Assembly with Nanopore and HiFi Long Reads |
title | High-quality Arabidopsis thaliana Genome Assembly with Nanopore and HiFi Long Reads |
title_full | High-quality Arabidopsis thaliana Genome Assembly with Nanopore and HiFi Long Reads |
title_fullStr | High-quality Arabidopsis thaliana Genome Assembly with Nanopore and HiFi Long Reads |
title_full_unstemmed | High-quality Arabidopsis thaliana Genome Assembly with Nanopore and HiFi Long Reads |
title_short | High-quality Arabidopsis thaliana Genome Assembly with Nanopore and HiFi Long Reads |
title_sort | high-quality arabidopsis thaliana genome assembly with nanopore and hifi long reads |
topic | Original Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9510872/ https://www.ncbi.nlm.nih.gov/pubmed/34487862 http://dx.doi.org/10.1016/j.gpb.2021.08.003 |
work_keys_str_mv | AT wangbo highqualityarabidopsisthalianagenomeassemblywithnanoporeandhifilongreads AT yangxiaofei highqualityarabidopsisthalianagenomeassemblywithnanoporeandhifilongreads AT jiayanyan highqualityarabidopsisthalianagenomeassemblywithnanoporeandhifilongreads AT xuyu highqualityarabidopsisthalianagenomeassemblywithnanoporeandhifilongreads AT jiapeng highqualityarabidopsisthalianagenomeassemblywithnanoporeandhifilongreads AT dangningxin highqualityarabidopsisthalianagenomeassemblywithnanoporeandhifilongreads AT wangsongbo highqualityarabidopsisthalianagenomeassemblywithnanoporeandhifilongreads AT xutun highqualityarabidopsisthalianagenomeassemblywithnanoporeandhifilongreads AT zhaoxixi highqualityarabidopsisthalianagenomeassemblywithnanoporeandhifilongreads AT gaoshenghan highqualityarabidopsisthalianagenomeassemblywithnanoporeandhifilongreads AT dongquanbin highqualityarabidopsisthalianagenomeassemblywithnanoporeandhifilongreads AT yekai highqualityarabidopsisthalianagenomeassemblywithnanoporeandhifilongreads |
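
Note on the QV figures in the description: the record does not say how the consensus quality scores were estimated, so the sketch below simply assumes the standard Phred scaling, QV = -10 · log10(p), where p is the per-base error probability. Under that assumption, it converts the quoted QV ranges (62 to 68 for Col-XJTU, 45 to 52 for TAIR10.1) into error rates and a rough expected error count over the stated assembly size. The function names are hypothetical and chosen only for illustration.

```python
import math

# Assembly size quoted in the description field (bp).
ASSEMBLY_SIZE_BP = 133_725_193


def qv_to_error_rate(qv: float) -> float:
    """Per-base error probability for a Phred-scaled QV (assumed definition)."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(p: float) -> float:
    """Inverse conversion: Phred-scaled QV for a per-base error probability."""
    return -10 * math.log10(p)


def expected_errors(qv: float, length_bp: int = ASSEMBLY_SIZE_BP) -> float:
    """Rough expected number of erroneous bases over a sequence of length_bp."""
    return qv_to_error_rate(qv) * length_bp


# QV ranges as quoted in the description field.
qv_ranges = {"Col-XJTU": (62, 68), "TAIR10.1": (45, 52)}

if __name__ == "__main__":
    for name, (low, high) in qv_ranges.items():
        for qv in (low, high):
            print(
                f"{name} QV {qv}: error rate {qv_to_error_rate(qv):.2e}, "
                f"~{expected_errors(qv):.1f} expected errors over the assembly size"
            )
```

Because the scale is logarithmic, each 10-point increase in QV corresponds to a tenfold lower error rate, which is why a QV above 60 implies fewer than one error per million bases. Applying the TAIR10.1 QV values to the Col-XJTU assembly size is only a like-for-like illustration, not a figure from the record.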