
Assembly-free discovery of human novel sequences using long reads

DNA sequences that are absent in the human reference genome are classified as novel sequences. The discovery of these missed sequences is crucial for exploring the genomic diversity of populations and understanding the genetic basis of human diseases. However, various DNA lengths of reads generated...


Bibliographic Details
Main authors: Li, Qiuhui, Yan, Bin, Lam, Tak-Wah, Luo, Ruibang
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9700288/
https://www.ncbi.nlm.nih.gov/pubmed/36308393
http://dx.doi.org/10.1093/dnares/dsac039
author Li, Qiuhui
Yan, Bin
Lam, Tak-Wah
Luo, Ruibang
collection PubMed
description DNA sequences that are absent in the human reference genome are classified as novel sequences. The discovery of these missed sequences is crucial for exploring the genomic diversity of populations and understanding the genetic basis of human diseases. However, various DNA lengths of reads generated from different sequencing technologies can significantly affect the results of novel sequences. In this work, we designed an assembly-free novel sequence (AF-NS) approach to identify novel sequences from Oxford Nanopore Technology long reads. Among the newly detected sequences using AF-NS, more than 95% were omitted from those using long-read assemblers and 85% were not present in short reads of Illumina. We identified the common novel sequences among all the samples and revealed their association with the binding motifs of transcription factors. Regarding the placements of the novel sequences, we found about 70% enriched in repeat regions and generated 430 for one specific subpopulation that might be related to their evolution. Our study demonstrates the advance of the assembly-free approach to capture more novel sequences over other assembler based methods. Combining the long-read data with powerful analytical methods can be a robust way to improve the completeness of novel sequences.
format Online
Article
Text
id pubmed-9700288
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9700288 2022-11-29 DNA Res Research Article Oxford University Press 2022-10-29 /pmc/articles/PMC9700288/ /pubmed/36308393 http://dx.doi.org/10.1093/dnares/dsac039 Text en © The Author(s) 2022. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Assembly-free discovery of human novel sequences using long reads
topic Research Article