Efficient detection and assembly of non-reference DNA sequences with synthetic long reads
Recent pan-genome studies have revealed an abundance of DNA sequences in human genomes that are not present in the reference genome. A lion’s share of these non-reference sequences (NRSs) cannot be reliably assembled or placed on the reference genome. Improvements in long-read and synthetic long-rea...
Main authors: | Meleshko, Dmitry; Yang, Rui; Marks, Patrick; Williams, Stephen; Hajirasouliha, Iman |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2022 |
Subjects: | Methods Online |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9561269/ https://www.ncbi.nlm.nih.gov/pubmed/35924489 http://dx.doi.org/10.1093/nar/gkac653 |
_version_ | 1784807914787045376 |
---|---|
author | Meleshko, Dmitry Yang, Rui Marks, Patrick Williams, Stephen Hajirasouliha, Iman |
author_facet | Meleshko, Dmitry Yang, Rui Marks, Patrick Williams, Stephen Hajirasouliha, Iman |
author_sort | Meleshko, Dmitry |
collection | PubMed |
description | Recent pan-genome studies have revealed an abundance of DNA sequences in human genomes that are not present in the reference genome. A lion’s share of these non-reference sequences (NRSs) cannot be reliably assembled or placed on the reference genome. Improvements in long-read and synthetic long-read (aka linked-read) technologies have great potential for the characterization of NRSs. While synthetic long reads require less input DNA than long-read datasets, they are algorithmically more challenging to use. Except for computationally expensive whole-genome assembly methods, there is no synthetic long-read method for NRS detection. We propose a novel integrated alignment-based and local assembly-based algorithm, Novel-X, that uses the barcode information encoded in synthetic long reads to improve the detection of such events without a whole-genome de novo assembly. Our evaluations demonstrate that Novel-X finds many non-reference sequences that cannot be found by state-of-the-art short-read methods. We applied Novel-X to a diverse set of 68 samples from the Polaris HiSeq 4000 PGx cohort. Novel-X discovered 16 691 NRS insertions of size > 300 bp (total length 18.2 Mb). Many of them are population specific or may have a functional impact. |
format | Online Article Text |
id | pubmed-9561269 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-9561269 2022-10-18 Efficient detection and assembly of non-reference DNA sequences with synthetic long reads Meleshko, Dmitry Yang, Rui Marks, Patrick Williams, Stephen Hajirasouliha, Iman Nucleic Acids Res Methods Online Recent pan-genome studies have revealed an abundance of DNA sequences in human genomes that are not present in the reference genome. A lion’s share of these non-reference sequences (NRSs) cannot be reliably assembled or placed on the reference genome. Improvements in long-read and synthetic long-read (aka linked-read) technologies have great potential for the characterization of NRSs. While synthetic long reads require less input DNA than long-read datasets, they are algorithmically more challenging to use. Except for computationally expensive whole-genome assembly methods, there is no synthetic long-read method for NRS detection. We propose a novel integrated alignment-based and local assembly-based algorithm, Novel-X, that uses the barcode information encoded in synthetic long reads to improve the detection of such events without a whole-genome de novo assembly. Our evaluations demonstrate that Novel-X finds many non-reference sequences that cannot be found by state-of-the-art short-read methods. We applied Novel-X to a diverse set of 68 samples from the Polaris HiSeq 4000 PGx cohort. Novel-X discovered 16 691 NRS insertions of size > 300 bp (total length 18.2 Mb). Many of them are population specific or may have a functional impact. Oxford University Press 2022-08-04 /pmc/articles/PMC9561269/ /pubmed/35924489 http://dx.doi.org/10.1093/nar/gkac653 Text en © The Author(s) 2022. Published by Oxford University Press on behalf of Nucleic Acids Research.
https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Methods Online Meleshko, Dmitry Yang, Rui Marks, Patrick Williams, Stephen Hajirasouliha, Iman Efficient detection and assembly of non-reference DNA sequences with synthetic long reads |
title | Efficient detection and assembly of non-reference DNA sequences with synthetic long reads |
title_full | Efficient detection and assembly of non-reference DNA sequences with synthetic long reads |
title_fullStr | Efficient detection and assembly of non-reference DNA sequences with synthetic long reads |
title_full_unstemmed | Efficient detection and assembly of non-reference DNA sequences with synthetic long reads |
title_short | Efficient detection and assembly of non-reference DNA sequences with synthetic long reads |
title_sort | efficient detection and assembly of non-reference dna sequences with synthetic long reads |
topic | Methods Online |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9561269/ https://www.ncbi.nlm.nih.gov/pubmed/35924489 http://dx.doi.org/10.1093/nar/gkac653 |
work_keys_str_mv | AT meleshkodmitry efficientdetectionandassemblyofnonreferencednasequenceswithsyntheticlongreads AT yangrui efficientdetectionandassemblyofnonreferencednasequenceswithsyntheticlongreads AT markspatrick efficientdetectionandassemblyofnonreferencednasequenceswithsyntheticlongreads AT williamsstephen efficientdetectionandassemblyofnonreferencednasequenceswithsyntheticlongreads AT hajirasoulihaiman efficientdetectionandassemblyofnonreferencednasequenceswithsyntheticlongreads |