Efficient detection and assembly of non-reference DNA sequences with synthetic long reads

Recent pan-genome studies have revealed an abundance of DNA sequences in human genomes that are not present in the reference genome. A lion’s share of these non-reference sequences (NRSs) cannot be reliably assembled or placed on the reference genome. Improvements in long-read and synthetic long-rea...

Bibliographic Details
Main Authors:	Meleshko, Dmitry; Yang, Rui; Marks, Patrick; Williams, Stephen; Hajirasouliha, Iman
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2022
Subjects:	Methods Online
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9561269/ https://www.ncbi.nlm.nih.gov/pubmed/35924489 http://dx.doi.org/10.1093/nar/gkac653

author	Meleshko, Dmitry; Yang, Rui; Marks, Patrick; Williams, Stephen; Hajirasouliha, Iman
author_facet	Meleshko, Dmitry; Yang, Rui; Marks, Patrick; Williams, Stephen; Hajirasouliha, Iman
author_sort	Meleshko, Dmitry
collection	PubMed
description	Recent pan-genome studies have revealed an abundance of DNA sequences in human genomes that are not present in the reference genome. A lion’s share of these non-reference sequences (NRSs) cannot be reliably assembled or placed on the reference genome. Improvements in long-read and synthetic long-read (aka linked-read) technologies have great potential for the characterization of NRSs. While synthetic long reads require less input DNA than long-read datasets, they are algorithmically more challenging to use. Except for computationally expensive whole-genome assembly methods, there is no synthetic long-read method for NRS detection. We propose a novel integrated alignment-based and local assembly-based algorithm, Novel-X, that uses the barcode information encoded in synthetic long reads to improve the detection of such events without a whole-genome de novo assembly. Our evaluations demonstrate that Novel-X finds many non-reference sequences that cannot be found by state-of-the-art short-read methods. We applied Novel-X to a diverse set of 68 samples from the Polaris HiSeq 4000 PGx cohort. Novel-X discovered 16 691 NRS insertions of size > 300 bp (total length 18.2 Mb). Many of them are population specific or may have a functional impact.
format	Online Article Text
id	pubmed-9561269
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-9561269 2022-10-18 Efficient detection and assembly of non-reference DNA sequences with synthetic long reads Meleshko, Dmitry Yang, Rui Marks, Patrick Williams, Stephen Hajirasouliha, Iman Nucleic Acids Res Methods Online Recent pan-genome studies have revealed an abundance of DNA sequences in human genomes that are not present in the reference genome. A lion’s share of these non-reference sequences (NRSs) cannot be reliably assembled or placed on the reference genome. Improvements in long-read and synthetic long-read (aka linked-read) technologies have great potential for the characterization of NRSs. While synthetic long reads require less input DNA than long-read datasets, they are algorithmically more challenging to use. Except for computationally expensive whole-genome assembly methods, there is no synthetic long-read method for NRS detection. We propose a novel integrated alignment-based and local assembly-based algorithm, Novel-X, that uses the barcode information encoded in synthetic long reads to improve the detection of such events without a whole-genome de novo assembly. Our evaluations demonstrate that Novel-X finds many non-reference sequences that cannot be found by state-of-the-art short-read methods. We applied Novel-X to a diverse set of 68 samples from the Polaris HiSeq 4000 PGx cohort. Novel-X discovered 16 691 NRS insertions of size > 300 bp (total length 18.2 Mb). Many of them are population specific or may have a functional impact. Oxford University Press 2022-08-04 /pmc/articles/PMC9561269/ /pubmed/35924489 http://dx.doi.org/10.1093/nar/gkac653 Text en © The Author(s) 2022. Published by Oxford University Press on behalf of Nucleic Acids Research.
https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Efficient detection and assembly of non-reference DNA sequences with synthetic long reads
topic	Methods Online
