
INSurVeyor: improving insertion calling from short read sequencing data

Insertions are one of the major types of structural variations and are defined as the addition of 50 nucleotides or more into a DNA sequence. Several methods exist to detect insertions from next-generation sequencing short read data, but they generally have low sensitivity. Our contribution is two-f...


Bibliographic Details
Main authors: Rajaby, Ramesh; Liu, Dong-Xu; Au, Chun Hang; Cheung, Yuen-Ting; Lau, Amy Yuet Ting; Yang, Qing-Yong; Sung, Wing-Kin
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10241795/
https://www.ncbi.nlm.nih.gov/pubmed/37277343
http://dx.doi.org/10.1038/s41467-023-38870-2
author Rajaby, Ramesh
Liu, Dong-Xu
Au, Chun Hang
Cheung, Yuen-Ting
Lau, Amy Yuet Ting
Yang, Qing-Yong
Sung, Wing-Kin
collection PubMed
description Insertions are one of the major types of structural variations and are defined as the addition of 50 nucleotides or more into a DNA sequence. Several methods exist to detect insertions from next-generation sequencing short read data, but they generally have low sensitivity. Our contribution is two-fold. First, we introduce INSurVeyor, a fast, sensitive and precise method that detects insertions from next-generation sequencing paired-end data. Using publicly available benchmark datasets (both human and non-human), we show that INSurVeyor is not only more sensitive than any individual caller we tested, but also more sensitive than all of them combined. Furthermore, for most types of insertions, INSurVeyor is almost as sensitive as long-read callers. Second, we provide state-of-the-art catalogues of insertions for 1047 Arabidopsis thaliana genomes from the 1001 Genomes Project and 3202 human genomes from the 1000 Genomes Project, both generated with INSurVeyor. We show that they are more complete and precise than existing resources, and that important insertions are missed by existing methods.
format Online
Article
Text
id pubmed-10241795
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-10241795 2023-06-07 Nat Commun Article. Nature Publishing Group UK, 2023-06-05. Text, en. © The Author(s) 2023. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit https://creativecommons.org/licenses/by/4.0/.
title INSurVeyor: improving insertion calling from short read sequencing data
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10241795/
https://www.ncbi.nlm.nih.gov/pubmed/37277343
http://dx.doi.org/10.1038/s41467-023-38870-2