Comparative genome analysis using sample-specific string detection in accurate long reads

Bibliographic Details
Main authors:	Khorsand, Parsoa; Denti, Luca; Bonizzoni, Paola; Chikhi, Rayan; Hormozdiari, Fereydoun
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2021
Subjects:	Original Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9710709/ https://www.ncbi.nlm.nih.gov/pubmed/36700094 http://dx.doi.org/10.1093/bioadv/vbab005

collection	PubMed
description	MOTIVATION: Comparative genome analysis of two or more whole-genome sequenced (WGS) samples is at the core of most applications in genomics. These include the discovery of genomic differences segregating in populations, case-control analysis in common diseases and diagnosing rare disorders. With recent progress in accurate long-read sequencing technologies (e.g. circular consensus sequencing from PacBio sequencers), we can study repeat regions of the genome (e.g. segmental duplications) and hard-to-detect variants (e.g. complex structural variants). RESULTS: We propose a novel framework for comparative genome analysis through the discovery of strings that are specific to one genome (‘sample-specific’ strings). We have developed a novel, accurate and efficient computational method for the discovery of sample-specific strings between two groups of WGS samples. The proposed approach enables comparative genome analysis without mapping the reads and is therefore not hindered by shortcomings of the reference genome and mapping algorithms. We show that the proposed approach accurately finds sample-specific strings representing nearly all variation (>98%) reported across pairs or trios of WGS samples using accurate long reads (e.g. PacBio HiFi data). AVAILABILITY AND IMPLEMENTATION: Data, code and instructions for reproducing the results presented in this manuscript are publicly available at https://github.com/Parsoa/PingPong. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics Advances online.
id	pubmed-9710709
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
journal	Bioinform Adv (Bioinformatics Advances)
published	2021-05-31
rights	© The Author(s) 2021. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
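The description field defines sample-specific strings only informally. The following Python sketch makes the idea concrete under an assumed definition: a sample-specific string is a substring of a read from the target sample that occurs in no read of the other sample, while each of its proper substrings does occur there. The function names `occurs_in` and `sample_specific_strings` are hypothetical, the brute-force search is only suitable for toy inputs, and it is not the method implemented in the PingPong repository.

```python
from typing import List, Set


def occurs_in(s: str, reads: List[str]) -> bool:
    """True if s is a substring of at least one read (the empty string always occurs)."""
    return s == "" or any(s in read for read in reads)


def sample_specific_strings(target_reads: List[str], other_reads: List[str]) -> Set[str]:
    """Brute-force sketch of minimal sample-specific strings.

    Assumed definition (illustrative, not taken from PingPong): s is reported
    if it is a substring of some target read, is absent from every other read,
    and both s[1:] and s[:-1] occur in the other reads. Every proper substring
    of s lies inside s[1:] or s[:-1], so this check makes s minimal.
    Reverse complements and sequencing errors are ignored.
    """
    found: Set[str] = set()
    for read in target_reads:
        n = len(read)
        for i in range(n):
            for j in range(i + 1, n + 1):
                s = read[i:j]
                if occurs_in(s, other_reads):
                    continue  # shared by both samples: try a longer string starting at i
                if occurs_in(s[1:], other_reads) and occurs_in(s[:-1], other_reads):
                    found.add(s)  # absent, and every proper substring is present
                # Any longer string starting at i contains s, so it cannot be minimal.
                break
    return found


if __name__ == "__main__":
    # Toy "reads": the target carries an extra T relative to the other sample.
    print(sample_specific_strings(["ACGTTA"], ["ACGTA"]))  # {'TT'}
```

Checking only `s[1:]` and `s[:-1]` is enough for minimality because every proper substring of `s` lies within one of them. A tool meant for whole-genome read sets would typically replace the repeated substring scans with a full-text index over the other sample's reads.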