MTG-Link: leveraging barcode information from linked-reads to assemble specific loci

BACKGROUND: Local assembly with short and long reads has proven to be very useful in many applications: reconstruction of the sequence of a locus of interest, gap-filling in draft assemblies, as well as alternative allele reconstruction of large Structural Variants. Whereas linked-read technologies...

Bibliographic Details
Main authors: Guichard, Anne; Legeai, Fabrice; Tagu, Denis; Lemaitre, Claire
Format: Online Article Text
Language: English
Published: BioMed Central 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10347852/
https://www.ncbi.nlm.nih.gov/pubmed/37452278
http://dx.doi.org/10.1186/s12859-023-05395-w
_version_ 1785073613566640128
author Guichard, Anne
Legeai, Fabrice
Tagu, Denis
Lemaitre, Claire
collection PubMed
description BACKGROUND: Local assembly with short and long reads has proven to be very useful in many applications: reconstruction of the sequence of a locus of interest, gap-filling in draft assemblies, as well as alternative allele reconstruction of large Structural Variants. Whereas linked-read technologies have a great potential to assemble specific loci as they provide long-range information while maintaining the power and accuracy of short-read sequencing, there is a lack of local assembly tools for linked-read data. RESULTS: We present MTG-Link, a novel local assembly tool dedicated to linked-reads. The originality of the method lies in its read subsampling step, which takes advantage of the barcode information contained in linked-reads mapped in flanking regions. We validated our approach on several datasets from different linked-read technologies. We show that MTG-Link is able to successfully assemble large sequences, up to dozens of kb. We also demonstrate that the read subsampling step of MTG-Link considerably improves the local assembly of specific loci compared to other existing short-read local assembly tools. Furthermore, MTG-Link was able to fully characterize large insertion variants and deletion breakpoints in a human genome and to reconstruct dark regions in clinically relevant human genes. It also improved the contiguity of a 1.3 Mb locus of biological interest in several individual genomes of the mimetic butterfly Heliconius numata. CONCLUSIONS: MTG-Link is an efficient local assembly tool designed for different linked-read sequencing technologies. MTG-Link source code is available at https://github.com/anne-gcd/MTG-Link and as a Bioconda package. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-023-05395-w.
format Online
Article
Text
id pubmed-10347852
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-10347852 2023-07-15 MTG-Link: leveraging barcode information from linked-reads to assemble specific loci Guichard, Anne Legeai, Fabrice Tagu, Denis Lemaitre, Claire BMC Bioinformatics Software BioMed Central 2023-07-14 /pmc/articles/PMC10347852/ /pubmed/37452278 http://dx.doi.org/10.1186/s12859-023-05395-w Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title MTG-Link: leveraging barcode information from linked-reads to assemble specific loci
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10347852/
https://www.ncbi.nlm.nih.gov/pubmed/37452278
http://dx.doi.org/10.1186/s12859-023-05395-w