
Pattern matching for high precision detection of LINE-1s in human genomes

BACKGROUND: Long interspersed element 1 (LINE-1 or L1) retrotransposons are mobile elements that constitute 17–20% of the human genome. Strong correlations between abnormal L1 expression and several human diseases have been reported. This has motivated increasing interest in accurate quantification...


Bibliographic Details
Main authors: Lopez, Juan O., Seguel, Jaime, Chamorro, Andres, Ramos, Kenneth S.
Format: Online Article Text
Language: English
Published: BioMed Central 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9472350/
https://www.ncbi.nlm.nih.gov/pubmed/36100885
http://dx.doi.org/10.1186/s12859-022-04907-4
author Lopez, Juan O.
Seguel, Jaime
Chamorro, Andres
Ramos, Kenneth S.
collection PubMed
description BACKGROUND: Long interspersed element 1 (LINE-1 or L1) retrotransposons are mobile elements that constitute 17–20% of the human genome. Strong correlations between abnormal L1 expression and several human diseases have been reported. This has motivated increasing interest in accurate quantification of the number of L1 copies present in any given biologic specimen. A main obstacle toward this aim is that L1s are relatively long DNA segments with regions of high variability, or are largely present in the human genome as truncated fragments. These particularities render traditional alignment strategies, such as seed-and-extend, inefficient, as the number of segments that are similar to L1s grows exponentially. This study uses the pattern matching methodology for more accurate identification of L1s. We experimentally validate the superiority of pattern matching for L1 detection over alternative methods and discuss some of its potential applications.
RESULTS: Pattern matching detected full-length L1 copies with high precision, reasonable computational time, and no prior input information. It also detected truncated and significantly altered copies of L1 with relatively high precision. The method was effectively used to annotate L1s in a target genome and to calculate copy number variation with respect to a reference genome. Crucial to the success of the implementation was the selection of a small set of k-mer probes from a set of sequences presenting a stable pattern of distribution in the genome. As in seed-and-extend methods, the pattern matching algorithm sowed these k-mer probes, but instead of using heuristic extensions around the seeds, the analysis was based on distribution patterns within the genome. The desired level of precision could be adjusted, with some loss of recall.
CONCLUSION: Pattern matching is more efficient than seed-and-extend methods for the detection of L1 segments whose characterization depends on a finite set of sequences with common areas of low variability. We propose that pattern matching may help establish correlations between L1 copy number and disease states associated with L1 mobilization and evolution.
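The abstract describes sowing k-mer probes and then judging candidates by how the probe hits are distributed, rather than by extending each seed. The sketch below illustrates only that general idea and is not the authors' implementation: the function names, the toy sequence, the three probes, the expected gaps, and the `tolerance` parameter are all hypothetical. It locates every exact probe hit, then keeps a candidate only when the probes occur in order at roughly the expected distances from each other.

```python
from typing import Dict, List


def find_probe_hits(genome: str, probes: List[str]) -> Dict[str, List[int]]:
    """Return every 0-based start position of each probe (exact matches only)."""
    hits: Dict[str, List[int]] = {}
    for probe in probes:
        positions: List[int] = []
        start = genome.find(probe)
        while start != -1:
            positions.append(start)
            # Step by one so overlapping occurrences are also found.
            start = genome.find(probe, start + 1)
        hits[probe] = positions
    return hits


def match_pattern(
    genome: str,
    probes: List[str],
    expected_gaps: List[int],
    tolerance: int = 0,
) -> List[int]:
    """Return start positions of the first probe where the remaining probes
    follow in order, each starting expected_gaps[i] (+/- tolerance) bases
    after the start of the previous probe.

    Hypothetical scoring rule chosen for illustration only.
    """
    hits = find_probe_hits(genome, probes)
    candidates: List[int] = []
    for first in hits[probes[0]]:
        position = first
        matched = True
        for probe, gap in zip(probes[1:], expected_gaps):
            nearby = [p for p in hits[probe] if abs((p - position) - gap) <= tolerance]
            if not nearby:
                matched = False
                break
            position = nearby[0]
        if matched:
            candidates.append(first)
    return candidates


if __name__ == "__main__":
    # Toy sequence: ACG at 2, GGC at 9, CAT at 17 (gaps of 7 and 8 bases).
    toy_genome = "TTACGTAAAGGCCTTTTCATGG"
    toy_probes = ["ACG", "GGC", "CAT"]
    print(match_pattern(toy_genome, toy_probes, [7, 8]))
    print(match_pattern(toy_genome, toy_probes, [7, 9]))
    print(match_pattern(toy_genome, toy_probes, [7, 9], tolerance=1))
```

In this toy setting, a stricter `tolerance` accepts fewer candidates and a looser one accepts more, which loosely mirrors the precision and recall trade-off mentioned in the abstract; the actual criteria used in the paper may differ.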
format Online
Article
Text
id pubmed-9472350
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-9472350 2022-09-15 Pattern matching for high precision detection of LINE-1s in human genomes Lopez, Juan O. Seguel, Jaime Chamorro, Andres Ramos, Kenneth S. BMC Bioinformatics Research BioMed Central 2022-09-13 /pmc/articles/PMC9472350/ /pubmed/36100885 http://dx.doi.org/10.1186/s12859-022-04907-4 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Pattern matching for high precision detection of LINE-1s in human genomes
topic Research