
RAfilter: an algorithm for detecting and filtering false-positive alignments in repetitive genomic regions

Telomere-to-telomere (T2T) assembly relies on the correctness of sequence alignments. However, existing aligners tend to generate a high proportion of false-positive alignments in repetitive genomic regions, which impedes the generation of T2T-level reference genomes for more important species. I...


Bibliographic Details
Main authors: Yang, Jinbao; Zhao, Xianjia; Jiang, Heling; Yang, Yingxue; Hou, Yuze; Pan, Weihua
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10107899/
https://www.ncbi.nlm.nih.gov/pubmed/37077372
http://dx.doi.org/10.1093/hr/uhac288
author Yang, Jinbao
Zhao, Xianjia
Jiang, Heling
Yang, Yingxue
Hou, Yuze
Pan, Weihua
collection PubMed
description Telomere-to-telomere (T2T) assembly relies on the correctness of sequence alignments. However, existing aligners tend to generate a high proportion of false-positive alignments in repetitive genomic regions, which impedes the generation of T2T-level reference genomes for more important species. In this paper, we present an automatic algorithm called RAfilter for removing the false positives from the outputs of existing aligners. RAfilter takes advantage of rare k-mers, which represent copy-specific features, to differentiate false-positive alignments from correct ones. Considering the huge number of rare k-mers in large eukaryotic genomes, a series of high-performance computing techniques, such as multi-threading and bit operations, are used to improve time and space efficiency. The experimental results on tandem repeats and interspersed repeats show that RAfilter was able to filter 60%–90% of false-positive HiFi alignments with almost no correct ones removed, while the sensitivity and precision on ONT datasets were about 80% and 50%, respectively.
format Online
Article
Text
id pubmed-10107899
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-10107899 2023-04-18 Hortic Res Article Oxford University Press 2022-12-29 /pmc/articles/PMC10107899/ /pubmed/37077372 http://dx.doi.org/10.1093/hr/uhac288 Text en © The Author(s) 2023. Published by Oxford University Press on behalf of Nanjing Agricultural University. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title RAfilter: an algorithm for detecting and filtering false-positive alignments in repetitive genomic regions
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10107899/
https://www.ncbi.nlm.nih.gov/pubmed/37077372
http://dx.doi.org/10.1093/hr/uhac288
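The description above says that rare k-mers carry copy-specific features and that bit operations keep the computation compact. The following is a minimal sketch of that general idea only, not RAfilter's actual implementation: the function names, the `max_count` rarity threshold, and the support score are all hypothetical choices made for illustration.

```python
from collections import Counter

# 2-bit encoding of nucleotides (an illustrative choice, not taken from the paper).
ENC = {"A": 0, "C": 1, "G": 2, "T": 3}


def kmers_2bit(seq, k):
    """Yield (start, code) for each k-mer, packing bases into an integer
    with a rolling shift-and-mask update. Windows containing a non-ACGT
    character are skipped."""
    mask = (1 << (2 * k)) - 1
    code, valid = 0, 0
    for i, base in enumerate(seq.upper()):
        if base not in ENC:
            code, valid = 0, 0  # restart the window after an ambiguous base
            continue
        code = ((code << 2) | ENC[base]) & mask
        valid += 1
        if valid >= k:
            yield i - k + 1, code


def rare_kmers(reference, k, max_count=1):
    """Return the set of packed k-mers occurring at most max_count times
    in the reference (hypothetical definition of 'rare')."""
    counts = Counter(code for _, code in kmers_2bit(reference, k))
    return {code for code, n in counts.items() if n <= max_count}


def rare_kmer_support(read, reference, ref_start, ref_end, k, rare):
    """Fraction of rare k-mers in reference[ref_start:ref_end] that also
    occur in the read. Returns None if the region has no rare k-mers."""
    read_codes = {code for _, code in kmers_2bit(read, k)}
    region = reference[ref_start:ref_end]
    expected = [code for _, code in kmers_2bit(region, k) if code in rare]
    if not expected:
        return None
    hits = sum(1 for code in expected if code in read_codes)
    return hits / len(expected)


# Toy example: two repeat copies that differ at a single base.
copy1 = "ACGTTGCAAGGCTA"
copy2 = "ACGTTGCATGGCTA"
reference = copy1 + copy2
rare = rare_kmers(reference, k=4)
read = copy1  # an error-free read that truly originates from copy 1

support_copy1 = rare_kmer_support(read, reference, 0, 14, 4, rare)
support_copy2 = rare_kmer_support(read, reference, 14, 28, 4, rare)
```

In this toy case the alignment of the read to its true copy is supported by every rare k-mer in that region, while the alignment to the other copy is supported by none, so a threshold on the score would separate the two. Real reads contain sequencing errors, so a practical threshold would need to tolerate missing k-mers.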