RAfilter: an algorithm for detecting and filtering false-positive alignments in repetitive genomic regions
Telomere-to-telomere (T2T) assembly relies on the correctness of sequence alignments. However, existing aligners tend to generate a high proportion of false-positive alignments in repetitive genomic regions, which impedes the generation of T2T-level reference genomes for more important species. I...
Main Authors: | Yang, Jinbao; Zhao, Xianjia; Jiang, Heling; Yang, Yingxue; Hou, Yuze; Pan, Weihua |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2022 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10107899/ https://www.ncbi.nlm.nih.gov/pubmed/37077372 http://dx.doi.org/10.1093/hr/uhac288 |
_version_ | 1785026709187198976 |
---|---|
author | Yang, Jinbao Zhao, Xianjia Jiang, Heling Yang, Yingxue Hou, Yuze Pan, Weihua |
author_sort | Yang, Jinbao |
collection | PubMed |
description | Telomere-to-telomere (T2T) assembly relies on the correctness of sequence alignments. However, existing aligners tend to generate a high proportion of false-positive alignments in repetitive genomic regions, which impedes the generation of T2T-level reference genomes for more important species. In this paper, we present an automatic algorithm called RAfilter for removing the false positives in the outputs of existing aligners. RAfilter takes advantage of rare k-mers, which represent copy-specific features, to differentiate false-positive alignments from correct ones. Considering the huge numbers of rare k-mers in large eukaryotic genomes, a series of high-performance computing techniques such as multi-threading and bit operations are used to improve time and space efficiency. The experimental results on tandem repeats and interspersed repeats show that RAfilter was able to filter 60%–90% of false-positive HiFi alignments with almost no correct ones removed, while the sensitivity and precision on ONT datasets were about 80% and 50%, respectively. |
format | Online Article Text |
id | pubmed-10107899 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-101078992023-04-18 RAfilter: an algorithm for detecting and filtering false-positive alignments in repetitive genomic regions Yang, Jinbao Zhao, Xianjia Jiang, Heling Yang, Yingxue Hou, Yuze Pan, Weihua Hortic Res Article Telomere to telomere (T2T) assembly relies on the correctness of sequence alignments. However, the existing aligners tend to generate a high proportion of false-positive alignments in repetitive genomic regions which impedes the generation of T2T-level reference genomes for more important species. In this paper, we present an automatic algorithm called RAfilter for removing the false-positives in the outputs of existing aligners. RAfilter takes advantage of rare k-mers representing the copy-specific features to differentiate false-positive alignments from the correct ones. Considering the huge numbers of rare k-mers in large eukaryotic genomes, a series of high-performance computing techniques such as multi-threading and bit operation are used to improve the time and space efficiencies. The experimental results on tandem repeats and interspersed repeats show that RAfilter was able to filter 60%–90% false-positive HiFi alignments with almost no correct ones removed, while the sensitivities and precisions on ONT datasets were about 80% and 50% respectively. Oxford University Press 2022-12-29 /pmc/articles/PMC10107899/ /pubmed/37077372 http://dx.doi.org/10.1093/hr/uhac288 Text en © The Author(s) 2023. Published by Oxford University Press on behalf of Nanjing Agricultural University. https://creativecommons.org/licenses/by/4.0/This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | RAfilter: an algorithm for detecting and filtering false-positive alignments in repetitive genomic regions |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10107899/ https://www.ncbi.nlm.nih.gov/pubmed/37077372 http://dx.doi.org/10.1093/hr/uhac288 |
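
The description above says that RAfilter uses rare k-mers as copy-specific features and relies on bit operations for efficiency. The record gives no implementation details, so the following is only a minimal sketch of that general idea and not RAfilter's actual code. The 2-bit base encoding, the function names (`encode_kmers`, `rare_kmers`, `rare_kmer_support`), the `max_count` threshold, and the support score are all assumptions made for illustration.

```python
from collections import Counter

# 2-bit code per base; an illustrative encoding, not taken from the RAfilter source.
BASE_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode_kmers(seq, k):
    """Yield (start, code) for each k-mer of seq, packed into an int at 2 bits per base."""
    mask = (1 << (2 * k)) - 1  # keeps only the lowest 2*k bits
    code = 0
    valid = 0  # number of consecutive unambiguous bases seen so far
    for i, base in enumerate(seq.upper()):
        bits = BASE_BITS.get(base)
        if bits is None:  # ambiguous base such as N: restart the window
            code, valid = 0, 0
            continue
        code = ((code << 2) | bits) & mask  # shift in the new base, drop the oldest
        valid += 1
        if valid >= k:
            yield i - k + 1, code


def rare_kmers(reference, k, max_count=1):
    """Return the set of k-mer codes occurring at most max_count times (assumed threshold)."""
    counts = Counter(code for _, code in encode_kmers(reference, k))
    return {code for code, n in counts.items() if n <= max_count}


def rare_kmer_support(read, reference, ref_start, ref_end, k, rare):
    """Fraction of rare k-mers in reference[ref_start:ref_end] that also occur in the read.

    Returns None when the region holds no rare k-mers, since the alignment
    cannot be judged from copy-specific evidence in that case.
    """
    region = reference[ref_start:ref_end]
    expected = {code for _, code in encode_kmers(region, k) if code in rare}
    if not expected:
        return None
    observed = {code for _, code in encode_kmers(read, k)}
    return len(expected & observed) / len(expected)


if __name__ == "__main__":
    # Two near-identical repeat copies that differ by a single base.
    copy_a = "AAAACCCCGGGG"
    copy_b = "AAAACCTCGGGG"
    reference = copy_a + copy_b
    rare = rare_kmers(reference, k=4)
    read = copy_a  # a read that truly originates from the first copy
    print(rare_kmer_support(read, reference, 0, 12, 4, rare))   # alignment to copy_a
    print(rare_kmer_support(read, reference, 12, 24, 4, rare))  # alignment to copy_b
```

In this toy setting, an alignment whose region shares few of its rare k-mers with the read would be a candidate false positive. A real filter would also have to handle reverse-complement strands and sequencing errors, which this sketch ignores.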