Scavenger: A pipeline for recovery of unaligned reads utilising similarity with aligned reads

Read alignment is an important step in RNA-seq analysis as the result of alignment forms the basis for downstream analyses. However, recent studies have shown that published alignment tools have variable mapping sensitivity and do not necessarily align all the reads which should have been aligned, a...

Bibliographic Details
Main Authors: Yang, Andrian, Tang, Joshua Y. S., Troup, Michael, Ho, Joshua W. K.
Format: Online Article Text
Language: English
Published: F1000 Research Limited 2022
Subjects: Software Tool Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7459848/
https://www.ncbi.nlm.nih.gov/pubmed/32913631
http://dx.doi.org/10.12688/f1000research.19426.2
_version_ 1783576464972578816
author Yang, Andrian
Tang, Joshua Y. S.
Troup, Michael
Ho, Joshua W. K.
author_facet Yang, Andrian
Tang, Joshua Y. S.
Troup, Michael
Ho, Joshua W. K.
author_sort Yang, Andrian
collection PubMed
description Read alignment is an important step in RNA-seq analysis as the result of alignment forms the basis for downstream analyses. However, recent studies have shown that published alignment tools have variable mapping sensitivity and do not necessarily align all the reads which should have been aligned, a problem we termed as the false-negative non-alignment problem. Here we present Scavenger, a python-based bioinformatics pipeline for recovering unaligned reads using a novel mechanism in which a putative alignment location is discovered based on sequence similarity between aligned and unaligned reads. We showed that Scavenger could recover unaligned reads in a range of simulated and real RNA-seq datasets, including single-cell RNA-seq data. We found that recovered reads tend to contain more genetic variants with respect to the reference genome compared to previously aligned reads, indicating that divergence between personal and reference genomes plays a role in the false-negative non-alignment problem. Even when the number of recovered reads is relatively small compared to the total number of reads, the addition of these recovered reads can impact downstream analyses, especially in terms of estimating the expression and differential expression of lowly expressed genes, such as pseudogenes.
format Online
Article
Text
id pubmed-7459848
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher F1000 Research Limited
record_format MEDLINE/PubMed
spelling pubmed-7459848 2020-09-09 Scavenger: A pipeline for recovery of unaligned reads utilising similarity with aligned reads Yang, Andrian Tang, Joshua Y. S. Troup, Michael Ho, Joshua W. K. F1000Res Software Tool Article Read alignment is an important step in RNA-seq analysis as the result of alignment forms the basis for downstream analyses. However, recent studies have shown that published alignment tools have variable mapping sensitivity and do not necessarily align all the reads which should have been aligned, a problem we termed as the false-negative non-alignment problem. Here we present Scavenger, a python-based bioinformatics pipeline for recovering unaligned reads using a novel mechanism in which a putative alignment location is discovered based on sequence similarity between aligned and unaligned reads. We showed that Scavenger could recover unaligned reads in a range of simulated and real RNA-seq datasets, including single-cell RNA-seq data. We found that recovered reads tend to contain more genetic variants with respect to the reference genome compared to previously aligned reads, indicating that divergence between personal and reference genomes plays a role in the false-negative non-alignment problem. Even when the number of recovered reads is relatively small compared to the total number of reads, the addition of these recovered reads can impact downstream analyses, especially in terms of estimating the expression and differential expression of lowly expressed genes, such as pseudogenes. F1000 Research Limited 2022-10-13 /pmc/articles/PMC7459848/ /pubmed/32913631 http://dx.doi.org/10.12688/f1000research.19426.2 Text en Copyright: © 2022 Yang A et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Software Tool Article
Yang, Andrian
Tang, Joshua Y. S.
Troup, Michael
Ho, Joshua W. K.
Scavenger: A pipeline for recovery of unaligned reads utilising similarity with aligned reads
title Scavenger: A pipeline for recovery of unaligned reads utilising similarity with aligned reads
title_full Scavenger: A pipeline for recovery of unaligned reads utilising similarity with aligned reads
title_fullStr Scavenger: A pipeline for recovery of unaligned reads utilising similarity with aligned reads
title_full_unstemmed Scavenger: A pipeline for recovery of unaligned reads utilising similarity with aligned reads
title_short Scavenger: A pipeline for recovery of unaligned reads utilising similarity with aligned reads
title_sort scavenger: a pipeline for recovery of unaligned reads utilising similarity with aligned reads
topic Software Tool Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7459848/
https://www.ncbi.nlm.nih.gov/pubmed/32913631
http://dx.doi.org/10.12688/f1000research.19426.2
work_keys_str_mv AT yangandrian scavengerapipelineforrecoveryofunalignedreadsutilisingsimilaritywithalignedreads
AT tangjoshuays scavengerapipelineforrecoveryofunalignedreadsutilisingsimilaritywithalignedreads
AT troupmichael scavengerapipelineforrecoveryofunalignedreadsutilisingsimilaritywithalignedreads
AT hojoshuawk scavengerapipelineforrecoveryofunalignedreadsutilisingsimilaritywithalignedreads