
Scavenger: A pipeline for recovery of unaligned reads utilising similarity with aligned reads

Read alignment is an important step in RNA-seq analysis as the result of alignment forms the basis for downstream analyses. However, recent studies have shown that published alignment tools have variable mapping sensitivity and do not necessarily align all the reads which should have been aligned, a...


Bibliographic Details
Main authors:	Yang, Andrian; Tang, Joshua Y. S.; Troup, Michael; Ho, Joshua W. K.
Format:	Online Article Text
Language:	English
Published:	F1000 Research Limited 2022
Subjects:	Software Tool Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7459848/ https://www.ncbi.nlm.nih.gov/pubmed/32913631 http://dx.doi.org/10.12688/f1000research.19426.2

author	Yang, Andrian; Tang, Joshua Y. S.; Troup, Michael; Ho, Joshua W. K.
author_sort	Yang, Andrian
collection	PubMed
description	Read alignment is an important step in RNA-seq analysis as the result of alignment forms the basis for downstream analyses. However, recent studies have shown that published alignment tools have variable mapping sensitivity and do not necessarily align all the reads which should have been aligned, a problem we term the false-negative non-alignment problem. Here we present Scavenger, a Python-based bioinformatics pipeline for recovering unaligned reads using a novel mechanism in which a putative alignment location is discovered based on sequence similarity between aligned and unaligned reads. We showed that Scavenger could recover unaligned reads in a range of simulated and real RNA-seq datasets, including single-cell RNA-seq data. We found that recovered reads tend to contain more genetic variants with respect to the reference genome compared to previously aligned reads, indicating that divergence between personal and reference genomes plays a role in the false-negative non-alignment problem. Even when the number of recovered reads is relatively small compared to the total number of reads, the addition of these recovered reads can impact downstream analyses, especially in terms of estimating the expression and differential expression of lowly expressed genes, such as pseudogenes.
format	Online Article Text
id	pubmed-7459848
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	F1000 Research Limited
record_format	MEDLINE/PubMed
spelling	pubmed-7459848 2020-09-09 Scavenger: A pipeline for recovery of unaligned reads utilising similarity with aligned reads Yang, Andrian; Tang, Joshua Y. S.; Troup, Michael; Ho, Joshua W. K. F1000Res Software Tool Article Read alignment is an important step in RNA-seq analysis as the result of alignment forms the basis for downstream analyses. However, recent studies have shown that published alignment tools have variable mapping sensitivity and do not necessarily align all the reads which should have been aligned, a problem we term the false-negative non-alignment problem. Here we present Scavenger, a Python-based bioinformatics pipeline for recovering unaligned reads using a novel mechanism in which a putative alignment location is discovered based on sequence similarity between aligned and unaligned reads. We showed that Scavenger could recover unaligned reads in a range of simulated and real RNA-seq datasets, including single-cell RNA-seq data. We found that recovered reads tend to contain more genetic variants with respect to the reference genome compared to previously aligned reads, indicating that divergence between personal and reference genomes plays a role in the false-negative non-alignment problem. Even when the number of recovered reads is relatively small compared to the total number of reads, the addition of these recovered reads can impact downstream analyses, especially in terms of estimating the expression and differential expression of lowly expressed genes, such as pseudogenes. F1000 Research Limited 2022-10-13 /pmc/articles/PMC7459848/ /pubmed/32913631 http://dx.doi.org/10.12688/f1000research.19426.2 Text en Copyright: © 2022 Yang A et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Scavenger: A pipeline for recovery of unaligned reads utilising similarity with aligned reads
topic	Software Tool Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7459848/ https://www.ncbi.nlm.nih.gov/pubmed/32913631 http://dx.doi.org/10.12688/f1000research.19426.2
