Scavenger: A pipeline for recovery of unaligned reads utilising similarity with aligned reads
Read alignment is an important step in RNA-seq analysis as the result of alignment forms the basis for downstream analyses. However, recent studies have shown that published alignment tools have variable mapping sensitivity and do not necessarily align all the reads which should have been aligned, a...
Main authors: | Yang, Andrian; Tang, Joshua Y. S.; Troup, Michael; Ho, Joshua W. K. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | F1000 Research Limited, 2022 |
Subjects: | Software Tool Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7459848/ https://www.ncbi.nlm.nih.gov/pubmed/32913631 http://dx.doi.org/10.12688/f1000research.19426.2 |
_version_ | 1783576464972578816 |
---|---|
author | Yang, Andrian Tang, Joshua Y. S. Troup, Michael Ho, Joshua W. K. |
author_sort | Yang, Andrian |
collection | PubMed |
description | Read alignment is an important step in RNA-seq analysis as the result of alignment forms the basis for downstream analyses. However, recent studies have shown that published alignment tools have variable mapping sensitivity and do not necessarily align all the reads which should have been aligned, a problem we termed as the false-negative non-alignment problem. Here we present Scavenger, a python-based bioinformatics pipeline for recovering unaligned reads using a novel mechanism in which a putative alignment location is discovered based on sequence similarity between aligned and unaligned reads. We showed that Scavenger could recover unaligned reads in a range of simulated and real RNA-seq datasets, including single-cell RNA-seq data. We found that recovered reads tend to contain more genetic variants with respect to the reference genome compared to previously aligned reads, indicating that divergence between personal and reference genomes plays a role in the false-negative non-alignment problem. Even when the number of recovered reads is relatively small compared to the total number of reads, the addition of these recovered reads can impact downstream analyses, especially in terms of estimating the expression and differential expression of lowly expressed genes, such as pseudogenes. |
format | Online Article Text |
id | pubmed-7459848 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | F1000 Research Limited |
record_format | MEDLINE/PubMed |
spelling | pubmed-7459848 2020-09-09 Scavenger: A pipeline for recovery of unaligned reads utilising similarity with aligned reads Yang, Andrian Tang, Joshua Y. S. Troup, Michael Ho, Joshua W. K. F1000Res Software Tool Article Read alignment is an important step in RNA-seq analysis as the result of alignment forms the basis for downstream analyses. However, recent studies have shown that published alignment tools have variable mapping sensitivity and do not necessarily align all the reads which should have been aligned, a problem we termed as the false-negative non-alignment problem. Here we present Scavenger, a python-based bioinformatics pipeline for recovering unaligned reads using a novel mechanism in which a putative alignment location is discovered based on sequence similarity between aligned and unaligned reads. We showed that Scavenger could recover unaligned reads in a range of simulated and real RNA-seq datasets, including single-cell RNA-seq data. We found that recovered reads tend to contain more genetic variants with respect to the reference genome compared to previously aligned reads, indicating that divergence between personal and reference genomes plays a role in the false-negative non-alignment problem. Even when the number of recovered reads is relatively small compared to the total number of reads, the addition of these recovered reads can impact downstream analyses, especially in terms of estimating the expression and differential expression of lowly expressed genes, such as pseudogenes. F1000 Research Limited 2022-10-13 /pmc/articles/PMC7459848/ /pubmed/32913631 http://dx.doi.org/10.12688/f1000research.19426.2 Text en Copyright: © 2022 Yang A et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Scavenger: A pipeline for recovery of unaligned reads utilising similarity with aligned reads |
topic | Software Tool Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7459848/ https://www.ncbi.nlm.nih.gov/pubmed/32913631 http://dx.doi.org/10.12688/f1000research.19426.2 |