
EASTR: Identifying and eliminating systematic alignment errors in multi-exon genes

Bibliographic Details
Main authors: Shinder, Ida; Hu, Richard; Ji, Hyun Joo; Chao, Kuan-Hao; Pertea, Mihaela
Format: Online article, text
Language: English
Published: Nature Publishing Group UK, 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10632439/
https://www.ncbi.nlm.nih.gov/pubmed/37940654
http://dx.doi.org/10.1038/s41467-023-43017-4

Abstract

Accurate alignment of transcribed RNA to reference genomes is a critical step in the analysis of gene expression, which in turn has broad applications in biomedical research and in the basic sciences. We reveal that widely used splice-aware aligners, such as STAR and HISAT2, can introduce erroneous spliced alignments between repeated sequences, leading to the inclusion of falsely spliced transcripts in RNA-seq experiments. In some cases, the ‘phantom’ introns resulting from these errors make their way into widely used genome annotation databases. To address this issue, we present EASTR (Emending Alignments of Spliced Transcript Reads), a software tool that detects and removes falsely spliced alignments or transcripts from alignment and annotation files. EASTR improves the accuracy of spliced alignments across diverse species, including human, maize, and Arabidopsis thaliana, by detecting sequence similarity between intron-flanking regions. We demonstrate that applying EASTR before transcript assembly substantially reduces false positive introns, exons, and transcripts, improving the overall accuracy of assembled transcripts. Additionally, we show that EASTR’s application to reference annotation databases can detect and correct likely cases of mis-annotated transcripts.
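
The abstract's central observation is that an erroneous spliced alignment often joins two copies of a repeated sequence, so the genomic sequence around the intron's 5' end resembles the sequence around its 3' end. The Python sketch below illustrates only that idea: it scores local similarity between windows spanning the two splice sites with a plain Smith-Waterman alignment and flags the junction when the windows align almost perfectly. It is not EASTR's implementation; the function names (`flanking_windows`, `is_suspicious_junction`, `demo_genome`), the 20-bp half-window, the scoring values, and the 0.8 cutoff are all hypothetical choices made for illustration.

```python
import random

# Hypothetical scoring scheme for this illustration (not EASTR's settings).
MATCH, MISMATCH, GAP = 2, -3, -5


def local_alignment_score(a: str, b: str) -> int:
    """Best Smith-Waterman local alignment score with a linear gap penalty."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (MATCH if a[i - 1] == b[j - 1] else MISMATCH)
            curr[j] = max(0, diag, prev[j] + GAP, curr[j - 1] + GAP)
            best = max(best, curr[j])
        prev = curr
    return best


def flanking_windows(genome: str, intron_start: int, intron_end: int,
                     half: int = 20) -> tuple[str, str]:
    """Sequences spanning the 5' (donor) and 3' (acceptor) splice sites.

    0-based, half-open coordinates: genome[intron_start:intron_end] is the
    intron. Each window covers `half` bases on either side of its site.
    """
    donor = genome[max(0, intron_start - half):intron_start + half]
    acceptor = genome[max(0, intron_end - half):intron_end + half]
    return donor, acceptor


def is_suspicious_junction(genome: str, intron_start: int, intron_end: int,
                           half: int = 20, min_fraction: float = 0.8) -> bool:
    """Flag a junction whose two flanking windows are nearly identical.

    The score is compared with the best score the shorter window could
    reach (every base a match); `min_fraction` is a hypothetical cutoff.
    """
    donor, acceptor = flanking_windows(genome, intron_start, intron_end, half)
    if not donor or not acceptor:
        return False
    perfect = MATCH * min(len(donor), len(acceptor))
    return local_alignment_score(donor, acceptor) >= min_fraction * perfect


def demo_genome(seed: int = 7) -> str:
    """Toy genome: random unique DNA plus one 40-bp repeat placed twice."""
    rng = random.Random(seed)

    def rand(n: int) -> str:
        return "".join(rng.choice("ACGT") for _ in range(n))

    repeat = rand(40)
    # Layout: 100 unique | repeat at 100-140 | 300 unique | repeat at 440-480 | 100 unique
    return rand(100) + repeat + rand(300) + repeat + rand(100)


if __name__ == "__main__":
    g = demo_genome()
    # Intron from the middle of one repeat copy to the middle of the other:
    # both windows equal the repeat, so the junction is flagged.
    print(is_suspicious_junction(g, 120, 460))  # True
    # Intron whose ends both lie in unrelated random sequence.
    print(is_suspicious_junction(g, 50, 300))
```

In the toy genome the intron from position 120 to 460 joins the midpoints of the two repeat copies, so its donor and acceptor windows are identical and the junction is flagged, while an intron with both ends in unrelated random sequence normally is not. The sketch deliberately ignores strand, reverse-complement repeats, and how thresholds should be tuned on real data.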

Journal: Nat Commun (Nature Communications), published online 2023-11-09 by Nature Publishing Group UK.

© The Author(s) 2023. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.