Using machine learning to detect the differential usage of novel gene isoforms
Main authors: | Zhang, Xiaopu; Hassan, Musa A.; Prendergast, James G. D. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2022 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8764765/ https://www.ncbi.nlm.nih.gov/pubmed/35042461 http://dx.doi.org/10.1186/s12859-022-04576-3 |
author | Zhang, Xiaopu Hassan, Musa A. Prendergast, James G. D. |
collection | PubMed |
description | BACKGROUND: Differential isoform usage is an important driver of inter-individual phenotypic diversity and is linked to various diseases and traits. However, accurately detecting the differential usage of gene transcripts between groups can be difficult, particularly in less well-annotated genomes where the spectrum of transcript isoforms is largely unknown. RESULTS: We investigated whether machine learning approaches can detect differential isoform usage based purely on the distribution of reads across a gene region. We illustrate that gradient boosting and elastic net approaches can successfully identify large numbers of genes showing potential differential isoform usage between Europeans and Africans; these genes are enriched among relevant biological pathways and significantly overlap those identified by previous approaches. We demonstrate that diversity at the 3′ and 5′ ends of genes is a primary driver of these differences between populations. CONCLUSION: Machine learning methods can effectively detect differential isoform usage from read fraction data and can provide novel insights into the biological differences between groups. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-022-04576-3. |
format | Online Article Text |
id | pubmed-8764765 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
journal | BMC Bioinformatics |
published | 2022-01-18 |
rights | © The Author(s) 2022. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
title | Using machine learning to detect the differential usage of novel gene isoforms |
topic | Research |